Reproducible Assets for CloudFormation Stacks

Published On: 3 April 2023 | Categories: Tech

When CloudFormation is used to create the infrastructure or application, it is important to take a close look at the artifacts that are generated and deployed. Artifacts, and Java artifacts (jar or zip files) in particular, have to meet some important requirements. Otherwise, deployments might take longer than necessary, or, when a CDK Pipeline is used to deploy the application, the pipeline can go into an infinite loop, updating itself over and over again.

When a CDK stack is created or updated via cdk deploy, all parts of the application, such as CloudFormation templates, images, scripts and Java code, are packaged as assets in the cloud assembly directory (usually cdk.out), published (to S3, or to ECR for container images) and used by the CloudFormation service during deployment. An important property of each asset is its hash. In this case, the hash is a SHA-256 hash of the content of each file. It is used in the generated template and also in the file name of the asset. For example, the Java code for a Lambda function may become a file called asset.2789342b793c2d260717ac962952ef1ec03511f8a355f6abb9a5bfcd32bee712.jar. You can find these files in the cdk.out folder after the stack has been synthesized (cdk synth or cdk deploy).
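To illustrate the naming scheme, the following sketch derives an asset-style file name from the SHA-256 digest of a byte array using only the JDK (Java 17+ for HexFormat). It is a simplified assumption: CDK's own fingerprinting may mix additional inputs into the hash, so the names produced here will not match those in cdk.out. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class AssetHash {

    // Returns the lowercase hex SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(content));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform must support SHA-256, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: builds a name in the style asset.<hash>.<extension>.
    static String assetFileName(byte[] content, String extension) {
        return "asset." + sha256Hex(content) + "." + extension;
    }

    public static void main(String[] args) {
        byte[] build1 = "compiled classes".getBytes(StandardCharsets.UTF_8);
        byte[] build2 = "compiled classes".getBytes(StandardCharsets.UTF_8);
        byte[] build3 = "compiled classes!".getBytes(StandardCharsets.UTF_8);

        // Same bytes, same name: the asset is considered unchanged.
        System.out.println(assetFileName(build1, "jar").equals(assetFileName(build2, "jar"))); // true
        // One extra byte, new name: the asset is considered changed.
        System.out.println(assetFileName(build1, "jar").equals(assetFileName(build3, "jar"))); // false
    }
}
```

Because the name is derived purely from the bytes, any byte-level difference in an archive, however irrelevant to the code, yields a new asset.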

Keep in mind that any change to a file also alters its hash. For archive files (jar, zip, tar, etc.), the hash is affected not only by the contents of the files in the archive (e.g. compiled Java classes) but, more importantly, also by the order of the files in the archive, their file permissions and their timestamps:

Hash of the first jar: 03fb71a84bd307794ef2737274cfc41f4c080e0a4d13827e5e8a95a853738626

Hash of the second jar: b15559db5309131b4ab0863b948b36ef3be358391d2014788ee9936ea3b8aec9

Even though the files contained in both archives are exactly the same, the hashes of the archives differ because the files have different timestamps.
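The effect can be reproduced with java.util.zip. This minimal sketch (hypothetical class name, made-up entry contents) writes the same two entries several times and compares the SHA-256 hashes of the resulting archives: identical inputs give identical archives, while changing only the entry timestamps or only the entry order gives a different hash. File permissions are left out of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ArchiveHashDemo {

    // Writes the entries in the map's iteration order, stamping every entry with the given time.
    static byte[] zip(Map<String, String> entries, long modifiedMillis) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                ZipEntry entry = new ZipEntry(e.getKey());
                entry.setTime(modifiedMillis); // stored in the entry headers, so part of the archive bytes
                zip.putNextEntry(entry);
                zip.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    static String sha256Hex(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every Java platform
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> classes = new LinkedHashMap<>();
        classes.put("com/example/Handler.class", "handler bytecode");
        classes.put("com/example/Service.class", "service bytecode");

        Map<String, String> reversed = new LinkedHashMap<>();
        reversed.put("com/example/Service.class", "service bytecode");
        reversed.put("com/example/Handler.class", "handler bytecode");

        long buildA = 1_680_508_800_000L;  // 2023-04-03T08:00:00Z
        long buildB = buildA + 3_600_000L; // one hour later

        String a = sha256Hex(zip(classes, buildA));
        System.out.println(a.equals(sha256Hex(zip(classes, buildA))));  // true: same content, time, order
        System.out.println(a.equals(sha256Hex(zip(classes, buildB))));  // false: only timestamps differ
        System.out.println(a.equals(sha256Hex(zip(reversed, buildA)))); // false: only entry order differs
    }
}
```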

As a result, CDK/CloudFormation treats the asset as changed and re-deploys it, even though the actual implementation did not change.

When a Lambda function is deployed, this may not be a big issue; the deployment just takes more time because the archive has to be uploaded and deployed again.

However, in a CDK pipeline this leads to an infinite loop, because the assets of the pipeline stack change each time the pipeline is synthesized:

  • Pipeline is synthesized
  • Asset hashes have changed
  • Pipeline is updated
  • Pipeline restarts because it has changed
  • Pipeline is synthesized
  • Asset hashes have changed
  • Pipeline is updated
  • Pipeline restarts because it has changed
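The feedback loop can be sketched as a toy model. This is an assumption for illustration, not CDK's implementation: build 0 stands for the initial cdk deploy, and every pipeline run compares the asset hash it synthesizes with the one it is currently deployed with, updating itself and restarting whenever they differ. The class and method names are hypothetical.

```java
import java.util.function.IntFunction;

public class SelfMutationModel {

    // Toy model, not CDK code. Build 0 is the initial "cdk deploy" that created the pipeline.
    // In each run the pipeline synthesizes the app and compares the asset hash with the hash
    // it is currently deployed with: if they differ, it updates itself and restarts.
    // Returns the run in which the hashes matched (so the pipeline moves on), or -1 if it
    // was still updating itself after maxRuns runs.
    static int runUntilStable(IntFunction<String> synthesizeAssetHash, int maxRuns) {
        String deployedHash = synthesizeAssetHash.apply(0);
        for (int run = 1; run <= maxRuns; run++) {
            String synthesizedHash = synthesizeAssetHash.apply(run);
            if (synthesizedHash.equals(deployedHash)) {
                return run; // nothing changed: continue with the application stages
            }
            deployedHash = synthesizedHash; // self-update, then restart
        }
        return -1;
    }

    public static void main(String[] args) {
        // Reproducible build: the hash depends only on the unchanged sources.
        System.out.println(runUntilStable(build -> "hash-of-sources", 10));        // 1
        // Non-reproducible build: a build-specific timestamp ends up in every hash.
        System.out.println(runUntilStable(build -> "hash-of-build-" + build, 10)); // -1
    }
}
```

With a reproducible build the pipeline moves on in its first run; with a timestamp-dependent hash it never leaves the self-update step, which is why the model caps the number of runs.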

Create Reproducible Archives

The previous section outlined why it is important that the content, and therefore the hash, of a Java archive (jar or zip) only changes when the actual implementation has changed. To achieve this, the following measures can be taken:

  1. always use the same timestamp for all files
  2. use a stable, reproducible file order in the archive
  3. prevent code generators from producing content that changes between builds
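The first two measures can be sketched in plain Java with java.util.zip, independent of any build tool: every entry gets the same fixed timestamp, and entries are always written in sorted order, no matter in which order the files were collected. The fixed date (1 February 1980), the class name and the entry contents are assumptions chosen for this sketch. ZipEntry.setTimeLocal (Java 9+) is used so that the stored timestamp does not depend on the build machine's time zone.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDateTime;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ReproducibleZip {

    // Measure 1: one fixed timestamp for every entry (an arbitrary date inside the zip date range).
    static final LocalDateTime FIXED_ENTRY_TIME = LocalDateTime.of(1980, 2, 1, 0, 0);

    // Writes the files as a zip archive. Measure 2: entries are sorted by name via a TreeMap,
    // so the order in which the files were collected does not matter.
    static byte[] write(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> file : new TreeMap<>(files).entrySet()) {
                ZipEntry entry = new ZipEntry(file.getKey());
                entry.setTimeLocal(FIXED_ENTRY_TIME); // independent of wall clock and time zone
                zip.putNextEntry(entry);
                zip.write(file.getValue());
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    static String sha256Hex(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every Java platform
        }
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The same two class files, collected in different orders (e.g. on two machines).
        Map<String, byte[]> firstBuild = new LinkedHashMap<>();
        firstBuild.put("com/example/Service.class", utf8("service bytecode"));
        firstBuild.put("com/example/Handler.class", utf8("handler bytecode"));

        Map<String, byte[]> secondBuild = new LinkedHashMap<>();
        secondBuild.put("com/example/Handler.class", utf8("handler bytecode"));
        secondBuild.put("com/example/Service.class", utf8("service bytecode"));

        String first = sha256Hex(write(firstBuild));
        String second = sha256Hex(write(secondBuild));
        System.out.println(first.equals(second)); // true: byte-identical archives
    }
}
```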

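When using Gradle to build the project, the first two requirements (fixed timestamp and file order) can be met in the build.gradle.kts script by configuring all archive tasks (tasks of type AbstractArchiveTask, such as Jar and Zip) so that isPreserveFileTimestamps is set to false and isReproducibleFileOrder is set to true.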

The Gradle documentation explains this topic in more detail. For Maven, the documentation also contains an article about Configuring for Reproducible Builds.

How to meet the last requirement depends on the generator that is being used. For example, by default MapStruct includes a timestamp in each generated class, as the date attribute of its @Generated annotation.

Since the date attribute changes each time the mapper implementation is generated (e.g. by gradle clean build), the hash of the resulting archive also changes, even if the mapper implementation is otherwise exactly the same.
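The same effect can be shown with a hypothetical stand-in for a code generator. The sketch below is not MapStruct; it only emits a source string with or without a generation timestamp (the @Generated text is a plain string here, and ExampleProcessor and CustomerMapper are made-up names) and compares the SHA-256 hashes of two generation runs a few minutes apart. The hash of the generated source stands in for the hash of the compiled class that ends up in the jar.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDateTime;
import java.util.HexFormat;

public class GeneratedSourceDemo {

    // Hypothetical generator: emits a mapper implementation, optionally stamped with
    // the generation time in a date attribute.
    static String generateMapperImpl(boolean includeTimestamp, LocalDateTime generatedAt) {
        String annotation = includeTimestamp
                ? "@Generated(value = \"ExampleProcessor\", date = \"" + generatedAt + "\")\n"
                : "@Generated(value = \"ExampleProcessor\")\n";
        return annotation + "public class CustomerMapperImpl implements CustomerMapper { }\n";
    }

    static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every Java platform
        }
    }

    public static void main(String[] args) {
        LocalDateTime firstBuild = LocalDateTime.of(2023, 4, 3, 9, 0);
        LocalDateTime secondBuild = firstBuild.plusMinutes(5); // e.g. a later "gradle clean build"

        // With a timestamp, identical mapping logic still yields different output.
        System.out.println(sha256Hex(generateMapperImpl(true, firstBuild))
                .equals(sha256Hex(generateMapperImpl(true, secondBuild))));  // false
        // Without it, the output (and hence its hash) is stable across builds.
        System.out.println(sha256Hex(generateMapperImpl(false, firstBuild))
                .equals(sha256Hex(generateMapperImpl(false, secondBuild)))); // true
    }
}
```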

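To prevent this, define a shared mapper configuration (an interface annotated with @MapperConfig) that tells MapStruct to omit the timestamp; recent MapStruct versions offer the suppressTimestampInGenerated attribute for this.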

Then reference this configuration from all mapper interfaces or classes via the config attribute of their @Mapper annotation.

With this configuration, MapStruct omits the timestamp and re-creates the same implementation class each time.

Conclusion

In this article we learned why it is important to create reproducible assets for applications that are deployed using CDK. We then looked at how to create these assets in Java projects by configuring how application artifacts such as jar archives are generated. Finally, we looked at MapStruct as an example of a code generator that also has to be configured in order to produce reproducible implementations.

Written by: Alexander Sparkowsky, Developer @ Europace