Reproducible Assets for CloudFormation Stacks

When CloudFormation is used to create infrastructure or deploy an application, it is important to look at the artifacts that are generated and deployed. For artifacts, and especially Java artifacts (jar or zip files), some important requirements have to be met. Otherwise, the deployment might take longer than necessary, or, when a CDK Pipeline is used to deploy the application, the pipeline can go into an infinite loop, updating itself over and over again.
When a CDK stack is created or updated via cdk deploy, all parts of the application, such as CloudFormation templates, images, scripts and Java code, are packed as assets in the cloud assembly directory (usually cdk.out), published to S3 and used by the CloudFormation service during deployment. An important property of each asset is its hash. In this case the hash is a SHA-256 hash of the content of each file. It is used in the generated template and also in the filename of the asset. For example, the Java code for a Lambda function may become a file called asset.2789342b793c2d260717ac962952ef1ec03511f8a355f6abb9a5bfcd32bee712.jar. You can find these files in the cdk.out folder after the stack has been synthesized (cdk synth or cdk deploy).
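To make the naming scheme concrete, here is a minimal Java sketch that derives an asset-style file name from the SHA-256 hash of some bytes. It assumes, as described above, that the hash is computed over the file content; the CDK's own fingerprinting may differ in detail. The class and method names (AssetNameSketch, sha256Hex, assetFileName) are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class AssetNameSketch {

    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256
            throw new IllegalStateException(e);
        }
    }

    /** Builds a name following the asset.<hash>.<extension> pattern (hypothetical helper). */
    static String assetFileName(byte[] content, String extension) {
        return "asset." + sha256Hex(content) + "." + extension;
    }

    public static void main(String[] args) {
        byte[] content = "example artifact bytes".getBytes(StandardCharsets.UTF_8);
        // Any single changed byte in the content yields a completely different name
        System.out.println(assetFileName(content, "jar"));
    }
}
```

Because the name is derived from the bytes alone, two builds only map to the same asset if they produce byte-identical files.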
When it comes to hashing, keep in mind that any change to a file also alters its hash. For archive files (jar, zip, tar etc.) the hash is not only affected by the contents of the files in the archive (e.g. compiled Java classes). More importantly, the order of the files in the archive, their file permissions and their dates also influence the hash:
Jar contents:
Hash: 03fb71a84bd307794ef2737274cfc41f4c080e0a4d13827e5e8a95a853738626
Jar contents:
Hash: b15559db5309131b4ab0863b948b36ef3be358391d2014788ee9936ea3b8aec9
Even if the contents of the files in both archives are exactly the same, the hash of the archive itself is different because the files carry different timestamps.
The result is that CDK/CloudFormation will treat this asset as updated/changed and re-deploy it, even if the actual implementation did not change.
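The effect can be reproduced with the JDK alone. The following sketch writes the same bytes into two in-memory zip archives that differ only in the modification time of the entry, then compares the SHA-256 hashes. The class name, entry name and timestamps are arbitrary values chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class TimestampHashDemo {

    /** Writes a one-entry zip archive whose entry carries the given modification time. */
    static byte[] zipWithTimestamp(String name, byte[] content, long epochMillis) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setTime(epochMillis);
            zip.putNextEntry(entry);
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] classBytes = "identical compiled bytes".getBytes(StandardCharsets.UTF_8);
        long firstBuild = 1_600_000_000_000L;          // arbitrary point in time
        long secondBuild = firstBuild + 86_400_000L;   // one day later

        String first = sha256Hex(zipWithTimestamp("Handler.class", classBytes, firstBuild));
        String second = sha256Hex(zipWithTimestamp("Handler.class", classBytes, secondBuild));

        System.out.println("first build:  " + first);
        System.out.println("second build: " + second);
        System.out.println("hashes equal: " + first.equals(second));
    }
}
```

The entry content is identical in both archives; only the timestamp stored in the zip headers differs, and that is enough to change the hash of the archive.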
When a Lambda is deployed this may not be an issue. It will only take some more time to transmit and re-deploy the archive.
However, when it comes to a CDK pipeline, this leads to an infinite loop, because the assets of the pipeline stack change each time the pipeline is synthesized:
- Pipeline is synthesized
- Asset hashes changed
- Pipeline is updated
- Pipeline restarts, because it has changed
- Pipeline is synthesized
- Asset hashes changed
- Pipeline is updated
- Pipeline restarts, because it has changed
- …
Create Reproducible Archives
The previous section outlined why it is important that the content, and therefore the hash, of a Java archive (jar or zip) only changes when the actual implementation has changed. To achieve this, the following measures can be taken:
- always use the same timestamp for all files
- preserve the file order in the archive
- prevent code generators from generating changing content
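When using Gradle to build the project, the first two requirements (fixed timestamp and file order) are achieved by disabling preserved file timestamps and enabling reproducible file order for all archive tasks (the preserveFileTimestamps and reproducibleFileOrder properties of AbstractArchiveTask) in the build.gradle.kts script.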
The Gradle documentation gives a more detailed explanation of this topic. For Maven, the documentation also contains an article about Configuring for Reproducible Builds.
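Independent of the build tool, the first two measures can be sketched in plain Java. The example below is not what Gradle or Maven do internally; it is a minimal illustration that sorts the entries by name and stamps every entry with the same constant time. The class name, the helper methods and the chosen constant date are assumptions made for this sketch, and ZipEntry.setTimeLocal requires Java 9 or newer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ReproducibleZipSketch {

    // Arbitrary constant for this sketch; any value works as long as it never changes
    static final LocalDateTime FIXED_TIME = LocalDateTime.of(1980, 2, 1, 0, 0);

    /** Writes the files as a zip archive with sorted entries and a constant timestamp. */
    static byte[] writeReproducibleZip(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            // TreeMap iterates in sorted key order, regardless of insertion order
            for (Map.Entry<String, byte[]> file : new TreeMap<>(files).entrySet()) {
                ZipEntry entry = new ZipEntry(file.getKey());
                // A local date-time avoids any dependency on the JVM's default time zone
                entry.setTimeLocal(FIXED_TIME);
                zip.putNextEntry(entry);
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Builds the same two files in different insertion orders and compares the archives. */
    static boolean sameBytesForDifferentInsertionOrder() {
        byte[] a = "class A".getBytes(StandardCharsets.UTF_8);
        byte[] b = "class B".getBytes(StandardCharsets.UTF_8);

        Map<String, byte[]> first = new LinkedHashMap<>();
        first.put("A.class", a);
        first.put("B.class", b);

        Map<String, byte[]> second = new LinkedHashMap<>();
        second.put("B.class", b);
        second.put("A.class", a);

        return Arrays.equals(writeReproducibleZip(first), writeReproducibleZip(second));
    }

    public static void main(String[] args) {
        System.out.println("archives identical: " + sameBytesForDifferentInsertionOrder());
    }
}
```

With both measures in place, the archive bytes depend only on the entry names and contents, so the hash stays stable across rebuilds of unchanged code.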
The last requirement depends on the generator that is being used. For example, MapStruct by default includes a timestamp in the date attribute of the @Generated annotation on each generated class.
Since the date attribute will change each time the mapper implementation is generated (gradle clean build), the hash of the resulting archive will also change, even if the mapper implementation is exactly the same.
Then add this configuration to all mapper interfaces or classes:
With this configuration MapStruct will omit the timestamp and re-create the same implementation class each time.
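The general principle can be illustrated with a toy generator. The sketch below is not MapStruct and does not reproduce its actual output; it only mimics a generator that writes a @Generated header with an optional date attribute, to show why the generated source is stable only when the timestamp is suppressed. All names (GeneratedHeaderSketch, example.ToyGenerator, CarMapperImpl) are hypothetical.

```java
import java.time.Instant;

class GeneratedHeaderSketch {

    /** Renders a made-up generated source file; the date attribute is optional. */
    static String render(String className, Instant generatedAt, boolean includeTimestamp) {
        StringBuilder source = new StringBuilder();
        source.append("@Generated(value = \"example.ToyGenerator\"");
        if (includeTimestamp) {
            // This is the only part of the output that depends on when the generator ran
            source.append(", date = \"").append(generatedAt).append('"');
        }
        source.append(")\n");
        source.append("public class ").append(className).append(" {}\n");
        return source.toString();
    }

    public static void main(String[] args) {
        Instant firstRun = Instant.parse("2024-01-01T10:00:00Z");
        Instant secondRun = firstRun.plusSeconds(60);

        boolean stableWithTimestamp =
                render("CarMapperImpl", firstRun, true).equals(render("CarMapperImpl", secondRun, true));
        boolean stableWithoutTimestamp =
                render("CarMapperImpl", firstRun, false).equals(render("CarMapperImpl", secondRun, false));

        System.out.println("stable with timestamp:    " + stableWithTimestamp);
        System.out.println("stable without timestamp: " + stableWithoutTimestamp);
    }
}
```

The same reasoning applies to any generator that embeds build-time values such as dates, host names or absolute paths into its output.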
Conclusion
In this article we learned why it is important to create reproducible assets for applications that are deployed using CDK. We then looked at ways to create these assets in Java projects by configuring the generation of application artifacts like jar archives. Finally, we looked at MapStruct as an example of a generator that also has to be configured in order to create reproducible implementations.
Written by: Alexander Sparkowsky Developer @ Europace