DevOps Primer: Artifacting

Artifacting

RELATED: Git Branching Strategies, Versioning, Artifacting, SDLC - ALM & CICD

Artifacting is the process of packaging a project prior to release, and is essential as it mitigates many risks in both producing and consuming software products. Beyond simple archives, there are many types of packaging -- often language or framework dependent -- which have been developed to suit various use cases.

This primer demonstrates how to package files as zip and tar.gz; leverage an .artifactignore file similar to .gitignore; and generate and use a checksum file.

Tip

  • See 12factor and O'Reilly:Beyond the Twelve-Factor App for more on maturing your SDLC.
  • Have a consistent naming scheme, such as: <project name>_<version>.<extension>.
  • Set a PROJECT_NAME variable, and use ${PWD##*/} if the build job executes from the top-level directory of the project. Or use TOPDIR=$(git rev-parse --show-toplevel); PROJECT_NAME=${TOPDIR##*/} when executing in a git repo to automatically set the name of the artifact to the name of the repo (see the sketch after this list).
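For example, a minimal sketch of deriving the artifact name this way; the VERSION value and .tgz extension here are placeholder assumptions, not part of the tip above:

TOPDIR=$(git rev-parse --show-toplevel 2>/dev/null || pwd)   # repo root, or current dir outside a git repo
PROJECT_NAME=${TOPDIR##*/}                                   # e.g. 'my-repo'
VERSION='1.2.3'                                              # placeholder; typically read from a tag or VERSION file
ARTIFACT_NAME="${PROJECT_NAME}_${VERSION}.tgz"               # matches <project name>_<version>.<extension>
echo "${ARTIFACT_NAME}"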

Mitigated Risks & Benefits (not exhaustive)

Just-in-time builds from commit hashes have been known-bad practice for decades, but this isn't the only risk which can be overcome with artifacting:

  • The laws of physics and their bit-flips (single-event upset).
    It cannot be assumed that the same asset can be built twice.
  • Volatility in the software supply chain. Dependency hell is real.
    It cannot be assumed that the same asset can be built twice.
  • Configuration Deltas. Idempotence and build once postures are essential.
  • Cost. Multiple build-test cycles dramatically increase tech-spend.
  • Implementing CICD early ensures early identification of related issues. Don't wait until the week you hope to deploy before you implement tooling, as there may be substantial software refactoring required.
  • Storing credentials in the repository (hardcoding), mitigated by ensuring software ingests environment variables at process instantiation (idempotence).
  • Operational complexity (cost++) relating to '-rc' or '-beta' style pre-release identifiers. (There are scenarios where these are required.)
  • The potential for software to be modified post-release by including/publishing the file hash for verification.
    • Defense against partial or malformed downloads.
    • Patch-builds are their own mystical beast to handle.

Archive Artifacts

Archive type artifacts generally come in the form of a zip or a tar.gz. The following examples demonstrate how to accomplish this on Linux, using modern GNU utilities.

Note

  • macOS utilities are not guaranteed to function.
    Run brew install zip unzip gnu-tar
  • Use either semver, or calver when versioning.
    Proprietary versioning standards are a guaranteed risk.
    Aligning to industry standards is important as many SDLC tools expect semver. If diverging, verify that the toolchain will support your custom approach before investing heavily.
  • If both zip and tar.gz artifacts will be produced for a project, special attention must be given to the pattern matching used by each utility, as they are not the same. For example, include 'foo/', 'foo/**' for zip, and '**/foo' and '**/foo/*' for tar to fully exclude the 'foo' directory from the archive. WARN: Zip does not support '**/..' patterns.
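For illustration, here is how excluding a hypothetical foo/ directory looks with each utility's pattern syntax, as described above:

# .artifactignore read by zip (-x@)
foo/
foo/**

# .artifactignore read by tar (-X)
**/foo
**/foo/*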

Zip

The following script packages the current directory contents ignoring anything specified in the '.artifactignore' file.

Note

Exclusion patterns ARE NOT like those used by .gitignore.
For example, the pattern '**/foo' does not function as it does with tar or git. See here for more.

create-artifact-zip.sh

#!/usr/bin/env bash
product=$1
version=$2
tmp_dir='/tmp'
exclude_file='.artifactignore'
artifact_name="${product}-${version}.zip"

# Recursively zip the current directory, excluding patterns listed in .artifactignore.
zip -r "${tmp_dir}/${artifact_name}" . -x@"${exclude_file}"
Usage
./create-artifact-zip.sh <project-name> <version>
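A quick way to sanity-check the result before publishing (not part of the script above) is to list the archive contents:

unzip -l /tmp/<project-name>-<version>.zip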

Tar.gz

The following script packages the current directory contents, ignoring anything specified in the '.artifactignore' file.

Note

Exclusion patterns are like those used by .gitignore, which can be referenced here.

create-artifact-tar-gz.sh

#!/usr/bin/env bash
product=$1
version=$2
tmp_dir='/tmp'
exclude_file='.artifactignore'
artifact_name="${product}-${version}.tgz"

# Create a gzipped tarball of the current directory, excluding patterns listed in .artifactignore.
tar -X "${exclude_file}" -zcvf "${tmp_dir}/${artifact_name}" .
Usage
./create-artifact-tar-gz.sh <project-name> <version>
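As with zip, the tarball contents can be listed to sanity-check the result before publishing:

tar -tzf /tmp/<project-name>-<version>.tgz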

.artifactignore

The contents of the .artifactignore file will generally be language and project specific. Have a look at the github/gitignore repo to get started.

As an example, the .artifactignore file may contain the following:

.artifactignore
.gitignore
**/README.md
**/logs
**/*.log
**/trace.*
**/*.env
.git/
.git/**
.github/
.github/**
node_modules

Checksum

A checksum is a cryptographic hash of a file's contents, meaning that, in theory, if the file is changed the checksum will no longer match.

Assuring consumers that the artifact they receive is complete and unmodified is a critical component of the artifacting strategy. Originally, providing an md5sum with a file was important because internet technologies could not always guarantee complete delivery of a file; it is even more important these days, as bad actors may attempt to maliciously modify a file.

Tip

md5 is no longer secure. Use sha256.

Generation

Checksum files can be generated discretely for each file:

echo dog > pets.txt
sha256sum pets.txt > pets.txt.sha256

Or for multiple files:

echo dog > pets.txt
echo wrench > tools.txt
sha256sum pets.txt >> sha256sums.txt
sha256sum tools.txt >> sha256sums.txt
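The two appends above are equivalent to passing both files to a single invocation:

sha256sum pets.txt tools.txt > sha256sums.txt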

Validation

Files can be validated individually:

sha256sum -c pets.txt.sha256
pets.txt: OK
echo cat > pets.txt
sha256sum -c pets.txt.sha256
pets.txt: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match

Multiple files can be validated at once:

sha256sum -c sha256sums.txt
tools.txt: OK
pets.txt: OK
echo cat > pets.txt
sha256sum -c sha256sums.txt
tools.txt: OK
pets.txt: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match

You may have noticed that there is operationally no difference between generating and validating checksums for a single file or for multiple files. While some consumers prefer a single file containing all the checksums, others rely on being able to retrieve each artifact and its checksum discretely. Why not do both?

Implementation

Delivery Assurance

Generate a '.sha256', or 'sha256sums.txt' file (or both) with the artifact(s) and make it available. Clients can download and validate as needed.
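A minimal sketch of producing both at once, assuming the artifacts have already been staged in a dist/ directory (a hypothetical path):

#!/usr/bin/env bash
# Emit a discrete .sha256 per artifact plus a combined sha256sums.txt.
cd dist || exit 1
: > sha256sums.txt
for artifact in *.zip *.tgz; do
  [ -e "$artifact" ] || continue                 # skip unmatched globs
  sha256sum "$artifact" > "${artifact}.sha256"   # discrete checksum file
  cat "${artifact}.sha256" >> sha256sums.txt     # combined checksum file
done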

Bad Actor Mitigation

Relying on GitHub's Release feature ensures that the checksum files cannot be modified after release without there being a record. Additionally, the checksums can be included in the release notes.

Where artifacts are made available via another method, such as a website, distributing a record of the artifacts and their checksums via an out-of-band channel is a good way to provide another layer of assurance to consumers.

Tip

A simple asset healthcheck system can be established by emitting URLs and checksums to a database so that they can be validated intermittently.
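A minimal sketch of such a check, assuming a plain-text manifest of '<url> <sha256>' lines rather than a database (the manifest name and format are hypothetical):

#!/usr/bin/env bash
# Re-verify published artifacts against their recorded checksums.
while read -r url expected; do
  actual=$(curl -fsSL "$url" | sha256sum | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK      $url"
  else
    echo "FAILED  $url"
  fi
done < asset-manifest.txt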
