
Disclaimers:
Opinions expressed in this post (and in all of my posts) are solely, unless otherwise specified, those of the author: me. They absolutely do not reflect the views, policies, or positions of any organization, employer, or affiliated group.
This article is educational content. The examples are intentionally simplified for clarity. Before using these patterns in production, consult the Terraform documentation, the Helm documentation, and your platform team.
I've strived for accuracy throughout this piece. If you catch any errors, please reach out — I'd be grateful for the feedback and happy to make updates!
Hook
"We need to set up a new environment for the epidemiology team. Can you have something running by yesterday?" That question is why this article exists: how do we easily repeat (redo, replicate, recreate, duplicate) the infrastructure we just built?
The previous episodes focused on the theory: VPCs and compute, IAM and encryption, Kubernetes and EKS, cluster security. A colleague who missed the training would get the cluster up by hand: click through the console, type commands, copy-paste YAML files, and call it done. And it would work. Until he had to do it again for staging, again for UAT (recette), again for production, and then once more months later for a deployment in another country.
Deploying 4s for a new lab in a new West African country is something we've done many times over the last couple of months, and hopefully 4s will eventually be deployed across all of Africa. Let me present two tools that solve that problem at two different layers: Terraform for provisioning the infrastructure, and Helm for deploying the applications that run on it.
ToC
- The problem: reproducibility at two layers
- Terraform: infrastructure as code
- Helm: Kubernetes package manager
- Where does the chart live? Chart registries
- Putting it together
The problem: reproducibility at two layers
Imagine you've just finished reading Day 5. You know how to secure an EKS cluster. Now you need to actually create one and deploy a Django REST API onto it.
You could do it manually: click through the EKS console, create a node group, write a few dozen YAML manifests, copy-paste the right kubectl apply commands. But we can do better. We should be able to easily redo what we've done.
Before we even get to reproducibility, we want to preserve a record of what has been done, if only so that we can review it later and see what was achieved. As for me, throughout the workshop I didn't just type commands; I wrote Bash scripts:
#!/bin/bash
# Environment Variables
AWS_REGION="eu-west-1"
CLUSTER_NAME="ptrck-foundation-training"
ACCOUNT_ID="546732826958" # AWS account ID (used to build the role ARN below)
ROLE_ARN="arn:aws:iam::${ACCOUNT_ID}:role/EKSAdminRole" # ARN of the EKS Admin role
CLUSTER_ARN="arn:aws:eks:${AWS_REGION}:${ACCOUNT_ID}:cluster/${CLUSTER_NAME}"
# I need to create this ECR repository
ECR_REPO="ecr-repo-ptrck"
DEMO_NS="demo-ns-ptrck"
DEMO_DEPLOYMENT="demo-dply-ptrck"
DEMO_CONTAINER="app-ptrck"
# Helper function to print things with some separation
section() {
  echo ""
  echo "=====> $1"
}
section "Configure AWS CLI credentials"
aws configure
# Identity & role assume
section "Verify identity"
aws sts get-caller-identity
section "Verify role exists"
aws iam get-role --role-name EKSAdminRole
section "List all roles in account (if you have permissions)"
aws iam list-roles \
  --query 'Roles[].RoleName' \
  --output text
section "List user policies"
CURRENT_USER=$(aws sts get-caller-identity --query 'Arn' --output text | cut -d'/' -f 6)
aws iam list-user-policies --user-name "$CURRENT_USER"
section "List attached policies"
aws iam list-attached-user-policies --user-name "$CURRENT_USER"
section "Check current default region"
aws configure get region
section "(Re)configure AWS CLI credentials"
aws configure
echo ""
section "Assume EKS admin role & retrieve the token"
# Note: this was working nicely until Sunday 7PM ?!
# (A comment placed inside a backslash-continued command would break it,
# so it lives above the command instead.)
aws sts assume-role \
  --role-arn "$ROLE_ARN" \
  --role-session-name iam-eks-admin-role \
  --duration-seconds 43200
section "Configure the 'iam-eks-admin-role' profile with the temporary credentials"
aws configure --profile iam-eks-admin-role
section "What are the available roles ?"
aws iam list-roles
section "What are the available policies?"
aws iam list-policies
# Check if there is a K8s cluster available
section "List all EKS clusters"
aws eks list-clusters
section "List clusters in my region"
aws eks list-clusters --region "$AWS_REGION"
section "At this point, there is no clusters available whaaaaaa ?! lol"
section "Let's create our K8s cluster"
# --with-oidc: no need to separately call eksctl utils associate-iam-oidc-provider
# Caution: with --enable-auto-mode, EKS manages compute itself, so the
# --nodes / --node-type flags may be ignored or rejected.
eksctl create cluster \
  --name "$CLUSTER_NAME" \
  --region "$AWS_REGION" \
  --nodes 3 \
  --node-type t2.micro \
  --with-oidc \
  --enable-auto-mode
This is the reproducibility problem, and it lives at two distinct layers:
Infrastructure layer: Who created the VPC? What are the exact subnet CIDRs? Which KMS key encrypts etcd? If the answer lives in someone's memory or a Confluence page, you have a problem.
Application layer: Which version of the Django app is running in staging? How do I roll back if the new release breaks something? If the answer is "look at the kubectl commands in the README", you have a different problem.
Terraform solves the first. Helm solves the second.
Bash scripts are fine for exploration and one-off tasks, but they fall short once you need state tracking, idempotence, error handling, drift detection, and dependency management — which is exactly what infrastructure provisioning requires.
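To make the idempotence point concrete without touching AWS, here is a toy sketch in plain shell. The "resource" is just a directory (a stand-in for a real cloud resource, names are illustrative): a blind imperative command fails when re-run, while a check-then-act wrapper converges to the desired state on every run.

```shell
#!/bin/sh
set -e

# Imperative style: a bare `mkdir` errors if the "resource" already
# exists, so running the script twice fails halfway through.
# Declarative-ish style: describe the desired state, act only if needed.
ensure_dir() {
  if [ -d "$1" ]; then
    echo "$1: already present, nothing to do"   # the "no diff" case
  else
    mkdir "$1"
    echo "$1: created"                          # the "apply the diff" case
  fi
}

ensure_dir /tmp/demo-resource
ensure_dir /tmp/demo-resource   # second run converges instead of failing
```

Terraform generalizes this pattern: the state file records what exists, and every apply computes and applies only the diff.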
Terraform: infrastructure as code
A brief history
Terraform was created by HashiCorp and first released in 2014. The problem it solved: AWS had CloudFormation, GCP had Deployment Manager, Azure had ARM templates — but if you used more than one cloud, you had three completely different tools, three syntaxes, three mental models. Terraform unified them under a single declarative language: HCL (HashiCorp Configuration Language).

Core idea: describe the desired state of your infrastructure, and Terraform figures out how to get there. It maintains a state file tracking what it has created, so it can calculate the diff between current state and desired state on every run.
One important note: as a free-software advocate, I should mention the Terraform fork called OpenTofu. OpenTofu is an open-source project under the Linux Foundation, API-compatible with Terraform: same HCL, same providers, same workflow, same everything. For most teams, the choice between them is a licensing and governance question. The examples below work identically on both. I hope.
Alternatives & why Terraform
| Tool | Approach | When to prefer it |
|---|---|---|
| AWS CloudFormation | AWS-native, JSON/YAML templates | AWS-only shops that want native integration and no external tooling |
| AWS CDK | TypeScript/Python constructs that compile to CloudFormation | Developer-heavy teams who want real programming languages for infrastructure |
| Pulumi | Multi-cloud, uses Python/TypeScript/Go directly | Teams that find HCL limiting and want full programming language expressiveness |
| Ansible | Procedural YAML playbooks | Configuration management and server setup; less suited for cloud resource provisioning |
| OpenTofu | Terraform fork, identical syntax | Teams that want open-source governance and no BSL licensing constraints |
Why did we choose Terraform? It's multi-cloud (we're on AWS now, but that may change, lol), its registry lists thousands of providers, it has a mature module ecosystem, and it's the most widely adopted IaC tool in the industry.
Provisioning an EKS cluster
The goal: reproduce the EKS cluster from Day 4 and Day 5 — VPC, private subnets, encrypted etcd, managed node group — using Terraform. Every time. Identically.
Project structure:
infra/
├── main.tf # providers, backend, modules
├── variables.tf # input variables
├── outputs.tf # cluster name, endpoint (consumed by Helm later)
└── terraform.tfvars # environment-specific values (not committed to Git)
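As an illustration, terraform.tfvars is just key = value pairs; the names below match the variables declared in variables.tf, and the values are of course hypothetical and environment-specific:

```hcl
# terraform.tfvars — environment-specific, kept out of Git
cluster_name = "syndromic-surveillance-platform"
environment  = "staging"
region       = "eu-west-3"
```

You can also keep one file per environment (e.g. staging.tfvars, prod.tfvars) and pass it explicitly with terraform apply -var-file=staging.tfvars.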
main.tf — the core:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  # Remote state: shared, locked, versioned
  backend "s3" {
    bucket         = "genomics-platform-tfstate"
    key            = "eks/terraform.tfstate"
    region         = "eu-west-3"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
provider "aws" {
  region = var.region
}
# KMS key for etcd encryption (Day 5 best practice)
resource "aws_kms_key" "eks" {
  description             = "EKS secrets encryption"
  deletion_window_in_days = 7
}
# VPC: private subnets for nodes, public subnets for load balancers
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.cluster_name}-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["${var.region}a", "${var.region}b", "${var.region}c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true

  tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    Environment                                 = var.environment
  }
}
# EKS cluster
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = var.cluster_name
  cluster_version = "1.30"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # Private endpoint only (Day 5 best practice)
  cluster_endpoint_public_access = false

  # etcd encryption with the KMS key above
  cluster_encryption_config = {
    resources        = ["secrets"]
    provider_key_arn = aws_kms_key.eks.arn
  }

  eks_managed_node_groups = {
    genomics = {
      instance_types = ["m5.large"]
      min_size       = 2
      max_size       = 10
      desired_size   = 3
    }
  }
}
variables.tf:
variable "region" {
  description = "AWS region"
  type        = string
  default     = "eu-west-3"
}
variable "cluster_name" {
  description = "EKS cluster name"
  type        = string
}
variable "environment" {
  description = "Deployment environment (dev, staging, prod)"
  type        = string
}
outputs.tf — what other tools will need:
output "cluster_name" {
  value = module.eks.cluster_name
}
output "cluster_endpoint" {
  value = module.eks.cluster_endpoint
}
Deploy it:
terraform init # download providers and modules
terraform plan # preview what will be created
terraform apply # create the infrastructure
Voilà.
Helm: Kubernetes package manager
A brief history
Helm was first presented at KubeCon 2015 by the team at Deis (notably Matt Butcher). The problem it solved: writing raw Kubernetes manifests for a real application means dozens of YAML files, lots of copy-paste across environments, no concept of versioning, and no clean way to roll back. Helm introduced the chart — a package of Kubernetes manifests with templating and versioning built in.
In 2016, Google and Deis merged their Kubernetes packaging work, and Helm was donated to the Kubernetes project. It became a CNCF graduated project in 2020 — the same tier as Kubernetes, Prometheus, and Envoy.
Helm 2 (the first widely adopted version) required a server-side component called Tiller running inside the cluster. Tiller had broad cluster-admin permissions and became a well-known security liability. Helm 3 (released in 2019) removed Tiller entirely — all operations now happen client-side, and permissions come from your own kubeconfig. Simpler and significantly more secure.
Alternatives & why Helm
| Tool | Approach | When to prefer it |
|---|---|---|
| Kustomize | Overlay-based patching, built into kubectl | Simple apps where you want no templating and native kubectl support |
| Plain manifests | Raw YAML + kubectl apply | Very small projects, learning environments, one-off deployments |
| Skaffold | Dev-focused workflow: build + push + deploy | Inner-loop development with fast local iteration cycles |
| Carvel / ytt | Structured YAML templating | VMware/Tanzu ecosystems |
We chose Helm because: versioned releases let you see exactly what is running in each namespace with helm list; upgrades run with the --atomic flag roll back automatically on failure; hooks let you run jobs before or after install (crucial for Django migrations — more on this below); and ArtifactHub hosts thousands of ready-made charts for databases, ingress controllers, monitoring stacks, and more. When you need to install cert-manager or the AWS Load Balancer Controller into your cluster, it's a one-liner.
Packaging a Django REST API
Our syndromic surveillance platform is a Django REST API that processes public health data. Here's how to package it as a Helm chart.
Chart structure:
syndromic-surveillance-api/
├── Chart.yaml # chart metadata
├── values.yaml # default configuration values
└── templates/
├── _helpers.tpl # reusable template snippets
├── deployment.yaml
├── service.yaml
└── migrate-job.yaml # Django migration hook
Chart.yaml:
apiVersion: v2
name: syndromic-surveillance-api
description: Django REST API for public health data processing
type: application
version: 0.1.0 # chart version — bump when the chart itself changes
appVersion: "1.0.0" # application version — overridden at deploy time
values.yaml — the defaults, overridable per environment:
replicaCount: 2

image:
  repository: 123456789012.dkr.ecr.eu-west-3.amazonaws.com/syndromic-surveillance-api
  pullPolicy: IfNotPresent
  tag: ""  # overridden at deploy time: --set image.tag=v1.2.3

service:
  type: ClusterIP
  port: 8000

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 256Mi

env:
  DJANGO_SETTINGS_MODULE: "config.settings.production"
  DB_HOST: "rds.syndromic-surveillance-platform.internal"
  DB_NAME: "syndromic-surveillance"
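Per-environment configuration then becomes a small override file layered on top of these defaults. A hypothetical values-staging.yaml (the file name and values are illustrative) might look like:

```yaml
# values-staging.yaml — only the keys that differ from values.yaml
replicaCount: 1

env:
  DJANGO_SETTINGS_MODULE: "config.settings.staging"
  DB_NAME: "syndromic-surveillance-staging"
```

It is applied with helm upgrade --install ... -f values-staging.yaml; later -f files and --set flags take precedence over the values.yaml defaults.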
templates/deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "syndromic-surveillance-api.fullname" . }}
  labels:
    {{- include "syndromic-surveillance-api.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "syndromic-surveillance-api.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "syndromic-surveillance-api.selectorLabels" . | nindent 8 }}
    spec:
      securityContext:
        runAsNonRoot: true  # Day 5 best practice
        runAsUser: 1000
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        ports:
        - containerPort: 8000
        env:
        {{- range $key, $val := .Values.env }}
        - name: {{ $key }}
          value: {{ $val | quote }}
        {{- end }}
        resources:
          {{- toYaml .Values.resources | nindent 10 }}
        readinessProbe:
          httpGet:
            path: /healthz/
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
Two years ago, when I was first getting to grips with K8s, I remember using sidecar containers to apply Django database migrations. Nowadays, I use init containers. And while doing some research on Helm, specifically to write this article, I was very pleasantly surprised to discover that Helm hooks exist: a pre-upgrade hook can run manage.py migrate before any new pods are started:
templates/migrate-job.yaml — I really need to try this
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "syndromic-surveillance-api.fullname" . }}-migrate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
      containers:
      - name: migrate
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        command: ["python", "manage.py", "migrate", "--noinput"]
        env:
        {{- range $key, $val := .Values.env }}
        - name: {{ $key }}
          value: {{ $val | quote }}
        {{- end }}
Phew!
Deploy it:
# First deploy
helm install syndromic-surveillance-api ./syndromic-surveillance-api \
--namespace syndromic-surveillance \
--create-namespace \
--set image.tag=v1.0.0
# Upgrade to a new version
helm upgrade syndromic-surveillance-api ./syndromic-surveillance-api \
--namespace syndromic-surveillance \
--set image.tag=v1.2.3
# Something broke — roll back to the previous release
helm rollback syndromic-surveillance-api -n syndromic-surveillance
# See what is currently running
helm list -n syndromic-surveillance
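What makes rollback precise is that Helm keeps a numbered revision history per release; helm history lists the revisions, and helm rollback can target a specific one rather than just the previous release:

```shell
# List all revisions of the release, with status and chart version
helm history syndromic-surveillance-api -n syndromic-surveillance

# Roll back to a specific revision (here: revision 2)
helm rollback syndromic-surveillance-api 2 -n syndromic-surveillance
```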
Where does the chart live? Chart registries
So far the chart lives in a local ./syndromic-surveillance-api/ directory. That works on your machine. It doesn't work for your teammates, your CI pipeline, or your staging environment.
Just like Docker images have registries (Docker Hub, ECR...), Helm charts have registries too.
OCI registries — the recommended modern approach
Since Helm 3.8 (2022), charts can be stored as OCI artifacts. This means the same registry you already use for container images can store your charts. On AWS, the same ECR registry holds both:
# package and push the chart
helm package ./syndromic-surveillance-api
helm push syndromic-surveillance-api-0.1.0.tgz \
oci://123456789012.dkr.ecr.eu-west-3.amazonaws.com/charts
# teammates pull and install directly from ECR
helm install syndromic-surveillance-api \
oci://123456789012.dkr.ecr.eu-west-3.amazonaws.com/charts/syndromic-surveillance-api \
--version 0.1.0 \
--namespace syndromic-surveillance \
--create-namespace
What I recommend
I decided not to talk about classic Helm repositories and public Helm registries. Since we are learning the AWS cloud, just use ECR with OCI: you are already paying for it, IAM controls access (the same permissions model from Days 2-3), there is no extra infrastructure to maintain, and it works for both Docker images and Helm charts. Your teammates authenticate once via aws ecr get-login-password and pull charts the same way they pull images.
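The authentication step is the same token flow as for Docker: get-login-password prints a temporary token that is piped into helm registry login (for ECR, the username is the literal string AWS; registry hostname below is the example account used throughout):

```shell
# Authenticate Helm against ECR (the token is valid for 12 hours)
aws ecr get-login-password --region eu-west-3 | \
  helm registry login \
    --username AWS \
    --password-stdin \
    123456789012.dkr.ecr.eu-west-3.amazonaws.com
```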
Putting it together: one pipeline
Terraform provisions the platform. Helm deploys the application. Here is the full sequence, end to end:
# 1. Provision the EKS cluster
cd infra/
terraform init
terraform apply \
-var="cluster_name=syndromic-surveillance-platform" \
-var="environment=staging"
# 2. Configure kubectl to talk to the new cluster
aws eks update-kubeconfig \
--region eu-west-3 \
--name syndromic-surveillance-platform
# 3. Verify the nodes are ready
kubectl get nodes
# 4. Deploy the Django API
cd ../syndromic-surveillance-api/
helm upgrade --install syndromic-surveillance-api . \
--namespace syndromic-surveillance \
--create-namespace \
--set image.tag=v1.0.0
# 5. Verify
kubectl get pods -n syndromic-surveillance
helm list -n syndromic-surveillance
Five steps. New environment, new cluster, application running. The same sequence works for dev, staging, and prod, with different Terraform variable values and a different image.tag each time.
Two tools, one principle: write once, run anywhere. (I've got some old memories coming back, lol; it was the Javaboyz who used to talk like that, way back in 1995.)
Infrastructure and deployments should be treated like code: version-controlled, peer-reviewed, auditable, and repeatable. There is no real alternative; we should all adopt this way of working.
The learning curve is real. HCL takes time to learn. Helm templates can grow verbose. But the payoff is worth it every single time. As always, the subject is huge and cannot be covered in one tiny article. Here are some links you can use to learn more:
Official documentation:
- Terraform documentation
- OpenTofu documentation
- Terraform AWS provider
- terraform-aws-modules/eks
- Helm documentation
- Helm chart hooks
- ArtifactHub — public Helm chart repository
- ChartMuseum — self-hosted Helm chart repository
- Helm OCI support documentation
Tools:
- tfsec — static analysis for Terraform security misconfigurations
- Infracost — cost estimation for Terraform plans before you apply them
- helm-docs — auto-generate documentation for Helm charts
- chart-testing — lint and test Helm charts in CI
Video tutorials: