AWS (Day 6)

provisioning infrastructure with Terraform, packaging applications with Helm

Disclaimers:

  1. Opinions expressed in this post (and in all of my posts) are, unless otherwise specified, solely my own. They absolutely do not reflect the views, policies, or positions of any organization, employer, or affiliated group.

  2. This article is educational content. The examples are intentionally simplified for clarity. Before using these patterns in production, consult the Terraform documentation, the Helm documentation, and your platform team.

  3. I've strived for accuracy throughout this piece. If you catch any errors, please reach out — I'd be grateful for the feedback and happy to make updates!



Hook

"We need to set up a new environment for the epidemiology team. Can you have something running by yesterday?" That question is why this article exists: how to easily repeat, redo, reiterate, duplicate, replicate, recreate, remake, reduplicate, the infrastructure we just built?

The previous episodes were focused on the theory: VPCs and compute, IAM and encryption, Kubernetes and EKS, cluster security. A colleague who missed the training would get the cluster up by hand: click through the console, type commands, copy-paste YAML files, and call it done. And it would work. Until he had to do it again for staging, again for acceptance testing (la recette), again for production, and then once more months later for a deployment in another country.

Deploying 4s for a new lab in a new West African country is something we've done many times in the last couple of months. Hopefully, 4s will eventually be deployed across all the countries in Africa. Let me present two tools that solve that problem at two different layers: Terraform for provisioning the infrastructure, and Helm for deploying the applications that run on it.



ToC

  1. The problem: reproducibility at two layers
  2. Terraform: infrastructure as code
  3. Helm: Kubernetes package manager
  4. Where does the chart live? Chart registries
  5. Putting it together
  6. More on this topic



The problem: reproducibility at two layers

Imagine you've just finished reading Day 5. You know how to secure an EKS cluster. Now you need to actually create one and deploy a Django REST API onto it.

You could do it manually: click through the EKS console, create a node group, write a few dozen YAML manifests, copy-paste the right kubectl apply commands. But we can do better. We should be able to easily redo what we've done.

Before we even get to reproducibility, we want a record of what has been done, if only so that we can review it later or go back and read through what was achieved. As for me, throughout the workshop I didn't just type in commands; I wrote Bash scripts:

#!/bin/bash

# Environment Variables
AWS_REGION="eu-west-1"
CLUSTER_NAME="ptrck-foundation-training"
ACCOUNT_ID="546732826958"   # AWS account ID
ROLE_ARN="arn:aws:iam::${ACCOUNT_ID}:role/EKSAdminRole"  # ARN of the EKS Admin role
CLUSTER_ARN="arn:aws:eks:${AWS_REGION}:${ACCOUNT_ID}:cluster/${CLUSTER_NAME}"

# I need to create this ECR repository
ECR_REPO="ecr-repo-ptrck"
DEMO_NS="demo-ns-ptrck"
DEMO_DEPLOYMENT="demo-dply-ptrck"
DEMO_CONTAINER="app-ptrck"

# Helper function to print things with some separation
section() {
    echo ""
    echo "=====> $1"
}


section "Configure AWS CLI credentials"
aws configure

# Identity & role assume
section "Verify identity"
aws sts get-caller-identity

section "Verify role exists"
aws iam get-role --role-name EKSAdminRole

section "List all roles in account (if you have permissions)"
aws iam list-roles \
  --query 'Roles[].RoleName' \
  --output text

section "List user Policies"
aws iam list-user-policies --user-name $(aws sts get-caller-identity --query 'Arn' --output text | cut -d'/' -f 6)

section "List attached policies"
CURRENT_USER=$(aws sts get-caller-identity --query 'Arn' --output text | cut -d'/' -f 6)
aws iam list-attached-user-policies --user-name $CURRENT_USER

section "Check current default region"
aws configure get region

section "(Re)configure AWS CLI credentials"
aws configure

echo ""
section "Assume EKS admin role & retrieve the token"
aws sts assume-role \
    --role-session-name iam-eks-admin-role \
    # was working nice until sunday 7PM ?!
    --role-arn "$ROLE_ARN" \
    --duration-seconds 43200

section "Configure the 'iam-eks-admin-role' profile with the temporary credentials"
aws configure --profile iam-eks-admin-role

section "What are the available roles ?"
aws iam list-roles

section "What are the available policies?"
aws iam list-policies

# Check if there is a K8s cluster available
section "List all EKS clusters"
aws eks list-clusters

section "List clusters in my region"
aws eks list-clusters --region "$AWS_REGION"

section "At this point, there is no clusters available whaaaaaa ?! lol"

section "Let's create our K8s cluster"
eksctl create cluster \
  --name "$CLUSTER_NAME" \
  --region "$AWS_REGION" \
  --nodes 3 \
  --node-type t2.micro \
  # --with-oidc: no need to separately call eksctl utils associate-iam-oidc-provider
  --with-oidc \
  --enable-auto-mode

This is the reproducibility problem, and it lives at two distinct layers:

  1. Infrastructure layer: Who created the VPC? What are the exact subnet CIDRs? Which KMS key encrypts etcd? If the answer lives in someone's memory or a Confluence page, you have a problem.

  2. Application layer: Which version of the Django app is running in staging? How do I roll back if the new release breaks something? If the answer is "look at the kubectl commands in the README", you have a different problem.

Terraform solves the first. Helm solves the second.

Bash scripts are fine for exploration and one-off tasks, but they fall short once you need state tracking, idempotence, error handling, drift detection, and dependency management — which is exactly what infrastructure provisioning requires.
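
To make that concrete, here is a minimal, self-contained sketch of the check-before-create dance a Bash script has to hand-roll for every single resource (the fake-state directory is a stand-in for a real cloud lookup such as aws ec2 describe-vpcs):

```shell
#!/bin/bash
# Hand-rolled idempotence: check whether a resource exists before creating it.
# Terraform does this automatically, for every resource, via its state file.
STATE_DIR="./fake-state"   # stand-in for querying the real cloud
mkdir -p "$STATE_DIR"

ensure_resource() {
    local name="$1"
    if [ -e "$STATE_DIR/$name" ]; then
        echo "skip: $name already exists"
    else
        touch "$STATE_DIR/$name"
        echo "create: $name"
    fi
}

ensure_resource "vpc-main"
ensure_resource "vpc-main"   # re-running is a no-op, not a duplicate or an error
```

Multiply that guard by every VPC, subnet, role, and node group, and the script stops being simple.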



Terraform: infrastructure as code

A brief history

Terraform was created by HashiCorp and first released in 2014. The problem it solved: AWS had CloudFormation, GCP had Deployment Manager, Azure had ARM templates — but if you used more than one cloud, you had three completely different tools, three syntaxes, three mental models. Terraform unified them under a single declarative language: HCL (HashiCorp Configuration Language).

Standards

Core idea: describe the desired state of your infrastructure, and Terraform figures out how to get there. It maintains a state file tracking what it has created, so it can calculate the diff between current state and desired state on every run.
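
That diff-and-converge loop can be illustrated with a toy reconciler in Bash (purely illustrative, not Terraform's actual algorithm): compare the recorded current state with the desired state, and emit only the actions needed to converge.

```shell
#!/bin/bash
# Desired vs. current state as sorted lists of resource names
printf '%s\n' "vpc" "subnet-a" "subnet-b" | sort > desired.txt
printf '%s\n' "vpc" "subnet-a" "subnet-c" | sort > current.txt

# In desired but not current -> create; in current but not desired -> destroy
to_create=$(comm -23 desired.txt current.txt)
to_destroy=$(comm -13 desired.txt current.txt)

echo "Plan: create [$to_create], destroy [$to_destroy]"
# -> Plan: create [subnet-b], destroy [subnet-c]
```

terraform plan is this idea, generalized to thousands of resource types with dependency ordering on top.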

One important note: as a free software advocate, I should talk about the Terraform fork called OpenTofu. OpenTofu is a Linux Foundation project, API-compatible with Terraform: same HCL, same providers, same workflow, same everything. For most teams, the choice between them is a licensing and governance question. The examples below work identically on both. I hope.

Alternatives & why Terraform

Tool | Approach | When to prefer it
AWS CloudFormation | AWS-native, JSON/YAML templates | AWS-only shops that want native integration and no external tooling
AWS CDK | TypeScript/Python constructs that compile to CloudFormation | Developer-heavy teams who want real programming languages for infrastructure
Pulumi | Multi-cloud, uses Python/TypeScript/Go directly | Teams that find HCL limiting and want full programming language expressiveness
Ansible | Procedural YAML playbooks | Configuration management and server setup; less suited for cloud resource provisioning
OpenTofu | Terraform fork, identical syntax | Teams that want open-source governance and no BSL licensing constraints

Why did we choose Terraform? It's multi-cloud (we're on AWS now, but that may change, lol), it has over 4000 providers, it has a mature module ecosystem, and it's the most widely adopted IaC tool in the industry.

Provisioning an EKS cluster

The goal: reproduce the EKS cluster from Day 4 and Day 5 — VPC, private subnets, encrypted etcd, managed node group — using Terraform. Every time. Identically.

Project structure:

infra/
├── main.tf           # providers, backend, modules
├── variables.tf      # input variables
├── outputs.tf        # cluster name, endpoint (consumed by Helm later)
└── terraform.tfvars  # environment-specific values (not committed to Git)
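
And a hypothetical terraform.tfvars for a staging environment (the values here are illustrative):

```hcl
# terraform.tfvars -- environment-specific, kept out of Git
cluster_name = "syndromic-surveillance-platform"
environment  = "staging"
region       = "eu-west-3"
```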

main.tf — the core:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Remote state: shared, locked, versioned
  backend "s3" {
    bucket         = "genomics-platform-tfstate"
    key            = "eks/terraform.tfstate"
    region         = "eu-west-3"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = var.region
}

# KMS key for etcd encryption (Day 5 best practice)
resource "aws_kms_key" "eks" {
  description             = "EKS secrets encryption"
  deletion_window_in_days = 7
}

# VPC: private subnets for nodes, public subnets for load balancers
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.cluster_name}-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["${var.region}a", "${var.region}b", "${var.region}c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true

  tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    Environment                                 = var.environment
  }
}

# EKS cluster
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = var.cluster_name
  cluster_version = "1.30"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # Private endpoint only (Day 5 best practice)
  cluster_endpoint_public_access = false

  # etcd encryption with the KMS key above
  cluster_encryption_config = {
    resources        = ["secrets"]
    provider_key_arn = aws_kms_key.eks.arn
  }

  eks_managed_node_groups = {
    genomics = {
      instance_types = ["m5.large"]
      min_size       = 2
      max_size       = 10
      desired_size   = 3
    }
  }
}

variables.tf:

variable "region" {
  description = "AWS region"
  type        = string
  default     = "eu-west-3"
}

variable "cluster_name" {
  description = "EKS cluster name"
  type        = string
}

variable "environment" {
  description = "Deployment environment (dev, staging, prod)"
  type        = string
}

outputs.tf — what other tools will need:

output "cluster_name" {
  value = module.eks.cluster_name
}

output "cluster_endpoint" {
  value = module.eks.cluster_endpoint
}
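
Other tools can then consume these outputs instead of hard-coding names. A sketch, assuming it runs from the infra/ directory after a successful apply:

```shell
# Read the cluster name from Terraform state rather than hard-coding it
CLUSTER_NAME=$(terraform output -raw cluster_name)

# Point kubectl at the freshly provisioned cluster
aws eks update-kubeconfig \
  --region eu-west-3 \
  --name "$CLUSTER_NAME"
```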

Deploy it:

terraform init          # download providers and modules
terraform plan          # preview what will be created
terraform apply         # create the infrastructure

Voilà.



Helm: Kubernetes package manager

A brief history

Helm was first presented at KubeCon 2015 by the team at Deis (notably Matt Butcher). The problem it solved: writing raw Kubernetes manifests for a real application means dozens of YAML files, lots of copy-paste across environments, no concept of versioning, and no clean way to roll back. Helm introduced the chart — a package of Kubernetes manifests with templating and versioning built in.

In 2016, Google and Deis merged their Kubernetes packaging work, and Helm was donated to the Kubernetes project. It became a CNCF graduated project in 2020 — the same tier as Kubernetes, Prometheus, and Envoy.

Helm 2 (the first widely adopted version) required a server-side component called Tiller running inside the cluster. Tiller had broad cluster-admin permissions and became a well-known security liability. Helm 3 (released in 2019) removed Tiller entirely — all operations now happen client-side, and permissions come from your own kubeconfig. Simpler and significantly more secure.

Alternatives & why Helm

Tool | Approach | When to prefer it
Kustomize | Overlay-based patching, built into kubectl | Simple apps where you want no templating and native kubectl support
Plain manifests | Raw YAML + kubectl apply | Very small projects, learning environments, one-off deployments
Skaffold | Dev-focused workflow: build + push + deploy | Inner-loop development with fast local iteration cycles
Carvel / ytt | Structured YAML templating | VMware/Tanzu ecosystems

We chose Helm because: versioned releases let you see exactly what is running in each namespace with helm list, atomic upgrades roll back automatically on failure, hooks let you run jobs before or after install (crucial for Django migrations — more on this below), and ArtifactHub hosts thousands of ready-made charts for databases, ingress controllers, monitoring stacks, and more. When you need to install cert-manager or the AWS Load Balancer Controller into your cluster, it's a one-liner.
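
For instance, installing cert-manager from its upstream chart really is close to a one-liner (chart flags evolve between versions, so treat this as a sketch and check the cert-manager docs for the current ones):

```shell
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install cert-manager and let the chart manage its own CRDs
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true
```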

Packaging a Django REST API

Our syndromic surveillance platform is a Django REST API that processes public health data. Here's how to package it as a Helm chart.

Chart structure:

syndromic-surveillance-api/
├── Chart.yaml            # chart metadata
├── values.yaml           # default configuration values
└── templates/
    ├── _helpers.tpl      # reusable template snippets
    ├── deployment.yaml
    ├── service.yaml
    └── migrate-job.yaml  # Django migration hook

Chart.yaml:

apiVersion: v2
name: syndromic-surveillance-api
description: Django REST API for genomic data processing
type: application
version: 0.1.0       # chart version — bump when the chart itself changes
appVersion: "1.0.0"  # application version — overridden at deploy time

values.yaml — the defaults, overridable per environment:

replicaCount: 2

image:
  repository: 123456789012.dkr.ecr.eu-west-3.amazonaws.com/syndromic-surveillance-api
  pullPolicy: IfNotPresent
  tag: ""  # overridden at deploy time: --set image.tag=v1.2.3

service:
  type: ClusterIP
  port: 8000

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 256Mi

env:
  DJANGO_SETTINGS_MODULE: "config.settings.production"
  DB_HOST: "rds.syndromic-surveillance-platform.internal"
  DB_NAME: "syndromic-surveillance"
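
Per-environment configuration then becomes a small override file layered on top of these defaults. A hypothetical values-staging.yaml (the names are illustrative):

```yaml
# values-staging.yaml -- only what differs from values.yaml
replicaCount: 1

env:
  DJANGO_SETTINGS_MODULE: "config.settings.staging"
  DB_NAME: "syndromic-surveillance-staging"
```

You would then deploy with helm upgrade --install ... -f values-staging.yaml: values files override the chart defaults, and --set flags override both.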

templates/deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "syndromic-surveillance-api.fullname" . }}
  labels:
    {{- include "syndromic-surveillance-api.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "syndromic-surveillance-api.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "syndromic-surveillance-api.selectorLabels" . | nindent 8 }}
    spec:
      securityContext:
        runAsNonRoot: true  # Day 5 best practice
        runAsUser: 1000
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        ports:
        - containerPort: 8000
        env:
        {{- range $key, $val := .Values.env }}
        - name: {{ $key }}
          value: {{ $val | quote }}
        {{- end }}
        resources:
          {{- toYaml .Values.resources | nindent 10 }}
        readinessProbe:
          httpGet:
            path: /healthz/
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
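
The include calls above refer to named templates defined in templates/_helpers.tpl, which the chart structure listed but we haven't shown. A trimmed-down sketch of what those helpers typically contain (the real ones generated by helm create also handle name overrides and 63-character truncation):

```yaml
{{/* templates/_helpers.tpl -- reusable named templates */}}
{{- define "syndromic-surveillance-api.fullname" -}}
{{- .Release.Name -}}
{{- end }}

{{- define "syndromic-surveillance-api.selectorLabels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{- define "syndromic-surveillance-api.labels" -}}
{{ include "syndromic-surveillance-api.selectorLabels" . }}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
{{- end }}
```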

Two years ago, when I was first getting to grips with K8s, I remember using sidecar containers to apply Django database migrations. Nowadays, I use init containers for that. And whilst researching Helm specifically for this article, I was very surprised to discover that this thing exists: a pre-upgrade hook that runs manage.py migrate before any new pods are started:

templates/migrate-job.yaml — I really need to try this

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "syndromic-surveillance-api.fullname" . }}-migrate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
      containers:
      - name: migrate
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        command: ["python", "manage.py", "migrate", "--noinput"]
        env:
        {{- range $key, $val := .Values.env }}
        - name: {{ $key }}
          value: {{ $val | quote }}
        {{- end }}

Phew!

Deploy it:

# First deploy
helm install syndromic-surveillance-api ./syndromic-surveillance-api \
  --namespace syndromic-surveillance \
  --create-namespace \
  --set image.tag=v1.0.0

# Upgrade to a new version
helm upgrade syndromic-surveillance-api ./syndromic-surveillance-api \
  --namespace syndromic-surveillance \
  --set image.tag=v1.2.3

# Something broke — roll back to the previous release
helm rollback syndromic-surveillance-api -n syndromic-surveillance

# See what is currently running
helm list -n syndromic-surveillance

Where does the chart live? Chart registries

So far the chart lives in a local ./syndromic-surveillance-api/ directory. That works on your machine. It doesn't work for your teammates, your CI pipeline, or your staging environment.

Just like Docker images have registries (Docker Hub, ECR...), Helm charts have registries too.

OCI registries — the recommended modern approach

Since Helm 3.8 (2022), charts can be stored as OCI artifacts. This means the same registry you already use for container images can store your charts. On AWS, the same ECR registry holds both:

# package and push the chart
helm package ./syndromic-surveillance-api
helm push syndromic-surveillance-api-0.1.0.tgz \
  oci://123456789012.dkr.ecr.eu-west-3.amazonaws.com/charts

# teammates pull and install directly from ECR
helm install syndromic-surveillance-api \
  oci://123456789012.dkr.ecr.eu-west-3.amazonaws.com/charts/syndromic-surveillance-api \
  --version 0.1.0 \
  --namespace syndromic-surveillance \
  --create-namespace

What I recommend

I decided not to cover classic Helm repositories and public Helm registries. Since we are learning the AWS cloud, just use ECR with OCI: you are already paying for it, IAM controls access (the same permissions model from Days 2-3), there is no extra infrastructure to maintain, and it works for both Docker images and Helm charts. Your teammates authenticate once via aws ecr get-login-password and pull charts the same way they pull images.
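
That authentication step looks like this in practice, using the placeholder account ID and region from earlier in this article (the same token works for docker pull and helm pull):

```shell
aws ecr get-login-password --region eu-west-3 | \
  helm registry login \
    --username AWS \
    --password-stdin \
    123456789012.dkr.ecr.eu-west-3.amazonaws.com
```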



Putting it together: one pipeline

Terraform provisions the platform. Helm deploys the application. Here is the full sequence, end to end:

# 1. Provision the EKS cluster
cd infra/
terraform init
terraform apply \
  -var="cluster_name=syndromic-surveillance-platform" \
  -var="environment=staging"

# 2. Configure kubectl to talk to the new cluster
aws eks update-kubeconfig \
  --region eu-west-3 \
  --name syndromic-surveillance-platform

# 3. Verify the nodes are ready
kubectl get nodes

# 4. Deploy the Django API
cd ../syndromic-surveillance-api/
helm upgrade --install syndromic-surveillance-api . \
  --namespace syndromic-surveillance \
  --create-namespace \
  --set image.tag=v1.0.0

# 5. Verify
kubectl get pods -n syndromic-surveillance
helm list -n syndromic-surveillance

Five steps. New environment, new cluster, application running. The same sequence works for dev, staging, and prod — with a different environment variable and image.tag each time.
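
Once that sequence is stable, it naturally collapses into a wrapper script. A hedged sketch (the script name, the -chdir layout, and the DRY_RUN guard are my own additions, not part of the workshop):

```shell
#!/bin/bash
set -euo pipefail

# With DRY_RUN=1, print each command instead of executing it
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

deploy() {
    local environment="$1" image_tag="$2"

    # 1. Provision (or update) the infrastructure
    run terraform -chdir=infra apply \
        -var="cluster_name=syndromic-surveillance-platform" \
        -var="environment=${environment}"

    # 2. Point kubectl at the cluster
    run aws eks update-kubeconfig \
        --region eu-west-3 \
        --name syndromic-surveillance-platform

    # 3. Deploy (or upgrade) the application
    run helm upgrade --install syndromic-surveillance-api ./syndromic-surveillance-api \
        --namespace syndromic-surveillance \
        --create-namespace \
        --set "image.tag=${image_tag}"
}

if [ "$#" -ge 2 ]; then
    deploy "$1" "$2"
fi
```

Saved as, say, deploy.sh, running DRY_RUN=1 ./deploy.sh staging v1.0.0 prints the three commands without touching anything.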



More on this topic

Two tools, one principle: write once, run anywhere. (I've got some old memories coming back to me, lol; it was the Javaboyz who used to talk like that, way back in 1995.)

Infrastructure and deployments should be like code: version-controlled, peer-reviewed, auditable, and repeatable. There is no good alternative; we should all adopt this way of working.

The learning curve is real. HCL takes time to learn. Helm templates can grow verbose. But the payoff is worth it every single time. As always, the subject is huge and cannot be covered in one tiny article. Here are some links you can use to learn more:

Official documentation:

Tools:

  • tfsec — static analysis for Terraform security misconfigurations
  • Infracost — cost estimation for Terraform plans before you apply them
  • helm-docs — auto-generate documentation for Helm charts
  • chart-testing — lint and test Helm charts in CI

Video tutorials: