ITIL 4 Foundations

Training notes turned into a practical guide to ITIL v4


Disclaimers

  1. Opinions expressed in this post (and in all my other posts) are, unless otherwise specified, solely those of the author: me. They absolutely do not reflect the views, policies, or positions of any organization, employer, or affiliated group.

  2. If you are a colleague and recognize the situation described below, remember: I'm critiquing the system, not the people.

  3. Render unto Caesar what is Caesar's: Claude helped me find up-to-date documentation links and helped beautify all the diagrams used (and not used) in this article.



Hook

I tend to stay in my lane. Focus on what I understand, the perimeter I actually control. Everything beyond my code, my systems, my terminal (the hierarchy, management, all the people I work with who don't write programs) has always been a blur to me. Vague, formless, abstract. Sometimes genuinely incomprehensible. Not because I think those people are wrong, but because whatever they do has always felt like noise I couldn't parse.

I'll grudgingly admit, though: since joining this biomedical research center, everything that happens outside my direct work has become the most interesting part of the job. Humans are fascinating creatures to observe from a safe distance. Maybe it's because my actual technical work is no longer the hardest thing on my plate. Or maybe because what happens in the corridors and meeting rooms has a direct, outsized impact on my work, on my freedom to do that work the way I'd want to.

The hardest thing is collaborating with many other people, exchanging information in a way that everyone actually understands each other. Not just hears: understands. The hardest thing is listening, really listening to what others have to say, deeply, absorbing and retaining what they meant. Then responding, expressing yourself in a way that gets through, in a way that convinces. Making the right compromises, the right calls, and moving forward together, making sure everyone gets something of what they came for.


Last Tuesday, I was enrolled in an ITIL 4 Foundations training organized by my hierarchy. The stated goals were clear: garantir la disponibilité des systèmes (ensure system availability), structurer la gestion des incidents (structure incident management), and parler le même langage (speak the same language). Perfectly reasonable goals.

In practice: sitting in a room while someone talks through concept after concept, deploying words and definitions at high velocity, is its own endurance sport. It's made harder by the fact that some concepts I thought I already knew get redefined with vocabulary I would never have chosen — and some familiar words get used in ways that are subtly but persistently disorienting.

That said, I have to be honest: ITIL 4 is not the bureaucratic nightmare its reputation suggests. If anything, I realized I had already been practicing most of these concepts, informally and imperfectly, without knowing they had names. Done right, it is a coherent framework that actually acknowledges modern realities: Agile, DevOps, SRE, Lean.

Since I took notes anyway, might as well turn them into something useful. I hope you'll learn something. Music \o/

Table of contents

  1. What is ITIL 4?
  2. Why should your organization care?
  3. The vocabulary
  4. PESTLE: external forces
  5. The Service Value System (SVS)
  6. The Service Value Chain (SVC)
  7. The 4 dimensions
  8. The 7 guiding principles
  9. The 34 practices
  10. Incident management
  11. Change enablement
  12. Continual improvement
  13. Governance and risk management
  14. ITIL 4 ↔ Agile, DevOps, Lean
  15. Case study
  16. Tooling
  17. ITIL v5
  18. Conclusion



What is ITIL 4?

ITIL (Information Technology Infrastructure Library) is a framework for IT Service Management (ITSM). It provides guidance on how organizations can use IT as a tool to facilitate business transformation and growth.

Key facts:

  • Origin: Developed in the 1980s by the UK government's CCTA (Central Computer and Telecommunications Agency) to standardize IT practices across public services.
  • ITIL 4 (released 2019): The current version. A major rethink compared to ITIL v3, designed to integrate with modern approaches: Agile, DevOps, Lean, SRE.
  • Owner: Currently maintained by PeopleCert, which acquired AXELOS (the previous owner) in 2021.
  • Who uses it? Organizations of all sizes, across industries. Particularly common in banking, telecom, healthcare, public sector, and large IT departments.

One thing ITIL 4 gets right that its predecessors didn't: it explicitly acknowledges that you don't need to implement all 34 practices. You pick what fits. Earlier versions were often not sold that way, and this caused a lot of unnecessary process theater in organizations that implemented ITIL as a checkbox exercise.



Why should your organization care?

During the training, an interesting observation was made about three organizational thresholds where ITIL practices start to become relevant:

| Threshold | Pain point | What starts breaking |
|---|---|---|
| ≥ 5 people | Memory loss | Knowledge lives in people's heads, not in runbooks |
| ≥ 10 people | Complexity | Coordination overhead grows faster than the team |
| > 10 people | Compliance | Audits, regulations, and reporting requirements appear |

The practical implication: you don't need all of ITIL. A 7-person startup building an internal tool doesn't need a Change Advisory Board; a bank with 200 IT staff probably does. ITIL 4's design philosophy is: understand the principles, apply the practices that solve your actual problems, and leave the rest on the shelf.



The vocabulary

Before we can discuss the framework, we need to agree on definitions. ITIL has precise meanings for words we use casually.

Who's involved?

| Term | Definition |
|---|---|
| Organization | A person or group with its own functions, responsibilities, and authorities |
| Stakeholder | Any person with an interest in an organization, product, or service |
| Service Provider | An organization that delivers services to consumers |
| Service Consumer | An organization or individual that receives services |
| Customer | Defines requirements and owns the outcomes |
| User | The person who uses the service day-to-day |
| Sponsor | The person who authorizes the budget |

In practice: the customer decides what they need, the user interacts with it daily, and the sponsor pays for it.

What are we delivering?

| Term | Definition |
|---|---|
| Resource | Any component used to deliver value: people, infrastructure, capital, information |
| Product | A configuration of resources designed to offer value to a consumer |
| Service | A means of enabling value co-creation by facilitating outcomes that customers want to achieve, without the customer having to manage specific costs and risks |
| Service Offering | A formal description of one or more services, designed to address the needs of a target consumer group |

The value equation

This is arguably the most important concept in ITIL v4:

Value = Utility × Warranty

  • Utility: What the service does — fit for purpose. Does it support the customer's performance or remove a constraint?
  • Warranty: How reliably it does it — fit for use. Is it available when needed? Secure? Capable of handling the load?

A service that does the right thing but crashes every Monday morning has no value. A service that's always available but solves the wrong problem also has no value. We need both, because a zero on either side collapses the product.
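The multiplication is the whole point: it makes a zero on either side fatal. A minimal sketch, assuming utility and warranty are each scored on a hypothetical 0 to 1 scale (ITIL 4 itself prescribes no numeric scoring, so `service_value` is purely illustrative):

```python
def service_value(utility: float, warranty: float) -> float:
    """Toy model of Value = Utility x Warranty.

    Both inputs are hypothetical scores between 0 and 1. ITIL 4 defines no
    numeric scale; this only shows why a zero on either side is fatal.
    """
    for name, score in (("utility", utility), ("warranty", warranty)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {score}")
    return utility * warranty


# Right features, but crashes every Monday morning: warranty is zero.
print(service_value(utility=0.9, warranty=0.0))   # 0.0
# Always available, but solves the wrong problem: utility is zero.
print(service_value(utility=0.0, warranty=0.99))  # 0.0
```

A sum would let an excellent score on one side hide a failing one on the other; the product does not.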

And value is subjective, contextual, and co-created. The provider alone cannot create value: it is always produced in interaction with the consumer. This shifts the conversation from "we delivered the system" to "did the customer actually get what they needed?"

[Diagram: Who does what, for whom. Provider side: Resource (raw input to build with) → assembles into Product (configured for value delivery) → packaged as Service (enables consumer outcomes) → described in Service Offering (formal promise to the consumer), which delivers Utility × Warranty = Value, co-created by both sides. Consumer side: Sponsor (authorizes the budget) → funds Customer (defines requirements and owns outcomes) → specifies for User (uses the service day-to-day) → feeds back to Stakeholder (any party with an interest).]



PESTLE: The external forces you cannot control

ITIL v4 acknowledges that organizations don't operate in a vacuum. PESTLE analysis provides a structured lens for scanning the external environment:

| Factor | What it covers | IT relevance |
|---|---|---|
| Political | Government policies, regulations | Data sovereignty laws, public sector mandates |
| Economic | Budget cycles, cost pressures | Cloud cost optimization, license negotiation |
| Social | User expectations, workforce trends | Remote work adoption, digital literacy gaps |
| Technological | Emerging tech, vendor landscape | AI integration, cloud migration, security threats |
| Legal | Compliance requirements | GDPR, HIPAA, sector-specific regulations |
| Environmental | Sustainability, green IT | Data center energy, hardware lifecycle |

PESTLE is not unique to ITIL; it's a general strategic analysis tool. ITIL v4 borrows it because service management decisions are always made inside an organizational and environmental context. We never want to build a technically excellent system that doesn't survive its first compliance audit or budget cut.



The big picture: Service Value System (SVS)

The Service Value System (SVS) is ITIL v4's model for how an organization creates value. Think of it as the engine diagram for your IT service operations.

It takes opportunities and demands as input, runs them through a system of governance, practices, and activities, and outputs value for all stakeholders.

The SVS is composed of five components:

  1. Guiding Principles — universal recommendations that apply to all decisions
  2. Governance — the framework for directing and controlling the organization
  3. Service Value Chain (SVC) — the set of interconnected activities that turn demand into value
  4. Management Practices — the 34 practices that provide organizational capability
  5. Continual Improvement — the persistent drive to improve everything

[Diagram: ITIL 4 Service Value System (SVS). Inputs: Opportunity and Demand. The guiding principles (universally applicable) frame Governance (Direct · Evaluate · Monitor), the Service Value Chain (6 interconnected activities that convert demand into value: Plan, Engage, Design & Transition, Obtain / Build, Deliver & Support, Improve) and the 34 practices (General 14 · Service 17 · Technical 3). Continual improvement is embedded at every level. Output: value for all stakeholders.]

NOTA BENE: Guiding Principles and Continual Improvement are not steps in a sequence. They're a mindset that applies to all activities, at all times. You don't "do continual improvement" after you're done. You do it while you're doing everything else.



Inside the SVS: the Service Value Chain (SVC)

The Service Value Chain is the operational model at the core of the SVS. It defines 6 activities that can be combined in different ways. The combination for a given service or request is called a value stream.

| Activity | Purpose |
|---|---|
| Plan | Shared understanding of vision, direction, and current state |
| Engage | Understand stakeholder needs; maintain transparency and relationships |
| Design & Transition | Ensure products and services meet stakeholder expectations |
| Obtain / Build | Ensure service components are available when and where needed |
| Deliver & Support | Ensure services are delivered and supported per agreed specifications |
| Improve | Continual improvement across all products, services, and practices |

The SVC is not linear. An incident resolution primarily activates Engage, Deliver & Support, and Improve. A new service launch activates all six. A routine service request might only need Engage and Deliver & Support.
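One way to make "a value stream is a combination of activities" concrete is to model it as data. A minimal sketch, assuming the three example value streams from the paragraph above; the `VALUE_STREAMS` mapping and `unused_activities` helper are hypothetical names, and the ordering of activities inside each stream is illustrative:

```python
from enum import Enum


class Activity(Enum):
    """The six Service Value Chain activities."""
    PLAN = "Plan"
    ENGAGE = "Engage"
    DESIGN_TRANSITION = "Design & Transition"
    OBTAIN_BUILD = "Obtain / Build"
    DELIVER_SUPPORT = "Deliver & Support"
    IMPROVE = "Improve"


# Hypothetical value streams, following the examples in the text.
VALUE_STREAMS = {
    "incident resolution": [Activity.ENGAGE, Activity.DELIVER_SUPPORT, Activity.IMPROVE],
    "routine service request": [Activity.ENGAGE, Activity.DELIVER_SUPPORT],
    "new service launch": [
        Activity.PLAN, Activity.ENGAGE, Activity.DESIGN_TRANSITION,
        Activity.OBTAIN_BUILD, Activity.DELIVER_SUPPORT, Activity.IMPROVE,
    ],
}


def unused_activities(stream: str) -> set:
    """Return the SVC activities a given value stream does not touch."""
    return set(Activity) - set(VALUE_STREAMS[stream])


for name, steps in VALUE_STREAMS.items():
    print(f"{name}: {' -> '.join(a.value for a in steps)}")
```

The chain stays fixed at six activities; only the combinations change from one value stream to the next.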

[Diagram: Service Value Chain (SVC), six interconnected activities combined differently for each value stream: Plan (direction & strategy), Engage (stakeholders & needs), Design & Transition (meet expectations), Obtain / Build (components ready), Deliver & Support (run & resolve), Improve (always, for everything). The activities are non-linear: each service interaction combines them differently.]



The 4 dimensions

Every component of the SVS should be considered from 4 perspectives. Ignoring any of them is one of the most reliable ways to make a project fail.

| Dimension | What it covers | Classic failure mode |
|---|---|---|
| Organizations and People | Org structure, culture, roles, responsibilities | "We deployed the tool but nobody knows how to use it" |
| Information and Technology | Data, knowledge, tools, infrastructure | "The system works but we can't query the data we need" |
| Partners and Suppliers | External providers, contracts, integrations | "Our vendor went down and we had no fallback" |
| Value Streams and Processes | How work flows through the organization | "Everyone's busy but nothing ships" |

These 4 dimensions are interdependent. Migrating to a new monitoring tool (Information & Technology) means retraining people (Organizations & People), renegotiating vendor contracts (Partners & Suppliers), and redesigning on-call workflows (Value Streams & Processes).



The 7 guiding principles

These are universal principles that guide decision-making in any circumstance. They apply regardless of which practices you're implementing.

1. Focus on value
Every activity should link back to value for stakeholders. Before starting anything, ask: who benefits, and how? If you can't answer that, stop.

2. Start where you are
Never ever assume you need to start from scratch. Assess what already works before designing something new. Most organizations already have something useful; the goal is to build on it, not erase it.

3. Progress iteratively with feedback
Don't try to design the perfect solution upfront. Deliver in small increments, gather feedback, adjust. This is the same philosophy behind Agile sprints and DevOps Continuous Delivery.

4. Collaborate and promote visibility
Work done in silos fails. Decisions made without the right people in the room fail. Make work visible (kanban boards, incident dashboards, change calendars) and involve stakeholders early, not at the end.

5. Think and work holistically
Services are systems. A change to one component can have unexpected effects elsewhere. Consider end-to-end impact, not just the piece in front of you. Lol!

6. Keep it simple and practical
If a process step doesn't add value, eliminate it. If a tool creates more overhead than it saves, reconsider it. Complexity is the enemy of reliability. Where I come from, we say keep it simple and stupid (KISS).

7. Optimize and automate
First, optimize the process (eliminate waste, reduce friction). Then, automate what remains. Automating a broken process gives you broken results, faster. The order matters.



The 34 management practices

ITIL v4 defines 34 practices: sets of organizational resources designed for performing work or accomplishing an objective. They replace what ITIL v3 called "processes" (and "functions"); the shift reflects that a practice includes culture, tools, and knowledge, not just the steps.

They're organized into 3 categories:

| Category | Count | Examples |
|---|---|---|
| General management | 14 | Risk management, knowledge management, project management, continual improvement, organizational change management |
| Service management | 17 | Incident management, problem management, change enablement, service desk, service level management |
| Technical management | 3 | Deployment management, infrastructure & platform management, software development & management |

The full list of 34 ITIL 4 practices is worth keeping as a reference, but no, you do not need to apply all of them. ITIL v4 is designed to be adopted and adapted: the core philosophy is to use only the practices that address your organization's specific needs, pain points, and goals.

When you start, you need to identify your top 3 pain points (e.g., too many incidents, slow changes) to pick the right practices to implement first.

Here, we are almost starting from zero and we are already overwhelmed to the max. I would suggest 4 practices without which we are operating blind:

  1. IT asset management → What do we actually own?
  2. Service request management → What are our users asking for?
  3. Incident Management → What is broken right now?
  4. Change Enablement → What is changing and why?

Together, these 4 answer the basic operational questions any IT team needs answered at any moment. Everything else can be phased in.



Incident management

Incident management is the practice responsible for managing the lifecycle of all incidents: unplanned interruptions or reductions in the quality of a service. Before we go further, let's nail the vocabulary:

| Term | Definition |
|---|---|
| Error | A flaw in a component that could cause incidents |
| Incident | An unplanned interruption or reduction in the quality of a service |
| Problem | A cause, or potential cause, of one or more incidents |
| Known error | A problem that has been analyzed but not yet resolved, typically with a workaround or permanent solution identified |
| Root cause | The fundamental reason behind a problem |

These are distinct. An incident is what you respond to at 2am. A problem is what you investigate to prevent the next one. Conflating them is how teams fix the same incident repeatedly without ever addressing what's actually causing it.

The 7-Step incident lifecycle

| Step | Action |
|---|---|
| 1 | Identification: monitoring alert, user report, or support ticket |
| 2 | Logging: a full record (timestamp, description, affected service, reporter) |
| 3 | Categorization: what type of incident? Which service or component? |
| 4 | Prioritization: how urgent? How impactful? |
| 5 | Diagnosis: initial investigation, probable cause |
| 6 | Resolution: fix applied, service restored |
| 7 | Closure: confirm resolution, document lessons learned, update the knowledge base |
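The lifecycle reads naturally as a small state machine that refuses to skip steps. A minimal Python sketch, assuming a strictly linear flow (real incidents sometimes loop back, for instance from diagnosis to re-categorization, which this ignores); the `Incident` class and its fields are hypothetical, not any ticketing tool's API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# The seven lifecycle steps from the table, in order.
STEPS = [
    "identification", "logging", "categorization", "prioritization",
    "diagnosis", "resolution", "closure",
]


@dataclass
class Incident:
    summary: str
    step: str = STEPS[0]
    history: list = field(default_factory=list)  # (step, UTC timestamp) pairs

    def __post_init__(self) -> None:
        # Record when the incident was first identified.
        self.history.append((self.step, datetime.now(timezone.utc)))

    def advance(self) -> str:
        """Move to the next step and timestamp it; steps cannot be skipped."""
        index = STEPS.index(self.step)
        if index == len(STEPS) - 1:
            raise ValueError("incident is already closed")
        self.step = STEPS[index + 1]
        self.history.append((self.step, datetime.now(timezone.utc)))
        return self.step


incident = Incident("LIMS login page returns HTTP 500")
while incident.step != "closure":
    incident.advance()
print([step for step, _ in incident.history])
```

The timestamps in `history` are what later feed metrics such as time to resolution.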

Incident Priority Matrix

Priority is determined by Impact (how many users/processes are affected) and Urgency (how quickly the business needs resolution). The matrix:

| | High urgency | Medium urgency | Low urgency |
|---|---|---|---|
| High impact | P1 | P2 | P3 |
| Medium impact | P2 | P3 | P3 |
| Low impact | P3 | P3 | P4 |
| Trivial impact | P4 | P4 | P4 |
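The matrix is just a lookup table, which is why many ticketing tools derive priority from impact and urgency rather than letting the reporter pick it. A minimal sketch; the levels are transcribed from the matrix, while `PRIORITY_MATRIX` and `priority` are illustrative names:

```python
# Impact -> urgency -> priority, transcribed from the matrix above.
PRIORITY_MATRIX = {
    "high":    {"high": "P1", "medium": "P2", "low": "P3"},
    "medium":  {"high": "P2", "medium": "P3", "low": "P3"},
    "low":     {"high": "P3", "medium": "P3", "low": "P4"},
    "trivial": {"high": "P4", "medium": "P4", "low": "P4"},
}


def priority(impact: str, urgency: str) -> str:
    """Look up the incident priority for a given impact and urgency."""
    try:
        return PRIORITY_MATRIX[impact.lower()][urgency.lower()]
    except KeyError:
        raise ValueError(f"unknown impact/urgency: {impact!r}/{urgency!r}") from None


print(priority("High", "High"))     # P1
print(priority("Medium", "Low"))    # P3
print(priority("Trivial", "High"))  # P4
```

Computing priority this way keeps two reporters describing the same outage from landing on different queues.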

Priority codes (not to be confused with Biosafety Levels):

  • P1 (Critical): Production down, major business impact. Immediate response, all hands.
  • P2 (High): Significant service degradation. Fast response, escalation ready.
  • P3 (Medium): Partial impact, workaround available. Normal SLA response.
  • P4 (Low): Minimal impact, cosmetic or edge case. Can be scheduled.

NB: BSL (Biosafety Level) is the standard, modern terminology, while P (P1, P2, P3, P4) is an older designation standing for Pathogen/Protection level.

Escalation

Escalation is the process, within incident and service request management, of transferring issues to higher levels of authority or expertise when they cannot be resolved through normal front-line support procedures.

When should we escalate? When an incident exceeds its SLA timeframe, requires more expertise, or needs more management authority.

There are 2 types:

  • Functional escalation: pass to someone with more technical expertise
  • Hierarchical escalation: involve management when impact warrants it or when SLA breach is imminent
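The two escalation types are independent: an incident can need one, the other, or both. A minimal sketch of that decision, assuming a hypothetical policy where every P1 and every incident that has consumed 80% of its SLA goes to management; the threshold and the `OpenIncident` fields are illustrative, not ITIL-defined:

```python
from dataclasses import dataclass


@dataclass
class OpenIncident:
    priority: str            # "P1".."P4"
    elapsed_minutes: float   # time since the incident was logged
    sla_minutes: float       # resolution target for this priority
    within_skills: bool      # can the current support tier fix it?


# Hypothetical policy: warn management once 80% of the SLA is consumed.
SLA_WARNING_RATIO = 0.8


def escalations(incident: OpenIncident) -> set:
    """Return which escalation types apply: functional, hierarchical, both, or none."""
    result = set()
    if not incident.within_skills:
        result.add("functional")    # needs more technical expertise
    sla_used = incident.elapsed_minutes / incident.sla_minutes
    if incident.priority == "P1" or sla_used >= SLA_WARNING_RATIO:
        result.add("hierarchical")  # impact warrants it, or SLA breach is imminent
    return result


print(escalations(OpenIncident("P3", elapsed_minutes=200, sla_minutes=240, within_skills=True)))
# {'hierarchical'}
```

Writing the rule down, whatever the actual threshold, is what turns "escalate when it feels bad" into something on-call staff can apply at 2am.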

A crisis is when an incident has business-wide consequences that require executive involvement. Knowing when a P1 becomes a crisis (and who to call) should be documented before the crisis, not during it.



Change enablement

Change Enablement manages the lifecycle of all changes to maximize the probability of success and minimize disruption.

ITIL v4 recognizes 3 change types:

| Type | Risk | Authorization | Examples |
|---|---|---|---|
| Standard | Low, well-understood | Pre-authorized | Antivirus update, user account creation |
| Normal | Variable | Change authority / CAB review | Application deployment, infrastructure change |
| Emergency | High urgency | Emergency authority (fast-track) | Critical security patch, production hotfix |
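The key design choice in this table is that standard changes are pre-authorized by being on a list, not by being reviewed each time. A minimal sketch of that routing, assuming a hypothetical catalog of standard changes and a simple urgency flag; real change models carry much more (risk scores, affected services, schedules):

```python
from enum import Enum


class ChangeType(Enum):
    STANDARD = "Pre-authorized"
    NORMAL = "Change authority / CAB review"
    EMERGENCY = "Emergency authority (fast-track)"


# Hypothetical catalog of standard changes: low risk, well understood,
# and therefore authorized once rather than reviewed each time.
STANDARD_CHANGES = {"antivirus update", "user account creation"}


def classify_change(description: str, urgent: bool) -> ChangeType:
    """Pick the authorization path for a change, following the table above."""
    if description.strip().lower() in STANDARD_CHANGES:
        return ChangeType.STANDARD
    if urgent:
        # Must be implemented as soon as possible, e.g. a critical security patch.
        return ChangeType.EMERGENCY
    return ChangeType.NORMAL


print(classify_change("Antivirus update", urgent=False).value)        # Pre-authorized
print(classify_change("Critical security patch", urgent=True).value)  # Emergency authority (fast-track)
print(classify_change("Application deployment", urgent=False).value)  # Change authority / CAB review
```

Growing the standard-change catalog over time is one of the cheapest ways to reduce CAB load without reducing control.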

The CAB (Change Advisory Board)

The CAB is a group that reviews and authorizes significant normal changes. Typical members: IT operations, security, business stakeholders, and the change owner. The CAB doesn't exist to slow things down; it exists to catch problems before they become incidents. It is the organizational mechanism that keeps the finance team from finding itself unable to run payroll on a Friday.

A proper CAB dossier for a normal change includes:

  • Impact: which services, users, and systems are affected?
  • Risk: what's the probability of failure, and what's the impact if it fails?
  • Rollback plan: how do we revert cleanly if the change fails? (A tested rollback, not a theoretical one.)
  • Post-implementation review: how do we verify success after deployment?
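A dossier is only useful if incomplete ones are stopped before they reach the CAB. A minimal sketch of that gate, assuming the four sections from the list above plus a hypothetical `rollback_tested` flag for the "tested, not theoretical" rule; `CabDossier` and `missing_items` are illustrative names:

```python
from dataclasses import dataclass, fields


@dataclass
class CabDossier:
    """The four sections a normal-change dossier needs before CAB review."""
    impact: str = ""
    risk: str = ""
    rollback_plan: str = ""
    post_implementation_review: str = ""
    rollback_tested: bool = False


def missing_items(dossier: CabDossier) -> list:
    """List what still blocks the dossier from going to the CAB."""
    missing = []
    for f in fields(dossier):
        value = getattr(dossier, f.name)
        if isinstance(value, str) and not value.strip():
            missing.append(f.name)
    if not dossier.rollback_tested:
        missing.append("rollback_tested")  # a theoretical rollback does not count
    return missing


dossier = CabDossier(
    impact="4 lab departments, ~8 h downtime",
    risk="5 risks scored and mitigated",
    rollback_plan="Restore pre-migration dump, redeploy previous version",
)
print(missing_items(dossier))  # ['post_implementation_review', 'rollback_tested']
```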



Continual improvement

Continual improvement is both a guiding principle and a dedicated practice in ITIL 4. It ensures the organization continuously improves its services, practices, and the SVS itself.

The Continual Improvement Register (CIR) is the operational tool: a log of all improvement ideas, their status, priority, and outcomes. Think of it as a backlog specifically for operational improvements, not features, not incidents, but how we work.

The 7-Step improvement model

This model is cyclical by design. Reaching step 7 doesn't mean you're done; it means you start over with a new vision.

| Step | Question to answer |
|---|---|
| 1 | What is the vision? Align the improvement with business strategy |
| 2 | Where are we now? Baseline assessment: metrics, maturity, current state |
| 3 | Where do we want to be? Define the target state and measurable success criteria |
| 4 | How do we get there? Build the improvement plan |
| 5 | Take action. Execute iteratively |
| 6 | Did we get there? Measure against the success criteria |
| 7 | How do we keep the momentum? Embed the change, celebrate wins, start the cycle again |
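Put together, the CIR and the 7-step model behave like a prioritized backlog whose items cycle instead of terminating. A minimal sketch, assuming hypothetical register fields (`priority`, `step`, `cycles`) and made-up improvement ideas; no specific CIR tool is implied:

```python
from dataclasses import dataclass

# The seven questions of the continual improvement model, in order.
CI_STEPS = [
    "What is the vision?",
    "Where are we now?",
    "Where do we want to be?",
    "How do we get there?",
    "Take action",
    "Did we get there?",
    "How do we keep the momentum?",
]


@dataclass
class CirEntry:
    """One Continual Improvement Register line (fields are illustrative)."""
    idea: str
    priority: int    # hypothetical convention: 1 = most important
    step: int = 0    # index into CI_STEPS
    cycles: int = 0  # completed passes through the model

    @property
    def question(self) -> str:
        return CI_STEPS[self.step]

    def advance(self) -> str:
        """Move to the next step; after step 7 the cycle starts again at step 1."""
        self.step = (self.step + 1) % len(CI_STEPS)
        if self.step == 0:
            self.cycles += 1
        return self.question


register = [
    CirEntry("Automate user account offboarding", priority=2),
    CirEntry("Write runbooks for the 10 most frequent incidents", priority=1),
]
register.sort(key=lambda entry: entry.priority)  # work the backlog top-down
print(register[0].idea, "->", register[0].question)
```

The modulo in `advance` is the code version of "reaching step 7 means you start over".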

Measuring improvement: SLA vs XLA and beyond

| Metric | What it measures |
|---|---|
| KPI (Key Performance Indicator) | Technical performance against defined targets |
| SLA (Service Level Agreement) | Contractual service commitments (uptime, response times) |
| OKR (Objectives and Key Results) | Strategic goal achievement across a period |
| XLA (Experience Level Agreement) | User satisfaction and actual experience quality |
| CIR (Continual Improvement Register) | Improvement pipeline health and throughput |

SLA vs XLA deserves its own paragraph. An SLA measures what the system does (was uptime ≥ 99.9%?). An XLA measures how the user felt about it. You can meet every SLA and still have furious users, because the system was slow even within the threshold, or the error messages were unhelpful, or the support experience was frustrating. XLAs align IT metrics with actual business outcomes, not just technical compliance.
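The SLA half of that comparison is plain arithmetic. A minimal sketch, assuming a 30-day month and wall-clock availability (real SLAs often exclude planned maintenance windows, which this ignores); the function names are illustrative:

```python
def downtime_budget_minutes(availability_target: float, period_days: float = 30) -> float:
    """Maximum downtime allowed by an availability SLA over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_target)


def sla_met(downtime_minutes: float, availability_target: float, period_days: float = 30) -> bool:
    """True if the observed downtime stays within the SLA budget."""
    return downtime_minutes <= downtime_budget_minutes(availability_target, period_days)


print(round(downtime_budget_minutes(0.999), 1))  # 43.2
print(sla_met(40, 0.999))                        # True
```

A month with 40 minutes of outages meets a 99.9% SLA even if all 40 minutes landed during Monday-morning sample reception. That gap between "compliant" and "acceptable" is exactly what an XLA tries to capture.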

3 Axes of improvement

  • Strategic: organization-wide, long-term (e.g., "reach 95% automated change deployment by end of year")
  • Tactical: team/process level, medium-term (e.g., "reduce MTTR on P1 incidents by 40% this quarter")
  • Operational: day-to-day (e.g., "update the incident categorization taxonomy this sprint")
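The tactical example only means something against a baseline. A minimal sketch of computing MTTR (read here as mean time to restore service) from detection and restoration timestamps; the incident data is made up for illustration:

```python
from datetime import datetime, timedelta


def mttr(incidents: list) -> timedelta:
    """Mean time to restore: average of (restored - detected) over incidents."""
    if not incidents:
        raise ValueError("no incidents to average")
    total = sum((restored - detected for detected, restored in incidents), timedelta())
    return total / len(incidents)


# Hypothetical P1 incidents from last quarter: (detected, restored).
p1_incidents = [
    (datetime(2026, 1, 5, 8, 0), datetime(2026, 1, 5, 11, 0)),    # 3 h
    (datetime(2026, 2, 9, 14, 0), datetime(2026, 2, 9, 15, 0)),   # 1 h
    (datetime(2026, 3, 2, 9, 30), datetime(2026, 3, 2, 11, 30)),  # 2 h
]
baseline = mttr(p1_incidents)
target = baseline * 0.6  # a 40% reduction
print(baseline, target)  # 2:00:00 1:12:00
```

Averages hide outliers, so a team tracking this target may also want to watch the longest incident of the quarter, not just the mean.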



Governance and Risk Management

Governance sits at the top of the SVS. It provides direction and ensures the organization stays aligned with its objectives and obligations. Three functions:

  • Direct: Set strategy, policies, and organizational objectives
  • Evaluate: Assess organizational performance, risks, and compliance
  • Monitor: Track adherence to policies and verify outcomes

For service management, governance ensures that the SVS is accountable: decisions are traceable, risks are understood, and exceptions are handled systematically.

Risk Management in ITIL follows a straightforward formula:

Risk = Probability × Impact

Three responses:

  1. Mitigate: reduce probability or impact (implement redundancy, add monitoring, require peer review)
  2. Transfer: shift the risk to another party (insurance, contractual SLAs with vendors, outsourcing)
  3. Accept: acknowledge the risk and decide to live with it (for low probability, low impact scenarios)
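Risk = Probability × Impact only becomes usable once both factors are put on a scale. A minimal sketch, assuming hypothetical ordinal scales and bands (ITIL prescribes neither); weighting "critical" impact at 5 rather than 4 keeps a low-probability, critical-impact risk in the high band, and with these made-up weights the ratings come out the same as in the case study's risk assessment table:

```python
# Hypothetical ordinal scales; ITIL does not prescribe numbers.
PROBABILITY = {"low": 1, "medium": 2, "high": 3}
IMPACT = {"low": 1, "medium": 2, "high": 3, "critical": 5}


def risk_score(probability: str, impact: str) -> int:
    """Risk = Probability x Impact on the ordinal scales above."""
    return PROBABILITY[probability.lower()] * IMPACT[impact.lower()]


def risk_rating(score: int) -> str:
    """Hypothetical bands: 5 and above high, 3-4 medium, 1-2 low."""
    if score >= 5:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


def suggested_response(rating: str) -> str:
    """Default response per band: a starting point for discussion, not a rule."""
    return {"high": "mitigate", "medium": "mitigate or transfer", "low": "accept"}[rating]


for probability, impact in [("low", "critical"), ("medium", "medium"), ("medium", "low")]:
    rating = risk_rating(risk_score(probability, impact))
    print(f"{probability} x {impact}: {rating} -> {suggested_response(rating)}")
```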

Risk isn't inherently bad. The problem is unacknowledged risk, for example: changes deployed without a risk assessment, dependencies unknown until they fail, vulnerabilities known but not documented. Risk management doesn't eliminate uncertainty; it makes it visible and traceable.



ITIL 4 ↔ Agile, DevOps, Lean

One of the major improvements in ITIL v4 over its predecessors is the explicit integration with modern practices. The framework doesn't compete with DevOps or Agile. It provides the governance layer that makes them sustainable at scale.

| ITIL 4 concept | Agile equivalent | DevOps equivalent | Lean equivalent |
|---|---|---|---|
| Service Value System | Product Delivery Flow | CI/CD Pipeline | Value Stream |
| Continual Improvement | Sprint Retrospective | Blameless Post-mortem | Kaizen |
| Change Enablement | Sprint Planning | Deployment Pipeline | Flow Management |
| Incident Management | Bug Sprint | Incident Response | A3 Problem Solving |
| Service Request | Backlog Item | Work Item | Pull System |
| SLA / XLA | Definition of Done | SLO / SLI | Quality Standards |
| CAB Review | Sprint Review | Deployment Gate | Go/No-Go Decision |
| Problem Management | Root Cause Analysis | Post-mortem | 5 Whys |
| Service Desk | Product Owner (ops) | L1/L2 Support | Andon Cord |

The official ITIL 4 and DevOps resources make the integration case explicitly. The short version: ITIL handles governance and accountability; DevOps handles speed and flow. You need both. Speed without governance creates chaos. Governance without speed creates bureaucracy. ITIL v4 was redesigned precisely to not be the bottleneck.



Case study

Here is what ITIL v4 looks like applied to a real IT department inside a biomedical research center in Senegal. Any resemblance to real persons, living or dead, is purely coincidental.

Part 1 — Service catalog

The service catalog is the official list of what the IT department provides, to whom, and under what terms. It is the operational contract between the IT team and the rest of the organization. Without it, users don't know what to ask for, IT doesn't know what it is accountable for, and incidents pile up in a shared mailbox nobody consistently monitors.

Below are 5 services covering most of what a research institute's IT department actually handles day-to-day.

| Service | Description | Channel | SLA (response / resolution) | Responsible |
|---|---|---|---|---|
| User account provisioning | Creation, modification, or deactivation of accounts (Active Directory, email, VPN, application access). Covers onboarding and offboarding. | ITSM ticketing portal / HR workflow trigger | 4h / 1 business day | Sysadmin, Identity & Access team |
| Workstation support | Hardware diagnosis and repair, OS reinstall, peripheral setup (printers, scanners, external drives). On-site or remote via support session. | Ticketing portal; phone for P1/P2 only | P3: 4h / 2 days; P4: next business day / 5 days | IT Support technician |
| Application support | Functional support for lab and corporate applications: LIMS, ERP (finance, HR), syndromic surveillance platform, email (M365). Includes access requests and usage questions. | Ticketing portal, categorized by application | P2: 2h / 4h; P3: 4h / 1 day; P4: 1 day / 3 days | Application owner + IT Operations |
| VPN & remote access | Setup, configuration, and troubleshooting of remote access (VPN client, MFA token). Covers new setups for approved staff and incident resolution for existing connections. | Ticketing portal; escalation via phone if blocking remote work | New setup: 1 day / 2 days; incident: 2h / 4h | Network team |
| Backup & data recovery | Scheduled backups of research data, institutional databases, and user file shares. Restore requests for accidental deletions or corruption. Covers file-level and full-system restores. | Ticketing portal; critical restores via phone (P1/P2) | P1 restore: 1h acknowledged / 4h RTO; P3 file restore: 4h / 1 day | Sysadmin, Infrastructure team |
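Once the catalog is written down, SLA tracking becomes mechanical. A minimal sketch for the Application support row only, assuming "1 day" means 24 wall-clock hours (a real ITSM tool would count business hours and public holidays); `APP_SUPPORT_SLA` and `sla_status` are hypothetical names:

```python
from datetime import datetime, timedelta
from typing import Optional

# Application support targets from the catalog: priority -> (response, resolution).
# Assumption: "1 day" means 24 wall-clock hours, not business hours.
APP_SUPPORT_SLA = {
    "P2": (timedelta(hours=2), timedelta(hours=4)),
    "P3": (timedelta(hours=4), timedelta(days=1)),
    "P4": (timedelta(days=1), timedelta(days=3)),
}


def sla_status(priority: str, opened: datetime, now: datetime,
               responded: Optional[datetime] = None,
               resolved: Optional[datetime] = None) -> dict:
    """Check a ticket against its response and resolution targets.

    Milestones not reached yet are measured against `now`, so a ticket can be
    flagged as breached before anyone touches it.
    """
    response_target, resolution_target = APP_SUPPORT_SLA[priority]
    return {
        "response_breached": (responded or now) - opened > response_target,
        "resolution_breached": (resolved or now) - opened > resolution_target,
    }


opened = datetime(2026, 5, 4, 9, 0)
print(sla_status("P2", opened, now=datetime(2026, 5, 4, 13, 15),
                 responded=datetime(2026, 5, 4, 10, 30)))
# {'response_breached': False, 'resolution_breached': True}
```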

Part 2 — RFC & CAB Dossier: LIMS major version upgrade

A Request for Change (RFC) is the formal input to the Change Enablement practice. For a normal change, and especially a major one, it triggers a full CAB review before any action is taken. Here is what a complete dossier looks like.

Context: Institut Patrick de Kinshasa uses a LIMS (Laboratory Information Management System) to track samples, tests, and results across all its labs. The vendor has released version 4.0 and will drop support for the current 3.x line at the end of the year. The upgrade is not optional, but it touches sample data, active test runs, and integrations with the syndromic surveillance platform.


RFC identification

| Field | Value |
|---|---|
| RFC number | RFC-2026-014 |
| Title | LIMS major version upgrade: v3.8 → v4.0 |
| Change type | Normal (major) |
| Requestor | Head of IT Operations |
| Application owner | Director of Laboratory Services |
| Submission date | 2026-05-03 |
| Requested deployment window | 2026-05-23, Saturday 22:00 → Sunday 06:00 (WAT) |
| CAB review date | 2026-05-13 |

Justification

The current LIMS v3.x reaches vendor end-of-life on 2026-12-31. After that date: no security patches, no bug fixes, no vendor support. Version 4.0 also introduces a native REST API required by the next release of the syndromic surveillance platform. This change is driven by compliance (security posture) and technical dependency (surveillance platform roadmap) simultaneously.


Impact assessment

| Dimension | Details |
|---|---|
| Systems affected | LIMS application servers (lims-prod-01, lims-prod-02), LIMS PostgreSQL database (lims-db-01), integration bridge with the surveillance platform |
| Users affected | ~65 lab technicians and researchers across 4 departments: Virology, Bacteriology, Epidemiology, Quality |
| Business processes affected | Sample reception, test assignment, result entry, result validation, report generation, QC workflows |
| Downstream services | The surveillance platform reads LIMS results via API; it will operate in read-only degraded mode during the window |
| Duration of impact | LIMS unavailable for ~8 hours; all impact contained within the Saturday night window |
| Data at risk | ~120,000 sample records, 3 years of test history. No patient PII in the LIMS (samples are referenced by anonymous ID). |

Risk assessment

| Risk | Probability | Impact | Score | Mitigation |
|---|---|---|---|---|
| DB migration script fails mid-run, leaving the schema in an inconsistent state | Low | Critical | 🔴 High | Full DB snapshot before migration; script tested 3× on staging with a copy of production data |
| v4.0 breaks custom report templates (Quality dept uses a non-standard format) | Medium | Medium | 🟡 Medium | All 12 templates validated on staging; Quality dept sign-off obtained before CAB submission |
| Surveillance platform integration fails after the API change | Low | High | 🟡 Medium | Integration tested on staging against the v4.0 API; surveillance team on standby Sunday 00:30 |
| Deployment overruns the window and runs into Monday lab opening at 07:30 | Low | High | 🟡 Medium | Hard rollback decision point at 04:00: if not complete by then, roll back regardless of progress |
| UI changes slow down lab technicians Monday morning | Medium | Low | 🟢 Low | 30-min walkthrough session Monday 08:00; quick reference card distributed Friday |

Implementation plan

| Time (WAT) | Step | Responsible | Checkpoint |
|---|---|---|---|
| Sat 21:30 | Pre-deployment checklist: verify backup completed, confirm no active test runs, notify on-call lab manager | IT Ops lead | ✓ required to proceed |
| Sat 22:00 | Put LIMS in maintenance mode (users see scheduled maintenance banner) | Sysadmin | |
| Sat 22:05 | Full PostgreSQL dump to backup server + verify checksum | DBA | ✓ required to proceed |
| Sat 22:30 | Stop LIMS application services on both nodes | Sysadmin | |
| Sat 22:35 | Run database schema migration script (estimated: 45 min) | DBA | ✓ required to proceed |
| Sat 23:20 | Deploy v4.0 application package on lims-prod-01 (primary node) | Sysadmin | |
| Sat 23:40 | Run automated smoke test suite against lims-prod-01 | IT Ops lead | ✓ required to proceed |
| Sun 00:00 | Deploy v4.0 on lims-prod-02, verify cluster synchronization | Sysadmin | |
| Sun 00:30 | Test surveillance platform integration against the new v4.0 API | Surveillance team | ✓ required to proceed |
| Sun 01:00 | Manual validation by on-call lab manager: create test sample, enter result, generate report | Lab manager | ✓ required to proceed |
| Sun 01:30 | Remove maintenance mode, monitor application logs for 30 min | IT Ops lead | |
| Sun 02:00 | Deployment declared successful; confirmation sent to all stakeholders | IT Ops lead | |

Hard rollback decision point: Sunday 04:00. If not complete and validated by then, rollback is initiated regardless of progress. No exceptions — the lab opens at 07:30.
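The implementation plan is effectively a checkpoint-gated sequence with one override: the clock. A minimal Python sketch of that control flow; `Step`, `run_plan`, and the injectable `clock` are hypothetical names, and the step actions are placeholders rather than real deployment commands. For simplicity it rolls back immediately on a failed checkpoint instead of allowing the plan's 30-minute fix window:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True if the step succeeded
    checkpoint: bool = False    # "required to proceed" in the plan


def run_plan(steps: list, hard_deadline: datetime,
             clock: Callable[[], datetime] = datetime.now) -> str:
    """Run steps in order; request rollback on a failed checkpoint or at the deadline.

    Failures of steps without a checkpoint are left for the next checkpoint
    (e.g. the smoke tests) to catch, mirroring the plan above.
    """
    for step in steps:
        if clock() >= hard_deadline:
            return f"ROLLBACK: hard deadline reached before '{step.name}'"
        if not step.action() and step.checkpoint:
            return f"ROLLBACK: checkpoint failed at '{step.name}'"
    if clock() >= hard_deadline:
        return "ROLLBACK: plan finished after the hard deadline"
    return "SUCCESS"


def frozen_clock() -> datetime:
    """Dry-run clock stuck at Saturday 23:00 WAT."""
    return datetime(2026, 5, 23, 23, 0)


deadline = datetime(2026, 5, 24, 4, 0)
plan = [
    Step("Full PostgreSQL dump + checksum", lambda: True, checkpoint=True),
    Step("Schema migration", lambda: False, checkpoint=True),  # simulated failure
    Step("Deploy v4.0 on lims-prod-01", lambda: True),
]
print(run_plan(plan, deadline, clock=frozen_clock))
# ROLLBACK: checkpoint failed at 'Schema migration'
```

Injecting the clock is what makes the deadline logic testable without waiting until 04:00 on a Sunday.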


Rollback procedure

Triggered if: any required checkpoint fails and cannot be resolved within 30 minutes, OR the 04:00 hard deadline is reached.

  1. Stop all LIMS application services on both nodes
  2. Drop and recreate the migrated database: dropdb lims_production, then createdb lims_production (without -C, pg_restore -d restores into an existing database)
  3. Restore from the pre-migration dump: pg_restore -d lims_production /backup/lims-premig-20260523.dump
  4. Verify integrity: compare row counts for the samples, tests, and results tables against the pre-migration snapshot
  5. Redeploy the v3.8 package from the artifact registry on both nodes
  6. Restart application services, verify cluster health
  7. Remove maintenance mode, run the same smoke test suite
  8. Notify all stakeholders: rollback completed, root cause analysis scheduled Monday
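Step 4 is the kind of check that gets done by eye at 4 a.m. Here is a minimal sketch of the comparison. It assumes the pre-migration counts were saved when the dump was taken and that the post-restore counts are collected the same way, for example one `psql -d lims_production -tAc "SELECT count(*) FROM samples"` per table. The `compare_row_counts` helper and the numbers below are hypothetical.

```python
def compare_row_counts(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Compare per-table row counts; return one message per mismatch (empty = OK)."""
    problems = []
    for table, expected in sorted(before.items()):
        actual = after.get(table)
        if actual is None:
            problems.append(f"{table}: missing after restore")
        elif actual != expected:
            problems.append(f"{table}: expected {expected}, found {actual}")
    return problems


# Made-up counts, for illustration only.
before = {"samples": 1200, "tests": 3400, "results": 9800}
after = {"samples": 1200, "tests": 3400, "results": 9750}

for line in compare_row_counts(before, after):
    print(line)  # results: expected 9800, found 9750
```

An empty list means the restore matches the snapshot. Any message is a reason to stop before step 5 rather than redeploy v3.8 on top of incomplete data.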

Estimated rollback duration: 45 minutes. A rollback started at the 04:00 decision point should finish around 04:45, leaving roughly 2 h 45 min of margin before the lab opens at 07:30.
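The trigger rule and the margin arithmetic are easier to review at CAB when they are written down explicitly. Here is a minimal sketch. It assumes the lab opens at 07:30 on the same Sunday morning, as the hard-deadline note implies. The `should_rollback` function is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

SUNDAY = datetime(2026, 5, 24)                   # morning after deployment night
HARD_DEADLINE = SUNDAY.replace(hour=4)           # 04:00 hard decision point
LAB_OPENING = SUNDAY.replace(hour=7, minute=30)  # assumed: same Sunday morning
RESOLUTION_WINDOW = timedelta(minutes=30)        # time allowed to fix a failed checkpoint
ROLLBACK_DURATION = timedelta(minutes=45)        # runbook estimate


def should_rollback(now: datetime, completed: bool,
                    failed_since: Optional[datetime] = None) -> bool:
    """Apply the runbook's trigger rule.

    completed: deployment finished and validated, so no rollback is needed.
    failed_since: when a required checkpoint started failing, if one is failing.
    """
    if completed:
        return False
    if now >= HARD_DEADLINE:
        return True
    return failed_since is not None and now - failed_since >= RESOLUTION_WINDOW


# Worst case: rollback starts exactly at the hard deadline.
worst_case_margin = LAB_OPENING - (HARD_DEADLINE + ROLLBACK_DURATION)
print(worst_case_margin)  # 2:45:00
```

The deadline check comes first on purpose. Once 04:00 passes, a checkpoint that is "almost fixed" no longer earns its full 30 minutes.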


Communication plan

| Audience | Message | When | Channel |
|---|---|---|---|
| All LIMS users | Scheduled maintenance: LIMS unavailable Saturday 22:00 → Sunday 06:00 | Friday 2026-05-22, 14:00 | Email + in-app banner |
| Lab managers | Upgrade rationale, v4.0 UI changes, quick reference card attached | Friday 2026-05-22, 14:00 | Email |
| Surveillance team | API changes summary, integration test results, standby request for Sunday 00:30 | Thursday 2026-05-21 | Direct message + ticket |
| All stakeholders | Outcome (success or rollback) + next steps | Sunday 2026-05-24, by 06:00 | Email |

Post-implementation review

Scheduled: Monday 2026-05-25, 10:00 / 30 minutes. Attendees: IT Ops lead, application owner, one representative per affected lab department.

Agenda:

  1. Did the deployment complete within the window? If not, what caused the overrun?
  2. Were all required checkpoints met? Document any deviations or waivers.
  3. User-reported issues since go-live (collected Monday morning, before the review)
  4. Was rollback triggered? If yes: full root cause analysis before the next major change
  5. Were Application Support SLAs met during the post-go-live monitoring period?
  6. Update the LIMS runbook with any new procedures discovered during the upgrade
  7. Close RFC-2026-014 in the ITSM tool; status: Successful or Successful with deviations

The output feeds directly into the Continual Improvement Register. Anything harder than expected, any gap in the rollback procedure, and any user adoption friction are logged as improvement items before the next major change.
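The closing steps of the review are mechanical enough to sketch. This minimal illustration assumes a hypothetical in-memory register. A real one would live in the ITSM tool or a shared spreadsheet. The deviations recorded during the review choose between the two closure statuses named in item 7, and each finding becomes an open register entry. All class and function names here are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ImprovementItem:
    """One entry in a (hypothetical) Continual Improvement Register."""
    source: str       # the RFC the finding came from
    description: str
    logged_on: date
    status: str = "Open"


@dataclass
class ReviewOutcome:
    rfc: str
    rollback_triggered: bool
    deviations: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)


def closure_status(outcome: ReviewOutcome) -> str:
    """Pick one of the two closure statuses named in agenda item 7."""
    if outcome.rollback_triggered:
        # The agenda only names statuses for a completed change; a rolled-back
        # change goes through root cause analysis first (agenda item 4).
        raise ValueError("rolled-back change: run root cause analysis first")
    return "Successful with deviations" if outcome.deviations else "Successful"


def log_findings(outcome: ReviewOutcome, register: list[ImprovementItem], on: date) -> None:
    """Append every review finding to the register as an open improvement item."""
    for finding in outcome.findings:
        register.append(ImprovementItem(outcome.rfc, finding, on))


register: list[ImprovementItem] = []
outcome = ReviewOutcome(
    rfc="RFC-2026-014",
    rollback_triggered=False,
    deviations=["Example deviation: smoke tests started late"],      # illustrative
    findings=["Example finding: migration ran longer than estimated"],  # illustrative
)
log_findings(outcome, register, date(2026, 5, 25))
print(closure_status(outcome))  # Successful with deviations
```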



Tooling

No tool enforces ITIL by itself. The process has to exist first; the tool just reduces friction. That said, the right tooling makes the difference between a practice that lives on paper and one that teams actually follow.

A word on scope: ServiceNow is the de facto enterprise standard and covers nearly everything in the table below. It is also expensive, slow to configure, and overkill for most organizations with fewer than a few hundred IT staff.

The free and open source options below deserve serious consideration. GLPI in particular covers incident, change, CMDB, and knowledge management in a single self-hosted package.

| | ITSM Platform | Incident Detection & Alerting | Incident Response | Knowledge Management | Change Tracking | Continual Improvement |
|---|---|---|---|---|---|---|
| Suggested tools | ServiceNow, Jira Service Management, Freshservice | PagerDuty, OpsGenie, incident.io | FireHydrant, Rootly | Confluence, Notion, GitBook | Jira, LinearB | Sleuth, Datadog, LinearB |
| Free & open source | GLPI, iTop, Znuny | Prometheus + Alertmanager, Zabbix | | Wiki.js, Outline, BookStack | Plane, Gitea / Forgejo | Four Keys (Google) |
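Four Keys (Google) tracks the four DORA metrics: deployment frequency, lead time for changes, change failure rate, and time to restore service. Four Keys itself ingests events from tools such as GitHub or GitLab and computes these metrics in BigQuery. To show what two of those numbers actually measure, here is a minimal sketch over a hand-written list of deployments. The `Deployment` record and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    deployed_at: datetime
    caused_failure: bool  # True if it led to an incident or a rollback


def deployment_frequency(deploys: list[Deployment], period: timedelta) -> float:
    """Average number of deployments per week over the given period."""
    weeks = period / timedelta(weeks=1)  # timedelta / timedelta -> float
    return len(deploys) / weeks


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Share of deployments that caused a failure in production."""
    if not deploys:
        return 0.0
    return sum(d.caused_failure for d in deploys) / len(deploys)


# Illustrative data: four deployments over four weeks, one of which failed.
start = datetime(2026, 5, 1)
deploys = [
    Deployment(start + timedelta(days=d), failed)
    for d, failed in [(2, False), (9, True), (16, False), (23, False)]
]
print(deployment_frequency(deploys, timedelta(weeks=4)))  # 1.0
print(change_failure_rate(deploys))  # 0.25
```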



ITIL v5

During the training, we were told that ITIL v5 was launched in February 2026. I tried watching this video to get an overview of ITIL v5. The only thing I understood was: "The new pace of changes today driven by cloud, AI, digital products, experience-focused services, demand a new way of thinking." ITIL v5 answers that need. Phew!



Conclusion

ITIL v4 is a framework providing a vocabulary and a structure for conversations your organization was probably already having informally: Who decides what gets changed? How do we prioritize when everything is on fire? How do we know we're actually improving?

The most valuable takeaways from this training:

  1. Value is co-created, not delivered. You can deploy perfect infrastructure and still fail if users can't get value from it.
  2. SLA ≠ user happiness. XLAs measure what actually matters to the business.
  3. ITIL v4 and DevOps are complementary. ITIL provides the accountability layer that makes continuous delivery safe at organizational scale.
  4. Adopt and adapt. Apply the practices that solve real problems, and leave the rest on the shelf until you need them.

If your systems go down and nobody knows who to call, what changed yesterday, or how long things have been broken, that's not a technical problem. That's a service management problem. ITIL v4 gives you the vocabulary to fix it.


More on this topic

Congratulations, you made it to the end. If you enjoyed this, please like, share & subscriiiibe. You might also like my amazing post about AWS fundamentals or how I'm helping save lives in West Africa using free software.

This article is a summary of a three-day training course. It's impossible to cover every aspect of ITIL v4 in such a short post. Here are some useful links to learn more:

Even more: