ITIL 4 Foundations

Training notes turned into a practical guide to ITIL v4

02 May 2026

Disclaimers

Opinions expressed in this post (and in all of my posts) are, unless otherwise specified, solely those of the author: me. These opinions absolutely do not reflect the views, policies, or positions of any organizations, employers, or affiliated groups.
If you are a colleague and recognize the situation described below, remember: I'm critiquing the system, not the people.
Render unto Caesar what is Caesar's: Claude helped me find up-to-date documentation links and helped beautify all the diagrams used (and not used) in this article.

Hook

I tend to stay in my lane. Focus on what I understand, the perimeter I actually control. Everything beyond my code, my systems, my terminal (the hierarchy, management, all the people I work with who don't write programs) has always been a blur to me. Vague, formless, abstract. Sometimes genuinely incomprehensible. Not because I think those people are wrong, but because whatever they do has always felt like noise I couldn't parse.

I'll grudgingly admit, though: since joining this biomedical research center, everything that happens outside my direct work has become the most interesting part of the job. Humans are fascinating creatures to observe from a safe distance. Maybe it's because my actual technical work is no longer the hardest thing on my plate. Or maybe because what happens in the corridors and meeting rooms has a direct, outsized impact on my work, on my freedom to do that work the way I'd want to.

The hardest thing is collaborating with many other people and exchanging information in a way that lets everyone actually understand each other. Not just hear: understand. The hardest thing is listening, really listening to what others have to say, deeply, absorbing and retaining what they meant. Then responding, expressing yourself in a way that gets through, in a way that convinces. Making the right compromises, the right calls, and moving forward together, making sure everyone gets something of what they came for.

Last Tuesday, I was enrolled in an ITIL 4 Foundations training organized by my hierarchy. The stated goals were clear: garantir la disponibilité des systèmes (ensure system availability), structurer la gestion des incidents (structure incident management), and parler le même langage (speak the same language). Perfectly reasonable goals.

In practice: sitting in a room while someone talks through concept after concept, deploying words and definitions at high velocity, is its own endurance sport. It's made harder by the fact that some concepts I thought I already knew get redefined with vocabulary I would never have chosen — and some familiar words get used in ways that are subtly but persistently disorienting.

That said, I have to be honest: ITIL 4 is not the bureaucratic nightmare its reputation suggests. If anything, I realized I'd already been practicing most of these concepts: informally, imperfectly, without knowing there was a name for them. Done right, this is a coherent framework that actually acknowledges modern realities: Agile, DevOps, SRE, Lean.

Since I took notes anyway, might as well turn them into something useful. I hope you'll learn something. Music \o/

What is ITIL 4?

ITIL (Information Technology Infrastructure Library) is a framework for IT Service Management (ITSM). It provides guidance on how organizations can use IT as a tool to facilitate business transformation and growth.

Key facts:

Origin: Developed in the 1980s by the UK government's CCTA (Central Computer and Telecommunications Agency) to standardize IT practices across public services.
ITIL 4 (released 2019): The current version. A major rethink compared to ITIL v3, designed to integrate with modern approaches: Agile, DevOps, Lean, SRE.
Owner: Currently maintained by PeopleCert, which acquired AXELOS (the previous owner) in 2021.
Who uses it? Organizations of all sizes, across industries. Particularly common in banking, telecom, healthcare, public sector, and large IT departments.

One thing ITIL 4 gets right that its predecessors didn't: it explicitly acknowledges that you don't need to implement all 34 practices. You pick what fits. This was not how earlier versions were often sold and it caused a lot of unnecessary process theater in organizations that implemented ITIL like a checkbox exercise.

Why should your organization care?

During the training, an interesting observation was made about three organizational thresholds where ITIL practices start to become relevant:

Threshold	Pain Point	What Starts Breaking
≥ 5 people	Memory loss	Knowledge lives in people's heads, not in runbooks
≥ 10 people	Complexity	Coordination overhead grows faster than the team
> 10 people	Compliance	Audits, regulations, and reporting requirements appear

The practical implication: you don't need all of ITIL. A 7-person startup building an internal tool doesn't need a Change Advisory Board. A bank with 200 IT staff probably does. ITIL 4's design philosophy is: understand the principles, apply the practices that solve your actual problems, and leave the rest on the shelf.

The vocabulary

Before we can discuss the framework, we need to agree on definitions. ITIL has precise meanings for words we use casually.

Who's involved?

Term	Definition
Organization	A person or group with its own functions, responsibilities, and authorities
Stakeholder	Any person with an interest in an organization, product, or service
Service Provider	An organization that delivers services to consumers
Service Consumer	An organization or individual that receives services
Customer	Defines requirements and owns the outcomes
User	The person who uses the service day-to-day
Sponsor	The person who authorizes the budget

In practice: the customer decides what they need, the user interacts with it daily, and the sponsor pays for it.

What are we delivering?

Term	Definition
Resource	Any component used to deliver value: people, infrastructure, capital, information
Product	A configuration of resources designed to offer value to a consumer
Service	A means of enabling value co-creation by facilitating outcomes that customers want
Service Offering	A formal description of one or more services, designed for a target consumer group

The value equation

This is arguably the most important concept in ITIL v4:

Value = Utility × Warranty

Utility: What the service does — fit for purpose. Does it support the customer's performance or remove a constraint?
Warranty: How reliably it does it — fit for use. Is it available when needed? Secure? Capable of handling the load?

A service that does the right thing but crashes every Monday morning has no value. A service that's always available but solves the wrong problem also has no value. We need both, because a zero on either side collapses the product.
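A minimal sketch of why the multiplication matters, assuming (purely for illustration) that utility and warranty can each be scored between 0 and 1. ITIL does not prescribe such a scale; the function name and the scores below are hypothetical.

```python
def service_value(utility: float, warranty: float) -> float:
    """Toy model: value is the product of utility and warranty scores in [0, 1]."""
    for name, score in (("utility", utility), ("warranty", warranty)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {score}")
    return utility * warranty


# Does the right thing, but is never there when needed: warranty scored at zero.
print(service_value(utility=0.9, warranty=0.0))  # 0.0
# Always available, but solves the wrong problem: utility scored at zero.
print(service_value(utility=0.0, warranty=0.99))  # 0.0
# Both present: the only case where the product is non-zero.
print(service_value(utility=0.9, warranty=0.99) > 0)  # True
```

An additive model would let a strong warranty hide a missing utility; the product does not.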

And value is subjective, contextual, and co-created. The provider alone cannot create value; it's always produced in interaction with the consumer. This shifts the conversation from "we delivered the system" to "did the customer actually get what they needed?"

Who does what — for whom

Diagram: on the Service Provider side, Resources (raw input to build with) are assembled into a Product (configured for value delivery), packaged as a Service (enables consumer outcomes), and described in a Service Offering (the formal promise to the consumer). On the Service Consumer side, the Sponsor authorizes the budget and funds the Customer, who defines requirements and owns the outcomes, and specifies for the User, who uses the service day-to-day and feeds back to Stakeholders (any party with an interest). Between the two sides, the provider delivers and the consumer requires: Value = Utility × Warranty, co-created by both sides.

PESTLE: The external forces you cannot control

ITIL v4 acknowledges that organizations don't operate in a vacuum. PESTLE analysis provides a structured lens for scanning the external environment:

Factor	What It Covers	IT Relevance
Political	Government policies, regulations	Data sovereignty laws, public sector mandates
Economic	Budget cycles, cost pressures	Cloud cost optimization, license negotiation
Social	User expectations, workforce trends	Remote work adoption, digital literacy gaps
Technological	Emerging tech, vendor landscape	AI integration, cloud migration, security threats
Legal	Compliance requirements	GDPR, HIPAA, sector-specific regulations
Environmental	Sustainability, green IT	Data center energy, hardware lifecycle

PESTLE is not unique to ITIL; it's a general strategic analysis tool. ITIL v4 borrows it because service management decisions are always made inside an organizational and environmental context. We never want to build a technically excellent system that doesn't survive its first compliance audit or budget cut.

The big picture: Service Value System (SVS)

The Service Value System (SVS) is ITIL v4's model for how an organization creates value. Think of it as the engine diagram for your IT service operations.

It takes opportunities and demands as input, runs them through a system of governance, practices, and activities, and outputs value for all stakeholders.

The SVS is composed of five components:

Guiding Principles — universal recommendations that apply to all decisions
Governance — the framework for directing and controlling the organization
Service Value Chain (SVC) — the set of interconnected activities that turn demand into value
Management Practices — the 34 practices that provide organizational capability
Continual Improvement — the persistent drive to improve everything

Diagram: ITIL 4 Service Value System (SVS). Inputs: opportunity and demand. Core: Governance (direct, evaluate, monitor), the Service Value Chain (6 interconnected activities that convert demand into value: Plan, Engage, Design & Transition, Obtain / Build, Deliver & Support, Improve), and the 34 Practices (General 14, Service 17, Technical 3). Guiding Principles (universally applicable) and Continual Improvement (embedded at every level) frame the whole system. Output: value for all stakeholders.

NOTA BENE: Guiding Principles and Continual Improvement are not steps in a sequence. They're a mindset that applies to all activities, at all times. You don't "do continual improvement" after you're done. You do it while you're doing everything else.

Inside the SVS: the Service Value Chain (SVC)

The Service Value Chain is the operational model at the core of the SVS. It defines 6 activities that can be combined in different ways. The combination for a given service or request is called a value stream.

Activity	Purpose
Plan	Shared understanding of vision, direction, and current state
Engage	Understand stakeholder needs; maintain transparency and relationships
Design & Transition	Ensure products and services meet stakeholder expectations
Obtain / Build	Ensure service components are available when and where needed
Deliver & Support	Ensure services are delivered and supported per agreed specifications
Improve	Continual improvement across all products, services, and practices

The SVC is not linear. An incident resolution primarily activates Engage, Deliver & Support, and Improve. A new service launch activates all six. A routine service request might only need Engage and Deliver & Support.
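The "combination of activities" idea can be sketched as plain data. This is an illustration only: the three value streams below are the examples from the paragraph above, and the dictionary and function names are hypothetical.

```python
from enum import Enum


class Activity(Enum):
    PLAN = "Plan"
    ENGAGE = "Engage"
    DESIGN_AND_TRANSITION = "Design & Transition"
    OBTAIN_BUILD = "Obtain / Build"
    DELIVER_AND_SUPPORT = "Deliver & Support"
    IMPROVE = "Improve"


# Each value stream is just a subset of the six activities.
VALUE_STREAMS = {
    "incident resolution": {Activity.ENGAGE, Activity.DELIVER_AND_SUPPORT, Activity.IMPROVE},
    "new service launch": set(Activity),
    "routine service request": {Activity.ENGAGE, Activity.DELIVER_AND_SUPPORT},
}


def activities_for(stream: str) -> list[str]:
    """Return the activity names a value stream touches, in declaration order."""
    used = VALUE_STREAMS[stream]
    return [activity.value for activity in Activity if activity in used]


for name in VALUE_STREAMS:
    print(f"{name}: {', '.join(activities_for(name))}")
```

The declaration order is only used for display; nothing in the model implies that activities run in sequence.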

Diagram: Service Value Chain (SVC). Six interconnected activities, combined differently for each value stream: Plan (direction and strategy), Engage (stakeholders and needs), Design & Transition (meet expectations), Obtain / Build (components ready), Deliver & Support (run and resolve), Improve (always, for everything). The activities are non-linear: each service interaction combines them differently.

The 4 dimensions

Every component of the SVS should be considered from 4 perspectives. Ignoring any of them is one of the most reliable ways to make a project fail.

Dimension	What It Covers	Classic Failure Mode
Organizations and People	Org structure, culture, roles, responsibilities	"We deployed the tool but nobody knows how to use it"
Information and Technology	Data, knowledge, tools, infrastructure	"The system works but we can't query the data we need"
Partners and Suppliers	External providers, contracts, integrations	"Our vendor went down and we had no fallback"
Value Streams and Processes	How work flows through the organization	"Everyone's busy but nothing ships"

These 4 dimensions are interdependent. Migrating to a new monitoring tool (Information & Technology) means retraining people (Organizations & People), renegotiating vendor contracts (Partners & Suppliers), and redesigning on-call workflows (Value Streams & Processes).

The 7 guiding principles

These are universal principles that guide decision-making in any circumstance. They apply regardless of which practices you're implementing.

1. Focus on value
Every activity should link back to value for stakeholders. Before starting anything, ask: who benefits, and how? If you can't answer that, stop.

2. Start where you are
Never ever assume you need to start from scratch. Assess what already works before designing something new. Most organizations already have something useful; the goal is to build on it, not erase it.

3. Progress iteratively with feedback
Don't try to design the perfect solution upfront. Deliver in small increments, gather feedback, adjust. This is the same philosophy behind Agile sprints and DevOps Continuous Delivery.

4. Collaborate and promote visibility
Work done in silos fails. Decisions made without the right people in the room fail. Make work visible (kanban boards, incident dashboards, change calendars) and involve stakeholders early, not at the end.

5. Think and work holistically
Services are systems. A change to one component can have unexpected effects elsewhere. Consider end-to-end impact, not just the piece in front of you. Lol!

6. Keep it simple and practical
If a process step doesn't add value, eliminate it. If a tool creates more overhead than it saves, reconsider. Complexity is the enemy of reliability. Where I come from, we call this KISS: keep it simple, stupid.

7. Optimize and automate
First, optimize the process (eliminate waste, reduce friction). Then, automate what remains. Automating a broken process gives you broken results, faster. The order matters.

The 34 management practices

ITIL v4 defines 34 practices designed for performing work or accomplishing an objective. They replace what ITIL v3 called "processes" (the shift reflects that a practice includes culture, tools, and knowledge, not just the steps).

They're organized into 3 categories:

Category	Count	Examples
General management	14	Risk management, knowledge management, project management, continual improvement, organizational change management
Service management	17	Incident management, problem management, change enablement, service desk, service level management
Technical management	3	Deployment management, infrastructure & platform management, software development & management

The full list of 34 ITIL 4 practices exists for reference, but you do not need to apply all of them. ITIL v4 is designed to be adopted and adapted: use only the practices that address your organization's specific needs, pain points, and goals.

When you start, you need to identify your top 3 pain points (e.g., too many incidents, slow changes) to pick the right practices to implement first.

Here, we are almost starting from zero and we are already overwhelmed to the max. I would like to suggest 4 practices that, if we don't have them, we're operating blind:

IT asset management → What do we actually own?
Service request management → What are our users asking for?
Incident management → What is broken right now?
Change enablement → What is changing and why?

The 4 together cover the 4 basic operational questions any IT team needs answered at any moment. Everything else can be phased in.

Incident management

Incident management is the practice responsible for managing the lifecycle of all incidents: unplanned interruptions or reductions in the quality of a service. Before we go further, let's nail the vocabulary:

Term	Definition
Error	A flaw in a component that could cause incidents
Incident	An unplanned interruption or reduction in service quality
Problem	The cause (known or unknown) of one or more incidents
Known error	A problem that has been analyzed but not yet resolved, typically with a documented workaround
Root Cause	The fundamental reason behind a problem

These are distinct. An incident is what you respond to at 2am. A problem is what you investigate to prevent the next one. Conflating them is how teams fix the same incident repeatedly without ever addressing what's actually causing it.

The 7-Step incident lifecycle

Step	Action
1	Identification: monitoring alert, user report, or support ticket
2	Logging: full record (timestamp, description, affected service, reporter)
3	Categorization: what type of incident? which service/component?
4	Prioritization: how urgent? how impactful?
5	Diagnosis: initial investigation, probable cause
6	Resolution: fix applied, service restored
7	Closure: confirm resolution, document lessons, update knowledge base

Incident Priority Matrix

Priority is determined by Impact (how many users/processes are affected) and Urgency (how quickly the business needs resolution). The matrix:

	High Urgency	Medium Urgency	Low Urgency
High Impact	P1	P2	P3
Medium Impact	P2	P3	P3
Low Impact	P3	P3	P4
Trivial Impact	P4	P4	P4
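The matrix above translates directly into a lookup table. A minimal sketch, with hypothetical names and lowercase string labels chosen for illustration:

```python
# Rows are impact, columns are urgency, exactly as in the matrix above.
PRIORITY_MATRIX = {
    "high":    {"high": "P1", "medium": "P2", "low": "P3"},
    "medium":  {"high": "P2", "medium": "P3", "low": "P3"},
    "low":     {"high": "P3", "medium": "P3", "low": "P4"},
    "trivial": {"high": "P4", "medium": "P4", "low": "P4"},
}


def incident_priority(impact: str, urgency: str) -> str:
    """Look up the priority code for an impact/urgency pair (case-insensitive)."""
    try:
        return PRIORITY_MATRIX[impact.lower()][urgency.lower()]
    except KeyError:
        raise ValueError(f"unknown impact/urgency: {impact!r}, {urgency!r}") from None


print(incident_priority("High", "High"))      # P1
print(incident_priority("Medium", "Low"))     # P3
print(incident_priority("Trivial", "High"))   # P4
```

Keeping the matrix as data rather than nested conditionals means the table can be reviewed and changed without touching the lookup logic.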

Priority codes (not to be confused with Biosafety Levels):

P1 (Critical): Production down, major business impact. Immediate response, all hands.
P2 (High): Significant service degradation. Fast response, escalation ready.
P3 (Medium): Partial impact, workaround available. Normal SLA response.
P4 (Low): Minimal impact, cosmetic or edge case. Can be scheduled.

NB: BSL (Biosafety Level) is the standard, modern terminology, while P (P1, P2, P3, P4) is an older designation standing for Pathogen/Protection level.

Escalation

Escalation is the process, within incident and service request management, of transferring issues to higher levels of authority or expertise when they cannot be resolved within normal front-line support procedures.

When should we escalate? When an incident exceeds SLA timeframes, requires higher expertise, or needs more management authority.

There are 2 types:

Functional escalation: pass to someone with more technical expertise
Hierarchical escalation: involve management when impact warrants it or when SLA breach is imminent

A crisis is when an incident has business-wide consequences that require executive involvement. Knowing when a P1 becomes a crisis (and who to call) should be documented before the crisis, not during it.
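The two escalation types can be sketched as a small decision function. This is an illustration under stated assumptions: the function name, its parameters, and the 80% threshold used to mean "SLA breach is imminent" are all hypothetical, not ITIL definitions.

```python
from datetime import timedelta


def escalation_types(
    elapsed: timedelta,
    sla_target: timedelta,
    needs_expertise: bool,
    needs_authority: bool,
    warn_ratio: float = 0.8,  # assumption: "imminent" means 80% of the SLA has elapsed
) -> list[str]:
    """Return which escalations apply; an empty list means keep working the ticket."""
    kinds = []
    if needs_expertise:
        kinds.append("functional")
    if needs_authority or elapsed >= sla_target * warn_ratio:
        kinds.append("hierarchical")
    return kinds


# One hour into a 4-hour SLA, but front-line support lacks the expertise.
print(escalation_types(timedelta(hours=1), timedelta(hours=4), True, False))
# ['functional']

# Nobody is stuck technically, but 3h30 of the 4-hour SLA has elapsed.
print(escalation_types(timedelta(hours=3, minutes=30), timedelta(hours=4), False, False))
# ['hierarchical']
```

Both types can apply at once, which is why the function returns a list rather than a single value.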

Change enablement

Change Enablement manages the lifecycle of all changes to maximize the probability of success and minimize disruption.

ITIL v4 recognizes 3 change types:

Type	Risk	Authorization	Examples
Standard	Low, well-understood	Pre-authorized	Antivirus update, user account creation
Normal	Variable	Change authority / CAB review	Application deployment, infrastructure change
Emergency	High urgency	Emergency authority (fast-track)	Critical security patch, production hotfix
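As a sketch, the table above can be encoded as a routing rule that tells a ticketing workflow which authorization a change needs. The class and function names are hypothetical; the values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangePolicy:
    risk: str
    authorization: str


CHANGE_POLICIES = {
    "standard": ChangePolicy("Low, well-understood", "Pre-authorized"),
    "normal": ChangePolicy("Variable", "Change authority / CAB review"),
    "emergency": ChangePolicy("High urgency", "Emergency authority (fast-track)"),
}


def required_authorization(change_type: str) -> str:
    """Return the authorization path for a change type (case-insensitive)."""
    policy = CHANGE_POLICIES.get(change_type.strip().lower())
    if policy is None:
        raise ValueError(f"unknown change type: {change_type!r}")
    return policy.authorization


print(required_authorization("Standard"))   # Pre-authorized
print(required_authorization("Emergency"))  # Emergency authority (fast-track)
```

Rejecting unknown types loudly is deliberate: a change that fits none of the three categories should not slip through unreviewed.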

The CAB (Change Advisory Board)

The CAB is a group that reviews and authorizes significant normal changes. Typical members: IT operations, security, business stakeholders, and the change owner. The CAB doesn't exist to slow things down; it exists to catch problems before they become incidents. It is the organizational mechanism that keeps a badly timed change from leaving the finance team unable to run payroll on a Friday.

A proper CAB dossier for a normal change includes:

Impact: which services, users, and systems are affected?
Risk: what's the probability of failure, and what's the impact if it fails?
Rollback plan: how do we revert cleanly if the change fails? (Tested rollback, not theoretical)
Post-implementation review: how do we verify success after deployment?

Continual improvement

Continual improvement is both a guiding principle and a dedicated practice in ITIL 4. It ensures the organization continuously improves its services, practices, and the SVS itself.

The Continual Improvement Register (CIR) is the operational tool: a log of all improvement ideas, their status, priority, and outcomes. Think of it as a backlog specifically for operational improvements, not features, not incidents, but how we work.
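A register like this needs very little structure to be useful. A minimal sketch, assuming a numeric priority where 1 is most important and three statuses; the field names, the scale, and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    IN_PROGRESS = "in progress"
    DONE = "done"


@dataclass
class ImprovementItem:
    idea: str
    priority: int                     # 1 = most important (hypothetical scale)
    status: Status = Status.PROPOSED
    outcome: str = ""                 # filled in once the item is done


def open_items(register: list[ImprovementItem]) -> list[ImprovementItem]:
    """Items still to be worked on, most important first."""
    pending = [item for item in register if item.status is not Status.DONE]
    return sorted(pending, key=lambda item: item.priority)


register = [
    ImprovementItem("Document the P1 escalation contacts", priority=1),
    ImprovementItem("Review incident categories", priority=3, status=Status.IN_PROGRESS),
    ImprovementItem("Test the backup restore", priority=2, status=Status.DONE,
                    outcome="Restore verified on staging"),
]

for item in open_items(register):
    print(item.priority, item.status.value, item.idea)
```

A spreadsheet with the same four columns does the same job; the point is that ideas, status, priority, and outcomes live in one place.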

The 7-Step improvement model

This model is cyclical by design. Reaching step 7 doesn't mean you're done; it means you start over with a new vision.

Step	Question to Answer
1	What is the vision? Align the improvement to business strategy
2	Where are we now? Baseline assessment: metrics, maturity, current state
3	Where do we want to be? Define target state and measurable success criteria
4	How do we get there? Build the improvement plan
5	Take action. Execute iteratively
6	Did we get there? Measure against success criteria
7	How do we keep the momentum? Embed the change, celebrate wins, start the cycle again

Measuring improvement: SLA vs XLA and beyond

Metric	What It Measures
KPI (Key Performance Indicator)	Technical performance against defined targets
SLA (Service Level Agreement)	Contractual service commitments (uptime, response times)
OKR (Objective and Key Result)	Strategic goal achievement across a period
XLA (Experience Level Agreement)	User satisfaction and actual experience quality
CIR (Continual Improvement Register)	Improvement pipeline health and throughput

SLA vs XLA deserves its own paragraph. An SLA measures what the system does (was uptime ≥ 99.9%?). An XLA measures how the user felt about it. You can meet every SLA and still have furious users, because the system was slow even within the threshold, or the error messages were unhelpful, or the support experience was frustrating. XLAs align IT metrics with actual business outcomes, not just technical compliance.
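To make the 99.9% figure concrete, here is a short sketch of the downtime budget an availability target leaves. The function name and the 30-day month are assumptions for illustration.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted over the period by an availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime_minutes(target)
    print(f"{target}% over 30 days -> {budget:.1f} minutes of downtime allowed")
```

At 99.9% the budget is about 43 minutes per 30 days. A service can stay inside that budget and still be slow or confusing every day, which is exactly the gap an XLA is meant to capture.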

3 Axes of improvement

Strategic: organization-wide, long-term (e.g., "reach 95% automated change deployment by end of year")
Tactical: team/process level, medium-term (e.g., "reduce MTTR on P1 incidents by 40% this quarter")
Operational: day-to-day (e.g., "update the incident categorization taxonomy this sprint")
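A target like the tactical example is only useful if it can be measured. A minimal sketch, assuming MTTR is computed as the mean of resolved-minus-opened durations; the timestamps and the 5-hour baseline are invented for illustration.

```python
from datetime import datetime
from statistics import mean


def mttr_hours(incidents: list[tuple[datetime, datetime]]) -> float:
    """Mean time to restore, in hours, over (opened, resolved) pairs."""
    return mean(
        (resolved - opened).total_seconds() / 3600 for opened, resolved in incidents
    )


def reduction(baseline: float, current: float) -> float:
    """Fractional reduction relative to the baseline (0.4 means 40% lower)."""
    return (baseline - current) / baseline


this_quarter = [
    (datetime(2026, 4, 6, 2, 0), datetime(2026, 4, 6, 6, 0)),      # 4 hours
    (datetime(2026, 5, 11, 10, 0), datetime(2026, 5, 11, 12, 0)),  # 2 hours
]
baseline = 5.0  # hypothetical MTTR for the previous quarter, in hours
current = mttr_hours(this_quarter)
print(f"MTTR {current:.1f} h, reduction {reduction(baseline, current):.0%}")
# MTTR 3.0 h, reduction 40%
```

Agreeing on the definition of "opened" and "resolved" matters more than the arithmetic: the same incidents can yield very different MTTR figures depending on which timestamps are used.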

Governance and Risk Management

Governance sits at the top of the SVS. It provides direction and ensures the organization stays aligned with its objectives and obligations. Three functions:

Direct: Set strategy, policies, and organizational objectives
Evaluate: Assess organizational performance, risks, and compliance
Monitor: Track adherence to policies and verify outcomes

For service management, governance ensures that the SVS is accountable: decisions are traceable, risks are understood, and exceptions are handled systematically.

Risk Management in ITIL follows a straightforward formula:

Risk = Probability × Impact
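A short sketch of the formula used to rank risks, assuming probability is expressed as a fraction and impact as a cost in arbitrary units. The class name, the risks, and every number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # chance of occurring over the period, between 0 and 1
    impact: float       # cost if it occurs, in arbitrary units

    @property
    def exposure(self) -> float:
        """Risk = Probability x Impact."""
        return self.probability * self.impact


risks = [
    Risk("single vendor outage", probability=0.25, impact=100),
    Risk("untested rollback fails", probability=0.10, impact=500),
    Risk("typo in internal wiki", probability=0.50, impact=1),
]

# Highest exposure first: the unlikely but costly risk outranks the likely but trivial one.
for risk in sorted(risks, key=lambda r: r.exposure, reverse=True):
    print(f"{risk.name}: {risk.exposure:.1f}")
```

In practice many teams use qualitative scales (low, medium, high) rather than numbers, so the ranking is a conversation starter, not a verdict.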

Three responses:

Mitigate: reduce probability or impact (implement redundancy, add monitoring, require peer review)
Transfer: shift the risk to another party (insurance, contractual SLAs with vendors, outsourcing)
Accept: acknowledge the risk and decide to live with it (for low probability, low impact scenarios)

Risk isn't inherently bad. The problem is unacknowledged risk, for example: changes deployed without a risk assessment, dependencies unknown until they fail, vulnerabilities known but not documented. Risk management doesn't eliminate uncertainty; it makes it visible and traceable.

ITIL 4 ↔ Agile, DevOps, Lean

One of the major improvements in ITIL v4 over its predecessors is the explicit integration with modern practices. The framework doesn't compete with DevOps or Agile. It provides the governance layer that makes them sustainable at scale.

ITIL 4 Concept	Agile Equivalent	DevOps Equivalent	Lean Equivalent
Service Value System	Product Delivery Flow	CI/CD Pipeline	Value Stream
Continual Improvement	Sprint Retrospective	Blameless Post-mortem	Kaizen
Change Enablement	Sprint Planning	Deployment Pipeline	Flow Management
Incident Management	Bug Sprint	Incident Response	A3 Problem Solving
Service Request	Backlog Item	Work Item	Pull System
SLA / XLA	Definition of Done	SLO / SLI	Quality Standards
CAB Review	Sprint Review	Deployment Gate	Go/No-Go Decision
Problem Management	Root Cause Analysis	Post-mortem	5 Whys
Service Desk	Product Owner (ops)	L1/L2 Support	Andon Cord

The official ITIL 4 and DevOps resources make the integration case explicitly. The short version: ITIL handles governance and accountability; DevOps handles speed and flow. You need both. Speed without governance creates chaos. Governance without speed creates bureaucracy. ITIL v4 was redesigned precisely to not be the bottleneck.

Case study

Here is what ITIL v4 looks like applied to a real IT department inside a biomedical research center in Senegal. Any resemblance to real persons, living or dead, is purely coincidental.

Part 1 — Service catalog

The service catalog is the official list of what the IT department provides, to whom, and under what terms. It is the operational contract between the IT team and the rest of the organization. Without it, users don't know what to ask for, IT doesn't know what it's accountable for, and incidents pile up in a shared mailbox nobody consistently monitors.

Below: 5 services covering the majority of what a research institute's IT department actually handles day-to-day.

Service	Description	Channel	SLA — Response / Resolution	Responsible
User account provisioning	Creation, modification, or deactivation of accounts (Active Directory, email, VPN, application access). Covers onboarding and offboarding.	ITSM ticketing portal / HR workflow trigger	4h / 1 business day	Sysadmin — Identity & Access team
Workstation support	Hardware diagnosis and repair, OS reinstall, peripheral setup (printers, scanners, external drives). On-site or remote via support session.	Ticketing portal — phone for P1/P2 only	P3: 4h / 2 days — P4: next business day / 5 days	IT Support technician
Application support	Functional support for lab and corporate applications: LIMS, ERP (finance, HR), syndromic surveillance platform, email (M365). Includes access requests and usage questions.	Ticketing portal, categorized by application	P2: 2h / 4h — P3: 4h / 1 day — P4: 1 day / 3 days	Application owner + IT Operations
VPN & remote access	Setup, configuration, and troubleshooting of remote access (VPN client, MFA token). Covers new setups for approved staff and incident resolution for existing connections.	Ticketing portal — escalation via phone if blocking remote work	New setup: 1 day / 2 days — Incident: 2h / 4h	Network team
Backup & data recovery	Scheduled backups of research data, institutional databases, and user file shares. Restore requests for accidental deletions or corruption. Covers file-level and full-system restores.	Ticketing portal — critical restores via phone (P1/P2)	P1 restore: 1h acknowledged / 4h RTO — P3 file restore: 4h / 1 day	Sysadmin — Infrastructure team

Part 2 — RFC & CAB Dossier: LIMS major version upgrade

A Request for Change (RFC) is the formal input to the Change Enablement practice. For a Normal or a Major change, it triggers a full CAB review before any action is taken. Here is what a complete dossier looks like.

Context: Institut Patrick de Kinshasa uses a LIMS (Laboratory Information Management System) to track samples, tests, and results across all its labs. The vendor has released version 4.0, which drops support for the current version 3.x at end of year. The upgrade is not optional, but it touches sample data, active test runs, and integrations with the syndromic surveillance platform.

RFC identification

Field	Value
RFC Number	RFC-2026-014
Title	LIMS major version upgrade: v3.8 → v4.0
Change type	Normal — Major
Requestor	Head of IT Operations
Application owner	Director of Laboratory Services
Submission date	2026-05-03
Requested deployment window	2026-05-23, Saturday 22:00 → Sunday 06:00 (WAT)
CAB review date	2026-05-13

Justification

The current LIMS v3.x reaches vendor end-of-life on 2026-12-31. After that date: no security patches, no bug fixes, no vendor support. Version 4.0 also introduces a native REST API required by the next release of the syndromic surveillance platform. This change is driven by compliance (security posture) and technical dependency (surveillance platform roadmap) simultaneously.

Impact assessment

Dimension	Details
Systems affected	LIMS application servers (lims-prod-01, lims-prod-02), LIMS PostgreSQL database (lims-db-01), integration bridge with surveillance platform
Users affected	~65 lab technicians and researchers across 4 departments: Virology, Bacteriology, Epidemiology, Quality
Business processes affected	Sample reception, test assignment, result entry, result validation, report generation, QC workflows
Downstream services	Surveillance platform reads LIMS results via API — will operate in read-only degraded mode during the window
Duration of impact	LIMS unavailable ~8 hours; all impact contained within the Saturday night window
Data at risk	~120,000 sample records, 3 years of test history. No patient PII in LIMS (samples referenced by anonymous ID).

Risk assessment

Risk	Probability	Impact	Score	Mitigation
DB migration script fails mid-run, leaving schema in inconsistent state	Low	Critical	🔴 High	Full DB snapshot before migration; script tested 3× on staging with a copy of production data
v4.0 breaks custom report templates (Quality dept uses non-standard format)	Medium	Medium	🟡 Medium	All 12 templates validated on staging; Quality dept sign-off obtained before CAB submission
Surveillance platform integration fails after API change	Low	High	🟡 Medium	Integration tested on staging against v4.0 API; surveillance team on standby Sunday 00:30
Deployment overruns window, runs into Monday lab opening at 07:30	Low	High	🟡 Medium	Hard rollback decision point at 04:00 — if not complete by then, rollback regardless of progress
UI changes slow down lab technicians Monday morning	Medium	Low	🟢 Low	30-min walkthrough session Monday 08:00; quick reference card distributed Friday

Implementation plan

Time (WAT)	Step	Responsible	Checkpoint
Sat 21:30	Pre-deployment checklist: verify backup completed, confirm no active test runs, notify on-call lab manager	IT Ops lead	✓ required to proceed
Sat 22:00	Put LIMS in maintenance mode (users see scheduled maintenance banner)	Sysadmin
Sat 22:05	Full PostgreSQL dump to backup server + verify checksum	DBA	✓ required to proceed
Sat 22:30	Stop LIMS application services on both nodes	Sysadmin
Sat 22:35	Run database schema migration script (estimated: 45 min)	DBA	✓ required to proceed
Sat 23:20	Deploy v4.0 application package on lims-prod-01 (primary node)	Sysadmin
Sat 23:40	Run automated smoke test suite against lims-prod-01	IT Ops lead	✓ required to proceed
Sun 00:00	Deploy v4.0 on lims-prod-02, verify cluster synchronization	Sysadmin
Sun 00:30	Test surveillance platform integration against new v4.0 API	Surveillance team	✓ required to proceed
Sun 01:00	Manual validation by on-call lab manager: create test sample, enter result, generate report	Lab manager	✓ required to proceed
Sun 01:30	Remove maintenance mode, monitor application logs for 30 min	IT Ops lead
Sun 02:00	Deployment declared successful — confirmation sent to all stakeholders	IT Ops lead

Hard rollback decision point: Sunday 04:00. If not complete and validated by then, rollback is initiated regardless of progress. No exceptions — the lab opens at 07:30.
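The "required to proceed" column behaves like a gate: a failed required checkpoint stops the run. A minimal sketch, assuming each step can be represented by a callable that returns True on success. The step names are abbreviated from the table, and the lambdas stand in for real checks.

```python
from typing import Callable, NamedTuple, Optional


class Step(NamedTuple):
    name: str
    check: Callable[[], bool]  # hypothetical: returns True when the step succeeded
    required: bool             # mirrors the "required to proceed" column


def first_blocking_failure(steps: list[Step]) -> Optional[str]:
    """Run steps in order and stop at the first failed required checkpoint.

    Returns that step's name (the signal to consider rollback), or None if
    every required checkpoint passed.
    """
    for step in steps:
        if not step.check() and step.required:
            return step.name
    return None


plan = [
    Step("pre-deployment checklist", lambda: True, required=True),
    Step("maintenance mode on", lambda: True, required=False),
    Step("database dump + checksum", lambda: True, required=True),
    Step("schema migration", lambda: False, required=True),  # simulated failure
    Step("deploy v4.0 on primary node", lambda: True, required=False),
]

print(first_blocking_failure(plan))  # schema migration
```

The sketch stops immediately; the actual procedure allows 30 minutes to resolve a failed checkpoint before rollback is triggered.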

Rollback procedure

Triggered if: any required checkpoint fails and cannot be resolved within 30 minutes, OR the 04:00 hard deadline is reached.

Stop all LIMS application services on both nodes
Drop the migrated database: dropdb lims_production
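Recreate the empty database and restore the pre-migration dump (pg_restore -d needs an existing database to connect to): createdb lims_production && pg_restore -d lims_production /backup/lims-premig-20260523.dump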
Verify integrity: compare row counts for samples, tests, results tables against the pre-migration snapshot
Redeploy the v3.8 package from artifact registry on both nodes
Restart application services, verify cluster health
Remove maintenance mode, run the same smoke test suite
Notify all stakeholders: rollback completed, root cause analysis scheduled Monday

Estimated rollback duration: 45 minutes. A rollback started at the 04:00 decision point finishes around 04:45, which leaves about 2 hours 45 minutes of margin before the lab opens at 07:30.
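
The integrity check in the rollback procedure (comparing row counts for the samples, tests, and results tables against the pre-migration snapshot) could look like the sketch below. It assumes the counts have already been collected into two dictionaries; `find_mismatches` is a hypothetical helper and the numbers are made up for illustration.

```python
from typing import Dict, Optional, Tuple

TABLES = ("samples", "tests", "results")  # tables named in the rollback procedure


def find_mismatches(
    snapshot: Dict[str, int], restored: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {table: (expected, actual)} for every table whose counts differ."""
    mismatches = {}
    for table in TABLES:
        expected = snapshot.get(table)
        actual = restored.get(table)
        # A missing count on either side is treated as a mismatch, not skipped.
        if expected is None or expected != actual:
            mismatches[table] = (expected, actual)
    return mismatches


# Illustrative numbers only, not real LIMS data.
snapshot = {"samples": 120000, "tests": 340000, "results": 910000}
restored = {"samples": 120000, "tests": 340000, "results": 909998}

print(find_mismatches(snapshot, restored))  # {'results': (910000, 909998)}
```

An empty result means the restore matches the snapshot on those three tables. Row counts only catch missing or extra rows, so this is a sanity check rather than proof of integrity.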

Communication plan

Audience	Message	When	Channel
All LIMS users	Scheduled maintenance — LIMS unavailable Saturday 22:00 → Sunday 06:00	Friday 2026-05-22, 14:00	Email + in-app banner
Lab managers	Upgrade rationale, v4.0 UI changes, quick reference card attached	Friday 2026-05-22, 14:00	Email
Surveillance team	API changes summary, integration test results, standby request for Sunday 00:30	Thursday 2026-05-21	Direct message + ticket
All stakeholders	Outcome (success or rollback) + next steps	Sunday 2026-05-24, by 06:00	Email

Post-implementation review

Scheduled: Monday 2026-05-25, 10:00 (30 minutes). Attendees: IT Ops lead, application owner, one representative per affected lab department.

Agenda:

Did deployment complete within window? If not: what caused the overrun?
Were all required checkpoints met? Document any deviations or waivers.
User-reported issues since go-live (collected Monday morning, before the review)
Was rollback triggered? If yes: full root cause analysis before next major change
Were Application Support SLAs met during the post-go-live monitoring period?
Update the LIMS runbook with any new procedures discovered during the upgrade
Close RFC-2026-014 in the ITSM tool; status: Successful or Successful with deviations

The output feeds directly into the Continual Improvement Register: anything harder than expected, any gap in the rollback procedure, and any user adoption friction is logged as an improvement item before the next major change.

Tooling

No tool enforces ITIL by itself. The process has to exist first; the tool just reduces friction. That said, the right tooling makes the difference between a practice that lives on paper and one that teams actually follow.

A word on scope: ServiceNow is the de facto enterprise standard and covers nearly everything in the table below. It is also expensive, slow to configure, and overkill for most organizations below a few hundred IT staff.

The free and open source row below deserves serious consideration. GLPI in particular covers incident, change, CMDB, and knowledge management in a single self-hosted package.

	ITSM Platform	Incident Detection & Alerting	Incident Response	Knowledge Management	Change Tracking	Continual Improvement
Suggested tools	ServiceNow, Jira Service Management, Freshservice	PagerDuty, OpsGenie, incident.io	FireHydrant, Rootly	Confluence, Notion, GitBook	Jira, LinearB	Sleuth, Datadog, LinearB
Free & open source	GLPI, iTop, Znuny	Prometheus + Alertmanager, Zabbix	—	Wiki.js, Outline, BookStack	Plane, Gitea / Forgejo	Four Keys (Google)

ITIL v5

During the training, we were told that ITIL v5 launched in February 2026. I tried watching this video to get an overview of what ITIL v5 is. The only thing I understood was: "The new pace of changes today driven by cloud, AI, digital products, experience-focused services, demand a new way of thinking." ITIL v5 answers that need. Phew!

Conclusion

ITIL v4 is a framework providing a vocabulary and a structure for conversations your organization was probably already having informally: Who decides what gets changed? How do we prioritize when everything is on fire? How do we know we're actually improving?

The most valuable takeaways from this training:

Value is co-created, not delivered. You can deploy perfect infrastructure and still fail if users can't get value from it.
SLA ≠ user happiness. XLAs measure what actually matters to the business.
ITIL v4 and DevOps are complementary. ITIL provides the accountability layer that makes continuous delivery safe at organizational scale.
Adopt and adapt. Apply the practices that solve real problems, and leave the rest on the shelf until you need them.

If your systems go down and nobody knows who to call, what changed yesterday, or how long things have been broken, that's not a technical problem. That's a service management problem. ITIL v4 gives you the vocabulary to fix it.