
Disclaimers
Opinions expressed in this post (and in all my posts) are, unless otherwise specified, solely those of the author: me. They absolutely do not reflect the views, policies, or positions of any organization, employer, or affiliated group.
If you are a colleague and recognize the situation described below, remember: I'm critiquing the system, not the people.
Render unto Caesar what is Caesar's: Claude helped me find up-to-date documentation links and helped beautify all the diagrams used (and not used) in this article.
Hook
I tend to stay in my lane. Focus on what I understand, the perimeter I actually control. Everything beyond my code, my systems, my terminal (the hierarchy, management, all the people I work with who don't write programs) has always been a blur to me. Vague, formless, abstract. Sometimes genuinely incomprehensible. Not because I think those people are wrong, but because whatever they do has always felt like noise I couldn't parse.
I'll grudgingly admit, though: since joining this biomedical research center, everything that happens outside my direct work has become the most interesting part of the job. Humans are fascinating creatures to observe from a safe distance. Maybe it's because my actual technical work is no longer the hardest thing on my plate. Or maybe because what happens in the corridors and meeting rooms has a direct, outsized impact on my work, on my freedom to do that work the way I'd want to.
The hardest thing is collaborating with many other people, exchanging information in a way that lets everyone actually understand each other. Not just hear: understand. The hardest thing is listening, really listening to what others have to say, deeply, absorbing and retaining what they meant. Then responding, expressing yourself in a way that gets through, in a way that convinces. Making the right compromises, the right calls, and moving forward together, making sure everyone gets something of what they came for.
Last Tuesday, I was enrolled in an ITIL 4 Foundations training organized by my hierarchy. The stated goals were clear: garantir la disponibilité des systèmes (ensure system availability), structurer la gestion des incidents (structure incident management), and parler le même langage (speak the same language). Perfectly reasonable goals.
In practice: sitting in a room while someone talks through concept after concept, deploying words and definitions at high velocity, is its own endurance sport. It's made harder by the fact that some concepts I thought I already knew get redefined with vocabulary I would never have chosen — and some familiar words get used in ways that are subtly but persistently disorienting.
That said, I have to be honest: ITIL 4 is not the bureaucratic nightmare its reputation suggests. If anything, I realized I'd already been practicing most of these concepts: informally, imperfectly, without knowing there was a name for them. Done right, this is a coherent framework that actually acknowledges modern realities: Agile, DevOps, SRE, Lean.
Since I took notes anyway, might as well turn them into something useful. I hope you'll learn something. Music \o/
Table of contents
- What is ITIL 4?
- Why should your organization care?
- The vocabulary
- PESTLE: external forces
- The Service Value System (SVS)
- The Service Value Chain (SVC)
- The 4 dimensions
- The 7 guiding principles
- The 34 practices
- Incident management
- Change enablement
- Continual improvement
- Governance and risk management
- ITIL 4 ↔ Agile, DevOps, Lean
- Case study
- Tooling
- ITIL v5
- Conclusion
What is ITIL 4?
ITIL (Information Technology Infrastructure Library) is a framework for IT Service Management (ITSM). It provides guidance on how organizations can use IT as a tool to facilitate business transformation and growth.
Key facts:
- Origin: Developed in the 1980s by the UK government's CCTA (Central Computer and Telecommunications Agency) to standardize IT practices across public services.
- ITIL 4 (released 2019): The current version. A major rethink compared to ITIL v3, designed to integrate with modern approaches: Agile, DevOps, Lean, SRE.
- Owner: Currently maintained by PeopleCert, which acquired AXELOS (the previous owner) in 2021.
- Who uses it? Organizations of all sizes, across industries. Particularly common in banking, telecom, healthcare, public sector, and large IT departments.
One thing ITIL 4 gets right that its predecessors didn't: it explicitly acknowledges that you don't need to implement all 34 practices. You pick what fits. This was not how earlier versions were often sold and it caused a lot of unnecessary process theater in organizations that implemented ITIL like a checkbox exercise.
Why should your organization care?
During the training, an interesting observation was made about three organizational thresholds where ITIL practices start to become relevant:
| Threshold | Pain Point | What Starts Breaking |
|---|---|---|
| ≥ 5 people | Memory loss | Knowledge lives in people's heads, not in runbooks |
| ≥ 10 people | Complexity | Coordination overhead grows faster than the team |
| > 10 people | Compliance | Audits, regulations, and reporting requirements appear |
The practical implication: you don't need all of ITIL. A 7-person startup building an internal tool doesn't need a Change Advisory Board. A bank with 200 IT staff probably does. ITIL 4's design philosophy is simple: understand the principles, apply the practices that solve your actual problems, and leave the rest on the shelf.
The vocabulary
Before we can discuss the framework, we need to agree on definitions. ITIL has precise meanings for words we use casually.
Who's involved?
| Term | Definition |
|---|---|
| Organization | A person or group with its own functions, responsibilities, and authorities |
| Stakeholder | Any person with an interest in an organization, product, or service |
| Service Provider | An organization that delivers services to consumers |
| Service Consumer | An organization or individual that receives services |
| Customer | Defines requirements and owns the outcomes |
| User | The person who uses the service day-to-day |
| Sponsor | The person who authorizes the budget |
In practice: the customer decides what they need, the user interacts with it daily, and the sponsor pays for it.
What are we delivering?
| Term | Definition |
|---|---|
| Resource | Any component used to deliver value: people, infrastructure, capital, information |
| Product | A configuration of resources designed to offer value to a consumer |
| Service | A means of enabling value co-creation by facilitating outcomes that customers want |
| Service Offering | A formal description of one or more services, designed for a target consumer group |
The value equation
This is arguably the most important concept in ITIL v4:
Value = Utility × Warranty
- Utility: What the service does — fit for purpose. Does it support the customer's performance or remove a constraint?
- Warranty: How reliably it does it — fit for use. Is it available when needed? Secure? Capable of handling the load?
A service that does the right thing but crashes every Monday morning has no value. A service that's always available but solves the wrong problem also has no value. We need both, because a zero on either side collapses the product.
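A minimal sketch of why the relationship is written as a product rather than a sum. The 0-to-1 scores and the function name `service_value` are illustrative assumptions of mine; ITIL 4 itself treats value qualitatively and does not prescribe a numeric scale.

```python
def service_value(utility: float, warranty: float) -> float:
    """Toy model: value as the product of two scores in [0, 1].

    The numeric scale is an assumption made for illustration only.
    """
    for name, score in (("utility", utility), ("warranty", warranty)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {score}")
    return utility * warranty


# Right feature set, but never up when needed: warranty is zero.
print(service_value(utility=0.9, warranty=0.0))   # 0.0
# Always available, but solves the wrong problem: utility is zero.
print(service_value(utility=0.0, warranty=0.99))  # 0.0
```

With a sum, a strong score on one side could compensate for a zero on the other. With a product, it cannot.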
And value is subjective, contextual, and co-created. The provider alone cannot create value; it's always produced in interaction with the consumer. This shifts the conversation from "we delivered the system" to "did the customer actually get what they needed?"
[Diagram: who does what, for whom]
PESTLE: The external forces you cannot control
ITIL v4 acknowledges that organizations don't operate in a vacuum. PESTLE analysis provides a structured lens for scanning the external environment:
| Factor | What It Covers | IT Relevance |
|---|---|---|
| Political | Government policies, regulations | Data sovereignty laws, public sector mandates |
| Economic | Budget cycles, cost pressures | Cloud cost optimization, license negotiation |
| Social | User expectations, workforce trends | Remote work adoption, digital literacy gaps |
| Technological | Emerging tech, vendor landscape | AI integration, cloud migration, security threats |
| Legal | Compliance requirements | GDPR, HIPAA, sector-specific regulations |
| Environmental | Sustainability, green IT | Data center energy, hardware lifecycle |

PESTLE is not unique to ITIL; it's a general strategic analysis tool. ITIL v4 borrows it because service management decisions are always made inside an organizational and environmental context. We never want to build a technically excellent system that doesn't survive its first compliance audit or budget cut.
The big picture: Service Value System (SVS)
The Service Value System (SVS) is ITIL v4's model for how an organization creates value. Think of it as the engine diagram for your IT service operations.
It takes opportunities and demands as input, runs them through a system of governance, practices, and activities, and outputs value for all stakeholders.
The SVS is composed of five components:
- Guiding Principles — universal recommendations that apply to all decisions
- Governance — the framework for directing and controlling the organization
- Service Value Chain (SVC) — the set of interconnected activities that turn demand into value
- Management Practices — the 34 practices that provide organizational capability
- Continual Improvement — the persistent drive to improve everything
[Diagram: ITIL 4 Service Value System (SVS)]
NOTA BENE: Guiding Principles and Continual Improvement are not steps in a sequence. They're a mindset that applies to all activities, at all times. You don't "do continual improvement" after you're done. You do it while you're doing everything else.
Inside the SVS: the Service Value Chain (SVC)
The Service Value Chain is the operational model at the core of the SVS. It defines 6 activities that can be combined in different ways. The combination for a given service or request is called a value stream.
| Activity | Purpose |
|---|---|
| Plan | Shared understanding of vision, direction, and current state |
| Engage | Understand stakeholder needs; maintain transparency and relationships |
| Design & Transition | Ensure products and services meet stakeholder expectations |
| Obtain / Build | Ensure service components are available when and where needed |
| Deliver & Support | Ensure services are delivered and supported per agreed specifications |
| Improve | Continual improvement across all products, services, and practices |
The SVC is not linear. An incident resolution primarily activates Engage, Deliver & Support, and Improve. A new service launch activates all six. A routine service request might only need Engage and Deliver & Support.
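One way to picture a value stream is as a named subset of the six activities. The sketch below encodes the three examples from the paragraph above; the stream names and the helper `activities_skipped` are hypothetical, and a real value stream would also describe ordering and hand-offs, which this ignores.

```python
from enum import Enum


class Activity(Enum):
    PLAN = "Plan"
    ENGAGE = "Engage"
    DESIGN_TRANSITION = "Design & Transition"
    OBTAIN_BUILD = "Obtain / Build"
    DELIVER_SUPPORT = "Deliver & Support"
    IMPROVE = "Improve"


# Hypothetical value streams, mirroring the three examples in the text.
VALUE_STREAMS: dict[str, frozenset[Activity]] = {
    "incident_resolution": frozenset(
        {Activity.ENGAGE, Activity.DELIVER_SUPPORT, Activity.IMPROVE}
    ),
    "new_service_launch": frozenset(Activity),  # all six activities
    "routine_service_request": frozenset(
        {Activity.ENGAGE, Activity.DELIVER_SUPPORT}
    ),
}


def activities_skipped(stream: str) -> list[str]:
    """Return the activities a value stream does not use, in chain order."""
    used = VALUE_STREAMS[stream]
    return [activity.value for activity in Activity if activity not in used]


print(activities_skipped("routine_service_request"))
# ['Plan', 'Design & Transition', 'Obtain / Build', 'Improve']
```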
[Diagram: Service Value Chain (SVC), six interconnected activities combined differently for each value stream]
The 4 dimensions
Every component of the SVS should be considered from 4 perspectives. Ignoring any of them is one of the most reliable ways to make a project fail.
| Dimension | What It Covers | Classic Failure Mode |
|---|---|---|
| Organizations and People | Org structure, culture, roles, responsibilities | "We deployed the tool but nobody knows how to use it" |
| Information and Technology | Data, knowledge, tools, infrastructure | "The system works but we can't query the data we need" |
| Partners and Suppliers | External providers, contracts, integrations | "Our vendor went down and we had no fallback" |
| Value Streams and Processes | How work flows through the organization | "Everyone's busy but nothing ships" |
These 4 dimensions are interdependent. Migrating to a new monitoring tool (Information & Technology) means retraining people (Organizations & People), renegotiating vendor contracts (Partners & Suppliers), and redesigning on-call workflows (Value Streams & Processes).
The 7 guiding principles
These are universal principles that guide decision-making in any circumstance. They apply regardless of which practices you're implementing.
1. Focus on value
Every activity should link back to value for stakeholders. Before starting anything, ask: who benefits, and how? If you can't answer that, stop.
2. Start where you are
Never ever assume you need to start from scratch. Assess what already works before designing something new. Most organizations already have something useful; the goal is to build on it, not erase it.
3. Progress iteratively with feedback
Don't try to design the perfect solution upfront. Deliver in small increments, gather feedback, adjust. This is the same philosophy behind Agile sprints and DevOps Continuous Delivery.
4. Collaborate and promote visibility
Work done in silos fails. Decisions made without the right people in the room fail. Make work visible (kanban boards, incident dashboards, change calendars) and involve stakeholders early, not at the end.
5. Think and work holistically
Services are systems. A change to one component can have unexpected effects elsewhere. Consider end-to-end impact, not just the piece in front of you. Lol!
6. Keep it simple and practical
If a process step doesn't add value, eliminate it. If a tool creates more overhead than it saves, reconsider. Complexity is the enemy of reliability. Where I come from, we say KISS: keep it simple, stupid.
7. Optimize and automate
First, optimize the process (eliminate waste, reduce friction). Then, automate what remains. Automating a broken process gives you broken results, faster. The order matters.
The 34 management practices
ITIL v4 defines 34 practices designed for performing work or accomplishing an objective. They replace what ITIL v3 called "processes" (the shift reflects that a practice includes culture, tools, and knowledge, not just the steps).
They're organized into 3 categories:
| Category | Count | Examples |
|---|---|---|
| General management | 14 | Risk management, knowledge management, project management, continual improvement, organizational change management |
| Service management | 17 | Incident management, problem management, change enablement, service desk, service level management |
| Technical management | 3 | Deployment management, infrastructure & platform management, software development & management |
The full list of 34 ITIL 4 practices exists for reference. No, you do not need to apply all 34 ITIL v4 practices. ITIL v4 is designed to be adopted and adapted. The core philosophy is to use only the practices that address your organization's specific needs, pain points, and goals.
When you start, you need to identify your top 3 pain points (e.g., too many incidents, slow changes) to pick the right practices to implement first.
Here, we are almost starting from zero and we are already overwhelmed to the max. I would like to suggest 4 practices that, if we don't have them, we're operating blind:
- IT asset management → What do we actually own?
- Service request management → What are our users asking for?
- Incident management → What is broken right now?
- Change enablement → What is changing and why?
The 4 together cover the 4 basic operational questions any IT team needs answered at any moment. Everything else can be phased in.
Incident management
Incident management is the practice responsible for managing the lifecycle of all incidents: unplanned interruptions or reductions in the quality of a service. Before we go further, let's nail the vocabulary:
| Term | Definition |
|---|---|
| Error | A flaw in a component that could cause incidents |
| Incident | An unplanned interruption or reduction in service quality |
| Problem | The cause (known or unknown) of one or more incidents |
| Known error | A problem that has been analyzed but not yet resolved, typically with a documented workaround |
| Root Cause | The fundamental reason behind a problem |
These are distinct. An incident is what you respond to at 2am. A problem is what you investigate to prevent the next one. Conflating them is how teams fix the same incident repeatedly without ever addressing what's actually causing it.
The 7-Step incident lifecycle
| Step | Action |
|---|---|
| 1 | Identification: monitoring alert, user report, or support ticket |
| 2 | Logging: create a full record (timestamp, description, affected service, reporter) |
| 3 | Categorization: what type of incident? which service/component? |
| 4 | Prioritization: how urgent? how impactful? |
| 5 | Diagnosis: initial investigation, probable cause |
| 6 | Resolution: fix applied, service restored |
| 7 | Closure: confirm resolution, document lessons, update knowledge base |
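The lifecycle above can be sketched as an ordered enumeration. This assumes a strictly linear flow, which is a simplification on my part: in practice an incident can bounce back, for example from diagnosis to re-categorization. The names `IncidentStep` and `next_step` are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class IncidentStep(IntEnum):
    IDENTIFICATION = 1
    LOGGING = 2
    CATEGORIZATION = 3
    PRIORITIZATION = 4
    DIAGNOSIS = 5
    RESOLUTION = 6
    CLOSURE = 7


def next_step(current: IncidentStep) -> Optional[IncidentStep]:
    """Return the step that follows `current`, or None once the incident is closed."""
    if current is IncidentStep.CLOSURE:
        return None
    return IncidentStep(current + 1)


print(next_step(IncidentStep.LOGGING).name)  # CATEGORIZATION
```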
Incident Priority Matrix
Priority is determined by Impact (how many users/processes are affected) and Urgency (how quickly the business needs resolution). The matrix:
| | High Urgency | Medium Urgency | Low Urgency |
|---|---|---|---|
| High Impact | P1 | P2 | P3 |
| Medium Impact | P2 | P3 | P3 |
| Low Impact | P3 | P3 | P4 |
| Trivial Impact | P4 | P4 | P4 |
Priority codes (not to be confused with Biosafety Levels):
- P1 (Critical): Production down, major business impact. Immediate response, all hands.
- P2 (High): Significant service degradation. Fast response, escalation ready.
- P3 (Medium): Partial impact, workaround available. Normal SLA response.
- P4 (Low): Minimal impact, cosmetic or edge case. Can be scheduled.
NB: BSL (Biosafety Level) is the standard, modern terminology, while P (P1, P2, P3, P4) is an older designation standing for Pathogen/Protection level.
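The priority matrix translates directly into a lookup table. A minimal sketch, with hypothetical names (`PRIORITY_MATRIX`, `incident_priority`); the values are copied from the matrix above, nothing more.

```python
# Outer key: impact. Inner key: urgency. Values copied from the matrix above.
PRIORITY_MATRIX = {
    "high":    {"high": "P1", "medium": "P2", "low": "P3"},
    "medium":  {"high": "P2", "medium": "P3", "low": "P3"},
    "low":     {"high": "P3", "medium": "P3", "low": "P4"},
    "trivial": {"high": "P4", "medium": "P4", "low": "P4"},
}


def incident_priority(impact: str, urgency: str) -> str:
    """Look up the priority code for an impact/urgency pair (case-insensitive)."""
    try:
        return PRIORITY_MATRIX[impact.lower()][urgency.lower()]
    except KeyError:
        raise ValueError(
            f"unknown impact/urgency pair: {impact!r}, {urgency!r}"
        ) from None


print(incident_priority("High", "High"))    # P1
print(incident_priority("Medium", "High"))  # P2
print(incident_priority("Low", "Low"))      # P4
```

Keeping the matrix as data rather than nested `if` statements makes it easy to review against the table and to change when the organization redefines its thresholds.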
Escalation
Escalation is the process, within incident and service request management, of transferring issues to higher levels of authority or expertise when they cannot be resolved through normal front-line support procedures.
When should we escalate? When an incident exceeds its SLA timeframe, requires more expertise, or needs more management authority.
There are 2 types:
- Functional escalation: pass to someone with more technical expertise
- Hierarchical escalation: involve management when impact warrants it or when SLA breach is imminent
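The two escalation types are independent: an incident can need one, both, or neither. A rough sketch of that decision, where the 80% "breach is imminent" threshold and all names are my own assumptions rather than anything ITIL prescribes:

```python
from enum import Enum


class Escalation(Enum):
    FUNCTIONAL = "functional"
    HIERARCHICAL = "hierarchical"


def escalations_for(
    sla_consumed: float,
    needs_more_expertise: bool,
    needs_more_authority: bool,
    breach_warning_ratio: float = 0.8,  # assumed threshold for "imminent"
) -> set[Escalation]:
    """Return the escalation types an incident currently calls for.

    `sla_consumed` is the fraction of the SLA resolution time already used
    (1.0 means the SLA is breached).
    """
    result: set[Escalation] = set()
    if needs_more_expertise:
        result.add(Escalation.FUNCTIONAL)
    if needs_more_authority or sla_consumed >= breach_warning_ratio:
        result.add(Escalation.HIERARCHICAL)
    return result


# 90% of the SLA window is gone and front-line support is stuck.
print(sorted(e.value for e in escalations_for(0.9, True, False)))
# ['functional', 'hierarchical']
```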
A crisis is when an incident has business-wide consequences that require executive involvement. Knowing when a P1 becomes a crisis (and who to call) should be documented before the crisis, not during it.
Change enablement
Change Enablement manages the lifecycle of all changes to maximize the probability of success and minimize disruption.
ITIL v4 recognizes 3 change types:
| Type | Risk | Authorization | Examples |
|---|---|---|---|
| Standard | Low, well-understood | Pre-authorized | Antivirus update, user account creation |
| Normal | Variable | Change authority / CAB review | Application deployment, infrastructure change |
| Emergency | High urgency | Emergency authority (fast-track) | Critical security patch, production hotfix |
The CAB (Change Advisory Board)
The CAB is a group that reviews and authorizes significant normal changes. Typical members: IT operations, security, business stakeholders, and the change owner. The CAB doesn't exist to slow things down; it exists to catch problems before they become incidents. It is the organizational mechanism that prevents the finance team from being unable to run payroll on a Friday.
A proper CAB dossier for a normal change includes:
- Impact: which services, users, and systems are affected?
- Risk: what's the probability of failure, and what's the impact if it fails?
- Rollback plan: how do we revert cleanly if the change fails? (Tested rollback, not theoretical)
- Post-implementation review: how do we verify success after deployment?
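The four dossier items above work well as a completeness check before a change is allowed onto the CAB agenda. A minimal sketch, with hypothetical class and function names; a real ITSM tool would model this with richer fields than free text.

```python
from dataclasses import dataclass, fields


@dataclass
class CabDossier:
    """The four sections listed above, as free text. Empty means missing."""
    impact: str = ""
    risk: str = ""
    rollback_plan: str = ""
    post_implementation_review: str = ""


def missing_sections(dossier: CabDossier) -> list[str]:
    """Return the names of the sections that are empty or whitespace-only."""
    return [
        f.name for f in fields(dossier)
        if not getattr(dossier, f.name).strip()
    ]


draft = CabDossier(impact="LIMS users, 4 departments", risk="Low / Critical")
print(missing_sections(draft))
# ['rollback_plan', 'post_implementation_review']
```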
Continual improvement
Continual improvement is both a guiding principle and a dedicated practice in ITIL 4. It ensures the organization continuously improves its services, practices, and the SVS itself.
The Continual Improvement Register (CIR) is the operational tool: a log of all improvement ideas, their status, priority, and outcomes. Think of it as a backlog specifically for operational improvements, not features, not incidents, but how we work.
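A register like this does not need a dedicated tool to get started. The sketch below shows one possible shape for it; the field names, the status values, and the "1 is most urgent" convention are assumptions of mine, not an ITIL-defined schema.

```python
from dataclasses import dataclass


@dataclass
class ImprovementItem:
    idea: str
    priority: int              # assumed convention: 1 = most urgent
    status: str = "proposed"   # assumed states: proposed, in_progress, done
    outcome: str = ""


def open_items(register: list[ImprovementItem]) -> list[ImprovementItem]:
    """Return the items that are not done yet, most urgent first."""
    return sorted(
        (item for item in register if item.status != "done"),
        key=lambda item: item.priority,
    )


register = [
    ImprovementItem("Document the P1 call tree", priority=2),
    ImprovementItem("Test restore procedure quarterly", priority=1, status="in_progress"),
    ImprovementItem("Rename ticket categories", priority=3, status="done",
                    outcome="Taxonomy updated"),
]
print([item.idea for item in open_items(register)])
# ['Test restore procedure quarterly', 'Document the P1 call tree']
```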
The 7-Step improvement model
This model is cyclical by design. Reaching step 7 doesn't mean you're done; it means you start over with a new vision.
| Step | Question to Answer |
|---|---|
| 1 | What is the vision? Align the improvement to business strategy |
| 2 | Where are we now? Baseline assessment: metrics, maturity, current state |
| 3 | Where do we want to be? Define target state and measurable success criteria |
| 4 | How do we get there? Build the improvement plan |
| 5 | Take action. Execute iteratively |
| 6 | Did we get there? Measure against success criteria |
| 7 | How do we keep the momentum? Embed the change, celebrate wins, start the cycle again |
Measuring improvement: SLA vs XLA and beyond
| Metric | What It Measures |
|---|---|
| KPI (Key Performance Indicator) | Technical performance against defined targets |
| SLA (Service Level Agreement) | Contractual service commitments (uptime, response times) |
| OKR (Objective and Key Result) | Strategic goal achievement across a period |
| XLA (Experience Level Agreement) | User satisfaction and actual experience quality |
| CIR | Improvement pipeline health and throughput |
SLA vs XLA deserves its own paragraph. An SLA measures what the system does (was uptime ≥ 99.9%?). An XLA measures how the user felt about it. You can meet every SLA and still have furious users, because the system was slow even within the threshold, or the error messages were unhelpful, or the support experience was frustrating. XLAs align IT metrics with actual business outcomes, not just technical compliance.
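To make the SLA side concrete: an availability target is really a downtime budget. A small sketch, assuming a 30-day month and a hypothetical helper name, `allowed_downtime_minutes`:

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Downtime budget, in minutes, implied by an availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# 99.9% over a 30-day month leaves roughly 43 minutes of downtime.
print(round(allowed_downtime_minutes(99.9), 1))  # 43.2
```

Those 43 minutes say nothing about when the downtime happened or how it felt to the people affected, which is exactly the gap an XLA is meant to cover.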
3 Axes of improvement
- Strategic: organization-wide, long-term (e.g., "reach 95% automated change deployment by end of year")
- Tactical: team/process level, medium-term (e.g., "reduce MTTR on P1 incidents by 40% this quarter")
- Operational: day-to-day (e.g., "update the incident categorization taxonomy this sprint")
Governance and Risk Management
Governance sits at the top of the SVS. It provides direction and ensures the organization stays aligned with its objectives and obligations. Three functions:
- Direct: Set strategy, policies, and organizational objectives
- Evaluate: Assess organizational performance, risks, and compliance
- Monitor: Track adherence to policies and verify outcomes
For service management, governance ensures that the SVS is accountable: decisions are traceable, risks are understood, and exceptions are handled systematically.
Risk Management in ITIL follows a straightforward formula:
Risk = Probability × Impact
Three responses:
- Mitigate: reduce probability or impact (implement redundancy, add monitoring, require peer review)
- Transfer: shift the risk to another party (insurance, contractual SLAs with vendors, outsourcing)
- Accept: acknowledge the risk and decide to live with it (for low probability, low impact scenarios)
Risk isn't inherently bad. The problem is unacknowledged risk, for example: changes deployed without a risk assessment, dependencies unknown until they fail, vulnerabilities known but not documented. Risk management doesn't eliminate uncertainty; it makes it visible and traceable.
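The formula is usually applied on qualitative scales rather than true probabilities. Here is one possible scoring sketch. The numeric weights and the thresholds are assumptions I picked so that the result agrees with the ratings in the case study's risk table later in this post; your organization's scale will differ.

```python
# Assumed weights: "critical" is deliberately weighted above a plain 4,
# so that a low-probability critical risk still rates as high.
PROBABILITY = {"low": 1, "medium": 2, "high": 3}
IMPACT = {"low": 1, "medium": 2, "high": 3, "critical": 5}


def risk_rating(probability: str, impact: str) -> str:
    """Rate a risk as low, medium, or high from Probability x Impact."""
    score = PROBABILITY[probability.lower()] * IMPACT[impact.lower()]
    if score <= 2:
        return "low"
    if score <= 4:
        return "medium"
    return "high"


print(risk_rating("Low", "Critical"))   # high
print(risk_rating("Medium", "Medium"))  # medium
print(risk_rating("Medium", "Low"))     # low
```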
ITIL 4 ↔ Agile, DevOps, Lean
One of the major improvements in ITIL v4 over its predecessors is the explicit integration with modern practices. The framework doesn't compete with DevOps or Agile. It provides the governance layer that makes them sustainable at scale.
| ITIL 4 Concept | Agile Equivalent | DevOps Equivalent | Lean Equivalent |
|---|---|---|---|
| Service Value System | Product Delivery Flow | CI/CD Pipeline | Value Stream |
| Continual Improvement | Sprint Retrospective | Blameless Post-mortem | Kaizen |
| Change Enablement | Sprint Planning | Deployment Pipeline | Flow Management |
| Incident Management | Bug Sprint | Incident Response | A3 Problem Solving |
| Service Request | Backlog Item | Work Item | Pull System |
| SLA / XLA | Definition of Done | SLO / SLI | Quality Standards |
| CAB Review | Sprint Review | Deployment Gate | Go/No-Go Decision |
| Problem Management | Root Cause Analysis | Post-mortem | 5 Whys |
| Service Desk | Product Owner (ops) | L1/L2 Support | Andon Cord |
The official ITIL 4 and DevOps resources make the integration case explicitly. The short version: ITIL handles governance and accountability; DevOps handles speed and flow. You need both. Speed without governance creates chaos. Governance without speed creates bureaucracy. ITIL v4 was redesigned precisely to not be the bottleneck.
Case study
Here is what ITIL v4 looks like applied to a real IT direction inside a biomedical research center in Senegal. Any resemblance to real persons, living or dead, is purely coincidental.
Part 1 — Service catalog
The official list of what the IT direction provides, to whom, under what terms. It is the operational contract between the IT team and the rest of the organization. Without it, users don't know what to ask for, IT doesn't know what they're accountable for, and incidents pile up in a shared mailbox nobody consistently monitors.
Below: 5 services covering the majority of what a research institute IT direction actually handles day-to-day.
| Service | Description | Channel | SLA — Response / Resolution | Responsible |
|---|---|---|---|---|
| User account provisioning | Creation, modification, or deactivation of accounts (Active Directory, email, VPN, application access). Covers onboarding and offboarding. | ITSM ticketing portal / HR workflow trigger | 4h / 1 business day | Sysadmin — Identity & Access team |
| Workstation support | Hardware diagnosis and repair, OS reinstall, peripheral setup (printers, scanners, external drives). On-site or remote via support session. | Ticketing portal — phone for P1/P2 only | P3: 4h / 2 days — P4: next business day / 5 days | IT Support technician |
| Application support | Functional support for lab and corporate applications: LIMS, ERP (finance, HR), syndromic surveillance platform, email (M365). Includes access requests and usage questions. | Ticketing portal, categorized by application | P2: 2h / 4h — P3: 4h / 1 day — P4: 1 day / 3 days | Application owner + IT Operations |
| VPN & remote access | Setup, configuration, and troubleshooting of remote access (VPN client, MFA token). Covers new setups for approved staff and incident resolution for existing connections. | Ticketing portal — escalation via phone if blocking remote work | New setup: 1 day / 2 days — Incident: 2h / 4h | Network team |
| Backup & data recovery | Scheduled backups of research data, institutional databases, and user file shares. Restore requests for accidental deletions or corruption. Covers file-level and full-system restores. | Ticketing portal — critical restores via phone (P1/P2) | P1 restore: 1h acknowledged / 4h RTO — P3 file restore: 4h / 1 day | Sysadmin — Infrastructure team |
Part 2 — RFC & CAB Dossier: LIMS major version upgrade
A Request for Change (RFC) is the formal input to the Change Enablement practice. For a Normal or a Major change, it triggers a full CAB review before any action is taken. Here is what a complete dossier looks like.
Context: Institut Patrick de Kinshasa uses a LIMS (Laboratory Information Management System) to track samples, tests, and results across all its labs. The vendor has released version 4.0 and will end support for the current 3.x line at the end of the year. The upgrade is not optional, but it touches sample data, active test runs, and integrations with the syndromic surveillance platform.
RFC identification
| Field | Value |
|---|---|
| RFC Number | RFC-2026-014 |
| Title | LIMS major version upgrade: v3.8 → v4.0 |
| Change type | Normal — Major |
| Requestor | Head of IT Operations |
| Application owner | Director of Laboratory Services |
| Submission date | 2026-05-03 |
| Requested deployment window | 2026-05-23, Saturday 22:00 → Sunday 06:00 (WAT) |
| CAB review date | 2026-05-13 |
Justification
The current LIMS v3.x reaches vendor end-of-life on 2026-12-31. After that date: no security patches, no bug fixes, no vendor support. Version 4.0 also introduces a native REST API required by the next release of the syndromic surveillance platform. This change is driven by compliance (security posture) and technical dependency (surveillance platform roadmap) simultaneously.
Impact assessment
| Dimension | Details |
|---|---|
| Systems affected | LIMS application servers (lims-prod-01, lims-prod-02), LIMS PostgreSQL database (lims-db-01), integration bridge with surveillance platform |
| Users affected | ~65 lab technicians and researchers across 4 departments: Virology, Bacteriology, Epidemiology, Quality |
| Business processes affected | Sample reception, test assignment, result entry, result validation, report generation, QC workflows |
| Downstream services | Surveillance platform reads LIMS results via API — will operate in read-only degraded mode during the window |
| Duration of impact | LIMS unavailable ~8 hours; all impact contained within the Saturday night window |
| Data at risk | ~120,000 sample records, 3 years of test history. No patient PII in LIMS (samples referenced by anonymous ID). |
Risk assessment
| Risk | Probability | Impact | Score | Mitigation |
|---|---|---|---|---|
| DB migration script fails mid-run, leaving schema in inconsistent state | Low | Critical | 🔴 High | Full DB snapshot before migration; script tested 3× on staging with a copy of production data |
| v4.0 breaks custom report templates (Quality dept uses non-standard format) | Medium | Medium | 🟡 Medium | All 12 templates validated on staging; Quality dept sign-off obtained before CAB submission |
| Surveillance platform integration fails after API change | Low | High | 🟡 Medium | Integration tested on staging against v4.0 API; surveillance team on standby Sunday 00:30 |
| Deployment overruns window, runs into Monday lab opening at 07:30 | Low | High | 🟡 Medium | Hard rollback decision point at 04:00 — if not complete by then, rollback regardless of progress |
| UI changes slow down lab technicians Monday morning | Medium | Low | 🟢 Low | 30-min walkthrough session Monday 08:00; quick reference card distributed Friday |
Implementation plan
| Time (WAT) | Step | Responsible | Checkpoint |
|---|---|---|---|
| Sat 21:30 | Pre-deployment checklist: verify backup completed, confirm no active test runs, notify on-call lab manager | IT Ops lead | ✓ required to proceed |
| Sat 22:00 | Put LIMS in maintenance mode (users see scheduled maintenance banner) | Sysadmin | |
| Sat 22:05 | Full PostgreSQL dump to backup server + verify checksum | DBA | ✓ required to proceed |
| Sat 22:30 | Stop LIMS application services on both nodes | Sysadmin | |
| Sat 22:35 | Run database schema migration script (estimated: 45 min) | DBA | ✓ required to proceed |
| Sat 23:20 | Deploy v4.0 application package on lims-prod-01 (primary node) | Sysadmin | |
| Sat 23:40 | Run automated smoke test suite against lims-prod-01 | IT Ops lead | ✓ required to proceed |
| Sun 00:00 | Deploy v4.0 on lims-prod-02, verify cluster synchronization | Sysadmin | |
| Sun 00:30 | Test surveillance platform integration against new v4.0 API | Surveillance team | ✓ required to proceed |
| Sun 01:00 | Manual validation by on-call lab manager: create test sample, enter result, generate report | Lab manager | ✓ required to proceed |
| Sun 01:30 | Remove maintenance mode, monitor application logs for 30 min | IT Ops lead | |
| Sun 02:00 | Deployment declared successful; confirmation sent to all stakeholders | IT Ops lead | |
Hard rollback decision point: Sunday 04:00. If not complete and validated by then, rollback is initiated regardless of progress. No exceptions — the lab opens at 07:30.
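The "required to proceed" checkpoints in the plan behave like gates: a failure on a gated step stops the run, while a failure elsewhere is recorded and the run continues. A minimal sketch of that logic, with hypothetical names and with lambdas standing in for the real commands and checks. It deliberately ignores the 30-minute grace period and the 04:00 deadline to stay short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], bool]         # returns True on success
    required_checkpoint: bool = False  # "required to proceed" in the plan


def run_plan(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order and stop at the first failed required checkpoint.

    Returns (completed, names_of_steps_executed).
    """
    executed: list[str] = []
    for step in steps:
        succeeded = step.action()
        executed.append(step.name)
        if not succeeded and step.required_checkpoint:
            return False, executed
    return True, executed


# Hypothetical run: the schema migration fails, so nothing after it executes.
plan = [
    Step("maintenance mode on", lambda: True),
    Step("database dump + checksum", lambda: True, required_checkpoint=True),
    Step("schema migration", lambda: False, required_checkpoint=True),
    Step("deploy v4.0 on primary", lambda: True),
]
completed, executed = run_plan(plan)
print(completed)  # False
print(executed)   # ['maintenance mode on', 'database dump + checksum', 'schema migration']
```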
Rollback procedure
Triggered if: any required checkpoint fails and cannot be resolved within 30 minutes, OR the 04:00 hard deadline is reached.
- Stop all LIMS application services on both nodes
- Drop the migrated database:
  dropdb lims_production
- Recreate it empty (pg_restore -d needs an existing target database):
  createdb lims_production
- Restore from the pre-migration dump:
  pg_restore -d lims_production /backup/lims-premig-20260523.dump
- Verify integrity: compare row counts for the samples, tests, and results tables against the pre-migration snapshot
- Redeploy the v3.8 package from artifact registry on both nodes
- Restart application services, verify cluster health
- Remove maintenance mode, run the same smoke test suite
- Notify all stakeholders: rollback completed, root cause analysis scheduled Monday
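The integrity check in the procedure above comes down to comparing two sets of row counts. Here is a small Python sketch of that comparison, assuming the counts were collected beforehand (for instance with a `SELECT count(*)` per table). The function name and the numbers are placeholders, not real data.

```python
from typing import Dict, Optional, Tuple


def row_count_mismatches(
    before: Dict[str, int], after: Dict[str, int]
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {table: (expected, actual)} for every table whose count differs.

    A table missing from `after` is reported with an actual value of None.
    """
    return {
        table: (expected, after.get(table))
        for table, expected in before.items()
        if after.get(table) != expected
    }


# Placeholder counts for the three tables named in the procedure
snapshot = {"samples": 120_000, "tests": 450_000, "results": 1_300_000}
restored = {"samples": 120_000, "tests": 450_000, "results": 1_299_998}

print(row_count_mismatches(snapshot, restored))
```

An empty result means the restore matches the snapshot on those tables. Row counts will not catch altered values, so this is a sanity check rather than proof of integrity.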
Estimated rollback duration: 45 minutes. A rollback started at the 04:00 decision point should finish around 04:45, leaving about 2 hours 45 minutes of margin before the lab opens at 07:30.
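The two rollback triggers and the timing figures can be written down as a few lines of Python, which is a handy way to sanity-check the margin. This is only a sketch: the constant and function names are assumptions, and the dates come from the communication plan.

```python
from datetime import datetime, timedelta
from typing import Optional

HARD_DEADLINE = datetime(2026, 5, 24, 4, 0)   # Sunday 04:00 decision point
LAB_OPENS = datetime(2026, 5, 24, 7, 30)      # Sunday 07:30
RESOLUTION_GRACE = timedelta(minutes=30)      # time allowed to fix a failed checkpoint
ROLLBACK_ESTIMATE = timedelta(minutes=45)     # estimated rollback duration


def should_rollback(now: datetime, checkpoint_failed_at: Optional[datetime]) -> bool:
    """True once either trigger from the rollback procedure is met.

    `checkpoint_failed_at` is when a still-unresolved required checkpoint
    failed, or None if nothing is currently failing.
    """
    if now >= HARD_DEADLINE:
        return True
    if checkpoint_failed_at is not None:
        return now - checkpoint_failed_at >= RESOLUTION_GRACE
    return False


def margin_after_rollback(started_at: datetime) -> timedelta:
    """Time left before the lab opens if a rollback starts at `started_at`."""
    return LAB_OPENS - (started_at + ROLLBACK_ESTIMATE)


# Worst case: rollback starts exactly at the hard deadline
print(margin_after_rollback(HARD_DEADLINE))
```

Passing `now` in as a parameter, instead of reading the clock inside the function, keeps the rule easy to test with any timestamp.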
Communication plan
| Audience | Message | When | Channel |
|---|---|---|---|
| All LIMS users | Scheduled maintenance — LIMS unavailable Saturday 22:00 → Sunday 06:00 | Friday 2026-05-22, 14:00 | Email + in-app banner |
| Lab managers | Upgrade rationale, v4.0 UI changes, quick reference card attached | Friday 2026-05-22, 14:00 | |
| Surveillance team | API changes summary, integration test results, standby request for Sunday 00:30 | Thursday 2026-05-21 | Direct message + ticket |
| All stakeholders | Outcome (success or rollback) + next steps | Sunday 2026-05-24, by 06:00 | |
Post-implementation review
Scheduled: Monday 2026-05-25, 10:00 / 30 minutes. Attendees: IT Ops lead, application owner, one representative per affected lab department.
Agenda:
- Did deployment complete within window? If not: what caused the overrun?
- Were all required checkpoints met? Document any deviations or waivers.
- User-reported issues since go-live (collected Monday morning, before the review)
- Was rollback triggered? If yes: full root cause analysis before next major change
- Were Application Support SLAs met during the post-go-live monitoring period?
- Update the LIMS runbook with any new procedures discovered during the upgrade
- Close RFC-2026-014 in the ITSM tool; status: Successful or Successful with deviations
The output feeds directly into the Continual Improvement Register: anything harder than expected, any gap in the rollback procedure, and any user adoption friction is logged as an improvement item before the next major change.
Tooling
No tool enforces ITIL by itself. The process has to exist first; the tool just reduces friction. That said, the right tooling makes the difference between a practice that lives on paper and one that teams actually follow.
A word on scope: ServiceNow is the de facto enterprise standard and covers nearly everything in the table below. It is also expensive, slow to configure, and overkill for most organizations below a few hundred IT staff.
The open source row below deserves serious consideration. GLPI in particular covers incident, change, CMDB, and knowledge management in a single self-hosted package.
| | ITSM Platform | Incident Detection & Alerting | Incident Response | Knowledge Management | Change Tracking | Continual Improvement |
|---|---|---|---|---|---|---|
| Suggested tools | ServiceNow, Jira Service Management, Freshservice | PagerDuty, OpsGenie, incident.io | FireHydrant, Rootly | Confluence, Notion, GitBook | Jira, LinearB | Sleuth, Datadog, LinearB |
| Free & open source | GLPI, iTop, Znuny | Prometheus + Alertmanager, Zabbix | — | Wiki.js, Outline, BookStack | Plane, Gitea / Forgejo | Four Keys (Google) |
ITIL v5
During the training, we were told that ITIL v5 was launched in February 2026. I tried watching this video to get an overview of what ITIL v5 is. The only thing I understood was: "The new pace of changes today driven by cloud, AI, digital products, experience-focused services, demand a new way of thinking." ITIL v5 answers that need. Phew!
Conclusion
ITIL v4 is a framework providing a vocabulary and a structure for conversations your organization was probably already having informally: Who decides what gets changed? How do we prioritize when everything is on fire? How do we know we're actually improving?
The most valuable takeaways from this training:
- Value is co-created, not delivered. You can deploy perfect infrastructure and still fail if users can't get value from it.
- SLA ≠ user happiness. XLAs measure what actually matters to the business.
- ITIL v4 and DevOps are complementary. ITIL provides the accountability layer that makes continuous delivery safe at organizational scale.
- Adopt and adapt. Apply the practices that solve real problems, and leave the rest on the shelf until you need them.
If your systems go down and nobody knows who to call, what changed yesterday, or how long things have been broken, that's not a technical problem. That's a service management problem. ITIL v4 gives you the vocabulary to fix it.
Congratulations, you made it to the end. If you enjoyed this, please like, share & subscriiiibe. You might also like my amazing post about AWS fundamentals or how I'm helping save lives in West Africa using free software.
This article is a summary of a three-day training course. It's impossible to explore every aspect of ITIL v4 in such a short post. Here are some useful links to learn more:
- ITIL 4 official certification path
- ITIL 4 Guiding Principles explained
- 34 ITIL 4 management practices
- What are XLAs and how do you use them?
- The Phoenix Project
- Biosafety Levels
Even more: