OR-501 | WK | Builder | Discover | 4 hrs | Refreshed 2025-09-21
GPT-5 Preparedness | Claude 4 Constitutional | Grok Realtime | Gemini Perception | Edge Ready

Reliability Baseline

T5 - Operations & Reliability

Instrument, monitor, and govern AI workflows in production.

Key outcomes

  • Assess current readiness and risks
  • Define SLOs and guardrail metrics
  • Prioritize operations backlog

Deliverables

  • Reliability report
  • SLO map
  • Operations backlog

Prerequisites

  • AE-104

Evaluation signals

  • OPS-BASE-001
  • SLO-MAP-001

Persona fit

  • Delivery Lead
  • Agent Engineer
  • Risk & Governance

Assistant orchestration

Scout

Agent

Research horizons, regulation updates, and pattern watchlists.

  • Refresh critical intel within 24 hours of change
  • Maintain 95% citation accuracy
  • Flag module freshness risks automatically

Coach

Agent

Pair with learners during modules, labs, and retrospectives.

  • Median response under 2 seconds
  • Satisfaction above 4.6/5
  • Escalate risky experiments within 10 minutes

Critic

Agent

Guardrails, evaluations, and red-team simulations.

  • Detect 98% of evaluation anomalies
  • Zero unlogged high-severity incidents
  • Attach control evidence to every flagged issue

Archivist

Agent

Evidence locker, credential manifests, and knowledge graph links.

  • Tag 100% of deliverables with owners and signals
  • Keep schema drift under 1%
  • Generate credential payloads automatically

Companion

Agent

Health, pacing, and personalised nudges across squads.

  • On-time nudges for 90% of milestones
  • Keep burnout false positives below 5%
  • Publish weekly sponsor-ready progress snapshots

Navigator

Agent

CTA instrumentation, sponsor digest composition, and mastery guardrails.

  • Cover 95% of persona CTAs every sprint
  • Generate sponsor digest drafts within 5 minutes of module completion
  • Hold mastery drift within one tier

Micro lessons

Signal Sweep

25 min

Objective: Collect reliability, incident, and satisfaction signals to benchmark the current state.

Activities

  • Pull incident metrics
  • Interview support teams
  • Review evaluation results

Knowledge checks

  • What is the current MTTR?
  • Which guardrail breaches happened last quarter?
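A minimal sketch of how the MTTR knowledge check could be answered from pulled incident metrics. It assumes MTTR here means mean time to restore, measured from detection to resolution; the incident timestamps and the function name `mean_time_to_restore` are hypothetical and not part of the module.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (detected_at, resolved_at) pairs.
incidents = [
    (datetime(2025, 9, 1, 10, 0), datetime(2025, 9, 1, 10, 45)),   # 45 min
    (datetime(2025, 9, 8, 14, 30), datetime(2025, 9, 8, 16, 0)),   # 90 min
    (datetime(2025, 9, 15, 9, 0), datetime(2025, 9, 15, 9, 15)),   # 15 min
]


def mean_time_to_restore(records):
    """Return the mean duration between detection and resolution."""
    if not records:
        raise ValueError("no incidents to average")
    total = sum(
        (resolved - detected for detected, resolved in records),
        timedelta(),  # start value so that timedeltas can be summed
    )
    return total / len(records)


mttr = mean_time_to_restore(incidents)
print(f"MTTR: {mttr}")
```

If the team defines MTTR from a different start point (for example, from the first customer report), only the first element of each pair changes.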

SLO Definition Workshop

30 min

Objective: Translate business outcomes into service level objectives and error budgets.

Activities

  • Map user journeys
  • Draft SLO statements
  • Agree on error budget policy

Knowledge checks

  • Which journey has the tightest SLO?
  • What triggers an error budget alert?
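To make the error budget idea concrete, the sketch below derives a budget from an availability-style SLO and checks how much of it has been consumed. The 99.9% target, the request counts, and the 50% alert threshold are illustrative assumptions, not values prescribed by this module; the function names are hypothetical.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_requests


def budget_consumed(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget already spent (1.0 means fully spent)."""
    return failed_requests / error_budget(slo_target, total_requests)


def should_alert(consumed: float, threshold: float = 0.5) -> bool:
    """Assumed policy: alert once the consumed fraction reaches the threshold."""
    return consumed >= threshold


# Example: a 99.9% SLO over 100,000 requests tolerates about 100 failures.
consumed = budget_consumed(slo_target=0.999, total_requests=100_000, failed_requests=60)
print(f"Budget consumed: {consumed:.0%}, alert: {should_alert(consumed)}")
```

The threshold is the part the workshop needs to agree on: a lower value alerts earlier but more often, so it belongs in the error budget policy rather than in code defaults.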

Ops Backlog Prioritisation

20 min

Objective: Rank reliability investments using impact, urgency, and effort.

Activities

  • Score backlog items
  • Align owners and timing
  • Log decisions for sponsors

Knowledge checks

  • Which item mitigates the highest risk?
  • Who owns the next action?
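One possible way to score backlog items on impact, urgency, and effort is sketched below. The formula (impact times urgency, divided by effort), the 1 to 5 scales, and the sample items are assumptions for illustration; the module does not specify a scoring method.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    impact: int   # assumed scale: 1 (low) to 5 (high)
    urgency: int  # assumed scale: 1 (low) to 5 (high)
    effort: int   # assumed scale: 1 (small) to 5 (large)

    def __post_init__(self):
        if self.effort <= 0:
            raise ValueError("effort must be positive")

    def score(self) -> float:
        """Higher impact and urgency raise the score; higher effort lowers it."""
        return self.impact * self.urgency / self.effort


# Hypothetical backlog entries.
items = [
    BacklogItem("Migrate evaluation dashboards", impact=4, urgency=2, effort=4),
    BacklogItem("Add guardrail breach alerting", impact=5, urgency=4, effort=2),
    BacklogItem("Write on-call runbook", impact=3, urgency=3, effort=1),
]

ranked = sorted(items, key=lambda item: item.score(), reverse=True)
for item in ranked:
    print(f"{item.score():5.1f}  {item.name}")
```

A multiplicative score lets a single low rating pull an item down sharply; a weighted sum is a reasonable alternative if the team wants the factors to trade off more evenly.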

Knowledge points

Reliability Readiness Checklist

Assess monitoring, guardrails, on-call, and retros against AI operations best practices.

SLO Playbook

Define SLOs, SLIs, SLAs, and escalation policies aligned with business goals.
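The relationship between the three terms can be sketched as follows: the SLI is the measured value, the SLO is the internal target, and the SLA is the external commitment, which is typically looser than the SLO. The class, the thresholds, and the status labels below are hypothetical and only illustrate how an escalation policy might key off them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevel:
    name: str
    slo: float  # internal target, e.g. 0.995 success rate
    sla: float  # external commitment, assumed no stricter than the SLO

    def __post_init__(self):
        if self.sla > self.slo:
            raise ValueError("SLA should not be stricter than the SLO")


def evaluate(level: ServiceLevel, sli: float) -> str:
    """Compare a measured SLI against the SLO and SLA thresholds."""
    if sli >= level.slo:
        return "ok"
    if sli >= level.sla:
        return "slo_breach"  # assumed policy: escalate internally
    return "sla_breach"      # assumed policy: escalate to sponsors


# Hypothetical service level for an AI workflow.
level = ServiceLevel("agent-response-success", slo=0.995, sla=0.99)
for measured in (0.997, 0.992, 0.98):
    print(measured, evaluate(level, measured))
```

Keeping the gap between SLO and SLA explicit gives the team room to react to an internal breach before a contractual one occurs.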

Micro paths featuring this module

Launch | Agent Engineer

Stand up a dependable coding agent that ships quality pull requests.

Day 1 - Frame and scope: AE-101, AE-102
Day 2 - Prototype loop: AE-103, RP-201
Day 3 - Guardrails: AE-104, OR-501
Day 4 - Demo narrative: CC-401
Day 5 - Handoff: CC-404, OR-503
Operate | Delivery Lead

Instrument operations, run drills, and keep sponsors informed.

Day 1 - Baseline: OR-501
Day 2 - Observability: OR-502, OR-503
Day 3 - Runbooks: OR-504
Day 4 - Communications: CC-404
Day 5 - Sponsor digest: LS-603

Credential alignment

This module contributes evidence across multiple credentials. See the credential framework for details.

Primary documentation