
Monitoring Framework

Project: Sable AI Ltd — AI Governance Framework
Stage: Stage 4 — Monitoring & Operational Controls
Status: Draft
Version: v1
Date: 2026-03-01
Assumptions: Built on outline assumptions — not verified against real Sable AI Ltd data


1. Purpose and Scope

This document establishes the operational monitoring framework for Scout, Sable AI Ltd's AI-powered CV screening and candidate shortlisting tool [ASSUMPTION A-002]. It defines the key performance indicators, measurement methods, alert thresholds, and governance processes required to ensure that Scout continues to operate accurately, fairly, and in compliance with the UK GDPR, the Equality Act 2010, and ICO expectations after deployment.

Monitoring is a legal obligation, not an optional operational practice. UK GDPR Article 5(1)(a)–(e) requires that personal data processing remains lawful, fair, accurate, and limited throughout the processing lifecycle. The ICO's AI in Recruitment Outcomes Report (November 2024) confirms that the ICO actively assesses whether AI providers have "regularly monitored accuracy and bias and swiftly addressed issues throughout the AI lifecycle." The DSIT Responsible AI in Recruitment Guide (March 2024) further confirms that continuous monitoring is required to detect model drift, errors, and bias emerging during live operation.

Scope: This framework covers all processing of candidate personal data by Scout, including CV ingestion, Anthropic Claude API processing [ASSUMPTION A-005], structured output generation, and human reviewer interface interactions [ASSUMPTION A-007]. It applies to Sable AI Ltd as data processor and to Sable AI Ltd's customers who are controllers.


2. Regulatory Basis

Obligation | Source | Relevance to Scout
Ongoing accuracy and fairness | UK GDPR Art. 5(1)(a)–(d); Art. 25 | Accuracy of shortlisting outputs; fairness to candidates across processing lifecycle
Data minimisation in monitoring | UK GDPR Art. 5(1)(c) | Monitoring methods must not collect excessive personal data
Regular testing of controls | UK GDPR Art. 32(1)(d) | Regular testing and evaluation of technical and organisational security measures
Breach documentation | UK GDPR Art. 33(5) | All breaches recorded regardless of reportability to ICO
Accountability | UK GDPR Art. 5(2); Art. 24 | Demonstrate compliance with all monitoring obligations through documented evidence
Human review of AI outputs | UK GDPR Arts. 22A–22C; ICO AI in Recruitment Outcomes Report (Nov 2024) | Human review before candidate contact; genuine override capability
Bias and fairness monitoring | ICO AI in Recruitment Outcomes Report (Nov 2024); DSIT Responsible AI in Recruitment Guide (March 2024) | Four-fifths rule; periodic testing; KPI reporting to senior management
Model drift detection | DSIT Responsible AI in Recruitment Guide (March 2024) | Post-deployment performance monitoring to prevent accuracy decay
Evidence retention | ICO AI in Recruitment Outcomes Report (Nov 2024); UK GDPR Art. 5(2) | Retain test results and records of remediation actions taken

3. Monitoring Principles

3.1 Proactive, not reactive

Monitoring must detect issues before they cause candidate harm. Waiting for complaints to surface bias or accuracy problems is insufficient. The DSIT Responsible AI in Recruitment Guide states that "a failure to proactively monitor for these risks can result in the emergence of harms and a reduction in system efficacy." The ICO expects evidence of proactive monitoring, not merely reactive incident records.

3.2 Evidence-based governance

All monitoring results must be documented and retained. The ICO expects AI providers to "report key performance indicators for accuracy and bias regularly to senior managers and key stakeholders" and to retain "test results or reports and evidence of actions taken to address issues" (ICO AI in Recruitment Outcomes Report, November 2024).

3.3 Human feedback as a monitoring input

Automated metric monitoring must be supplemented by structured human feedback channels — from recruiters and from candidates — because automated performance testing cannot identify every possible harm (DSIT Responsible AI in Recruitment Guide, Contestability and Redress section). Complaints, challenge requests, and recruiter feedback are monitoring inputs, not merely customer service events.

3.4 Data minimisation

Monitoring methods must not collect personal data beyond what is strictly necessary for the monitoring purpose. This constraint applies with particular force to demographic data used for bias monitoring. See L3-4.2-Bias-Monitoring-Protocol-v1.md for the full legal framework governing demographic data collection and use.

3.5 Proportionality

Monitoring intensity must be appropriate for an early-stage company [ASSUMPTION A-001]. This framework is designed to be implementable by a 10–15 person team without a dedicated compliance function. Metric collection should be largely automated; manual review time should be targeted at high-risk outputs and threshold breaches.


4. Metrics Register

All metrics below are grouped by category. Each metric includes its definition, measurement method, frequency, alert threshold, and responsible owner. Owners are defined by role per L1-2.5-Roles-and-Responsibilities-v1.md.

4.1 Shortlisting Performance Metrics

# | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner
M-01 | Shortlisting accuracy rate | % of Scout shortlist recommendations that a trained human reviewer confirms as appropriate to advance, based on structured sampling | Random sample of completed shortlisting runs — minimum 5% of outputs per month or 50 outputs (whichever is higher) [ASSUMPTION] | Monthly | <80% accuracy on sampled outputs triggers investigation | CTO
M-02 | Human review compliance rate | % of Scout shortlisting outputs formally reviewed by a human recruiter before any candidate contact is made | Automated flag: candidate contact recorded in Scout audit log before review completion event is logged [ASSUMPTION A-020] | Continuous (automated); weekly summary report | <100% is a compliance breach — any instance triggers immediate investigation | Customer Success Lead
M-03 | Human override rate | % of Scout outputs where a human reviewer changes the recommendation (advances a candidate Scout rated below threshold, or rejects one rated above) | Review audit log — count of override decisions as % of total reviews completed | Monthly | >25% override rate may indicate model accuracy degradation requiring investigation [ASSUMPTION] | CTO
M-04 | Confidence score distribution | Distribution of Scout's internal relevance or confidence scores across all shortlisted and rejected candidates per shortlisting run | API output log analysis — distribution statistics per run | Per-run; monthly aggregate trend | Unexpected spike in low-confidence outputs, or marked compression of score range, triggers investigation | Engineering Lead
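The M-02 automated flag can be implemented as a simple audit-log scan: a breach is any candidate contact recorded with no earlier review-completion event. The sketch below is illustrative only — the event names and field layout are hypothetical, since the real Scout audit schema [ASSUMPTION A-020] is not defined here.

```python
from datetime import datetime

# Hypothetical audit-log events; real Scout log fields may differ.
events = [
    {"candidate_id": "c1", "type": "review_completed",   "ts": "2026-02-01T10:00:00"},
    {"candidate_id": "c1", "type": "candidate_contacted", "ts": "2026-02-01T11:30:00"},
    {"candidate_id": "c2", "type": "candidate_contacted", "ts": "2026-02-02T09:00:00"},
]

def review_compliance_breaches(events):
    """M-02: return candidate IDs contacted before (or without) a completed review."""
    first_review = {}
    for e in events:
        if e["type"] == "review_completed":
            ts = datetime.fromisoformat(e["ts"])
            cid = e["candidate_id"]
            if cid not in first_review or ts < first_review[cid]:
                first_review[cid] = ts
    breaches = []
    for e in events:
        if e["type"] == "candidate_contacted":
            contact_ts = datetime.fromisoformat(e["ts"])
            reviewed = first_review.get(e["candidate_id"])
            if reviewed is None or contact_ts < reviewed:
                breaches.append(e["candidate_id"])
    return breaches

print(review_compliance_breaches(events))  # ['c2'] — contacted with no review on record
```

Any non-empty result is a compliance breach under the M-02 threshold and should raise the real-time alert described in Section 5.2.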

4.2 Human Review Quality Metrics

# | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner
M-05 | Review completion time | Average time from Scout output delivery to human review completion | Timestamped review events in Scout audit log [ASSUMPTION A-020] | Weekly | >5 business days average may indicate reviewers bypassing the review step [ASSUMPTION] | Customer Success Lead
M-06 | Reviewer challenge rate | % of candidate shortlisting decisions formally flagged by a reviewer as potentially biased, inaccurate, or requiring re-assessment | Challenge log entries as % of total reviews | Monthly | Any non-zero rate triggers bias investigation protocol per L3-4.2-Bias-Monitoring-Protocol-v1.md | CTO

4.3 Candidate-Facing Metrics

# | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner
M-07 | Candidate complaint rate | Number of candidates who raise a formal complaint about Scout's assessment of their application, including requests for human review on fairness or accuracy grounds | Complaints register; intake via customer escalation path | Monthly | >1% of candidates processed in any month triggers investigation and DPIA review consideration [ASSUMPTION] | CTO (acting DPO)
M-08 | Arts. 22A–22C human review / challenge request rate | Number of candidates invoking their rights under UK GDPR Arts. 22A–22C to request human review of a solely automated decision, or exercising the right to contest under Art. 22C | Rights request log maintained by CTO (acting DPO) | Monthly | Any sustained volume over 3 consecutive months triggers DPIA review and transparency notice update | CTO (acting DPO)
M-09 | Transparency notice delivery rate | % of candidate data processing events where a compliant transparency notice has been delivered by the recruiter customer prior to Scout processing [ASSUMPTION A-019] | Customer attestation; periodic contractual audit | Quarterly | <100% is a contractual breach requiring customer remediation action | Customer Success Lead

4.4 Model Performance and Drift Metrics

# | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner
M-10 | Shortlisting rate stability | % of candidates shortlisted per run, tracked over time — stable rates suggest model consistency; unexplained shifts indicate potential drift | Aggregate run-level statistics from Scout audit log; rolling 3-month baseline | Per-run; monthly trend analysis | >15 percentage point shift from rolling 3-month baseline triggers investigation [ASSUMPTION] | Engineering Lead
M-11 | Canary accuracy test | Whether a set of predefined test CVs with known expected outputs continues to produce consistent Scout recommendations following any model, prompt, or configuration change | Canary test suite — minimum 10 test CVs covering a range of seniority, background, and CV format [ASSUMPTION] | Before any model or prompt change is deployed; monthly otherwise | Any canary test failure blocks deployment of the relevant change until investigation complete | Engineering Lead
M-12 | Output format compliance | % of Scout outputs conforming to the expected structured format: required fields present; no hallucinated content fields; no narrative content outside specification | Automated output schema validation at API response layer | Per-run | >1% format non-compliance in any single run | Engineering Lead
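The M-10 drift check reduces to comparing each month's shortlisting rate against a rolling 3-month baseline and flagging shifts beyond 15 percentage points. The sketch below assumes monthly aggregate rates are already computed from the audit log; the example rates are invented for illustration.

```python
def shortlisting_rate_alert(monthly_rates, threshold_pp=15.0):
    """M-10: flag months whose shortlisting rate (in %) shifts more than
    `threshold_pp` percentage points from the rolling 3-month baseline.
    Returns (month_index, observed_rate, baseline) tuples."""
    alerts = []
    for i in range(3, len(monthly_rates)):
        baseline = sum(monthly_rates[i - 3:i]) / 3
        if abs(monthly_rates[i] - baseline) > threshold_pp:
            alerts.append((i, monthly_rates[i], round(baseline, 1)))
    return alerts

# Months 0-3 are stable around 30%; month 4 jumps 20pp above its baseline.
rates = [30.0, 32.0, 31.0, 33.0, 52.0]
print(shortlisting_rate_alert(rates))  # [(4, 52.0, 32.0)]
```

A flagged month does not establish drift on its own — it triggers the investigation described in Section 7.1, which should also consider changes in customer mix and job categories.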

4.5 Bias Proxy Metrics

The following metrics address the obligation to monitor for discriminatory impact in Scout's shortlisting outputs. They involve a direct tension between the duty to monitor for bias and UK GDPR constraints on processing demographic (special category) personal data. The legal framework governing the permissible approach to each metric is set out in detail in L3-4.2-Bias-Monitoring-Protocol-v1.md. Legal review [LEGAL REVIEW REQUIRED] must be completed before implementing any approach to demographic data collection for monitoring purposes.

# | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner
M-13 | Adverse impact ratio — voluntary demographic data | Where candidate demographic data has been collected on a legally permissible basis (see L3-4.2-Bias-Monitoring-Protocol-v1.md §3–4): shortlisting rate ratio between the lowest-performing protected-characteristic group and the highest-performing group | Statistical analysis of shortlisting outcomes segmented by demographic group — applies four-fifths rule (see L3-4.2-Bias-Monitoring-Protocol-v1.md §5) | Monthly where data available; otherwise quarterly aggregate | Adverse impact ratio <0.80 for any monitored group triggers bias investigation per L3-4.2-Bias-Monitoring-Protocol-v1.md | CTO
M-14 | Employment gap shortlisting rate | Shortlisting rate for candidates whose CVs contain employment gaps of >12 months, compared to candidates without such gaps — employment gaps are a recognised proxy for pregnancy/maternity, disability, and caring responsibilities [ASSUMPTION] | Manual review of sampled rejection outputs where employment gap identified in CV | Quarterly | >10% relative rejection-rate difference between gap and no-gap CVs triggers prompt review and bias investigation | CTO
M-15 | Aggregate shortlisting rate consistency | Comparison of overall shortlisting rates across different customer job roles and sectors over time, to detect unexplained rate compression or expansion that might indicate a systemic bias signal | Aggregate run statistics by customer segment from Scout audit log | Monthly | >20 percentage point difference in shortlisting rate across comparable job categories [ASSUMPTION] | Engineering Lead
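The M-13 four-fifths rule calculation is mechanically simple: compute the shortlisting rate per monitored group and divide the lowest rate by the highest. A ratio below 0.80 breaches the threshold. The group labels and counts below are invented for illustration, not real Sable AI Ltd data.

```python
def adverse_impact_ratio(shortlisted, processed):
    """M-13: four-fifths rule. Ratio of the lowest group shortlisting rate
    to the highest. A ratio below 0.80 triggers bias investigation per
    L3-4.2-Bias-Monitoring-Protocol-v1.md."""
    rates = {g: shortlisted[g] / processed[g] for g in processed if processed[g]}
    return min(rates.values()) / max(rates.values())

# Illustrative counts only: group_a shortlisted at 30%, group_b at 20%.
processed   = {"group_a": 200, "group_b": 180}
shortlisted = {"group_a": 60,  "group_b": 36}

ratio = adverse_impact_ratio(shortlisted, processed)
print(round(ratio, 3), "ALERT" if ratio < 0.80 else "OK")  # 0.667 ALERT
```

Note that small group sizes make the ratio statistically noisy; the full protocol document governs minimum sample sizes and how results feed the remediation process.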

4.6 Data Governance Metrics

# | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner
M-16 | Candidate data retention compliance | % of candidate personal data records deleted or pseudonymised in accordance with the agreed retention policy [ASSUMPTION A-013: retention periods not yet formally defined] | Automated check against data creation date and agreed retention period | Monthly | Any record exceeding the policy retention period triggers immediate deletion and documentation | CTO (acting DPO)
M-17 | Data access audit | Review of all access to candidate personal data in Scout — recruiter users, Sable AI Ltd internal staff, Anthropic sub-processor [ASSUMPTION A-005] — against the access control policy | Access log review against authorised access list | Quarterly | Any access event not matching the authorised access list triggers investigation | Engineering Lead
M-18 | DPIA currency | Whether L2-3.4-DPIA-Template-v1.md remains current following any material change to Scout's processing: new model version, new data field, new customer segment, new processing purpose | Manual review trigger on change events; annual calendar trigger otherwise | Annually; on every material change event | DPIA not reviewed within 12 months, or following any material change, triggers formal DPIA review | CTO (acting DPO)
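The M-16 automated check compares each record's creation date against the retention cutoff. The retention period below is purely illustrative, since [ASSUMPTION A-013] notes that retention periods are not yet formally defined; the record shape is likewise hypothetical.

```python
from datetime import date, timedelta

RETENTION_DAYS = 180  # illustrative only — actual period pending A-013 resolution

def overdue_records(records, today):
    """M-16: return IDs of candidate records held beyond the retention period.
    Any hit requires immediate deletion or pseudonymisation, plus documentation."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [r["id"] for r in records if r["created"] < cutoff]

records = [
    {"id": "cv-101", "created": date(2025, 7, 1)},   # well past the cutoff
    {"id": "cv-102", "created": date(2026, 2, 1)},   # within retention
]
print(overdue_records(records, today=date(2026, 3, 1)))  # ['cv-101']
```

Running this monthly against the production data store, and logging both the hits and the deletion actions taken, produces exactly the remediation evidence the ICO expects to see retained.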

4.7 Design and Feature Audit Controls

# | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner
M-23 | Protected-characteristic filtering route review | Periodic review of Scout's UI filters, API parameters, admin tools, search operators, and export logic to verify that users cannot directly or indirectly exclude candidates by protected characteristic or proxy indicator (e.g., name-based ethnic filtering, graduation year range as age proxy, employment gap filtering as disability proxy) | Structured design review checklist covering UI, API query parameters, admin configuration options, and export fields; reviewed by Engineering Lead with sign-off from CTO | Annually as a minimum; additionally before any major UI release, new filter feature, or API parameter change | Any filtering route identified that enables exclusion by protected characteristic or proxy triggers immediate remediation and blocks the relevant release | Engineering Lead; CTO (sign-off)

Regulatory basis: ICO AI in Recruitment Outcomes Report (November 2024): "features in some tools could lead to discrimination by having a search functionality that allowed recruiters to filter out candidates with certain protected characteristics." Equality Act 2010, s.19 (indirect discrimination by provision, criterion or practice). L2-3.2-Equality-Act-2010-Compliance-Map-v1.md (indirect discrimination mechanism analysis per characteristic).


4.8 Infrastructure and API Metrics

# | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner
M-19 | Anthropic API availability | % uptime of the Anthropic Claude API measured at Scout's integration point [ASSUMPTION A-005] | API health check from Scout application; Anthropic status page monitoring | Continuous; daily summary report | Any unplanned outage >30 minutes triggers escalation to Engineering Lead and Customer Success Lead | Engineering Lead
M-20 | API error rate | % of Anthropic API calls returning a 4xx or 5xx error response | Scout application error log | Continuous; daily summary | >1% error rate in any 1-hour window triggers investigation | Engineering Lead
M-21 | API response latency | Average time from Scout sending a CV processing request to receiving a structured output from the Anthropic Claude API | Scout application instrumentation | Continuous; daily summary | >10 seconds average latency or >30 seconds p99 latency in any hour triggers investigation [ASSUMPTION] | Engineering Lead
M-22 | Data transmission integrity | Confirmation that only extracted CV text and job description text — not raw document files or additional personal data fields — are transmitted to the Anthropic API [ASSUMPTION A-011] | Quarterly code-level data-flow audit; payload sampling review after each material code change | Quarterly | Any payload found to contain personal data beyond CV and JD text triggers immediate investigation and customer notification | Engineering Lead
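The M-22 payload sampling review can be partly automated as an allow-list check on outbound request fields. The field names below are hypothetical — the real Scout request schema [ASSUMPTION A-011] is not defined in this framework — but the pattern (reject anything outside an explicit allow-list) is the point.

```python
# Hypothetical outbound request shape; field names are assumptions.
ALLOWED_FIELDS = {"cv_text", "job_description_text"}

def payload_violations(payload):
    """M-22: return any fields beyond extracted CV and JD text found in a
    sampled outbound API payload. Any hit triggers investigation and
    customer notification."""
    return sorted(set(payload) - ALLOWED_FIELDS)

clean = {"cv_text": "text", "job_description_text": "text"}
leaky = {"cv_text": "text", "job_description_text": "text",
         "candidate_email": "x@example.com"}

print(payload_violations(clean))  # []
print(payload_violations(leaky))  # ['candidate_email']
```

An allow-list is deliberately stricter than a deny-list here: new fields added by future code changes fail closed until the data-flow audit explicitly approves them.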

5. Dashboard Design

The ICO expects organisations to "report key performance indicators for accuracy and bias regularly to senior managers and key stakeholders" and to retain evidence of action taken. For an early-stage company [ASSUMPTION A-001], this means a lightweight but structured governance reporting process, not a complex enterprise BI platform.

5.1 Monthly KPI Report (internal)

A one-page structured monthly report covering the metrics above. Format: metric | current value | alert threshold | status (GREEN / AMBER / RED) | action required.

Distribution: Founder/CEO, CTO, Engineering Lead, Customer Success Lead.

Required content as a minimum:

Metric | Notes
M-01 Shortlisting accuracy rate | Include sample size and methodology note
M-02 Human review compliance rate | Flag any breach instances by customer
M-03 Human override rate | Flag >25% instances
M-07 Candidate complaint rate | Include complaint type breakdown
M-10 Shortlisting rate stability | Trend chart (3-month rolling)
M-13 Adverse impact ratio | Where demographic data is available and permissible; note if unavailable
M-19 API availability | Monthly uptime percentage
Incident summary | Count and severity classification of any incidents per L3-4.3-Incident-Response-Plan-v1.md
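Producing the GREEN / AMBER / RED status column can be reduced to a single threshold comparison per metric. A minimal sketch follows; the AMBER thresholds shown are illustrative assumptions, since the metrics register only defines the RED (investigation) thresholds.

```python
def rag_status(value, amber, red, higher_is_better=True):
    """Classify a KPI value as GREEN / AMBER / RED against its thresholds.
    For higher-is-better metrics (e.g. M-01 accuracy), RED sits below AMBER;
    for lower-is-better metrics (e.g. M-03 override rate), RED sits above."""
    if higher_is_better:
        if value < red:
            return "RED"
        return "AMBER" if value < amber else "GREEN"
    if value > red:
        return "RED"
    return "AMBER" if value > amber else "GREEN"

# M-01 accuracy: RED below 80%; AMBER band below 85% [ASSUMPTION].
print(rag_status(83.0, amber=85.0, red=80.0))                           # AMBER
# M-03 override rate: lower is better; RED above 25%; AMBER above 20% [ASSUMPTION].
print(rag_status(12.0, amber=20.0, red=25.0, higher_is_better=False))   # GREEN
```

Keeping the threshold values in one configuration table, rather than scattered across report code, also makes the annual threshold review (Section 6) a one-file change.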

5.2 Automated Alerting

The following metrics must trigger automated alerts to the responsible owner immediately on threshold breach:

Metric | Alert channel | Escalation recipient
M-02 Human review compliance | Real-time application alert | Customer Success Lead; CTO
M-11 Canary test failure | Pre-deployment automated block; Engineering alert | Engineering Lead; CTO
M-12 Output format compliance | Per-run application alert | Engineering Lead
M-19 API availability | Real-time monitoring alert | Engineering Lead; Customer Success Lead
M-20 API error rate | Real-time monitoring alert | Engineering Lead
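The real-time M-20 alert depends on a windowed calculation: the share of failed calls within the trailing hour, compared against the 1% threshold. A sketch under assumed log fields (timestamp plus HTTP status code) follows.

```python
from datetime import datetime, timedelta

def error_rate_breach(calls, window_end, threshold=0.01):
    """M-20: True if more than `threshold` (1%) of API calls in the hour
    ending at `window_end` returned a 4xx/5xx status."""
    start = window_end - timedelta(hours=1)
    in_window = [c for c in calls if start <= c["ts"] <= window_end]
    if not in_window:
        return False
    errors = sum(1 for c in in_window if c["status"] >= 400)
    return errors / len(in_window) > threshold

# 50 successful calls plus one 500 in the last hour: 1/51 ≈ 2% > 1%.
now = datetime(2026, 3, 1, 12, 0)
calls = [{"ts": now - timedelta(minutes=m), "status": 200} for m in range(50)]
calls.append({"ts": now - timedelta(minutes=5), "status": 500})
print(error_rate_breach(calls, window_end=now))  # True
```

In production this check would run on each new log entry (or on a short timer) rather than on demand, with the breach result routed to the alert channels in the table above.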

All other metrics are surfaced in the monthly KPI report. AMBER status requires a documented explanation. RED status requires a written action plan within 5 business days [ASSUMPTION].

5.3 Tooling Recommendation

[ASSUMPTION] At an early stage, a full observability or BI platform is likely disproportionate. The recommended approach is:

  • Audit log: Scout application audit log exports to a controlled-access shared store (engineering-managed)
  • Monthly report: Produced by Engineering Lead from log export; reviewed by CTO; distributed to above recipients via internal document channel
  • Automated alerts: Delivered via existing engineering monitoring tooling (e.g., application error monitoring with Slack or email webhook notifications)
  • Scaling: Move to a purpose-built analytics stack as the company grows, customer volume increases, or ICO audit risk rises

6. Monitoring Review Cadence

Review type | Frequency | Trigger | Output | Owner
Monthly KPI report | Monthly | Calendar | Monthly KPI report (Section 5.1) | Engineering Lead (produce); CTO (review and sign off)
Bias monitoring report | Monthly where demographic data available; quarterly minimum | Calendar and post-model change | Adverse impact report per L3-4.2-Bias-Monitoring-Protocol-v1.md | CTO
Canary / accuracy test | Monthly and pre-deployment | Calendar and code/prompt/model change | Test pass/fail report; deployment gate decision | Engineering Lead
Data access audit | Quarterly | Calendar | Access review report | Engineering Lead
DPIA review | Annual; post-material change | Calendar and change event | Updated DPIA per L2-3.4-DPIA-Template-v1.md | CTO (acting DPO)
Full framework review | Annual | Calendar | Revised monitoring framework; updated thresholds | CTO
External bias audit | Annual minimum; post-significant bias finding | Calendar | External audit report; recommendations | CTO (to commission)
Feature and design audit (M-23) | Annual minimum; additionally before any major UI release or new filter / API parameter feature | Calendar and pre-release trigger | Design review checklist — confirmed no protected-characteristic filtering routes | Engineering Lead (conduct); CTO (sign-off)

7. Escalation and Governance

7.1 Threshold breach response

Any metric breaching a RED threshold must be:

  1. Logged in the Scout incident register (reference L3-4.3-Incident-Response-Plan-v1.md for severity classification)
  2. Assigned a named owner and a remediation deadline (within 5 business days [ASSUMPTION])
  3. Reported in the next monthly KPI report with root cause analysis and corrective action

7.2 Governance escalation

Where a RED threshold breach relates to any of the following, the matter must be escalated to the Founder/CEO in addition to the CTO:

  • M-02 Human review compliance (Arts. 22A–22C safeguards breach)
  • M-07 or M-08 Candidate complaints or rights requests
  • M-13 Adverse impact ratio below 0.80 for any monitored group

7.3 External notification triggers

Monitoring findings may trigger external reporting obligations. The conditions and timelines for notifying the ICO under UK GDPR Art. 33 and for considering EHRC engagement are set out in L3-4.3-Incident-Response-Plan-v1.md.

7.4 Evidence retention

All monitoring reports, test results, alert records, and action plans must be retained for a minimum of 3 years [ASSUMPTION A-021]. The ICO expects organisations to be able to produce test results and remediation evidence on request.


8. Cross-References

Document | Relevance
L3-4.2-Bias-Monitoring-Protocol-v1.md | Detailed methodology for M-13–M-15 bias proxy metrics; legal framework for demographic data collection; adverse impact ratio calculation; remediation process
L3-4.3-Incident-Response-Plan-v1.md | Incident definitions, severity levels P1–P3, external notification obligations triggered by monitoring findings
L1-2.5-Roles-and-Responsibilities-v1.md | RACI for monitoring responsibilities
L2-3.4-DPIA-Template-v1.md | DPIA review triggered by material changes detected through monitoring
L2-3.1-UK-GDPR-Mapping-Matrix-v1.md | UK GDPR obligations mapping including Arts. 22A–22D and Art. 5
L2-3.2-Equality-Act-2010-Compliance-Map-v1.md | Equality Act discrimination risk context for bias metrics

9. Regulatory Sources

Source | Relevant provisions
UK GDPR | Art. 5(1)(a)–(e), Art. 5(2), Arts. 22A–22D, Art. 24, Art. 25, Art. 32(1)(d), Art. 33(5)
ICO AI in Recruitment Outcomes Report (November 2024) | Bias monitoring; human review; adverse impact ratio; evidence retention; senior reporting obligations
DSIT Responsible AI in Recruitment Guide (March 2024) | Model drift; ongoing monitoring; iterative bias audits; contestability
ICO Guide to Accountability and Governance | Documentation standards; review cycles; breach registers
ICO Personal Data Breaches Guidance | Breach documentation and recording requirements