Monitoring Framework
Project: Sable AI Ltd — AI Governance Framework
Stage: Stage 4 — Monitoring & Operational Controls
Status: Draft
Version: v1
Date: 2026-03-01
Assumptions: Built on outline assumptions — not verified against real Sable AI Ltd data
1. Purpose and Scope
This document establishes the operational monitoring framework for Scout, Sable AI Ltd's AI-powered CV screening and candidate shortlisting tool [ASSUMPTION A-002]. It defines the key performance indicators, measurement methods, alert thresholds, and governance processes required to ensure Scout continues to operate accurately, fairly, and in compliance with UK GDPR, Equality Act 2010, and ICO expectations after deployment.
Monitoring is a legal obligation, not an optional operational practice. UK GDPR Article 5(1)(a)–(e) requires that personal data processing remains lawful, fair, accurate, and limited throughout the processing lifecycle. The ICO's AI in Recruitment Outcomes Report (November 2024) confirms that the ICO actively assesses whether AI providers have "regularly monitored accuracy and bias and swiftly addressed issues throughout the AI lifecycle." The DSIT Responsible AI in Recruitment Guide (March 2024) further confirms that continuous monitoring is required to detect model drift, errors, and bias emerging during live operation.
Scope: This framework covers all processing of candidate personal data by Scout, including CV ingestion, Anthropic Claude API processing [ASSUMPTION A-005], structured output generation, and human reviewer interface interactions [ASSUMPTION A-007]. It applies to Sable AI Ltd as data processor and to Sable AI Ltd's customers who are controllers.
2. Regulatory Basis
| Obligation | Source | Relevance to Scout |
|---|---|---|
| Ongoing accuracy and fairness | UK GDPR Art. 5(1)(a)–(d); Art. 25 | Accuracy of shortlisting outputs; fairness to candidates across processing lifecycle |
| Data minimisation in monitoring | UK GDPR Art. 5(1)(c) | Monitoring methods must not collect excessive personal data |
| Regular testing of controls | UK GDPR Art. 32(1)(d) | Regular testing and evaluation of technical and organisational security measures |
| Breach documentation | UK GDPR Art. 33(5) | All breaches recorded regardless of reportability to ICO |
| Accountability | UK GDPR Art. 5(2); Art. 24 | Demonstrate compliance with all monitoring obligations through documented evidence |
| Human review of AI outputs | UK GDPR Arts. 22A–22C; ICO AI in Recruitment Outcomes Report (Nov 2024) | Human review before candidate contact; genuine override capability |
| Bias and fairness monitoring | ICO AI in Recruitment Outcomes Report (Nov 2024); DSIT Responsible AI in Recruitment Guide (March 2024) | Four-fifths rule; periodic testing; KPI reporting to senior management |
| Model drift detection | DSIT Responsible AI in Recruitment Guide (March 2024) | Post-deployment performance monitoring to prevent accuracy decay |
| Evidence retention | ICO AI in Recruitment Outcomes Report (Nov 2024); UK GDPR Art. 5(2) | Retain test results and records of remediation actions taken |
3. Monitoring Principles
3.1 Proactive, not reactive
Monitoring must detect issues before they cause candidate harm. Waiting for complaints to surface bias or accuracy problems is insufficient. The DSIT Responsible AI in Recruitment Guide states that "a failure to proactively monitor for these risks can result in the emergence of harms and a reduction in system efficacy." The ICO expects evidence of proactive monitoring, not merely reactive incident records.
3.2 Evidence-based governance
All monitoring results must be documented and retained. The ICO expects AI providers to "report key performance indicators for accuracy and bias regularly to senior managers and key stakeholders" and to retain "test results or reports and evidence of actions taken to address issues" (ICO AI in Recruitment Outcomes Report, November 2024).
3.3 Human feedback as a monitoring input
Automated metric monitoring must be supplemented by structured human feedback channels — from recruiters and from candidates — because automated performance testing cannot identify every possible harm (DSIT Responsible AI in Recruitment Guide, Contestability and Redress section). Complaints, challenge requests, and recruiter feedback are monitoring inputs, not merely customer service events.
3.4 Data minimisation
Monitoring methods must not collect personal data beyond what is strictly necessary for the monitoring purpose. This constraint applies with particular force to demographic data used for bias monitoring. See L3-4.2-Bias-Monitoring-Protocol-v1.md for the full legal framework governing demographic data collection and use.
3.5 Proportionality
Monitoring intensity must be appropriate for an early-stage company [ASSUMPTION A-001]. This framework is designed to be implementable by a 10–15 person team without a dedicated compliance function. Metric collection should be largely automated; manual review time should be targeted at high-risk outputs and threshold breaches.
4. Metrics Register
All metrics below are grouped by category. Each metric includes its definition, measurement method, frequency, alert threshold, and responsible owner. Owners are defined by role per L1-2.5-Roles-and-Responsibilities-v1.md.
4.1 Shortlisting Performance Metrics
| # | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner |
|---|---|---|---|---|---|---|
| M-01 | Shortlisting accuracy rate | % of Scout shortlist recommendations that a trained human reviewer confirms as appropriate to advance, based on structured sampling | Random sample of completed shortlisting runs — minimum 5% of outputs per month or 50 outputs (whichever is higher) [ASSUMPTION] | Monthly | <80% accuracy on sampled outputs triggers investigation | CTO |
| M-02 | Human review compliance rate | % of Scout shortlisting outputs formally reviewed by a human recruiter before any candidate contact is made | Automated flag raised where candidate contact is recorded in the Scout audit log before a review completion event is logged [ASSUMPTION A-020] | Continuous (automated); weekly summary report | <100% is a compliance breach — any instance triggers immediate investigation | Customer Success Lead |
| M-03 | Human override rate | % of Scout outputs where a human reviewer changes the recommendation (advances a candidate Scout rated below threshold, or rejects one rated above) | Review audit log — count of override decisions as % of total reviews completed | Monthly | >25% override rate may indicate model accuracy degradation requiring investigation [ASSUMPTION] | CTO |
| M-04 | Confidence score distribution | Distribution of Scout's internal relevance or confidence scores across all shortlisted and rejected candidates per shortlisting run | API output log analysis — distribution statistics per run | Per-run; monthly aggregate trend | Unexpected spike in low-confidence outputs, or marked compression of score range, triggers investigation | Engineering Lead |
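[ASSUMPTION] As an illustrative sketch only: assuming the Scout audit log can be exported as per-candidate timestamped events [ASSUMPTION A-020], the M-02 automated flag could be implemented along the following lines. The event names are hypothetical placeholders, not Scout's actual schema; any candidate ID returned is a compliance breach requiring immediate investigation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AuditEvent:
    candidate_id: str
    event_type: str  # hypothetical names: "review_completed", "candidate_contacted"
    timestamp: datetime


def review_compliance_breaches(events: list[AuditEvent]) -> list[str]:
    """M-02: return candidate IDs contacted before any review_completed event."""
    first_review: dict[str, datetime] = {}
    first_contact: dict[str, datetime] = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.event_type == "review_completed":
            first_review.setdefault(event.candidate_id, event.timestamp)
        elif event.event_type == "candidate_contacted":
            first_contact.setdefault(event.candidate_id, event.timestamp)
    # Breach: contact with no completed review, or contact before the review.
    return [
        cid
        for cid, contacted in first_contact.items()
        if cid not in first_review or contacted < first_review[cid]
    ]
```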
4.2 Human Review Quality Metrics
| # | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner |
|---|---|---|---|---|---|---|
| M-05 | Review completion time | Average time from Scout output delivery to human review completion | Timestamped review events in Scout audit log [ASSUMPTION A-020] | Weekly | >5 business days average may indicate reviewers bypassing the review step [ASSUMPTION] | Customer Success Lead |
| M-06 | Reviewer challenge rate | % of candidate shortlisting decisions formally flagged by a reviewer as potentially biased, inaccurate, or requiring re-assessment | Challenge log entries as % of total reviews | Monthly | Any non-zero rate triggers bias investigation protocol per L3-4.2-Bias-Monitoring-Protocol-v1.md | CTO |
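[ASSUMPTION] A minimal sketch of the M-05 measurement, assuming delivery and review timestamps can be exported from the audit log as ISO dates. `numpy.busday_count` treats Saturday and Sunday as non-business days by default; the dates below are illustrative.

```python
import numpy as np


def mean_review_business_days(delivered: list[str], reviewed: list[str]) -> float:
    """M-05: average business days from Scout output delivery to review completion.

    `delivered` and `reviewed` are parallel lists of ISO dates ("YYYY-MM-DD").
    """
    gaps = [np.busday_count(d, r) for d, r in zip(delivered, reviewed)]
    return float(np.mean(gaps))


# Example: an average above 5 business days breaches the M-05 alert threshold.
average = mean_review_business_days(
    ["2026-03-02", "2026-03-09"], ["2026-03-13", "2026-03-11"]
)
if average > 5:
    print(f"ALERT M-05: average review time of {average:.1f} business days")
```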
4.3 Candidate-Facing Metrics
| # | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner |
|---|---|---|---|---|---|---|
| M-07 | Candidate complaint rate | Number of candidates who raise a formal complaint about Scout's assessment of their application, including requests for human review on fairness or accuracy grounds | Complaints register; intake via customer escalation path | Monthly | >1% of candidates processed in any month triggers investigation and DPIA review consideration [ASSUMPTION] | CTO (acting DPO) |
| M-08 | Arts. 22A–22C human review / challenge request rate | Number of candidates invoking their rights under UK GDPR Arts. 22A–22C to request human review of a solely automated decision, or exercising the right to contest under Art. 22C | Rights request log maintained by CTO (acting DPO) | Monthly | Sustained request volume over 3 consecutive months triggers DPIA review and transparency notice update | CTO (acting DPO) |
| M-09 | Transparency notice delivery rate | % of candidate data processing events where a compliant transparency notice has been delivered by the recruiter customer prior to Scout processing [ASSUMPTION A-019] | Customer attestation; periodic contractual audit | Quarterly | <100% is a contractual breach requiring customer remediation action | Customer Success Lead |
4.4 Model Performance and Drift Metrics
| # | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner |
|---|---|---|---|---|---|---|
| M-10 | Shortlisting rate stability | % of candidates shortlisted per run, tracked over time — stable rates suggest model consistency; unexplained shifts indicate potential drift | Aggregate run-level statistics from Scout audit log; rolling 3-month baseline | Per-run; monthly trend analysis | >15 percentage point shift from rolling 3-month baseline triggers investigation [ASSUMPTION] | Engineering Lead |
| M-11 | Canary accuracy test | Whether a set of predefined test CVs with known expected outputs continue to produce consistent Scout recommendations following any model, prompt, or configuration change | Canary test suite — minimum 10 test CVs covering range of seniority, background, and CV format [ASSUMPTION] | Before any model or prompt change is deployed; monthly otherwise | Any canary test failure blocks deployment of the relevant change until investigation complete | Engineering Lead |
| M-12 | Output format compliance | % of Scout outputs conforming to the expected structured format: required fields present; no hallucinated content fields; no narrative content outside specification | Automated output schema validation at API response layer | Per-run | >1% format non-compliance in any single run | Engineering Lead |
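[ASSUMPTION] An illustrative sketch of the M-10 drift check, assuming monthly shortlisting rates (as percentages) are available from the audit log aggregates. The rolling 3-month baseline and 15 percentage point threshold follow the table above; the numbers in the example are invented.

```python
import statistics


def shortlisting_drift_alert(monthly_rates: list[float],
                             threshold_pp: float = 15.0) -> bool:
    """M-10: flag when the latest monthly shortlisting rate (in %) shifts more
    than `threshold_pp` percentage points from the rolling 3-month baseline."""
    if len(monthly_rates) < 4:
        return False  # insufficient history to form a rolling baseline
    baseline = statistics.mean(monthly_rates[-4:-1])
    return abs(monthly_rates[-1] - baseline) > threshold_pp


# Example: baseline mean of 22, 19 and 21 is ~20.7%; a jump to 40% is a >15pp
# shift from the rolling baseline, so the check triggers an investigation.
assert shortlisting_drift_alert([22.0, 19.0, 21.0, 40.0]) is True
```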
4.5 Bias Proxy Metrics
The following metrics address the obligation to monitor for discriminatory impact in Scout's shortlisting outputs. They involve a direct tension between the duty to monitor for bias and UK GDPR constraints on processing demographic (special category) personal data. The legal framework governing the permissible approach to each metric is set out in detail in L3-4.2-Bias-Monitoring-Protocol-v1.md. [LEGAL REVIEW REQUIRED] Legal review must be completed before implementing any approach to demographic data collection for monitoring purposes.
| # | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner |
|---|---|---|---|---|---|---|
| M-13 | Adverse impact ratio — voluntary demographic data | Where candidate demographic data has been collected on a legally permissible basis (see L3-4.2-Bias-Monitoring-Protocol-v1.md §3–4): shortlisting rate ratio between the lowest-performing protected-characteristic group and the highest-performing group | Statistical analysis of shortlisting outcomes segmented by demographic group — applies four-fifths rule (see L3-4.2-Bias-Monitoring-Protocol-v1.md §5) | Monthly where data available; otherwise quarterly aggregate | Adverse impact ratio <0.80 for any monitored group triggers bias investigation per L3-4.2-Bias-Monitoring-Protocol-v1.md | CTO |
| M-14 | Employment gap shortlisting rate | Shortlisting rate for candidates whose CVs contain employment gaps of >12 months, compared to candidates without such gaps — employment gaps are a recognised proxy for pregnancy/maternity, disability, and caring responsibilities [ASSUMPTION] | Manual review of sampled rejection outputs where employment gap identified in CV | Quarterly | >10% relative rejection-rate difference between gap and no-gap CVs triggers a review of Scout's prompt design and a bias investigation | CTO |
| M-15 | Aggregate shortlisting rate consistency | Comparison of overall shortlisting rates across different customer job roles and sectors over time, to detect unexplained rate compression or expansion that might indicate a systemic bias signal | Aggregate run statistics by customer segment from Scout audit log | Monthly | >20 percentage point difference in shortlisting rate across comparable job categories [ASSUMPTION] | Engineering Lead |
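[ASSUMPTION] A minimal sketch of the M-13 four-fifths rule calculation. The group labels and rates are purely illustrative; the full methodology, including minimum group sizes and the legal basis for holding the underlying demographic data, is governed by L3-4.2-Bias-Monitoring-Protocol-v1.md.

```python
def adverse_impact_ratio(shortlist_rates: dict[str, float]) -> float:
    """M-13 four-fifths rule: lowest group shortlisting rate divided by the
    highest. A ratio below 0.80 triggers the bias investigation protocol."""
    rates = shortlist_rates.values()
    return min(rates) / max(rates)


# Example with purely illustrative rates: 0.22 / 0.30 ≈ 0.73, below 0.80.
if adverse_impact_ratio({"group_a": 0.30, "group_b": 0.22}) < 0.80:
    print("ALERT M-13: adverse impact ratio below four-fifths threshold")
```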
4.6 Data Governance Metrics
| # | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner |
|---|---|---|---|---|---|---|
| M-16 | Candidate data retention compliance | % of candidate personal data records deleted or pseudonymised in accordance with the agreed retention policy [ASSUMPTION A-013: retention periods not yet formally defined] | Automated check against data creation date and agreed retention period | Monthly | Any record exceeding the policy retention period triggers immediate deletion and documentation | CTO (acting DPO) |
| M-17 | Data access audit | Review of all access to candidate personal data in Scout — recruiter users, Sable AI Ltd internal staff, Anthropic sub-processor [ASSUMPTION A-005] — against the access control policy | Access log review against authorised access list | Quarterly | Any access event not matching the authorised access list triggers investigation | Engineering Lead |
| M-18 | DPIA currency | Whether L2-3.4-DPIA-Template-v1.md remains current following any material change to Scout's processing: new model version, new data field, new customer segment, new processing purpose | Manual review trigger on change events; annual calendar trigger otherwise | Annually; on every material change event | DPIA not reviewed within 12 months, or following any material change, triggers formal DPIA review | CTO (acting DPO) |
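[ASSUMPTION] An illustrative sketch of the M-16 automated retention check. The 180-day period is a placeholder only, since retention periods are not yet formally defined [ASSUMPTION A-013].

```python
from datetime import date, timedelta

# Placeholder retention period — actual periods not yet defined [ASSUMPTION A-013].
RETENTION_DAYS = 180


def overdue_records(records: list[tuple[str, date]],
                    today: date | None = None) -> list[str]:
    """M-16: return IDs of candidate records older than the retention period."""
    today = today or date.today()
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [record_id for record_id, created in records if created < cutoff]
```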
4.7 Design and Feature Audit Controls
| # | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner |
|---|---|---|---|---|---|---|
| M-23 | Protected-characteristic filtering route review | Periodic review of Scout's UI filters, API parameters, admin tools, search operators, and export logic to verify that users cannot directly or indirectly exclude candidates by protected characteristic or proxy indicator (e.g., name-based ethnic filtering, graduation year range as age proxy, employment gap filtering as disability proxy) | Structured design review checklist covering UI, API query parameters, admin configuration options, and export fields; reviewed by Engineering Lead with sign-off from CTO | Annually as a minimum; additionally before any major UI release, new filter feature, or API parameter change | Any filtering route identified that enables exclusion by protected characteristic or proxy triggers immediate remediation and blocks the relevant release | Engineering Lead; CTO (sign-off) |
Regulatory basis: ICO AI in Recruitment Outcomes Report (November 2024): "features in some tools could lead to discrimination by having a search functionality that allowed recruiters to filter out candidates with certain protected characteristics." Equality Act 2010, s.19 (indirect discrimination by provision, criterion or practice). L2-3.2-Equality-Act-2010-Compliance-Map-v1.md (indirect discrimination mechanism analysis per characteristic).
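[ASSUMPTION] One way to support the M-23 review in code is a denylist check on exposed filter parameters, sketched below. The parameter names are hypothetical and the denylist is illustrative, not exhaustive; the structured design review checklist remains the authoritative control.

```python
# Illustrative, non-exhaustive denylist of proxy indicators that must never be
# exposed as UI filters, API query parameters, or export fields (M-23).
PROXY_FILTER_DENYLIST = {
    "graduation_year",  # age proxy
    "employment_gap",   # disability / pregnancy-maternity / caring proxy
    "first_name",       # name-based ethnicity proxy
}


def blocked_filter_params(requested_params: set[str]) -> set[str]:
    """Return any requested filter parameters appearing on the proxy denylist."""
    return requested_params & PROXY_FILTER_DENYLIST


# Example: a release exposing a graduation_year filter would be blocked.
assert blocked_filter_params({"skills", "graduation_year"}) == {"graduation_year"}
```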
4.8 Infrastructure and API Metrics
| # | Metric | Definition | Measurement Method | Frequency | Alert Threshold | Owner |
|---|---|---|---|---|---|---|
| M-19 | Anthropic API availability | % uptime of the Anthropic Claude API measured at Scout's integration point [ASSUMPTION A-005] | API health check from Scout application; Anthropic status page monitoring | Continuous; daily summary report | Any unplanned outage >30 minutes triggers escalation to Engineering Lead and Customer Success Lead | Engineering Lead |
| M-20 | API error rate | % of Anthropic API calls returning a 4xx or 5xx error response | Scout application error log | Continuous; daily summary | >1% error rate in any 1-hour window triggers investigation | Engineering Lead |
| M-21 | API response latency | Average time from Scout sending a CV processing request to receiving a structured output from the Anthropic Claude API | Scout application instrumentation | Continuous; daily summary | >10 seconds average latency or >30 second p99 latency in any hour triggers investigation [ASSUMPTION] | Engineering Lead |
| M-22 | Data transmission integrity | Confirmation that only extracted CV text and job description text — not raw document files or additional personal data fields — are transmitted to the Anthropic API [ASSUMPTION A-011] | Quarterly code-level data-flow audit; payload sampling review post each material code change | Quarterly | Any payload found to contain personal data beyond CV and JD text triggers immediate investigation and customer notification | Engineering Lead |
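[ASSUMPTION] An illustrative sketch of the M-20 sliding-window check at the Scout application layer, treating any 4xx or 5xx response as an error and alerting when the error rate exceeds 1% within any 1-hour window.

```python
from collections import deque
from datetime import datetime, timedelta


class ErrorRateMonitor:
    """M-20: sliding 1-hour window over API call outcomes; alert above 1%."""

    def __init__(self, window: timedelta = timedelta(hours=1),
                 threshold: float = 0.01):
        self.window = window
        self.threshold = threshold
        self.calls: deque[tuple[datetime, bool]] = deque()  # (timestamp, is_error)

    def record(self, timestamp: datetime, status_code: int) -> bool:
        """Record one API call; return True when the windowed rate breaches."""
        self.calls.append((timestamp, status_code >= 400))
        # Drop calls that have aged out of the window.
        while self.calls and self.calls[0][0] < timestamp - self.window:
            self.calls.popleft()
        errors = sum(1 for _, is_error in self.calls if is_error)
        return errors / len(self.calls) > self.threshold
```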
5. Dashboard Design
The ICO expects organisations to "report key performance indicators for accuracy and bias regularly to senior managers and key stakeholders" and to retain evidence of action taken. For an early-stage company [ASSUMPTION A-001], this means a lightweight but structured governance reporting process, not a complex enterprise BI platform.
5.1 Monthly KPI Report (internal)
A one-page structured monthly report covering the metrics above. Format: metric | current value | alert threshold | status (GREEN / AMBER / RED) | action required.
Distribution: Founder/CEO, CTO, Engineering Lead, Customer Success Lead.
Required content as a minimum:
| Metric | Notes |
|---|---|
| M-01 Shortlisting accuracy rate | Include sample size and methodology note |
| M-02 Human review compliance rate | Flag any breach instances by customer |
| M-03 Human override rate | Flag >25% instances |
| M-07 Candidate complaint rate | Include complaint type breakdown |
| M-10 Shortlisting rate stability | Trend chart (3-month rolling) |
| M-13 Adverse impact ratio | Where demographic data is available and permissible; note if unavailable |
| M-19 API availability | Monthly uptime percentage |
| Incident summary | Count and severity classification of any incidents per L3-4.3-Incident-Response-Plan-v1.md |
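[ASSUMPTION] A minimal sketch of the GREEN / AMBER / RED status assignment for report rows. The framework above defines RED (alert) thresholds only; the AMBER band shown is an illustrative placeholder to be set per metric.

```python
def rag_status(value: float, amber: float, red: float,
               higher_is_worse: bool = True) -> str:
    """Assign the GREEN / AMBER / RED status used in the monthly KPI report."""
    if not higher_is_worse:
        # Flip the comparison for metrics where lower values are worse,
        # e.g. M-13 adverse impact ratio.
        value, amber, red = -value, -amber, -red
    if value >= red:
        return "RED"
    if value >= amber:
        return "AMBER"
    return "GREEN"


# Example: an M-03 override rate of 27% against an illustrative AMBER band at
# 20% and the 25% investigation threshold from Section 4.1.
print(rag_status(27.0, amber=20.0, red=25.0))  # RED
```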
5.2 Automated Alerting
The following metrics must trigger automated alerts to the responsible owner immediately on threshold breach:
| Metric | Alert channel | Escalation recipient |
|---|---|---|
| M-02 Human review compliance | Real-time application alert | Customer Success Lead; CTO |
| M-11 Canary test failure | Pre-deployment automated block; Engineering alert | Engineering Lead; CTO |
| M-12 Output format compliance | Per-run application alert | Engineering Lead |
| M-19 API availability | Real-time monitoring alert | Engineering Lead; Customer Success Lead |
| M-20 API error rate | Real-time monitoring alert | Engineering Lead |
All other metrics are surfaced in the monthly KPI report. AMBER status requires a documented explanation. RED status requires a written action plan within 5 business days [ASSUMPTION].
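[ASSUMPTION] A minimal sketch of an immediate threshold-breach alert delivered via a Slack incoming webhook, consistent with the tooling recommendation in Section 5.3. The webhook URL is a placeholder supplied by the team's Slack workspace configuration.

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/REPLACE-ME"  # placeholder


def send_threshold_alert(metric_id: str, current_value: str, owner: str) -> None:
    """Post an immediate threshold-breach alert (Section 5.2) to the channel."""
    payload = {
        "text": f"THRESHOLD BREACH: {metric_id} — current value {current_value}. "
                f"Escalation owner: {owner}."
    }
    request = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
```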
5.3 Tooling Recommendation
[ASSUMPTION] At an early stage, a full observability or BI platform is likely disproportionate. The recommended approach is:
- Audit log: Scout application audit log exports to a controlled-access shared store (engineering-managed)
- Monthly report: Produced by Engineering Lead from log export; reviewed by CTO; distributed to above recipients via internal document channel
- Automated alerts: Delivered via existing engineering monitoring tooling (e.g., application error monitoring with Slack or email webhook notifications)
- Escalation: Scale to a purpose-built analytics stack as the company grows, as customer volume increases, or as ICO audit risk increases
6. Monitoring Review Cadence
| Review type | Frequency | Trigger | Output | Owner |
|---|---|---|---|---|
| Monthly KPI report | Monthly | Calendar | Monthly KPI report (Section 5.1) | Engineering Lead (produce); CTO (review and sign off) |
| Bias monitoring report | Monthly where demographic data available; quarterly minimum | Calendar and post-model change | Adverse impact report per L3-4.2-Bias-Monitoring-Protocol-v1.md | CTO |
| Canary / accuracy test | Monthly and pre-deployment | Calendar and code/prompt/model change | Test pass/fail report; deployment gate decision | Engineering Lead |
| Data access audit | Quarterly | Calendar | Access review report | Engineering Lead |
| DPIA review | Annual; post-material change | Calendar and change event | Updated DPIA per L2-3.4-DPIA-Template-v1.md | CTO (acting DPO) |
| Full framework review | Annual | Calendar | Revised monitoring framework; updated thresholds | CTO |
| External bias audit | Annual minimum; post-significant bias finding | Calendar | External audit report; recommendations | CTO (to commission) |
| Feature and design audit (M-23) | Annual minimum; additionally before any major UI release or new filter / API parameter feature | Calendar and pre-release trigger | Design review checklist — confirmed no protected-characteristic filtering routes | Engineering Lead (conduct); CTO (sign-off) |
7. Escalation and Governance
7.1 Threshold breach response
Any metric breaching a RED threshold must be:
1. Logged in the Scout incident register (reference L3-4.3-Incident-Response-Plan-v1.md for severity classification)
2. Assigned a named owner and a remediation deadline (within 5 business days [ASSUMPTION])
3. Reported in the next monthly KPI report with root cause analysis and corrective action
7.2 Governance escalation
Where a RED threshold breach relates to any of the following, the matter must be escalated to the Founder/CEO in addition to the CTO:
- M-02 Human review compliance (Arts. 22A–22C safeguards breach)
- M-07 or M-08 Candidate complaints or rights requests
- M-13 Adverse impact ratio below 0.80 for any monitored group
7.3 External notification triggers
Monitoring findings may trigger external reporting obligations. The conditions and timelines for notifying the ICO under UK GDPR Art. 33 and for considering EHRC engagement are set out in L3-4.3-Incident-Response-Plan-v1.md.
7.4 Evidence retention
All monitoring reports, test results, alert records, and action plans must be retained for a minimum of 3 years [ASSUMPTION A-021]. The ICO expects organisations to be able to produce test results and remediation evidence on request.
8. Cross-References
| Document | Relevance |
|---|---|
| L3-4.2-Bias-Monitoring-Protocol-v1.md | Detailed methodology for M-13–M-15 bias proxy metrics; legal framework for demographic data collection; adverse impact ratio calculation; remediation process |
| L3-4.3-Incident-Response-Plan-v1.md | Incident definitions, severity levels P1–P3, external notification obligations triggered by monitoring findings |
| L1-2.5-Roles-and-Responsibilities-v1.md | RACI for monitoring responsibilities |
| L2-3.4-DPIA-Template-v1.md | DPIA review triggered by material changes detected through monitoring |
| L2-3.1-UK-GDPR-Mapping-Matrix-v1.md | UK GDPR obligations mapping including Arts. 22A–22D and Art. 5 |
| L2-3.2-Equality-Act-2010-Compliance-Map-v1.md | Equality Act discrimination risk context for bias metrics |
9. Regulatory Sources
| Source | Relevant provisions |
|---|---|
| UK GDPR | Art. 5(1)(a)–(e), Art. 5(2), Arts. 22A–22D, Art. 24, Art. 25, Art. 32(1)(d), Art. 33(5) |
| ICO AI in Recruitment Outcomes Report (November 2024) | Bias monitoring; human review; adverse impact ratio; evidence retention; senior reporting obligations |
| DSIT Responsible AI in Recruitment Guide (March 2024) | Model drift; ongoing monitoring; iterative bias audits; contestability |
| ICO Guide to Accountability and Governance | Documentation standards; review cycles; breach registers |
| ICO Personal Data Breaches Guidance | Breach documentation and recording requirements |