AI Monitoring Framework

Project: Pickles GmbH AI Governance Framework
Stage: Stage 4 (Monitoring & Operational Controls)
Status: Draft
Version: v1
Date: 2026-02-26
Assumptions: Built on outline assumptions; not verified against real Pickles GmbH data

Purpose

This document defines Pickles GmbH's operational monitoring framework for all deployed AI systems. It is intended to fulfil the post-market monitoring system requirement under EU AI Act Article 72 for SYS-04 (high-risk) and to establish good-practice monitoring for SYS-01 through SYS-03 (limited-risk).

Regulatory basis:
- EU AI Act Article 72: post-market monitoring system (mandatory for SYS-04)
- EU AI Act Article 9(2)(c): risk management system as a continuous iterative process, including data from post-market monitoring
- EU AI Act Article 15: accuracy, robustness, and cybersecurity performance targets
- EU AI Act Article 17(1)(h): quality management system must include post-market monitoring
- EU AI Act Article 12: logging requirements for high-risk AI systems
- ISO/IEC 42001 Clauses 9.1, 9.2, 9.3: performance evaluation

[ASSUMPTION] All metrics, thresholds, and measurement methods in this document are proposed based on the assumed product architecture. They must be validated against real system capabilities and real baseline performance data before operational use.

[LEGAL REVIEW REQUIRED] Article 72(3) requires the post-market monitoring plan for SYS-04 to form part of the Annex IV technical documentation (L2-4.2 Section 9). The Commission implementing act establishing the template for that plan was due by 2 February 2026. Check whether it has been published and, if so, align Section 7 of this document with the template.

1. Scope

System	Risk Tier	Monitoring Obligation	Notes
SYS-01 — Legal Research Assistant	Tier 3 Limited-risk	Good practice monitoring	No Article 72 mandate
SYS-02 — Document Drafting Tool	Tier 3 Limited-risk	Good practice monitoring	No Article 72 mandate
SYS-03 — Document Summarisation Tool	Tier 3 Limited-risk	Good practice monitoring	No Article 72 mandate
SYS-04 — Legal Analysis Tool	Tier 2 High-risk	Article 72 mandatory post-market monitoring	Full monitoring plan required

2. Monitoring Architecture

2.1 Data Collection Sources

Per Article 72(2), monitoring data may be provided by deployers or collected from other sources. [ASSUMPTION] Pickles GmbH collects monitoring data from:

Source	Data Type	Collection Method
Platform event logs (Article 12)	Session events, errors, latency	Automated — system-generated [ASSUMPTION]
User feedback mechanism (in-product)	User-reported errors, inaccuracies, complaints	In-product flag/report button [ASSUMPTION]
Lawyer client reports	Client-escalated accuracy concerns	Structured incident report form [ASSUMPTION]
Automated output sampling	Periodic automated quality checks	Sampling pipeline [ASSUMPTION]
Expert review panel	Independent legal accuracy assessment	Quarterly panel review [ASSUMPTION]
Model provider notifications	Model version changes, known issues	Provider update channel per L2-5.3

2.2 Monitoring Responsibilities

[ASSUMPTION]

Role	Monitoring Responsibility
AI Risk and Information Officer (AIRO)	Owns monitoring framework; reviews monthly dashboard; escalates to CEO
Head of Engineering	Operates technical monitoring tooling; reviews daily/weekly technical metrics
Head of Product	Reviews output quality and user experience metrics; owns expert review panel
DPO	Reviews data-related metrics; receives breach-relevant flags
Client Success	Aggregates and categorises client-reported concerns

3. Metrics Framework

3.1 Primary Metrics Table

#	Metric	Description	System Scope	Measurement Method	Frequency	Alert Threshold [ASSUMPTION]
M-01	Hallucination / factual error rate	Rate of AI outputs containing demonstrably false or unsupported legal statements	SYS-01, SYS-02, SYS-03, SYS-04	Expert review panel sampling (n=50 outputs per system per quarter); user-reported errors normalised to output volume	Quarterly (expert); ongoing (user-reported)	>2% expert-identified errors per sample triggers review; >5% triggers incident
M-02	Citation accuracy rate	Rate of legal citations (case references, article numbers, legislation) that are correctly identified and traceable	SYS-01, SYS-04	Automated citation verification against legal database [ASSUMPTION]; monthly sample audit	Monthly automated; quarterly audit	<95% accuracy triggers investigation; <90% triggers incident
M-03	User error report rate	Volume of user-submitted error/inaccuracy reports per 1,000 outputs	All systems	In-product reporting tool; normalised to output volume	Weekly	>5 reports per 1,000 outputs triggers product review
M-04	Output override / discard rate	Rate at which users actively discard or override AI outputs	All systems	User action logging (override/discard events)	Weekly	Increase of >20% week-on-week triggers investigation; may indicate quality degradation or reduced automation bias
M-05	Bias signal monitoring	Differential performance across document language (German/English), practice area, or document type	SYS-01, SYS-04	Stratified sampling by category; compare error rates across strata	Quarterly	Statistically significant performance gap between strata triggers bias investigation
M-06	System availability / uptime	Percentage of time system is available and responsive	All systems	Infrastructure monitoring	Continuous	<99.5% monthly uptime triggers engineering review
M-07	Latency — P95 response time	95th percentile response time for standard queries	All systems	Infrastructure monitoring	Continuous	>10 seconds P95 triggers engineering review [ASSUMPTION]
M-08	Model drift indicator	Detected change in output distribution, style, or behaviour without an authorised model update	SYS-04	Automated comparison of output embedding distribution against baseline; additionally re-run on each model provider update notification	Continuous; reviewed monthly	Any statistically significant drift not attributable to an authorised update triggers L3-6.3 change management
M-09	Complaint volume and classification	Number and category of formal client complaints relating to AI output quality	All systems	Client complaints log (Client Success); AIRO review	Monthly	>3 complaints per month in same category triggers root cause analysis
M-10	Out-of-scope use detection	Queries or use patterns outside the system's intended purpose	SYS-04	Log analysis for query patterns outside defined intended purpose categories [ASSUMPTION]	Monthly	Confirmed out-of-scope use pattern triggers client communication and Terms of Service review
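
The M-01 and M-02 thresholds above can be checked mechanically. The following is a minimal sketch, assuming the expert panel reports a simple error count per sample and the citation check reports a count of correct citations. The function names and return labels are hypothetical; the threshold values are the [ASSUMPTION] values from the table.

```python
from fractions import Fraction

# Alert thresholds copied from the M-01 and M-02 rows (all [ASSUMPTION]).
M01_REVIEW = Fraction(2, 100)        # more than 2% errors: review
M01_INCIDENT = Fraction(5, 100)      # more than 5% errors: incident
M02_INVESTIGATE = Fraction(95, 100)  # below 95% accuracy: investigation
M02_INCIDENT = Fraction(90, 100)     # below 90% accuracy: incident


def classify_m01(errors: int, sample_size: int) -> str:
    """Classify an expert-review sample by its error rate (hypothetical helper)."""
    if sample_size <= 0 or not 0 <= errors <= sample_size:
        raise ValueError("need sample_size > 0 and 0 <= errors <= sample_size")
    # Exact fractions avoid float rounding right at the threshold boundary.
    rate = Fraction(errors, sample_size)
    if rate > M01_INCIDENT:
        return "incident"
    if rate > M01_REVIEW:
        return "review"
    return "ok"


def classify_m02(correct: int, checked: int) -> str:
    """Classify a citation check by its accuracy (hypothetical helper)."""
    if checked <= 0 or not 0 <= correct <= checked:
        raise ValueError("need checked > 0 and 0 <= correct <= checked")
    accuracy = Fraction(correct, checked)
    if accuracy < M02_INCIDENT:
        return "incident"
    if accuracy < M02_INVESTIGATE:
        return "investigate"
    return "ok"
```

With the proposed sample of n=50, one error is exactly 2% and does not breach the review threshold, two errors (4%) trigger a review, and three errors (6%) trigger an incident. The thresholds are therefore coarse at this sample size, which is worth considering when validating them against baseline data.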

3.2 SYS-04 High-Risk Specific Metrics (Article 72 — Post-Market Monitoring)

In addition to M-01 through M-10, SYS-04 requires the following Article 72-specific monitoring to evaluate continuous compliance with Chapter III Section 2 requirements:

#	Metric	Chapter III Section 2 Requirement	Measurement Method
M-11	Risk management system effectiveness	Article 9 — ongoing risk management	Annual risk management review; compare identified risks against incident log
M-12	Human oversight compliance	Article 14 — users able to override, disregard, halt	Quarterly audit of override/discard rate (M-04); user competence assessment [ASSUMPTION]
M-13	Logging completeness	Article 12 — automatic event logging	Monthly log audit — confirm all session events are captured; no gaps
M-14	Accuracy declaration compliance	Article 15(3) — accuracy metrics declared in instructions for use	Quarterly comparison of actual accuracy (M-01, M-02) against declared metrics; flag if actual performance falls below declared level
M-15	Cybersecurity posture	Article 15(5) — resilience against adversarial inputs	Annual penetration test; quarterly review of prompt injection defence logs [ASSUMPTION]
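
The M-14 comparison of measured performance against declared metrics could be sketched as below. This is an illustration only: the `DeclaredMetric` type, the function name, and the idea of storing a direction flag per metric are assumptions, and the declared values must come from the instructions for use (L2-4.2 Section 2.7), not from this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeclaredMetric:
    """One declared accuracy metric (hypothetical structure)."""
    metric_id: str          # e.g. "M-01" or "M-02"
    declared: float         # value declared in the instructions for use
    higher_is_better: bool  # True for accuracy rates, False for error rates


def flag_below_declared(declared_metrics, measured):
    """Return one finding per metric that is missing or worse than declared."""
    findings = []
    for metric in declared_metrics:
        actual = measured.get(metric.metric_id)
        if actual is None:
            findings.append(f"{metric.metric_id}: no measurement available")
            continue
        if metric.higher_is_better:
            worse = actual < metric.declared
        else:
            worse = actual > metric.declared
        if worse:
            findings.append(
                f"{metric.metric_id}: measured {actual} is worse than declared {metric.declared}"
            )
    return findings
```

The direction flag matters because M-01 is an error rate (lower is better) while M-02 is an accuracy rate (higher is better). Treating a missing measurement as a finding keeps a skipped quarterly review from passing silently.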

4. Monitoring Frequency and Reporting Schedule

Frequency	Activities	Output	Recipient [ASSUMPTION]
Continuous	Uptime, latency, error alerts (M-06, M-07), drift detection (M-08)	Real-time alerts	Head of Engineering
Weekly	User error report rate (M-03), override/discard rate (M-04)	Weekly metrics digest	Head of Product, Head of Engineering
Monthly	Citation accuracy (M-02 automated), complaint classification (M-09), out-of-scope use (M-10), logging audit (M-13)	Monthly monitoring report	AIRO, DPO (data-related metrics)
Quarterly	Hallucination/error rate (M-01 expert review), bias signals (M-05), human oversight compliance (M-12), accuracy declaration compliance (M-14)	Quarterly performance report	AIRO, CEO, Head of Product
Annual	Risk management system effectiveness (M-11), cybersecurity posture (M-15), full post-market monitoring plan review	Annual AI system review	CEO, AIRO, DPO, Board [ASSUMPTION]
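
The weekly digest calculations for M-03 and M-04 can be sketched as follows. This sketch assumes the >20% week-on-week threshold for M-04 means a relative increase in the override/discard rate, which the metric definition does not state explicitly and should be confirmed. Function names and flag labels are hypothetical.

```python
from typing import List, Optional

M03_THRESHOLD = 5.0   # user error reports per 1,000 outputs [ASSUMPTION]
M04_THRESHOLD = 0.20  # relative week-on-week increase [ASSUMPTION]


def reports_per_thousand(reports: int, outputs: int) -> float:
    """M-03: user error reports normalised to 1,000 outputs."""
    if outputs <= 0:
        raise ValueError("outputs must be positive")
    return 1000 * reports / outputs


def relative_change(previous: float, current: float) -> Optional[float]:
    """Relative change between two rates; None if there is no prior baseline."""
    if previous == 0:
        return None
    return (current - previous) / previous


def weekly_flags(reports: int, overrides: int, outputs: int,
                 previous_override_rate: float) -> List[str]:
    """Return the weekly alerts raised by M-03 and M-04 for one system."""
    flags = []
    if reports_per_thousand(reports, outputs) > M03_THRESHOLD:
        flags.append("M-03: product review")
    change = relative_change(previous_override_rate, overrides / outputs)
    if change is not None and change > M04_THRESHOLD:
        flags.append("M-04: investigation")
    return flags
```

For example, 12 reports and 300 overrides across 2,000 outputs, against a prior override rate of 0.12, gives 6 reports per 1,000 and a 25% relative increase, so both flags are raised.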

5. Dashboard Design

[ASSUMPTION] Pickles GmbH's monitoring dashboard aggregates the above metrics into a single view accessible to the AIRO and senior leadership. Recommended dashboard structure:

5.1 Dashboard Sections

Section A: System Health (Real-time)
- Uptime status per system (RAG: green/amber/red)
- P95 latency per system
- Active incident count (links to L3-6.2 incident log)

Section B: Output Quality (Weekly/Monthly)
- User error report rate trend (line chart, 13-week rolling)
- Override/discard rate trend (line chart, 13-week rolling)
- Complaint volume by category (bar chart, monthly)
- Citation accuracy rate (SYS-01, SYS-04), monthly trend

Section C: High-Risk Compliance (SYS-04)
- Last expert review date and result (M-01)
- Logging completeness status (M-13)
- Drift indicator status (M-08)
- Days since last model update and change management status

Section D: Governance
- Open action items from previous monitoring reports
- Next scheduled expert review date
- Assumptions requiring verification (link to ASSUMPTIONS-LOG.md)
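
The Section A health indicators could be derived as in the sketch below. It assumes uptime is computed from logged downtime minutes and that P95 uses the nearest-rank method; neither is specified in this document. The amber band (`AMBER_MARGIN`) is purely hypothetical, since only the 99.5% red threshold is defined in M-06.

```python
UPTIME_THRESHOLD = 99.5  # % monthly uptime, from M-06
AMBER_MARGIN = 0.1       # hypothetical warning band above the threshold


def uptime_percent(downtime_minutes: float, days_in_month: int) -> float:
    """Monthly uptime as a percentage of total minutes in the month."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def uptime_rag(percent: float) -> str:
    """Map monthly uptime to a red/amber/green status."""
    if percent < UPTIME_THRESHOLD:
        return "red"
    if percent < UPTIME_THRESHOLD + AMBER_MARGIN:
        return "amber"
    return "green"


def p95(latencies):
    """95th percentile by the nearest-rank method."""
    if not latencies:
        raise ValueError("latencies must not be empty")
    ordered = sorted(latencies)
    # Integer ceiling of 0.95 * n, which avoids float rounding issues.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

In a 30-day month, 216 minutes of downtime corresponds to exactly 99.5% uptime, which sits on the threshold rather than below it. The P95 result can then be compared against the 10-second M-07 threshold.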

6. Escalation Thresholds and Incident Triggers

When a monitoring metric breaches its alert threshold, the following escalation applies:

Trigger	Immediate Action	Escalation Path
M-01 >5% error rate in expert sample	Pause deployment of affected system pending investigation	Head of Engineering → AIRO → CEO; activate L3-6.2
M-02 citation accuracy <90%	Investigate root cause; suspend marketing claims about accuracy	Head of Product → AIRO
M-08 confirmed unauthorised model drift	Activate L3-6.3 change management; assess Article 20 / Article 73 obligations	Head of Engineering → AIRO → Legal
Any event meeting the Article 3(49) serious incident definition	Activate L3-6.2 Incident Response Playbook immediately	AIRO → CEO → Legal → Article 73 reporting
GDPR data breach detected	Activate L3-6.2 GDPR breach channel; 72-hour GDPR Article 33 clock starts	DPO → CEO → Legal
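
The escalation table above lends itself to a simple lookup so that alerting tooling and the playbook stay aligned. This is a sketch under stated assumptions: the trigger keys and the `Escalation` type are hypothetical names, while the actions and paths are copied from the table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Escalation:
    """Immediate action and ordered escalation path for one trigger."""
    action: str
    path: Tuple[str, ...]


# Trigger keys are hypothetical identifiers; content mirrors the table.
ESCALATIONS = {
    "M-01_INCIDENT": Escalation(
        "Pause deployment of affected system pending investigation; activate L3-6.2",
        ("Head of Engineering", "AIRO", "CEO"),
    ),
    "M-02_INCIDENT": Escalation(
        "Investigate root cause; suspend marketing claims about accuracy",
        ("Head of Product", "AIRO"),
    ),
    "M-08_DRIFT": Escalation(
        "Activate L3-6.3 change management; assess Article 20 / Article 73 obligations",
        ("Head of Engineering", "AIRO", "Legal"),
    ),
    "SERIOUS_INCIDENT": Escalation(
        "Activate L3-6.2 Incident Response Playbook immediately",
        ("AIRO", "CEO", "Legal", "Article 73 reporting"),
    ),
    "GDPR_BREACH": Escalation(
        "Activate L3-6.2 GDPR breach channel; 72-hour GDPR Article 33 clock starts",
        ("DPO", "CEO", "Legal"),
    ),
}


def escalate(trigger: str) -> Escalation:
    """Look up the escalation for a trigger; unknown triggers fail loudly."""
    if trigger not in ESCALATIONS:
        raise ValueError(f"unknown trigger: {trigger}")
    return ESCALATIONS[trigger]
```

Raising on an unknown trigger is a deliberate choice in this sketch: an alert that cannot be routed should surface as an error rather than be dropped.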

7. Post-Market Monitoring Plan — SYS-04 (Article 72)

This section constitutes the post-market monitoring plan for SYS-04 required by Article 72(3). It must be incorporated into the SYS-04 Technical Documentation Pack (L2-4.2 Section 9).

Element	Content
Monitoring objective	Evaluate continuous compliance of SYS-04 with Chapter III Section 2 requirements throughout its operational lifetime
Data actively collected	Metrics M-01 through M-15 as defined in Section 3; event logs per Article 12
Data from deployers	Client-reported errors (M-03, M-09); client-confirmed out-of-scope use reports (M-10) [ASSUMPTION]
Review frequency	Per Section 4 schedule; annual comprehensive review
Compliance evaluation trigger	Any metric breach (Section 6) triggers compliance re-evaluation against Chapter III Section 2
Corrective action link	Metric breaches trigger L3-6.2 (incident) or L3-6.3 (change management) as appropriate
Interaction with other AI systems	[ASSUMPTION] SYS-04 does not currently interact with other AI systems — confirm against real architecture; update if integration occurs
Plan update trigger	Material change to SYS-04 (L3-6.3); change in regulatory requirements; annual review

8. Cross-References

Document	Relevance
L2-4.1 — EU AI Act Risk Mapping Matrix	Risk classification determining Article 72 scope
L2-4.2 — Technical Documentation Pack	Section 9 — post-market monitoring plan; Section 2.7 — accuracy metrics declared
L3-6.2 — Incident Response Playbook	Activated when monitoring identifies an Article 3(49) serious incident or a GDPR data breach
L3-6.3 — Model Change Management Protocol	Activated when monitoring detects drift or change event
L2-5.3 — Vendor Risk Assessment	Model provider update notifications feed into M-08

Document Control

Field	Detail
Document ID	L3-6.1
Next review	Annually, or when the SYS-04 model changes materially
Regulatory basis	EU AI Act Articles 9, 12, 15, 17(1)(h), 72; ISO/IEC 42001 Clauses 9.1–9.3
Assumptions relied upon	A-001, A-004, A-005, A-009