
AI Monitoring Framework

Project: Pickles GmbH — AI Governance Framework
Stage: Stage 4 — Monitoring & Operational Controls
Status: Draft
Version: v1
Date: 2026-02-26
Assumptions: Built on outline assumptions — not verified against real Pickles GmbH data


Purpose

This document defines Pickles GmbH's operational monitoring framework for all deployed AI systems. It fulfils the post-market monitoring system requirement under EU AI Act Article 72 for SYS-04 (high-risk), and establishes good-practice monitoring for SYS-01 through SYS-03 (limited-risk).

Regulatory basis:
- EU AI Act Article 72 — Post-market monitoring system (mandatory for SYS-04)
- EU AI Act Article 9(2)(c) — Risk management system: continuous iterative process including data from post-market monitoring
- EU AI Act Article 15 — Accuracy, robustness, and cybersecurity performance targets
- EU AI Act Article 17(1)(h) — Quality management system must include post-market monitoring
- EU AI Act Article 12 — Logging requirements for high-risk AI systems
- ISO/IEC 42001 Clauses 9.1, 9.2, 9.3 — Performance evaluation

[ASSUMPTION] All metrics, thresholds, and measurement methods in this document are proposed based on the assumed product architecture. They must be validated against real system capabilities and real baseline performance data before operational use.

[LEGAL REVIEW REQUIRED] Article 72(3) requires the post-market monitoring plan for SYS-04 to form part of the Annex IV technical documentation (L2-4.2 Section 9). The Commission implementing act establishing the template for that plan was due by 2 February 2026 — check whether it has been published and update accordingly.


1. Scope

| System | Risk Tier | Monitoring Obligation | Notes |
|---|---|---|---|
| SYS-01 — Legal Research Assistant | Tier 3 (limited-risk) | Good-practice monitoring | No Article 72 mandate |
| SYS-02 — Document Drafting Tool | Tier 3 (limited-risk) | Good-practice monitoring | No Article 72 mandate |
| SYS-03 — Document Summarisation Tool | Tier 3 (limited-risk) | Good-practice monitoring | No Article 72 mandate |
| SYS-04 — Legal Analysis Tool | Tier 2 (high-risk) | Article 72 mandatory post-market monitoring | Full monitoring plan required |

2. Monitoring Architecture

2.1 Data Collection Sources

Per Article 72(2), monitoring data may be provided by deployers or collected from other sources. [ASSUMPTION] Pickles GmbH collects monitoring data from the following sources; a sketch of a normalised event record follows the table:

| Source | Data Type | Collection Method |
|---|---|---|
| Platform event logs (Article 12) | Session events, errors, latency | Automated — system-generated [ASSUMPTION] |
| User feedback mechanism (in-product) | User-reported errors, inaccuracies, complaints | In-product flag/report button [ASSUMPTION] |
| Lawyer client reports | Client-escalated accuracy concerns | Structured incident report form [ASSUMPTION] |
| Automated output sampling | Periodic automated quality checks | Sampling pipeline [ASSUMPTION] |
| Expert review panel | Independent legal accuracy assessment | Quarterly panel review [ASSUMPTION] |
| Model provider notifications | Model version changes, known issues | Provider update channel per L2-5.3 |
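
The sources above arrive in different shapes, so it helps to normalise each signal into a common event record before it feeds the metrics in Section 3. The sketch below is illustrative only; the MonitoringEvent fields and enumerated values are assumptions, not the real platform schema.

```python
# Illustrative only: field names and categories are assumptions, not the real schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class MonitoringEvent:
    """One normalised monitoring signal, regardless of its source."""
    system_id: str            # e.g. "SYS-04"
    source: str               # "platform_log" | "user_feedback" | "client_report" |
                              # "sampling" | "expert_panel" | "provider_notice"
    category: str             # maps to a metric, e.g. "M-03"
    severity: str = "info"    # "info" | "warning" | "incident_candidate"
    detail: str = ""
    output_id: Optional[str] = None   # links back to the Article 12 event log entry
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Example: a user pressing the in-product "report inaccuracy" button
evt = MonitoringEvent(system_id="SYS-01", source="user_feedback",
                      category="M-03", severity="warning",
                      detail="Cited case could not be located")
```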

2.2 Monitoring Responsibilities

[ASSUMPTION]

| Role | Monitoring Responsibility |
|---|---|
| AI Risk and Information Officer (AIRO) | Owns monitoring framework; reviews monthly dashboard; escalates to CEO |
| Head of Engineering | Operates technical monitoring tooling; reviews daily/weekly technical metrics |
| Head of Product | Reviews output quality and user experience metrics; owns expert review panel |
| DPO | Reviews data-related metrics; receives breach-relevant flags |
| Client Success | Aggregates and categorises client-reported concerns |

3. Metrics Framework

3.1 Primary Metrics Table

| # | Metric | Description | System Scope | Measurement Method | Frequency | Alert Threshold [ASSUMPTION] |
|---|---|---|---|---|---|---|
| M-01 | Hallucination / factual error rate | Rate of AI outputs containing demonstrably false or unsupported legal statements | SYS-01, SYS-02, SYS-03, SYS-04 | Expert review panel sampling (n=50 outputs per system per quarter); user-reported errors normalised to output volume | Quarterly (expert); ongoing (user-reported) | >2% expert-identified errors per sample triggers review; >5% triggers incident |
| M-02 | Citation accuracy rate | Rate of legal citations (case references, article numbers, legislation) that are correctly identified and traceable | SYS-01, SYS-04 | Automated citation verification against legal database [ASSUMPTION]; monthly sample audit | Monthly automated; quarterly audit | <95% accuracy triggers investigation; <90% triggers incident |
| M-03 | User error report rate | Volume of user-submitted error/inaccuracy reports per 1,000 outputs | All systems | In-product reporting tool; normalised to output volume | Weekly | >5 reports per 1,000 outputs triggers product review |
| M-04 | Output override / discard rate | Rate at which users actively discard or override AI outputs | All systems | User action logging (override/discard events) | Weekly | Significant increase (>20% week-on-week) triggers investigation — may indicate quality degradation or automation bias reduction |
| M-05 | Bias signal monitoring | Differential performance across document language (German/English), practice area, or document type | SYS-01, SYS-04 | Stratified sampling by category; compare error rates across strata | Quarterly | Statistically significant performance gap between strata triggers bias investigation |
| M-06 | System availability / uptime | Percentage of time system is available and responsive | All systems | Infrastructure monitoring | Continuous | <99.5% monthly uptime triggers engineering review |
| M-07 | Latency — P95 response time | 95th percentile response time for standard queries | All systems | Infrastructure monitoring | Continuous | >10 seconds P95 triggers engineering review [ASSUMPTION] |
| M-08 | Model drift indicator | Detected change in output distribution, style, or behaviour without an authorised model update | SYS-04 | Automated comparison of output embedding distribution against baseline; triggered on model provider update notifications | Continuous; reviewed monthly | Any statistically significant drift not attributable to an authorised update triggers L3-6.3 change management |
| M-09 | Complaint volume and classification | Number and category of formal client complaints relating to AI output quality | All systems | Client complaints log (Client Success); AIRO review | Monthly | >3 complaints per month in same category triggers root cause analysis |
| M-10 | Out-of-scope use detection | Queries or use patterns outside the system's intended purpose | SYS-04 | Log analysis for query patterns outside defined intended purpose categories [ASSUMPTION] | Monthly | Confirmed out-of-scope use pattern triggers client communication and Terms of Service review |
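
For M-02, the measurement method assumes automated verification against a legal database, which is not modelled here. The sketch below shows only the shape of such a check under simplified assumptions: citation-like strings are extracted with a rough pattern and compared against a locally maintained reference set. The regex, the reference set, and the example text are all illustrative.

```python
# Minimal sketch of the M-02 idea under assumptions: extract citation-like strings
# from an output and check them against a locally maintained reference set. The real
# verification source (a legal database) is not modelled here.
import re

CITATION_PATTERN = re.compile(r"(?:Article|Art\.)\s*\d+[a-z]?(?:\(\d+\))?", re.IGNORECASE)

def citation_accuracy(output_text: str, known_citations: set[str]) -> float:
    """Share of extracted citations found in the reference set (1.0 if none found)."""
    found = [c.strip() for c in CITATION_PATTERN.findall(output_text)]
    if not found:
        return 1.0
    verified = sum(1 for c in found if c in known_citations)
    return verified / len(found)

sample = "Per Article 72(2), data may be provided by deployers; see also Article 999."
reference = {"Article 72(2)", "Article 9(2)(c)", "Article 12"}
print(f"citation accuracy: {citation_accuracy(sample, reference):.0%}")  # 50%, below the 90% incident threshold
```

In practice the reference set would be replaced by the legal database lookup assumed in the table, and the sampled outputs would come from the monthly automated run.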

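For M-08, "statistically significant drift" needs a concrete test. One possible approach, sketched below under stated assumptions, is to summarise each output embedding by its distance to the baseline centroid and compare the baseline window against the current window with a two-sample Kolmogorov-Smirnov test. The embedding source, the summary statistic, and the 0.01 significance level are assumptions to be validated against real baseline data.

```python
# A rough sketch of one possible M-08 test, not the mandated method: each output
# embedding is summarised by its distance to the baseline centroid, and the two
# distance samples are compared with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

def drift_flag(baseline_emb: np.ndarray, current_emb: np.ndarray, alpha: float = 0.01) -> bool:
    """Return True if current outputs look distributionally different from the baseline."""
    centroid = baseline_emb.mean(axis=0)
    baseline_dist = np.linalg.norm(baseline_emb - centroid, axis=1)
    current_dist = np.linalg.norm(current_emb - centroid, axis=1)
    result = ks_2samp(baseline_dist, current_dist)
    return result.pvalue < alpha  # significant shift: route to L3-6.3 change management

# Simulated data only: a stable window versus one with an injected distribution shift.
rng = np.random.default_rng(0)
baseline = rng.normal(size=(500, 384))
stable_window = rng.normal(size=(250, 384))
drifted_window = rng.normal(loc=0.3, size=(250, 384))
print(drift_flag(baseline, stable_window), drift_flag(baseline, drifted_window))
```

Whatever test is adopted, the baseline window would need to be re-established after every authorised model update so that expected changes are not counted as drift.
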
3.2 SYS-04 High-Risk Specific Metrics (Article 72 — Post-Market Monitoring)

In addition to M-01 through M-10, SYS-04 requires the following Article 72-specific monitoring to evaluate continuous compliance with Chapter III Section 2 requirements (a logging-completeness sketch for M-13 follows the table):

| # | Metric | Chapter III Section 2 Requirement | Measurement Method |
|---|---|---|---|
| M-11 | Risk management system effectiveness | Article 9 — ongoing risk management | Annual risk management review; compare identified risks against incident log |
| M-12 | Human oversight compliance | Article 14 — users able to override, disregard, halt | Quarterly audit of override/discard rate (M-04); user competence assessment [ASSUMPTION] |
| M-13 | Logging completeness | Article 12 — automatic event logging | Monthly log audit — confirm all session events are captured; no gaps |
| M-14 | Accuracy declaration compliance | Article 15(3) — accuracy metrics declared in instructions for use | Quarterly comparison of actual accuracy (M-01, M-02) against declared metrics; flag if actual performance falls below declared level |
| M-15 | Cybersecurity posture | Article 15(5) — resilience against adversarial inputs | Annual penetration test; quarterly review of prompt injection defence logs [ASSUMPTION] |
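
For M-13, one way to make "no gaps" auditable is to check that per-session event sequence numbers are contiguous. The sketch below assumes the Article 12 event log carries such a sequence number, which must be confirmed against the real log format; a full audit would also reconcile session counts against platform usage records.

```python
# Sketch of one M-13 check under an assumed log format with per-session sequence numbers.
from collections import defaultdict

def find_log_gaps(events: list[dict]) -> dict[str, list[int]]:
    """Return, per session, the sequence numbers missing from the event log."""
    seen = defaultdict(set)
    for e in events:
        seen[e["session_id"]].add(e["seq"])
    gaps = {}
    for session, seqs in seen.items():
        expected = set(range(min(seqs), max(seqs) + 1))
        missing = sorted(expected - seqs)
        if missing:
            gaps[session] = missing
    return gaps

log = [{"session_id": "s1", "seq": n} for n in (1, 2, 4)] + [{"session_id": "s2", "seq": 1}]
print(find_log_gaps(log))   # {'s1': [3]} -> a gap to investigate in the monthly audit
```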

4. Monitoring Frequency and Reporting Schedule

| Frequency | Activities | Output | Recipient [ASSUMPTION] |
|---|---|---|---|
| Continuous | Uptime, latency, error alerts (M-06, M-07), drift detection (M-08) | Real-time alerts | Head of Engineering |
| Weekly | User error report rate (M-03), override/discard rate (M-04) | Weekly metrics digest | Head of Product, Head of Engineering |
| Monthly | Citation accuracy (M-02 automated), complaint classification (M-09), out-of-scope use (M-10), logging audit (M-13) | Monthly monitoring report | AIRO, DPO (data-related metrics) |
| Quarterly | Hallucination/error rate (M-01 expert review), bias signals (M-05), human oversight compliance (M-12), accuracy declaration compliance (M-14) | Quarterly performance report | AIRO, CEO, Head of Product |
| Annual | Risk management system effectiveness (M-11), cybersecurity posture (M-15), full post-market monitoring plan review | Annual AI system review | CEO, AIRO, DPO, Board [ASSUMPTION] |
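
The weekly digest figures for M-03 and M-04 are simple ratios, shown below for clarity. The counter names and example numbers are assumptions; real values come from the in-product reporting tool and the override/discard event log.

```python
# A small sketch of how the weekly digest figures for M-03 and M-04 could be derived.
def error_report_rate(reports: int, outputs: int) -> float:
    """M-03: user error reports per 1,000 outputs."""
    return 1000 * reports / outputs if outputs else 0.0

def override_rate_change(this_week: float, last_week: float) -> float:
    """M-04: week-on-week change in the override/discard rate, as a fraction."""
    return (this_week - last_week) / last_week if last_week else 0.0

m03 = error_report_rate(reports=12, outputs=1800)           # 6.7 per 1,000 -> above the 5 threshold
m04 = override_rate_change(this_week=0.31, last_week=0.24)  # ~29% increase -> above the 20% threshold
print(f"M-03: {m03:.1f}/1,000 outputs  M-04: {m04:+.0%} week-on-week")
```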

5. Dashboard Design

[ASSUMPTION] Pickles GmbH's monitoring dashboard aggregates the above metrics into a single view accessible to the AIRO and senior leadership. The recommended dashboard structure is set out below; a configuration sketch follows the section breakdown.

5.1 Dashboard Sections

Section A — System Health (Real-time)
- Uptime status per system (RAG: green/amber/red)
- P95 latency per system
- Active incident count (links to L3-6.2 incident log)

Section B — Output Quality (Weekly/Monthly)
- User error report rate trend (line chart, 13-week rolling)
- Override/discard rate trend (line chart, 13-week rolling)
- Complaint volume by category (bar chart, monthly)
- Citation accuracy rate (SYS-01, SYS-04) — monthly trend

Section C — High-Risk Compliance (SYS-04)
- Last expert review date and result (M-01)
- Logging completeness status (M-13)
- Drift indicator status (M-08)
- Days since last model update and change management status

Section D — Governance
- Open action items from previous monitoring reports
- Next scheduled expert review date
- Assumptions requiring verification (link to ASSUMPTIONS-LOG.md)
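
One way to keep the dashboard definition reviewable alongside this document is to express the section layout as configuration. The sketch below is a hypothetical layout only; the panel identifiers and refresh intervals are assumptions, not a configuration for any specific dashboard tool.

```python
# Hypothetical layout only: panel names and refresh intervals are assumptions.
DASHBOARD = {
    "A_system_health": {
        "refresh": "real-time",
        "panels": ["uptime_rag_per_system", "p95_latency_per_system", "active_incident_count"],
    },
    "B_output_quality": {
        "refresh": "weekly",
        "panels": ["m03_error_report_trend_13w", "m04_override_trend_13w",
                   "m09_complaints_by_category", "m02_citation_accuracy_trend"],
    },
    "C_high_risk_sys04": {
        "refresh": "monthly",
        "panels": ["m01_last_expert_review", "m13_logging_completeness",
                   "m08_drift_status", "days_since_model_update"],
    },
    "D_governance": {
        "refresh": "monthly",
        "panels": ["open_actions", "next_expert_review", "assumptions_pending_verification"],
    },
}
```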


6. Escalation Thresholds and Incident Triggers

When a monitoring metric breaches its alert threshold, the following escalation applies (a routing sketch follows the table):

| Trigger | Immediate Action | Escalation Path |
|---|---|---|
| M-01 error rate >5% in expert sample | Pause deployment of affected system pending investigation | Head of Engineering → AIRO → CEO; activate L3-6.2 |
| M-02 citation accuracy <90% | Investigate root cause; suspend marketing claims about accuracy | Head of Product → AIRO |
| M-08 confirmed unauthorised model drift | Activate L3-6.3 change management; assess Article 20 / Article 73 obligations | Head of Engineering → AIRO → Legal |
| Any event meeting the Article 3(49) serious incident definition | Activate L3-6.2 Incident Response Playbook immediately | AIRO → CEO → Legal → Article 73 reporting |
| GDPR data breach detected | Activate L3-6.2 GDPR breach channel; 72-hour GDPR Article 33 clock starts | DPO → CEO → Legal |
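
The escalation table can also be mirrored in tooling so that a threshold breach automatically produces the right notifications. The sketch below is a minimal illustration under assumptions; the trigger keys, the notify stub, and the rule structure are invented for this example and do not represent existing Pickles GmbH tooling.

```python
# Minimal sketch of Section 6 as code; trigger keys and notify() stub are assumptions.
ESCALATIONS = {
    "M-01_expert_error_gt_5pct": {
        "action": "Pause deployment of affected system pending investigation",
        "path": ["Head of Engineering", "AIRO", "CEO"],
        "playbook": "L3-6.2",
    },
    "M-02_citation_accuracy_lt_90pct": {
        "action": "Investigate root cause; suspend accuracy marketing claims",
        "path": ["Head of Product", "AIRO"],
        "playbook": None,
    },
    "M-08_unauthorised_drift": {
        "action": "Activate change management; assess Article 20 / Article 73 obligations",
        "path": ["Head of Engineering", "AIRO", "Legal"],
        "playbook": "L3-6.3",
    },
}

def escalate(trigger: str, notify=print) -> None:
    """Emit the immediate action, the escalation path, and any playbook to activate."""
    rule = ESCALATIONS[trigger]
    notify(f"ACTION: {rule['action']}")
    for role in rule["path"]:
        notify(f"ESCALATE TO: {role}")
    if rule["playbook"]:
        notify(f"ACTIVATE: {rule['playbook']}")

escalate("M-08_unauthorised_drift")
```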

7. Post-Market Monitoring Plan — SYS-04 (Article 72)

This section constitutes the post-market monitoring plan for SYS-04 required by Article 72(3). It must be incorporated into the SYS-04 Technical Documentation Pack (L2-4.2 Section 9).

| Element | Content |
|---|---|
| Monitoring objective | Evaluate continuous compliance of SYS-04 with Chapter III Section 2 requirements throughout its operational lifetime |
| Data actively collected | Metrics M-01 through M-15 as defined in Section 3; event logs per Article 12 |
| Data from deployers | Client-reported errors (M-03, M-09); client-confirmed out-of-scope use reports (M-10) [ASSUMPTION] |
| Review frequency | Per Section 4 schedule; annual comprehensive review |
| Compliance evaluation trigger | Any metric breach (Section 6) triggers compliance re-evaluation against Chapter III Section 2 |
| Corrective action link | Metric breaches trigger L3-6.2 (incident) or L3-6.3 (change management) as appropriate |
| Interaction with other AI systems | [ASSUMPTION] SYS-04 does not currently interact with other AI systems — confirm against real architecture; update if integration occurs |
| Plan update trigger | Material change to SYS-04 (L3-6.3); change in regulatory requirements; annual review |

8. Cross-References

| Document | Relevance |
|---|---|
| L2-4.1 — EU AI Act Risk Mapping Matrix | Risk classification determining Article 72 scope |
| L2-4.2 — Technical Documentation Pack | Section 9 — post-market monitoring plan; Section 2.7 — accuracy metrics declared |
| L3-6.2 — Incident Response Playbook | Activated when monitoring detects an Article 3(49) serious incident or a GDPR breach |
| L3-6.3 — Model Change Management Protocol | Activated when monitoring detects a drift or change event |
| L2-5.3 — Vendor Risk Assessment | Model provider update notifications feed into M-08 |

Document Control

| Field | Detail |
|---|---|
| Document ID | L3-6.1 |
| Next review | Annual; or when SYS-04 model changes materially |
| Regulatory basis | EU AI Act Articles 9, 12, 15, 17(1)(h), 72; ISO/IEC 42001 Clauses 9.1–9.3 |
| Assumptions relied upon | A-001, A-004, A-005, A-009 |