AI Monitoring Framework
- Project: Pickles GmbH - AI Governance Framework
- Stage: Stage 4 - Monitoring & Operational Controls
- Status: Draft
- Version: v1
- Date: 2026-02-26
- Assumptions: Built on outline assumptions; not verified against real Pickles GmbH data
Purpose
This document defines Pickles GmbH's operational monitoring framework for all deployed AI systems. It fulfils the post-market monitoring system requirement under EU AI Act Article 72 for SYS-04 (high-risk), and establishes good-practice monitoring for SYS-01 through SYS-03 (limited-risk).
Regulatory basis:
- EU AI Act Article 72: Post-market monitoring system (mandatory for SYS-04)
- EU AI Act Article 9(2)(c): Risk management system as a continuous iterative process, including data from post-market monitoring
- EU AI Act Article 15: Accuracy, robustness, and cybersecurity performance targets
- EU AI Act Article 17(1)(h): Quality management system must include post-market monitoring
- EU AI Act Article 12: Logging requirements for high-risk AI systems
- ISO/IEC 42001 Clauses 9.1, 9.2, 9.3: Performance evaluation
[ASSUMPTION] All metrics, thresholds, and measurement methods in this document are proposed based on the assumed product architecture. They must be validated against real system capabilities and real baseline performance data before operational use.
[LEGAL REVIEW REQUIRED] Article 72(3) requires the post-market monitoring plan for SYS-04 to form part of the Annex IV technical documentation (L2-4.2 Section 9). The Commission implementing act establishing the template for that plan was due by 2 February 2026 — check whether it has been published and update accordingly.
1. Scope
| System | Risk Tier | Monitoring Obligation | Notes |
|---|---|---|---|
| SYS-01 — Legal Research Assistant | Tier 3 Limited-risk | Good practice monitoring | No Article 72 mandate |
| SYS-02 — Document Drafting Tool | Tier 3 Limited-risk | Good practice monitoring | No Article 72 mandate |
| SYS-03 — Document Summarisation Tool | Tier 3 Limited-risk | Good practice monitoring | No Article 72 mandate |
| SYS-04 — Legal Analysis Tool | Tier 2 High-risk | Article 72 mandatory post-market monitoring | Full monitoring plan required |
2. Monitoring Architecture
2.1 Data Collection Sources
Per Article 72(2), monitoring data may be provided by deployers or collected from other sources. [ASSUMPTION] Pickles GmbH collects monitoring data from:
| Source | Data Type | Collection Method |
|---|---|---|
| Platform event logs (Article 12) | Session events, errors, latency | Automated — system-generated [ASSUMPTION] |
| User feedback mechanism (in-product) | User-reported errors, inaccuracies, complaints | In-product flag/report button [ASSUMPTION] |
| Lawyer client reports | Client-escalated accuracy concerns | Structured incident report form [ASSUMPTION] |
| Automated output sampling | Periodic automated quality checks | Sampling pipeline [ASSUMPTION] |
| Expert review panel | Independent legal accuracy assessment | Quarterly panel review [ASSUMPTION] |
| Model provider notifications | Model version changes, known issues | Provider update channel per L2-5.3 |
2.2 Monitoring Responsibilities
[ASSUMPTION]
| Role | Monitoring Responsibility |
|---|---|
| AI Risk and Information Officer (AIRO) | Owns monitoring framework; reviews monthly dashboard; escalates to CEO |
| Head of Engineering | Operates technical monitoring tooling; reviews daily/weekly technical metrics |
| Head of Product | Reviews output quality and user experience metrics; owns expert review panel |
| DPO | Reviews data-related metrics; receives breach-relevant flags |
| Client Success | Aggregates and categorises client-reported concerns |
3. Metrics Framework
3.1 Primary Metrics Table
| # | Metric | Description | System Scope | Measurement Method | Frequency | Alert Threshold [ASSUMPTION] |
|---|---|---|---|---|---|---|
| M-01 | Hallucination / factual error rate | Rate of AI outputs containing demonstrably false or unsupported legal statements | SYS-01, SYS-02, SYS-03, SYS-04 | Expert review panel sampling (n=50 outputs per system per quarter); user-reported errors normalised to output volume | Quarterly (expert); ongoing (user-reported) | >2% expert-identified errors per sample (2 or more of 50) triggers review; >5% (3 or more of 50) triggers incident |
| M-02 | Citation accuracy rate | Rate of legal citations (case references, article numbers, legislation) that are correctly identified and traceable | SYS-01, SYS-04 | Automated citation verification against legal database [ASSUMPTION]; quarterly sample audit | Monthly automated; quarterly audit | <95% accuracy triggers investigation; <90% triggers incident |
| M-03 | User error report rate | Volume of user-submitted error/inaccuracy reports per 1,000 outputs | All systems | In-product reporting tool; normalised to output volume | Weekly | >5 reports per 1,000 outputs triggers product review |
| M-04 | Output override / discard rate | Rate at which users actively discard or override AI outputs | All systems | User action logging (override/discard events) | Weekly | Increase of more than 20% week-on-week triggers investigation; a rise may indicate degraded output quality or, more benignly, reduced automation bias among users |
| M-05 | Bias signal monitoring | Differential performance across document language (German/English), practice area, or document type | SYS-01, SYS-04 | Stratified sampling by category; compare error rates across strata | Quarterly | Statistically significant performance gap between strata triggers bias investigation |
| M-06 | System availability / uptime | Percentage of time system is available and responsive | All systems | Infrastructure monitoring | Continuous | <99.5% monthly uptime triggers engineering review |
| M-07 | Latency — P95 response time | 95th percentile response time for standard queries | All systems | Infrastructure monitoring | Continuous | >10 seconds P95 triggers engineering review [ASSUMPTION] |
| M-08 | Model drift indicator | Detected change in output distribution, style, or behaviour without an authorised model update | SYS-04 | Automated comparison of output embedding distribution against baseline; triggered on model provider update notifications | Continuous; reviewed monthly | Any statistically significant drift not attributable to an authorised update triggers L3-6.3 change management |
| M-09 | Complaint volume and classification | Number and category of formal client complaints relating to AI output quality | All systems | Client complaints log (Client Success); AIRO review | Monthly | >3 complaints per month in same category triggers root cause analysis |
| M-10 | Out-of-scope use detection | Queries or use patterns outside the system's intended purpose | SYS-04 | Log analysis for query patterns outside defined intended purpose categories [ASSUMPTION] | Monthly | Confirmed out-of-scope use pattern triggers client communication and Terms of Service review |
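
The alert thresholds for M-01 through M-04 reduce to simple comparisons, sketched below. This is an illustrative sketch only: the function names are hypothetical, and reading the M-04 "20%" as a relative increase of the rate (rather than 20 percentage points) is an assumption that the source table does not settle.

```python
"""Sketch: evaluating M-01 to M-04 readings against the proposed thresholds."""


def expert_sample_status(errors: int, sample_size: int = 50) -> str:
    """M-01: more than 2% errors in the expert sample is a review, more than 5% an incident."""
    # Integer arithmetic avoids floating-point edge cases at exactly 2% or 5%.
    if errors * 100 > 5 * sample_size:
        return "incident"
    if errors * 100 > 2 * sample_size:
        return "review"
    return "ok"


def citation_status(correct: int, checked: int) -> str:
    """M-02: below 95% accuracy is an investigation, below 90% an incident."""
    if correct * 100 < 90 * checked:
        return "incident"
    if correct * 100 < 95 * checked:
        return "investigation"
    return "ok"


def reports_per_thousand(reports: int, outputs: int) -> float:
    """M-03: user error reports normalised to 1,000 outputs."""
    return 1000 * reports / outputs


def override_rate_jump(previous_rate: float, current_rate: float) -> bool:
    """M-04: True when the override/discard rate rose by more than 20% week-on-week.

    Assumption: "20%" is a relative increase of the rate, not 20 percentage points.
    """
    if previous_rate <= 0:
        # Assumption: any overrides after a week with none count as a jump.
        return current_rate > 0
    return (current_rate - previous_rate) / previous_rate > 0.20
```

With the proposed sample of 50 outputs, one expert-identified error (exactly 2%) stays below the review threshold, two errors (4%) trigger a review, and three errors (6%) trigger an incident. Thresholds this coarse are worth keeping in mind when the sample size is validated against real baseline data.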
3.2 SYS-04 High-Risk Specific Metrics (Article 72 — Post-Market Monitoring)
In addition to M-01 through M-10, SYS-04 requires the following Article 72-specific monitoring to evaluate continuous compliance with Chapter III Section 2 requirements:
| # | Metric | Chapter III Section 2 Requirement | Measurement Method |
|---|---|---|---|
| M-11 | Risk management system effectiveness | Article 9 — ongoing risk management | Annual risk management review; compare identified risks against incident log |
| M-12 | Human oversight compliance | Article 14 — users able to override, disregard, halt | Quarterly audit of override/discard rate (M-04); user competence assessment [ASSUMPTION] |
| M-13 | Logging completeness | Article 12 — automatic event logging | Monthly log audit — confirm all session events are captured; no gaps |
| M-14 | Accuracy declaration compliance | Article 15(3) — accuracy metrics declared in instructions for use | Quarterly comparison of actual accuracy (M-01, M-02) against declared metrics; flag if actual performance falls below declared level |
| M-15 | Cybersecurity posture | Article 15(5) — resilience against adversarial inputs | Annual penetration test; quarterly review of prompt injection defence logs [ASSUMPTION] |
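
The M-14 comparison has one subtlety worth making explicit: some declared metrics are better when higher (citation accuracy) and others when lower (error rate). The sketch below shows one way to handle both directions. The `DeclaredMetric` type, the metric names, and the declared values are hypothetical placeholders, not figures from the instructions for use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeclaredMetric:
    name: str
    declared: float         # level stated in the instructions for use (placeholder)
    higher_is_better: bool  # True for accuracy-style metrics, False for error rates


def shortfalls(declared: list[DeclaredMetric], measured: dict[str, float]) -> list[str]:
    """Return the names of metrics whose measured value is worse than declared."""
    flagged = []
    for metric in declared:
        actual = measured[metric.name]
        if metric.higher_is_better:
            worse = actual < metric.declared
        else:
            worse = actual > metric.declared
        if worse:
            flagged.append(metric.name)
    return flagged


# Hypothetical declared levels, for illustration only.
DECLARED = [
    DeclaredMetric("citation_accuracy", 0.95, higher_is_better=True),
    DeclaredMetric("expert_error_rate", 0.02, higher_is_better=False),
]

quarter = {"citation_accuracy": 0.93, "expert_error_rate": 0.02}
flagged = shortfalls(DECLARED, quarter)
```

Here only `citation_accuracy` is flagged: 0.93 falls below the declared 0.95, while an error rate equal to the declared level does not count as a shortfall.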
4. Monitoring Frequency and Reporting Schedule
| Frequency | Activities | Output | Recipient [ASSUMPTION] |
|---|---|---|---|
| Continuous | Uptime, latency, error alerts (M-06, M-07), drift detection (M-08) | Real-time alerts | Head of Engineering |
| Weekly | User error report rate (M-03), override/discard rate (M-04) | Weekly metrics digest | Head of Product, Head of Engineering |
| Monthly | Citation accuracy (M-02 automated), drift indicator review (M-08), complaint classification (M-09), out-of-scope use (M-10), logging audit (M-13) | Monthly monitoring report | AIRO, DPO (data-related metrics) |
| Quarterly | Hallucination/error rate (M-01 expert review), citation accuracy audit (M-02), bias signals (M-05), human oversight compliance (M-12), accuracy declaration compliance (M-14) | Quarterly performance report | AIRO, CEO, Head of Product |
| Annual | Risk management system effectiveness (M-11), cybersecurity posture (M-15), full post-market monitoring plan review | Annual AI system review | CEO, AIRO, DPO, Board [ASSUMPTION] |
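
The quarterly bias signal check (M-05) depends on deciding when a gap between strata is "statistically significant", and the framework does not name a test. The sketch below uses a two-proportion z-test as one possible choice. The test, the 0.05 significance level, and the function names are all assumptions. With strata as small as those produced by a 50-output sample, an exact test may be more appropriate.

```python
from math import erfc, sqrt


def two_proportion_p_value(errors_a: int, n_a: int, errors_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in error rates between two strata."""
    p_a = errors_a / n_a
    p_b = errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        # Both strata have 0% or 100% errors: no detectable difference.
        return 1.0
    z = (p_a - p_b) / standard_error
    # For a standard normal variable, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))


def bias_signal(errors_a: int, n_a: int, errors_b: int, n_b: int, alpha: float = 0.05) -> bool:
    """True when the gap between strata is significant at level alpha (assumed 0.05)."""
    return two_proportion_p_value(errors_a, n_a, errors_b, n_b) < alpha
```

For example, `bias_signal(20, 100, 5, 100)` compares a 20% error rate on one stratum (say, German-language documents) against 5% on another and reports a significant gap, while identical rates on both strata do not.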
5. Dashboard Design
[ASSUMPTION] Pickles GmbH's monitoring dashboard aggregates the above metrics into a single view accessible to the AIRO and senior leadership. Recommended dashboard structure:
5.1 Dashboard Sections
Section A: System Health (Real-time)
- Uptime status per system (RAG: green/amber/red)
- P95 latency per system
- Active incident count (links to L3-6.2 incident log)

Section B: Output Quality (Weekly/Monthly)
- User error report rate trend (line chart, 13-week rolling)
- Override/discard rate trend (line chart, 13-week rolling)
- Complaint volume by category (bar chart, monthly)
- Citation accuracy rate (SYS-01, SYS-04), monthly trend

Section C: High-Risk Compliance (SYS-04)
- Last expert review date and result (M-01)
- Logging completeness status (M-13)
- Drift indicator status (M-08)
- Days since last model update and change management status

Section D: Governance
- Open action items from previous monitoring reports
- Next scheduled expert review date
- Assumptions requiring verification (link to ASSUMPTIONS-LOG.md)
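
To make the uptime RAG status in Section A concrete, the sketch below converts the 99.5% monthly uptime threshold (M-06) into a downtime budget and a status colour. The framework defines only the 99.5% threshold. The amber band (`amber_margin`, here 0.1 percentage points above the target) and the function names are hypothetical and would need to be agreed with Engineering.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(days_in_month: int, target: float = 0.995) -> float:
    """Minutes of downtime a month can absorb before uptime drops below target."""
    return days_in_month * MINUTES_PER_DAY * (1 - target)


def uptime_fraction(downtime_minutes: float, days_in_month: int) -> float:
    """Observed uptime as a fraction of the month."""
    return 1 - downtime_minutes / (days_in_month * MINUTES_PER_DAY)


def uptime_rag(uptime: float, target: float = 0.995, amber_margin: float = 0.001) -> str:
    """Red below target; amber within amber_margin above it (assumed band); else green."""
    if uptime < target:
        return "red"
    if uptime < target + amber_margin:
        return "amber"
    return "green"
```

For a 30-day month the budget is about 216 minutes (3.6 hours). Under the assumed amber band, 100 minutes of downtime shows green, 200 minutes amber, and 300 minutes red.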
6. Escalation Thresholds and Incident Triggers
When a monitoring metric breaches its alert threshold, the following escalation applies:
| Trigger | Immediate Action | Escalation Path |
|---|---|---|
| M-01 >5% error rate in expert sample | Pause deployment of affected system pending investigation | Head of Engineering → AIRO → CEO; activate L3-6.2 |
| M-02 citation accuracy <90% | Investigate root cause; suspend marketing claims about accuracy | Head of Product → AIRO |
| M-08 confirmed unauthorised model drift | Activate L3-6.3 change management; assess Article 20 / Article 73 obligations | Head of Engineering → AIRO → Legal |
| Any monitored event meeting the Article 3(49) serious incident definition | Activate L3-6.2 Incident Response Playbook immediately | AIRO → CEO → Legal → Article 73 reporting |
| GDPR data breach detected | Activate L3-6.2 GDPR breach channel; 72-hour GDPR Article 33 clock starts | DPO → CEO → Legal |
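
If alerts are routed automatically, the escalation table above could be held as data rather than embedded in alerting code, so that changes to the table do not require code changes. The sketch below is one possible encoding. The trigger keys, the `Escalation` type, and the `route` function are hypothetical names; the actions and paths are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    immediate_action: str
    path: tuple[str, ...]  # escalation steps in order


# Trigger keys are illustrative identifiers, not names used by any real system.
ESCALATIONS = {
    "m01_error_rate_above_5pct": Escalation(
        "Pause deployment of affected system pending investigation; activate L3-6.2",
        ("Head of Engineering", "AIRO", "CEO"),
    ),
    "m02_citation_accuracy_below_90pct": Escalation(
        "Investigate root cause; suspend marketing claims about accuracy",
        ("Head of Product", "AIRO"),
    ),
    "m08_unauthorised_drift": Escalation(
        "Activate L3-6.3 change management; assess Article 20 / Article 73 obligations",
        ("Head of Engineering", "AIRO", "Legal"),
    ),
    "serious_incident": Escalation(
        "Activate L3-6.2 Incident Response Playbook immediately",
        ("AIRO", "CEO", "Legal", "Article 73 reporting"),
    ),
    "gdpr_breach": Escalation(
        "Activate L3-6.2 GDPR breach channel; 72-hour GDPR Article 33 clock starts",
        ("DPO", "CEO", "Legal"),
    ),
}


def route(trigger: str) -> Escalation:
    """Look up the escalation for a trigger; unknown triggers fail loudly."""
    try:
        return ESCALATIONS[trigger]
    except KeyError:
        raise ValueError(f"Unknown escalation trigger: {trigger!r}") from None
```

Raising on an unknown trigger, instead of falling back to a default path, is a deliberate choice in this sketch: a silently misrouted alert is harder to notice than a failed lookup.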
7. Post-Market Monitoring Plan — SYS-04 (Article 72)
This section constitutes the post-market monitoring plan for SYS-04 required by Article 72(3). It must be incorporated into the SYS-04 Technical Documentation Pack (L2-4.2 Section 9).
| Element | Content |
|---|---|
| Monitoring objective | Evaluate continuous compliance of SYS-04 with Chapter III Section 2 requirements throughout its operational lifetime |
| Data actively collected | Metrics M-01 through M-15 as defined in Section 3; event logs per Article 12 |
| Data from deployers | Client-reported errors (M-03, M-09); client-confirmed out-of-scope use reports (M-10) [ASSUMPTION] |
| Review frequency | Per Section 4 schedule; annual comprehensive review |
| Compliance evaluation trigger | Any metric breach (Section 6) triggers compliance re-evaluation against Chapter III Section 2 |
| Corrective action link | Metric breaches trigger L3-6.2 (incident) or L3-6.3 (change management) as appropriate |
| Interaction with other AI systems | [ASSUMPTION] SYS-04 does not currently interact with other AI systems — confirm against real architecture; update if integration occurs |
| Plan update trigger | Material change to SYS-04 (L3-6.3); change in regulatory requirements; annual review |
8. Cross-References
| Document | Relevance |
|---|---|
| L2-4.1 — EU AI Act Risk Mapping Matrix | Risk classification determining Article 72 scope |
| L2-4.2 — Technical Documentation Pack | Section 9 — post-market monitoring plan; Section 2.7 — accuracy metrics declared |
| L3-6.2 — Incident Response Playbook | Activated when monitoring triggers Article 3(49) or GDPR breach |
| L3-6.3 — Model Change Management Protocol | Activated when monitoring detects drift or change event |
| L2-5.3 — Vendor Risk Assessment | Model provider update notifications feed into M-08 |
Document Control
| Field | Detail |
|---|---|
| Document ID | L3-6.1 |
| Next review | Annual; or when SYS-04 model changes materially |
| Regulatory basis | EU AI Act Articles 9, 12, 15, 17(1)(h), 72; ISO/IEC 42001 Clauses 9.1–9.3 |
| Assumptions relied upon | A-001, A-004, A-005, A-009 |