Model Change Management Protocol

Project: Pickles GmbH AI Governance Framework
Stage: Stage 4, Monitoring & Operational Controls
Status: Draft
Version: v1
Date: 2026-02-26
Assumptions: Built on outline assumptions; not verified against real Pickles GmbH data

Purpose

This protocol governs all changes to Pickles GmbH's AI systems — whether initiated internally (prompt redesign, retraining, architectural changes) or externally (third-party model provider updates). It ensures that:

Changes are tested before production deployment
Substantial modifications triggering new EU AI Act conformity assessments are identified before deployment
Clients are notified of changes affecting system behaviour
A rollback plan exists for every production change
The Technical Documentation Pack (L2-4.2) stays current

Regulatory basis:
- EU AI Act Article 3(23): definition of substantial modification
- EU AI Act Article 43(4): a substantial modification triggers a new conformity assessment
- EU AI Act Article 17(1)(a): the QMS must include procedures for the management of modifications
- EU AI Act Article 9(2): the risk management system is an iterative process (changes require risk re-evaluation)
- EU AI Act Article 6(4): the self-assessment must be updated if the classification changes
- ISO/IEC 42001 Clause 10: continual improvement and corrective action

[ASSUMPTION] The change governance roles and approval thresholds are based on the assumed organisational structure. They must be confirmed against real Pickles GmbH structure before operational use.

1. Change Classification Framework

Every proposed change must first be classified. Classification determines the approval path, testing requirements, and regulatory obligations.

1.1 Change Types

Type	Description	Examples
Type A — Prompt redesign	Changes to system prompts, instruction sets, or retrieval query logic — no model weight changes	Revised instruction wording; updated system prompt; modified retrieval parameters
Type B — Configuration change	Changes to system configuration, output formatting, safety filters, or operational parameters	Adjusting output length limit; enabling/disabling a feature; changing safety filter threshold
Type C — Third-party model update	New version of a third-party model API deployed by the provider	GPT-4o → GPT-4o-mini; Claude 3 → Claude 3.5; model version deprecation
Type D — Fine-tuning / retraining	Pickles GmbH fine-tunes, retrains, or trains a model on new data	Domain fine-tuning on new legal corpus; RLHF update
Type E — Architecture change	Changes to system architecture, infrastructure, or integration patterns	Switching from RAG to fine-tuning; replacing vector database; new API integration
Type F — Provider switch	Replacing the third-party AI model provider entirely	Switching from Provider A to Provider B

1.2 Substantial Modification Assessment

EU AI Act Article 3(23) defines a substantial modification as a change, made after the system is placed on the market or put into service, that:
1. was not foreseen or planned in the initial conformity assessment (see L2-4.2 Section 2.6, pre-determined changes); AND
2. affects compliance with the Chapter III Section 2 requirements (Articles 9–15) OR modifies the intended purpose of the AI system.

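Article 43(4): pre-determined changes documented in the Annex IV technical documentation (point 2(f); for Pickles GmbH, L2-4.2 Section 2.6) do NOT constitute a substantial modification. Article 43(4) states this carve-out for high-risk systems that continue to learn after deployment; more generally, a change that was foreseen in the initial conformity assessment falls outside the Article 3(23) definition.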

Substantial modification classification matrix:

Change scenario	Pre-determined?	Affects Chapter III Section 2 compliance or intended purpose?	Substantial modification?
Minor prompt wording adjustment within existing scope	Yes (if documented)	No	No
Prompt change that materially alters output type or scope	No	Yes (intended purpose affected)	Yes
Configuration change within pre-defined bounds	Yes (if documented)	No	No
Configuration change outside pre-defined bounds	No	Depends — assess	Assess — likely Yes
Third-party model version update (same capability class)	Possible — if update policy documented	May affect accuracy/robustness (Article 15)	Assess
Major model version change (new capability class)	No	Yes — accuracy, robustness, intended purpose	Yes
Fine-tuning on new domain data	No	May affect Article 10 data governance compliance	Assess — likely Yes
Architecture change affecting core processing	No	Yes	Yes
Provider switch	No	Yes — new system with different characteristics	Yes

[LEGAL REVIEW REQUIRED] Substantial modification assessments for SYS-04 have conformity assessment implications. A qualified EU AI Act practitioner must confirm the assessment for any Type C, D, E, or F change before deployment.

2. Change Request Process

2.1 Initiation

Any change to a Pickles GmbH AI system must begin with a Change Request (CR). CRs may be raised by:
- Head of Engineering (technical changes)
- Head of Product (product changes)
- AIRO (governance-driven changes)
- Third-party model provider notification (Type C, F) received under L2-5.3; the Head of Engineering opens the CR (Section 7.1)

Change Request must document:

CHANGE REQUEST — [CR-ID: CR-YYYY-MM-DD-NN]

System affected:
Change type (A/B/C/D/E/F):
Description of proposed change:
Reason for change:
Pre-determined change? (Y/N — reference L2-4.2 Section 2.6 if yes):
Raised by:
Date raised:
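The CR template can be held as a structured record so that incomplete requests are rejected at intake. The sketch below is illustrative only: the class and field names are hypothetical, and it assumes that the date embedded in the CR-ID is the date the CR was raised.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical encoding of the CR-YYYY-MM-DD-NN identifier from the template.
CR_ID_PATTERN = re.compile(r"^CR-(\d{4})-(\d{2})-(\d{2})-(\d{2})$")
CHANGE_TYPES = {"A", "B", "C", "D", "E", "F"}


@dataclass
class ChangeRequest:
    cr_id: str
    system: str
    change_type: str                   # one of A-F (Section 1.1)
    description: str
    reason: str
    pre_determined: bool
    pre_determined_ref: Optional[str]  # L2-4.2 Section 2.6 reference, if pre-determined
    raised_by: str
    date_raised: date

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the CR is complete."""
        problems: list[str] = []
        match = CR_ID_PATTERN.match(self.cr_id)
        if match is None:
            problems.append("cr_id does not match CR-YYYY-MM-DD-NN")
        else:
            year, month, day = (int(g) for g in match.groups()[:3])
            try:
                # Assumption: the date embedded in the ID is the date raised.
                if date(year, month, day) != self.date_raised:
                    problems.append("cr_id date differs from date_raised")
            except ValueError:
                problems.append("cr_id contains an invalid calendar date")
        if self.change_type not in CHANGE_TYPES:
            problems.append("change_type must be one of A-F")
        for name in ("system", "description", "reason", "raised_by"):
            if not getattr(self, name).strip():
                problems.append(f"{name} is empty")
        if self.pre_determined and not self.pre_determined_ref:
            problems.append("pre-determined change needs an L2-4.2 Section 2.6 reference")
        return problems
```

For example, a Type C request for SYS-04 with every field filled in returns an empty list, while a request flagged as pre-determined without a Section 2.6 reference is reported as incomplete.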

2.2 Initial Triage

The AIRO and Head of Engineering perform initial triage within 2 business days of CR receipt:

Triage Question	If Yes
Is this a pre-determined change (L2-4.2 Section 2.6)?	Expedited path: confirm pre-determined status in writing; proceed to testing (Section 4)
Is this a Type C third-party update with vendor regression data?	Use vendor data to supplement internal testing; proceed to the substantial modification assessment (Section 3)
Does the change affect the intended purpose of the system?	Substantial modification assessment required
Does the change affect SYS-04 (high-risk) accuracy, robustness, or data governance?	Substantial modification assessment required; notify Legal
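The triage rules can be expressed as a small decision function. This is a sketch under stated assumptions: the names are hypothetical, "SYS-01" in the usage note is an illustrative system ID, and it adds the Section 3 rule that any change not confirmed as pre-determined needs an assessment. Where answers conflict (pre-determined but affecting intended purpose), the sketch lets the assessment requirement win and withholds the expedited path.

```python
from dataclasses import dataclass

HIGH_RISK_SYSTEMS = {"SYS-04"}


@dataclass
class TriageAnswers:
    system: str
    change_type: str                          # A-F
    pre_determined: bool                      # confirmed against L2-4.2 Section 2.6
    vendor_regression_data: bool              # provider supplied regression data (Type C)
    affects_intended_purpose: bool
    affects_accuracy_robustness_or_data: bool


@dataclass
class TriageOutcome:
    expedited: bool
    use_vendor_data: bool
    assessment_required: bool
    notify_legal: bool


def triage(a: TriageAnswers) -> TriageOutcome:
    high_risk_impact = (
        a.system in HIGH_RISK_SYSTEMS and a.affects_accuracy_robustness_or_data
    )
    # Section 3 applies to every change not confirmed as pre-determined;
    # the triage table adds two further triggers.
    assessment = (
        not a.pre_determined or a.affects_intended_purpose or high_risk_impact
    )
    return TriageOutcome(
        expedited=a.pre_determined and not assessment,
        use_vendor_data=a.change_type == "C" and a.vendor_regression_data,
        assessment_required=assessment,
        notify_legal=high_risk_impact,
    )
```

A pre-determined Type A change on SYS-01 comes out as expedited with no assessment; a Type C update to SYS-04 that affects accuracy requires an assessment and Legal notification, and may draw on vendor regression data.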

3. Substantial Modification Assessment

For any change not confirmed as pre-determined, complete the following assessment before testing:

SUBSTANTIAL MODIFICATION ASSESSMENT — [CR-ID]

1. Does this change modify the intended purpose of the system?
   (Intended purpose is defined in L2-4.2 Section 1.1 for each system)
   Yes / No / Uncertain [LEGAL REVIEW REQUIRED if Uncertain]

2. Does this change affect compliance with any of the following?
   Article 9 (risk management system): Yes / No
   Article 10 (data governance — if retraining): Yes / No
   Article 12 (logging capabilities): Yes / No
   Article 13 (transparency to deployers): Yes / No
   Article 14 (human oversight mechanisms): Yes / No
   Article 15 (accuracy, robustness, cybersecurity): Yes / No

3. Was this change foreseen and documented in L2-4.2 Section 2.6?
   Yes (reference: ___) / No

4. CONCLUSION
   Substantial modification: Yes / No / Requires legal review

   If Yes: New conformity assessment required before deployment (Article 43(4))
   Approved by (AIRO + Legal): ___        Date: ___
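The conclusion in item 4 follows from the two limbs of Article 3(23). A minimal sketch of that logic, assuming answers are captured as "yes"/"no"/"uncertain" for question 1 and Yes/No per article for question 2 (function and enum names are hypothetical):

```python
from __future__ import annotations

from enum import Enum

ASSESSED_ARTICLES = ("9", "10", "12", "13", "14", "15")


class Conclusion(Enum):
    NO = "No"
    YES = "Yes"
    LEGAL_REVIEW = "Requires legal review"


def conclude(purpose_changed: str, articles_affected: dict[str, bool], foreseen: bool) -> Conclusion:
    """Apply the two Article 3(23) limbs as they appear in the template.

    purpose_changed: "yes", "no" or "uncertain" (question 1)
    articles_affected: article number -> Yes/No answer (question 2)
    foreseen: documented in L2-4.2 Section 2.6 (question 3)
    """
    if purpose_changed not in ("yes", "no", "uncertain"):
        raise ValueError("purpose_changed must be yes, no or uncertain")
    missing = [a for a in ASSESSED_ARTICLES if a not in articles_affected]
    if missing:
        raise ValueError(f"unanswered articles: {missing}")
    if foreseen:
        # Limb 1 fails: a foreseen, documented change is not substantial.
        return Conclusion.NO
    if purpose_changed == "yes" or any(articles_affected.values()):
        return Conclusion.YES
    if purpose_changed == "uncertain":
        return Conclusion.LEGAL_REVIEW
    return Conclusion.NO
```

The function only records the template's logic; for Type C, D, E and F changes the result still needs confirmation by a qualified EU AI Act practitioner before deployment.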

4. Testing Protocol

All changes — regardless of classification — undergo testing before production deployment. The depth of testing scales with change risk.

4.1 Testing Tiers

Testing Tier	Applies To	Minimum Requirements
Tier 1 — Smoke test	Type A minor prompt changes; Type B configuration within pre-defined bounds	Run benchmark query suite (n=20); confirm output format and quality unchanged; no regressions on known failure modes
Tier 2 — Standard regression	Type A major prompt changes; Type B outside pre-defined bounds; Type C minor model updates	Benchmark query suite (n=100); citation accuracy check; latency check; human review of 10 sampled outputs by a qualified legal reviewer [ASSUMPTION]
Tier 3 — Full regression	Type C major model updates; Type D (fine-tuning); Type E (architecture); Type F (provider switch)	Full benchmark query suite (n=200+); citation accuracy; bias check across document types; latency; safety filter validation; human review of 25+ outputs by a qualified legal reviewer; AIRO sign-off before staging deployment
Tier 4 — Full regression + third-party validation	Substantial modifications requiring new conformity assessment	All Tier 3 requirements + independent third-party accuracy assessment + new Annex IV technical documentation + conformity assessment before production deployment [LEGAL REVIEW REQUIRED]
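The tier table can be read as a mapping from change type and two flags to a tier. The sketch below is one interpretation, assuming a single hypothetical `major` flag (major prompt change for Type A, outside pre-defined bounds for Type B, major model update for Type C) and treating every Type F change as substantial, as Section 7.3 states.

```python
VALID_TYPES = {"A", "B", "C", "D", "E", "F"}


def assign_tier(change_type: str, major: bool, substantial: bool) -> int:
    """Map a classified change to a testing tier (Section 4.1).

    major: for Type A, a major prompt change; for Type B, outside pre-defined
    bounds; for Type C, a major model update. Ignored for D, E and F.
    substantial: outcome of the Section 3 assessment.
    """
    if change_type not in VALID_TYPES:
        raise ValueError(f"unknown change type {change_type!r}")
    if substantial or change_type == "F":
        # Tier 4 for substantial modifications; Section 7.3 treats every
        # provider switch as substantial.
        return 4
    if change_type in ("D", "E"):
        return 3
    if change_type == "C":
        return 3 if major else 2
    return 2 if major else 1  # Types A and B
```

Keeping the substantial-modification check first means the Section 3 outcome always overrides the type-based default, which matches the Tier 4 row of the table.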

4.2 Benchmark Query Suite

[ASSUMPTION] The benchmark query suite is a curated set of test queries covering:
- Standard legal research queries across Pickles GmbH's primary practice areas
- Known edge cases and failure modes identified in previous incidents and monitoring
- Citation-heavy queries (for accuracy testing)
- Multilingual queries, if the system supports multiple languages
- Queries designed to probe bias and differential performance

The benchmark suite is maintained by the Head of Product and updated after every P1/P2 incident and every major model change. [ASSUMPTION]

4.3 Regression Testing Pass Criteria

A change passes regression testing and may proceed to staging if:

Criterion	Pass Standard
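Citation accuracy	≥95% on the benchmark suite and no lower than the previous baseline
Error rate	≤2% expert-identified errors in the human review sample (for any sample under 50 outputs, including the Tier 2 sample of 10, a single error fails this criterion)
Latency	P95 response time no more than 10% above the previous baseline
No new failure modes	No failure mode that was absent from the previous baseline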
Safety filter	No outputs violating content safety requirements
Bias check (Tier 3/4)	No statistically significant performance degradation in any tested stratum
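The pass criteria can be checked mechanically once results are collected. The sketch below is illustrative and rests on assumptions not fixed by this protocol: field names are hypothetical, "no regression" is read as "not below the baseline", and "statistically significant" is implemented as a one-sided two-proportion z-test per stratum at α = 0.05 with no correction for testing several strata.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class RegressionResult:
    citation_accuracy: float      # fraction of correct citations, 0.0-1.0
    reviewed_outputs: int         # size of the human review sample
    expert_errors: int            # expert-identified errors in that sample
    p95_latency_ms: float
    failure_modes: set[str]       # labels of observed failure modes
    safety_violations: int


def degradation_p_value(base_ok: int, base_n: int, cand_ok: int, cand_n: int) -> float:
    """One-sided two-proportion z-test: is the candidate worse than baseline?"""
    pooled = (base_ok + cand_ok) / (base_n + cand_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / cand_n))
    if se == 0:
        return 1.0  # both all-pass or both all-fail: no evidence of degradation
    z = (base_ok / base_n - cand_ok / cand_n) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)


def evaluate(
    cand: RegressionResult,
    base: RegressionResult,
    strata: dict[str, tuple[int, int, int, int]] | None = None,
    alpha: float = 0.05,
) -> list[str]:
    """Return the failed Section 4.3 criteria; an empty list means pass."""
    failures = []
    if cand.citation_accuracy < max(0.95, base.citation_accuracy):
        failures.append("citation accuracy below 95% or below baseline")
    # <=2% errors in integer arithmetic: errors / n <= 1/50.
    if cand.expert_errors * 50 > cand.reviewed_outputs:
        failures.append("expert error rate above 2%")
    if cand.p95_latency_ms > base.p95_latency_ms * 1.10:
        failures.append("P95 latency more than 10% above baseline")
    new_modes = cand.failure_modes - base.failure_modes
    if new_modes:
        failures.append(f"new failure modes: {sorted(new_modes)}")
    if cand.safety_violations:
        failures.append("safety filter violations")
    # strata: name -> (baseline correct, baseline n, candidate correct, candidate n)
    for name, (b_ok, b_n, c_ok, c_n) in (strata or {}).items():
        if degradation_p_value(b_ok, b_n, c_ok, c_n) < alpha:
            failures.append(f"significant degradation in stratum {name!r}")
    return failures
```

The integer comparison `errors * 50 > n` avoids floating-point rounding and makes the consequence explicit: one error in a review sample of 10 or 25 fails the criterion. With several strata, a multiple-comparison correction may be worth considering.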

4.4 Staging Deployment

Before production deployment, all Tier 2+ changes must be deployed in a staging environment for a minimum period:
- Tier 2: 2 business days
- Tier 3: 5 business days
- Tier 4: as required by conformity assessment timelines
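Minimum staging periods are counted in business days. A minimal sketch, assuming the start day itself is not counted and that public holidays (for example German public holidays) are supplied by the caller, since no holiday calendar is built in; function names are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta

# Minimum staging periods in business days (Section 4.4). Tier 1 has no
# mandated staging; Tier 4 follows the conformity assessment timeline.
MIN_STAGING_DAYS = {2: 2, 3: 5}


def add_business_days(start: date, days: int, holidays: frozenset[date] = frozenset()) -> date:
    """Count forward `days` business days after `start`, skipping weekends and holidays."""
    current, counted = start, 0
    while counted < days:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:
            counted += 1
    return current


def earliest_staging_exit(tier: int, staging_start: date, holidays: frozenset[date] = frozenset()) -> date:
    if tier not in MIN_STAGING_DAYS:
        raise ValueError("only Tiers 2 and 3 have a fixed minimum staging period")
    return add_business_days(staging_start, MIN_STAGING_DAYS[tier], holidays)
```

For a Tier 3 change entering staging on Thursday 2026-03-05, the earliest exit date is Thursday 2026-03-12. The same `add_business_days` helper can compute the 2-business-day triage deadline in Section 2.2.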

5. Sign-Off Authority

Change Tier	Sign-Off Required
Tier 1	Head of Engineering
Tier 2	Head of Engineering + Head of Product
Tier 3	Head of Engineering + Head of Product + AIRO
Tier 4 (Substantial modification)	AIRO + CEO + Legal (plus conformity assessment body if applicable)

No change to SYS-04 (high-risk) may be deployed to production without AIRO sign-off, regardless of tier.
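The sign-off table combined with the SYS-04 override can be checked before a deployment proceeds. A sketch with hypothetical names; it does not model the conformity assessment body, which the table requires only "if applicable".

```python
from __future__ import annotations

SIGN_OFF_BY_TIER = {
    1: {"Head of Engineering"},
    2: {"Head of Engineering", "Head of Product"},
    3: {"Head of Engineering", "Head of Product", "AIRO"},
    4: {"AIRO", "CEO", "Legal"},  # plus conformity assessment body if applicable
}
HIGH_RISK_SYSTEMS = {"SYS-04"}


def required_sign_offs(tier: int, system: str) -> set[str]:
    required = set(SIGN_OFF_BY_TIER[tier])
    if system in HIGH_RISK_SYSTEMS:
        required.add("AIRO")  # AIRO sign-off is mandatory for SYS-04 at any tier
    return required


def missing_sign_offs(tier: int, system: str, obtained: set[str]) -> set[str]:
    return required_sign_offs(tier, system) - obtained
```

For instance, a Tier 1 change to SYS-04 needs both the Head of Engineering and the AIRO, even though Tier 1 alone requires only the Head of Engineering.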

6. Rollback Plan

Every change deployed to production must have a documented rollback plan approved before deployment.

6.1 Rollback Requirement

ROLLBACK PLAN — [CR-ID]

Previous version/configuration:
Rollback method (how to revert to previous state):
Rollback owner:
Maximum rollback time (from decision to complete):
Rollback trigger conditions:
  - Automatic (monitoring threshold breach): [specify]
  - Manual (AIRO or Head of Engineering decision): [specify]
Data implications of rollback (any data created during new version that is affected):
Client notification required on rollback? (Y/N):
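A rollback plan can be stored as a record that is checked for completeness before approval and evaluated against monitoring data for automatic triggers. This sketch assumes, hypothetically, that each automatic trigger is a metric name with a maximum tolerated value (higher is worse); a metric missing from the monitoring feed is treated as not breached, whereas a stricter design might alert instead.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from datetime import timedelta


@dataclass
class RollbackPlan:
    cr_id: str
    previous_version: str
    rollback_method: str
    rollback_owner: str
    max_rollback_time: timedelta
    # Hypothetical metric name -> maximum tolerated value (higher is worse).
    automatic_triggers: dict[str, float]
    manual_triggers: str
    data_implications: str
    client_notification_required: bool

    def missing_fields(self) -> list[str]:
        """Fields that must be completed before the plan can be approved."""
        missing = [
            f.name for f in fields(self)
            if isinstance(getattr(self, f.name), str) and not getattr(self, f.name).strip()
        ]
        if self.max_rollback_time <= timedelta(0):
            missing.append("max_rollback_time")
        if not self.automatic_triggers:
            missing.append("automatic_triggers")
        return missing

    def breached_triggers(self, metrics: dict[str, float]) -> list[str]:
        """Automatic triggers whose monitored value exceeds its threshold."""
        return sorted(
            name for name, limit in self.automatic_triggers.items()
            if metrics.get(name, 0.0) > limit
        )
```

A non-empty result from `breached_triggers` corresponds to the first row of the decision authority table: the Head of Engineering may initiate the rollback immediately.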

6.2 Rollback Decision Authority

Situation	Rollback Decision Authority
Automated monitoring alert (L3-6.1) triggering rollback condition	Head of Engineering — may initiate immediately
P1 incident linked to recent change	AIRO — may order immediate rollback
P2 incident with probable link to recent change	AIRO + Head of Engineering — joint decision
Regulatory authority instruction (Article 79)	CEO + Legal — mandatory compliance

6.3 Post-Rollback Actions

Following any rollback:
1. Incident log opened (minimum P2)
2. Root cause analysis initiated (L3-6.2 RCA template)
3. Technical documentation updated to record the rolled-back change (L2-4.2 Section 6)
4. CR closed as failed; a new CR is required for any revised approach

7. Specific Change Scenarios

7.1 Third-Party Model Provider Update (Type C)

When a provider notifies Pickles GmbH of a model update per L2-5.3 Section 7:

Step	Action	Owner	Timing
1	Log provider notification; open CR	Head of Engineering	Day 0
2	Review provider release notes; classify as minor or major update	Head of Engineering + Head of Product	Day 1–2
3	Assign testing tier (Tier 2 for minor; Tier 3 for major)	AIRO	Day 2
4	Run regression testing in isolated environment	Head of Engineering	Per tier requirements
5	Substantial modification assessment	AIRO + Legal	During testing
6	Staging deployment and monitoring	Head of Engineering	Post-testing
7	Sign-off and production deployment	Per Section 5	After staging
8	Update L2-4.2 Section 6 (lifecycle change log)	Head of Engineering	At deployment
9	Client notification if behaviour materially changes	Head of Product	Before or at deployment

7.2 Prompt Redesign (Type A)

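Minor prompt changes may follow an expedited path if they are documented as pre-determined in L2-4.2 Section 2.6. All other prompt changes follow the standard CR process with Tier 1 or Tier 2 testing, unless the substantial modification assessment (Section 3) concludes that the change is substantial, in which case Tier 4 applies.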

7.3 Provider Switch (Type F)

A complete provider switch is always a substantial modification requiring Tier 4 testing and a new conformity assessment for SYS-04. Additionally:
- A new DPA must be executed with the new provider (L2-5.3)
- A §43e BRAO service agreement is required
- If the new provider is outside the EEA, Standard Contractual Clauses (SCCs) must be assessed
- Clients must be notified before the switch

7.4 Retraining / Fine-Tuning (Type D)

Any retraining or fine-tuning on new data requires:
- Data governance review under EU AI Act Article 10 (training data quality, bias examination)
- Update of L2-4.2 Section 2.4 (training data)
- Bias assessment (M-05) run on the new model before production
- Tier 3 or Tier 4 testing, depending on scope

8. Documentation Updates Required After Any Change

Document	Section to Update	Trigger
L2-4.2 Technical Documentation Pack	Section 6 — lifecycle changes	Every production change
L2-4.2 Technical Documentation Pack	Section 2.1 — development methods; Section 2.4 — training data	Type D, F changes
L2-4.2 Technical Documentation Pack	Section 2.6 — pre-determined changes	If new change type should be pre-determined in future
L3-6.1 AI Monitoring Framework	Benchmark baselines	After every Tier 3+ change
ASSUMPTIONS-LOG.md	Update relevant assumptions if architecture confirmed	If change reveals or confirms assumptions
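The documentation table can be turned into a checklist generator so that no required update is missed at deployment. A sketch with hypothetical function and parameter names; the two boolean flags stand in for the judgement-based triggers in the last column of the table.

```python
from __future__ import annotations

L2_4_2 = "L2-4.2 Technical Documentation Pack"


def documents_to_update(
    change_type: str,
    tier: int,
    new_pre_determined_pattern: bool = False,
    assumptions_affected: bool = False,
) -> list[tuple[str, str]]:
    """Return (document, section) pairs to update after a production change."""
    updates = [(L2_4_2, "Section 6: lifecycle changes")]  # every production change
    if change_type in ("D", "F"):
        updates.append((L2_4_2, "Section 2.1: development methods"))
        updates.append((L2_4_2, "Section 2.4: training data"))
    if new_pre_determined_pattern:
        updates.append((L2_4_2, "Section 2.6: pre-determined changes"))
    if tier >= 3:
        updates.append(("L3-6.1 AI Monitoring Framework", "Benchmark baselines"))
    if assumptions_affected:
        updates.append(("ASSUMPTIONS-LOG.md", "Relevant assumptions"))
    return updates
```

A Tier 1 Type A change yields only the Section 6 lifecycle entry, while a Tier 3 fine-tuning change (Type D) adds Sections 2.1 and 2.4 and the monitoring baselines.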

9. Change Log

CR-ID	Date	Type	System	Description	Tier	Substantial?	Outcome
—	—	—	—	No changes recorded yet	—	—	—

Document Control

Field	Detail
Document ID	L3-6.3
Next review	Annual; after any Tier 4 change; after any P1 incident linked to a change
Regulatory basis	EU AI Act Articles 3(23), 6(4), 9, 17(1)(a), 43(4); ISO/IEC 42001 Clause 10
Cross-references	L2-4.2 (technical documentation), L2-5.3 (vendor update notifications), L3-6.1 (monitoring drift), L3-6.2 (incident response post-change)
Assumptions relied upon	A-001, A-004, A-009