
Model Change Management Protocol

Project: Pickles GmbH — AI Governance Framework
Stage: Stage 4 — Monitoring & Operational Controls
Status: Draft
Version: v1
Date: 2026-02-26
Assumptions: Built on outline assumptions; not verified against real Pickles GmbH data


Purpose

This protocol governs all changes to Pickles GmbH's AI systems — whether initiated internally (prompt redesign, retraining, architectural changes) or externally (third-party model provider updates). It ensures that:

  • Changes are tested before production deployment
  • Substantial modifications triggering new EU AI Act conformity assessments are identified before deployment
  • Clients are notified of changes affecting system behaviour
  • A rollback plan exists for every production change
  • The Technical Documentation Pack (L2-4.2) stays current

Regulatory basis:

  • EU AI Act Article 3(23): definition of substantial modification
  • EU AI Act Article 43(4): substantial modification triggers new conformity assessment
  • EU AI Act Article 17(1)(a): QMS must include procedures for management of modifications
  • EU AI Act Article 9(2): risk management system is an iterative process (changes require risk re-evaluation)
  • EU AI Act Article 6(4): self-assessment must be updated if classification changes
  • ISO/IEC 42001 Clause 10: continual improvement and corrective action

[ASSUMPTION] The change governance roles and approval thresholds are based on the assumed organisational structure. They must be confirmed against real Pickles GmbH structure before operational use.


1. Change Classification Framework

Every proposed change must first be classified. Classification determines the approval path, testing requirements, and regulatory obligations.

1.1 Change Types

| Type | Description | Examples |
|---|---|---|
| Type A: Prompt redesign | Changes to system prompts, instruction sets, or retrieval query logic (no model weight changes) | Revised instruction wording; updated system prompt; modified retrieval parameters |
| Type B: Configuration change | Changes to system configuration, output formatting, safety filters, or operational parameters | Adjusting output length limit; enabling/disabling a feature; changing safety filter threshold |
| Type C: Third-party model update | New version of a third-party model API deployed by the provider | GPT-4o → GPT-4o-mini; Claude 3 → Claude 3.5; model version deprecation |
| Type D: Fine-tuning / retraining | Pickles GmbH fine-tunes, retrains, or trains a model on new data | Domain fine-tuning on new legal corpus; RLHF update |
| Type E: Architecture change | Changes to system architecture, infrastructure, or integration patterns | Switching from RAG to fine-tuning; replacing vector database; new API integration |
| Type F: Provider switch | Replacing the third-party AI model provider entirely | Switching from Provider A to Provider B |

1.2 Substantial Modification Assessment

EU AI Act Article 3(23) defines a substantial modification as a change made after the system is placed on the market or put into service that:

  1. Is not foreseen or planned in the initial conformity assessment (see L2-4.2 Section 2.6, pre-determined changes); AND
  2. Affects compliance with the Chapter III Section 2 requirements (Articles 8–15), OR modifies the intended purpose of the AI system

Article 43(4): pre-determined changes documented in the Annex IV technical documentation (L2-4.2 Section 2.6) do NOT constitute a substantial modification.

Substantial modification classification matrix:

| Change Type | Pre-determined? | Affects Ch. III S.2 or intended purpose? | Substantial Modification? |
|---|---|---|---|
| Minor prompt wording adjustment within existing scope | Yes (if documented) | No | No |
| Prompt change that materially alters output type or scope | No | Yes (intended purpose affected) | Yes |
| Configuration change within pre-defined bounds | Yes (if documented) | No | No |
| Configuration change outside pre-defined bounds | No | Depends: assess | Assess (likely Yes) |
| Third-party model version update (same capability class) | Possible, if update policy documented | May affect accuracy/robustness (Article 15) | Assess |
| Major model version change (new capability class) | No | Yes: accuracy, robustness, intended purpose | Yes |
| Fine-tuning on new domain data | No | May affect Article 10 data governance compliance | Assess (likely Yes) |
| Architecture change affecting core processing | No | Yes | Yes |
| Provider switch | No | Yes: new system with different characteristics | Yes |

[LEGAL REVIEW REQUIRED] Substantial modification assessments for SYS-04 have conformity assessment implications. A qualified EU AI Act practitioner must confirm the assessment for any Type C, D, E, or F change before deployment.


2. Change Request Process

2.1 Initiation

Any change to a Pickles GmbH AI system must begin with a Change Request (CR). CRs may be raised by:

  • Head of Engineering (technical changes)
  • Head of Product (product changes)
  • AIRO (governance-driven changes)
  • Third-party model provider notification (Type C, F), received per L2-5.3

Each Change Request must document:

CHANGE REQUEST — [CR-ID: CR-YYYY-MM-DD-NN]

System affected:
Change type (A/B/C/D/E/F):
Description of proposed change:
Reason for change:
Pre-determined change? (Y/N — reference L2-4.2 Section 2.6 if yes):
Raised by:
Date raised:
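
The CR-ID format in the template above can be generated and checked mechanically. The following is a minimal sketch only: it assumes that `NN` is a two-digit sequence number starting at 01 for each date, which this protocol does not state, and the function names are hypothetical.

```python
import re
from datetime import date

# Assumed reading of CR-YYYY-MM-DD-NN: a calendar date followed by a
# two-digit per-day sequence number (01-99). This is an assumption.
CR_ID_PATTERN = re.compile(r"CR-(\d{4})-(\d{2})-(\d{2})-(\d{2})")


def make_cr_id(raised_on: date, sequence: int) -> str:
    """Build a CR-ID for the given date and per-day sequence number."""
    if not 1 <= sequence <= 99:
        raise ValueError("sequence must be between 1 and 99")
    return f"CR-{raised_on:%Y-%m-%d}-{sequence:02d}"


def is_valid_cr_id(cr_id: str) -> bool:
    """Return True if cr_id has the expected shape and a real calendar date."""
    match = CR_ID_PATTERN.fullmatch(cr_id)
    if match is None:
        return False
    year, month, day, sequence = (int(group) for group in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as month 13
    except ValueError:
        return False
    return sequence >= 1
```

For example, `make_cr_id(date(2026, 2, 26), 1)` yields `CR-2026-02-26-01`.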

2.2 Initial Triage

The AIRO and Head of Engineering perform initial triage within 2 business days of CR receipt:

| Triage Question | If Yes |
|---|---|
| Is this a pre-determined change (L2-4.2 Section 2.6)? | Expedited path: confirm in writing; proceed to Testing (Section 4) |
| Is this a Type C third-party update with vendor regression data? | Use vendor data to supplement internal testing; proceed to the Substantial Modification Assessment (Section 3) |
| Does the change affect the intended purpose of the system? | Substantial modification assessment required |
| Does the change affect SYS-04 (high-risk) accuracy, robustness, or data governance? | Substantial modification assessment required; notify Legal |

3. Substantial Modification Assessment

For any change not confirmed as pre-determined, complete the following assessment before testing:

SUBSTANTIAL MODIFICATION ASSESSMENT — [CR-ID]

1. Does this change modify the intended purpose of the system?
   (Intended purpose is defined in L2-4.2 Section 1.1 for each system)
   Yes / No / Uncertain [LEGAL REVIEW REQUIRED if Uncertain]

2. Does this change affect compliance with any of the following?
   Article 9 (risk management system): Yes / No
   Article 10 (data governance — if retraining): Yes / No
   Article 12 (logging capabilities): Yes / No
   Article 13 (transparency to deployers): Yes / No
   Article 14 (human oversight mechanisms): Yes / No
   Article 15 (accuracy, robustness, cybersecurity): Yes / No

3. Was this change foreseen and documented in L2-4.2 Section 2.6?
   Yes (reference: ___) / No

4. CONCLUSION
   Substantial modification: Yes / No / Requires legal review

   If Yes: New conformity assessment required before deployment (Article 43(4))
   Approved by (AIRO + Legal): ___        Date: ___
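
The decision logic behind the assessment form can be sketched in code, for example as a pre-check before the form goes to AIRO and Legal. This is an illustrative sketch, not a substitute for legal review. The class and function names are hypothetical, and the precedence (an uncertain answer wins over everything else, then pre-determined status) is an assumption chosen to be conservative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Conclusion(Enum):
    NO = "No"
    YES = "Yes"
    LEGAL_REVIEW = "Requires legal review"


@dataclass(frozen=True)
class Assessment:
    # Question 1: True / False, or None for "Uncertain"
    modifies_intended_purpose: Optional[bool]
    # Question 2: article numbers answered "Yes" (e.g. {10, 15})
    affected_articles: FrozenSet[int]
    # Question 3: foreseen and documented in L2-4.2 Section 2.6
    pre_determined: bool


def conclude(assessment: Assessment) -> Conclusion:
    """Derive the Question 4 conclusion from the answers to Questions 1-3."""
    # Assumption: any uncertainty is escalated before other rules apply.
    if assessment.modifies_intended_purpose is None:
        return Conclusion.LEGAL_REVIEW
    # Pre-determined changes are not substantial modifications (Article 43(4)).
    if assessment.pre_determined:
        return Conclusion.NO
    # Article 3(23): not foreseen AND (compliance affected OR purpose modified).
    if assessment.modifies_intended_purpose or assessment.affected_articles:
        return Conclusion.YES
    return Conclusion.NO
```

A `YES` result here only flags the change; the conclusion still requires the AIRO and Legal approval recorded on the form.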

4. Testing Protocol

All changes — regardless of classification — undergo testing before production deployment. The depth of testing scales with change risk.

4.1 Testing Tiers

| Testing Tier | Applies To | Minimum Requirements |
|---|---|---|
| Tier 1: Smoke test | Type A minor prompt changes; Type B configuration within pre-defined bounds | Run benchmark query suite (n=20); confirm output format and quality unchanged; no regressions on known failure modes |
| Tier 2: Standard regression | Type A major prompt changes; Type B outside pre-defined bounds; Type C minor model updates | Full benchmark query suite (n=100); citation accuracy check; latency check; human review of 10 sampled outputs by a qualified legal reviewer [ASSUMPTION] |
| Tier 3: Full regression | Type C major model updates; Type D (fine-tuning); Type E (architecture); Type F (provider switch) | Full benchmark query suite (n=200+); citation accuracy; bias check across document types; latency; safety filter validation; human review of 25+ outputs by qualified legal reviewer; AIRO sign-off before staging deployment |
| Tier 4: Full regression + third-party validation | Substantial modifications requiring new conformity assessment | All Tier 3 requirements + independent third-party accuracy assessment + new Annex IV technical documentation + conformity assessment before production deployment [LEGAL REVIEW REQUIRED] |
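
The tier table above maps onto a small lookup. The sketch below is illustrative: the function name and the `major` and `substantial_modification` flags are hypothetical, and for Type B the `major` flag is assumed to mean "outside pre-defined bounds".

```python
def assign_testing_tier(
    change_type: str,
    *,
    major: bool = False,
    substantial_modification: bool = False,
) -> int:
    """Return the testing tier (1-4) for a change, following Section 4.1."""
    change_type = change_type.upper()
    if change_type not in {"A", "B", "C", "D", "E", "F"}:
        raise ValueError(f"unknown change type: {change_type!r}")
    # Substantial modifications always need Tier 4, whatever the type.
    if substantial_modification:
        return 4
    # Fine-tuning, architecture changes and provider switches start at Tier 3.
    if change_type in {"D", "E", "F"}:
        return 3
    # Third-party model updates: Tier 2 if minor, Tier 3 if major.
    if change_type == "C":
        return 3 if major else 2
    # Types A and B: Tier 1 if minor / within bounds, otherwise Tier 2.
    return 2 if major else 1
```

The function relies on the caller to supply the outcome of the substantial modification assessment; it does not make that determination itself.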

4.2 Benchmark Query Suite

[ASSUMPTION] The benchmark query suite is a curated set of test queries covering:

  • Standard legal research queries across Pickles GmbH's primary practice area coverage
  • Known edge cases and failure modes identified in previous incidents and monitoring
  • Citation-heavy queries (for accuracy testing)
  • Multilingual queries, if the system supports multiple languages
  • Queries designed to probe bias and differential performance

The benchmark suite is maintained by the Head of Product and updated after every P1/P2 incident and every major model change. [ASSUMPTION]

4.3 Regression Testing Pass Criteria

A change passes regression testing and may proceed to staging only if it meets all of the following criteria:

| Criterion | Pass Standard |
|---|---|
| Citation accuracy | ≥95% on benchmark suite (no regression from previous baseline) |
| Error rate | ≤2% expert-identified errors in human review sample |
| Latency | P95 response time within 10% of previous baseline |
| No new failure modes | No failure modes not present in previous baseline |
| Safety filter | No outputs violating content safety requirements |
| Bias check (Tier 3/4) | No statistically significant performance degradation in any tested stratum |
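
The pass criteria can be evaluated automatically once the test results are collected. The sketch below is one possible reading, not a confirmed implementation. The field names are hypothetical, rates are expressed as fractions, and "within 10% of previous baseline" is assumed to mean "no more than 10% slower".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegressionResult:
    citation_accuracy: float            # fraction, 0.0 to 1.0
    baseline_citation_accuracy: float   # fraction from the previous baseline
    error_rate: float                   # expert-identified errors / reviewed outputs
    p95_latency_ms: float
    baseline_p95_latency_ms: float
    new_failure_modes: int              # failure modes absent from the baseline
    safety_violations: int              # outputs violating content safety rules
    bias_degradation_detected: bool = False  # only evaluated for Tier 3/4


def failed_criteria(result: RegressionResult, tier: int) -> List[str]:
    """Return the names of failed criteria; an empty list means a pass."""
    failures = []
    if (result.citation_accuracy < 0.95
            or result.citation_accuracy < result.baseline_citation_accuracy):
        failures.append("citation accuracy")
    if result.error_rate > 0.02:
        failures.append("error rate")
    # Assumption: only slowdowns beyond 10% count as a latency failure.
    if result.p95_latency_ms > 1.10 * result.baseline_p95_latency_ms:
        failures.append("latency")
    if result.new_failure_modes > 0:
        failures.append("new failure modes")
    if result.safety_violations > 0:
        failures.append("safety filter")
    if tier >= 3 and result.bias_degradation_detected:
        failures.append("bias check")
    return failures
```

Returning the list of failed criteria rather than a single boolean keeps the reason for a failed run visible in the CR record.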

4.4 Staging Deployment

Before production deployment, all Tier 2+ changes must be deployed in a staging environment for a minimum period:

  • Tier 2: 2 business days
  • Tier 3: 5 business days
  • Tier 4: As required by conformity assessment timelines
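
The earliest production date implied by the staging minimums can be computed as follows. This is a sketch under stated assumptions: business days are taken as Monday to Friday with no public-holiday calendar, the staging start day itself is not counted, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Minimum staging period in business days, per tier (Tier 2 and Tier 3 only).
STAGING_MIN_BUSINESS_DAYS = {2: 2, 3: 5}


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon-Fri only)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def earliest_production_date(staging_start: date, tier: int) -> date:
    """Earliest date a Tier 2 or Tier 3 change may leave staging."""
    if tier not in STAGING_MIN_BUSINESS_DAYS:
        # Tier 1 has no staging minimum; Tier 4 follows conformity
        # assessment timelines, which cannot be computed here.
        raise ValueError(f"no fixed staging period for tier {tier}")
    return add_business_days(staging_start, STAGING_MIN_BUSINESS_DAYS[tier])
```

A production version would need the applicable public-holiday calendar, which this sketch deliberately leaves out.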


5. Sign-Off Authority

| Change Tier | Sign-Off Required |
|---|---|
| Tier 1 | Head of Engineering |
| Tier 2 | Head of Engineering + Head of Product |
| Tier 3 | Head of Engineering + Head of Product + AIRO |
| Tier 4 (Substantial modification) | AIRO + CEO + Legal (plus conformity assessment body if applicable) |

No change to SYS-04 (high-risk) may be deployed to production without AIRO sign-off, regardless of tier.
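
A deployment gate could enforce the sign-off rules above, including the SYS-04 override. The sketch below is illustrative only: the role strings mirror this section, the function name is hypothetical, and the optional conformity assessment body for Tier 4 is not modelled.

```python
from typing import Iterable, Set

# Sign-offs required per tier, as listed in Section 5.
REQUIRED_SIGN_OFF = {
    1: {"Head of Engineering"},
    2: {"Head of Engineering", "Head of Product"},
    3: {"Head of Engineering", "Head of Product", "AIRO"},
    4: {"AIRO", "CEO", "Legal"},
}


def missing_sign_offs(tier: int, system_id: str, obtained: Iterable[str]) -> Set[str]:
    """Return the roles whose sign-off is still outstanding."""
    if tier not in REQUIRED_SIGN_OFF:
        raise ValueError(f"unknown tier: {tier}")
    required = set(REQUIRED_SIGN_OFF[tier])
    # SYS-04 (high-risk) always needs AIRO sign-off, regardless of tier.
    if system_id == "SYS-04":
        required.add("AIRO")
    return required - set(obtained)
```

For example, a Tier 1 change to SYS-04 signed only by the Head of Engineering still reports `{"AIRO"}` as missing.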


6. Rollback Plan

Every change deployed to production must have a documented rollback plan approved before deployment.

6.1 Rollback Requirement

ROLLBACK PLAN — [CR-ID]

Previous version/configuration:
Rollback method (how to revert to previous state):
Rollback owner:
Maximum rollback time (from decision to complete):
Rollback trigger conditions:
  - Automatic (monitoring threshold breach): [specify]
  - Manual (AIRO or Head of Engineering decision): [specify]
Data implications of rollback (any data created during new version that is affected):
Client notification required on rollback? (Y/N):
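
Because a rollback plan must be complete before deployment, a simple completeness check can block changes with blank fields. The sketch below is hypothetical: the field names are a direct transcription of the template above, and treating every field as mandatory is an assumption.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class RollbackPlan:
    cr_id: str
    previous_version: str = ""
    rollback_method: str = ""
    rollback_owner: str = ""
    max_rollback_time: str = ""
    automatic_trigger: str = ""
    manual_trigger: str = ""
    data_implications: str = ""
    client_notification_required: Optional[bool] = None  # None = not answered


def missing_fields(plan: RollbackPlan) -> List[str]:
    """Return the names of template fields that are blank or unanswered."""
    missing = []
    for field in fields(plan):
        value = getattr(plan, field.name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field.name)
    return missing
```

An empty result means the template is fully filled in; it says nothing about whether the plan itself is adequate, which remains an approval decision.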

6.2 Rollback Decision Authority

| Situation | Rollback Decision Authority |
|---|---|
| Automated monitoring alert (L3-6.1) triggering rollback condition | Head of Engineering: may initiate immediately |
| P1 incident linked to recent change | AIRO: may order immediate rollback |
| P2 incident with probable link to recent change | AIRO + Head of Engineering: joint decision |
| Regulatory authority instruction (Article 79) | CEO + Legal: mandatory compliance |

6.3 Post-Rollback Actions

Following any rollback:

  1. Incident log opened (minimum P2)
  2. Root cause analysis initiated (L3-6.2 RCA template)
  3. Technical documentation updated to record rolled-back change (L2-4.2 Section 6)
  4. CR closed as failed; new CR required for revised approach


7. Specific Change Scenarios

7.1 Third-Party Model Provider Update (Type C)

When a provider notifies Pickles GmbH of a model update per L2-5.3 Section 7:

| Step | Action | Owner | Timing |
|---|---|---|---|
| 1 | Log provider notification; open CR | Head of Engineering | Day 0 |
| 2 | Review provider release notes; classify as minor or major update | Head of Engineering + Head of Product | Day 1–2 |
| 3 | Assign testing tier (Tier 2 for minor; Tier 3 for major) | AIRO | Day 2 |
| 4 | Run regression testing in isolated environment | Head of Engineering | Per tier requirements |
| 5 | Substantial modification assessment | AIRO + Legal | During testing |
| 6 | Staging deployment and monitoring | Head of Engineering | Post-testing |
| 7 | Sign-off and production deployment | Per Section 5 | After staging |
| 8 | Update L2-4.2 Section 6 (lifecycle change log) | Head of Engineering | At deployment |
| 9 | Client notification if behaviour materially changes | Head of Product | Before or at deployment |

7.2 Prompt Redesign (Type A)

Minor prompt changes may follow an expedited path if they are documented as pre-determined in L2-4.2 Section 2.6. All others follow the standard CR process with Tier 1 or Tier 2 testing.

7.3 Provider Switch (Type F)

A complete provider switch is always a substantial modification requiring Tier 4 testing and new conformity assessment for SYS-04. Additionally:

  • New DPA must be executed with the new provider (L2-5.3)
  • §43e BRAO service agreement required
  • SCCs assessed if new provider is non-EEA
  • Client notification required before switch

7.4 Retraining / Fine-Tuning (Type D)

Any retraining or fine-tuning on new data requires:

  • Data governance review under EU AI Act Article 10 (training data quality, bias examination)
  • L2-4.2 Section 2.4 (training data) updated
  • Bias assessment (M-05) run on new model before production
  • Tier 3 or Tier 4 testing depending on scope


8. Documentation Updates Required After Any Change

| Document | Section to Update | Trigger |
|---|---|---|
| L2-4.2 Technical Documentation Pack | Section 6: lifecycle changes | Every production change |
| L2-4.2 Technical Documentation Pack | Section 2.1: development methods; Section 2.4: training data | Type D, F changes |
| L2-4.2 Technical Documentation Pack | Section 2.6: pre-determined changes | If new change type should be pre-determined in future |
| L3-6.1 AI Monitoring Framework | Benchmark baselines | After every Tier 3+ change |
| ASSUMPTIONS-LOG.md | Update relevant assumptions if architecture confirmed | If change reveals or confirms assumptions |

9. Change Log

| CR-ID | Date | Type | System | Description | Tier | Substantial? | Outcome |
|---|---|---|---|---|---|---|---|

No changes recorded yet

Document Control

| Field | Detail |
|---|---|
| Document ID | L3-6.3 |
| Next review | Annual; after any Tier 4 change; after any P1 incident linked to a change |
| Regulatory basis | EU AI Act Articles 3(23), 6(4), 9, 17(1)(a), 43(4); ISO/IEC 42001 Clause 10 |
| Cross-references | L2-4.2 (technical documentation), L2-5.3 (vendor update notifications), L3-6.1 (monitoring drift), L3-6.2 (incident response post-change) |
| Assumptions relied upon | A-001, A-004, A-009 |