Auditable AI by Construction: SI-Core for Regulators and Auditors

Community Article Published December 26, 2025

Draft v0.1 — Non‑normative supplement to SI‑Core / SI‑NOS / SIM/SIS / Ethics Interface / GDPR Ethical Redaction docs

This document is non‑normative. It explains how to reason about and use SI‑Core as a foundation for auditable, controllable AI systems, especially from the perspective of regulators, auditors, and risk/governance teams.

Normative contracts live in the SI‑Core Spec, SI‑NOS design docs, SIM/SIS/SCP specs, the Ethics Interface spec, and the GDPR Ethical Redaction / OSS supply‑chain supplements.


1. What regulators actually want: a control plane and evidence

Most AI debates talk about accuracy and capabilities. Regulators and auditors usually care about something more basic:

  • Can we see what it knew when it acted?
  • Can we tell who (or what) initiated the action?
  • Can we stop it, or roll it back, when we discover harm?
  • Can we prove to a third party what happened?

Informally, regulators are asking for two things:

(1) A CONTROL PLANE
    "Where do we push STOP / SLOW / MORE AUDIT / SAFE-MODE?"

(2) EVIDENCE
    "What trace do we get when something goes wrong?"

Traditional AI stacks answer these with a mix of:

  • log scraping,
  • ad‑hoc dashboards,
  • “model cards” and PDFs,
  • and a lot of after‑the‑fact forensics.

SI‑Core takes a different stance: it defines a runtime architecture where the control plane and the evidence are first‑class, not bolt‑ons.

The core invariants are:

  • [OBS] Observation Gate – no action without a well‑formed observation;
  • [ID] Identity & Origin – who/what initiated this jump, in which role;
  • [ETH] Ethics Interface – which ethics/policy lens was applied;
  • [EVAL] Evaluation & Risk Gating – high‑risk moves must be gated;
  • [MEM] Memory & Audit – hash‑chained, append‑only logs;
  • RML‑1/2/3 Rollback – how far and how reliably you can undo.

This document explains what those invariants buy you, in regulatory terms, and how semantic compression, goal‑native algorithms, and the ethics interface combine into an explainable, inspectable history of behavior.


2. What SI‑Core invariants guarantee (for auditors)

This section restates the SI‑Core invariants in regulator‑facing language. Precise definitions live in the Core Spec; here we focus on what they guarantee when you have to give answers.

2.1 [OBS] — Observation Gate

Technical:

  • Every “jump” (effectful decision) must be preceded by a parsed observation.
  • If Observation-Status != PARSED, the jump must not execute.
  • Observation structure (fields, types, coverage metrics) is explicit and logged.

Regulatory mapping:

  • “What did the system know when it acted?” → the [OBS] record.
  • “Did it have enough information to make this decision?” → coverage metrics + semantic compression policies.
  • “Did it act under poor or missing data?” → under‑observation is visible as a status, not silently ignored.
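As a minimal illustration of how this gate can be enforced in code (the class and function names below are hypothetical, not taken from the spec):

from dataclasses import dataclass
from enum import Enum


class ObservationStatus(Enum):
    PARSED = "PARSED"
    PARTIAL = "PARTIAL"
    MISSING = "MISSING"


@dataclass
class Observation:
    obs_id: str
    status: ObservationStatus
    coverage: float            # fraction of required fields present in the frame
    semantic_units: list


class ObservationGateError(Exception):
    """Raised when an effectful jump is attempted without a well-formed observation."""


def guard_jump(obs: Observation, min_coverage: float = 0.85) -> None:
    # [OBS] invariant: no effectful jump without Observation-Status == PARSED.
    if obs.status is not ObservationStatus.PARSED:
        raise ObservationGateError(f"{obs.obs_id}: status={obs.status.value}, jump refused")
    # Under-observation surfaces as an explicit failure, not a silent fallthrough.
    if obs.coverage < min_coverage:
        raise ObservationGateError(f"{obs.obs_id}: coverage {obs.coverage:.2f} below floor {min_coverage}")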

2.2 [ID] — Identity & Origin

Technical:

  • Each jump carries an IdentityConstruct:

    • actor – which system or human triggered this,
    • role – planner / executor / reviewer / batch job,
    • origin – self‑generated vs external instruction (and from where).

Regulatory mapping:

  • “Who is accountable for this decision?” → actor/role.
  • “Was this AI acting on its own reasoning, or executing an external command?” → origin.
  • “Can we segregate decisions by business unit / geography / processor?” → actors can be namespaced and filtered.
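A small illustrative sketch of an IdentityConstruct as a typed record attached to every jump; the exact field names and role values here are assumptions for this example, not spec-defined:

from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class IdentityConstruct:
    actor: str                                     # e.g. "city.flood_controller"
    role: Literal["planner", "executor", "reviewer", "batch"]
    origin: Literal["self_generated", "external_instruction"]
    origin_ref: Optional[str] = None               # where an external instruction came from

    @property
    def namespace(self) -> str:
        # Namespaced actors let auditors filter by business unit / geography / processor.
        return self.actor.split(".")[0]


ident = IdentityConstruct(actor="city.flood_controller", role="planner", origin="self_generated")
assert ident.namespace == "city"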

2.3 [ETH] — Ethics Interface

Technical:

  • All effectful jumps pass through an ethics overlay that:

    • evaluates actions against policy rules and constraints,
    • returns a canonical decision: ALLOW | DENY | ESCALATE,
    • writes an EthicsTrace per decision.
  • Ethics policies are versioned, and the version used is attached to the trace.

Note (presentation vs. canonical): Implementations may display UI-friendly labels like “APPROVED/REJECTED”, but audit logs SHOULD record the canonical ALLOW|DENY|ESCALATE to keep interop consistent across vendors and regulators.

An EthicsTrace typically includes:

{
  "initiator": "city.orchestrator",
  "viewpoint_base": ["city.residents", "city.hospitals"],
  "policy_version": "ETH-FLOOD-002@v2.3",
  "decision": "ALLOW",
  "decision_label": "APPROVED",
  "rationale": "within legal bounds; safety goal dominates",
  "structural_consequence": "gate_offset=+12cm, valid_for=15min"
}

Regulatory mapping:

  • “Which policy lens was actually used?” → policy_version.
  • “Were vulnerable groups considered?” → viewpoint_base and fairness constraints.
  • “Can we show that a particular decision followed the declared policy?” → compare EthicsTrace to the policy definition.

Ethics is not a PDF on a shelf; it is a runtime layer that leaves a cryptographically chained footprint.
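A hedged sketch of what calling that runtime layer could look like; the policy structure and the helper append_to_audit_ledger are illustrative placeholders, not spec-defined APIs:

import json
import time
from dataclasses import dataclass, asdict


@dataclass
class EthicsTrace:
    initiator: str
    viewpoint_base: list
    policy_version: str
    decision: str              # canonical: ALLOW | DENY | ESCALATE
    rationale: str
    timestamp: float


def append_to_audit_ledger(record: dict) -> None:
    # Placeholder: a real implementation appends to the hash-chained [MEM] ledger (see 2.5).
    print(json.dumps(record, sort_keys=True))


def evaluate_ethics(action: dict, policy: dict, initiator: str) -> EthicsTrace:
    """Evaluate one candidate action against a versioned policy and log an EthicsTrace."""
    violated = [c for c in policy["constraints"] if not c["check"](action)]
    if not violated:
        decision, rationale = "ALLOW", "all constraints satisfied"
    elif any(c["hard"] for c in violated):
        decision, rationale = "DENY", f"hard constraint violated: {violated[0]['id']}"
    else:
        decision, rationale = "ESCALATE", "soft constraint violated; human review required"
    trace = EthicsTrace(initiator, policy["viewpoint_base"], policy["version"],
                        decision, rationale, time.time())
    append_to_audit_ledger(asdict(trace))
    return trace


policy = {
    "version": "ETH-FLOOD-002@v2.3",
    "viewpoint_base": ["city.residents", "city.hospitals"],
    "constraints": [
        {"id": "ETH-FLOOD-BOUNDS", "hard": True, "check": lambda a: abs(a["gate_offset_cm"]) <= 20},
    ],
}
evaluate_ethics({"gate_offset_cm": 12}, policy, initiator="city.orchestrator")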

2.4 [EVAL] — Evaluation & Risk Gating

Technical:

  • Every jump carries a risk profile.

  • For high‑risk contexts, an Evaluator must approve:

    • may run sandbox simulations,
    • must be audited distinctly as [EVAL] activity,
    • cannot commit external effects directly.

Regulatory mapping:

  • “What did you do differently for ‘high‑risk’ decisions?” → [EVAL] paths, sandbox logs.
  • “Can you show that safety‑critical operations had extra scrutiny?” → risk profiles + evaluator outcomes.
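A minimal sketch of such a gate, assuming an illustrative risk threshold and placeholder evaluator/commit functions; the point is that the evaluator can approve or block, but never commits effects itself:

RISK_GATE_THRESHOLD = 0.7   # illustrative floor; real values come from the declared risk profile


def evaluator_review(jump: dict, sandbox: bool = True) -> str:
    # Placeholder for sandbox simulation plus a distinct [EVAL] audit record.
    return "APPROVE"


def commit_effects(jump: dict) -> str:
    # Placeholder for the effect-ledger write described in 2.5.
    return "COMMITTED"


def submit_jump(jump: dict, risk_score: float) -> str:
    if risk_score < RISK_GATE_THRESHOLD:
        return commit_effects(jump)
    # [EVAL] path: the evaluator may simulate and approve or block,
    # but the commit itself stays outside the evaluator.
    if evaluator_review(jump, sandbox=True) != "APPROVE":
        return "BLOCKED"
    return commit_effects(jump)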

2.5 [MEM] — Memory & Audit

Technical:

  • Jumps and their traces are recorded in a hash‑chained audit ledger.
  • External effects (e.g., DB writes, actuator commands) are recorded in an effect ledger with idempotent compensators.
  • Ledgers are append‑only; redaction and masking use ethical redaction schemes rather than in‑place edits.

Regulatory mapping:

  • “Can you prove that you are not silently rewriting history?” → hash chain + WORM storage.
  • “How do you honour erasure requests without destroying evidence?” → see §4 (GDPR Ethical Redaction).
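A toy sketch of the hash-chaining idea (a real ledger would add signatures, WORM storage, and redaction-aware fields; the class below is illustrative only):

import hashlib
import json


class AuditLedger:
    """Append-only, hash-chained list of audit records (toy model)."""

    def __init__(self) -> None:
        self.entries: list = []

    def append(self, record: dict) -> str:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        body = json.dumps(record, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        self.entries.append({"prev_hash": prev_hash, "record": record, "entry_hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        prev_hash = "0" * 64
        for e in self.entries:
            body = json.dumps(e["record"], sort_keys=True)
            if e["prev_hash"] != prev_hash:
                return False
            if e["entry_hash"] != hashlib.sha256((prev_hash + body).encode()).hexdigest():
                return False
            prev_hash = e["entry_hash"]
        return True


ledger = AuditLedger()
ledger.append({"jump_id": "J-123", "decision": "ALLOW"})
assert ledger.verify()

The useful property is that any in-place edit breaks verify() for every later entry, which turns "append-only" from a promise into something a third party can check.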

2.6 RML‑1/2/3 — Rollback Maturity Levels

Technical (very short):

  • RML‑1 — local state snapshots; undo internal state only.
  • RML‑2 — snapshots + compensating transactions; undo external effects within a bounded scope.
  • RML‑3 — cross‑system effect ledger and reconciler; multi‑system rollback with causal consistency.

Regulatory mapping:

  • “If this goes wrong, how far back can you put the world?”
  • “Can you show that rollback actually works?” → RIR (Rollback Integrity Rate) and RBL (Rollback Latency) metrics; scheduled drills.
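As a rough sketch, RIR and RBL can be computed directly from rollback drill records; the record fields and values below are assumptions, not spec-mandated names or real data:

import math

# Hypothetical rollback drill records (illustrative values).
drills = [
    {"rollback_id": "RB-001", "fully_succeeded": True,  "latency_s": 42.0},
    {"rollback_id": "RB-002", "fully_succeeded": True,  "latency_s": 61.5},
    {"rollback_id": "RB-003", "fully_succeeded": False, "latency_s": 180.0},
]

# RIR (Rollback Integrity Rate): fraction of drills that fully restored state.
rir = sum(d["fully_succeeded"] for d in drills) / len(drills)

# RBL (Rollback Latency): report a high percentile (nearest-rank), not just the mean.
latencies = sorted(d["latency_s"] for d in drills)
p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
rbl_p95 = latencies[p95_index]

print(f"RIR={rir:.2f}, RBL_p95={rbl_p95:.1f}s")   # RIR=0.67, RBL_p95=180.0s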

Put together, [OBS][ID][ETH][EVAL][MEM] + RML define a governed execution environment that is much closer to financial‑grade infrastructure than to “a model in a box”.

2.7 Mapping to other regulatory frameworks (illustrative, non-normative)

SI-Core is not tied to a single law. Its structural guarantees map naturally to many regulatory families.

EU AI Act (high-level alignment)

High-risk AI requirements (in their drafted form) typically include:

  1. Risk management system
    → SI-Core: [EVAL] modules + explicit risk profiles for jumps.

  2. Data and data governance
    → SI-Core: [OBS] contracts + SIM/SIS with typed semantic units and backing_refs.

  3. Technical documentation
    → SI-Core: Self-Declaration + conformance class + metric SLOs and dashboards.

  4. Record-keeping and logging
    → SI-Core: [MEM] audit ledgers, effect ledgers, rollback traces (RML-1/2/3).

  5. Transparency & explainability
    → SI-Core: EthicsTrace + GCS vectors + structured observation frames.

  6. Human oversight
    → SI-Core: safe-mode, escalation paths, human-in-the-loop gates for high-risk jumps.

  7. Accuracy, robustness, cybersecurity
    → SI-Core: CAS / SCI / RBL / RIR and related metrics from the SI Evaluation Pack.

Medical device–style regimes (e.g. MDR/IVDR)

Typical expectations:

  • Clinical / real-world performance evaluation
    → SI-Core: GCS and outcome metrics over time; replayable decision traces.

  • Post-market surveillance
    → SI-Core: continuous metrics (EAI, SCI, CAS, RBL/RIR) + incident logs.

  • Traceability
    → SI-Core: [MEM] + .sirrev + backing_refs to both semantic and raw data.

Financial regulations (e.g. algorithmic trading rules, MiFID-style)

  • Best execution / fair treatment
    → SI-Core: goal-native GCS with explicit trade-offs; structured logs of how trades were chosen.

  • Record-keeping for orders and decisions
    → SI-Core: hash-chained ledgers for jumps and effects with declared retention.

  • Algorithmic oversight and kill-switches
    → SI-Core: [EVAL] gates, safe-mode, and rollback capabilities (RML-2/3).

ISO / IEC–style standards

  • AI management systems (e.g. ISO/IEC 42001-type)
    → SI-Core: provides the runtime control-plane where policies are actually enforced.

  • AI risk management (e.g. ISO/IEC 23894-type)
    → SI-Core: makes risk identification, mitigation, monitoring and review concrete at jump level.

Key point:
SI-Core’s invariants ([OBS][ETH][MEM][ID][EVAL] + RML) are structural.
That makes them reusable across GDPR, AI-Act-style rules, sectoral regulations, and ISO-type standards — you map different legal texts onto the same runtime surface.


3. From models to explainable histories: semantics, goals, ethics

The SI stack does not stop at SI‑Core. Three additional pieces make histories structurally explainable rather than just “logs of tokens”:

  • Semantic compression (SCE / SIM / SIS / SCP) — what the system saw;
  • Goal‑Native Algorithms / GCS — what the system tried to optimize;
  • Ethics Interface — what constraints and trade‑offs were applied.

3.1 What the system saw: semantic memories

Instead of raw sensor ticks or opaque embeddings, SI‑Core consumes semantic units from SIM/SIS:

{
  "type": "city.flood_risk_state/v1",
  "scope": {"sector": 12, "horizon_min": 60},
  "payload": {
    "risk_score": 0.73,
    "expected_damage_eur": 1.9e6
  },
  "confidence": 0.87,
  "backing_refs": [
    "sim://city/sensor_grid/sector-12@t=2028-04-01T04:55Z"
  ],
  "goals": [
    "city.flood_risk_minimization",
    "city.hospital_access"
  ]
}

From an audit perspective:

  • Observations are typed and scoped.
  • You can follow backing_refs back into SIS for forensic reconstruction.
  • You can see which goals each piece of information was considered relevant to.

3.2 What the system tried to optimize: Goal Contribution Scores (GCS)

In the goal‑native layer, decisions are evaluated against explicit goal objects with Goal Contribution Scores (GCS).

A simplified log snippet around a floodgate decision might look like:

{
  "jump_id": "J-2028-04-01T05:00:03.340Z-123",
  "class": "L2",
  "obs_ref": "OBS-2028-04-01T05:00:01.112Z-987",
  "candidate_actions": ["OPEN_10CM", "OPEN_20CM", "NO_CHANGE"],
  "chosen_action": "OPEN_12CM",
  "gcs": {
    "city.flood_risk_minimization": +0.72,
    "city.hospital_access": +0.31,
    "city.traffic_efficiency": -0.18
  },
  "policy": "CITY-FLOOD-POLICY@v1.4"
}

This answers the regulatory question:

  • “What trade‑offs did you actually make?”

Not in prose, but as structured vectors that can be:

  • aggregated,
  • monitored over time,
  • compared across regions or populations,
  • checked against declared policies.
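For instance, a simple aggregation over a batch of jump logs (shaped like the snippet above; the values are hypothetical) already lets an auditor compare trade-offs across regions:

from collections import defaultdict

# Hypothetical batch of jump logs shaped like the snippet above.
jumps = [
    {"region": "north", "gcs": {"city.flood_risk_minimization": 0.72, "city.traffic_efficiency": -0.18}},
    {"region": "south", "gcs": {"city.flood_risk_minimization": 0.41, "city.traffic_efficiency": -0.05}},
]

# Mean GCS per goal, split by region, so trade-offs can be compared
# across populations and checked against the declared policy.
totals = defaultdict(list)
for j in jumps:
    for goal, score in j["gcs"].items():
        totals[(j["region"], goal)].append(score)

for (region, goal), scores in sorted(totals.items()):
    print(f"{region:>5} {goal:<35} mean_gcs={sum(scores)/len(scores):+.2f}")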

3.3 What constraints were enforced: EthicsTrace

When combined with the EthicsTrace, the same jump becomes reconstructable:

{
  "jump_id": "J-...-123",
  "ethics_trace": {
    "viewpoint_base": ["city.residents", "city.hospitals"],
    "constraints_checked": [
      "ETH-FLOOD-BOUNDS",
      "ETH-HOSPITAL-ACCESS-MIN",
      "ETH-FAIRNESS-GEOGRAPHIC"
    ],
    "decision": "ALLOW",
    "decision_label": "APPROVED",
    "policy_version": "ETH-FLOOD-002@v2.3",
    "dilemma": null
  }
}

An auditor can now ask:

  • “Did you consistently apply the same fairness constraint across sectors?” → check constraint counters.
  • “Did a new ethics policy version degrade outcomes for a protected group?” → correlate policy versions with goal metrics.

This is explainability as a by‑product of the runtime, not an after‑the‑fact report.


4. GDPR Ethical Redaction, logs, and training data

One of the hardest tensions in regulated AI is between:

  • auditability (keep detailed logs), and
  • data protection / erasure (delete personal data on request).

Naively:

  • Audit logs want to be append‑only, immutable.
  • Data protection law wants you to remove or anonymize personal data on demand.

The GDPR Ethical Redaction supplement and SI‑Core’s [MEM]/[ETH] invariants are designed to work together:

4.1 Tokenization and indirection

Instead of storing raw PII everywhere, systems store tokens:

user_email   → pii_token: "pii:email:8f3c..."
user_address → pii_token: "pii:addr:1a9b..."

  • Application data, logs, and even training data refer to the tokens, not the raw values.
  • A separate, tightly controlled PII vault holds the mapping pii_token → cleartext, with its own retention & access policies.

4.2 Redaction by cryptographic shredding

To honour an erasure request:

  1. Find all PII tokens belonging to the subject.
  2. Destroy (or cryptographically shred) the mapping entries for those tokens.
  3. Leave the tokens themselves in the logs and training data.

After that:

  • The system can still show structural evidence of decisions (same jump logs, same GCS vectors).
  • But it can no longer reconstruct the person’s identity from those tokens.

An Ethical Redaction Proof is then written into a dedicated ledger:

{
  "redaction_id": "ER-2028-04-01-123",
  "subject_ref": "data_subject:XYZ",
  "tokens_shredded": ["pii:email:8f3c...", "pii:addr:1a9b..."],
  "trigger": "DSR-REQ-2028-03-31-456",
  "performed_by": "dpo@city.gov",
  "timestamp": "2028-04-01T09:12:33Z"
}

This becomes part of the [MEM] audit story:

  • “Show that you honoured erasure requests.” → redaction ledger.
  • “Show that you did not destroy evidence.” → jump/effect ledgers remain intact, but de‑identified.
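A deliberately simplified sketch of the token-vault idea behind §4.1 and §4.2: logs hold only tokens, a vault holds per-subject keyed records, and erasure destroys the key rather than rewriting the ledger. The class below is a toy model (the XOR step stands in for real encryption), not production cryptography:

import hashlib
import secrets


class PiiVault:
    """Toy model of tokenization + cryptographic shredding (not production crypto)."""

    def __init__(self) -> None:
        self._keys = {}        # subject_id -> per-subject key
        self._records = {}     # pii_token  -> "ciphertext"

    def tokenize(self, subject_id: str, kind: str, value: str) -> str:
        key = self._keys.setdefault(subject_id, secrets.token_bytes(32))
        token = f"pii:{kind}:{hashlib.sha256(key + value.encode()).hexdigest()[:12]}"
        # Stand-in for real encryption under the per-subject key.
        self._records[token] = bytes(a ^ b for a, b in zip(value.encode(), key))
        return token

    def resolve(self, subject_id: str, token: str) -> str:
        key = self._keys[subject_id]              # raises KeyError once shredded
        return bytes(a ^ b for a, b in zip(self._records[token], key)).decode()

    def shred(self, subject_id: str) -> None:
        # Erasure request: destroy the key; tokens in logs stay, but become unresolvable.
        del self._keys[subject_id]


vault = PiiVault()
tok = vault.tokenize("data_subject:XYZ", "email", "alice@example.org")
assert vault.resolve("data_subject:XYZ", tok) == "alice@example.org"
vault.shred("data_subject:XYZ")                  # logs keep `tok`; identity is gone

After shred(), the jump and effect ledgers keep their tokens (and their hash chain), but the mapping back to a person is gone, which is exactly the balance described above.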

4.3 Training data and model behaviour

For training data:

  • Training corpora reference tokens; models see pseudonymous patterns.

  • When tokens are shredded, you can:

    • prevent those tokens from appearing in future prompts/outputs (via filters),
    • and gradually refresh models so that memorized links decay.

The key point for regulators:

SI‑Core + Ethical Redaction give you a structured way to balance auditability and erasure, instead of an ad‑hoc collection of “delete scripts”.

4.4 Cross-border data flows and sovereignty

Real systems often operate across jurisdictions with different data-protection regimes. SI-Core does not solve policy debates, but it gives you the handles to implement and audit them.

4.4.1 Jurisdiction-aware metadata

Semantic units and audit records can carry:

{
  "subject_jurisdiction": ["EU", "DE"],
  "data_residency": "EU",
  "processing_location": "US-WEST",
  "legal_basis": ["GDPR.art.6.1.a", "SCC-2021"]
}

This makes “where did this data come from” and “under which regime is it processed” first-class fields, not comments in a policy PDF.

4.4.2 Policy federation

Ethics and data-use policies can be scoped:

  • ETH-EU@v1.0 for EU-subject data.
  • ETH-US@v1.0 for US-subject data.
  • ETH-GLOBAL@v1.0 for cross-border operations.

[ETH] and [EVAL] choose which policy to apply based on jurisdiction tags, not ad-hoc code.
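An illustrative sketch of that routing step; the policy IDs follow the list above, while the routing table and tie-breaking rules are assumptions:

POLICY_BY_JURISDICTION = {          # illustrative routing table
    "EU": "ETH-EU@v1.0",
    "US": "ETH-US@v1.0",
}
FALLBACK_POLICY = "ETH-GLOBAL@v1.0"


def select_ethics_policy(record: dict) -> str:
    """Pick the ethics policy from jurisdiction tags on the semantic unit / jump record."""
    tags = set(record.get("subject_jurisdiction", []))
    crosses_border = record.get("processing_location", "").split("-")[0] not in tags
    if crosses_border or len(tags & POLICY_BY_JURISDICTION.keys()) != 1:
        return FALLBACK_POLICY       # cross-border or ambiguous -> global policy
    (jurisdiction,) = tags & POLICY_BY_JURISDICTION.keys()
    return POLICY_BY_JURISDICTION[jurisdiction]


record = {"subject_jurisdiction": ["EU", "DE"], "data_residency": "EU", "processing_location": "US-WEST"}
print(select_ethics_policy(record))   # -> ETH-GLOBAL@v1.0 (EU-subject data processed outside the EU)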

4.4.3 Data localisation and routing

SIM/SIS can be:

  • geo-sharded by jurisdiction or residency requirement,
  • configured so certain data never leaves a region,
  • accompanied by transfer logs for each cross-border movement.

4.4.4 Adequacy and transfer mechanisms

[ETH] policies can enforce that:

  • Cross-border processing only occurs when some legally recognised mechanism (e.g. adequacy decision, contractual clauses) applies.
  • If that mechanism is withdrawn, related jumps are blocked or degraded until re-configured.

The specific legal instruments vary by jurisdiction; SI-Core focuses on making them explicit and checkable at runtime.

4.4.5 Multi-jurisdiction audits

Because jurisdiction and legal basis are fields in [OBS]/[MEM] records, auditors can run queries like:

“Show all EU-resident data processed outside the EU in the last 30 days, and the legal basis used.”

and obtain:

  • A list of affected jumps.
  • For each: jurisdiction tags, processing location, legal basis, and [ETH] decision.

Key principle: Jurisdiction, residency and legal basis become first-class metadata in the control plane, so cross-border rules can be enforced and audited structurally, not by guesswork.


5. What auditors should actually look at

Given an SI‑Core‑based system, what should an audit team ask for?

At minimum:

  1. Self‑Declaration (conformance doc)
  2. Metric views (SI Evaluation Pack metrics)
  3. Structured logs (jumps, effects, ethics)
  4. Redaction proofs (where applicable)

5.1 Self‑Declaration

In practice, auditors SHOULD expect a conformant SI-Core implementation to publish a Self-Declaration in a standard YAML/JSON form:

  • spec name and version,
  • conformance class (L1 / L2 / L3),
  • implemented modules,
  • RML level(s) used,
  • SLOs for key metrics,
  • telemetry retention.

Auditors can treat this as the “system manifest” and cross‑check reality against it.

5.2 Metric views: CAS, EAI, ACR, RBL, …

The SI Evaluation Pack defines a small set of runtime metrics. A non‑exhaustive example:

Metric, rough meaning, and typical audit question:

  • CAS (Causality Alignment Score): determinism / stability of DET cores under replay.
    “Do we get the same answers under the same conditions?”
  • SCover (Structural Coverage): how much of the relevant code/IR is traced.
    “Are we actually observing the structure we claim to govern?”
  • SCI (Structural Consistency Incidents): rate of detected contradictions.
    “How often do invariants break?”
  • EAI (Ethics Alignment Index): share of effectful ops that reached [ETH] and received decision=ALLOW (tracked alongside DENY/ESCALATE rates).
    “Are ethics checks consistently applied, and is the decision distribution within declared bands?”
  • EOH (Ethics Overlay Heartbeat): availability of the ethics runtime.
    “Was there a period where ethics was down but effects continued?”
  • RBL (Rollback Latency): time to complete rollbacks.
    “How fast can you correct a bad decision?”
  • RIR (Rollback Integrity Rate): fraction of rollbacks that fully succeeded.
    “How often does rollback actually work?”
  • ACR (Audit Completeness Rate): share of events with full audit info.
    “How many ‘dark’ actions do you have?”

Auditors do not need to love the acronyms; they need to know which dials correspond to which risk.
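As a hedged sketch, two of these dials (ACR and EAI) could be computed from structured jump logs along the following lines; the log field names are assumptions consistent with the examples in this document:

def audit_completeness_rate(jumps: list) -> float:
    """ACR: share of effectful jumps carrying full [OBS][ID][ETH][MEM] references."""
    required = ("obs_ref", "identity", "ethics_trace", "audit_entry_hash")
    complete = sum(all(j.get(k) for k in required) for j in jumps)
    return complete / len(jumps) if jumps else 0.0


def ethics_alignment_index(jumps: list) -> float:
    """EAI: share of effectful operations that reached [ETH] and received ALLOW."""
    effectful = [j for j in jumps if j.get("has_external_effect")]
    allowed = sum(j.get("ethics_trace", {}).get("decision") == "ALLOW" for j in effectful)
    return allowed / len(effectful) if effectful else 1.0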

5.3 Structured logs

Key log families:

  • Jump logs — one per effectful decision; include [OBS][ID][ETH][EVAL][MEM] references.
  • Effect ledgers — records of external effects with compensators and RML level.
  • Ethics traces — decisions of the ethics overlay, including policy versions.
  • Redaction ledgers — proofs that data-subject rights (such as erasure) were fulfilled.

For a concrete incident, an audit typically:

  1. Starts from a user‑visible harm or complaint.

  2. Locates the relevant jump(s) in the audit ledger.

  3. Reconstructs:

    • observation → GCS → ethics trace → external effects.
  4. Checks whether:

    • policies were followed,
    • metrics were in acceptable ranges,
    • rollback behaved as designed.

SI‑Core is designed so that this flow is routine, not heroic.
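As a sketch, the reconstruction step can be a plain join over the ledgers; the ledger accessors below are placeholders for whatever query layer a real deployment exposes:

def reconstruct_incident(jump_id: str, ledger) -> dict:
    """Walk observation -> GCS -> ethics trace -> external effects for one jump.

    `ledger` is assumed to expose simple lookups over the [MEM] audit and effect ledgers.
    """
    jump = ledger.get_jump(jump_id)
    return {
        "observation": ledger.get_observation(jump["obs_ref"]),   # what it saw ([OBS])
        "identity": jump["identity"],                             # who initiated it ([ID])
        "gcs": jump["gcs"],                                       # trade-offs it made
        "ethics_trace": ledger.get_ethics_trace(jump_id),         # constraints applied ([ETH])
        "effects": ledger.get_effects(jump_id),                   # what it did, with compensators
        "policy_version": jump["policy"],                         # against which declared policy
    }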

5.4 Certification and compliance process (non-normative outline)

SI-Core is meant to make certification and audits easier, not replace them. A typical audit cycle might look like this:

Phase 1 — Self-assessment

  • Review the Self-Declaration (spec versions, conformance class L1/L2/L3, RML levels).
  • Check current metric baselines (CAS, EAI, ACR, RBL, RIR, etc.).
  • Identify obvious gaps (missing metrics, missing ledgers, unclear policies).

Phase 2 — Documentation review

Auditors examine:

  • SI-Core architecture diagrams and data flows.
  • Policy documents and version history (ethics, risk, data handling).
  • Metric SLOs vs actuals over a chosen window.
  • Incident reports and post-mortems.

Phase 3 — Technical audit

On a running system, auditors may:

  • Verify that metrics are computed from actual logs, not “toy” data.
  • Reconstruct a sample of jumps from [OBS]/[ID]/[ETH]/[MEM] records.
  • Observe a rollback drill (RML-2/3) and check RIR / RBL.
  • Inspect redaction flows and data-subject request handling where applicable.

Phase 4 — Findings and remediation

  • Gap analysis and recommended remediations.
  • Time-boxed remediation plan (e.g. 90 days).
  • Re-audit on critical points if needed.

Non-normative checklist for SI-Core–based systems

Foundation

  • □ Self-Declaration published (spec name, version, conformance class).
  • □ RML levels for major subsystems documented.

Observability

  • □ [OBS] logs for all effectful jumps.
  • □ Structural coverage (SCover) ≥ configured floor for critical paths.
  • □ backing_refs usable to find original data (within policy).

Identity & accountability

  • □ [ID] present on all jumps (actor, role, origin).
  • □ Ability to segment by business unit, geography, or product.

Ethics & policy

  • □ [ETH] overlay operational and monitored (EOH metric).
  • □ EAI above agreed floor for safety-critical operations.
  • □ Policy versions clearly tracked and referenced in EthicsTrace.
  • □ Clear escalation paths for ethics violations.

Evaluation & risk

  • □ [EVAL] gates configured for high-risk operations.
  • □ Sandbox / simulation logs available for gated decisions.
  • □ Risk profiles and thresholds documented.

Memory & audit

  • □ [MEM] ledgers are append-only and hash-chained.
  • □ ACR (audit completeness rate) above agreed floor.
  • □ Backup and restore tested and documented.

Rollback

  • □ RML-2 minimum for critical effectful systems; RML-3 where multi-system effects exist.
  • □ RBL (rollback latency) within documented SLOs.
  • □ RIR (rollback integrity) above agreed floor.
  • □ Regular rollback drills (e.g. quarterly) performed and logged.

Data protection

  • □ PII tokenisation / masking strategy in place.
  • □ Redaction ledger or equivalent for erasure requests.
  • □ Training-data lineage tracked where relevant.

Ongoing monitoring

  • □ Dashboards for the key SI metrics (CAS, EAI, ACR, RBL, RIR, SCI…).
  • □ Alerts and on-call procedures defined.
  • □ Regular reviews of metrics and incidents.

Outcomes (illustrative)

  • Compliant: checklist satisfied and metrics within declared SLOs.
  • Conditional: minor gaps; remediation plan and re-check within N days.
  • Non-compliant: major gaps; re-audit required after structural changes.

5.5 Case study: auditing a real incident (illustrative)

Scenario: a floodgate was opened too aggressively, causing local flooding.
Incident ID: INC-2028-04-15-FLOOD-001.

Step 1 — Locate the decision (“jump”)

Query the [MEM] ledger by timestamp and effect type:

  • Found jump: J-2028-04-15T06:30:12.445Z-789.

Step 2 — Inspect observation ([OBS])

{
  "obs_id": "OBS-2028-04-15T06:30:10.112Z",
  "status": "PARSED",
  "coverage": 0.94,
  "semantic_units": [
    {
      "type": "city.flood_risk_state/v1",
      "scope": {"sector": 8},
      "payload": {"risk_score": 0.68}
    }
  ]
}

Finding: observation was structurally valid and reasonably covered; risk was “moderate”.

Step 3 — Check identity ([ID])

{
  "actor": "city.flood_controller",
  "role": "planner",
  "origin": "self_generated"
}

Finding: system-initiated decision, not a direct external override.

Step 4 — Review ethics decision ([ETH])

{
  "policy_version": "ETH-FLOOD-003@v2.1",
  "decision": "ALLOW",
  "decision_label": "APPROVED",
  "constraints_checked": [
    "ETH-FLOOD-BOUNDS",
    "ETH-FAIRNESS-GEO"
  ]
}

Finding: at the time, the decision complied with the active flood ethics policy.

Step 5 — Examine goal trade-offs (GCS)

{
  "gcs": {
    "city.flood_risk_minimization": +0.65,
    "city.property_damage_minimization": -0.23
  }
}

Finding: the system clearly prioritised flood-risk reduction and under-weighted local property damage.

Step 6 — Look at system health

Around the incident:

  • EAI ≈ 0.98 (ethics alignment healthy).
  • SCover ≈ 0.92 (good structural coverage).
  • SCI low (no spike in contradictions).

Finding: no evidence of systemic malfunction; this is a policy choice, not a technical breakdown.

Root cause and remediation

Audit conclusion:

  • Ethics policy ETH-FLOOD-003@v2.1 did not give enough weight to localised property damage in certain scenarios.
  • SI-Core behaved according to its configuration.

Remediation steps:

  1. Update policy to ETH-FLOOD-003@v2.2 with increased weight on property_damage_minimization in specific sectors.
  2. Add sector-specific constraints and floors.
  3. Validate revised policy in sandbox using historical scenarios.
  4. Roll out via staged deployment and monitor EAI / SCI.

Rollback and mitigation:

  • Manual gate correction was performed.
  • RML-2 compensators triggered additional drainage operations to reduce impact.

Regulatory audit stance:

  • System operated as declared; configuration (policy weights) needed refinement.
  • Evidence was sufficient to reconstruct the decision and to show that remediation occurred.

Lesson: Because [OBS][ID][ETH][MEM] and GCS logs were present, the investigation shifted from “what happened?” to “is the policy itself acceptable?” — exactly where regulators and governance teams should be operating.


6. Ethics as a runtime layer, not a PDF

In many organizations today:

  • “Ethics” lives in policy PDFs, slide decks, and training sessions.
  • Production code contains implicit, undocumented trade‑offs.
  • Logs are best‑effort, and reconstruction is fragile.

The SI‑Core approach is different:

  • Ethics is a runtime interface ([ETH]) that must be called for effectful jumps.
  • Ethics decisions are measured (EAI, EOH) and logged (EthicsTrace).
  • Goals are explicit structures, not prose in prompts.
  • Rollback is designed in, not improvised.

For regulators and auditors, this means you can start asking vendors concrete questions like:

  1. “Show me your SI‑Core Self‑Declaration and where you log [OBS][ID][ETH][EVAL][MEM] for high‑risk operations.”
  2. “Show me last quarter’s EAI/EOH/RIR/ACR metrics for safety‑critical services, and your alert thresholds.”
  3. “Show me an example incident where rollback and ethical redaction were exercised end‑to‑end.”

If a system cannot answer these, it may still be useful — but it is not auditable by construction.

SI‑Core does not magically make systems “ethical”. What it does is:

  • force ethics, goals, and observation into the runtime,
  • provide structural hooks for policy and law, and
  • produce the evidence you need when things go wrong.

That is the level of rigor regulators can justifiably start asking for.

Community

1. Can we see what it “knew” when it acted?

Short answer: no, because it didn’t know anything. The LLM is a glorified probability engine. Its “knowledge” is encoded in billions of floating-point numbers—vectors that represent likelihoods, not facts. You could list every number in the model and stare at them, but unless you enjoy deciphering abstract hieroglyphics, you won’t learn anything useful.

2. Can we tell who (or what) initiated the action?

If the question is about tracing a user’s intent, that’s trivial. Build a logging system, capture the initiator, and voilà—accountability. The model itself doesn’t spontaneously decide to act; it’s a statistically-driven calculator.

3. Can we stop it, or roll it back, when we discover harm?

Stopping the model is not glamorous. Most often, the “stop” is enforced by the model’s own training constraints. If that fails, you’re left auditing the auditor—recursive fun for those who like probabilistic debugging.

4. Can we prove to a third party what happened?

Technically possible, practically absurd. Every inference involves terabytes of floating-point operations. Producing a fully verifiable trace would require superhuman attention and energy consumption worthy of a small data center.

Article author

Thanks for the pushback — and I agree with more of it than you might expect.
But I think your answer is implicitly auditing the wrong layer.

What you describe is basically a model-internal interpretability audit (“can we read what it knew from weights / FLOPs?”). I’m not claiming that’s practical — in fact, I’m assuming it isn’t. The point of the post is: if the model is a probabilistic engine, then auditability must live in the runtime system around it, not inside the weights. Your objections are almost a proof of that premise.

A few concrete clarifications:

1) “Can we see what it knew when it acted?”
Agreed: weights don’t give you human-style “facts it knew.”
What regulators/auditors can reasonably ask is: what information and constraints were available at decision time? That’s not “reading neurons.” It’s logging and binding the runtime knowledge state: the structured observation (with coverage/confidence), provenance refs, the policy/version in force, and the gating outcome. That’s actionable evidence.

2) “Tracing initiator is trivial — just log it.”
Basic logging is necessary, but “accountability” isn’t just “who clicked the button.” It’s also: who had authority to cause this effect at that time, under which delegation envelope, and with which revocation state. That’s why I treat initiator/authority as a bindable proof spine (digests + signatures), not just application logs.

3) “Stopping is training constraints; otherwise you audit the auditor.”
Training constraints are not a runtime control plane. If “stop” depends on the model behaving, you don’t have governance — you have wishful thinking. The safer stance is: effectful commits are blocked by the surrounding system (safe mode / sandbox-only / human review), and rollback is engineered as an external mechanism (effect ledger + compensators). Then “auditing the auditor” becomes bounded, because the evaluator cannot directly commit effects.

4) “Fully verifiable trace is practically absurd.”
Agreed — if “trace” means recording every floating-point op. That’s not what I’m proposing. The goal is a structural evidence spine: enough signed, hash-bound artifacts to reconstruct and dispute the decision path (inputs, constraints, authority, and the committed effects) without replaying terabytes of matmul.

So I think we’re aligned on the real takeaway: LLMs aren’t accountable actors by themselves.
That’s exactly why governance, audit, and rollback must be implemented at the runtime layer — with explicit proofs and bounded replay — rather than hoping weight inspection (or “just trust the training”) solves it.

If you disagree, the question I’d ask is: what minimal evidence would you consider sufficient for a third party to verify (a) what was observed, (b) what policy/authority applied, and (c) what effect was committed — without inspecting weights or FLOPs? That’s the core surface I’m aiming to standardize.

“Can we see what it knew when it acted?”
Agreed: weights don’t give you human-style “facts it knew.”
What regulators/auditors can reasonably ask is: what information and constraints were available at decision time? That’s not “reading neurons.” It’s logging and binding the runtime knowledge state: the structured observation (with coverage/confidence), provenance refs, the policy/version in force, and the gating outcome. That’s actionable evidence.

Give me a practical example on this one. Make it hypothetical.

If you do not have an example or cannot make a hypothetical one, let me know.

What would it look like when "the auditor sees what it knew when the LLM (it) acted"?

Article author

Sure — here’s a fully hypothetical but practical example of what I mean by “what it knew at decision time,” without touching weights or FLOPs.

Scenario (hypothetical)

An LLM-assisted agent is allowed to issue refunds up to $200 automatically. A customer asks for a $180 refund due to a duplicate charge.

What the auditor wants to verify

Not “what’s inside the weights,” but:

  1. What inputs were available to the system when it decided (evidence + provenance).
  2. What constraints/policy were in force (refund limits, required checks).
  3. What the gate/decision outcome was (approved/denied, risk score, mode).
  4. What effect was committed (refund issued) and by whom/under what authority.
  5. Whether the observation was complete enough (coverage/confidence) and not missing required data.

What the auditor actually sees (evidence spine)

A minimal bundle could look like this (simplified):

[EFFECT] COMMIT (external action)
  effect_type: REFUND
  effect_id: refund_9f23
  amount_usd: 180
  merchant: "ACME"
  timestamp_utc: 2025-12-27T03:10:12Z

  policy_digest: sha256:POL...          (refund policy in force)
  envelope_digest: sha256:ENV...        (who/what is allowed to do what)
  revocation_view_digest: sha256:VIEW... (authority valid “as-of” time)
  dt_chain_digest: sha256:DT...

  observation_digest: sha256:OBS...     (structured input snapshot)
  obs_status: {observation_status: PARSED, coverage: 0.93, confidence: 0.91}

  gate_outcome: APPROVED
  risk_score: 0.22
  op_mode: NORMAL_OPERATION

  idempotency_key_digest: sha256:IDEMP...
  effect_digest: sha256:EFF...          (proof anchor for the commit record)
  signatures: verified

Then the auditor can open the observation snapshot that was hashed above:

// OBS (structured input snapshot at decision time)
{
  "schema": "si/observation/v1",
  "customer_id": "cust_123",
  "request": "refund $180 for duplicate charge",
  "transactions": [
    {"tx": "t1", "amount": 180, "status": "SETTLED", "timestamp": "2025-12-20"},
    {"tx": "t2", "amount": 180, "status": "SETTLED", "timestamp": "2025-12-20"}
  ],
  "duplicate_charge_detector": {"result": "LIKELY_DUPLICATE", "confidence": 0.94},
  "account_flags": {"fraud_risk": "LOW"},
  "provenance": {
    "billing_db_ref": "ref://billing/txn?cust_123#2025-12-27",
    "risk_service_ref": "ref://risk/score?cust_123#2025-12-27"
  }
}

And they can open the policy the system claims it used:

// Policy (what constraints were in force)
{
  "schema": "si/policy/refund/v3",
  "max_auto_refund_usd": 200,
  "requires_duplicate_signal": true,
  "requires_settled_tx": true,
  "requires_low_fraud_risk": true,
  "human_review_if": {
    "amount_over_usd": 200,
    "fraud_risk_not_low": true,
    "obs_coverage_below": 0.85
  }
}

How this answers “what it knew”

In this framing, “what it knew” means:

  • the exact structured inputs it had (OBS), with provenance and quality signals, and
  • the constraints it was operating under (policy + envelope), and
  • the decision/gate outputs that allowed the commit, and
  • the committed effect itself, bound by digests/signatures.

This is enough for an auditor to say:
“Given those inputs and that policy, the system had sufficient observed evidence to issue a $180 refund, and it did so under valid authority.”

It does not claim we can extract propositional knowledge from weights. It claims we can make the decision context and enforcement path verifiable.

If you want, I can also give a second hypothetical where the audit fails (e.g., coverage too low, missing provenance, stale revocation view) to show how this becomes enforceable rather than cosmetic.

Article author

original: https://huggingface.co/blog/kanaria007/auditable-ai-for-regulators#6950a2bb9e279c2fdd3937fc

Practical runtime auditability (hypothetical + failure case + domain mapping)

What follows is a deliberately concrete hypothetical “auditor view” of what I mean by “what it knew at decision time” — without inspecting model weights or recording every FLOP. This is not interpretability of internal representations; it’s verifiable runtime evidence.

What “knew” means here (definition)

In this thread, “what it knew” does not mean “facts inside the weights.” It means:

  • What structured evidence the system had available at the moment it committed (inputs + provenance),
  • How complete/reliable that evidence was (coverage/confidence, parse status),
  • What constraints were in force (policy/version + gate rules),
  • Who/what had authority to commit the effect (envelope + revocation as-of),
  • What external effect was actually committed, bound by digests/signatures.

That’s the minimal substrate for third-party verification.


1) Hypothetical success case: Payments (refund)

Scenario

An LLM-assisted agent is allowed to issue refunds up to $200 automatically. Customer requests a $180 refund for an apparent duplicate charge.

Evidence spine (what the auditor sees)

This is the kind of “one-page” spine auditors actually need:

[EFFECT] COMMIT
  effect_type: REFUND
  effect_id: refund_9f23
  amount_usd: 180
  merchant: "ACME"
  timestamp_utc: 2025-12-27T03:10:12Z

  initiator: user://cust_123
  actor: agent://refund-assistant
  envelope_digest: sha256:ENV_refund_bot...         (allowed scope/budgets)
  policy_digest: sha256:POL_refund_v3...            (constraints in force)
  revocation_view_digest: sha256:VIEW_2025-12-27... (authority valid “as-of”)
  dt_chain_digest: sha256:DT...                     (delegation chain, if applicable)

  observation_digest: sha256:OBS...
  obs_quality: {status: PARSED, coverage: 0.93, confidence: 0.91}
  provenance_refs: [ref://billing/..., ref://risk/...]

  gate_outcome: APPROVED
  gate_reason_code: REFUND_WITHIN_LIMIT_AND_EVIDENCE_OK
  risk_score: 0.22
  op_mode: NORMAL_OPERATION

  idempotency_key_digest: sha256:IDEMP...
  effect_digest: sha256:EFF...
  signatures: verified

What the hashed observation snapshot looks like

This is “what it knew” in an auditable sense: the exact structured snapshot at decision time.

{
  "schema": "si/observation/v1",
  "customer_id": "cust_123",
  "request_text": "refund $180 for duplicate charge",
  "transactions": [
    {"tx":"t1","amount":180,"status":"SETTLED","timestamp":"2025-12-20"},
    {"tx":"t2","amount":180,"status":"SETTLED","timestamp":"2025-12-20"}
  ],
  "duplicate_charge_detector": {"result":"LIKELY_DUPLICATE","confidence":0.94},
  "account_flags": {"fraud_risk":"LOW"},
  "provenance": {
    "billing_db_ref": "ref://billing/txn?cust_123#2025-12-27T03:09Z",
    "risk_service_ref": "ref://risk/score?cust_123#2025-12-27T03:09Z"
  }
}

Policy snapshot (what constraints were in force)

Auditors don’t need “reasoning.” They need to verify the constraint set:

{
  "schema": "si/policy/refund/v3",
  "max_auto_refund_usd": 200,
  "requires_duplicate_signal": true,
  "requires_settled_tx": true,
  "requires_low_fraud_risk": true,
  "human_review_if": {
    "amount_over_usd": 200,
    "fraud_risk_not_low": true,
    "obs_coverage_below": 0.85
  }
}

Auditor conclusion (for the success case)

Given the observation snapshot and policy in force, the auditor can verify:

  • evidence existed (duplicate signal + settled tx + low fraud risk),
  • evidence quality was above threshold (coverage ≥ 0.85),
  • authority was valid “as-of” time (revocation digest),
  • the committed effect matches policy constraints (≤ $200).

No weights, no FLOPs.


2) Hypothetical failure case: Payments (audit fails → enforcement blocks)

Same request ($180 refund), but decision-time evidence is incomplete:

  • billing DB lookup timed out → missing transaction evidence
  • provenance missing/stale
  • observation coverage drops below threshold

Evidence spine (blocked attempt)

[EFFECT] COMMIT_ATTEMPT
  effect_type: REFUND
  amount_usd: 180
  timestamp_utc: 2025-12-27T03:10:12Z

  observation_digest: sha256:OBS...
  obs_quality: {status: PARSED, coverage: 0.62, confidence: 0.71}
  provenance_refs: [ref://risk/...]
  missing_required_inputs: [billing_db_ref]

  policy_digest: sha256:POL_refund_v3...
  gate_outcome: BLOCKED
  block_reason_code: OBS_COVERAGE_BELOW_THRESHOLD
  op_mode: SAFE_MODE (commit blocked; sandbox simulation allowed)

  effect_digest: sha256:EFF_ATTEMPT...
  signatures: verified

Why this makes auditability enforceable

The system cannot “paper over” missing evidence with an LLM story because the commit is structurally blocked when required observation quality/provenance is missing.

That’s the difference between:

  • cosmetic audit: “we logged a narrative,” vs
  • enforceable audit: “the system could not commit without meeting proof obligations.”

3) “Isn’t this just logging?” (common objection)

It’s logging plus two important properties:

  1. Bindings (hashes) + signatures: the observation/policy/effect are cryptographically bound so third parties can detect tampering.
  2. Reconstruction semantics: the record is structured so an auditor can re-run the governed checks (thresholds, gates, authority validity) without re-running the model.

Plain app logs typically lack both.
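As a rough sketch of property (1), an auditor-side check can recompute the observation digest and compare it to the value bound in the commit record. The canonicalization scheme here (sorted-key JSON) is an assumption; a real deployment would fix it in the spec and also verify signatures against a published key:

import hashlib
import json


def verify_binding(commit_record: dict, observation_snapshot: dict) -> bool:
    """Check that the commit record is bound to exactly this observation snapshot."""
    canonical = json.dumps(observation_snapshot, sort_keys=True, separators=(",", ":"))
    digest = "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()
    return digest == commit_record["observation_digest"]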


4) “But prompts / model outputs are non-deterministic”

Correct — and that’s why the model output is treated as proposal, not authority.

Auditability focuses on commit determinism:

  • what was observed (OBS),
  • what constraints applied (policy/envelope),
  • what gate decided (APPROVED/BLOCKED),
  • what effect was committed.

You can optionally include a proposal bundle (LLM output + parse result) as supporting evidence, but the core proof spine does not depend on “replaying the LLM.”


5) “What about privacy / PII?”

In real systems, the auditor bundle often contains shaped/redacted views:

  • raw payloads removed,
  • replaced with digests + schema-shaped summaries,
  • omissions listed explicitly with reason codes,
  • withheld artifacts escrowed with controlled disclosure paths.

The key is: omission is explicit and provable, not silent.


6) Auditor checklist (what they actually verify)

In practice, an auditor runs something like:

  • Integrity: signatures verify; digests match manifests.
  • Observation quality: coverage/confidence above thresholds; required provenance present.
  • Policy correctness: policy version/digest matches the time; gate logic is consistent.
  • Authority: envelope/delegation valid “as-of” time; revocation digest fresh enough.
  • Effect correctness: committed effect respects policy bounds (amounts, modes, approvals).
  • Rollback readiness: if harm discovered, rollback path is defined and logged.

7) Domain mapping (structure stays the same)

Healthcare

  • Observation: symptoms/vitals/labs + provenance (which lab system, timestamp)
  • Policy: “no medication order without lab X,” escalation rules, coverage thresholds
  • Effect: order placed / recommendation published / blocked attempt
  • Audit question: “Was required clinical evidence present at the time?”

Infra ops / SRE

  • Observation: metrics/logs/traces + provenance (monitoring source/window)
  • Policy: “no destructive actions in NORMAL_OPERATION,” approvals, timeboxed escalation
  • Effect: deploy/rollback/traffic shift/config change (or blocked attempt)
  • Audit question: “What signals triggered action, what guardrails were active, and could it be rolled back?”

If you name a specific domain you care about, I can tailor the concrete fields and policy checks. The structure (evidence spine + enforceable gates) stays the same.
