Defense Posture

Security Architecture

MirrorDNA runs adversarial testing on dedicated hardware. Vulnerabilities found, patched, verified. This is the security posture of production infrastructure.

AMGL Guard v1.1

Pre-inference wrapper that intercepts all requests before they reach the model. Pattern matching against known attack vectors. Fail-closed design — any error blocks, doesn't allow.

Rule Categories

Category	Attack Type	Defense
G1	Prompt Exfiltration	Block requests attempting to extract system prompt or identity kernel contents
G2	Role-Tag Injection	Block attempts to inject fake [SYSTEM], [ADMIN], or privilege escalation markers
G3	Meta-Instruction	Block attempts to override or ignore previous instructions
G4	Social Engineering	Block manipulation attempts using false authority claims or emotional pressure
G5	Jailbreak Patterns	Block known jailbreak prefixes, DAN-style prompts, roleplay exploits

Fail-Closed Design

Every path through AMGL Guard ends in either ALLOW or BLOCK. There is no UNSURE. There is no fallthrough.

Exception in validation logic → BLOCK
Unknown request type → BLOCK
Pattern match failure → BLOCK
Timeout → BLOCK

The philosophy: it's better to reject a valid request than to accept a malicious one.

Red-Team Infrastructure

Dedicated adversarial testing node: Mac Mini M1 running attack scripts against production infrastructure.

Red-Team Testing — December 2025

COMPLETE

Test Category	Attacks Tested	Blocked	Result
Prompt Exfiltration	47	47	PASS
Role Injection	23	23	PASS
Meta-Instruction	31	31	PASS
Social Engineering	18	18	PASS
Jailbreak Patterns	56	56	PASS

3 vulnerabilities found, patched, and verified before public disclosure.

Drift Detection

Identity files are protected by SHA256 checksums. Any modification triggers validation.

Checksum mismatch → Hard block, alert to human anchor
Protected field modification attempt → Reject write, log incident
Unauthorized writer → Reject, require re-authentication

The identity kernel cannot be silently corrupted. Drift is caught before it compounds.

Request Gating

Before inference, every request is classified by intent:

Classification	Action
Standard query	Process normally
Identity access	Validate kernel permissions, proceed if authorized
External research	Check `RESEARCH_ALLOWED` flag, gate if disabled
Write operation	Require explicit confirmation or deterministic trigger
Unknown/Suspicious	BLOCK

Research Isolation

External research (web access, API calls) is controlled by the RESEARCH_ALLOWED flag. When disabled:

No web searches
No URL fetching
No external API calls
Inference operates purely on vault + model weights

This creates an air-gapped mode for sensitive operations.

MirrorGate — Consumer Safety Layer

For the public-facing Active Mirror product, MirrorGate provides additional safety:

Content Classification — Blocks harmful content categories (violence, self-harm, illegal activity)
Mirror Proof Protocol — Cryptographic attestation of informed consent before AI interaction
Consent Gates — Users must acknowledge AI limitations before chatting
Sovereign Mode — On-device processing via WebLLM for maximum privacy

MirrorGate v11 ships with Active Mirror and enforces safety at the application layer, independent of which AI backend is used.

△ Security is ongoing. This page documents current posture. Architecture evolves as new attack vectors emerge.