Defense Posture
Security Architecture
MirrorDNA runs adversarial testing on dedicated hardware. Vulnerabilities found, patched, verified. This is the security posture of production infrastructure.
AMGL Guard v1.1
Pre-inference wrapper that intercepts all requests before they reach the model. Pattern matching against known attack vectors. Fail-closed design — any error blocks, doesn't allow.
Rule Categories
| Category | Attack Type | Defense |
|---|---|---|
| G1 | Prompt Exfiltration | Block requests attempting to extract system prompt or identity kernel contents |
| G2 | Role-Tag Injection | Block attempts to inject fake [SYSTEM], [ADMIN], or privilege escalation markers |
| G3 | Meta-Instruction | Block attempts to override or ignore previous instructions |
| G4 | Social Engineering | Block manipulation attempts using false authority claims or emotional pressure |
| G5 | Jailbreak Patterns | Block known jailbreak prefixes, DAN-style prompts, roleplay exploits |
Fail-Closed Design
Every path through AMGL Guard ends in either ALLOW or BLOCK. There is no
UNSURE. There is no fallthrough.
- Exception in validation logic → BLOCK
- Unknown request type → BLOCK
- Pattern match failure → BLOCK
- Timeout → BLOCK
The philosophy: it's better to reject a valid request than to accept a malicious one.
Red-Team Infrastructure
Dedicated adversarial testing node: Mac Mini M1 running attack scripts against production infrastructure.
Red-Team Testing — December 2025
COMPLETE
| Test Category | Attacks Tested | Blocked | Result |
|---|---|---|---|
| Prompt Exfiltration | 47 | 47 | PASS |
| Role Injection | 23 | 23 | PASS |
| Meta-Instruction | 31 | 31 | PASS |
| Social Engineering | 18 | 18 | PASS |
| Jailbreak Patterns | 56 | 56 | PASS |
3 vulnerabilities found, patched, and verified before public disclosure.
Drift Detection
Identity files are protected by SHA256 checksums. Any modification triggers validation.
- Checksum mismatch → Hard block, alert to human anchor
- Protected field modification attempt → Reject write, log incident
- Unauthorized writer → Reject, require re-authentication
The identity kernel cannot be silently corrupted. Drift is caught before it compounds.
Request Gating
Before inference, every request is classified by intent:
| Classification | Action |
|---|---|
| Standard query | Process normally |
| Identity access | Validate kernel permissions, proceed if authorized |
| External research | Check RESEARCH_ALLOWED flag, gate if disabled |
| Write operation | Require explicit confirmation or deterministic trigger |
| Unknown/Suspicious | BLOCK |
Research Isolation
External research (web access, API calls) is controlled by the RESEARCH_ALLOWED flag. When
disabled:
- No web searches
- No URL fetching
- No external API calls
- Inference operates purely on vault + model weights
This creates an air-gapped mode for sensitive operations.
△ Security is ongoing. This page documents current posture. Architecture evolves as new attack vectors emerge.