CODE AUDIT COMPARISON — MARCH 2026

Claude Code vs OpenCode
Code Audit Comparison

Two AI platforms audited the same Blaze codebase using an identical 12-agent, 4-squad audit methodology. This presentation compares scope, depth, and findings between the Claude Code (Opus 4.6) and OpenCode (Gemini 3.1 Pro) executions.

89
Claude Code Findings
13
OpenCode Findings
6.8x
Detection Ratio
12 vs 1
Agents Completed

Headline Comparison

Metric | Claude Code (Opus 4.6) | OpenCode (Gemini 3.1 Pro)
Total Findings | 89 | 13
CRITICAL Findings | 14 | 3
HIGH Findings | 26 | 4
MEDIUM Findings | 28 | 4
LOW Findings | 21 | 2
Agents Completed | 12/12 | ~1
Execution Time | ~4 min | 8m 31s
Files Analyzed (referenced) | 50+ | ~8
Squads Active | 4/4 | 4/4 (shallow)

Severity Distribution

CRITICAL: 14 vs 3
HIGH: 26 vs 4
MEDIUM: 28 vs 4
LOW: 21 vs 2
(Claude Code vs OpenCode)

Key Takeaway

Claude Code found 4.7x more CRITICAL findings (14 vs 3) and 6.5x more HIGH findings (26 vs 4). The gap is most pronounced in the CRITICAL+HIGH tier where actionable security and architecture issues live. OpenCode's 13 findings represent a surface-level scan; Claude Code's 89 represent deep multi-file, cross-reference analysis.

Coverage Depth Comparison

Claude Code (Opus 4.6)

12 Parallel Agents

All 12 specialized agents ran to completion in parallel, each producing a dedicated findings file with code evidence.

A1 AuthZ: 10 findings
A2 Injection: 9 findings
A3 Infra Security: 8 findings
B1 Architecture: 15 findings
B2 Data Integrity: 13 findings
B3 API Compliance: 12 findings
C1 Backend Quality: 18 findings
C2 Frontend Quality: 9 findings
C3 Performance: 8 findings
D1 Test Coverage: 11 findings
D2 Compliance: 16 findings
D3 Dependencies: 12 findings
OpenCode (Gemini 3.1 Pro)

Single-Pass Audit

Completed as a single consolidated task (33 tool calls, 8m 31s). Did not spawn individual agents per squad. Produced one merged report.

A (Security): ~4 findings
B (Architecture): ~3 findings
C (Quality): ~3 findings
D (Coverage): ~3 findings

Despite receiving the same 12-agent prompt, the execution collapsed into a single-agent run with 33 total tool calls vs Claude Code's 12 parallel agents with 380+ combined tool calls.

Execution Architecture Gap

Claude Code's true parallel agent execution (12 simultaneous background agents) enabled deep, specialized analysis per domain. Each agent averaged 35 tool calls with focused Grep/Read patterns. OpenCode ran the audit as one sequential task, limiting depth. The 12-agent parallelism in Claude Code reduced wall-clock time to ~4 minutes despite doing 10x more work.

Finding Overlap & Uniqueness

10
Shared Findings
79
Claude Code Unique
3
OpenCode Unique

Overlap Analysis

Of OpenCode's 13 findings, 10 were also found by Claude Code (77% overlap). OpenCode contributed 3 findings not explicitly itemized by Claude Code. Claude Code found 79 additional findings that OpenCode missed entirely, including 11 additional CRITICAL issues.

Note: "Shared" means the same underlying issue was identified, though severity ratings and descriptions may differ between the two audits.

[Venn diagram: Claude Code (89) vs OpenCode (13): 79 Claude unique, 10 shared, 3 OpenCode unique]

Shared Findings Both Audits

These 10 findings were identified by both Claude Code and OpenCode, though often at different severity levels and with varying depth of analysis.

1
Hardcoded Email Allowlist / PII in Source Code
index.ts:19-20 — Personal emails and kpmg.com domain hardcoded in Cloudflare Worker
CRITICAL Claude: CRITICAL OpenCode: HIGH
2
Missing Test Coverage on Core Enforcement Engine
blaze/enforcement/*.py — 1,282 lines of compliance scoring with zero tests
CRITICAL Claude: CRITICAL OpenCode: CRITICAL
3
Phantom / Undeclared JavaScript Dependencies
blaze/scripts/*.js — require() calls to jira.js, playwright, sharp with no package.json
CRITICAL Claude: HIGH OpenCode: CRITICAL
4
Jira Token Stored in Plain Text Dotfile
jira-client.js:23 — API token in unencrypted .jira-config.json
HIGH Claude: HIGH (Jira field IDs) OpenCode: CRITICAL
5
Command Injection via subprocess.split()
evidence-generator.py:136 — Brittle command parsing in subprocess calls
HIGH Claude: HIGH OpenCode: HIGH
6
Cloudflare Worker God File (702 Lines)
index.ts — Auth, routing, HTML, proxy all in one file
HIGH Claude: HIGH OpenCode: HIGH
7
Synchronous I/O Blocking Event Loop
context-intelligence.ts:97 / token-transformer.js:8 — readFileSync in async contexts
MEDIUM Claude: MEDIUM OpenCode: HIGH
8
Open Redirect / SSRF in Proxy Path
index.ts:266 — URL pathname concatenation without validation
MEDIUM Claude: MEDIUM OpenCode: MEDIUM
9
Broad Exception Handling Hides Errors
compliance-monitor.py — except CalledProcessError returns empty string silently
MEDIUM Claude: HIGH OpenCode: MEDIUM
10
Type Safety: Dict[str, Any] in Compliance Reports
compliance-monitor.py:407 — Weak typing bypasses static analysis
MEDIUM Claude: HIGH (30+ any) OpenCode: MEDIUM

Claude Code Unique Findings 79 findings OpenCode missed

These are the most significant findings that only Claude Code identified. OpenCode's audit did not detect any of these.

CRITICAL Findings Missed by OpenCode (11)

C
Shell Injection via JWT Secret in Tenant Onboarding
onboard-tenant.sh:93-104 — JWT_SECRET interpolated into Python via shell expansion. RCE on operator workstation. Found by 7 agents.
C
Duplicate JSON Key Silently Disables Enforcement Hooks
sdlc-config.json:20,37 — "tool.execute.after" appears twice; first definition silently dropped. PR review gate and SDLC tracking never activate.
C
Enforcement Engine References Non-Existent Config Path
compliance-monitor.py:44 — All 3 Python modules reference .claude/enforcement/config/policies.yaml which doesn't exist. Combined with fail-open handler = zero enforcement.
C
Race Condition in SDLC State File
sdlc-phase1-validator.js:84-120 — Non-atomic read-modify-write with no locking. Concurrent agents lose tracking data.
C
Command Injection in PR Review Gate
pr-review-gate.js:99,311,366,411,414 — PR number interpolated into execSync without numeric validation.
C
Cross-Plugin globalThis Coupling
3 plugins communicate via (globalThis as any).__BLAZE_* sentinel flags with implicit load-order dependencies.
C
3,090 Lines of Duplicated OOXML Validation Code
Byte-for-byte identical files between docx/ and pptx/ skill directories. pptx.py in docx/ is likely copy-paste error.
C
Zero Test Coverage on Auth Worker (702 Lines)
JWT validation, OTP auth, email authorization — the most security-critical module has zero tests.
C
Existing Tests Check String Presence, Not Behavior
48 assertions use toContain to string-match source file text. Zero behavioral tests exist.
C
No Python Dependency Manifest
10+ third-party packages imported with no requirements.txt. No reproducible builds or CVE scanning possible.
C
No Data Retention Purge Mechanism
6-7 year retention periods defined in config but zero cleanup implementation. GDPR storage limitation violation.
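The duplicate-JSON-key finding above is easy to reproduce: JavaScript's JSON.parse keeps the last occurrence of a repeated key and silently discards earlier ones, with no error or warning. The key name below matches the sdlc-config.json finding; the two values are hypothetical placeholders:

```javascript
// Reproduction of the duplicate-key behavior: JSON.parse keeps the
// LAST value for a repeated key and drops the first with no diagnostic.
const raw = `{
  "tool.execute.after": "pr-review-gate",
  "tool.execute.after": "sdlc-tracker"
}`;

const config = JSON.parse(raw);

// Only one entry survives, and it is the second one: the
// "pr-review-gate" hook definition has silently vanished.
const survivingKeys = Object.keys(config);
const survivingValue = config["tool.execute.after"];
```

This is why the misconfiguration is so dangerous: the file still parses cleanly, so nothing downstream ever reports a failure.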

Selected HIGH Findings Missed by OpenCode

Finding | File | Category
No rate limiting on OTP endpoints | index.ts:351-395 | Security
Missing CSRF protection on forms | index.ts:61-67 | Security
Missing HSTS header on auth | index.ts:692-701 | Security
Logout accepts GET (CSRF) | index.ts:69-71 | Security
SSRF in Reasoning Graph MCP | reasoning-graph-mcp.ts:188 | Security
Unencrypted Neo4j in secondary config | secret-rotation.yaml:70 | Infra
Plugin state lost on restart | sdlc-lifecycle.ts:94 | Architecture
Fail-open watchdog is SPOF | context-intelligence.ts:142 | Architecture
Missing Neo4j schema setup script | neo4j-mcp.json:18 | Data
Hardcoded placeholder compliance scores | compliance-monitor.py:338 | Compliance
Pre-commit hook advertises --no-verify | pre-commit:147 | Compliance
Fail-open exception allows all commits | pre-commit:159 | Compliance
PII in evidence without classification | evidence-generator.py:96 | Privacy
No LLM output sanitization | openai-review.sh:137 | LLM Safety
No LLM spending cap | multi-ai-pipeline.yaml:109 | LLM Safety
Abandoned six package | rearrange.py:18 | Supply Chain
npm caret ranges for jose JWT lib | package.json:13 | Supply Chain

OpenCode Unique Findings 3 findings Claude missed

These findings appeared in the OpenCode audit but were not explicitly itemized in the Claude Code audit.

1
Jira API Token in Plain Text Dotfile (elevated to CRITICAL)
jira-client.js:23 — OpenCode rated this CRITICAL based on the .jira-config.json pattern. Claude Code noted the Jira config but focused on hardcoded custom field IDs as the higher risk for multi-tenancy. The dotfile credential pattern is a valid concern.
CRITICAL (OC) OpenCode emphasis
2
Missing CORS and Security Headers on Raw Responses
index.ts:348 — OpenCode noted missing HSTS/CSP on 302 redirect responses specifically. Claude Code found missing HSTS on proxied responses and auth pages but not specifically on bare redirect responses.
LOW (OC) Specific angle
3
Missing JSON Config Schema Validation in Jira Client
jira-client.js:18 — OpenCode flagged that JSON.parse is called without schema validation (suggesting Zod/Joi). Claude Code's B2 agent found similar JSON schema issues in .opencode/schemas/ but not this specific Jira config parsing.
MEDIUM (OC) Complementary
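OpenCode's third finding (JSON.parse without schema validation) does not strictly require Zod or Joi; a small hand-rolled guard illustrates the same idea with no dependencies. The field names below are invented for illustration and are not taken from the actual .jira-config.json:

```javascript
// Dependency-free sketch of the validation OpenCode recommended
// (it suggested Zod/Joi). Field names are hypothetical.
function parseJiraConfig(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    throw new Error("jira config is not valid JSON: " + e.message);
  }
  // Validate the shape before anything downstream trusts it.
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("jira config must be a JSON object");
  }
  for (const field of ["baseUrl", "email", "apiToken"]) {
    if (typeof parsed[field] !== "string" || parsed[field].length === 0) {
      throw new Error("jira config missing required string field: " + field);
    }
  }
  return parsed;
}
```

The point is failing loudly at load time rather than passing a malformed config into API calls that fail obscurely later.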

Assessment

OpenCode's 3 unique findings are valid but low-impact (1 perspective difference on an existing finding, 1 minor scope variation, 1 complementary angle). None represent a significant security or architecture risk that Claude Code's audit missed.

Security & Authorization

Claude Code
17
SECURITY FINDINGS

3 CRITICAL (shell injection, command injection, hardcoded PII), 6 HIGH (no rate limiting, CSRF, HSTS, SSRF, logout CSRF, unencrypted Neo4j), 5 MEDIUM, 3 LOW. Deep analysis of onboard-tenant.sh, pr-review-gate.js, Cloudflare Worker auth flow, MCP server, and all config files.

OpenCode
4
SECURITY FINDINGS

1 CRITICAL (Jira token), 2 HIGH (hardcoded emails, subprocess.split), 1 MEDIUM (SSRF proxy). Did not find: shell injection in onboard-tenant.sh, command injection in pr-review-gate.js, missing rate limiting, CSRF issues, or HSTS gaps.

Critical Gap

OpenCode missed the most severe security finding in the entire codebase: the shell injection in onboard-tenant.sh where JWT secrets are interpolated into inline Python code. This was flagged by 7 of Claude Code's 12 agents. OpenCode also missed the command injection in pr-review-gate.js where PR numbers are passed unsanitized to execSync.
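The pr-review-gate.js issue follows a well-known pattern: any value interpolated into an execSync command string becomes shell syntax. A hedged sketch of the numeric-validation fix the finding implies; the function names and the gh command shape are hypothetical, not copied from the audited file:

```javascript
// Sketch of the missing guard in the pr-review-gate.js finding:
// require a PR number to be strictly numeric before it ever reaches
// a shell command string.
function assertPrNumber(value) {
  const s = String(value);
  // Reject anything that is not pure digits: "42; rm -rf /" fails here.
  if (!/^\d+$/.test(s)) {
    throw new Error("invalid PR number: " + s);
  }
  return s;
}

// With the guard, interpolation into execSync is safe for this one
// argument (passing argv via execFile avoids the shell entirely and
// is safer still).
function buildPrCommand(prNumber) {
  return "gh pr view " + assertPrNumber(prNumber) + " --json state";
}
```

Allowlisting the expected shape (digits only) is more robust than trying to escape or blocklist shell metacharacters.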

Architecture & Data Integrity

Claude Code
25
ARCHITECTURE FINDINGS

5 CRITICAL: globalThis coupling, 3K duplicate lines, broken config paths, duplicate JSON key, state race condition. 7 HIGH: god file, duplicated PHASE1_CONFIG, state not persisted, watchdog SPOF, missing Neo4j schema, placeholder scores, API compliance gaps.

OpenCode
3
ARCHITECTURE FINDINGS

1 HIGH (god file), 1 MEDIUM (JSON config validation), 1 LOW (response format). Did not find: duplicate JSON key, broken config paths, globalThis coupling, state race conditions, OOXML code duplication, or Neo4j schema gaps.

Code Quality & Performance

Claude Code — 26 findings

Deep analysis: 30+ any types cataloged across 3 plugin files with specific line references. Duplicated Python methods identified. Missing type hints in parser. Inconsistent shell error handling (16/32 scripts). PIL allocation per shape, uncached font lookups, unbounded Sets, O(n²) overlap detection. Accessibility gaps in 2,742 lines of HTML.

OpenCode — 3 findings

1 HIGH (sync readFileSync in constructor), 1 MEDIUM (Dict[str, Any] typing), 1 MEDIUM (broad exception handling). Did not analyze: TypeScript plugins, shell script consistency, performance patterns, accessibility, document processing code, or frontend quality.

Testing & Coverage

Claude Code

11 findings from dedicated D1 agent (test-coverage-analyzer)

  • Quantified: 3 test files for 96 source files (3.1% ratio)
  • Analyzed quality of existing tests — found string-matching, not behavioral testing
  • Identified every untested critical path: auth worker, compliance engine, plugins, onboarding, document skills
  • Found no test framework config or CI pipeline
  • Found dormant Python test not integrated in CI
  • Rated: BLOCKED — <5% coverage
OpenCode

1 finding (CRITICAL: missing test coverage)

  • Noted 0% coverage on enforcement Python modules
  • Counted 2 test files vs 21 core source files
  • Did not analyze test quality (string-matching vs behavioral)
  • Did not identify auth worker as untested
  • Did not identify plugins or skills as untested
  • Did not note missing CI pipeline or test framework

Compliance & Risk

Claude Code — 16 findings (D2 agent)

2 CRITICAL (hardcoded PII, no data purge), 6 HIGH (PII in evidence, no LLM sanitization, --no-verify bypass hint, fail-open handler, placeholder scores, no spending cap, PII in logs), 5 MEDIUM (no GDPR rights, broken config path, stub access control, shell injection, unscreened external API), 3 LOW (no LICENSE, CDD validation shallow, no bypass audit log).

OpenCode — 1 finding

1 MEDIUM (hardcoded sensitive data markers in evidence-generator.py). Did not analyze: GDPR compliance, data retention, LLM safety, audit trails, compliance bypass mechanisms, or the CDD methodology implementation.

Supply Chain & Dependencies

Claude Code — 12 findings (D3 agent)

1 CRITICAL (no Python manifest), 4 HIGH (abandoned six package, caret ranges on jose, undeclared JS deps, unpinned shell tools), 5 MEDIUM (no .npmrc, stale compatibility_date, unsafe .env sourcing, sharp bloat, inline JWT), 2 LOW (mixed package managers, no LICENSE). Full dependency inventory with 3 tables covering npm, Python, and shell tool dependencies.

OpenCode — 1 finding

1 CRITICAL (phantom JS dependencies / no package.json). Did not analyze: Python dependencies, npm version pinning, shell tool dependencies, registry configuration, or transitive dependency risks.

Why the Gap?

Execution Model Difference

Dimension | Claude Code | OpenCode
Agent Execution | 12 true parallel background agents | 1 sequential consolidated task
Tool Calls | 380+ (across all agents) | 33 total
Tool Calls per Squad | ~32 per agent | ~8 per squad
Files Read | 50+ unique files analyzed | ~8 files referenced
Output Files | 12 individual reports + 1 compiled | 1 merged report
Cross-Reference | Same finding flagged by multiple agents | No cross-referencing
Evidence Quality | 3-5 lines of actual code per finding | 2-4 lines of code, some findings lack evidence
Wall Clock Time | ~4 minutes | 8 minutes 31 seconds

Depth vs Breadth

Claude Code's architecture of launching 12 independent agents, each with a specialized prompt and dedicated output file, enabled each agent to perform deep, exhaustive analysis of its assigned domain. The security agents (A1, A2, A3) collectively made ~100 tool calls across security-sensitive files. The architecture agent (B1) ran wc -l and diff comparisons that revealed the 3,090-line duplication. The data integrity agent (B2) found the duplicate JSON key by actually parsing the config structure.

OpenCode received the same 12-agent prompt but collapsed the execution into a single task. This meant each "squad" got approximately 8 tool calls total — barely enough to read the project profile and a few key files before generating findings. The result was a surface-level scan that caught the most obvious issues but missed the systemic, cross-file patterns.
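The structural gap described above reduces to fan-out: N independent analysis tasks awaited concurrently versus one at a time. A toy sketch of the two shapes, with agents simulated as async functions; the names, delays, and payloads are invented:

```javascript
// Toy model of the execution gap: the same "agent" tasks run either
// concurrently (Claude Code's pattern) or one after another (what
// OpenCode's single-pass run collapsed into).
function makeAgent(name) {
  // Simulated agent: resolves with a dummy findings count after async work.
  return async () => {
    await new Promise((r) => setTimeout(r, 10)); // stand-in for tool calls
    return { name, findings: name.length }; // dummy payload
  };
}

const agents = ["A1", "A2", "A3", "B1"].map(makeAgent);

// Parallel fan-out: wall-clock time ~= the slowest single agent.
async function runParallel() {
  return Promise.all(agents.map((run) => run()));
}

// Sequential: wall-clock time ~= the sum of all agents.
async function runSequential() {
  const out = [];
  for (const run of agents) out.push(await run());
  return out;
}
```

Both shapes produce identical results; only the wall-clock cost differs, which is why the parallel run finished faster while doing far more total work.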

What Each Platform Did Best

Claude Code Strengths

  • True parallel agent orchestration
  • Deep multi-file cross-referencing (7 agents found the same shell injection)
  • Structural analysis (diff, wc -l, JSON parsing)
  • Compliance and regulatory depth (GDPR, SOC2, HIPAA specifics)
  • Complete supply chain inventory
  • Test quality analysis (not just coverage count)
  • Performance and accessibility coverage

OpenCode Strengths

  • Identified core issues with fewer resources
  • Clear, actionable recommendations
  • Good prioritization of its limited findings
  • Emphasized the credential storage pattern (Jira dotfile)
  • Proposed specific tools (Zod, Joi) for remediation
  • SDLC-aware: offered to create work items

Final Verdict

11
CRITICAL findings
only Claude found
22
HIGH findings
only Claude found
77%
of OpenCode findings
also found by Claude

Assessment

Claude Code's 12-agent parallel architecture produced a 6.8x more comprehensive audit (89 vs 13 findings) in half the wall-clock time (~4 min vs 8.5 min). The depth gap is most critical in security: Claude Code found 2 RCE-severity shell/command injection vulnerabilities that OpenCode missed entirely.

OpenCode's audit was not wrong — all 13 findings are valid, and 10 overlap with Claude Code's results. However, it represents a surface-level scan equivalent to roughly 1 of Claude Code's 12 agents. The collapse from 12 parallel agents to 1 sequential task was the primary factor in the depth gap.

Recommendation: Use Claude Code for comprehensive security and compliance audits where depth matters. OpenCode's audit can serve as a rapid triage for the most visible issues, but should not be relied upon as the sole audit for compliance-sensitive codebases. Consider running both and merging results for maximum coverage.

Risk Note

If only the OpenCode audit had been performed, the team would have missed: 2 RCE vulnerabilities (shell injection + command injection), a silently broken enforcement system (duplicate JSON key + wrong config path), a race condition in state management, 3,090 lines of duplicated code, zero test coverage on the authentication layer, and the entire compliance bypass chain (fail-open + --no-verify hint). These represent the highest-risk items in the codebase.