CODE AUDIT COMPARISON — MARCH 2026

Claude Code vs OpenCode
Code Audit Comparison

Two AI platforms audited the same Blaze codebase using an identical 12-agent, 4-squad audit methodology. This presentation compares scope, depth, and findings between the Claude Code (Opus 4.6) and OpenCode (Gemini 3.1 Pro) executions.

89
Claude Code Findings
13
OpenCode Findings
6.8x
Detection Ratio
12 vs 1
Agents Completed

Headline Comparison

Metric | Claude Code (Opus 4.6) | OpenCode (Gemini 3.1 Pro)
Total Findings | 89 | 13
CRITICAL Findings | 14 | 3
HIGH Findings | 26 | 4
MEDIUM Findings | 28 | 4
LOW Findings | 21 | 2
Agents Completed | 12/12 | ~1
Execution Time | ~4 min | 8m 31s
Files Analyzed (referenced) | 50+ | ~8
Squads Active | 4/4 | 4/4 (shallow)

Severity Distribution

CRITICAL: 14 vs 3
HIGH: 26 vs 4
MEDIUM: 28 vs 4
LOW: 21 vs 2
(Claude Code vs OpenCode)

Key Takeaway

Claude Code found 4.7x more CRITICAL findings (14 vs 3) and 6.5x more HIGH findings (26 vs 4). The gap is most pronounced in the CRITICAL+HIGH tier where actionable security and architecture issues live. OpenCode's 13 findings represent a surface-level scan; Claude Code's 89 represent deep multi-file, cross-reference analysis.

Coverage Depth Comparison

Claude Code (Opus 4.6)

12 Parallel Agents

All 12 specialized agents ran to completion in parallel, each producing a dedicated findings file with code evidence.

A1 AuthZ: 10 findings
A2 Injection: 9 findings
A3 Infra Security: 8 findings
B1 Architecture: 15 findings
B2 Data Integrity: 13 findings
B3 API Compliance: 12 findings
C1 Backend Quality: 18 findings
C2 Frontend Quality: 9 findings
C3 Performance: 8 findings
D1 Test Coverage: 11 findings
D2 Compliance: 16 findings
D3 Dependencies: 12 findings
OpenCode (Gemini 3.1 Pro)

Single-Pass Audit

Completed as a single consolidated task (33 tool calls, 8m 31s). Did not spawn individual agents per squad. Produced one merged report.

A (Security): ~4 findings
B (Architecture): ~3 findings
C (Quality): ~3 findings
D (Coverage): ~3 findings

Despite receiving the same 12-agent prompt, the execution collapsed into a single-agent run with 33 total tool calls vs Claude Code's 12 parallel agents with 380+ combined tool calls.

Execution Architecture Gap

Claude Code's true parallel agent execution (12 simultaneous background agents) enabled deep, specialized analysis per domain. Each agent averaged 35 tool calls with focused Grep/Read patterns. OpenCode ran the audit as one sequential task, limiting depth. The 12-agent parallelism in Claude Code reduced wall-clock time to ~4 minutes despite doing 10x more work.

Finding Overlap & Uniqueness

10
Shared Findings
79
Claude Code Unique
3
OpenCode Unique

Overlap Analysis

Of OpenCode's 13 findings, 10 were also found by Claude Code (77% overlap). OpenCode contributed 3 findings not explicitly itemized by Claude Code. Claude Code found 79 additional findings that OpenCode missed entirely, including 11 additional CRITICAL issues.

Note: "Shared" means the same underlying issue was identified, though severity ratings and descriptions may differ between the two audits.

[Venn diagram: Claude Code (89) vs OpenCode (13): 79 Claude unique, 10 shared, 3 OpenCode unique]

Shared Findings Both Audits

These 10 findings were identified by both Claude Code and OpenCode, though often at different severity levels and with varying depth of analysis.

1
Hardcoded Email Allowlist / PII in Source Code
index.ts:19-20 — Personal emails and kpmg.com domain hardcoded in Cloudflare Worker
CRITICAL Claude: CRITICAL OpenCode: HIGH
2
Missing Test Coverage on Core Enforcement Engine
blaze/enforcement/*.py — 1,282 lines of compliance scoring with zero tests
CRITICAL Claude: CRITICAL OpenCode: CRITICAL
3
Phantom / Undeclared JavaScript Dependencies
blaze/scripts/*.js — require() calls to jira.js, playwright, sharp with no package.json
CRITICAL Claude: HIGH OpenCode: CRITICAL
4
Jira Token Stored in Plain Text Dotfile
jira-client.js:23 — API token in unencrypted .jira-config.json
HIGH Claude: HIGH (Jira field IDs) OpenCode: CRITICAL
5
Command Injection via subprocess.split()
evidence-generator.py:136 — Brittle command parsing in subprocess calls
HIGH Claude: HIGH OpenCode: HIGH
6
Cloudflare Worker God File (702 Lines)
index.ts — Auth, routing, HTML, proxy all in one file
HIGH Claude: HIGH OpenCode: HIGH
7
Synchronous I/O Blocking Event Loop
context-intelligence.ts:97 / token-transformer.js:8 — readFileSync in async contexts
MEDIUM Claude: MEDIUM OpenCode: HIGH
8
Open Redirect / SSRF in Proxy Path
index.ts:266 — URL pathname concatenation without validation
MEDIUM Claude: MEDIUM OpenCode: MEDIUM
9
Broad Exception Handling Hides Errors
compliance-monitor.py — except CalledProcessError returns empty string silently
MEDIUM Claude: HIGH OpenCode: MEDIUM
10
Type Safety: Dict[str, Any] in Compliance Reports
compliance-monitor.py:407 — Weak typing bypasses static analysis
MEDIUM Claude: HIGH (30+ any) OpenCode: MEDIUM

Claude Code Unique Findings 79 findings OpenCode missed

These are the most significant findings that only Claude Code identified. OpenCode's audit did not detect any of these.

CRITICAL Findings Missed by OpenCode (11)

C
Shell Injection via JWT Secret in Tenant Onboarding
onboard-tenant.sh:93-104 — JWT_SECRET interpolated into Python via shell expansion. RCE on operator workstation. Found by 7 agents.
C
Duplicate JSON Key Silently Disables Enforcement Hooks
sdlc-config.json:20,37 — "tool.execute.after" appears twice; first definition silently dropped. PR review gate and SDLC tracking never activate.
C
Enforcement Engine References Non-Existent Config Path
compliance-monitor.py:44 — All 3 Python modules reference .claude/enforcement/config/policies.yaml which doesn't exist. Combined with fail-open handler = zero enforcement.
C
Race Condition in SDLC State File
sdlc-phase1-validator.js:84-120 — Non-atomic read-modify-write with no locking. Concurrent agents lose tracking data.
C
Command Injection in PR Review Gate
pr-review-gate.js:99,311,366,411,414 — PR number interpolated into execSync without numeric validation.
C
Cross-Plugin globalThis Coupling
3 plugins communicate via (globalThis as any).__BLAZE_* sentinel flags with implicit load-order dependencies.
C
3,090 Lines of Duplicated OOXML Validation Code
Byte-for-byte identical files between docx/ and pptx/ skill directories. pptx.py in docx/ is likely copy-paste error.
C
Zero Test Coverage on Auth Worker (702 Lines)
JWT validation, OTP auth, email authorization — the most security-critical module has zero tests.
C
Existing Tests Check String Presence, Not Behavior
48 assertions use toContain to string-match source file text. Zero behavioral tests exist.
C
No Python Dependency Manifest
10+ third-party packages imported with no requirements.txt. No reproducible builds or CVE scanning possible.
C
No Data Retention Purge Mechanism
6-7 year retention periods defined in config but zero cleanup implementation. GDPR storage limitation violation.
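The duplicate-JSON-key finding above is easy to reproduce: JavaScript's JSON.parse keeps the last occurrence of a repeated key and silently discards earlier ones, with no error or warning. The key name below matches the sdlc-config.json finding; the two values are hypothetical placeholders:

```javascript
// Reproduction of the duplicate-key behavior: JSON.parse keeps the
// LAST value for a repeated key and drops the first with no diagnostic.
const raw = `{
  "tool.execute.after": "pr-review-gate",
  "tool.execute.after": "sdlc-tracker"
}`;

const config = JSON.parse(raw);

// Only one entry survives, and it is the second one: the
// "pr-review-gate" hook definition has silently vanished.
const survivingKeys = Object.keys(config);
const survivingValue = config["tool.execute.after"];
```

This is why the misconfiguration is so dangerous: the file still parses cleanly, so nothing downstream ever reports a failure.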

Selected HIGH Findings Missed by OpenCode

Finding | File | Category
No rate limiting on OTP endpoints | index.ts:351-395 | Security
Missing CSRF protection on forms | index.ts:61-67 | Security
Missing HSTS header on auth | index.ts:692-701 | Security
Logout accepts GET (CSRF) | index.ts:69-71 | Security
SSRF in Reasoning Graph MCP | reasoning-graph-mcp.ts:188 | Security
Unencrypted Neo4j in secondary config | secret-rotation.yaml:70 | Infra
Plugin state lost on restart | sdlc-lifecycle.ts:94 | Architecture
Fail-open watchdog is SPOF | context-intelligence.ts:142 | Architecture
Missing Neo4j schema setup script | neo4j-mcp.json:18 | Data
Hardcoded placeholder compliance scores | compliance-monitor.py:338 | Compliance
Pre-commit hook advertises --no-verify | pre-commit:147 | Compliance
Fail-open exception allows all commits | pre-commit:159 | Compliance
PII in evidence without classification | evidence-generator.py:96 | Privacy
No LLM output sanitization | openai-review.sh:137 | LLM Safety
No LLM spending cap | multi-ai-pipeline.yaml:109 | LLM Safety
Abandoned six package | rearrange.py:18 | Supply Chain
npm caret ranges for jose JWT lib | package.json:13 | Supply Chain

OpenCode Unique Findings 3 findings Claude missed

These findings appeared in the OpenCode audit but were not explicitly itemized in the Claude Code audit.

1
Jira API Token in Plain Text Dotfile (elevated to CRITICAL)
jira-client.js:23 — OpenCode rated this CRITICAL based on the .jira-config.json pattern. Claude Code noted the Jira config but focused on hardcoded custom field IDs as the higher risk for multi-tenancy. The dotfile credential pattern is a valid concern.
CRITICAL (OC) OpenCode emphasis
2
Missing CORS and Security Headers on Raw Responses
index.ts:348 — OpenCode noted missing HSTS/CSP on 302 redirect responses specifically. Claude Code found missing HSTS on proxied responses and auth pages but not specifically on bare redirect responses.
LOW (OC) Specific angle
3
Missing JSON Config Schema Validation in Jira Client
jira-client.js:18 — OpenCode flagged that JSON.parse is called without schema validation (suggesting Zod/Joi). Claude Code's B2 agent found similar JSON schema issues in .opencode/schemas/ but not this specific Jira config parsing.
MEDIUM (OC) Complementary
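OpenCode's third finding (JSON.parse without schema validation) does not strictly require Zod or Joi; a small hand-rolled guard illustrates the same idea with no dependencies. The field names below are invented for illustration and are not taken from the actual .jira-config.json:

```javascript
// Dependency-free sketch of the validation OpenCode recommended
// (it suggested Zod/Joi). Field names are hypothetical.
function parseJiraConfig(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    throw new Error("jira config is not valid JSON: " + e.message);
  }
  // Validate the shape before anything downstream trusts it.
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("jira config must be a JSON object");
  }
  for (const field of ["baseUrl", "email", "apiToken"]) {
    if (typeof parsed[field] !== "string" || parsed[field].length === 0) {
      throw new Error("jira config missing required string field: " + field);
    }
  }
  return parsed;
}
```

The point is failing loudly at load time rather than passing a malformed config into API calls that fail obscurely later.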

Assessment

OpenCode's 3 unique findings are valid but low-impact (1 perspective difference on an existing finding, 1 minor scope variation, 1 complementary angle). None represent a significant security or architecture risk that Claude Code's audit missed.

Security & Authorization

Claude Code
17
SECURITY FINDINGS

3 CRITICAL (shell injection, command injection, hardcoded PII), 6 HIGH (no rate limiting, CSRF, HSTS, SSRF, logout CSRF, unencrypted Neo4j), 5 MEDIUM, 3 LOW. Deep analysis of onboard-tenant.sh, pr-review-gate.js, Cloudflare Worker auth flow, MCP server, and all config files.

OpenCode
4
SECURITY FINDINGS

1 CRITICAL (Jira token), 2 HIGH (hardcoded emails, subprocess.split), 1 MEDIUM (SSRF proxy). Did not find: shell injection in onboard-tenant.sh, command injection in pr-review-gate.js, missing rate limiting, CSRF issues, or HSTS gaps.

Critical Gap

OpenCode missed the most severe security finding in the entire codebase: the shell injection in onboard-tenant.sh where JWT secrets are interpolated into inline Python code. This was flagged by 7 of Claude Code's 12 agents. OpenCode also missed the command injection in pr-review-gate.js where PR numbers are passed unsanitized to execSync.
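The pr-review-gate.js issue follows a well-known pattern: any value interpolated into an execSync command string becomes shell syntax. A hedged sketch of the numeric-validation fix the finding implies; the function names and the gh command shape are hypothetical, not copied from the audited file:

```javascript
// Sketch of the missing guard in the pr-review-gate.js finding:
// require a PR number to be strictly numeric before it ever reaches
// a shell command string.
function assertPrNumber(value) {
  const s = String(value);
  // Reject anything that is not pure digits: "42; rm -rf /" fails here.
  if (!/^\d+$/.test(s)) {
    throw new Error("invalid PR number: " + s);
  }
  return s;
}

// With the guard, interpolation into execSync is safe for this one
// argument (passing argv via execFile avoids the shell entirely and
// is safer still).
function buildPrCommand(prNumber) {
  return "gh pr view " + assertPrNumber(prNumber) + " --json state";
}
```

Allowlisting the expected shape (digits only) is more robust than trying to escape or blocklist shell metacharacters.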

Architecture & Data Integrity

Claude Code
25
ARCHITECTURE FINDINGS

5 CRITICAL: globalThis coupling, 3K duplicate lines, broken config paths, duplicate JSON key, state race condition. 7 HIGH: god file, duplicated PHASE1_CONFIG, state not persisted, watchdog SPOF, missing Neo4j schema, placeholder scores, API compliance gaps.

OpenCode
3
ARCHITECTURE FINDINGS

1 HIGH (god file), 1 MEDIUM (JSON config validation), 1 LOW (response format). Did not find: duplicate JSON key, broken config paths, globalThis coupling, state race conditions, OOXML code duplication, or Neo4j schema gaps.

Code Quality & Performance

Claude Code — 26 findings

Deep analysis: 30+ any types cataloged across 3 plugin files with specific line references. Duplicated Python methods identified. Missing type hints in parser. Inconsistent shell error handling (16/32 scripts). PIL allocation per shape, uncached font lookups, unbounded Sets, O(n²) overlap detection. Accessibility gaps in 2,742 lines of HTML.

OpenCode — 3 findings

1 HIGH (sync readFileSync in constructor), 1 MEDIUM (Dict[str, Any] typing), 1 MEDIUM (broad exception handling). Did not analyze: TypeScript plugins, shell script consistency, performance patterns, accessibility, document processing code, or frontend quality.

Testing & Coverage

Claude Code

11 findings from dedicated D1 agent (test-coverage-analyzer)

  • Quantified: 3 test files for 96 source files (3.1% ratio)
  • Analyzed quality of existing tests — found string-matching, not behavioral testing
  • Identified every untested critical path: auth worker, compliance engine, plugins, onboarding, document skills
  • Found no test framework config or CI pipeline
  • Found dormant Python test not integrated in CI
  • Rated: BLOCKED — <5% coverage
OpenCode

1 finding (CRITICAL: missing test coverage)

  • Noted 0% coverage on enforcement Python modules
  • Counted 2 test files vs 21 core source files
  • Did not analyze test quality (string-matching vs behavioral)
  • Did not identify auth worker as untested
  • Did not identify plugins or skills as untested
  • Did not note missing CI pipeline or test framework

Compliance & Risk

Claude Code — 16 findings (D2 agent)

2 CRITICAL (hardcoded PII, no data purge), 6 HIGH (PII in evidence, no LLM sanitization, --no-verify bypass hint, fail-open handler, placeholder scores, no spending cap, PII in logs), 5 MEDIUM (no GDPR rights, broken config path, stub access control, shell injection, unscreened external API), 3 LOW (no LICENSE, CDD validation shallow, no bypass audit log).

OpenCode — 1 finding

1 MEDIUM (hardcoded sensitive data markers in evidence-generator.py). Did not analyze: GDPR compliance, data retention, LLM safety, audit trails, compliance bypass mechanisms, or the CDD methodology implementation.

Supply Chain & Dependencies

Claude Code — 12 findings (D3 agent)

1 CRITICAL (no Python manifest), 4 HIGH (abandoned six package, caret ranges on jose, undeclared JS deps, unpinned shell tools), 5 MEDIUM (no .npmrc, stale compatibility_date, unsafe .env sourcing, sharp bloat, inline JWT), 2 LOW (mixed package managers, no LICENSE). Full dependency inventory with 3 tables covering npm, Python, and shell tool dependencies.

OpenCode — 1 finding

1 CRITICAL (phantom JS dependencies / no package.json). Did not analyze: Python dependencies, npm version pinning, shell tool dependencies, registry configuration, or transitive dependency risks.

Why the Gap?

Execution Model Difference

Dimension | Claude Code | OpenCode
Agent Execution | 12 true parallel background agents | 1 sequential consolidated task
Tool Calls | 380+ (across all agents) | 33 total
Tool Calls per Squad | ~32 per agent | ~8 per squad
Files Read | 50+ unique files analyzed | ~8 files referenced
Output Files | 12 individual reports + 1 compiled | 1 merged report
Cross-Reference | Same finding flagged by multiple agents | No cross-referencing
Evidence Quality | 3-5 lines of actual code per finding | 2-4 lines of code, some findings lack evidence
Wall Clock Time | ~4 minutes | 8 minutes 31 seconds

Depth vs Breadth

Claude Code's architecture of launching 12 independent agents, each with a specialized prompt and dedicated output file, enabled each agent to perform deep, exhaustive analysis of its assigned domain. The security agents (A1, A2, A3) collectively made ~100 tool calls across security-sensitive files. The architecture agent (B1) ran wc -l and diff comparisons that revealed the 3,090-line duplication. The data integrity agent (B2) found the duplicate JSON key by actually parsing the config structure.

OpenCode received the same 12-agent prompt but collapsed the execution into a single task. This meant each "squad" got approximately 8 tool calls total — barely enough to read the project profile and a few key files before generating findings. The result was a surface-level scan that caught the most obvious issues but missed the systemic, cross-file patterns.
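The structural gap described above reduces to fan-out: N independent analysis tasks awaited concurrently versus one at a time. A toy sketch of the two shapes, with agents simulated as async functions; the names, delays, and payloads are invented:

```javascript
// Toy model of the execution gap: the same "agent" tasks run either
// concurrently (Claude Code's pattern) or one after another (what
// OpenCode's single-pass run collapsed into).
function makeAgent(name) {
  // Simulated agent: resolves with a dummy findings count after async work.
  return async () => {
    await new Promise((r) => setTimeout(r, 10)); // stand-in for tool calls
    return { name, findings: name.length }; // dummy payload
  };
}

const agents = ["A1", "A2", "A3", "B1"].map(makeAgent);

// Parallel fan-out: wall-clock time ~= the slowest single agent.
async function runParallel() {
  return Promise.all(agents.map((run) => run()));
}

// Sequential: wall-clock time ~= the sum of all agents.
async function runSequential() {
  const out = [];
  for (const run of agents) out.push(await run());
  return out;
}
```

Both shapes produce identical results; only the wall-clock cost differs, which is why the parallel run finished faster while doing far more total work.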

What Each Platform Did Best

Claude Code Strengths

  • True parallel agent orchestration
  • Deep multi-file cross-referencing (7 agents found the same shell injection)
  • Structural analysis (diff, wc -l, JSON parsing)
  • Compliance and regulatory depth (GDPR, SOC2, HIPAA specifics)
  • Complete supply chain inventory
  • Test quality analysis (not just coverage count)
  • Performance and accessibility coverage

OpenCode Strengths

  • Identified core issues with fewer resources
  • Clear, actionable recommendations
  • Good prioritization of its limited findings
  • Emphasized the credential storage pattern (Jira dotfile)
  • Proposed specific tools (Zod, Joi) for remediation
  • SDLC-aware: offered to create work items

Final Verdict

11
CRITICAL findings
only Claude found
22
HIGH findings
only Claude found
77%
of OpenCode findings
also found by Claude

Assessment

Claude Code's 12-agent parallel architecture produced a 6.8x more comprehensive audit (89 vs 13 findings) in half the wall-clock time (~4 min vs 8.5 min). The depth gap is most critical in security: Claude Code found 2 RCE-severity shell/command injection vulnerabilities that OpenCode missed entirely.

OpenCode's audit was not wrong — all 13 findings are valid, and 10 overlap with Claude Code's results. However, it represents a surface-level scan equivalent to roughly 1 of Claude Code's 12 agents. The collapse from 12 parallel agents to 1 sequential task was the primary factor in the depth gap.

Recommendation: Use Claude Code for comprehensive security and compliance audits where depth matters. OpenCode's audit can serve as a rapid triage for the most visible issues, but should not be relied upon as the sole audit for compliance-sensitive codebases. Consider running both and merging results for maximum coverage.

Risk Note

If only the OpenCode audit had been performed, the team would have missed: 2 RCE vulnerabilities (shell injection + command injection), a silently broken enforcement system (duplicate JSON key + wrong config path), a race condition in state management, 3,090 lines of duplicated code, zero test coverage on the authentication layer, and the entire compliance bypass chain (fail-open + --no-verify hint). These represent the highest-risk items in the codebase.