Claude Code vs OpenCode
Code Audit Comparison
Two AI platforms audited the same Blaze codebase using identical 12-agent, 4-squad audit methodology. This presentation compares scope, depth, and findings between the Claude Code (Opus 4.6) and OpenCode (Gemini 3.1 Pro) executions.
Headline Comparison
Severity Distribution
Key Takeaway
Claude Code found 4.7x more CRITICAL findings (14 vs 3) and 6.5x more HIGH findings (26 vs 4). The gap is most pronounced in the CRITICAL+HIGH tier where actionable security and architecture issues live. OpenCode's 13 findings represent a surface-level scan; Claude Code's 89 represent deep multi-file, cross-reference analysis.
Coverage Depth Comparison
12 Parallel Agents
All 12 specialized agents ran to completion in parallel, each producing a dedicated findings file with code evidence.
| A1 AuthZ | 10 findings |
| A2 Injection | 9 findings |
| A3 Infra Security | 8 findings |
| B1 Architecture | 15 findings |
| B2 Data Integrity | 13 findings |
| B3 API Compliance | 12 findings |
| C1 Backend Quality | 18 findings |
| C2 Frontend Quality | 9 findings |
| C3 Performance | 8 findings |
| D1 Test Coverage | 11 findings |
| D2 Compliance | 16 findings |
| D3 Dependencies | 12 findings |
Single-Pass Audit
Completed as a single consolidated task (33 tool calls, 8m 31s). Did not spawn individual agents per squad. Produced one merged report.
| A: Security | ~4 findings |
| B: Architecture | ~3 findings |
| C: Quality | ~3 findings |
| D: Coverage | ~3 findings |
Despite receiving the same 12-agent prompt, the execution collapsed into a single-agent run with 33 total tool calls vs Claude Code's 12 parallel agents with 380+ combined tool calls.
Execution Architecture Gap
Claude Code's true parallel agent execution (12 simultaneous background agents) enabled deep, specialized analysis per domain. Each agent averaged 35 tool calls with focused Grep/Read patterns. OpenCode ran the audit as one sequential task, limiting depth. The 12-agent parallelism in Claude Code reduced wall-clock time to ~4 minutes despite doing 10x more work.
Finding Overlap & Uniqueness
Overlap Analysis
Of OpenCode's 13 findings, 10 were also found by Claude Code (77% overlap). OpenCode contributed 3 findings not explicitly itemized by Claude Code. Claude Code found 79 additional findings that OpenCode missed entirely, including 11 additional CRITICAL issues.
Note: "Shared" means the same underlying issue was identified, though severity ratings and descriptions may differ between the two audits.
Claude Code Unique Findings: 79 findings OpenCode missed
These are the most significant findings identified only by Claude Code; none appear anywhere in the OpenCode audit.
CRITICAL Findings Missed by OpenCode (11)
- onboard-tenant.sh:93-104 — JWT_SECRET interpolated into Python via shell expansion. RCE on operator workstation. Found by 7 agents.
- sdlc-config.json:20,37 — "tool.execute.after" appears twice; the first definition is silently dropped, so the PR review gate and SDLC tracking never activate.
- compliance-monitor.py:44 — All 3 Python modules reference .claude/enforcement/config/policies.yaml, which doesn't exist. Combined with the fail-open handler, this means zero enforcement.
- sdlc-phase1-validator.js:84-120 — Non-atomic read-modify-write with no locking. Concurrent agents lose tracking data.
- pr-review-gate.js:99,311,366,411,414 — PR number interpolated into execSync without numeric validation.
- (globalThis as any).__BLAZE_* — Sentinel flags with implicit load-order dependencies.
- docx/ and pptx/ skill directories — Duplicated code across both; pptx.py in docx/ is likely a copy-paste error.
- Test suite — Uses toContain to string-match source file text. Zero behavioral tests exist.

Selected HIGH Findings Missed by OpenCode
| Finding | File | Category |
|---|---|---|
| No rate limiting on OTP endpoints | index.ts:351-395 | Security |
| Missing CSRF protection on forms | index.ts:61-67 | Security |
| Missing HSTS header on auth | index.ts:692-701 | Security |
| Logout accepts GET (CSRF) | index.ts:69-71 | Security |
| SSRF in Reasoning Graph MCP | reasoning-graph-mcp.ts:188 | Security |
| Unencrypted Neo4j in secondary config | secret-rotation.yaml:70 | Infra |
| Plugin state lost on restart | sdlc-lifecycle.ts:94 | Architecture |
| Fail-open watchdog is SPOF | context-intelligence.ts:142 | Architecture |
| Missing Neo4j schema setup script | neo4j-mcp.json:18 | Data |
| Hardcoded placeholder compliance scores | compliance-monitor.py:338 | Compliance |
| Pre-commit hook advertises --no-verify | pre-commit:147 | Compliance |
| Fail-open exception allows all commits | pre-commit:159 | Compliance |
| PII in evidence without classification | evidence-generator.py:96 | Privacy |
| No LLM output sanitization | openai-review.sh:137 | LLM Safety |
| No LLM spending cap | multi-ai-pipeline.yaml:109 | LLM Safety |
| Abandoned six package | rearrange.py:18 | Supply Chain |
| npm caret ranges for jose JWT lib | package.json:13 | Supply Chain |
OpenCode Unique Findings: 3 findings Claude Code missed
These findings appeared in the OpenCode audit but were not explicitly itemized in the Claude Code audit.
- jira-client.js:23 — OpenCode rated this CRITICAL based on the .jira-config.json pattern. Claude Code noted the Jira config but focused on hardcoded custom field IDs as the higher risk for multi-tenancy. The dotfile credential pattern is a valid concern.
- index.ts:348 — OpenCode noted missing HSTS/CSP on 302 redirect responses specifically. Claude Code found missing HSTS on proxied responses and auth pages, but not specifically on bare redirect responses.
- jira-client.js:18 — OpenCode flagged that JSON.parse is called without schema validation (suggesting Zod/Joi). Claude Code's B2 agent found similar JSON schema issues in .opencode/schemas/ but not this specific Jira config parsing.

Assessment
OpenCode's 3 unique findings are valid but low-impact (1 perspective difference on an existing finding, 1 minor scope variation, 1 complementary angle). None represent a significant security or architecture risk that Claude Code's audit missed.
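The schema-validation gap behind OpenCode's third finding can be made concrete. The sketch below is a hedged Python stand-in for the Zod/Joi-style checks OpenCode suggested (those are JavaScript libraries); the field names are illustrative, not taken from the real .jira-config.json.

```python
import json

# Hypothetical required fields -- the actual Jira config schema is not
# shown in either audit, so these names are illustrative only.
REQUIRED_FIELDS = {"baseUrl": str, "projectKey": str}

def load_jira_config(raw: str) -> dict:
    """Parse config JSON and reject structurally invalid input,
    instead of trusting bare json.loads output (the jira-client.js:18 gap)."""
    cfg = json.loads(raw)
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(cfg.get(field), expected_type):
            raise ValueError(
                f"config field {field!r} missing or not {expected_type.__name__}"
            )
    return cfg
```

The point is the shape of the fix, not the library: any parse step that feeds credentials or URLs into a client should fail loudly on malformed input rather than propagate whatever json.loads returns.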
Security & Authorization
3 CRITICAL (shell injection, command injection, hardcoded PII), 6 HIGH (no rate limiting, CSRF, HSTS, SSRF, logout CSRF, unencrypted Neo4j), 5 MEDIUM, 3 LOW. Deep analysis of onboard-tenant.sh, pr-review-gate.js, Cloudflare Worker auth flow, MCP server, and all config files.
1 CRITICAL (Jira token), 2 HIGH (hardcoded emails, subprocess.split), 1 MEDIUM (SSRF proxy). Did not find: shell injection in onboard-tenant.sh, command injection in pr-review-gate.js, missing rate limiting, CSRF issues, or HSTS gaps.
Critical Gap
OpenCode missed the most severe security finding in the entire codebase: the shell injection in onboard-tenant.sh where JWT secrets are interpolated into inline Python code. This was flagged by 7 of Claude Code's 12 agents. OpenCode also missed the command injection in pr-review-gate.js where PR numbers are passed unsanitized to execSync.
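Both injection classes share the same remedy: never splice untrusted or secret values into code or command strings. A hedged Python sketch of the two safe patterns (the real scripts are shell and Node; function names and the `gh` invocation are illustrative):

```python
import os
import subprocess
import sys

def run_pr_check(pr_number: str) -> subprocess.CompletedProcess:
    # Guard against command injection: accept only a bare integer before
    # the value reaches any subprocess (the pr-review-gate.js gap).
    if not pr_number.isdigit():
        raise ValueError(f"invalid PR number: {pr_number!r}")
    # Argument-vector form, no shell: nothing to interpolate or escape.
    return subprocess.run(["gh", "pr", "view", pr_number], check=True)

def run_inline_python(secret: str) -> None:
    # Pass the secret through the environment instead of expanding it into
    # the inline source text (the onboard-tenant.sh RCE pattern, inverted).
    env = {**os.environ, "JWT_SECRET": secret}
    subprocess.run(
        [sys.executable, "-c", "import os; assert 'JWT_SECRET' in os.environ"],
        env=env,
        check=True,
    )
```

In the shell-script setting the equivalent fixes are exporting the secret as an environment variable rather than interpolating `$JWT_SECRET` into a python3 heredoc, and validating the PR number with a numeric pattern before use.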
Architecture & Data Integrity
5 CRITICAL: globalThis coupling, 3K duplicate lines, broken config paths, duplicate JSON key, state race condition. 7 HIGH: god file, duplicated PHASE1_CONFIG, state not persisted, watchdog SPOF, missing Neo4j schema, placeholder scores, API compliance gaps.
1 HIGH (god file), 1 MEDIUM (JSON config validation), 1 LOW (response format). Did not find: duplicate JSON key, broken config paths, globalThis coupling, state race conditions, OOXML code duplication, or Neo4j schema gaps.
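The state race condition Claude Code flagged has a standard partial remedy. A hedged sketch, assuming a JSON state file like the one sdlc-phase1-validator.js manages (the function and key names are illustrative):

```python
import json
import os
import tempfile

def update_state(path: str, key: str, value) -> None:
    """Read-modify-write a JSON state file without tearing:
    write to a temp file, then atomically replace the original."""
    try:
        with open(path) as f:
            state = json.load(f)
    except FileNotFoundError:
        state = {}
    state[key] = value
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename: readers never see a partial file
```

This prevents torn writes, but it does not serialize concurrent writers; fully fixing the lost-update problem in the finding additionally requires a lock (for example fcntl.flock on POSIX), which the sketch omits.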
Code Quality & Performance
Deep analysis: 30+ any types cataloged across 3 plugin files with specific line references. Duplicated Python methods identified. Missing type hints in parser. Inconsistent shell error handling (16/32 scripts). PIL allocation per shape, uncached font lookups, unbounded Sets, O(n²) overlap detection. Accessibility gaps in 2,742 lines of HTML.
1 HIGH (sync readFileSync in constructor), 1 MEDIUM (Dict[str, Any] typing), 1 MEDIUM (broad exception handling). Did not analyze: TypeScript plugins, shell script consistency, performance patterns, accessibility, document processing code, or frontend quality.
Testing & Coverage
11 findings from dedicated D1 agent (test-coverage-analyzer)
- Quantified: 3 test files for 96 source files (3.1% ratio)
- Analyzed quality of existing tests — found string-matching, not behavioral testing
- Identified every untested critical path: auth worker, compliance engine, plugins, onboarding, document skills
- Found no test framework config or CI pipeline
- Found dormant Python test not integrated in CI
- Rated: BLOCKED — <5% coverage
1 finding (CRITICAL: missing test coverage)
- Noted 0% coverage on enforcement Python modules
- Counted 2 test files vs 21 core source files
- Did not analyze test quality (string-matching vs behavioral)
- Did not identify auth worker as untested
- Did not identify plugins or skills as untested
- Did not note missing CI pipeline or test framework
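The string-matching anti-pattern D1 flagged, versus a behavioral test, can be sketched in a few lines. Everything below is hypothetical: slugify stands in for any function under test, and the SOURCE string mirrors how the flagged suite reads source files and asserts with toContain.

```python
# Source text kept as a string, mirroring tests that read source files
# from disk and string-match them instead of running anything.
SOURCE = 'def slugify(title):\n    return title.lower().strip().replace(" ", "-")\n'

def slugify(title: str) -> str:
    """Hypothetical function under test."""
    return title.lower().strip().replace(" ", "-")

def test_string_matching_antipattern():
    # Passes as long as the source contains the token "replace" -- the
    # toContain pattern. Proves nothing about runtime behavior.
    assert "replace" in SOURCE

def test_behavioral():
    # Executes the function and checks its output for a concrete input.
    assert slugify("  Hello World ") == "hello-world"
```

The first test stays green through almost any refactor, including ones that break the function; only the second would catch a regression.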
Compliance & Risk
2 CRITICAL (hardcoded PII, no data purge), 6 HIGH (PII in evidence, no LLM sanitization, --no-verify bypass hint, fail-open handler, placeholder scores, no spending cap, PII in logs), 5 MEDIUM (no GDPR rights, broken config path, stub access control, shell injection, unscreened external API), 3 LOW (no LICENSE, CDD validation shallow, no bypass audit log).
1 MEDIUM (hardcoded sensitive data markers in evidence-generator.py). Did not analyze: GDPR compliance, data retention, LLM safety, audit trails, compliance bypass mechanisms, or the CDD methodology implementation.
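The fail-open handler at the center of the compliance bypass chain is worth making concrete. The real hook is a shell pre-commit script; this hedged sketch condenses it to the control flow that matters:

```python
def gate_fail_open(policy_check) -> bool:
    try:
        return policy_check()
    except Exception:
        # Any crash -- including the missing policies.yaml path -- allows
        # the commit, which is why the broken config path meant zero
        # enforcement in practice.
        return True

def gate_fail_closed(policy_check) -> bool:
    try:
        return policy_check()
    except Exception:
        # Errors block the commit until the enforcement system is repaired.
        return False
```

For an enforcement gate, fail-closed is the defensible default: an outage becomes a visible blocker instead of a silent permanent bypass.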
Supply Chain & Dependencies
1 CRITICAL (no Python manifest), 4 HIGH (abandoned six package, caret ranges on jose, undeclared JS deps, unpinned shell tools), 5 MEDIUM (no .npmrc, stale compatibility_date, unsafe .env sourcing, sharp bloat, inline JWT), 2 LOW (mixed package managers, no LICENSE). Full dependency inventory with 3 tables covering npm, Python, and shell tool dependencies.
1 CRITICAL (phantom JS dependencies / no package.json). Did not analyze: Python dependencies, npm version pinning, shell tool dependencies, registry configuration, or transitive dependency risks.
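The caret-range risk D3 flagged on the jose JWT library can be caught mechanically by scanning the manifest for non-exact specifiers. A hedged sketch (the dependency names and versions below are illustrative, not the real package.json contents):

```python
import json
import re

# Only full pins like "5.2.0" pass; carets, tildes, ranges, and tags
# all count as floating and can pull in an unreviewed release.
EXACT = re.compile(r"^\d+\.\d+\.\d+$")

def floating_deps(package_json: str) -> list:
    """Return dependency names whose version specifier is not an exact pin."""
    manifest = json.loads(package_json)
    deps = manifest.get("dependencies", {})
    return [name for name, spec in deps.items() if not EXACT.match(spec)]
```

A check like this in CI, combined with an .npmrc that sets save-exact, turns the version-pinning policy into something enforced rather than advisory.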
Why the Gap?
Execution Model Difference
| Dimension | Claude Code | OpenCode |
|---|---|---|
| Agent Execution | 12 true parallel background agents | 1 sequential consolidated task |
| Tool Calls | 380+ (across all agents) | 33 total |
| Tool Calls per Squad | ~32 per agent | ~8 per squad |
| Files Read | 50+ unique files analyzed | ~8 files referenced |
| Output Files | 12 individual reports + 1 compiled | 1 merged report |
| Cross-Reference | Same finding flagged by multiple agents | No cross-referencing |
| Evidence Quality | 3-5 lines of actual code per finding | 2-4 lines of code, some findings lack evidence |
| Wall Clock Time | ~4 minutes | 8 minutes 31 seconds |
Depth vs Breadth
Claude Code's architecture of launching 12 independent agents, each with a specialized prompt and dedicated output file, enabled each agent to perform deep, exhaustive analysis of its assigned domain. The security agents (A1, A2, A3) collectively made ~100 tool calls across security-sensitive files. The architecture agent (B1) ran wc -l and diff comparisons that revealed the 3,090-line duplication. The data integrity agent (B2) found the duplicate JSON key by actually parsing the config structure.
OpenCode received the same 12-agent prompt but collapsed the execution into a single task. This meant each "squad" got approximately 8 tool calls total — barely enough to read the project profile and a few key files before generating findings. The result was a surface-level scan that caught the most obvious issues but missed the systemic, cross-file patterns.
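The duplicate-key failure mode B2 caught is easy to reproduce: most JSON parsers keep only the last occurrence of a repeated key and silently drop the first. A minimal demonstration (the key name is from the finding; the two values are illustrative):

```python
import json

# Two definitions of the same key, as in sdlc-config.json lines 20 and 37.
raw = '{"tool.execute.after": "pr-review-gate", "tool.execute.after": "sdlc-tracker"}'
config = json.loads(raw)

# Only one value survives: the first definition is silently dropped,
# exactly the mechanism that deactivated the PR review gate.
print(config)  # {'tool.execute.after': 'sdlc-tracker'}
```

Because no parser raises an error here, the only way to catch the bug is to inspect the raw text or parse with duplicate-key detection, which is what distinguishes structural analysis from skimming the file.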
What Each Platform Did Best
Claude Code Strengths
- True parallel agent orchestration
- Deep multi-file cross-referencing (7 agents found the same shell injection)
- Structural analysis (diff, wc -l, JSON parsing)
- Compliance and regulatory depth (GDPR, SOC2, HIPAA specifics)
- Complete supply chain inventory
- Test quality analysis (not just coverage count)
- Performance and accessibility coverage
OpenCode Strengths
- Identified core issues with fewer resources
- Clear, actionable recommendations
- Good prioritization of its limited findings
- Emphasized the credential storage pattern (Jira dotfile)
- Proposed specific tools (Zod, Joi) for remediation
- SDLC-aware: offered to create work items
Final Verdict
Assessment
Claude Code's 12-agent parallel architecture produced a 6.8x more comprehensive audit (89 vs 13 findings) in half the wall-clock time (~4 min vs 8.5 min). The depth gap is most critical in security: Claude Code found 2 RCE-severity shell/command injection vulnerabilities that OpenCode missed entirely.
OpenCode's audit was not wrong — all 13 findings are valid, and 10 overlap with Claude Code's results. However, it represents a surface-level scan equivalent to roughly 1 of Claude Code's 12 agents. The collapse from 12 parallel agents to 1 sequential task was the primary factor in the depth gap.
Recommendation: Use Claude Code for comprehensive security and compliance audits where depth matters. OpenCode's audit can serve as a rapid triage for the most visible issues, but should not be relied upon as the sole audit for compliance-sensitive codebases. Consider running both and merging results for maximum coverage.
Risk Note
If only the OpenCode audit had been performed, the team would have missed: 2 RCE vulnerabilities (shell injection + command injection), a silently broken enforcement system (duplicate JSON key + wrong config path), a race condition in state management, 3,090 lines of duplicated code, zero test coverage on the authentication layer, and the entire compliance bypass chain (fail-open + --no-verify hint). These represent the highest-risk items in the codebase.