FounderJury · The Diversity Receipt

One model lies.
79% of the time, our models disagree.

Across 150 real founder debates, only 32 ended in unanimous agreement. In the other 118, the panel of 8 frontier models from 8 competing vendors returned conflicting verdicts. That gap is the product.
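The headline figure is plain counting. A minimal sketch, assuming each debate is stored as a mapping from model name to a verdict category; the `is_split` and `disagreement_rate` helpers and the verdict labels are hypothetical, not FounderJury's actual schema:

```python
from typing import Mapping, Sequence

Debate = Mapping[str, str]  # hypothetical shape: model name -> verdict category


def is_split(debate: Debate) -> bool:
    """A debate counts as a disagreement when it holds >= 2 distinct verdict categories."""
    return len(set(debate.values())) >= 2


def disagreement_rate(debates: Sequence[Debate]) -> float:
    """Share of debates that did not end unanimously."""
    if not debates:
        raise ValueError("need at least one debate")
    return sum(is_split(d) for d in debates) / len(debates)


# Synthetic stand-in matching the page's counts: 32 unanimous debates, 118 split ones.
# The verdict labels "build" and "pivot" are made up for illustration.
unanimous = [{"GPT": "build", "Claude": "build"}] * 32
split = [{"GPT": "build", "Claude": "pivot"}] * 118
rate = disagreement_rate(unanimous + split)
print(f"{rate:.1%}")  # 118 / 150, shown on the page rounded to 79%
```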

Disagreement rate: 79% (debates with ≥2 verdict categories)
Debates analyzed: 150 (real founder ideas)
Unanimous outcomes: 32 (21%, the rare consensus)
Avg. pairwise disagreement: 39% (across 27 model pairs)

Why this matters

ChatGPT will agree with you. So will Claude. So will Gemini. Each is trained to be helpful, and each will validate a bad idea given the right framing.

The lie isn't in any single model — it's in asking only one. A vendor cannot ship cross-vendor debate inside their own product: OpenAI won't call Anthropic, Anthropic won't call Google, Google won't call xAI. Multi-vendor adversarial review is structurally outside the incumbents' product surface.

That's the entire moat. The 79% disagreement rate is the receipt.

Pairwise disagreement, sorted high → low
| Model A | Model B | Disagreement | Sample |
|---|---|---|---|
| Grok (xAI) | Llama (Meta) | 92.3% | 36/39 |
| Grok (xAI) | Qwen (Alibaba) | 70.5% | 55/78 |
| Gemini (Google) | Grok (xAI) | 69.9% | 93/133 |
| Grok (xAI) | Kimi (Moonshot) | 64.2% | 52/81 |
| Claude (Anthropic) | Grok (xAI) | 63.8% | 90/141 |
| DeepSeek (DeepSeek) | Grok (xAI) | 61.2% | 74/121 |
| GPT (OpenAI) | Grok (xAI) | 52.1% | 75/144 |
| Kimi (Moonshot) | Llama (Meta) | 50.0% | 14/28 |
| DeepSeek (DeepSeek) | Llama (Meta) | 43.2% | 16/37 |
| Gemini (Google) | Qwen (Alibaba) | 38.5% | 30/78 |
| DeepSeek (DeepSeek) | Gemini (Google) | 37.2% | 45/121 |
| DeepSeek (DeepSeek) | Kimi (Moonshot) | 32.9% | 27/82 |
| DeepSeek (DeepSeek) | Qwen (Alibaba) | 32.5% | 27/83 |
| Claude (Anthropic) | DeepSeek (DeepSeek) | 31.1% | 38/122 |
| GPT (OpenAI) | Llama (Meta) | 30.8% | 12/39 |
| Gemini (Google) | Kimi (Moonshot) | 29.6% | 24/81 |
| DeepSeek (DeepSeek) | GPT (OpenAI) | 29.6% | 37/125 |
| GPT (OpenAI) | Qwen (Alibaba) | 28.0% | 23/82 |
| Gemini (Google) | GPT (OpenAI) | 27.1% | 36/133 |
| Claude (Anthropic) | Qwen (Alibaba) | 25.3% | 20/79 |
| Claude (Anthropic) | Gemini (Google) | 21.5% | 28/130 |
| Gemini (Google) | Llama (Meta) | 20.5% | 8/39 |
| Claude (Anthropic) | Kimi (Moonshot) | 19.0% | 15/79 |
| Kimi (Moonshot) | Qwen (Alibaba) | 18.0% | 9/50 |
| Claude (Anthropic) | Llama (Meta) | 17.9% | 7/39 |
| Claude (Anthropic) | GPT (OpenAI) | 17.2% | 25/145 |
| GPT (OpenAI) | Kimi (Moonshot) | 15.7% | 13/83 |

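The Sample column differs from pair to pair, which suggests a pair is scored only on debates where both models returned a verdict. A minimal sketch of that tally under this assumption (the `pairwise_disagreement` helper and the record shape are hypothetical), followed by two ways of averaging the counts listed in the table:

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Mapping, Sequence, Tuple

Debate = Mapping[str, str]  # hypothetical shape: model name -> verdict category
Pair = Tuple[str, str]


def pairwise_disagreement(debates: Sequence[Debate]) -> Dict[Pair, Tuple[int, int]]:
    """Return {(model_a, model_b): (disagreements, shared_debates)}.

    Assumption: a pair is scored only on debates where both models gave a verdict.
    """
    tally: Dict[Pair, List[int]] = defaultdict(lambda: [0, 0])
    for debate in debates:
        # Sort model names so each pair has one canonical key.
        for a, b in combinations(sorted(debate), 2):
            tally[(a, b)][1] += 1
            if debate[a] != debate[b]:
                tally[(a, b)][0] += 1
    return {pair: (d, n) for pair, (d, n) in tally.items()}


# (disagreements, sample) for the 27 pairs, copied from the table in its order.
TABLE = [
    (36, 39), (55, 78), (93, 133), (52, 81), (90, 141), (74, 121), (75, 144),
    (14, 28), (16, 37), (30, 78), (45, 121), (27, 82), (27, 83), (38, 122),
    (12, 39), (24, 81), (37, 125), (23, 82), (36, 133), (20, 79), (28, 130),
    (8, 39), (15, 79), (9, 50), (7, 39), (25, 145), (13, 83),
]

# Unweighted: every pair counts equally, however small its sample.
mean_of_rates = sum(d / n for d, n in TABLE) / len(TABLE)
# Pooled: every comparison counts equally, so large-sample pairs dominate.
pooled_rate = sum(d for d, _ in TABLE) / sum(n for _, n in TABLE)

print(f"pairs={len(TABLE)} mean={mean_of_rates:.3f} pooled={pooled_rate:.3f}")
```

Both averages fall between 38% and 39%, in line with the 39% shown on the page; the page does not say which one it uses. In the unweighted mean, a thinly sampled pair such as Grok/Llama (39 shared debates) weighs as much as GPT/Grok (144).
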
Ask one model and you get an opinion. Ask 8 and you get a verdict.

Test your idea against 8 frontier AI models from competing vendors. They disagree 79% of the time. That's the data point worth having before you build.

Run your debate →
Live data · Updated on every page load · Generated Thu, 21 May 2026 08:14:34 GMT