Is the LLM API you're paying for actually the model it claims?
modelcheck.org is an independent, continuously running integrity dashboard for the resellers and middleware ("shadow APIs", "中转站" or relay stations) that sell indirect access to frontier reasoning models. We publish multi-method audit data on named providers, so consumers and researchers can verify what they are actually being served.
Pre-launch: the public dashboard goes live shortly. This page describes what we are building.
The problem
Existing audits are frozen-in-time snapshots: they anonymize the providers they tested and are never re-run. The market moves weekly. There is no persistent, public, named dashboard. modelcheck.org is that dashboard.
How we audit
No single test catches every form of substitution. We combine three complementary methods, each measuring a different surface:
- Chain-of-thought length-distribution detection. From our own companion paper. The per-prompt length distribution of a reasoning model is largely a function of its weights. A provider that silently disables thinking mode or substitutes a non-reasoning sibling model produces a measurably different length distribution even when the median is preserved. Catches the cheapest substitution attack.
- Model Equality Testing. From Gao et al. (2024). Two-sample MMD with a Hamming kernel on the token distribution. Catches gross substitution and heavy quantisation.
- Active fingerprinting (LLMmap-style). From Cai et al. (2025). GCG-optimised suffixes elicit a coarse first-token-preference signal that is robust to inference-stack noise. Anchors model identity on a single token.
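
To make the length-distribution idea concrete, here is a minimal sketch that compares two samples of chain-of-thought lengths with a two-sample Kolmogorov-Smirnov statistic. The choice of KS is an assumption made for illustration (this page does not say which statistic the companion paper uses), and the function names and sample numbers are hypothetical.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(reference, candidate):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref, cand = sorted(reference), sorted(candidate)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in set(ref) | set(cand):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cand = bisect_right(cand, x) / len(cand)
        gap = max(gap, abs(cdf_ref - cdf_cand))
    return gap


def ks_threshold(n, m, c_alpha=1.358):
    """Large-sample rejection threshold; c_alpha=1.358 corresponds to alpha=0.05."""
    return c_alpha * sqrt((n + m) / (n * m))


# Hypothetical reasoning-token counts for one prompt sampled repeatedly.
# Both samples share a median of 500, but the spread differs.
first_party = [100, 400, 500, 600, 900]
reseller = [490, 495, 500, 505, 510]
gap = ks_statistic(first_party, reseller)
```

A real audit would need far more than five samples per prompt before `gap` could exceed `ks_threshold(n, m)`; the toy data only shows that two samples with the same median can still have visibly different CDFs.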
Each provider page will report all three signals against a known-honest first-party reference, plus latency, throughput, and price.
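
As a rough illustration of the Model Equality Testing signal, the sketch below estimates a squared MMD between two sets of tokenised completions using a Hamming-style kernel, then attaches a permutation p-value. It is a simplified reading of the idea rather than the reference implementation: the padding scheme, the normalisation by `length`, and all function names are assumptions.

```python
import random
from itertools import combinations

PAD = None  # assumed padding symbol; two padded positions count as a match


def hamming_kernel(a, b, length):
    """Fraction of the first `length` positions at which two token sequences agree."""
    a = (list(a) + [PAD] * length)[:length]
    b = (list(b) + [PAD] * length)[:length]
    return sum(x == y for x, y in zip(a, b)) / length


def mmd_squared(xs, ys, length):
    """Unbiased estimate of squared MMD between two samples of completions."""
    def mean_within(sample):
        pairs = list(combinations(sample, 2))
        return sum(hamming_kernel(a, b, length) for a, b in pairs) / len(pairs)

    cross = sum(hamming_kernel(a, b, length) for a in xs for b in ys)
    return mean_within(xs) + mean_within(ys) - 2 * cross / (len(xs) * len(ys))


def permutation_p_value(xs, ys, length, rounds=200, seed=0):
    """Share of random relabelings whose MMD is at least the observed one."""
    rng = random.Random(seed)
    observed = mmd_squared(xs, ys, length)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        if mmd_squared(pooled[:len(xs)], pooled[len(xs):], length) >= observed:
            hits += 1
    return (hits + 1) / (rounds + 1)
```

Each sample needs at least two completions for the within-sample averages to be defined. Completions would be token-id lists collected for the same prompt from the first-party reference and from the provider under audit.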
Hard rules
- We never publish the word "fraud" about a named provider. Our verdicts are "consistent", "inconclusive", or "signal detected vs. reference distribution"; the operator interprets, we report data.
- Methodology and prompt-generation code are open source. Audit seeds rotate weekly so prompt-keyed cache replay does not poison the dashboard.
- Every provider has a permanent right of reply. Their response is published alongside the audit data, unedited.
- We do not, and cannot, certify that any provider is "honest". A consistent result means we found no signal with the methods we run today; light quantisation and stack drift are known to be undetectable.
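
One way weekly seed rotation could work is sketched below: derive the seed from the ISO week label with a keyed hash, so every run within a week shares a seed and the next week's seed cannot be predicted without the key. The use of HMAC, the secret key, and the function name are all assumptions for illustration; this page does not describe the actual derivation.

```python
import hashlib
import hmac
from datetime import date


def weekly_seed(secret: bytes, day: date) -> int:
    """Derive a 64-bit seed that is stable within an ISO week (hypothetical scheme)."""
    iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
    label = f"{iso[0]}-W{iso[1]:02d}".encode()
    digest = hmac.new(secret, label, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


# Example with a placeholder key: Monday and Sunday of the same ISO week.
monday_seed = weekly_seed(b"example-key", date(2025, 1, 6))
sunday_seed = weekly_seed(b"example-key", date(2025, 1, 12))
```

The seed would then feed the open-source prompt generator, so prompts change each week while the code that produces them stays public.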
Status & roadmap
| Phase | Scope | Status |
|---|---|---|
| v0.1 | Length-distribution audit; 5 reasoning models; ~10 named resellers | in build |
| v0.2 | + Model Equality Testing; user-submitted local-audit uploads | planned |
| v0.3 | + LLMmap fingerprinting; Chrome extension; provider right-of-reply | planned |