SubQ promises to break AI’s context bottleneck: 12 million tokens and much lower cost
Subquadratic landed on my radar with one of the more interesting AI claims right now: the Miami startup has emerged from stealth and says it has built a new kind of language model that attacks one of the biggest bottlenecks in modern AI.
The claim is easy to understand but hard to prove: today’s transformer models get expensive when context gets long. Subquadratic says its first model, SubQ 1M-Preview, uses a fully subquadratic architecture, meaning cost grows much more slowly than quadratically as the context window expands.
If that is true, this is not just another model. It is an attack on the infrastructure we have built around the limitations of today’s models.
The problem: attention gets expensive
Modern language models such as GPT, Claude and Gemini are built on the transformer architecture. The core mechanism is attention: the model evaluates how each token relates to every other token in the text.
That is powerful. It is also expensive.
When a model has to compare many tokens against one another, the math grows quickly. If you double the amount of context, you do not simply double the work. In classic attention, compute and memory can grow roughly quadratically. That is why long documents, entire codebases, large contract archives and long meeting histories quickly become expensive and impractical to send directly into a model.
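A quick, model-agnostic illustration of that scaling, nothing SubQ-specific: dense attention builds an n-by-n score matrix, so the work roughly quadruples every time the context doubles.

```python
# Dense attention scores every token against every other token,
# so the score matrix has n^2 entries. Double n, quadruple the work.
for n in (8_000, 16_000, 32_000, 64_000):
    print(f"{n:>6,} tokens -> {n * n:>13,} pairwise scores")
#  8,000 tokens ->    64,000,000 pairwise scores
# 16,000 tokens ->   256,000,000 pairwise scores
# 32,000 tokens -> 1,024,000,000 pairwise scores
# 64,000 tokens -> 4,096,000,000 pairwise scores
```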
This is why so much of today’s AI stack consists of workarounds: RAG, vector databases, chunking, ranking, reranking, agent routing and prompt rules. We do not send everything to the model. We try to find the right small excerpt first.
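For readers who have not built one of these pipelines, the pattern looks roughly like this minimal sketch. The `embed` function is a stand-in for whatever embedding model is used; the brittleness lives in the chunk boundaries and the ranking.

```python
import numpy as np

def retrieve(query: str, docs: list[str], embed, chunk_size: int = 200, top_k: int = 4):
    """Minimal RAG-style retrieval: chunk, embed, rank by cosine similarity.
    Only the top_k chunks -- not the whole corpus -- are sent to the model."""
    chunks = [d[i:i + chunk_size] for d in docs
              for i in range(0, len(d), chunk_size)]
    vecs = np.array([embed(c) for c in chunks])          # one vector per chunk
    q = embed(query)
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    best = np.argsort(sims)[-top_k:][::-1]               # highest similarity first
    return [chunks[i] for i in best]
```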
That often works well enough. But it is also brittle.
What Subquadratic claims
Subquadratic says SubQ 1M-Preview is the first language model built on a fully subquadratic architecture. The company calls the technique Subquadratic Sparse Attention, or SSA.
The point is that the model should not perform every token-to-token comparison. It should learn which relationships actually matter, and spend compute there. According to the company, the selection is content-dependent, based on meaning rather than fixed patterns or position.
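Subquadratic has not published how SSA works, so the following is only a generic sketch of the idea of content-dependent sparsity: each query attends to the few keys that score highest, instead of all of them. Note that this naive version still computes every score before selecting the top k; genuinely subquadratic methods avoid that step with techniques such as routing, hashing or clustering.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k: int = 8):
    """Generic content-dependent sparse attention (NOT SubQ's unpublished SSA).
    Each query attends only to its k highest-scoring keys. Caveat: this naive
    version still scores all pairs before selecting; a truly subquadratic
    method must skip that step entirely."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])              # (n, n) relevance scores
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:]   # k best keys per query
    masked = np.full_like(scores, -np.inf)               # drop everything else
    np.put_along_axis(masked, idx,
                      np.take_along_axis(scores, idx, axis=-1), axis=-1)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                   # softmax over survivors
    return w @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
print(topk_sparse_attention(Q, K, V).shape)              # (64, 32)
```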
The concrete claims are big:
- SubQ 1M-Preview offers a 1 million token context window, currently in private beta.
- The company points to a research result of up to 12 million tokens.
- At 12 million tokens, it claims nearly 1,000 times lower attention compute than other frontier models (a rough back-of-envelope on what that would mean follows this list).
- It is launching an API, SubQ Code for codebases through a CLI, and SubQ Search for long-context search.
- It has raised $29 million in seed funding.
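It is worth pausing on what that 1,000x figure would mean. Assuming dense attention cost scales with n squared, a quick back-of-envelope calculation, mine, not the company's, says 12 million tokens at a thousandth of the compute costs roughly what a dense 380,000-token window does:

```python
# Back-of-envelope on the 1,000x claim, assuming dense attention ~ n^2.
n = 12_000_000
dense = n ** 2                      # ~1.44e14 pairwise scores at 12M tokens
claimed = dense / 1_000             # what a 1,000x reduction would leave
print(f"{dense:.2e} -> {claimed:.2e}, dense-equivalent ~{claimed ** 0.5:,.0f} tokens")
# 1.44e+14 -> 1.44e+11, dense-equivalent ~379,473 tokens
```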
Subquadratic has also published benchmark numbers. It claims 95 percent on RULER 128K, roughly in line with Claude Opus 4.6. On MRCR v2, it reports 65.9 for the production model, while GPT-5.5 is listed at 74 and Claude Opus 4.7 at 32.2. On SWE-Bench Verified, it reports 81.8, slightly above Claude Opus 4.6.
Those are strong numbers. Almost too strong to take at face value.
Why this could matter
If SubQ actually makes long context cheap and stable, it moves an important boundary.
AI systems could read entire codebases in one pass. Not just relevant files found by search. The whole repository. They could read large contract sets, policy archives, technical documentation, support histories, order data and meeting histories without losing context at the boundaries between chunks.
For Norwegian companies, this is interesting because many sit on large amounts of unstructured knowledge in legacy systems, documents, emails, PDFs and wikis. Today’s AI solutions often spend too much energy trying to retrieve the right fragment. A model with genuinely long, cheap and functional context could do more of the work directly.
That could mean less RAG infrastructure. Fewer fragile pipelines. Less need for agent orchestration just to compensate for the model not seeing enough.
That is why this is interesting for CIOs. Not because one Miami startup is suddenly going to replace OpenAI, Anthropic or Google. But because it points to a possible new economy for AI systems.
But this needs a sober read
This is also a story where the marketing is running ahead of the proof.
SubQ is closed. Access is private beta. There is not yet a detailed model card or any broad independent evaluation. The benchmarks are narrow and hit exactly the areas where a long-context model should shine: retrieval, long context and code.
VentureBeat also points to several red flags that should be taken seriously. Some tests may have been run only a small number of times because of cost. The production model scores lower than the research result on MRCR v2. The company has confirmed that it uses open model weights as a starting point. And the industry has fresh examples of companies promising enormous context windows without later proving the technology broadly in the market.
That does not mean this is nonsense. It means it is early.
The right assessment is: very interesting, but do not buy the story until independent tests show that the model survives real workflows.
What leaders should watch for
The most important question is not whether SubQ wins a benchmark. The question is whether the architecture works in practical enterprise situations.
Three tests matter most:
- Can the model actually use the whole context, or only accept it?
Many models have large context windows on paper. The key question is whether they can find, connect and reason over information far out in the context without becoming unstable. A minimal probe for this follows the list.
- Is the cost low enough for production?
A long-context model is only useful if it can be run often. If every call turns into a budget meeting, the technology ends up in the demo drawer.
- Can the business control data, access and logging?
When whole codebases, contracts or histories are sent into one model, the requirements for access control, logging, data processing agreements and policy increase. Long context does not make governance less important. It makes it more important.
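For the first test, a simple needle-in-a-haystack probe is a reasonable starting point before trusting any vendor benchmark. The sketch below is mine, not a SubQ tool: it plants one fact at different depths in filler text and checks whether the model retrieves it. `ask_model` and the planted fact are placeholders, and real long-context evaluations such as RULER and MRCR are considerably more demanding.

```python
def build_probe(total_words: int, depth: float) -> tuple[str, str]:
    """Plant one retrievable fact ("needle") at a relative depth in filler text."""
    needle = "The deployment freeze code is AURORA-7."   # hypothetical fact
    words = ("Operations continued without incident. " * (total_words // 5)).split()
    words.insert(int(len(words) * depth), needle)
    prompt = " ".join(words) + "\n\nWhat is the deployment freeze code?"
    return prompt, "AURORA-7"

def run_probe(ask_model, total_words: int = 500_000, depths=(0.1, 0.5, 0.9)):
    for depth in depths:                     # early, middle and late in context
        prompt, answer = build_probe(total_words, depth)
        reply = ask_model(prompt)            # placeholder for the API under test
        print(f"depth {depth:.0%}: {'PASS' if answer in reply else 'FAIL'}")
```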
My assessment
This is one of the AI stories worth following closely.
Not because Subquadratic has already proved everything. It has not. But because the problem it attacks is real. Quadratic scaling, expensive context windows and fragile RAG architecture are among the biggest practical barriers to AI in production.
If SubQ delivers on its promise, it could change how we build AI systems. Entire codebases, document archives and long-running work processes could become the model’s normal working surface, instead of something we have to cut into pieces and glue back together with more and more infrastructure.
But until independent testing is available, this should be treated as a strong candidate, not a conclusion.
The AI industry needs breakthroughs in efficiency more than it needs another chatbot with a slightly better tone. SubQ could be such a breakthrough. Or it could be another reminder that huge context windows are easy to market and hard to prove.
Either way, the signal matters: the next round of the AI race is not just about smarter models. It is about who can make intelligence cheap enough, long enough and stable enough for real operations.
Sources and media
- Subquadratic: “Introducing SubQ: The First Fully Subquadratic LLM”, published May 5, 2026. https://subq.ai/introducing-subq
- VentureBeat: “Miami startup Subquadratic claims 1,000x AI efficiency gain with SubQ model; researchers demand independent proof”, Michael Nuñez, published May 5, 2026. https://venturebeat.com/technology/miami-startup-subquadratic-claims-1-000x-ai-efficiency-gain-with-subq-model-researchers-demand-independent-proof
- Illustration/thumbnail: generated with OpenAI Image 2 for hogby.ai.
📬 Did you like this one?
AI news for leaders. Curated by a CIO who builds it himself. Daily in your inbox.