March 1, 2024

Why Long Context Length is Not the Death of RAG

Meryem Arik

Google recently announced that its new Gemini model can handle over 1 million tokens of context, a huge leap in AI capabilities. Many have proclaimed that this advancement spells the end of Retrieval-Augmented Generation (RAG) systems. However, we don't believe long context length represents the demise of RAG, for several key reasons:

Cost and Speed

Long contexts are computationally expensive to run. The more context provided, the slower and more resource-intensive model inference becomes, since attention and memory costs grow with the number of tokens processed. RAG systems reduce the number of tokens that need processing by retrieving only the most relevant passages up front, enabling faster and cheaper results.
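
As a rough illustration of this trade-off, here is a toy sketch (our own, not from any particular stack: the word-overlap scorer stands in for a real embedding model, and the corpus and query are invented) showing how top-k retrieval shrinks the prompt compared with stuffing in the whole corpus:

```python
# A toy sketch of why retrieval cuts inference cost: instead of stuffing the
# entire corpus into the prompt, retrieve only the top-k most relevant chunks.
import string

def words(text: str) -> set[str]:
    """Lowercase, strip punctuation, split into a set of words."""
    return set(text.lower().translate(str.maketrans("", "", string.punctuation)).split())

def score(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query words that appear in the chunk."""
    q = words(query)
    return len(q & words(chunk)) / max(len(q), 1)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring chunks for the query."""
    return sorted(corpus, key=lambda c: score(query, c), reverse=True)[:k]

corpus = [
    "The 2023 annual report shows revenue grew 14% year over year.",
    "Office recycling guidelines: paper and cardboard go in the blue bin.",
    "Gross margin improved to 62% in the fourth quarter of 2023.",
    "The cafeteria is closed on public holidays.",
]
query = "What was revenue growth in 2023?"
prompt_chunks = retrieve(query, corpus)

# Approximate token counts by word counts, just for illustration.
print("Full-context prompt:", sum(len(c.split()) for c in corpus), "words")
print("RAG prompt (top-2): ", sum(len(c.split()) for c in prompt_chunks), "words")
```

In a real pipeline the scorer would be a dense embedding model and the corpus would hold thousands of documents, so the gap between the two prompt sizes is far larger than in this toy example.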

Unproven Performance

While impressive in scale, it remains to be seen how accurate Gemini's recall is over such vast contexts. RAG systems optimize the entire pipeline - search, embeddings and ranking - to feed the prompt only relevant content. Gemini's recall over 1 million tokens requires further evaluation.
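
One common way to probe this, sketched below, is a "needle in a haystack" test: bury a known fact at varying depths in a long filler document and check whether the model can still retrieve it. The `generate` callable is a hypothetical stand-in for whatever model API is under test, and the needle and filler text are invented.

```python
# A hedged sketch of a "needle in a haystack" recall test for long contexts.
NEEDLE = "The secret launch code is 7-42-19."
FILLER = "This sentence is routine filler text with no useful information. "

def build_haystack(total_sentences: int, needle_depth: float) -> str:
    """Bury the needle at a relative depth (0.0 = start, 1.0 = end)."""
    sentences = [FILLER] * total_sentences
    sentences.insert(int(needle_depth * total_sentences), NEEDLE + " ")
    return "".join(sentences)

def recall_test(generate, depths=(0.0, 0.25, 0.5, 0.75, 1.0), size=5000):
    """Check, at each depth, whether the model surfaces the buried fact."""
    results = {}
    for depth in depths:
        prompt = build_haystack(size, depth) + "\nWhat is the secret launch code?"
        answer = generate(prompt)  # hypothetical model call
        results[depth] = "7-42-19" in answer
    return results

# Usage (assuming some model client): results = recall_test(my_model_generate)
```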

Loss of Auditability

A major advantage of RAG systems is the audit trail they provide, showing exactly which content was deemed relevant and passed to the model as input. This grants some explainability into the otherwise "black box" workings of AI. With ultra-long contexts like Gemini's, that auditability is lost in the sheer volume of input, hampering its usefulness for many enterprise use cases.
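
At its simplest, that audit trail is just a log of the exact chunks shown to the model for each answer. The sketch below is illustrative: `retrieve` and `generate` are placeholders for a pipeline's actual retrieval and model calls, and the record fields are our own choice.

```python
# A minimal sketch of a RAG audit trail: for every answer, record exactly
# which chunks were retrieved and passed to the model.
import json
import time

def answer_with_audit(query, retrieve, generate, log_path="rag_audit.jsonl"):
    chunks = retrieve(query)           # e.g. top-k retrieval over the corpus
    answer = generate(query, chunks)   # hypothetical model call
    record = {
        "timestamp": time.time(),
        "query": query,
        "retrieved_chunks": chunks,    # the exact evidence the model saw
        "answer": answer,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return answer
```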

In summary, while long context length is an exciting advancement that shows AI's potential, it alone is unlikely to make RAG obsolete quite yet. The strengths around cost, performance optimization and auditability mean RAG still has significant value in operational environments. We look forward to seeing how these capabilities evolve together over time.