December 9, 2023

Announcing Titan Takeoff 0.7.0

Meryem Arik
We are delighted to announce Titan Takeoff 0.7.0 to our clients. This release brings a number of features that help our users build more scalable, higher-throughput systems.

What's new?

Continuous batching

Continuous batching is a scheduling algorithm that increases the throughput of LLM serving. It allows the batch your ML model is working on to grow and shrink dynamically over time: new requests join the batch as soon as a slot frees up, rather than waiting for the whole batch to finish, so responses are served to users more quickly at high load, dramatically improving throughput. More info here (https://docs.titanml.co/docs/next/titan-takeoff/pro-features/batching).
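To make the idea concrete, here is a toy scheduler sketch (plain Python, not the Takeoff implementation): after every decode step it retires finished requests and admits waiting ones into the freed slots, instead of waiting for the whole batch to drain.

```python
# Toy illustration of continuous batching. Request ids, token counts, and the
# batch size are made up for the example; a real server decodes with a model.
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int
    tokens_needed: int  # how many tokens this request will generate
    generated: int = 0


def serve(requests, max_batch=4):
    """Run a decode loop with continuous batching; return completion order."""
    waiting = deque(requests)
    running = []
    finished = []
    while waiting or running:
        # Admit waiting requests whenever a batch slot is free.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step: every running request generates one token.
        for req in running:
            req.generated += 1
        # Retire completed requests immediately, freeing their slots
        # for the next iteration instead of stalling the whole batch.
        finished.extend(r.rid for r in running if r.generated >= r.tokens_needed)
        running = [r for r in running if r.generated < r.tokens_needed]
    return finished


# Five requests of varying lengths share four slots; short requests finish
# and are replaced without waiting for the long ones.
order = serve([Request(0, 2), Request(1, 5), Request(2, 1),
               Request(3, 3), Request(4, 1)])
```

With static batching, request 4 could not start until the first four all finished; here it slips into the slot request 2 frees after one step.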

Batched sampling

Previously, the batching implementation in the Titan Takeoff Inference Server could only batch together requests that shared the same generation parameters (including JSON schemas and regex strings). This release removes that restriction, making it easier to deploy multiple applications at scale.
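A minimal sketch of what per-request parameters in one batch means, using temperature as the example parameter (illustrative only; this is not Takeoff's API or implementation):

```python
# Hypothetical sketch: sample one token per request while letting each
# request in the batch carry its own temperature, so requests with
# different generation parameters no longer need separate batches.
import math
import random


def batched_sample(batch_logits, temperatures, seed=0):
    """Return one sampled token id per request in the batch."""
    rng = random.Random(seed)
    out = []
    for logits, temp in zip(batch_logits, temperatures):
        scaled = [x / temp for x in logits]  # per-request temperature
        m = max(scaled)                      # subtract max for stability
        probs = [math.exp(x - m) for x in scaled]
        total = sum(probs)
        weights = [p / total for p in probs]
        # Draw a token index weighted by the softmax probabilities.
        out.append(rng.choices(range(len(weights)), weights=weights, k=1)[0])
    return out


# Two requests with different temperatures sampled in the same batch.
tokens = batched_sample([[2.0, 0.1, 0.1], [0.5, 0.5, 0.5]], [0.7, 1.0])
```

The same pattern extends to other parameters (top-k, JSON schema constraints, regex masks): each is applied per row of the batch rather than being a property of the whole batch.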

Request cancellation

This release allows requests to be cancelled in flight. No more waiting for the playground to finish processing a request you no longer care about. This was a much-requested feature!
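In-flight cancellation generally works by checking for a cancellation signal between decode steps. A minimal sketch of that pattern using asyncio (illustrative only; Takeoff's mechanism is internal to the server):

```python
# Sketch of in-flight cancellation: the generation loop yields control
# between decode steps, giving the event loop a point at which a
# cancellation requested by the client can take effect.
import asyncio


async def generate(n_tokens):
    produced = []
    for i in range(n_tokens):
        await asyncio.sleep(0)  # cancellation point between decode steps
        produced.append(i)
    return produced


async def main():
    task = asyncio.create_task(generate(1_000_000))
    await asyncio.sleep(0)  # let the task run a few steps
    task.cancel()           # the user abandons the request mid-generation
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"   # compute for the dead request stops here


result = asyncio.run(main())
```

The point is that cancellation frees the request's batch slot immediately, so capacity is not wasted finishing output nobody will read.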

Multi-GPU

We have made a number of changes to our multi-GPU backend, which is required when deploying models too large to fit on a single GPU. This release improves its performance.

Other things!

  • Licence keys: the Titan Takeoff Inference Server now has a new way of distributing licence keys
  • Better error handling in the frontend

We are excited to release this to our customers. We ship regularly so that our clients always have access to the best technology and can move forward with confidence. We have lots of exciting features and releases under development at the moment, so stay tuned for Takeoff 0.8.0!
