July 17, 2023

Introducing the Titan Takeoff Inference Server 🛫

Fergus Finn

Experience unprecedented speed in inference of large language models (LLMs) — even on a CPU.

Just last week, we showcased a Falcon 7B model running real-time inference on a standard CPU (🤯). Our demonstration caught the attention of data scientists and ML engineers, who were astounded that such a thing was feasible. They wanted to achieve this kind of memory compression and speed-up for themselves!

Today, we introduce the Titan Takeoff Inference Server!

Our mission with the Titan Takeoff Inference Server is to make it remarkably straightforward to perform rapid real-time inference even with large open source language models. We’ve incorporated features that allow you to experiment with these models swiftly — it’s the fastest way to evaluate models on your preferred hardware!

Use cases

The Titan Takeoff Server opens up new use cases by making large language models more accessible. Real-time inference on low-cost, readily available devices will drastically change the landscape of LLM-powered applications. As the cost of fine-tuning comes down, the capabilities of small models will only improve over time.

Here are just a few examples of apps that our team has built on top of the Takeoff server over the last few weeks:

  • An automated technical article summarization tool.
  • A writing assistant, designed to identify negative writing habits and correct them on the fly.
  • A knowledge graph extraction tool for news articles.

These are just the tip of the iceberg, showcasing applications that demand swift and accurate inference built on the robust TitanML inference and fine-tuning infrastructure.

Performance benchmarks

We have benchmarked the inference server on GPU and CPU. We have seen speeds up to 10x faster with 4x lower memory requirements compared to running the base model implementation 🤯.

We have a lot of work lined up to improve these benchmarks even more, so stay tuned!

Getting started is a breeze

You can jump-start your journey with Titan Takeoff by creating a free TitanML account. Then, you’re just a few lines of code away from unlocking its power:

# install the local python package
pip install titan-iris

# launch the takeoff server with your model of choice
iris takeoff --model tiiuae/falcon-7b-instruct --device cpu

Check out our comprehensive documentation here, and join our vibrant Discord community here. Also, don't miss our end-to-end demo here.
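Once the server is launched, you interact with it over HTTP. The snippet below is a minimal sketch of what a client might look like, using only the Python standard library. Note that the endpoint URL, port, and payload field names (`text`, `max_new_tokens`) are illustrative assumptions, not the documented Takeoff API; check the documentation for the exact request format.

```python
import json
import urllib.request


def build_payload(prompt, max_new_tokens=128):
    """Assemble a JSON body for a text-generation request.
    Field names here are illustrative assumptions, not the
    documented Takeoff request schema."""
    return {"text": prompt, "max_new_tokens": max_new_tokens}


def generate(prompt, url="http://localhost:8000/generate"):
    """POST the prompt to a locally running Takeoff server
    (hypothetical URL/port) and return the raw response body."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Example (requires a running Takeoff server):
#   print(generate("Write a haiku about fast inference."))
```

Because the server is just an HTTP endpoint, the same pattern works from any language or framework you already use.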

Some first project ideas…

As well as good generalist models, like the Falcon 7B instruct model, there are a number of models designed for specific use cases that you could try…

  • Build chatbots with models like Vicuna (lmsys/vicuna-7b-v1.3)
  • Create summarisers using Bart (facebook/bart-large-cnn)
  • Develop locally-run coding assistants with models like CodeGen (NumbersStation/nsql-2B)

We can’t wait to hear about what you build with the Titan Takeoff Inference Server!

About TitanML

TitanML is an NLP development platform and service focused on the deployability of LLMs, allowing businesses to build smaller and cheaper deployments of language models in just days (95% cheaper than using existing models like OpenAI's GPT). TitanML uses proprietary automated efficient fine-tuning and inference optimisation techniques, allowing businesses to build state-of-the-art language models in-house effortlessly.

Our documentation and Discord community are here for your support.

A quick note about licensing: the Titan Takeoff Inference Server is free to use in personal and academic projects (please credit us if you write it up publicly! 😉). Message us at hello@titanml.co if you would like to explore using the inference server for commercial purposes.

Written by Meryem Arik
