Member of Technical Staff: LLM Inference Systems


Description: Develop cutting-edge inference technology at every level of the inference stack.

Location: London, UK (Hybrid)

Compensation: Competitive with equity.

How to Apply: Send your CV to fergus.finn@doubleword.ai



About the Role

We’re seeking a Senior Research Engineer to join our mission of solving the hardest inference challenges in generative AI. You’ll develop cutting-edge inference technology at every level of the inference stack. This could involve writing custom inference kernels, designing compute clusters for unique inference needs, or contributing to state-of-the-art open-source inference engines.

What You’ll Do

Examples of projects you might work on:

  1. Building and optimizing infrastructure for batch inference workloads, focusing on high-throughput, cost-efficient processing.
  2. Serving fine-tuned models at scale, using tools like multi-LoRA and multi-PEFT inference engines.
  3. Optimizing open-source inference engines for offloading-based inference, implementing optimizations for severely memory-constrained environments.
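
To give a flavour of the first area, here is a toy sketch of the kind of scheduling problem batch inference involves: packing requests into batches under a token budget so each forward pass keeps the accelerator busy without exhausting memory. The names, numbers, and greedy strategy are illustrative assumptions, not a description of Doubleword's actual systems.

```python
# Toy illustration of throughput-oriented request batching.
# All names and thresholds here are invented for illustration.
from dataclasses import dataclass

@dataclass
class Request:
    id: int
    num_tokens: int  # prompt length, used as a crude cost proxy

def pack_batches(requests, max_batch_tokens):
    """Greedily pack requests into batches under a token budget."""
    batches, current, used = [], [], 0
    for req in requests:
        # Flush the current batch if adding this request would exceed the budget.
        if current and used + req.num_tokens > max_batch_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(req)
        used += req.num_tokens
    if current:
        batches.append(current)
    return batches

# Ten 300-token requests under a 1024-token budget pack as 3+3+3+1.
reqs = [Request(i, 300) for i in range(10)]
batches = pack_batches(reqs, 1024)
```

Production engines replace this greedy, static packing with continuous batching, admitting and evicting sequences between decode steps, but the budget-packing trade-off is the same.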

What We’re Looking For

Note: A good candidate will have about 80% of the following qualities. Please apply even if the description doesn’t fit you perfectly.

Core Technical Skills:

  • Strong programming fundamentals
  • Understanding of GPU architectures and their performance characteristics
  • Deep understanding of LLM inference workloads, performance characteristics, and optimization techniques
  • Familiarity with inference tooling and deep learning libraries (PyTorch, TensorRT, TensorRT-LLM, vLLM, SGLang)

Research Mindset:

  • Curiosity about emerging hardware trends and ML optimization techniques
  • Ability to understand complex research requirements and translate them into infrastructure needs
  • Comfort with ambiguity and rapidly evolving technical landscapes
  • Experience supporting research workflows and experimental systems

About Us

We’re dedicated to making large language models faster, cheaper, and more accessible. Our infrastructure team is laser-focused on LLM inference optimization, pushing the boundaries of what’s possible in terms of performance and cost efficiency while maintaining the reliability needed to serve these models at scale.

We provide competitive compensation, comprehensive benefits, and opportunities for professional growth in one of the most exciting fields in technology.

