Member of Technical Staff, Batched Inference Server

Doubleword

Overview

Company Name: Doubleword
Job Role: Member of Technical Staff, Batched Inference Server
Qualifications: Not Specified
Category: General Jobs
Job Type: Full Time
Location: London

This role sits within a company building a high-throughput inference platform for asynchronous AI workloads. The product is aimed at teams running long-lived agents, batch jobs, evaluations, document pipelines, data enrichment, synthetic data generation, and other background tasks where throughput and cost matter more than low-latency responses. The engineering focus is on creating an inference stack that makes the best possible use of GPUs, keeps large-scale workloads moving efficiently, and lowers the cost of serving models at volume.

The work is centered on the batched inference server and the broader infrastructure behind Doubleword’s API and inference products. The platform is designed to support OpenAI-compatible usage patterns, including chat-style requests, responses-style requests, batch processing, tool calling, and structured generation. The team is building systems that can reliably handle queued and chained workloads, process millions of requests, and support production use cases such as classification, extraction, summarization, embeddings, image analysis, model benchmarking, and synthetic dataset creation.

What you would work on

  • Develop and improve the batched inference server and the underlying systems that power large-scale asynchronous model serving.
  • Build infrastructure that increases GPU utilization, improves throughput, and reduces the cost of serving AI workloads at scale.
  • Help support background agents, offline jobs, and other non-interactive workloads that can tolerate longer turnaround times in exchange for lower cost.
  • Contribute to the platform’s OpenAI-compatible API surface, including chat completions, responses-style requests, batch workflows, tool use, and structured outputs.
  • Work on reliability and scaling for workloads that may run continuously, process large document sets, or execute multi-step workflows without human supervision.
  • Support production templates and reusable workbooks for common high-volume tasks such as classification, data processing, enrichment, embeddings, image processing, model evaluation, structured generation, and synthetic data generation.
  • Take part in engineering efforts that improve inference performance and efficiency, including research-informed optimizations such as speculative KV compression, faster cold starts, and queue speculation.
  • Help make the platform suitable for technical customers in applied machine learning, data platforms, LLM infrastructure, and research engineering.

What the company emphasizes

  • The platform is built for workloads where users are not waiting on each individual response, which allows the system to trade latency for lower cost and higher throughput.
  • The company positions its APIs as efficient for a range of service levels, including real-time, asynchronous, and batch modes.
  • It highlights compatibility with common developer workflows so teams can migrate more easily and use familiar request patterns.
  • The product is intended to help customers run background agents and large-scale inference jobs at a fraction of the usual cost.

Experience and skills the role appears to call for

  • Strong experience in inference systems engineering or a closely related infrastructure role.
  • Deep systems engineering ability, especially around production services that must handle high request volume reliably.
  • Practical understanding of GPU inference, batching, throughput optimization, and the engineering tradeoffs between latency and cost.
  • Ability to design and maintain backend systems for AI workloads that are large, continuous, or autonomous.
  • Familiarity with modern model-serving patterns such as OpenAI-compatible APIs, structured generation, and tool calling.
  • Comfort working on infrastructure used by applied ML, data platform, LLM infrastructure, and research engineering teams.
  • Interest in large-scale AI use cases such as data processing pipelines, evaluations, synthetic data generation, document extraction, and other high-volume workloads.

Product and customer context

The platform is presented as a lower-cost alternative for high-volume inference, with pricing examples showing substantial savings compared with other model providers for models of comparable capability. It supports asynchronous agents, synthetic data generation, data processing pipelines, embeddings, async evaluations, bug detection, dataset labeling, structured extraction, image summarization, personal assistants, event listeners, ETL and sanitization workflows, and deep research tasks. The company also provides workbooks and templates that customers can run directly for common use cases.

Doubleword describes itself as being built by inference systems engineers and highlights research work related to speculative KV compression, cold-start improvements, and queue speculation. The company also points to a range of public examples and community use cases showing the platform being used for batch inference, document processing, model evaluation, and multimodal workflows.

What the platform offers customers

  • Access to a platform that can be significantly cheaper than comparable model-serving options for asynchronous workloads.
  • No requirement to enter payment details before getting started.
  • No minimum usage commitment.
  • Usage-based billing where customers pay only for the tokens they consume.
  • OpenAI-compatible APIs to simplify migration and integration.
  • Support for tool calling and structured generation.
  • Multiple service modes so customers can choose between latency-optimized, asynchronous, and batch-oriented operation depending on the job.
  • Reusable workbooks and templates for common production workloads.
  • The ability to request access to models that are not currently listed.

Visa and sponsorship information

The page does not provide any statement about UK visa sponsorship, work authorization, or whether sponsorship is available for this role.

How to apply

The company page invites visitors to get started, run a sample job, read the documentation, or talk to the team. It also includes a careers link indicating that the company is hiring.


Degree Requirement: Not Specified

Visa Sponsorship: May be available; the company page does not confirm this.

To apply for this job, please visit doubleword.ai.
