
llm-d components

The llm-d ecosystem consists of multiple interconnected components that work together to provide distributed inference capabilities for large language models.

Latest Release: v0.3.1

Released: November 6, 2025

Components

| Component | Description | Repository | Version | Documentation |
| --- | --- | --- | --- | --- |
| Inference Scheduler | A scheduler that makes optimized routing decisions for inference requests to the llm-d inference framework. | llm-d/llm-d-inference-scheduler | v0.3.2 | View Docs |
| Modelservice | modelservice is a Helm chart that simplifies LLM deployment on llm-d by declaratively managing Kubernetes resources for serving base models. It enables reproducible, scalable, and tunable model deployments through modular presets and clean integration with llm-d ecosystem components (including vLLM, Gateway API Inference Extension, LeaderWorkerSet). | llm-d-incubation/llm-d-modelservice | llm-d-modelservice-v0.2.10 | View Docs |
| Routing Sidecar | A reverse proxy that redirects incoming requests to the prefill worker specified in the x-prefiller-host-port HTTP request header. | llm-d/llm-d-routing-sidecar | v0.3.0 | View Docs |
| Inference Sim | A lightweight vLLM simulator that emulates responses to the HTTP REST endpoints of vLLM. | llm-d/llm-d-inference-sim | v0.6.1 | View Docs |
| Infra | A Helm chart for deploying the gateway and gateway-related infrastructure assets for llm-d. | llm-d-incubation/llm-d-infra | v1.3.3 | View Docs |
| KV Cache Manager | The llm-d-kv-cache-manager is a pluggable service designed to enable KV-cache-aware routing and lay the foundation for advanced, cross-node cache coordination in vLLM-based serving platforms. | llm-d/llm-d-kv-cache-manager | v0.3.0 | View Docs |
| Benchmark | An automated workflow for benchmarking LLM inference using the llm-d stack. It includes tools for deployment, experiment execution, data collection, and teardown across multiple environments and deployment styles. | llm-d/llm-d-benchmark | v0.3.0 | View Docs |
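
The Routing Sidecar entry above describes header-driven routing: the proxy reads the x-prefiller-host-port request header to decide which prefill worker receives the request. The following is a minimal Python sketch of that lookup only, not the sidecar's actual implementation. The function name `resolve_prefill_target`, the assumed `host:port` value format, and the choice to return `None` for a missing or malformed header are all illustrative assumptions.

```python
from typing import Mapping, Optional, Tuple

# Header name taken from the component table; everything else is hypothetical.
PREFILL_HEADER = "x-prefiller-host-port"


def resolve_prefill_target(headers: Mapping[str, str]) -> Optional[Tuple[str, int]]:
    """Return (host, port) named by the prefill header, or None if absent or invalid.

    Assumes the header value has the form "host:port".
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get(PREFILL_HEADER, "").strip()
    if not raw:
        return None

    # Split on the last colon so the port is always the final segment.
    host, sep, port_text = raw.rpartition(":")
    if not sep or not host:
        return None
    if not (port_text.isascii() and port_text.isdigit()):
        return None

    port = int(port_text)
    if not 1 <= port <= 65535:
        return None
    return host, port
```

A caller might forward the request to the returned target when one is present, and otherwise handle it without a separate prefill step; that fallback behavior is also an assumption and should be checked against the Routing Sidecar documentation.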

Getting Started

Each component has its own detailed documentation page accessible from the links above. For a comprehensive view of how these components work together, see the main Architecture Overview.

Previous Releases

For information about previous versions and their features, visit the GitHub Releases page.

Contributing

To contribute to any of these components, visit their respective repositories and follow their contribution guidelines. Each component maintains its own development workflow and contribution process.