Model Serving

The infrastructure for hosting trained models and answering prediction requests at scale. Serving systems handle load balancing, request batching, auto-scaling, and hardware allocation so that a single deployment can serve many concurrent clients efficiently. Tools such as vLLM, TGI (Text Generation Inference), and NVIDIA Triton are popular for serving large language models.
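Batching is central to serving efficiency: instead of running the model once per request, the server groups pending requests and runs the model once per group. Below is a minimal, synchronous sketch of that idea; all names (`Request`, `serve`, `model_fn`) are hypothetical, and real servers like vLLM batch asynchronously and continuously rather than over a fixed queue.

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str

def batch_requests(queue, max_batch_size=8):
    """Split the pending queue into batches of at most max_batch_size."""
    return [queue[i:i + max_batch_size]
            for i in range(0, len(queue), max_batch_size)]

def serve(queue, model_fn, max_batch_size=8):
    """Run model_fn once per batch; return one output per request, in order."""
    outputs = []
    for batch in batch_requests(queue, max_batch_size):
        # One forward pass covers the whole batch, amortizing overhead.
        outputs.extend(model_fn([r.prompt for r in batch]))
    return outputs

# Toy "model": uppercase each prompt. 10 requests, batch size 4 -> 3 batches.
requests = [Request(f"q{i}") for i in range(10)]
results = serve(requests, lambda prompts: [p.upper() for p in prompts],
                max_batch_size=4)
```

A production system would add a timeout so small batches are flushed promptly, trading a little latency for throughput.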

Related terms

Inference, Latency (AI), Throughput