
AI Infrastructure Optimizations for Agentic LLM Workflows

A Comprehensive Survey — Systems Research Visual Guide
Covering 2023–2026 top-venue papers across 7 problem areas

OSDI / SOSP / ISCA / FAST / MLSys / NeurIPS / ICML / EuroSys / ASPLOS / SIGCOMM

Survey Overview

LLM-powered autonomous agents have exposed fundamental mismatches between existing serving infrastructure and agentic workloads. This survey organizes 2023–2026 systems research into 7 problem areas with depth on KV cache management, retention policies, and scheduling.

AI Infra Basics: The Life of a Request
Source-code-level deep dives into vLLM V1, SGLang, LMCache, NVIDIA Dynamo, and Vidur Simulator.

Five Characteristics of Agentic Workloads

1. Multi-turn Depth: 50–200+ calls vs 1–3 turns
2. Tool Pauses: Tool calls fragment GPU utilization
3. Context Growth: Monotonic growth strains KV cache
4. Redundant Prefill: Cross-turn / cross-agent redundancy
5. CPU Latency: Tool exec dwarfs LLM inference
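
The interaction between multi-turn depth, context growth, and redundant prefill can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and the token counts (a 2,000-token system prompt, 500 new tokens appended per call, 100 calls) are hypothetical values chosen for the example, not figures from any surveyed paper. It assumes a simplified model in which every call either re-prefills the whole context (no KV reuse) or prefills only the tokens added since the previous call (ideal prefix reuse).

```python
def prefill_tokens(num_calls, system_tokens, tokens_per_call):
    """Return (without_reuse, with_reuse) total prefill tokens for one session.

    Simplified model (an assumption for illustration): the context starts at
    `system_tokens` and grows by `tokens_per_call` after every LLM call.
    """
    without_reuse = 0  # every call re-prefills the full context
    with_reuse = 0     # every call prefills only the uncached suffix
    context = system_tokens
    cached = 0
    for _ in range(num_calls):
        without_reuse += context
        with_reuse += context - cached
        cached = context            # ideal case: the whole prefix stays cached
        context += tokens_per_call  # tool output and model output are appended
    return without_reuse, with_reuse


# Hypothetical session: 100 calls, 2,000-token prompt, 500 new tokens per call.
no_reuse, reuse = prefill_tokens(100, 2000, 500)
print(no_reuse, reuse)  # 2675000 51500
```

Under these assumed numbers, prefill work without reuse grows quadratically with the number of calls, while ideal prefix reuse keeps it linear. Real systems fall somewhere in between, because cached prefixes can be evicted while an agent waits on a tool call.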

Tag Legend

OSDI'24: Venue
2511.02230: arXiv
repo: OSS
★: Best Paper
Agentic ✓: For Agents
Agentic ~: Partial
PD Disagg: P/D Split
Distributed