How each strategy handles KV cache during tool-call gaps under high memory pressure (KV cache >90%)
KV freed immediately. Blocks evicted by other requests → full recompute on return.
KV pinned in VRAM. Tool returns before TTL expires → cache hit, zero recompute.
KV offloaded to DRAM, GPU freed. On tool return, reload from DRAM.
KV offloaded to DRAM, GPU freed. Prefetch KV back before tool returns → zero wait.
KV freed immediately. Blocks evicted → full recompute on return.
KV pinned, but TTL expires before tool returns → blocks unpinned and evicted → must recompute.
KV safe in DRAM regardless of tool duration. On return, reload from DRAM.
KV safe in DRAM. Prefetch completes before tool returns — same performance as short tool case.