⏱️ 09/10 (Wed.) 14:00-14:30 at 2nd Conference Room
Large Language Model (LLM) inference efficiency directly shapes both user experience and deployment costs.
In this lightning talk, we’ll show how vLLM boosts inference throughput through optimized KV cache management and parallelization, and compare it against Ollama, a popular local inference tool.
We’ll walk through a practical, reproducible benchmark that produces concrete, comparable performance numbers.
By the end, attendees will have a ready-to-use testing workflow and a clear framework for choosing the right serving stack for their needs.
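As a preview of the kind of workflow the talk covers, here is a minimal benchmark sketch. It assumes vLLM and Ollama are both running locally and exposing OpenAI-compatible `/v1/chat/completions` endpoints on their default ports (8000 and 11434); the model name, prompt, concurrency level, and the `usage.completion_tokens` field in the response are illustrative assumptions, not values from the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical local endpoints: both servers are assumed to expose
# OpenAI-compatible chat completion routes on their default ports.
ENDPOINTS = {
    "vllm": "http://localhost:8000/v1/chat/completions",
    "ollama": "http://localhost:11434/v1/chat/completions",
}
MODEL = "llama3.1:8b"          # placeholder model name; adjust per server
PROMPT = "Explain KV cache reuse in two sentences."
CONCURRENCY = 16               # number of in-flight requests
REQUESTS_PER_RUN = 64          # total requests per server


def one_request(url: str) -> int:
    """Send one chat completion and return the reported completion tokens."""
    resp = requests.post(
        url,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": PROMPT}],
            "max_tokens": 128,
        },
        timeout=120,
    )
    resp.raise_for_status()
    # Assumes the server returns an OpenAI-style "usage" block.
    return resp.json()["usage"]["completion_tokens"]


def benchmark(name: str, url: str) -> None:
    """Fire REQUESTS_PER_RUN requests with a thread pool and report tokens/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        tokens = list(pool.map(lambda _: one_request(url), range(REQUESTS_PER_RUN)))
    elapsed = time.perf_counter() - start
    print(f"{name}: {sum(tokens) / elapsed:.1f} output tokens/s "
          f"({REQUESTS_PER_RUN} requests, concurrency {CONCURRENCY})")


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        benchmark(name, url)
```

Running the same script against both servers with identical prompts and concurrency gives a like-for-like throughput comparison; varying `CONCURRENCY` is one simple way to see where each serving stack saturates.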