⏱️ 09/10 (Wed.) 14:00-14:30 at 2nd Conference Room
Large Language Model (LLM) inference efficiency directly shapes both user experience and deployment costs.
In this lightning talk, we’ll show how vLLM boosts inference throughput through optimized KV cache management and parallelization, and compare it against Ollama, a popular local inference tool.
We’ll walk through a practical, reproducible benchmark that produces concrete, comparable performance numbers.
By the end, attendees will have a ready-to-use testing workflow and a clear framework for choosing the right serving stack for their needs.
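As a preview of the kind of workflow the talk covers, here is a minimal benchmark sketch. It assumes vLLM and Ollama are both running locally and exposing OpenAI-compatible `/v1/chat/completions` endpoints on their default ports (8000 and 11434); the model name, prompt, concurrency level, and the `usage.completion_tokens` field in the response are illustrative assumptions, not values from the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical local endpoints: both servers are assumed to expose
# OpenAI-compatible chat completion routes on their default ports.
ENDPOINTS = {
    "vllm": "http://localhost:8000/v1/chat/completions",
    "ollama": "http://localhost:11434/v1/chat/completions",
}
MODEL = "llama3.1:8b"          # placeholder model name; adjust per server
PROMPT = "Explain KV cache reuse in two sentences."
CONCURRENCY = 16               # number of in-flight requests
REQUESTS_PER_RUN = 64          # total requests per server


def one_request(url: str) -> int:
    """Send one chat completion and return the reported completion tokens."""
    resp = requests.post(
        url,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": PROMPT}],
            "max_tokens": 128,
        },
        timeout=120,
    )
    resp.raise_for_status()
    # Assumes the server returns an OpenAI-style "usage" block.
    return resp.json()["usage"]["completion_tokens"]


def benchmark(name: str, url: str) -> None:
    """Fire REQUESTS_PER_RUN requests with a thread pool and report tokens/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        tokens = list(pool.map(lambda _: one_request(url), range(REQUESTS_PER_RUN)))
    elapsed = time.perf_counter() - start
    print(f"{name}: {sum(tokens) / elapsed:.1f} output tokens/s "
          f"({REQUESTS_PER_RUN} requests, concurrency {CONCURRENCY})")


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        benchmark(name, url)
```

Running the same script against both servers with identical prompts and concurrency gives a like-for-like throughput comparison; varying `CONCURRENCY` is one simple way to see where each serving stack saturates.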