Tiny-LLM: A Course on LLM Serving on Apple Silicon for Systems Engineers

(🚧 WIP) A course for systems engineers on how to serve LLMs on Apple Silicon.

skyzh.github.io/tiny-llm/

License: Apache-2.0


tiny-llm - LLM Serving in a Week


Still under development and in a very early stage. This is a tutorial on LLM serving for systems engineers, built with MLX. The codebase is (almost!) entirely based on the MLX array/matrix APIs, without any high-level neural network APIs, so we can build the model serving infrastructure from scratch and dig into the optimizations.

The goal is to learn the techniques behind efficiently serving an LLM model (e.g., the Qwen2 models).
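To give a taste of this array-level style: the course's first chapter builds attention directly from matrix operations. The sketch below is illustrative only, using NumPy instead of MLX and hypothetical function names, not the repository's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, head_dim). Scores are scaled by 1/sqrt(head_dim).
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = (q @ k.T) * scale            # (seq_len, seq_len)
    return softmax(scores, axis=-1) @ v   # (seq_len, head_dim)

q = k = v = np.eye(4, dtype=np.float32)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (4, 4)
```

The same computation maps almost one-to-one onto MLX's `mx.array` operations, which is the point of the exercise: no `nn.MultiHeadAttention` black box, just matrix math.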

Book

The tiny-llm book is available at https://skyzh.github.io/tiny-llm/. You can follow the guide to start building.

Community

You can join skyzh's Discord server and study with the tiny-llm community.


Roadmap

| Week + Chapter | Topic | Code | Test | Doc |
|---|---|---|---|---|
| 1.1 | Attention | ✅ | ✅ | ✅ |
| 1.2 | RoPE | ✅ | ✅ | ✅ |
| 1.3 | Grouped Query Attention | ✅ | 🚧 | 🚧 |
| 1.4 | RMSNorm and MLP | ✅ | 🚧 | 🚧 |
| 1.5 | Transformer Block | ✅ | 🚧 | 🚧 |
| 1.6 | Load the Model | ✅ | 🚧 | 🚧 |
| 1.7 | Generate Responses (aka Decoding) | ✅ | ✅ | 🚧 |
| 2.1 | KV Cache | ✅ | 🚧 | 🚧 |
| 2.2 | Quantized Matmul and Linear - CPU | ✅ | 🚧 | 🚧 |
| 2.3 | Quantized Matmul and Linear - GPU | ✅ | 🚧 | 🚧 |
| 2.4 | Flash Attention and Other Kernels | 🚧 | 🚧 | 🚧 |
| 2.5 | Continuous Batching | 🚧 | 🚧 | 🚧 |
| 2.6 | Speculative Decoding | 🚧 | 🚧 | 🚧 |
| 2.7 | Prompt/Prefix Cache | 🚧 | 🚧 | 🚧 |
| 3.1 | Paged Attention - Part 1 | 🚧 | 🚧 | 🚧 |
| 3.2 | Paged Attention - Part 2 | 🚧 | 🚧 | 🚧 |
| 3.3 | Prefill-Decode Separation | 🚧 | 🚧 | 🚧 |
| 3.4 | Scheduler | 🚧 | 🚧 | 🚧 |
| 3.5 | Parallelism | 🚧 | 🚧 | 🚧 |
| 3.6 | AI Agent | 🚧 | 🚧 | 🚧 |
| 3.7 | Streaming API Server | 🚧 | 🚧 | 🚧 |
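Week 2 of the roadmap centers on inference optimizations such as the KV cache, which avoids recomputing attention keys and values for already-processed tokens during decoding. A minimal sketch of the idea, using NumPy and a hypothetical `KVCache` class (not the tiny-llm codebase's actual API):

```python
import numpy as np

class KVCache:
    """Append-only cache of past keys and values for one attention head.
    Illustrative only; names and shapes are assumptions, not tiny-llm's API."""

    def __init__(self):
        self.k = None
        self.v = None

    def update(self, k_new, v_new):
        # k_new, v_new: (new_tokens, head_dim); grow along the sequence axis.
        self.k = k_new if self.k is None else np.concatenate([self.k, k_new], axis=0)
        self.v = v_new if self.v is None else np.concatenate([self.v, v_new], axis=0)
        return self.k, self.v

cache = KVCache()
cache.update(np.zeros((5, 8)), np.zeros((5, 8)))         # prefill: 5 prompt tokens
k, v = cache.update(np.zeros((1, 8)), np.zeros((1, 8)))  # decode: append 1 new token
print(k.shape)  # (6, 8)
```

Each decoding step then computes attention for only the newest query token against the full cached `k` and `v`, turning per-step attention cost from quadratic to linear in the sequence length.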

Other topics not covered: quantized/compressed KV cache.
