Tiny-LLM: A Course on LLM Serving on Apple Silicon for Systems Engineers

(🚧 WIP) A course for systems engineers on how to serve LLMs on Apple Silicon.

skyzh.github.io/tiny-llm/

License: Apache-2.0


tiny-llm - LLM Serving in a Week


Still under development and in a very early stage. This is a tutorial on LLM serving for systems engineers, built with MLX. The codebase is (almost!) entirely based on the MLX array/matrix APIs, without any high-level neural network APIs, so we can build the model serving infrastructure from scratch and dig into the optimizations.

The goal is to learn the techniques behind efficiently serving an LLM model (e.g., the Qwen2 models).
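To give a taste of this array-level style: the course's first chapter builds attention directly from matrix operations. The sketch below is illustrative only, using NumPy instead of MLX and hypothetical function names, not the repository's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, head_dim). Scores are scaled by 1/sqrt(head_dim).
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = (q @ k.T) * scale            # (seq_len, seq_len)
    return softmax(scores, axis=-1) @ v   # (seq_len, head_dim)

q = k = v = np.eye(4, dtype=np.float32)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (4, 4)
```

The same computation maps almost one-to-one onto MLX's `mx.array` operations, which is the point of the exercise: no `nn.MultiHeadAttention` black box, just matrix math.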

Book

The tiny-llm book is available at https://skyzh.github.io/tiny-llm/. You can follow the guide to start building.

Community

You can join skyzh's Discord server and study with the tiny-llm community.


Roadmap

| Week + Chapter | Topic | Code | Test | Doc |
|---|---|---|---|---|
| 1.1 | Attention | ✅ | ✅ | ✅ |
| 1.2 | RoPE | ✅ | ✅ | ✅ |
| 1.3 | Grouped Query Attention | ✅ | 🚧 | 🚧 |
| 1.4 | RMSNorm and MLP | ✅ | 🚧 | 🚧 |
| 1.5 | Transformer Block | ✅ | 🚧 | 🚧 |
| 1.6 | Load the Model | ✅ | 🚧 | 🚧 |
| 1.7 | Generate Responses (aka Decoding) | ✅ | ✅ | 🚧 |
| 2.1 | KV Cache | ✅ | 🚧 | 🚧 |
| 2.2 | Quantized Matmul and Linear - CPU | ✅ | 🚧 | 🚧 |
| 2.3 | Quantized Matmul and Linear - GPU | ✅ | 🚧 | 🚧 |
| 2.4 | Flash Attention and Other Kernels | 🚧 | 🚧 | 🚧 |
| 2.5 | Continuous Batching | 🚧 | 🚧 | 🚧 |
| 2.6 | Speculative Decoding | 🚧 | 🚧 | 🚧 |
| 2.7 | Prompt/Prefix Cache | 🚧 | 🚧 | 🚧 |
| 3.1 | Paged Attention - Part 1 | 🚧 | 🚧 | 🚧 |
| 3.2 | Paged Attention - Part 2 | 🚧 | 🚧 | 🚧 |
| 3.3 | Prefill-Decode Separation | 🚧 | 🚧 | 🚧 |
| 3.4 | Scheduler | 🚧 | 🚧 | 🚧 |
| 3.5 | Parallelism | 🚧 | 🚧 | 🚧 |
| 3.6 | AI Agent | 🚧 | 🚧 | 🚧 |
| 3.7 | Streaming API Server | 🚧 | 🚧 | 🚧 |
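Week 2 of the roadmap centers on inference optimizations such as the KV cache, which avoids recomputing attention keys and values for already-processed tokens during decoding. A minimal sketch of the idea, using NumPy and a hypothetical `KVCache` class (not the tiny-llm codebase's actual API):

```python
import numpy as np

class KVCache:
    """Append-only cache of past keys and values for one attention head.
    Illustrative only; names and shapes are assumptions, not tiny-llm's API."""

    def __init__(self):
        self.k = None
        self.v = None

    def update(self, k_new, v_new):
        # k_new, v_new: (new_tokens, head_dim); grow along the sequence axis.
        self.k = k_new if self.k is None else np.concatenate([self.k, k_new], axis=0)
        self.v = v_new if self.v is None else np.concatenate([self.v, v_new], axis=0)
        return self.k, self.v

cache = KVCache()
cache.update(np.zeros((5, 8)), np.zeros((5, 8)))         # prefill: 5 prompt tokens
k, v = cache.update(np.zeros((1, 8)), np.zeros((1, 8)))  # decode: append 1 new token
print(k.shape)  # (6, 8)
```

Each decoding step then computes attention for only the newest query token against the full cached `k` and `v`, turning per-step attention cost from quadratic to linear in the sequence length.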

Other topics not covered: quantized/compressed KV cache.
