Block Diffusion: Interpolating Between Autoregressive and Diffusion Models

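This work introduces Block Diffusion, a class of language models that interpolates between autoregressive and diffusion models. Block Diffusion keeps the parallel generation and controllability of diffusion models while overcoming their weaker likelihood modeling and their restriction to fixed-length generation. Support for flexible-length generation, KV caching, and parallel token sampling improves inference efficiency. The authors propose an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules. The resulting models achieve state-of-the-art performance among diffusion models on language modeling benchmarks and can generate sequences of arbitrary length.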
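The sampling loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes masked (absorbing-state) diffusion inside each block, and `MASK`, `toy_denoiser`, `sample_block`, `generate`, and `eos_id` are hypothetical names introduced here. The random denoiser is only a stand-in for a trained network, and no actual KV cache is implemented.

```python
import random

MASK = -1  # hypothetical id for a masked (fully noised) position


def toy_denoiser(context, block, rng, vocab_size=50):
    """Stand-in for a trained network: proposes a token id for every position.

    A real model would attend to `context` (earlier blocks, served from a KV
    cache) and to the partially revealed `block`; this toy ignores both.
    """
    return [rng.randrange(vocab_size) for _ in block]


def sample_block(context, block_size, num_steps, denoiser, rng):
    """Denoise one block: start fully masked, reveal several positions per step.

    Assumes num_steps >= 1.
    """
    block = [MASK] * block_size
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        if not masked:
            break
        proposals = denoiser(context, block, rng)
        # Spread the remaining masked positions evenly over the remaining steps.
        steps_left = num_steps - step
        k = -(-len(masked) // steps_left)  # ceiling division
        for i in rng.sample(masked, k):  # these k tokens are set in parallel
            block[i] = proposals[i]
    return block


def generate(max_blocks, block_size, num_steps, denoiser, rng, eos_id=None):
    """Autoregressive over blocks, diffusion-style unmasking inside each block."""
    tokens = []
    for _ in range(max_blocks):
        block = sample_block(tokens, block_size, num_steps, denoiser, rng)
        if eos_id is not None and eos_id in block:
            tokens.extend(block[: block.index(eos_id) + 1])
            break  # length is decided during sampling, not fixed in advance
        tokens.extend(block)
    return tokens


if __name__ == "__main__":
    print(generate(max_blocks=3, block_size=4, num_steps=2,
                   denoiser=toy_denoiser, rng=random.Random(0)))
```

In this sketch, setting `block_size=1` with `num_steps=1` reduces to token-by-token autoregressive sampling, while a single block spanning the whole sequence corresponds to fixed-length diffusion. Because finished blocks never change, a real model could cache their keys and values instead of recomputing them for each new block.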

arXiv:2503.09573 (cs) [Submitted on 12 Mar 2025]

Title: Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Authors: Marianne Arriola, Aaron Gokaslan, Justin T Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo, Volodymyr Kuleshov

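Abstract: Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. Block diffusion overcomes key limitations of both approaches by supporting flexible-length generation and improving inference efficiency with KV caching and parallel token sampling. We propose a recipe for building effective block diffusion models that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules to minimize the variance. Block diffusion sets a new state-of-the-art performance among diffusion models on language modeling benchmarks and enables generation of arbitrary-length sequences. We provide the code, along with the model weights and blog post, on the project page: this https URL

Comments: | ICLR 2025 Oral. We provide the code at this https URL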
---|---
Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: | arXiv:2503.09573 [cs.LG]
(or arXiv:2503.09573v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2503.09573 (arXiv-issued DOI via DataCite, pending registration)

Submission history

From: Marianne Arriola
[v1] Wed, 12 Mar 2025 17:43:40 UTC (294 KB)
