Block Diffusion: Interpolating Between Autoregressive and Diffusion Models
arXiv:2503.09573 (cs) [Submitted on 12 Mar 2025]
Title: Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Authors: Marianne Arriola, Aaron Gokaslan, Justin T Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo, Volodymyr Kuleshov
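Abstract: Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. Block diffusion overcomes key limitations of both approaches by supporting flexible-length generation and improving inference efficiency with KV caching and parallel token sampling. We propose a recipe for building effective block diffusion models that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules to minimize the variance. Block diffusion sets a new state-of-the-art performance among diffusion models on language modeling benchmarks and enables generation of arbitrary-length sequences. We provide the code, along with the model weights and blog post, on the project page: this https URL

Comments: | ICLR 2025 Oral. We provide the code at this https URL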
---|---
Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: | arXiv:2503.09573 [cs.LG]
 | (or arXiv:2503.09573v1 [cs.LG] for this version)
 | https://doi.org/10.48550/arXiv.2503.09573 (arXiv-issued DOI via DataCite, pending registration)
Submission history
From: Marianne Arriola
[v1] Wed, 12 Mar 2025 17:43:40 UTC (294 KB)
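
The abstract describes generation that is autoregressive across blocks and diffusion-like within each block, with tokens inside a block sampled in parallel. The toy sketch below illustrates only that control flow; it is not the authors' implementation. The names `MASK`, `toy_denoiser`, `sample_block`, and `generate` are hypothetical, the "denoiser" is a random stand-in for a trained network, and the even unmasking schedule is an assumption made for simplicity.

```python
import random

MASK = -1  # hypothetical id for a masked (fully noised) token


def toy_denoiser(context, block, vocab_size, rng):
    """Stand-in for a trained model: propose a token for every masked position.

    A real model would condition on `context` (the previously generated
    blocks, which is where KV caching would apply); this toy ignores it.
    """
    return [rng.randrange(vocab_size) if tok == MASK else tok for tok in block]


def sample_block(context, block_size, num_steps, vocab_size, rng):
    """Denoise one block: start fully masked, reveal tokens over several steps."""
    block = [MASK] * block_size
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        if not masked:
            break
        # One call proposes tokens for all masked positions at once.
        proposal = toy_denoiser(context, block, vocab_size, rng)
        # Assumed schedule: reveal an even share of the remaining masked
        # positions per step, so the final step reveals everything left.
        remaining_steps = num_steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division
        for i in rng.sample(masked, k):
            block[i] = proposal[i]
    return block


def generate(num_blocks, block_size, num_steps, vocab_size, seed=0):
    """Generate blocks left to right; each block sees all earlier blocks."""
    rng = random.Random(seed)
    sequence = []
    for _ in range(num_blocks):
        sequence.extend(
            sample_block(sequence, block_size, num_steps, vocab_size, rng)
        )
    return sequence


if __name__ == "__main__":
    # The output length is set by the number of blocks, not fixed in advance.
    print(generate(num_blocks=3, block_size=4, num_steps=2, vocab_size=10))
```

In this sketch, changing `num_blocks` changes the sequence length without touching the per-block procedure, which mirrors the flexible-length property the abstract claims. Setting `block_size=1` reduces the loop to token-by-token generation, while a single block covering the whole sequence corresponds to the fixed-length diffusion setting.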