Absolute Zero Reasoner
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Andrew Zhao1, Yiran Wu3, Yang Yue1, Tong Wu2, Quentin Xu1, Yang Yue1, Matthieu Lin1, Shenzhi Wang1, Qingyun Wu3, Zilong Zheng2,✉️ and Gao Huang1,✉️
1 Tsinghua University 2 Beijing Institute for General Artificial Intelligence 3 Pennsylvania State University
✉️ Corresponding author
Andrew Zhao: zqc21@mails.tsinghua.edu.cn, Yiran Wu: yiran.wu@psu.edu, Zilong Zheng: zlzheng@bigai.ai, Gao Huang: gaohuang@tsinghua.edu.cn
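Andrew Zhao is currently looking for a full-time research scientist position in industry starting June 2026 (preferably in North America). Feel free to email andrewzhao112@gmail.com or zqc21@mails.tsinghua.edu.cn. His research interests are mainly RL for LLMs, reasoning, and RL agents. Yiran Wu (yiran.wu@psu.edu) is also looking for a full-time research scientist position with the same start date.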
Abstract & Motivation
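Current reasoning models trained with reinforcement learning with verifiable rewards (RLVR) often rely on manually curated datasets, which raises scalability concerns and may limit future AI growth beyond human-defined tasks. To address this, we propose the Absolute Zero paradigm, in which a single model autonomously proposes tasks optimized for its own learning and improves by solving them through self-play, entirely eliminating the need for external data. This approach uses feedback from a verifiable environment, ensuring grounded learning and preventing reward hacking.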
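Our implementation, the Absolute Zero Reasoner (AZR), applies this paradigm to code-based reasoning. AZR uses a code executor to validate self-generated tasks (covering deduction, abduction, and induction) and to verify solutions, providing reliable rewards for training. Despite using zero human-curated data, AZR achieves state-of-the-art results on a range of coding and math reasoning benchmarks, even outperforming models trained on large in-domain datasets. This demonstrates that sophisticated reasoning skills can potentially emerge purely through self-play, without domain-specific supervision.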
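The role of the code executor described above can be sketched in a few lines of Python. This is a minimal illustration, not AZR's actual implementation: it assumes each task is built around a (program, input, output) triple, that deduction means predicting the output, abduction means finding an input that yields a given output, and induction means writing a program that matches given input/output pairs. The helper names (`run_program`, `is_valid_task`, `verify_*`) and the entry-point name `f` are hypothetical, and the bare `exec` stands in for a properly sandboxed executor.

```python
from typing import Any, List, Tuple


def run_program(source: str, arg: Any, entry: str = "f") -> Any:
    """Hypothetical executor: define the program, then call entry(arg).

    A real system would run this in a sandbox with time and memory limits.
    """
    namespace: dict = {}
    exec(source, namespace)
    return namespace[entry](arg)


def is_valid_task(source: str, arg: Any) -> bool:
    """Accept a proposed (program, input) pair only if it runs without error
    and returns the same output on two runs (a simple determinism check)."""
    try:
        return run_program(source, arg) == run_program(source, arg)
    except Exception:
        return False


def verify_deduction(source: str, arg: Any, predicted_output: Any) -> bool:
    """Deduction: given program and input, the solver predicts the output."""
    return run_program(source, arg) == predicted_output


def verify_abduction(source: str, predicted_input: Any, target_output: Any) -> bool:
    """Abduction: given program and output, any input producing it is correct."""
    try:
        return run_program(source, predicted_input) == target_output
    except Exception:
        return False


def verify_induction(predicted_source: str, io_pairs: List[Tuple[Any, Any]]) -> bool:
    """Induction: the solver writes a program that must match every I/O pair."""
    try:
        return all(run_program(predicted_source, i) == o for i, o in io_pairs)
    except Exception:
        return False


# Example task: sort a list in descending order.
program = "def f(x):\n    return sorted(x)[::-1]"
reward_deduction = float(verify_deduction(program, [3, 1, 2], [3, 2, 1]))
reward_abduction = float(verify_abduction(program, [1, 2, 3], [3, 2, 1]))
candidate = "def f(x):\n    return sorted(x, reverse=True)"
reward_induction = float(verify_induction(candidate, [([3, 1, 2], [3, 2, 1]), ([5, 4], [5, 4])]))
print(reward_deduction, reward_abduction, reward_induction)
```

Because every check reduces to executing code and comparing values, the reward cannot be earned by output that merely looks plausible, which is the sense in which executor feedback is described as grounded.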
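Out-of-Distribution General Reasoning Performance
Absolute Zero Reasoner achieves state-of-the-art performance with zero data. Without relying on any gold labels or human-defined queries, the Absolute Zero Reasoner trained with our proposed self-play approach demonstrates strong out-of-distribution reasoning capabilities, even outperforming models trained on tens of thousands of expert-labeled in-domain examples.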
Programs Proposed by AZR
All AZR-proposed code samples are first embedded with jina-embeddings-v2-base-code, then projected to 2D using UMAP.
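Spinning Hexagon Vibe Check
Prompt: Write a script that shows 10 balls bouncing inside a spinning hexagon. The balls should be affected by gravity and friction, and must bounce off the rotating walls realistically.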
AZR-Coder-14b (Ours)
Code
import pygame
import math
import random
# Initialize Pygame
pygame.init()
# Set up the display
width, height = 800, 600
screen = pygame.display.set_mode((width, height))
pygame.display.set_caption("Bouncing Balls in a Spinning Hexagon")
# Define colors
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
RED = (255, 0, 0)
# Define constants
FPS = 60
GRAVITY = 0.1
FRICTION = 0.99
BALL_RADIUS = 10
NUM_BALLS = 10
# Define the hexagon
hexagon_center = (width // 2, height // 2)
hexagon_radius = 200
hexagon_vertices = [
    (hexagon_center[0] + hexagon_radius * math.cos(2 * math.pi * i / 6),
     hexagon_center[1] + hexagon_radius * math.sin(2 * math.pi * i / 6))
    for i in range(6)
]
# Define the balls
balls = []
for _ in range(NUM_BALLS):
    x = random.randint(BALL_RADIUS, width - BALL_RADIUS)
    y = random.randint(BALL_RADIUS, height - BALL_RADIUS)
    vx = random.uniform(-5, 5)
    vy = random.uniform(-5, 5)
    balls.append((x, y, vx, vy, RED))
# Game loop
running = True
clock = pygame.time.Clock()
angle = 0
while running:
    # Handle events
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
    # Clear the screen
    screen.fill(WHITE)
    # Rotate the hexagon
    angle += 0.01
    rotated_vertices = [
        (hexagon_center[0] + (v[0] - hexagon_center[0]) * math.cos(angle) - (v[1] - hexagon_center[1]) * math.sin(angle),
         hexagon_center[1] + (v[0] - hexagon_center[0]) * math.sin(angle) + (v[1] - hexagon_center[1]) * math.cos(angle))
        for v in hexagon_vertices
    ]
    # Draw the hexagon
    pygame.draw.polygon(screen, BLACK, rotated_vertices)
    # Update the positions and velocities of the balls
    for i, (x, y, vx, vy, color) in enumerate(balls):
        # Apply gravity
        vy += GRAVITY
        # Apply friction
        vx *= FRICTION
        vy *= FRICTION
        # Update position
        x += vx
        y += vy
        # Check for collisions with the hexagon walls
        for j in range(6):
            v1 = rotated_vertices[j]
            v2 = rotated_vertices[(j + 1) % 6]
            dx = v2[0] - v1[0]
            dy = v2[1] - v1[1]
            dist = math.sqrt(dx**2 + dy**2)
            nx = dy / dist
            ny = -dx / dist
            px = x - v1[0]
            py = y - v1[1]
            proj = px * nx + py * ny
            if 0 <= proj <= dist:
                if px * ny - py * nx <= BALL_RADIUS:
                    overlap = BALL_RADIUS - (px * nx + py * ny)
                    x -= overlap * nx
                    y -= overlap * ny
                    vx = -vx * 0.9
                    vy = -vy * 0.9
        # Check for collisions with the screen edges
        if x - BALL_RADIUS < 0 or x + BALL_RADIUS > width:
            vx = -vx
        if y - BALL_RADIUS < 0 or y + BALL_RADIUS > height:
            vy = -vy
        # Update the ball's position and velocity
        balls[i] = (x, y, vx, vy, color)
        # Draw the ball
        pygame.draw.circle(screen, color, (int(x), int(y)), BALL_RADIUS)
    # Update the display
    pygame.display.flip()
    clock.tick(FPS)
# Quit Pygame
pygame.quit()
Configuration
{
  "parameters": {
    "temperature": 0.6,
    "top_p": 0.8
  },
  "generation_config": {
    "do_sample": true
  }
}
GPT-4o-mini
Code
import pygame
import random
import math
# Initialize Pygame
pygame.init()
# Screen settings
WIDTH, HEIGHT = 800, 600
screen = pygame.display.set_mode((WIDTH, HEIGHT))
pygame.display.set_caption('Bouncing Balls in Spinning Hexagon')
# Colors
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
BALL_COLOR = (255, 0, 0)
HEX_COLOR = (0, 255, 0)
# Constants
GRAVITY = 0.1  # Gravity affecting the balls
FRICTION = 0.99  # Friction affecting the ball's speed when not bouncing
BALL_RADIUS = 10  # Radius of the balls
NUM_BALLS = 10  # Number of balls
# Hexagon settings
HEX_SIZE = 250  # Hexagon radius
HEX_CENTER = (WIDTH // 2, HEIGHT // 2)
# Ball class
class Ball:
    def __init__(self, x, y, radius):
        self.x = x
        self.y = y
        self.radius = radius
        self.vx = random.uniform(-2, 2)  # Initial x velocity
        self.vy = random.uniform(-2, 2)  # Initial y velocity
    def move(self):
        self.x += self.vx
        self.y += self.vy
        self.vy += GRAVITY  # Apply gravity
        # Apply friction
        self.vx *= FRICTION
        self.vy *= FRICTION
    def bounce(self, normal):
        # Reflect the ball's velocity based on the surface normal (for realistic bouncing)
        dot = self.vx * normal[0] + self.vy * normal[1]
        self.vx -= 2 * dot * normal[0]
        self.vy -= 2 * dot * normal[1]
# Function to draw the hexagon
def draw_hexagon(angle):
    points = []
    for i in range(6):
        x = HEX_CENTER[0] + HEX_SIZE * math.cos(math.radians(i * 60 + angle))
        y = HEX_CENTER[1] + HEX_SIZE * math.sin(math.radians(i * 60 + angle))
        points.append((x, y))
    pygame.draw.polygon(screen, HEX_COLOR, points, 5)
# Function to check if the ball is inside the hexagon
def check_collision_with_hexagon(ball):
    angle = pygame.time.get_ticks() / 10  # Time-dependent rotation angle
    # Hexagon vertices (rotated)
    hex_points = []
    for i in range(6):
        x = HEX_CENTER[0] + HEX_SIZE * math.cos(math.radians(i * 60 + angle))
        y = HEX_CENTER[1] + HEX_SIZE * math.sin(math.radians(i * 60 + angle))
        hex_points.append((x, y))
    for i in range(6):
        p1 = hex_points[i]
        p2 = hex_points[(i + 1) % 6]
        # Calculate the normal vector of the edge
        edge = (p2[0] - p1[0], p2[1] - p1[1])
        normal = (-edge[1], edge[0])  # Perpendicular to the edge
        # Normalize the normal
        length = math.sqrt(normal[0]**2 + normal[1]**2)
        normal = (normal[0] / length, normal[1] / length)
        # Check if the ball is outside the hexagon (distance from the center to the edge)
        ball_to_edge = (ball.x - p1[0], ball.y - p1[1])
        dist = ball_to_edge[0] * normal[0] + ball_to_edge[1] * normal[1]
        if dist < ball.radius:
            ball.bounce(normal)
            break
# Create a list of balls
balls = [Ball(random.randint(50, WIDTH-50), random.randint(50, HEIGHT-50), BALL_RADIUS) for _ in range(NUM_BALLS)]
# Main game loop
running = True
clock = pygame.time.Clock()
while running:
    screen.fill(BLACK)  # Clear screen
    # Handle events
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
    # Draw the spinning hexagon
    angle = pygame.time.get_ticks() / 10  # Get a time-based rotation angle
    draw_hexagon(angle)
    # Update and draw the balls
    for ball in balls:
        ball.move()
        check_collision_with_hexagon(ball)
        # Draw ball
        pygame.draw.circle(screen, BALL_COLOR, (int(ball.x), int(ball.y)), ball.radius)
    # Refresh the display
    pygame.display.flip()
    # Cap the frame rate
    clock.tick(60)
# Quit Pygame
pygame.quit()
Configuration
{
  "query_date": "2025-05-03",
  "api_location": "https://chatgpt.com/"
}
Qwen2.5-72B-Instruct
Code
import pygame
import math
import random
# Initialize Pygame
pygame.init()
# Screen dimensions
WIDTH, HEIGHT = 800, 600
screen = pygame.display.set_mode((WIDTH, HEIGHT))
pygame.display.set_caption("Balls Bouncing in a Spinning Hexagon")
# Colors
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
RED = (255, 0, 0)
# Hexagon properties
hexagon_center = (WIDTH // 2, HEIGHT // 2)
hexagon_radius = 200
hexagon_sides = 6
hexagon_angle = 0  # Initial angle of the hexagon
hexagon_rotation_speed = 0.01  # Rotation speed in radians per frame
# Ball properties
num_balls = 10
balls = []
ball_radius = 10
gravity = 0.5
friction = 0.99
# Function to create a ball
def create_ball():
    x = random.randint(ball_radius, WIDTH - ball_radius)
    y = random.randint(ball_radius, HEIGHT - ball_radius)
    vx = random.uniform(-3, 3)
    vy = random.uniform(-3, 3)
    return [x, y, vx, vy]
# Create balls
for _ in range(num_balls):
    balls.append(create_ball())
# Function to draw the hexagon
def draw_hexagon(angle):
    points = []
    for i in range(hexagon_sides):
        angle_i = angle + i * (2 * math.pi / hexagon_sides)
        x = hexagon_center[0] + hexagon_radius * math.cos(angle_i)
        y = hexagon_center[1] + hexagon_radius * math.sin(angle_i)
        points.append((x, y))
    pygame.draw.polygon(screen, RED, points, 3)
# Function to check collision with the hexagon
def check_collision(ball, angle):
    x, y, vx, vy = ball
    for i in range(hexagon_sides):
        angle_i = angle + i * (2 * math.pi / hexagon_sides)
        angle_next = angle + (i + 1) * (2 * math.pi / hexagon_sides)
        x1 = hexagon_center[0] + hexagon_radius * math.cos(angle_i)
        y1 = hexagon_center[1] + hexagon_radius * math.sin(angle_i)
        x2 = hexagon_center[0] + hexagon_radius * math.cos(angle_next)
        y2 = hexagon_center[1] + hexagon_radius * math.sin(angle_next)
        # Check collision with the line segment (x1, y1) to (x2, y2)
        dx = x2 - x1
        dy = y2 - y1
        dot = dx * (x - x1) + dy * (y - y1)
        len_sq = dx * dx + dy * dy
        param = -1
        if len_sq != 0:  # in case of a zero length line
            param = dot / len_sq
        if param < 0:
            nx = x1
            ny = y1
        elif param > 1:
            nx = x2
            ny = y2
        else:
            nx = x1 + param * dx
            ny = y1 + param * dy
        dx = x - nx
        dy = y - ny
        dist = math.sqrt(dx * dx + dy * dy)
        if dist < ball_radius:
            # Calculate the normal and tangent vectors
            nx = (x - nx) / dist
            ny = (y - ny) / dist
            tx = -ny
            ty = nx
            # Reflect the velocity
            dot = vx * nx + vy * ny
            vx = vx - 2 * dot * nx
            vy = vy - 2 * dot * ny
            # Move the ball away from the wall
            x += nx * (ball_radius - dist)
            y += ny * (ball_radius - dist)
    ball[0] = x
    ball[1] = y
    ball[2] = vx
    ball[3] = vy
# Main loop
running = True
clock = pygame.time.Clock()
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
    # Update ball positions and velocities
    for ball in balls:
        x, y, vx, vy = ball
        vy += gravity
        x += vx
        y += vy
        vx *= friction
        vy *= friction
        ball[0] = x
        ball[1] = y
        ball[2] = vx
        ball[3] = vy
        # Check collision with the hexagon
        check_collision(ball, hexagon_angle)
    # Update hexagon angle
    hexagon_angle += hexagon_rotation_speed
    # Draw everything
    screen.fill(BLACK)
    draw_hexagon(hexagon_angle)
    for ball in balls:
        pygame.draw.circle(screen, WHITE, (int(ball[0]), int(ball[1])), ball_radius)
    pygame.display.flip()
    clock.tick(60)
pygame.quit()
Configuration
{
  "api_location": "https://huggingface.co/Qwen/Qwen2.5-Coder-72B-Instruct"
}
Qwen2.5-32B-Instruct
Code
import pygame
import math
import random
# Initialize Pygame
pygame.init()
# Set up the display
screen_width, screen_height = 800, 600
win = pygame.display.set_mode((screen_width, screen_height))
pygame.display.set_caption("Bouncing Balls Inside Spinning Hexagon")
# Colors
WHITE = (255, 255, 255)
RED = (255, 0, 0)
# Constants for the simulation
FPS = 60
GRAVITY = 0.1
BALL_RADIUS = 10
FRIC_COEFFICIENT = 0.98  # To reduce velocity upon collision.
MAX_ANGLE_DEGREES = 360
DEG_TO_RAD = math.pi / 180  # Conversion factor between degrees and radians.
class Ball:
    def __init__(self):
        self.pos = pygame.Vector2(random.randint(BALL_RADIUS, screen_width-BALL_RADIUS),
                                  random.randint(BALL_RADIUS, screen_height-BALL_RADIUS))
        self.vel = pygame.Vector2(random.uniform(-3, 3), random.uniform(-3, 3))
    def update(self):
        self.vel.y += GRAVITY
        self.pos += self.vel
        # Check for collisions with borders
        if self.pos.x - BALL_RADIUS <= 0 or self.pos.x + BALL_RADIUS >= screen_width:
            self.vel.x *= -FRIC_COEFFICIENT
            if self.pos.x - BALL_RADIUS <= 0:
                self.pos.x = BALL_RADIUS
            elif self.pos.x + BALL_RADIUS >= screen_width:
                self.pos.x = screen_width - BALL_RADIUS
        if self.pos.y - BALL_RADIUS <= 0 or self.pos.y + BALL_RADIUS >= screen_height:
            self.vel.y *= -FRIC_COEFFICIENT
            if self.pos.y - BALL_RADIUS <= 0:
                self.pos.y = BALL_RADIUS
            elif self.pos.y + BALL_RADIUS >= screen_height:
                self.pos.y = screen_height - BALL_RADIUS
    def draw(self, win):
        pygame.draw.circle(win, RED, (int(self.pos.x), int(self.pos.y)), BALL_RADIUS)
def get_hexagon_points(center, radius, angle_degrees):
    """Generate points of a hexagon."""
    points = []
    for i in range(6):
        theta = (math.pi * 2) / 6 * i + angle_degrees * DEG_TO_RAD
        x = center.x + radius * math.cos(theta)
        y = center.y + radius * math.sin(theta)
        points.append(pygame.Vector2(x, y))
    return points
def check_bounce(ball, poly_points):
    """Check if the ball has hit any of the edges and bounce off appropriately."""
    p1, p2, p3, p4, p5, p6 = poly_points
    for start_point, end_point in [(p1, p2), (p2, p3), (p3, p4), (p4, p5), (p5, p6), (p6, p1)]:
        line_vec = end_point - start_point
        normal_vec = pygame.Vector2(-line_vec.y, line_vec.x).normalize()  # Normal vector to edge
        # Line equation parameters: Ax + By + C = 0
        A = normal_vec.y
        B = -normal_vec.x
        C = -(A * start_point.x + B * start_point.y)
        # Point-line-distance formula
        distance = A * ball.pos.x + B * ball.pos.y + C
        distance /= math.sqrt(A**2 + B**2)
        if abs(distance) < BALL_RADIUS:
            # Ball is hitting this edge.
            dist_vec = ball.pos - (start_point + end_point) / 2
            dot_product = dist_vec.dot(normal_vec)
            reflection_vec = ball.vel.reflect(normal_vec)
            if dot_product < 0 and reflection_vec.x * A + reflection_vec.y * B > 0:
                # Reflect and apply some slowdown using friction coefficient.
                ball.vel = reflection_vec * FRIC_COEFFICIENT
def redraw_window(win, balls, poly_points):
    win.fill(WHITE)
    pygame.draw.polygon(win, (0, 0, 255), [(p.x, p.y) for p in poly_points])
    for ball in balls:
        ball.draw(win)
    pygame.display.update()
def main():
    run = True
    clock = pygame.time.Clock()
    balls = [Ball() for _ in range(10)]
    hexagon_center = pygame.Vector2(screen_width // 2, screen_height // 2)
    hexagon_radius = screen_height // 4
    angle_degrees = 0
    while run:
        dt = clock.tick(FPS) / 1000  # time elapsed since last frame in seconds
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                run = False
        # Spin the hexagon by increasing the angle
        angle_degrees = (angle_degrees + 1 * dt * MAX_ANGLE_DEGREES) % MAX_ANGLE_DEGREES
        poly_points = get_hexagon_points(hexagon_center, hexagon_radius, angle_degrees)
        # Update each ball and check for collision with hexagon walls.
        for ball in balls:
            ball.update()
            check_bounce(ball, poly_points)
        redraw_window(win, balls, poly_points)
    pygame.quit()
if __name__ == "__main__":
    main()
Configuration
{
  "api_location": "https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct"
}
Qwen2.5-14B-Instruct
Code
import pygame
import sys
import math
import random
# Initialize Pygame
pygame.init()
# Screen dimensions
SCREEN_WIDTH = 800
SCREEN_HEIGHT = 600
screen = pygame.display.set_mode((SCREEN_WIDTH, SCREEN_HEIGHT))
pygame.display.set_caption("Bouncing Balls in Spinning Hexagon")
# Colors
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
RED = (255, 0, 0)
# Ball properties
BALL_RADIUS = 10
BALL_COUNT = 10
balls = []
# Hexagon properties
HEX_SIDES = 6
HEX_RADIUS = 200
HEX_CENTER = (SCREEN_WIDTH // 2, SCREEN_HEIGHT // 2)
HEX_SPIN_SPEED = 0.01
# Gravity and friction
GRAVITY = 0.5
FRICTION = 0.99
def create_balls():
    for _ in range(BALL_COUNT):
        x = SCREEN_WIDTH // 2 + (random.randint(-100, 100))
        y = SCREEN_HEIGHT // 2 + (random.randint(-100, 100))
        vx = random.uniform(-3, 3)
        vy = random.uniform(-3, 3)
        balls.append([x, y, vx, vy])
def draw_hexagon(angle):
    points = []
    for i in range(HEX_SIDES):
        angle_rad = math.radians(i * 60 - angle)
        x = HEX_CENTER[0] + HEX_RADIUS * math.cos(angle_rad)
        y = HEX_CENTER[1] + HEX_RADIUS * math.sin(angle_rad)
        points.append((int(x), int(y)))
    return points
def check_collision(ball, points):
    px, py = ball[:2]
    for i in range(len(points)):
        j = (i + 1) % len(points)
        if is_point_in_triangle(px, py, points[i], points[j], HEX_CENTER):
            return True
    return False
def is_point_in_triangle(px, py, p1, p2, p3):
    def sign(p1, p2, p3):
        return (p1[0] - p3[0]) * (p2[1] - p3[1]) - (p2[0] - p3[0]) * (p1[1] - p3[1])
    b1 = sign((px, py), p1, p2) < 0
    b2 = sign((px, py), p2, p3) < 0
    b3 = sign((px, py), p3, p1) < 0
retu