Dirty tricks 6502 programmers use (2019)

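This post shares assembly tricks used in a C64 coding competition whose goal was to draw two lines in as few bytes as possible. It covers the basics, including the layout of screen RAM and color RAM. The core tricks are: scrolling to avoid 16-bit address arithmetic, self-modifying code, exploiting the power-on state of registers and zero page, and a smaller startup sequence. Together these let programmers squeeze out every byte and shrink the code to a minimum.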

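Dirty tricks 6502 programmers use

Janne Hellsten, August 18, 2019

This post recaps some of the C64 coding tricks I used in a small C64 coding competition. The compo rules were simple: make a C64 executable (PRG) that draws two lines forming an X across the screen (rendered as ASCII art further down). The goal was to do this in as few bytes as possible. Entries were posted as Twitter replies and DMs, containing only the PRG byte length and an MD5 hash of the PRG file.

Here is the list of participants, with links to the source code of their submissions: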

Philip Heron (code - 34 bytes - compo winner)
Geir Straume (code - 34 bytes)
Petri Häkkinen (code - 37 bytes)
Mathlev Raxenblatz (code - 38 bytes)
Jan Achrenius (code - 48 bytes)
Jamie Fuller (code - 50 bytes)
David A. Gershman (code - 53 bytes)
Janne Hellsten (code - 56 bytes)

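(If I missed someone, please let me know and I'll update the post.)

The rest of this post focuses on some of the assembly coding tricks used in the compo submissions.

Basics

The C64 default graphics mode is the 40x25 charset mode. The framebuffer is split into two arrays in RAM:

$0400 (Screen RAM, 40x25 bytes)
$d800 (Color RAM, 40x25 bytes)

To set a character, you store a byte into screen RAM at $0400 (e.g., $0400+y*40+x). Color RAM is initialized to light blue (color 14) by default, which happens to be the color we use for the lines, so we can leave color RAM untouched.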
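As a minimal sketch of the address arithmetic above, the offset `y*40+x` can be wrapped in two small helpers. The function and constant names here (`screen_addr`, `color_addr`, `SCREEN_RAM`, and so on) are made up for illustration and do not come from the original post.

```c
#include <stdint.h>

/* Base addresses and dimensions of the default 40x25 text screen. */
enum { SCREEN_RAM = 0x0400, COLOR_RAM = 0xd800, COLS = 40, ROWS = 25 };

/* Address of the screen-RAM byte for column x (0..39), row y (0..24). */
uint16_t screen_addr(int x, int y) {
    return (uint16_t)(SCREEN_RAM + y * COLS + x);
}

/* Matching color-RAM address: same offset, different base. */
uint16_t color_addr(int x, int y) {
    return (uint16_t)(COLOR_RAM + y * COLS + x);
}
```

For instance, `screen_addr(0, 24)` gives `$07C0`, the start of the last screen row, an address that shows up again in the later tricks.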

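You can control the border and background colors with the memory-mapped I/O registers at $d020 (border) and $d021 (background).

Drawing the two lines is pretty easy because we can hardcode for the fixed line slope. Here's a C implementation that draws the lines and dumps the screen contents to stdout (register writes are stubbed out, and screen RAM is malloc()'d to make it run on a PC):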

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Print the 40x25 screen: '#' for a filled block ($a0), '.' for anything else.
void dump(const uint8_t* screen) {
  const uint8_t* s = screen;
  for (int y = 0; y < 25; y++) {
    for (int x = 0; x < 40; x++, s++) {
      printf("%c", *s == 0xa0 ? '#' : '.');
    }
    printf("\n");
  }
}

// Stub: on a real C64 this would write a memory-mapped register.
void setreg(uintptr_t dst, uint8_t v) {
  (void)dst;
  (void)v;
// *((uint8_t *)dst) = v;
}

int main() {
// uint8_t* screenRAM = (uint8_t*)0x0400;
  uint8_t* screenRAM = (uint8_t *)malloc(40*25);
  if (!screenRAM) return 1;
  memset(screenRAM, 0x20, 40*25); // Fill the screen with blanks ($20)

  setreg(0xd020, 0); // Set border color
  setreg(0xd021, 0); // Set background color

  int yslope = (25<<8)/40; // 8.8 fixed-point slope (160)
  int yf = yslope/2;       // Start half a step in
  for (int x = 0; x < 40; x++) {
    int yi = yf >> 8;
    // First line
    screenRAM[x + yi*40] = 0xa0;
    // Second line (X-mirrored)
    screenRAM[(39-x) + yi*40] = 0xa0;
    yf += yslope;
  }

  dump(screenRAM);
  free(screenRAM);
  return 0;
}

The screen codes used above are $20 (blank) and $a0 (8x8 filled block). If you run it, you should see ASCII art for the two lines:

##....................................##
..#..................................#..
...##..............................##...
.....#............................#.....
......##........................##......
........##....................##........
..........#..................#..........
...........##..............##...........
.............#............#.............
..............##........##..............
................##....##................
..................#..#..................
...................##...................
..................#..#..................
................##....##................
..............##........##..............
.............#............#.............
...........##..............##...........
..........#..................#..........
........##....................##........
......##........................##......
.....#............................#.....
...##..............................##...
..#..................................#..
##....................................##

With 6502 assembly and assembler pseudo-ops, the same thing is easy to implement in assembly:

!include "c64.asm"

+c64::basic_start(entry)

entry: {
  lda #0   ; black color
  sta $d020  ; set border to 0
  sta $d021  ; set background to 0

  ; clear the screen
  ldx #0
  lda #$20
clrscr:
!for i in [0, $100, $200, $300] {
  sta $0400 + i, x
}
  inx
  bne clrscr

  ; line drawing, completely unrolled
  ; with assembly pseudos
  lda #$a0

  !for i in range(40) {
    !let y0 = Math.floor(25/40*(i+0.5))
    sta $0400 + y0*40 + i
    sta $0400 + (24-y0)*40 + i
  }
inf: jmp inf ; halt
}

This completely unrolls the line drawing part, resulting in a rather large 286-byte PRG.
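The 286-byte figure can be reproduced by adding up instruction sizes. The sketch below does that under one loudly stated assumption: that the PRG starts with a 2-byte load address followed by a 12-byte BASIC start stub. The instruction sizes themselves are standard 6502 encodings; the function name `unrolled_prg_size` is invented for this example.

```c
/* Byte sizes of the 6502 addressing modes used by the unrolled program. */
enum { IMM = 2, ABS = 3, ABS_X = 3, IMPLIED = 1, RELATIVE = 2 };

/* Assumption: 2-byte load address + 12-byte BASIC start stub. */
enum { PRG_HEADER = 2 + 12 };

/* Sum the size of every instruction the unrolled listing emits. */
int unrolled_prg_size(void) {
    int n = PRG_HEADER;
    n += IMM + ABS + ABS;                /* lda #0, sta $d020, sta $d021 */
    n += IMM + IMM;                      /* ldx #0, lda #$20 */
    n += 4 * ABS_X + IMPLIED + RELATIVE; /* 4x sta abs,x, inx, bne */
    n += IMM;                            /* lda #$a0 */
    n += 40 * 2 * ABS;                   /* two sta per column, 40 columns */
    n += ABS;                            /* jmp inf */
    return n;
}
```

Of the total, 240 bytes are the 80 unrolled `sta` instructions, which is why the later variants concentrate on the drawing loop.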

Before diving into optimized variants, let's make a couple of observations. First, we're running on a C64 with the ROM routines banked in. There are plenty of subroutines in ROM that may be useful for our little program. For example, you can clear the screen with JSR $E544. Second, address calculations on an 8-bit CPU like the 6502 can be cumbersome and cost a lot of bytes. This CPU also has no multiplier, so computing something like y*40+i usually involves a bunch of shifts and adds or a lookup table, which again costs bytes. To avoid multiplying by 40, we can instead advance the screen pointer incrementally:

  int yslope = (25<<8)/40;
  int yf = yslope/2;
  uint8_t* dst = screenRAM;
  for (int x = 0; x < 40; x++) {
    dst[x] = 0xa0;
    dst[(39-x)] = 0xa0;
    yf += yslope;
    if (yf & 256) { // Carry set?
      dst += 40;
      yf &= 255;
    }
  }

We keep adding the line slope to a fixed-point counter yf, and whenever the 8-bit addition sets the carry flag, we add 40 to the screen pointer.
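To see that the carry-driven stepping really visits the same rows as the shift-based version, here is a small sketch that computes the row of each column both ways. The helper names `row_direct` and `rows_incremental` are illustrative only; the constants 160 and 80 are the slope and its half from the C code above.

```c
#include <stdint.h>

/* Row of column x computed directly: 8.8 fixed point, then shift. */
int row_direct(int x) {
    int yslope = (25 << 8) / 40;      /* 160 */
    int yf = yslope / 2 + x * yslope; /* start half a step in */
    return yf >> 8;
}

/* Fill rows[0..39] by stepping an 8-bit accumulator; each carry
   out of the 8-bit addition moves down one row. */
void rows_incremental(int rows[40]) {
    uint8_t yf = 80;
    int row = 0;
    for (int x = 0; x < 40; x++) {
        rows[x] = row;
        unsigned sum = yf + 160u; /* 9-bit result, bit 8 is the carry */
        yf = (uint8_t)sum;
        if (sum > 255) row++;
    }
}
```

Both versions agree on all 40 columns, so the multiply by 40 can be replaced with "add 40 on carry" without changing the picture.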

Here's the incremental approach implemented in assembly:

!include "c64.asm"

+c64::basic_start(entry)

!let screenptr = $20
!let x0 = $40
!let x1 = $41
!let yf = $60

entry: {
    lda #0
    sta x0
    sta $d020
    sta $d021

    ; kernal clear screen
    jsr $e544

    ; set screenptr = $0400
    lda #<$0400
    sta screenptr+0
    lda #>$0400
    sta screenptr+1

    lda #80
    sta yf

    lda #39
    sta x1
xloop:
    lda #$a0
    ldy x0
    ; screenRAM[x] = 0xA0
    sta (screenptr), y
    ldy x1
    ; screenRAM[39-x] = 0xA0
    sta (screenptr), y

    clc
    lda #160 ; line slope
    adc yf
    sta yf
    bcc no_add

    ; advance screen ptr by 40
    clc
    lda screenptr
    adc #40
    sta screenptr
    lda screenptr+1
    adc #0
    sta screenptr+1

no_add:
    inc x0
    dec x1
    bpl xloop

inf:  jmp inf
}

At 82 bytes, this is still pretty hefty. A couple of obvious size problems come from the 16-bit address computations. Setting up the screenptr value for indirect-indexed addressing:

    ; set screenptr = $0400
    lda #<$0400
    sta screenptr+0
    lda #>$0400
    sta screenptr+1

Advancing screenptr to the next row by adding 40:

    ; advance screen ptr by 40
    clc
    lda screenptr
    adc #40
    sta screenptr
    lda screenptr+1
    adc #0
    sta screenptr+1

Sure, this code could probably be made smaller, but what if we didn't need to manipulate 16-bit addresses in the first place? Let's see how this can be avoided.

Trick 1: Scrolling!

Instead of plotting the line across screen RAM, we draw only on the last screen row (Y=24) and scroll the whole screen up by calling the ROM "scroll up" function with JSR $E8EA!

The x loop becomes:

    lda #0
    sta x0
    lda #39
    sta x1
xloop:
    lda #$a0
    ldx x0
    ; hardcoded absolute address to last screen line
    sta $0400 + 24*40, x
    ldx x1
    sta $0400 + 24*40, x

    ; A still holds $a0 (= 160), which doubles as the line slope
    adc yf
    sta yf
    bcc no_scroll
    ; scroll screen up!
    jsr $e8ea
no_scroll:
    inc x0
    dec x1
    bpl xloop

[Animation: the line renderer progressing with this trick]

This trick was one of my favorites in this compo. It was also independently discovered by pretty much all the participants.
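The idea can be checked on a PC with a small model of the screen. This is only a sketch under stated assumptions: `scroll_up` is a stand-in for the ROM routine that moves every row up by one and blanks the last row, and the sketch deliberately skips the scroll after the final column so the result can be compared with the direct renderer. That last choice belongs to this sketch, not to the assembly loop above. All names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

enum { COLS = 40, ROWS = 25, BLANK = 0x20, BLOCK = 0xa0 };

/* Model of "scroll up": rows 1..24 move up one row, last row is blanked. */
static void scroll_up(uint8_t *screen) {
    memmove(screen, screen + COLS, (ROWS - 1) * COLS);
    memset(screen + (ROWS - 1) * COLS, BLANK, COLS);
}

/* Draw both lines by plotting only on the last row and scrolling. */
void draw_by_scrolling(uint8_t *screen) {
    uint8_t *last = screen + (ROWS - 1) * COLS;
    uint8_t yf = 80;
    memset(screen, BLANK, ROWS * COLS);
    for (int x = 0; x < COLS; x++) {
        last[x] = BLOCK;
        last[COLS - 1 - x] = BLOCK;
        unsigned sum = yf + 160u; /* bit 8 of the sum is the carry */
        yf = (uint8_t)sum;
        /* Sketch-only choice: no scroll after the final column. */
        if (sum > 255 && x != COLS - 1) scroll_up(screen);
    }
}

/* Reference: the direct renderer from the first C listing. */
void draw_direct(uint8_t *screen) {
    int yf = 80;
    memset(screen, BLANK, ROWS * COLS);
    for (int x = 0; x < COLS; x++) {
        int yi = yf >> 8;
        screen[x + yi * COLS] = BLOCK;
        screen[(COLS - 1 - x) + yi * COLS] = BLOCK;
        yf += 160;
    }
}
```

Each column is plotted on row 24 and then pushed up once per later carry, so it ends on the same row the direct renderer would have chosen, with no 16-bit pointer in sight.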

Trick 2: Self-modifying code

The code to store the pixel values ends up being roughly:

    ldx x1
    ; hardcoded absolute address to last screen line
    sta $0400 + 24*40, x
    ldx x0
    sta $0400 + 24*40, x
    inc x0
    dec x1

This encodes to the following 14-byte sequence:

0803: A6 22        LDX $22
0805: 9D C0 07      STA $07C0,X
0808: A6 20        LDX $20
080A: 9D C0 07      STA $07C0,X
080D: E6 22        INC $22
080F: C6 20        DEC $20

There's a more compact way to write this using self-modifying code (SMC):

    ldx x1
    sta $0400 + 24*40, x
addr0: sta $0400 + 24*40
    ; advance the second x-coord with SMC
    inc addr0+1
    dec x1

...which encodes to 13 bytes:

0803: A6 22        LDX $22
0805: 9D C0 07      STA $07C0,X
0808: 8D C0 07      STA $07C0
080B: EE 09 08      INC $0809
080E: C6 22        DEC $22
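To make the mechanism concrete, here is a minimal model of what `INC $0809` does to the `STA $07C0` instruction stored at `$0808`. The memory array and helper names are hypothetical; only the addresses and opcode bytes come from the listing above.

```c
#include <stdint.h>

/* Hypothetical 64 KiB memory image holding the instruction bytes. */
static uint8_t mem[0x10000];

/* Place "STA $07C0" (opcode $8D, little-endian operand) at $0808. */
void load_sta(void) {
    mem[0x0808] = 0x8d;
    mem[0x0809] = 0xc0;
    mem[0x080a] = 0x07;
}

/* Target address currently encoded in the STA instruction. */
uint16_t sta_target(void) {
    return (uint16_t)(mem[0x0809] | (mem[0x080a] << 8));
}

/* Effect of "INC $0809": bump the operand's low byte (8-bit wrap). */
void inc_operand_low(void) {
    mem[0x0809]++;
}
```

The low byte starts at `$C0` and is incremented at most 39 times, reaching `$E7`, so it never wraps and the high byte `$07` can stay fixed. The instruction's own operand takes over the role of the `x0` counter.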

Trick 3: Exploiting the power-on state

In this compo it was deemed OK to make wild assumptions about the running environment: the line-drawing PRG is the first thing run after C64 power-on, and there was no requirement to exit cleanly back to the BASIC prompt. So anything you find in the initial environment upon entry to your PRG, you can and should use to your advantage. Here are some of the things that were considered "constant" upon entry to the PRG:

A, X, Y registers were assumed to be all zeros
All CPU flags cleared
Zeropage (addresses $00-$ff) contents

Similarly, if you call any KERNAL ROM routines, you can take full advantage of any side effects they have: returned CPU flags, temporary values written to zeropage, and so on. After the first few size-optimization passes, everyone turned to a machine monitor view of memory to look for interesting values. The zeropage does indeed contain some values that are useful for our purposes:

$d5: 39/$27 == line length - 1
$22: 64/$40 == initial value for line slope counter

You can use these to shave off a few bytes at init time. For example:

!let x0 = $20
    lda #39    ; 0801: A9 27  LDA #$27
    sta x0     ; 0803: 85 20  STA $20
xloop:
    dec x0     ; 0805: C6 20  DEC $20
    bpl xloop  ; 0807: 10 FC  BPL $0805

As $d5 contains the value 39, you can map your x0 counter to point to $d5 and skip the LDA/STA pair:

!let x0 = $d5
    ; nothing here!
xloop:
    dec x0     ; 0801: C6 D5  DEC $D5
    bpl xloop  ; 0803: 10 FC  BPL $0801

Philip's winning entry takes this to the extreme. Recall the address of the last char row, $07C0 (== $0400+24*40). This value does not exist in zeropage on init. However, as a side effect of how the ROM "scroll up" subroutine uses zeropage temporaries, addresses $D1-$D2 will contain $07C0 on return from this function. So instead of STA $07C0,x to store a pixel, you can use the one byte shorter indirect-indexed addressing mode store STA ($D1),y.
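For readers less familiar with indirect-indexed addressing, the following sketch shows how the CPU would form the store address for `STA ($D1),y`, assuming the zeropage bytes at `$D1` and `$D2` hold `$C0` and `$07` as described above. The function name and the zeropage array are illustrative, not part of any entry.

```c
#include <stdint.h>

/* Effective address of "STA (zp),Y": read a 16-bit little-endian
   pointer from zeropage at zp and zp+1, then add the Y register. */
uint16_t indirect_indexed(const uint8_t zeropage[256], uint8_t zp, uint8_t y) {
    uint8_t lo = zeropage[zp];
    uint8_t hi = zeropage[(uint8_t)(zp + 1)]; /* pointer fetch wraps in zeropage */
    uint16_t base = (uint16_t)(lo | (hi << 8));
    return (uint16_t)(base + y);
}
```

The saving comes from the encoding: `STA ($D1),y` is an opcode plus a 1-byte zeropage operand (2 bytes), while `STA $07C0,x` carries a full 2-byte address (3 bytes).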

Trick 4: Smaller startup

A typical C64 PRG binary file contains the following:

First 2 bytes: load address (usually $0801)
12