Elm Test Distributions

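This article looks at how to make sure property-based tests in Elm actually cover the cases you care about. The key tool is `Test.Distribution`, which lets you measure and enforce how often your tests generate values of particular categories. It introduces `Test.reportDistribution`, which produces a distribution report, and `Test.expectDistribution`, which verifies that the distribution matches expectations and so exposes gaps in test coverage. It also mentions `Fuzz.examples` and `Fuzz.labelExamples` for eyeballing generated values.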

Martin Janiczek

2025-05-01

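... in which I tell you how to make sure your property-based tests really do test the interesting cases.

Recently, on the Elm Slack, Jeroen Engels and I were discussing an article about TigerBeetle cluster testing, which contains this passage:

For example, one weakness of our test above is that we chose to pop and push with equal probability. As a result, our queue is very short on average. We never exercise large queues!

He asked:

How do you detect in practice which cases a property-based test covers or doesn't cover? For example, when would you say "the distribution we have doesn't cover this case"?

How indeed! You can use Fuzz.examples to visually check whether the generated values make sense to you: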

-- inside Elm REPL
> import Fuzz
> Fuzz.examples 10 (Fuzz.intRange 0 10)
[4,6,3,6,9,9,9,3,3,6]
  : List Int

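But were you just unlucky not to see 0 and 10, or are they never generated at all?

To whet your appetite further, let's try to get a handle on the problem from the TigerBeetle blog post. Say we have a Queue implementation (the details don't matter):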

type Queue a
empty  : Queue a
push   : a -> Queue a -> Queue a
pop    : Queue a -> (Maybe a, Queue a)
length : Queue a -> Int

Now let's try to test it!

import Fuzz exposing (Fuzzer)
import Queue exposing (Queue)


type QueueOp
  = Push Int
  | Pop


queueOpFuzzer : Fuzzer QueueOp
queueOpFuzzer =
  Fuzz.oneOf
    [ Fuzz.map Push Fuzz.int
    , Fuzz.constant Pop
    ]


applyOp : QueueOp -> Queue Int -> Queue Int
applyOp op queue =
  case op of
    Push n ->
      Queue.push n queue

    Pop ->
      -- discard the popped element, keep the remaining queue
      Queue.pop queue
        |> Tuple.second


queueFuzzer : Fuzzer (Queue Int)
queueFuzzer =
  Fuzz.list queueOpFuzzer
    -- on its own this would generate [ Push 10, Pop, Pop, Push 5 ] etc.
    |> Fuzz.map (\ops -> List.foldl applyOp Queue.empty ops)
    -- instead it generates a queue with the ops applied

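queueFuzzer does a kind of random walk through the operations to arrive at a random Queue.

Now, if we're worried that we aren't testing the really interesting cases, we could debug-print the queue lengths, squint at the logs, and decide by gut feeling whether things look OK. But doesn't that feel a bit unreliable?

Instead, you can get this nice table: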

Distribution report:
====================
 length 2-5:   37% (370x) ███████████░░░░░░░░░░░░░░░░░░░
 length 0:   29.6% (296x) █████████░░░░░░░░░░░░░░░░░░░░░
 length 1:   22.8% (228x) ███████░░░░░░░░░░░░░░░░░░░░░░░
 length 6-10:  9.7%  (97x) ███░░░░░░░░░░░░░░░░░░░░░░░░░░░
 length 11+:  0.9%  (9x) ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

You get it by using Test.reportDistribution in your test:

Test.reportDistribution
  [ ( "length 0",  \q -> length q == 0 )
  , ( "length 1",  \q -> length q == 1 )
  , ( "length 2-5", \q -> length q >= 2 && length q <= 5 )
  , ( "length 6-10", \q -> length q >= 6 && length q <= 10 )
  , ( "length 11+", \q -> length q >= 11 )
  ]

More importantly, you can also make the test fail when something isn't tested enough:

✗ Queue example 2
  Distribution of label "length 11+" was insufficient:
   expected: 10.000%
   got:    1.400%.

  (Generated 1000 values.)

That is done with Test.expectDistribution:

Test.expectDistribution
  [ ( Test.Distribution.atLeast 10, "length 0",  \q -> length q == 0 )
  , ( Test.Distribution.atLeast 10, "length 1",  \q -> length q == 1 )
  , ( Test.Distribution.atLeast 10, "length 2-5", \q -> length q >= 2 && length q <= 5 )
  , ( Test.Distribution.atLeast 10, "length 6-10", \q -> length q >= 6 && length q <= 10 )
  , ( Test.Distribution.atLeast 10, "length 11+", \q -> length q >= 11 )
  ]
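One possible response to a failure like the one above is to make pushes more likely than pops. The sketch below is only an illustration under stated assumptions: the Queue module is the hypothetical one from earlier, the 3:1 weights are arbitrary, and the names biasedQueueOpFuzzer, biasedQueueFuzzer and suite are made up for this example. Whether it actually reaches the 10% threshold depends on how long the lists produced by Fuzz.list tend to be, so treat it as a starting point rather than a fix.

```elm
module QueueDistributionTest exposing (suite)

import Expect
import Fuzz exposing (Fuzzer)
import Queue exposing (Queue) -- hypothetical module, as in the article
import Test exposing (Test)
import Test.Distribution


type QueueOp
    = Push Int
    | Pop


{-| Assumption: push three times as often as pop.
The weights are arbitrary; tune them against the distribution report.
-}
biasedQueueOpFuzzer : Fuzzer QueueOp
biasedQueueOpFuzzer =
    Fuzz.frequency
        [ ( 3, Fuzz.map Push Fuzz.int )
        , ( 1, Fuzz.constant Pop )
        ]


applyOp : QueueOp -> Queue Int -> Queue Int
applyOp op queue =
    case op of
        Push n ->
            Queue.push n queue

        Pop ->
            Queue.pop queue
                |> Tuple.second


biasedQueueFuzzer : Fuzzer (Queue Int)
biasedQueueFuzzer =
    Fuzz.list biasedQueueOpFuzzer
        |> Fuzz.map (List.foldl applyOp Queue.empty)


suite : Test
suite =
    Test.fuzzWith
        { runs = 1000
        , distribution =
            -- fail the test unless long queues show up often enough
            Test.expectDistribution
                [ ( Test.Distribution.atLeast 10
                  , "length 11+"
                  , \q -> Queue.length q >= 11
                  )
                ]
        }
        biasedQueueFuzzer
        "Queue example, push-heavy"
        -- placeholder property: a queue never has negative length
        (\q -> Queue.length q |> Expect.atLeast 0)
```

If the operation lists themselves turn out to be too short, constraining their length (for example with Fuzz.listOfLengthBetween) may be the more direct lever.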

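Now that all the secrets are out, let me formally introduce you to Test.Distribution. It's a relatively recent addition to the Elm test library API (added in v2.0.0, which was 3 years ago already, wow), and it lets you measure or _enforce_ how often each interesting case needs to happen.

This comes from Haskell QuickCheck (of course), where it's done with functions like label and checkCoverage, and there is a wonderful talk, "Building on developers’ intuitions to create effective property-based tests" by John Hughes (of course), that explains the idea further.

Before I get to the actual Test.Distribution material, let me also mention that, besides the Fuzz.examples shown earlier, there is Fuzz.labelExamples, which you can use in the REPL to see an example of each labelled case (if one occurs):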

Fuzz.labelExamples 100
  [ ( "Lower boundary (1)",  \n -> n == 1 )
  , ( "Upper boundary (20)",  \n -> n == 20 )
  , ( "In the middle (2..19)", \n -> n > 1 && n < 20 )
  , ( "Outside boundaries??", \n -> n < 1 || n > 20 )
  ]
  (Fuzz.intRange 1 20)
-->
[ ( [ "Lower boundary (1)" ],  Just 1 )
, ( [ "Upper boundary (20)" ],  Just 20 )
, ( [ "In the middle (2..19)" ], Just 3 )
, ( [ "Outside boundaries??" ], Nothing )
]

As you can see, each case consists of a label and a predicate. These can overlap:

Fuzz.labelExamples 100
  [ ( "fizz", \n -> (n |> modBy 3) == 0 )
  , ( "buzz", \n -> (n |> modBy 5) == 0 )
  ]
  (Fuzz.intRange 1 20)
-->
[ ( [ "fizz" ],    Just 3 )
, ( [ "buzz" ],    Just 10 )
, ( [ "fizz, buzz" ], Just 15 )
]

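You can use these classifiers in your test suite: Test.fuzzWith has a distribution field, where you can choose between:

noDistribution: the default
reportDistribution: shows a histogram of how often each label occurred
expectDistribution: fails the test if the labelled cases didn't occur as specified:
- atLeast: N% of the time or more
- zero: never
- moreThanZero: at least once

Let's look at more examples. Test.reportDistribution is used like this: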

Test.fuzzWith
  { runs = 10000
  , distribution =
    Test.reportDistribution
      [ ( "fizz", \n -> (n |> modBy 3) == 0 )
      , ( "buzz", \n -> (n |> modBy 5) == 0 )
      , ( "even", \n -> (n |> modBy 2) == 0 )
      , ( "odd", \n -> (n |> modBy 2) == 1 )
      ]
  }
  (Fuzz.intRange 1 20)
  "Fizz buzz even odd"
  (\n -> Expect.pass)

This will show the following histogram:

Distribution report:
====================
 even:       50.2% (5017x) ███████████████░░░░░░░░░░░░░░░
 odd:       49.8% (4983x) ███████████████░░░░░░░░░░░░░░░
 fizz:       30.1% (3011x) █████████░░░░░░░░░░░░░░░░░░░░░
 buzz:       19.2% (1924x) ██████░░░░░░░░░░░░░░░░░░░░░░░░
Combinations (included in the above base counts):
 fizz, even:    15.2% (1524x) █████░░░░░░░░░░░░░░░░░░░░░░░░░
 fizz, odd:    10.1% (1013x) ███░░░░░░░░░░░░░░░░░░░░░░░░░░░
 buzz, even:    9.5%  (949x) ███░░░░░░░░░░░░░░░░░░░░░░░░░░░
 buzz, odd:      5%  (501x) ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░
 fizz, buzz, odd:  4.7%  (474x) █░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

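As you'd expect, among the 20 numbers in the range 1..20,

there are 10 even and 10 odd numbers
- the labels even and odd should each occur with probability 10/20 (50% of the time), although the actual counts will vary slightly due to randomness
there are 6 multiples of 3
- the label fizz should occur with probability 6/20 (30% of the time)
there are 4 multiples of 5
- the label buzz should occur with probability 4/20 (20% of the time)

Note that the combinations are disjoint in a sense: a hit for fizz, buzz, odd isn't counted under fizz, odd. That's why fizz, odd shows only about 10% rather than the 15% you might expect: fizz, buzz, odd, being a more specific combination of labels, has stolen the missing 5% from it.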
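The disjoint combination counts can be checked by brute force. The sketch below is only an illustration, not how the library computes its report; the module name LabelCombos and the helpers matchingLabels and comboCounts are made up for this example.

```elm
module LabelCombos exposing (comboCounts)

import Dict exposing (Dict)


{-| The same labels and predicates as in the fizz/buzz/even/odd test. -}
labels : List ( String, Int -> Bool )
labels =
    [ ( "fizz", \n -> modBy 3 n == 0 )
    , ( "buzz", \n -> modBy 5 n == 0 )
    , ( "even", \n -> modBy 2 n == 0 )
    , ( "odd", \n -> modBy 2 n == 1 )
    ]


{-| All labels whose predicate holds for the given number. -}
matchingLabels : Int -> List String
matchingLabels n =
    labels
        |> List.filter (\( _, predicate ) -> predicate n)
        |> List.map Tuple.first


{-| For every number in 1..20, count it under the exact set of
labels it matches, so each number is counted exactly once.
-}
comboCounts : Dict (List String) Int
comboCounts =
    List.range 1 20
        |> List.foldl
            (\n acc ->
                Dict.update (matchingLabels n)
                    (\count -> Just (Maybe.withDefault 0 count + 1))
                    acc
            )
            Dict.empty
```

For 1..20 this puts 3 numbers under fizz, even (6, 12, 18), 2 under fizz, odd (3, 9), 2 under buzz, even (10, 20), 1 under buzz, odd (5) and 1 under fizz, buzz, odd (15). That is 15%, 10%, 10%, 5% and 5% of the 20 numbers, in line with the report above.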

Distributions become more useful when you enforce them rather than merely report them. Use Test.expectDistribution:

Test.fuzzWith
  { runs = 100
  , distribution =
    Test.expectDistribution
      [ ( Test.Distribution.atLeast 4,  "low",    \n -> n == 1 )
      , ( Test.Distribution.atLeast 4,  "high",    \n -> n == 20 )
      , ( Test.Distribution.atLeast 80,  "in between", \n -> n > 1 && n < 20 )
      , ( Test.Distribution.zero,     "outside",  \n -> n < 1 || n > 20 )
      , ( Test.Distribution.moreThanZero, "one",    \n -> n == 1 )
      ]
  }
  (Fuzz.intRange 1 20)
  "Int range boundaries - mandatory"
  (\n -> Expect.pass)

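In the test above, we expect the uniform fuzzer over the numbers 1..20 to generate the number 1 at least 4% of the time. If the actual probability were 2%, the test would fail because of the distribution, even though the actual test function always passes.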

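In reality, the number 1 will occur 5% of the time (1/20; Fuzz.intRange is uniform), but enforcing the exact probability isn't the best idea, because the library will keep running the fuzzer until it is statistically sure (1 false positive in 10^9 runs) that the distribution is reached. This means it may generate thousands or even millions of values to be sure, instead of the default 100 fuzz values. So staying a little away from the actual probability helps keep your test suite fast.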
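A back-of-envelope estimate gives a rough feel for why a threshold close to the true probability is expensive. It assumes a plain normal approximation and is not the library's actual statistical test, so read it as an order of magnitude only. Here p is the true probability of a label, t is the threshold passed to atLeast, and z is the required separation in standard errors (z of about 6 roughly corresponds to a one-in-10^9 one-sided error).

```latex
\[
  n \;\gtrsim\; \frac{z^{2}\, p\,(1-p)}{(p-t)^{2}}
\]
\[
  p = 0.05,\quad t = 0.04,\quad z = 6:
  \qquad
  n \;\gtrsim\; \frac{36 \cdot 0.05 \cdot 0.95}{(0.01)^{2}} = 17\,100
\]
```

As t approaches p the denominator goes to zero and the estimate grows without bound, which is the intuition behind keeping thresholds away from the actual probability.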

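Test.expectDistribution won't show the table and will generally stay silent, but if the required distribution isn't reached, it will complain loudly and fail the test (even if the actual test function passes), as in the following example, where I raised the expected probability of generating the number 1 to 10%: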

✗ Int range boundaries - mandatory
  Distribution of label "low" was insufficient:
   expected: 10.000%
   got:    5.405%.
  (Generated 2146 values.)

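You can see that it generated 2146 values to be sure of the result, rather than the specified 100.

That's about it! This post mainly wanted to show that this can be done in the Elm PBT world; if you want to dig deeper, I wholeheartedly recommend the John Hughes talk on YouTube mentioned above.

TL;DR: With Test.Distribution, you can measure and enforce how often your tests generate values of the categories you choose.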