Make the pseudorandom generation of IdnaTestV2.txt test cases delicate #693

markusicu · 2024-02-09T18:12:39Z

Make the pseudorandom generation of IdnaTestV2.txt test cases "delicate", to make comparison of the test data file between versions less onerous.

Was UTC action item 0-A343, originally intended for "After 7.0", assigned to @macchiati. On 20240207 Mark included this action item in a list of “The following tickets will probably always be at the bottom of my priority list, so should be reassigned if they are important for someone else.”

Truly random generation of test cases makes diffs hard to review, so they rarely get reviewed.

Idea: Try to preserve test cases from the previous version if they still apply. (Or just always?)

macchiati · 2024-02-09T19:26:27Z

I'll make a general note, because multiple test files have the same general issue.

The tests, whether comprehensive or random, choose characters that exemplify behavior. The comprehensive ones are where there are a small number of relevant tests; the random ones are where that would be infeasible.

The simplest example is where a set of strings of random length is built from a random selection of characters from a pool of exemplars, one for each of an enum property's values. Particular cases that need to be specifically tested for regressions are often added to this set.

It is a pain to try to reconstruct from the test file what the exemplars were, so my favored approach is to list the exemplars in a comment at the top of the file, as in the example below. Note that this is not a Unicode set; it must be an ordered list.

# @exemplar_code_points: a ; \x{308} (

Those can then be picked up by the tooling for the next version. If they all work, great; the file should be identical.

A. If not, one of three situations can arise in a new release:

1. A new property value is added.
2. An exemplar code point changes property value.
3. A property value becomes 'empty' (no code points have the value).

It is pretty simple to check that the old exemplar_code_points cover the enum values, and if not, add exemplars for the missing values. However, if at all possible, the size of the list and the positions of the characters in the list should remain the same. If additional characters are needed, they can replace ones that have become duplicates (because an exemplar code point changed property value, situation #2). If there are no duplicates, the additions should go at the end.
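
A minimal sketch of that update rule, in Python rather than the actual tooling. The names `update_exemplars`, `value_of`, `needed_values`, and `candidate_for` are hypothetical, and the toy property table stands in for real Unicode property data. It does not handle situation #3 (a value becoming empty).

```python
from typing import Callable, List, Sequence


def update_exemplars(
    old: Sequence[str],
    value_of: Callable[[str], str],
    needed_values: Sequence[str],
    candidate_for: Callable[[str], str],
) -> List[str]:
    """Cover every needed value while keeping size and positions stable.

    Hypothetical helper: value_of maps a character to its property value in
    the new version; candidate_for supplies an exemplar for a missing value.
    """
    result = list(old)
    seen = set()
    duplicate_slots = []  # positions whose value is already covered earlier
    for index, ch in enumerate(result):
        value = value_of(ch)
        if value in seen:
            duplicate_slots.append(index)
        else:
            seen.add(value)

    for value in needed_values:
        if value in seen:
            continue
        replacement = candidate_for(value)
        if duplicate_slots:
            # Reuse a duplicate's slot so the list keeps its length.
            result[duplicate_slots.pop(0)] = replacement
        else:
            # No duplicates left: additions go at the end.
            result.append(replacement)
        seen.add(value)
    return result


# Toy data (assumed): "b" changed from value P to L, leaving P uncovered.
toy_values = {"a": "L", "1": "N", "b": "L", "!": "P"}
toy_candidates = {"L": "a", "N": "1", "P": "!"}

updated = update_exemplars(
    ["a", "1", "b"],
    lambda ch: toy_values[ch],
    ["L", "N", "P"],
    lambda value: toy_candidates[value],
)
print(updated)  # ['a', '1', '!']
```

Here the duplicate at position 2 is overwritten, so the list keeps its length; with no duplicate available, the new exemplar would be appended instead.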

B. If the exemplar set changes size, there is a further problem. The way the files are built, there is one random number generator that is used to pick among the characters and the lengths. That means if you add a character (or change the list), then that causes basically the entire file to be different.
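
To illustrate the coupling, here is a small Python sketch, not the actual generator. `generate_cases` and its parameters are made up for the illustration; the only point is that one shared generator drives both the lengths and the character choices.

```python
import random
from typing import List, Sequence


def generate_cases(
    exemplars: Sequence[str], count: int, seed: int = 0, max_len: int = 4
) -> List[str]:
    """Build `count` random strings using ONE shared generator."""
    rng = random.Random(seed)  # shared by length picks and character picks
    cases = []
    for _ in range(count):
        length = rng.randint(1, max_len)
        cases.append("".join(rng.choice(exemplars) for _ in range(length)))
    return cases


before = generate_cases(["a", "b", "c"], 20)
after = generate_cases(["a", "b", "c", "d"], 20)  # one exemplar appended

# Count the lines that would survive a line-by-line diff.
unchanged = sum(x == y for x, y in zip(before, after))
print(f"{unchanged} of {len(before)} cases unchanged")
```

Because every pick consumes state from the same stream, a different pool size changes what each later draw returns, so most lines after the first affected one typically differ even though only one exemplar was added.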

There are probably some good ways to minimize changes, but it needs some thought. One option is to punt: when we add new characters to the exemplars, just accept that the file will not be diffable. I've tried to avoid parsing the old file, but that may be necessary if we want to make it diffable.
