Make the pseudorandom generation of IdnaTestV2.txt test cases delicate #693

Open
markusicu opened this issue Feb 9, 2024 · 1 comment

@markusicu

Make the pseudorandom generation of IdnaTestV2.txt test cases "delicate", to make comparison of the test data file between versions less onerous.

Was UTC action item 0-A343, originally intended for "After 7.0", assigned to @macchiati. On 20240207 Mark included this action item in a list of “The following tickets will probably always be at the bottom of my priority list, so should be reassigned if they are important for someone else.”

Truly random generation of test cases makes diffs hard to review, so they rarely get reviewed.

Idea: Try to preserve test cases from the previous version if they still apply. (Or just always?)

@macchiati
macchiati commented Feb 9, 2024

I'll make a general note, because multiple test files have the same general issue.

The tests, whether comprehensive or random, choose characters that exemplify behavior. The comprehensive ones are used where there are only a small number of relevant tests; the random ones are used where comprehensive coverage would be infeasible.

The simplest example is where a set of strings of random length is built from a random selection of characters from a pool of exemplars, one for each of an enum property's values. Particular cases are often added to this set so that they are specifically tested for regressions.

It is a pain to try to reconstruct from the test file what the exemplars were, so my favored approach is to list the exemplars at the top of the file in a comment, as in the example below. Note that this is not a Unicode set; it must be an ordered list.

# @exemplar_code_points: a ; \x{308} (

Those can then be picked up by the tooling for the next version. If they all work, great; the file should be identical.
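As a rough illustration of how tooling might pick that line up, here is a minimal Python sketch. It assumes the list is whitespace-separated and that `\x{...}` holds a hexadecimal code point; the comment does not spell out the syntax beyond the one example, and the function names `parse_exemplars` and `format_exemplars` are hypothetical.

```python
import re

PREFIX = "# @exemplar_code_points:"
ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def parse_exemplars(line):
    """Return the ordered exemplar list, or None if this is not an exemplar line.

    Assumption: items are separated by whitespace, in significant order.
    """
    if not line.startswith(PREFIX):
        return None
    tokens = line[len(PREFIX):].split()
    # Turn each \x{HEX} escape back into the character it names.
    return [ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), t) for t in tokens]


def format_exemplars(exemplars):
    """Write the list back out in the same order, escaping non-ASCII characters."""
    def show(ch):
        return ch if 0x21 <= ord(ch) <= 0x7E else "\\x{%X}" % ord(ch)
    return PREFIX + " " + " ".join(show(ch) for ch in exemplars)
```

Returning a list rather than a set keeps the order, which matters because the generator picks exemplars by position.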

A. If not, there are three situations that can arise in a new release:

  1. A new property value is added.
  2. An exemplar code point changes property value.
  3. A property value becomes 'empty' (no code points have the value).

It is pretty simple to check that the old exemplar_code_points cover the enum values, and if not, to add some for the missing values. However, if at all possible, the size of the list and the positions of the characters in the list should remain the same. If additional characters are needed, they can replace ones that have become duplicates (because of situation 2). If there are no duplicates, the additions should go at the end.
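The repair rule described above could be sketched roughly like this in Python. Everything here is hypothetical: `repair_exemplars`, its parameters, and the idea that the caller supplies one candidate character per property value are assumptions for illustration, not the existing tooling.

```python
def repair_exemplars(old, value_of, required_values, candidates):
    """Return an exemplar list covering required_values, moving as little as possible.

    old:             ordered exemplar list from the previous version
    value_of:        function giving a character's property value in the new version
    required_values: the non-empty property values in the new version
    candidates:      dict mapping a property value to some character that has it
    """
    new = list(old)
    seen = set()
    duplicate_slots = []
    for i, ch in enumerate(new):
        value = value_of(ch)
        if value in seen:
            # This exemplar now repeats a value already covered earlier in the list.
            duplicate_slots.append(i)
        else:
            seen.add(value)
    for value in required_values:
        if value in seen:
            continue
        if duplicate_slots:
            # Reuse a duplicate's position so the list keeps its size.
            new[duplicate_slots.pop(0)] = candidates[value]
        else:
            # No duplicate to replace: the addition goes at the end.
            new.append(candidates[value])
        seen.add(value)
    return new
```

With toy values where `b` has changed to the same value as `a` and a value `Z` is new, `["a", "b", "c"]` becomes `["a", "z", "c"]`: the list keeps its length and `a` and `c` keep their positions.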

B. If the exemplar set changes size, there is a further problem. The way the files are built, there is one random number generator that is used to pick among the characters and the lengths. That means if you add a character (or change the list), then that causes basically the entire file to be different.
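A toy Python sketch of that effect, not the actual generator: the function `generate`, the seed, the length range, and the case count are all made up for illustration. One generator drives both the length and the character choices, so enlarging the pool shifts every later draw.

```python
import random


def generate(pool, count, seed=0):
    """Build `count` random strings from `pool` using a single shared generator."""
    rng = random.Random(seed)
    cases = []
    for _ in range(count):
        length = rng.randint(1, 4)  # same generator picks the length...
        cases.append("".join(rng.choice(pool) for _ in range(length)))  # ...and the characters
    return cases


old = generate(["a", ";", "\u0308"], 50)
new = generate(["a", ";", "\u0308", "("], 50)  # one exemplar added at the end

unchanged = sum(1 for o, n in zip(old, new) if o == n)
print(f"{unchanged} of {len(old)} lines unchanged after adding one exemplar")
```

Running this with the same pool twice reproduces the file exactly, which is the "identical file" case; it is only the change in pool size that scrambles the output.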

There are probably some good ways to minimize changes, but it needs some thought. One option is to punt: when we add new characters to the exemplars, just accept that the file will not be diffable. I've tried to avoid parsing the old file, but that may be necessary if we want to make it diffable.
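One possible shape for the "parse the old file" route, combined with the idea of preserving previous test cases that still apply, is sketched below in Python. This is only a sketch under assumptions: `regenerate` is a hypothetical name, each test case is treated as a bare source string (a real data line carries more than that, and anything derived from the source would need recomputing), and "still applies" is reduced to "uses only current exemplars".

```python
import random


def regenerate(old_cases, exemplars, seed=0):
    """Keep old cases that use only current exemplars; replace the rest in place."""
    allowed = set(exemplars)
    rng = random.Random(seed)
    result = []
    for case in old_cases:
        if case and all(ch in allowed for ch in case):
            result.append(case)  # still applies: the line stays identical
        else:
            # Stale case: regenerate it at the same position.
            length = rng.randint(1, 4)
            result.append("".join(rng.choice(exemplars) for _ in range(length)))
    covered = set("".join(result))
    for ch in exemplars:
        if ch not in covered:
            # Exemplar not exercised yet: append a case at the end so earlier lines do not move.
            result.append(ch + rng.choice(exemplars))
            covered.add(ch)
    return result
```

The trade-off is that the file is no longer a pure function of the seed and the exemplar list; it also depends on the previous version's file, which is the cost of keeping the diff small.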
