Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make descriptions unique #2020

Open
ErikSchierboom opened this issue Apr 21, 2022 · 5 comments
Open

Make descriptions unique #2020

ErikSchierboom opened this issue Apr 21, 2022 · 5 comments

Comments

@ErikSchierboom
Copy link
Member

Recently, there was some discussion on whether test case descriptions should be unique: #2019 (comment)
IMO there is an unwritten rule that each description should be unique, but it is not documented anywhere I think.
To me, it seems like having unique descriptions is useful for several reasons:

  • It makes for easier identification of test cases (descriptions are easier to scan than UUIDs)
  • It is often a prerequisite for test generators that a (full) description is unique

To be clear, when I'm referring to a description being unique, what I mean is that the concatenated description of a test case must be unique. In other words, the uniqueness also takes into account any description of parent test case goupings.

@wolf99 Would something like this be possible to encode in the JSON schema we have?

@SaschaMann
Copy link
Contributor

SaschaMann commented Apr 21, 2022

This clashes with the actually documented purpose of descriptions:

Descriptions should not simply explain what each case is (that is redundant information) but also why each case is there. For example, what kinds of implementation mistakes might this case help us find?

There are plenty of scenarios where more than one case has the same reason for existing.

It is often a prerequisite for test generators that a (full) description is unique

I don't think this should be considered. They can break test generators in many ways, e.g. if the test generator can't handle certain characters, numbers etc. and they're also mutable. Any kind of restriction for descriptions based on the needs of generators comes at the cost of the main audience of the canonical data: humans reading and interpreting it.

If uniqueness is required, the UUID or some kind of hash of the test case can trivially be used already by the generators to achieve it.

@jiegillet
Copy link
Contributor

jiegillet commented Apr 21, 2022

I don't think this should be considered. They can break test generators in many ways, e.g. if the test generator can't handle certain characters, numbers etc. and they're also mutable. Any kind of restriction for descriptions based on the needs of generators comes at the cost of the main audience of the canonical data: humans reading and interpreting it.

I think this should be considered. Most generators (or even humans copy-pasting like I did a bunch of times) will use the description as a test name, it's a natural fit, so why wouldn't we make the job easier by having a few person doing the thinking and interpreting here rather than forcing 50 people to do it down the line?

If the documented purpose of description clashes with what people use them for, I think the documentation should adapt.

@petertseng
Copy link
Member

I can inform you that according to the below, all instances of duplicate descriptions fall under one of the three categories:

  • although the description by itself repeats another, the cases in question are within different groups, and those groups have different descriptions (-s flag)
  • although the description repeats another, all but one of the cases in question are reimplemented (-r flag).
  • state-of-tic-tac-toe
require 'json'

def cases(h)
  # top-level won't have a description, but all lower levels will.
  # therefore we have to unconditionally flat_map cases on the top level.
  # on lower levels we either flat_map cases or just return individuals.
  cases = ->(hh, path = [].freeze) {
    hh['cases']&.flat_map { |c|
      cases[c, (path + [hh['description'].freeze]).freeze]
    } || [hh.merge(path: path)]
  }
  h['cases'].flat_map(&cases)
end

def mark_reimplemented(cases)
  reimpls = cases.map { |c| c['reimplements'] }
  cases.each { |c| c[:reimplemented] = reimpls.include?(c['uuid']) }
end

reimpl = ARGV.delete('-r')
short = ARGV.delete('-s')

maxes = []
freq = Hash.new(0)

Dir.glob("#{__dir__}/exercises/*/canonical-data.json") { |f|
  exercise = File.basename(File.dirname(f))
  descs = mark_reimplemented(cases(JSON.parse(File.read(f)))).filter_map { |c|
    next if c.fetch(:reimplemented) && !reimpl
    arbitrary = ?/
    ((short ? [] : c[:path]) + [c['description']]).join(arbitrary)
  }
  descs.tally.each { |k, v| puts "#{f} #{k} #{v}" if v > 1 }
}

I have absolutely no stake in whether duplicate descriptions should be allowed and my sincere opinion is that I'll be able to deal with it either way, so I won't be weighing in on that; that will have to be left to the interested parties who do have a stake in it.

@wolf99
Copy link
Contributor

wolf99 commented Apr 22, 2022

Hi @ErikSchierboom
TBH I'm still figuring out exactly how far I can push the "implies" features of the JSON schema spec with regard to the uniqueness requirements.
Your recent comment on the availability (or lack thereof) of JSON schema handling in Nim has given me something to investigate a bit before continuing much more with the schema also. While there is still some benefit to having a schema, it is quite a reduced benefit if the tool cannot use it.

@ErikSchierboom
Copy link
Member Author

Okay, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants