Missing Unit Tests #4330

StefanoD · 2023-01-07T20:47:18Z

Describe the issue

Am I something missing or are unit tests missing?

I was considering to do more pull requests, but without unit tests, I am afraid to break something. Further, how are we supposed to note regressions without unit tests?

Expected behavior

I would have expected a folder with unit tests with a modern frame work like Google Test and there would be a compile option to compile the project without unit tests.

Steps to reproduce

Open the project, go to the test folder and be aware there are no unit tests.

Anything else?

No response

Operating system

All

Stockfish version

9fe9ff0

The text was updated successfully, but these errors were encountered:

Disservin · 2023-01-07T20:53:55Z

testing is mostly done via fishtest, https://tests.stockfishchess.org/tests and our github ci and your local compiler (lols)

Disservin · 2023-01-07T20:56:54Z

additionally you could compile in debug mode to see if you hit any asserts

StefanoD · 2023-01-07T20:59:34Z

This seems to be an integration test. Integration tests and unit tests are not the same. An integration test cannot replace unit tests and unit tests cannot replace integration tests.

And manual debugging is very error prone. One change can affect other parts which you didn't expect and without unit tests, refactoring is getting really hard which may result into not refactoring code. That is, the technical debt increases.

Disservin · 2023-01-07T21:05:07Z

Yea, I was just refering to testing in general.

vdbergh · 2023-01-07T21:43:37Z

@StefanoD Can you give a concrete example of a unit test that would be useful in the context of a chess engine?

IMHO the difficulty is that it is not clear what constitutes a bug in a chess engine. For example in normal software, access to the hash table (a shared resource) should be protected by locks. In a chess engine however race conditions are tolerated while accessing the hash table as testing has shown that the overall effect on Elo is positive.

StefanoD · 2023-01-07T21:57:58Z

@vdbergh I. e. finding best moves, finding legal moves. In context of hash table: Testing for low collisions, testing for performance. This is all relevant, when you change code.
Or just simple things like parsing FEN strings or even printing information to the terminal when you refactor the code. For example, I don't like the fact, that some temporary buffers are created and then they are written with sprintf and then streamed into a stringstream. First, creating temporary buffers hits unnecessary performance, second, this is error prone, because of memory access violations. This is why I created this pull request. I could manage to test this manually with the godbolt online compiler, without running the whole program (I actually don't know how to use this program, because I'm new to it), but it gets much more complicated with other, similar code where I considered to refactor, too.

In short: The idea of unit tests are to test (almost) every function, such that regressions can be found instantly and without much pain.

Disservin · 2023-01-07T22:07:14Z

@StefanoD Can you give a concrete example of a unit test that would be useful in the context of a chess engine?

IMHO the difficulty is that it is not clear what constitutes a bug in a chess engine. For example in normal software, access to the hash table (a shared resource) should be protected by locks. In a chess engine however race conditions are tolerated while accessing the hash table as testing has shown that the overall effect on Elo is positive.

verifiying move legality on a number of different test positions could be a unit test, or repetition testing ?

Disservin · 2023-01-07T22:13:49Z

In context of hash table: Testing for low collisions, testing for performance

these are things I would say are good candidates to test on fishtest.

MinetaS · 2023-01-07T22:17:22Z

For example, I don't like the fact, that some temporary buffers are created and then they are written with sprintf and then streamed into a stringstream. First, creating temporary buffers hits unnecessary performance, second, this is error prone, because of memory access violations.

The mentioned sprintf is totally regardless of performance impact for the first. eval command is for debugging purposes and is not called over hundreds of times while SF do searching. Also declaring & defining local variables inside a function scope always let the variable placed in stack, not heap (indeed doesn't call allocator-related functions).
The patch you've submitted is actually more heavy + slow if not properly optimized by compilers, but let's not mention it here.

And next sprintf never causes memory violation once the buffer exists in memory. I don't know what reasons you're thinking about, but mostly printf related security issues come from %n + FSB (which is forbidden in early 2000s), and %s with user-input, not terminated by a NULL (read overflow, memory leak); both cases are not present in SF code.

vdbergh · 2023-01-07T22:18:43Z

@StefanoD Legal move generation is typically tested with perft. Your other examples in the first paragraph are not convincing. One doesn't care about best moves in individual positions in chess engines (such a thing isn't even properly defined), only overall Elo counts (which is determined by testing on Fishtest). And the number of collisions in SF's hash table is quite high as testing has again shown that the overall Elo effect is positive.

I am not against unit testing but I think that in SF's context it might not be as useful as you think it is.

Something that would be suitable for unit testing is the table base code. But what would be really useful is a documentation of the file format.

Disservin · 2023-01-07T22:21:42Z

@vdbergh I think you tagged the wrong person, I think you meant me?
I was actually talking more about Position::legal()...

StefanoD · 2023-01-07T22:25:37Z

@vdbergh

And next sprintf never causes memory violation once the buffer exists in memory.

No, if the buffer is too small or a string not null-terminated, you'll get an access violation.

If I really have to convince you that not having unit tests, is bad, then I guess you have to deal with more regressions and less contributors.

vdbergh · 2023-01-07T22:31:26Z

@Disservin No @StefanoD also gave legal move generation as an example. I don't quite know what Position::legal() does. If it is used in legal move generation then it will be checked by perft.

I guess a unit test could consist of running perft on a number positions with known perft values (the starting position might not be sufficient for obscure corner cases).

Disservin · 2023-01-07T22:33:38Z

Position::legal() is only run in our main search function call its used to verify that the ttmove is legal in the position, perft doesnt use it

MinetaS · 2023-01-07T22:37:03Z

No, if the buffer is too small or a string not null-terminated, you'll get an access violation.

      char buffer[3][8];
      std::memset(buffer, '\0', sizeof(buffer));

Have you checked the previous lines?

I agree with introducing unit tests btw, but you should provide a detailed list of scopes of which we need to test.
As Disservin mentioned, SF is already being tested on a pretty decent testing framework (fishtest).

vdbergh · 2023-01-07T22:37:53Z

@Disservin Ok. Perhaps Position::legal() only verifies that the move is pseudo legal?

Disservin · 2023-01-07T22:40:58Z

@Disservin Ok. Perhaps Position::legal() only verifies that the move is pseudo legal?

nope, there are Position::pseudo_legal() and Position::legal() the two connected together should be true for any legal move and false for any other non legal move. You could test with trying all 2^16 (moves are 16bit large) combinations on a given position. Would at least shrink the realm of errors that could happen, however since stockfish has not crashed playing an illegal move we can assume they work as intended.

StefanoD · 2023-01-07T22:47:12Z

No, if the buffer is too small or a string not null-terminated, you'll get an access violation.
      char buffer[3][8];
      std::memset(buffer, '\0', sizeof(buffer));
Have you checked the previous lines?

Yes, I have and the function format_cp_aligned_dot() which was called, relied that the buffer was correctly constructed from the caller. But you can call the function from other parts, too and the logic has to be reimplemented. And what if you change the formatting? Then you might need a different buffer size and adjust the buffer from every caller which is error prone.

As I said: Unit tests are also for testing regressions in case of refactorings.

I agree with introducing unit tests btw, but you should provide a detailed list of scopes of which we need to test. As Disservin mentioned, SF is already being tested on a pretty decent testing framework (fishtest).

As you know, I am absolutely no expert in this project. Thus, I thought you could name more scopes than me. But, if a project doesn't have unit tests then, for me, there is something not optimal. In my opinion, every project must absolutely have unit tests.

Of course, it's good that you have integration tests!

ddobbelaere · 2023-01-08T07:26:56Z

I tend to agree with @vdbergh and others.

Every project should not have unit tests per se, this is just way too dogmatic. It depends on the case. But anyway, if some more good examples are found for where unit tests add value, you might convince me :)

dubslow · 2023-01-08T08:41:00Z

As I said: Unit tests are also for testing regressions in case of refactorings.

Fishtest tests for both nonregression of elo and gain of elo. In fact I'm tempted to say that your pull request, which I'm generally in favor of, should be put thru a nonregression test on fishtest just to be sure that it is sane.

jayaddison · 2023-01-08T18:39:06Z

How long do fishnet and perft tests take to run, and can newcomers run them locally after making an experimental code change?

vondele · 2023-01-08T18:43:15Z

Stockfish/src (master)$ time ../tests/perft.sh 
perft testing started
perft testing OK

real	0m2.313s
user	0m0.016s
sys	0m0.020s

Disservin · 2023-01-08T18:44:44Z

Fishtest is a distributed testing framework testing locally isnt the purpose of it. Perft tests finish in under 1 minute. Anyone can create a fishtest account btw

StefanoD · 2023-01-08T18:46:18Z

perft.sh is only a small test. This could be also accomplished with a unit test framework. But a unit test framework invites developers to do other small tests. For example to test one single function after refactoring it. This is as easy as it can get.

vdbergh · 2023-01-08T19:17:23Z

@TheDarkKnightRise @Disservin I know but I thought that it is sufficient to check that the hash move is pseudo legal.

snicolet · 2023-08-22T12:56:33Z

Since Stefano has a very clear idea of the kind of unit tests he/she would like, but has not proposed any unit code in the last six months, I'll close the issue as not urgent :-)

naXa777 · 2023-11-28T14:47:06Z

You can add unit tests for known bugs, like #4877. It's a good practice to add a failing unit test for every defect, and then make sure the test passes after the bugfix. The point is that the developers can demonstrate that they isolated what is wrong with the code, proved that it is wrong, fixed it, proved that the fix is correct.
source.

Disservin · 2023-11-28T15:17:33Z

You can add unit tests for known bugs, like #4877. It's a good practice to add a failing unit test for every defect, and then make sure the test passes after the bugfix. The point is that the developers can demonstrate that they isolated what is wrong with the code, proved that it is wrong, fixed it, proved that the fix is correct. source.

Well with some bugs like the one you mentioned is a basically impossible to unit test it. It depends a lot on what is currently done in search and what not. In theory these things should be caught in the CI and writing an unit test for undefined behavior is not really possible anyway… code can have undefined behavior but happen to give the right result when you are running the unit test.

naXa777 · 2023-11-28T18:28:29Z

I'm not familiar with C++, so I don't know how it can be achieved in this language. But this case seems pretty standard. @Disservin, do you say that it's impossible to mock dependencies, including the std::clamp (e.g. use the normal implementation, but throw exception if max value < min value), and then call the search function with parameters that simulate the specific scenario in which std::clamp would be called with max value smaller than min value? With this setup the unit test will fail in 100% of runs. Then apply the fix proposed in #4877 and the test will pass. Such tests will act as a safety net against regressions.

upd: ok, chatgpt proposes wild solutions for mocking std functions, so let me agree that it's not that straightforward to unit test the bug that happens because of undefined behavior in C++ standard library.

Disservin · 2023-11-28T18:44:01Z

tldr; unit testing search is not as straight forward as one might think

But this case seems pretty standard.

Well "unit testing" undefined behavior is not pretty standard to my knowledge. The closest thing which you can do is run the program with -fsanitize=undefined, but that is not really related to any unit and rather testing the program as a whole then.

do you say that it's impossible to mock dependencies,

i'm not

including the std::clamp (e.g. use the normal implementation, but throw exception if max value < min value)

std::clamp is a function from the standard, so we cant change the underlying implemention (though my libcxx has an assertion in place __glibcxx_assert(!(__hi < __lo)); which one could enable, I guess).

and then call the search function with parameters that simulate the specific scenario in which std::clamp

this is easier said than done, search changes all time. So each new functional patch would be required to find a special state in which certain conditions are met which trigger the problematic behaviour or everything in search needs to be split up into dedicated functions which are then in turn unit testable. Though there are many parts which rely on some (global) state to trigger the exact behaviour.

StefanoD · 2023-11-28T18:50:08Z

Well with some bugs like the one you mentioned is a basically impossible to unit test it. It depends a lot on what is currently done in search and what not. In theory these things should be caught in the CI and writing an unit test for undefined behavior is not really possible anyway… code can have undefined behavior but happen to give the right result when you are running the unit test.

Your answer only explains why there are no unit tests and why the discussion reached a dead end.
Only as an example: In my company we have a product where only the instrument layer has 2700 unit tests! These tests run on CI and we even simulate protocol communication with hardware devices by using Google Mock and dependency injection. We can develop and test before we have hardware available and we find regressions very fast. We can even simulate hardware malfunctioning like a hardware timeout in order to test our workarounds.

Now back to stockfish: This is a 100% software project and you are writing about the impossible task of writing sane unit tests. Sorry, but I'm shocked.
If undefined behavior can be triggered in production, it can also be triggered in unit tests! And not only that, if you for example split big fat function into separate small functions, it gets really easy testable. And to bring it even further: If you have constexpr functions and have a unit test which triggers undefined behavior, the unit test won't compile. Because the C++ standard says, that the code execution of a constexpr function must be free of undefined behavior and by this, you can find undefined behavior at compile time!

I hope, I could give you some inspirations...

Disservin · 2023-11-28T19:08:59Z

@StefanoD Unit tests are great... I don't disagree with that. But I find it ridiculous to test a function for undefined behaviour because it's not reliable that this will actually make the unit test fail or even creating an abstraction around std::clamp, should we create another layer of abstraction for all functions from the standard which may lead to UB?

This is a 100% software project and you are writing about the impossible task of writing sane unit tests.

When did I say this? My conclusion was simply tldr; unit testing search is not as straight forward as one might think in the current state of Stockfish, which ofc might change over time.

if you for example split big fat function into separate small functions, it gets really easy testable.

I said that. or everything in search needs to be split up into dedicated functions which are then in turn unit testable.

All I wanted to express that with the current state it's not that simple to add a unit test for the bug encountered in #4877, sure create a function called safe_clamp and test that, great that works, however it wont prevent other devs from using clamp with runtime vars in some other place again. The entire surrounding function should be tested, which (again) is not that easy right now...

Disservin · 2023-11-28T19:13:13Z

Also if we move into the direction of #4626 there should probably be unit tests, no doubt.

miguel-l · 2024-01-07T09:08:56Z

Also if we move into the direction of #4626 there should probably be unit tests, no doubt.

Agreed! It would also would increase confidence in changes like #4968

StefanoD changed the title ~~Unit Tests~~ Missing Unit Tests Jan 7, 2023

Disservin added the discussion label Aug 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing Unit Tests #4330

Missing Unit Tests #4330

StefanoD commented Jan 7, 2023

Disservin commented Jan 7, 2023

Disservin commented Jan 7, 2023

StefanoD commented Jan 7, 2023

Disservin commented Jan 7, 2023

vdbergh commented Jan 7, 2023

StefanoD commented Jan 7, 2023 •

edited

Disservin commented Jan 7, 2023 •

edited

Disservin commented Jan 7, 2023

MinetaS commented Jan 7, 2023 •

edited

vdbergh commented Jan 7, 2023

Disservin commented Jan 7, 2023

StefanoD commented Jan 7, 2023

vdbergh commented Jan 7, 2023 •

edited

Disservin commented Jan 7, 2023

MinetaS commented Jan 7, 2023

vdbergh commented Jan 7, 2023 •

edited

Disservin commented Jan 7, 2023

StefanoD commented Jan 7, 2023

ddobbelaere commented Jan 8, 2023

dubslow commented Jan 8, 2023

jayaddison commented Jan 8, 2023

vondele commented Jan 8, 2023

Disservin commented Jan 8, 2023

StefanoD commented Jan 8, 2023

vdbergh commented Jan 8, 2023 •

edited

snicolet commented Aug 22, 2023

naXa777 commented Nov 28, 2023

Disservin commented Nov 28, 2023

naXa777 commented Nov 28, 2023 •

edited

Disservin commented Nov 28, 2023

StefanoD commented Nov 28, 2023

Disservin commented Nov 28, 2023

Disservin commented Nov 28, 2023

miguel-l commented Jan 7, 2024

Missing Unit Tests #4330

Missing Unit Tests #4330

Comments

StefanoD commented Jan 7, 2023

Describe the issue

Expected behavior

Steps to reproduce

Anything else?

Operating system

Stockfish version

Disservin commented Jan 7, 2023

Disservin commented Jan 7, 2023

StefanoD commented Jan 7, 2023

Disservin commented Jan 7, 2023

vdbergh commented Jan 7, 2023

StefanoD commented Jan 7, 2023 • edited

Disservin commented Jan 7, 2023 • edited

Disservin commented Jan 7, 2023

MinetaS commented Jan 7, 2023 • edited

vdbergh commented Jan 7, 2023

Disservin commented Jan 7, 2023

StefanoD commented Jan 7, 2023

vdbergh commented Jan 7, 2023 • edited

Disservin commented Jan 7, 2023

MinetaS commented Jan 7, 2023

vdbergh commented Jan 7, 2023 • edited

Disservin commented Jan 7, 2023

StefanoD commented Jan 7, 2023

ddobbelaere commented Jan 8, 2023

dubslow commented Jan 8, 2023

jayaddison commented Jan 8, 2023

vondele commented Jan 8, 2023

Disservin commented Jan 8, 2023

StefanoD commented Jan 8, 2023

vdbergh commented Jan 8, 2023 • edited

snicolet commented Aug 22, 2023

naXa777 commented Nov 28, 2023

Disservin commented Nov 28, 2023

naXa777 commented Nov 28, 2023 • edited

Disservin commented Nov 28, 2023

StefanoD commented Nov 28, 2023

Disservin commented Nov 28, 2023

Disservin commented Nov 28, 2023

miguel-l commented Jan 7, 2024

StefanoD commented Jan 7, 2023 •

edited

Disservin commented Jan 7, 2023 •

edited

MinetaS commented Jan 7, 2023 •

edited

vdbergh commented Jan 7, 2023 •

edited

vdbergh commented Jan 7, 2023 •

edited

vdbergh commented Jan 8, 2023 •

edited

naXa777 commented Nov 28, 2023 •

edited