Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing Unit Tests #4330

Open
StefanoD opened this issue Jan 7, 2023 · 34 comments
Open

Missing Unit Tests #4330

StefanoD opened this issue Jan 7, 2023 · 34 comments

Comments

@StefanoD
Copy link
Contributor

StefanoD commented Jan 7, 2023

Describe the issue

Am I something missing or are unit tests missing?

I was considering to do more pull requests, but without unit tests, I am afraid to break something. Further, how are we supposed to note regressions without unit tests?

Expected behavior

I would have expected a folder with unit tests with a modern frame work like Google Test and there would be a compile option to compile the project without unit tests.

Steps to reproduce

Open the project, go to the test folder and be aware there are no unit tests.

Anything else?

No response

Operating system

All

Stockfish version

9fe9ff0

@StefanoD StefanoD changed the title Unit Tests Missing Unit Tests Jan 7, 2023
@Disservin
Copy link
Member

testing is mostly done via fishtest, https://tests.stockfishchess.org/tests and our github ci and your local compiler (lols)

@Disservin
Copy link
Member

additionally you could compile in debug mode to see if you hit any asserts

@StefanoD
Copy link
Contributor Author

StefanoD commented Jan 7, 2023

This seems to be an integration test. Integration tests and unit tests are not the same. An integration test cannot replace unit tests and unit tests cannot replace integration tests.

And manual debugging is very error prone. One change can affect other parts which you didn't expect and without unit tests, refactoring is getting really hard which may result into not refactoring code. That is, the technical debt increases.

@Disservin
Copy link
Member

Yea, I was just refering to testing in general.

@vdbergh
Copy link
Contributor

vdbergh commented Jan 7, 2023

@StefanoD Can you give a concrete example of a unit test that would be useful in the context of a chess engine?

IMHO the difficulty is that it is not clear what constitutes a bug in a chess engine. For example in normal software, access to the hash table (a shared resource) should be protected by locks. In a chess engine however race conditions are tolerated while accessing the hash table as testing has shown that the overall effect on Elo is positive.

@StefanoD
Copy link
Contributor Author

StefanoD commented Jan 7, 2023

@vdbergh I. e. finding best moves, finding legal moves. In context of hash table: Testing for low collisions, testing for performance. This is all relevant, when you change code.
Or just simple things like parsing FEN strings or even printing information to the terminal when you refactor the code. For example, I don't like the fact, that some temporary buffers are created and then they are written with sprintf and then streamed into a stringstream. First, creating temporary buffers hits unnecessary performance, second, this is error prone, because of memory access violations. This is why I created this pull request. I could manage to test this manually with the godbolt online compiler, without running the whole program (I actually don't know how to use this program, because I'm new to it), but it gets much more complicated with other, similar code where I considered to refactor, too.

In short: The idea of unit tests are to test (almost) every function, such that regressions can be found instantly and without much pain.

@Disservin
Copy link
Member

Disservin commented Jan 7, 2023

@StefanoD Can you give a concrete example of a unit test that would be useful in the context of a chess engine?

IMHO the difficulty is that it is not clear what constitutes a bug in a chess engine. For example in normal software, access to the hash table (a shared resource) should be protected by locks. In a chess engine however race conditions are tolerated while accessing the hash table as testing has shown that the overall effect on Elo is positive.

verifiying move legality on a number of different test positions could be a unit test, or repetition testing ?

@Disservin
Copy link
Member

In context of hash table: Testing for low collisions, testing for performance

these are things I would say are good candidates to test on fishtest.

@MinetaS
Copy link
Contributor

MinetaS commented Jan 7, 2023

For example, I don't like the fact, that some temporary buffers are created and then they are written with sprintf and then streamed into a stringstream. First, creating temporary buffers hits unnecessary performance, second, this is error prone, because of memory access violations.

The mentioned sprintf is totally regardless of performance impact for the first. eval command is for debugging purposes and is not called over hundreds of times while SF do searching. Also declaring & defining local variables inside a function scope always let the variable placed in stack, not heap (indeed doesn't call allocator-related functions).
The patch you've submitted is actually more heavy + slow if not properly optimized by compilers, but let's not mention it here.

And next sprintf never causes memory violation once the buffer exists in memory. I don't know what reasons you're thinking about, but mostly printf related security issues come from %n + FSB (which is forbidden in early 2000s), and %s with user-input, not terminated by a NULL (read overflow, memory leak); both cases are not present in SF code.

@vdbergh
Copy link
Contributor

vdbergh commented Jan 7, 2023

@StefanoD Legal move generation is typically tested with perft. Your other examples in the first paragraph are not convincing. One doesn't care about best moves in individual positions in chess engines (such a thing isn't even properly defined), only overall Elo counts (which is determined by testing on Fishtest). And the number of collisions in SF's hash table is quite high as testing has again shown that the overall Elo effect is positive.

I am not against unit testing but I think that in SF's context it might not be as useful as you think it is.

Something that would be suitable for unit testing is the table base code. But what would be really useful is a documentation of the file format.

@Disservin
Copy link
Member

@vdbergh I think you tagged the wrong person, I think you meant me?
I was actually talking more about Position::legal()...

@StefanoD
Copy link
Contributor Author

StefanoD commented Jan 7, 2023

@vdbergh

And next sprintf never causes memory violation once the buffer exists in memory.

No, if the buffer is too small or a string not null-terminated, you'll get an access violation.

If I really have to convince you that not having unit tests, is bad, then I guess you have to deal with more regressions and less contributors.

@vdbergh
Copy link
Contributor

vdbergh commented Jan 7, 2023

@Disservin No @StefanoD also gave legal move generation as an example. I don't quite know what Position::legal() does. If it is used in legal move generation then it will be checked by perft.

I guess a unit test could consist of running perft on a number positions with known perft values (the starting position might not be sufficient for obscure corner cases).

@Disservin
Copy link
Member

Position::legal() is only run in our main search function call its used to verify that the ttmove is legal in the position, perft doesnt use it

@MinetaS
Copy link
Contributor

MinetaS commented Jan 7, 2023

No, if the buffer is too small or a string not null-terminated, you'll get an access violation.

      char buffer[3][8];
      std::memset(buffer, '\0', sizeof(buffer));

Have you checked the previous lines?

I agree with introducing unit tests btw, but you should provide a detailed list of scopes of which we need to test.
As Disservin mentioned, SF is already being tested on a pretty decent testing framework (fishtest).

@vdbergh
Copy link
Contributor

vdbergh commented Jan 7, 2023

@Disservin Ok. Perhaps Position::legal() only verifies that the move is pseudo legal?

@Disservin
Copy link
Member

@Disservin Ok. Perhaps Position::legal() only verifies that the move is pseudo legal?

nope, there are Position::pseudo_legal() and Position::legal() the two connected together should be true for any legal move and false for any other non legal move. You could test with trying all 2^16 (moves are 16bit large) combinations on a given position. Would at least shrink the realm of errors that could happen, however since stockfish has not crashed playing an illegal move we can assume they work as intended.

@StefanoD
Copy link
Contributor Author

StefanoD commented Jan 7, 2023

No, if the buffer is too small or a string not null-terminated, you'll get an access violation.

      char buffer[3][8];
      std::memset(buffer, '\0', sizeof(buffer));

Have you checked the previous lines?

Yes, I have and the function format_cp_aligned_dot() which was called, relied that the buffer was correctly constructed from the caller. But you can call the function from other parts, too and the logic has to be reimplemented. And what if you change the formatting? Then you might need a different buffer size and adjust the buffer from every caller which is error prone.

As I said: Unit tests are also for testing regressions in case of refactorings.

I agree with introducing unit tests btw, but you should provide a detailed list of scopes of which we need to test. As Disservin mentioned, SF is already being tested on a pretty decent testing framework (fishtest).

As you know, I am absolutely no expert in this project. Thus, I thought you could name more scopes than me. But, if a project doesn't have unit tests then, for me, there is something not optimal. In my opinion, every project must absolutely have unit tests.

Of course, it's good that you have integration tests!

@ddobbelaere
Copy link
Contributor

I tend to agree with @vdbergh and others.

Every project should not have unit tests per se, this is just way too dogmatic. It depends on the case. But anyway, if some more good examples are found for where unit tests add value, you might convince me :)

@dubslow
Copy link
Contributor

dubslow commented Jan 8, 2023

As I said: Unit tests are also for testing regressions in case of refactorings.

Fishtest tests for both nonregression of elo and gain of elo. In fact I'm tempted to say that your pull request, which I'm generally in favor of, should be put thru a nonregression test on fishtest just to be sure that it is sane.

@jayaddison
Copy link

How long do fishnet and perft tests take to run, and can newcomers run them locally after making an experimental code change?

@vondele
Copy link
Member

vondele commented Jan 8, 2023

Stockfish/src (master)$ time ../tests/perft.sh 
perft testing started
perft testing OK

real	0m2.313s
user	0m0.016s
sys	0m0.020s

@Disservin
Copy link
Member

Fishtest is a distributed testing framework testing locally isnt the purpose of it. Perft tests finish in under 1 minute. Anyone can create a fishtest account btw

@StefanoD
Copy link
Contributor Author

StefanoD commented Jan 8, 2023

perft.sh is only a small test. This could be also accomplished with a unit test framework. But a unit test framework invites developers to do other small tests. For example to test one single function after refactoring it. This is as easy as it can get.

@vdbergh
Copy link
Contributor

vdbergh commented Jan 8, 2023

@TheDarkKnightRise @Disservin I know but I thought that it is sufficient to check that the hash move is pseudo legal.

@snicolet
Copy link
Member

Since Stefano has a very clear idea of the kind of unit tests he/she would like, but has not proposed any unit code in the last six months, I'll close the issue as not urgent :-)

@naXa777
Copy link

naXa777 commented Nov 28, 2023

You can add unit tests for known bugs, like #4877. It's a good practice to add a failing unit test for every defect, and then make sure the test passes after the bugfix. The point is that the developers can demonstrate that they isolated what is wrong with the code, proved that it is wrong, fixed it, proved that the fix is correct.
source.

@Disservin
Copy link
Member

You can add unit tests for known bugs, like #4877. It's a good practice to add a failing unit test for every defect, and then make sure the test passes after the bugfix. The point is that the developers can demonstrate that they isolated what is wrong with the code, proved that it is wrong, fixed it, proved that the fix is correct. source.

Well with some bugs like the one you mentioned is a basically impossible to unit test it. It depends a lot on what is currently done in search and what not. In theory these things should be caught in the CI and writing an unit test for undefined behavior is not really possible anyway… code can have undefined behavior but happen to give the right result when you are running the unit test.

@naXa777
Copy link

naXa777 commented Nov 28, 2023

I'm not familiar with C++, so I don't know how it can be achieved in this language. But this case seems pretty standard. @Disservin, do you say that it's impossible to mock dependencies, including the std::clamp (e.g. use the normal implementation, but throw exception if max value < min value), and then call the search function with parameters that simulate the specific scenario in which std::clamp would be called with max value smaller than min value? With this setup the unit test will fail in 100% of runs. Then apply the fix proposed in #4877 and the test will pass. Such tests will act as a safety net against regressions.

upd: ok, chatgpt proposes wild solutions for mocking std functions, so let me agree that it's not that straightforward to unit test the bug that happens because of undefined behavior in C++ standard library.

@Disservin
Copy link
Member

tldr; unit testing search is not as straight forward as one might think

But this case seems pretty standard.

Well "unit testing" undefined behavior is not pretty standard to my knowledge. The closest thing which you can do is run the program with -fsanitize=undefined, but that is not really related to any unit and rather testing the program as a whole then.

do you say that it's impossible to mock dependencies,

i'm not

including the std::clamp (e.g. use the normal implementation, but throw exception if max value < min value)

std::clamp is a function from the standard, so we cant change the underlying implemention (though my libcxx has an assertion in place __glibcxx_assert(!(__hi < __lo)); which one could enable, I guess).

and then call the search function with parameters that simulate the specific scenario in which std::clamp

this is easier said than done, search changes all time. So each new functional patch would be required to find a special state in which certain conditions are met which trigger the problematic behaviour or everything in search needs to be split up into dedicated functions which are then in turn unit testable. Though there are many parts which rely on some (global) state to trigger the exact behaviour.

@StefanoD
Copy link
Contributor Author

Well with some bugs like the one you mentioned is a basically impossible to unit test it. It depends a lot on what is currently done in search and what not. In theory these things should be caught in the CI and writing an unit test for undefined behavior is not really possible anyway… code can have undefined behavior but happen to give the right result when you are running the unit test.

Your answer only explains why there are no unit tests and why the discussion reached a dead end.
Only as an example: In my company we have a product where only the instrument layer has 2700 unit tests! These tests run on CI and we even simulate protocol communication with hardware devices by using Google Mock and dependency injection. We can develop and test before we have hardware available and we find regressions very fast. We can even simulate hardware malfunctioning like a hardware timeout in order to test our workarounds.

Now back to stockfish: This is a 100% software project and you are writing about the impossible task of writing sane unit tests. Sorry, but I'm shocked.
If undefined behavior can be triggered in production, it can also be triggered in unit tests! And not only that, if you for example split big fat function into separate small functions, it gets really easy testable. And to bring it even further: If you have constexpr functions and have a unit test which triggers undefined behavior, the unit test won't compile. Because the C++ standard says, that the code execution of a constexpr function must be free of undefined behavior and by this, you can find undefined behavior at compile time!

I hope, I could give you some inspirations...

@Disservin
Copy link
Member

@StefanoD Unit tests are great... I don't disagree with that. But I find it ridiculous to test a function for undefined behaviour because it's not reliable that this will actually make the unit test fail or even creating an abstraction around std::clamp, should we create another layer of abstraction for all functions from the standard which may lead to UB?

This is a 100% software project and you are writing about the impossible task of writing sane unit tests.

When did I say this? My conclusion was simply tldr; unit testing search is not as straight forward as one might think in the current state of Stockfish, which ofc might change over time.

if you for example split big fat function into separate small functions, it gets really easy testable.

I said that. or everything in search needs to be split up into dedicated functions which are then in turn unit testable.

All I wanted to express that with the current state it's not that simple to add a unit test for the bug encountered in #4877, sure create a function called safe_clamp and test that, great that works, however it wont prevent other devs from using clamp with runtime vars in some other place again. The entire surrounding function should be tested, which (again) is not that easy right now...

@Disservin
Copy link
Member

Also if we move into the direction of #4626 there should probably be unit tests, no doubt.

@miguel-l
Copy link
Contributor

miguel-l commented Jan 7, 2024

Also if we move into the direction of #4626 there should probably be unit tests, no doubt.

Agreed! It would also would increase confidence in changes like #4968

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

12 participants