While it is one thing to understand a test framework's feedback well enough to use the tests to drive the solution, it is another thing to design tests so their feedback actually helps.
Here one cannot know which assertion actually failed:
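The kind of test meant here bundles several assertions into one test, roughly like this (a sketch assuming the dnd-character exercise, not the track's exact test file):

```javascript
import { Character } from './dnd-character';

// One test bundling many assertions: when it fails, the online editor
// shows only the test name and a generic matcher message, so you cannot
// tell which ability (or which bound) was the problem.
test('random character is valid', () => {
  const character = new Character();
  expect(character.strength).toBeGreaterThanOrEqual(3);
  expect(character.strength).toBeLessThanOrEqual(18);
  expect(character.dexterity).toBeGreaterThanOrEqual(3);
  expect(character.dexterity).toBeLessThanOrEqual(18);
  // ... constitution, intelligence, wisdom, charisma follow the same pattern
});
```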
@iHiD I haven't seen this problem only in JavaScript. TypeScript has identical tests, and I'm sure it is a general thing to keep in mind when adding / updating exercises (especially when using test generators). So I chose the "Building" category. Would you mind moving it back?
Before splitting out the tests, can you have a look at how other tracks solved this (if at all)?
FWIW, this doesn't necessarily fix the problem. If there is an implementation mistake, splitting out the tests will make all the split-out tests flaky. That is, depending on the random value generated, a different test may start failing.
The flakiness of the tests comes from the randomness and applies to all the character tests (only the modifier is deterministic). You don't solve that by putting all assertions into one test.
An improvement to flakiness caused by random values is repeated testing: run the tests a thousand times and it's more likely to catch an outlier. The PHP track runs the test 10 times, which is better than once. But flakiness remains even with the highest number of repetitions.
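A sketch of what such repetition could look like in jest (the repetition count and test name are illustrative, not what the PHP track actually does):

```javascript
import { Character } from './dnd-character';

// Repeating the random-range check makes an out-of-range roll more
// likely to surface, but it can never prove the bounds are correct.
const REPETITIONS = 1000;

test('strength stays within 3..18 across many characters', () => {
  for (let i = 0; i < REPETITIONS; i += 1) {
    const { strength } = new Character();
    expect(strength).toBeGreaterThanOrEqual(3);
    expect(strength).toBeLessThanOrEqual(18);
  }
});
```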
What solves the issue of knowing which assertion fails is a helpful message per assertion. As jest doesn't support that directly, splitting the tests is the way I would go. Other options could be: adding a package that adds messages (like jest-expect-message), or implementing custom matchers that replace the jest-provided messages.
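Both options could look roughly like this (a sketch only; the ability names and the Character class are assumed from the dnd-character exercise, and the second variant needs the jest-expect-message package configured, typically via Jest's setupFilesAfterEnv):

```javascript
import { Character } from './dnd-character';

const ABILITIES = ['strength', 'dexterity', 'constitution', 'intelligence', 'wisdom', 'charisma'];

// Option 1: one test per ability, so the failing test's name already
// identifies which ability was out of range (test.each keeps it compact).
test.each(ABILITIES)('%s is within 3..18', (ability) => {
  const character = new Character();
  expect(character[ability]).toBeGreaterThanOrEqual(3);
  expect(character[ability]).toBeLessThanOrEqual(18);
});

// Option 2: keep one test, but attach a message to each assertion
// (the extra expect() argument comes from jest-expect-message).
test('all abilities are within 3..18', () => {
  const character = new Character();
  ABILITIES.forEach((ability) => {
    expect(character[ability], `${ability} is below 3`).toBeGreaterThanOrEqual(3);
    expect(character[ability], `${ability} is above 18`).toBeLessThanOrEqual(18);
  });
});
```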
PHPUnit has a message per assertion. Other testing frameworks may support that, too. The way failures are taken from the test output to the online editor tab can also help. Bash has a message per ability because the name of the ability is in the expected value, and that is shown in the output. The line number of the failing assertion would help, too.
I don't know about other tracks and how their failing output would appear in the online editor. E.g. Java, Python, and C# have no messages per assertion specified, but there may be helpful hints in the test output of the online editor tab.
PS: What triggered the test output in the first post was a typo in one of the abilities. It had nothing to do with the randomness bounds.
I agree with your assessment, and I agree that running it multiple times probably helps here.
What I mostly meant is that if we want this change (for the obvious reasons you've already argued for), we may not be the only ones who want it. As such, we should probably make it a canonical thing, where the old test is deprecated/replaced with a set of new tests.
As a maintainer, I'd be interested in splitting those tests apart now that someone mentioned it. Perhaps "each ability is only calculated once" can be split apart as well. The original test just checked strength, but the current test checks that each ability is calculated once. Individual tests for each ability would be helpful for similar reasons.
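A sketch of how that split could look, reusing the per-ability pattern from above (names assumed for illustration, not taken from canonical data):

```javascript
import { Character } from './dnd-character';

// One "calculated once" test per ability: reading the same ability twice
// must return the same value, and a failure names the ability directly.
test.each(['strength', 'dexterity', 'constitution', 'intelligence', 'wisdom', 'charisma'])(
  '%s is only calculated once',
  (ability) => {
    const character = new Character();
    expect(character[ability]).toEqual(character[ability]);
  },
);
```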
This is an issue that affects multiple tracks. I've also seen it in Clojure.
Personally, I would prefer it if the presentation of the tests were standardized. Each test should return a JSON object based on the canonical data from problem-specs. It would include things like the name of the tested function, the description of the test, input arguments, expected result, etc. Then the browser would simply have to render the JSON (see the sketch below).
This does require a lot of work, though, and may even be impossible to implement on some tracks, but it would make the presentation more uniform and useful.
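As an illustration, such a per-test JSON object might look like this (the field names and values are made up for the example, not an existing Exercism format):

```json
{
  "exercise": "dnd-character",
  "test": "random ability is within range",
  "function": "Character.strength",
  "input": {},
  "expected": "an integer between 3 and 18",
  "actual": 21,
  "status": "fail"
}
```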