Please give useful test feedback

While it is one thing to understand a test framework's feedback well enough to use the tests to drive the solution, it is another thing to design tests so that their feedback actually helps.

Here one cannot know which assertion actually failed:
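
As a rough sketch (assuming a hypothetical `DnDCharacter` class exported from `./dnd-character`, with one numeric property per ability), a combined test like this produces that kind of feedback:

```javascript
// Hypothetical combined test: all abilities asserted in a single test.
// When one of the range checks fails, Jest only reports the expected and
// received numbers, not which ability was out of range.
import { DnDCharacter } from './dnd-character';

test('abilities are within range', () => {
  const character = new DnDCharacter();
  ['strength', 'dexterity', 'constitution', 'intelligence', 'wisdom', 'charisma'].forEach(
    (ability) => {
      expect(character[ability]).toBeGreaterThanOrEqual(3);
      expect(character[ability]).toBeLessThanOrEqual(18);
    }
  );
});
```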

Please take care to provide helpful test messages where required. Thank you!

For D&D Character in the JavaScript track:

@iHiD I haven't seen this problem only in JavaScript. TypeScript has identical tests, and I'm sure it is a general thing to keep in mind when adding or updating exercises (especially when using test generators). So I chose the "Building" category. Would you mind moving it back?

The issue is canonical

See: typescript/exercises/practice/dnd-character/.meta/tests.toml at f72a12b0e22d43bcf94a9fa45e9af000e42a9a0c · exercism/typescript · GitHub

Before splitting out the tests, can you have a look at how other tracks solved this (if at all)?

FWIW, this doesn't necessarily fix the problem. If there is an implementation mistake, splitting out the tests will make all the split-out tests flaky. That is, depending on the random value generated, a different test may start failing.

The flakiness of the tests comes from the randomness and is true for all the character tests (only modifier is deterministic). You don't solve that by putting all assertions into one test.

An improvement to flakiness caused by random values is repeated testing. Run the tests a thousand times and it's more likely to catch an outlier. The PHP track runs the test 10 times, which is better than once. But flakiness remains even with the highest number of repetitions.
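
In Jest, such repetition could be sketched with `test.each` (again assuming the hypothetical `DnDCharacter` class from above):

```javascript
import { DnDCharacter } from './dnd-character';

// Repeat the randomised check many times so outliers are more likely to
// show up. This reduces the flakiness but does not remove it.
describe('strength is within range', () => {
  test.each(Array.from({ length: 1000 }, (_, i) => i))('run %#', () => {
    const character = new DnDCharacter();
    expect(character.strength).toBeGreaterThanOrEqual(3);
    expect(character.strength).toBeLessThanOrEqual(18);
  });
});
```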

What solves the issue of knowing which assertion fails is a helpful message per assertion. As Jest doesn't support that (directly), splitting the tests is the way I would go. Other ways could be: adding a package that provides messages (like jest-expect-message), or implementing custom matchers that replace the Jest-provided messages.
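
A sketch of the split in Jest (same assumptions as above), where the test name itself identifies the ability:

```javascript
import { DnDCharacter } from './dnd-character';

// One test per ability: the name of the failing test tells the student
// which ability was out of range, without per-assertion messages.
describe.each(['strength', 'dexterity', 'constitution', 'intelligence', 'wisdom', 'charisma'])(
  '%s',
  (ability) => {
    test('is within range', () => {
      const character = new DnDCharacter();
      expect(character[ability]).toBeGreaterThanOrEqual(3);
      expect(character[ability]).toBeLessThanOrEqual(18);
    });
  }
);
```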

PHPUnit has a message per assertion. Other testing frameworks may support that, too. The way failures are taken from the test output to the online editor tab can also help. BASH has a message per ability because the name of the ability is in the expected value and that is shown in the output. The line number of the failing assertion would help, too.
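
The BASH idea could be imitated in Jest as well; as a sketch, put the ability name into the compared values so it shows up in the diff of a failing `toEqual`:

```javascript
import { DnDCharacter } from './dnd-character';

// Embed the ability name in the compared value; a failing toEqual then
// prints the name as part of the expected/received diff.
const inRange = (value) => value >= 3 && value <= 18;

test('all abilities are within range', () => {
  const character = new DnDCharacter();
  const abilities = ['strength', 'dexterity', 'constitution', 'intelligence', 'wisdom', 'charisma'];
  const actual = abilities.map((ability) => ({ ability, inRange: inRange(character[ability]) }));
  const expected = abilities.map((ability) => ({ ability, inRange: true }));
  expect(actual).toEqual(expected);
});
```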

I don't know about other tracks, and how their failing output would appear in the online editor. E.g. Java, Python, and C# have no messages per assertion specified, but there may be helpful hints in the test output of the online editor tab.

PS: What triggered the test output in the first post was a typo in one of the abilities. Nothing to do with the randomness bounds.

I agree with your assessment, and I agree that running it multiple times probably helps here.

What I mostly meant is that if we want this change (for the obvious reasons you've already argued for), we may not be the only ones who want it. As such, we should probably make it a canonical thing, where the old test is deprecated/replaced with a set of new tests.

cc @ErikSchierboom

As a maintainer, I'd be interested in splitting those tests apart now that someone mentioned it. Perhaps "each ability is only calculated once" can be split apart as well. The original test just checked strength, but the current test checks whether each ability is calculated once. Individual tests for each ability would be helpful for similar reasons.

This is an issue that affects multiple tracks. I've also seen it in Clojure.

Personally, I would prefer if the presentation of the tests were standardized. Each test should return JSON based on the canonical-data from problem-specifications. It would include things like the name of the tested function, the description of the test, the input arguments, the expected result, etc. Then the browser would simply have to render the JSON.
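
As a purely hypothetical illustration (the field names are invented, not an existing Exercism format), such a per-test record might look like this:

```javascript
// Hypothetical per-test result record derived from canonical-data fields.
const testResult = {
  function: 'ability',                  // name of the tested function
  description: 'random ability is within range',
  input: {},                            // input arguments (none here)
  expected: 'a value between 3 and 18',
  actual: 19,
  status: 'fail',
};
```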

This does require a lot of work, though, and may even be impossible to implement in some tracks, but it would make the presentation more uniform and useful.

I think splitting the tests up into individual tests (probably in addition to the existing test) makes a lot of sense!

Merged! Thank you @mk-mxp