While it is one thing to understand a test framework's feedback well enough to use the tests to drive the solution, it is another thing to design tests so their feedback actually helps.
Here one cannot know which assertion actually failed:
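The kind of test meant here bundles several assertions into one test, roughly like this (a sketch assuming the dnd-character exercise, not the track's exact test file):

```javascript
import { Character } from './dnd-character';

// One test bundling many assertions: when it fails, the online editor
// shows only the test name and a generic matcher message, so you cannot
// tell which ability (or which bound) was the problem.
test('random character is valid', () => {
  const character = new Character();
  expect(character.strength).toBeGreaterThanOrEqual(3);
  expect(character.strength).toBeLessThanOrEqual(18);
  expect(character.dexterity).toBeGreaterThanOrEqual(3);
  expect(character.dexterity).toBeLessThanOrEqual(18);
  // ... constitution, intelligence, wisdom, charisma follow the same pattern
});
```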
@iHiD I haven't seen this problem only in JavaScript. TypeScript has identical tests, and I'm sure it is a general thing to keep in mind when adding / updating exercises (especially when using test generators). So I chose the "Building" category. Would you mind moving it back?
Before splitting out the tests, can you have a look at how other tracks solved this (if at all)?
FWIW, this doesn't necessarily fix the problem. If there is an implementation mistake, splitting out the tests will make all the split-out tests flaky. That is, depending on the random value generated, a different test may start failing.
The flakiness of the tests comes from the randomness and applies to all the character tests (only the modifier is deterministic). You don't solve that by putting all assertions into one test.
An improvement to flakiness caused by random values is repeated testing: run the tests a thousand times and it's more likely to catch an outlier. The PHP track runs the test 10 times, which is better than once. But flakiness remains even with the highest number of repetitions.
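A sketch of what such repetition could look like in jest (the repetition count and test name are illustrative, not what the PHP track actually does):

```javascript
import { Character } from './dnd-character';

// Repeating the random-range check makes an out-of-range roll more
// likely to surface, but it can never prove the bounds are correct.
const REPETITIONS = 1000;

test('strength stays within 3..18 across many characters', () => {
  for (let i = 0; i < REPETITIONS; i += 1) {
    const { strength } = new Character();
    expect(strength).toBeGreaterThanOrEqual(3);
    expect(strength).toBeLessThanOrEqual(18);
  }
});
```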
What solves the issue of knowing which assertion fails is a helpful message per assertion. As jest doesn't support that directly, splitting the tests is the way I would go. Other options could be: adding a package that adds messages (like jest-expect-message), or implementing custom matchers that replace the jest-provided messages.
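Both options could look roughly like this (a sketch only; the ability names and the Character class are assumed from the dnd-character exercise, and the second variant needs the jest-expect-message package configured, typically via Jest's setupFilesAfterEnv):

```javascript
import { Character } from './dnd-character';

const ABILITIES = ['strength', 'dexterity', 'constitution', 'intelligence', 'wisdom', 'charisma'];

// Option 1: one test per ability, so the failing test's name already
// identifies which ability was out of range (test.each keeps it compact).
test.each(ABILITIES)('%s is within 3..18', (ability) => {
  const character = new Character();
  expect(character[ability]).toBeGreaterThanOrEqual(3);
  expect(character[ability]).toBeLessThanOrEqual(18);
});

// Option 2: keep one test, but attach a message to each assertion
// (the extra expect() argument comes from jest-expect-message).
test('all abilities are within 3..18', () => {
  const character = new Character();
  ABILITIES.forEach((ability) => {
    expect(character[ability], `${ability} is below 3`).toBeGreaterThanOrEqual(3);
    expect(character[ability], `${ability} is above 18`).toBeLessThanOrEqual(18);
  });
});
```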
PHPUnit has a message per assertion. Other testing frameworks may support that, too. The way failures are taken from the test output to the online editor tab can also help. Bash has a message per ability because the name of the ability is in the expected value, and that is shown in the output. The line number of the failing assertion would help, too.
I don't know about other tracks and how their failing output would appear in the online editor. E.g. Java, Python, and C# have no messages per assertion specified, but there may be helpful hints in the test output of the online editor tab.
PS: What triggered the test output in the first post was a typo in one of the abilities. It had nothing to do with the randomness bounds.
I agree with your assessment, and I agree that running it multiple times probably helps here.
What I mostly meant is that if we want this change (for the obvious reasons you've already argued for), we may not be the only ones who want it. As such, we should probably make it a canonical thing, where the old test is deprecated/replaced with a set of new tests.
As a maintainer, I'd be interested in splitting those tests apart now that someone mentioned it. Perhaps "each ability is only calculated once" can be split apart as well. The original test just checked strength, but the current test checks that each ability is calculated once. Individual tests for each ability would be helpful for similar reasons.
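A sketch of how that split could look, reusing the per-ability pattern from above (names assumed for illustration, not taken from canonical data):

```javascript
import { Character } from './dnd-character';

// One "calculated once" test per ability: reading the same ability twice
// must return the same value, and a failure names the ability directly.
test.each(['strength', 'dexterity', 'constitution', 'intelligence', 'wisdom', 'charisma'])(
  '%s is only calculated once',
  (ability) => {
    const character = new Character();
    expect(character[ability]).toEqual(character[ability]);
  },
);
```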
This is an issue that affects multiple tracks. I've also seen it in Clojure.
Personally, I would prefer it if the presentation of the tests were standardized. Each test should return a JSON object based on the canonical data from problem-specs. It would include things like the name of the tested function, the description of the test, input arguments, expected result, etc. Then the browser would simply have to render the JSON (see the sketch below).
This does require a lot of work, though, and may even be impossible to implement on some tracks, but it would make the presentation more uniform and useful.
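As an illustration, such a per-test JSON object might look like this (the field names and values are made up for the example, not an existing Exercism format):

```json
{
  "exercise": "dnd-character",
  "test": "random ability is within range",
  "function": "Character.strength",
  "input": {},
  "expected": "an integer between 3 and 18",
  "actual": 21,
  "status": "fail"
}
```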