Tips for testing a test runner wanted

I have been working on upgrading the Guile interpreter used in the Scheme test runner. I have already revised the Dockerfile, scripts, etc. to accommodate the differences between the two versions of the interpreter. I now want to make sure that the revised test runner runs flawlessly.

I have given some tests to the revised runner. Each test feeds a solution to the test runner and compares the JSON output from the runner with the expected output (a sketch of such a check follows the list below). The scenarios in the test set are:

  1. The solution passes the exercise.
  2. The solution fails some tests but does not cause execution errors.
  3. The solution has syntax errors, which cause an execution error.
  4. The solution file is empty.
  5. The name of the uploaded solution is different from the expected one.
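
For reference, each of my checks boils down to a small golden-file comparison like the sketch below. This assumes the third-party guile-json library is installed, and the file names are placeholders:

```scheme
(use-modules (json)) ; third-party guile-json library (assumption)

(define (read-json path)
  (call-with-input-file path json->scm))

(define (results-match? expected-path actual-path)
  ;; Compare parsed values rather than raw text so whitespace
  ;; differences do not cause false failures. Note: guile-json maps
  ;; JSON objects to alists, so a different key order still counts
  ;; as a mismatch; canonicalize keys first if that matters to you.
  (equal? (read-json expected-path) (read-json actual-path)))

(unless (results-match? "expected_results.json" "results.json")
  (error "results.json does not match the expected output"))
```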

Are there any other scenarios I should consider? Also, should I test the runner in a different way?

I would appreciate your help.

Thanks!

This is a really good list of scenarios.

In the JavaScript and TypeScript test runner we have tests for:

  1. The solution passes the exercise.
  2. The solution fails some tests but does not cause execution errors.
  3. The solution has syntax errors, which cause an execution error.
  4. The solution file is empty.
  5. The name of the uploaded solution is different from the expected one.
  6. The config.json file is missing (may be irrelevant for you; we had to support older exercises that did not have this yet).
  7. Tests for different flags in config.json (e.g. using a different route, enabling/disabling tests). This is probably irrelevant for you.

In particular we also test the following properties:

  • The output is generated.
  • The output has the expected keys and values.
  • The output formats messages correctly.
  • The output can generate the correct “task id” per failing/passing test.

The last four items are all about the output format. If you are generating version 1 output, fewer of these apply. You can even decide to generate version 1 output when version 3 is not possible, and version 3 in all other cases.
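
For concreteness, here is a rough sketch of what a version 2/3 results.json payload can look like, expressed in Guile with the guile-json library (an assumption on my part; the test name is a placeholder, and the field names follow the Test Runner Interface docs, so double-check the current spec):

```scheme
(use-modules (json)) ; guile-json: alists become objects, vectors become arrays

(define results
  `(("version" . 3)                     ; version 2 omits task_id below
    ("status" . "fail")                 ; "pass" | "fail" | "error"
    ("message" . null)                  ; top-level message, used on "error"
    ("tests" . #((("name" . "two-fer: no name given") ; placeholder name
                  ("status" . "pass")
                  ("task_id" . 1)       ; version 3 only
                  ("output" . null)
                  ("message" . null))))))

(display (scm->json-string results))
```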

@SleeplessByte Thank you very much for the information. I will look into each of the extra items you described.

Thanks again!!

In case you have not found this yet, the format is described on the website: The Test Runner Interface | Exercism's Docs.

Most tracks also have a smoke test that actually runs Docker (building the image and mounting the directories into the container) so it fails in case there is something wrong with that!
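
Sketched in Guile for consistency (a shell script is more typical), such a smoke test boils down to something like this; the image name, mount paths, and entry arguments below are placeholders, so adjust them to your runner:

```scheme
;; Run the built image roughly the way the platform would,
;; then fail loudly if the container exits non-zero.
(let ((status (system* "docker" "run" "--rm" "--network" "none"
                       "-v" "/tmp/solution:/solution"
                       "-v" "/tmp/output:/output"
                       "exercism/scheme-test-runner" ; placeholder image name
                       "two-fer" "/solution" "/output")))
  (unless (zero? status)
    (error "docker run failed with status" status)))
```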

I double-checked the JavaScript track for you and saw the tests also include:

  • a test that skipped tests get “unskipped” (remember, normally only the first test is unskipped for practice exercises),
  • a test that checks the log command (if supported) actually works (i.e. output is piped to the per-test output key; see the capture sketch after this list),
  • the previously mentioned task_id test,
  • all the other tests mentioned before.
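
For the log/output item, the underlying mechanism is capturing stdout per test. A minimal Guile sketch (the function name is my own):

```scheme
(define (run-capturing thunk)
  ;; Returns two values: the thunk's result and the text it
  ;; printed to stdout while running, ready to be placed in
  ;; that test's "output" key.
  (let* ((result #f)
         (output (with-output-to-string
                   (lambda () (set! result (thunk))))))
    (values result output)))

;; e.g. (call-with-values
;;        (lambda () (run-capturing (lambda () (display "hi") 42)))
;;        list) ; => (42 "hi")
```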

I hope this helps!

In the PHP track, we also had to add a test for student code using “global statements”, where the code was syntactically correct but produced fatal errors at runtime (execution did start, but the tests did not execute).
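
A Scheme analogue of this scenario might be a solution like the following sketch: syntactically valid, but failing as soon as the file is loaded, before any test executes. The runner should report an execution error rather than hang:

```scheme
;; Parses fine, but raises a numerical error at load time.
(define answer (/ 1 0))
```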

Depending on the system capabilities you have, add scenarios where students produce invalid Unicode output (like printing binary values without stringification) or produce valid but invisible characters (ASCII control characters, especially 0x7F DEL). Invalid Unicode produces invalid JSON, and invisible characters are a nightmare for students.
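
As a sketch of one way to handle the invisible-character part (the name is my own; invalid UTF-8 has to be dealt with at the byte level before decoding, which this does not cover):

```scheme
(define (sanitize-output str)
  ;; Replace ASCII control characters (keeping tab, newline, and
  ;; carriage return) and 0x7F DEL with a visible placeholder
  ;; before embedding captured output in the results JSON.
  (string-map
   (lambda (ch)
     (let ((code (char->integer ch)))
       (if (or (= code 127)                        ; DEL
               (and (< code 32)
                    (not (memv code '(9 10 13))))) ; keep \t \n \r
           #\?
           ch)))
   str))
```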

Also do not underestimate the problems when combining scenarios. I learned a lot about PHP and the testing framework when edge cases came together.

I also recommend adding test scenarios for the limits in Exercism's test runner interface JSON:

  • The top-level message value is limited to 65535 characters; the effective maximum length is lower if the value contains multibyte characters.

  • The tests MUST be returned in the order they are specified in the tests file.

  • On per-test output:

    The output must be limited to 500 chars. Either truncating with a message of “Output was truncated. Please limit to 500 chars” or returning an error in this situation is acceptable. (A truncation sketch follows this list.)
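
For the truncation option, a minimal sketch (constant names are my own; this reserves room for the notice so the total stays within the limit):

```scheme
(define %output-limit 500)
(define %notice "Output was truncated. Please limit to 500 chars")

(define (truncate-output str)
  ;; Keep short output as-is; otherwise cut it down so that the
  ;; truncated text plus the notice fits in %output-limit chars.
  (if (<= (string-length str) %output-limit)
      str
      (string-append
       (substring str 0 (- %output-limit (string-length %notice) 1))
       "\n" %notice)))
```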

@SleeplessByte Thanks for the additional information. I have gone over the Test Runner Interface document before, but I will read it again.

The Scheme track also has a run-in-docker script, and it is run in ci.yml. I will check the other items, too.

Thanks!!

@mk-mxp Thanks for your advice. I will put the items you mentioned in my to-do list.

Thanks again!!