Advice for writing a test generator

I’m in the process of writing a test generator for the Rust track. I’m sufficiently far along that I think it would be great to get some input from people who may have experience with this.

Here’s what I’ve done already:

  • I’ve got Rust data structures for things like the track config.json and exercise .meta/config.json, so I can easily test their structure and manipulate them (rough sketch after this list).
  • I have a small CLI tool to add an entry in the track config for a new practice exercise, with slug, name and difficulty (the only required properties I think).
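
Roughly the kind of structure I mean, as a minimal serde sketch (field names follow the track config.json; everything else is simplified, and the file path is an assumption):

use serde::{Deserialize, Serialize};

// Simplified model of one practice-exercise entry in the track config.json.
// Only the fields mentioned in this post are included; the real file has more.
#[derive(Debug, Serialize, Deserialize)]
struct PracticeExercise {
    slug: String,
    name: String,
    difficulty: u8,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string("config.json")?;
    // Parse the whole file generically, then pull out just the practice exercises.
    let config: serde_json::Value = serde_json::from_str(&raw)?;
    let practice: Vec<PracticeExercise> =
        serde_json::from_value(config["exercises"]["practice"].clone())?;
    println!("{} practice exercises in the track config", practice.len());
    Ok(())
}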

And here’s what I’m planning to do:

  • shell out to configlet sync to generate part of the skeleton.
  • write a program to generate the Rust-specific program skeleton
  • use the local cache maintained by configlet of the problem-specifications to read test data and generate tests based on that.
  • the generated test cases defer to a separate function implemented by the exercise author. This function may do some data massaging, since we’re getting JSON from the problem spec. We don’t want the students to have to output JSON, and we don’t want the author to have to manually adjust every test case (a sketch follows this list).
  • I want it to be easy and fast to sync test suites with updates from problem-specifications. So I intend to add some comment marker in the test file (e.g. “// below here is generated code, do not edit!”)
  • custom tests, which cannot be upstreamed for some reason, can be added above that line.
  • the program for generating a new exercise and the one for syncing with problem-specifications share most of the code. The syncing workflow simply skips overwriting gaps already filled in by the exercise author.
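
To make the deferral idea concrete, a rough sketch of what a generated test file could look like (all names here are placeholders, not a final design):

use serde_json::{json, Value};

// custom tests that can't be upstreamed would live up here

// below here is generated code, do not edit!

#[test]
fn empty_input() {
    run_case(json!({ "input": "", "expected": 0 }));
}

// The exercise author fills in this adapter once: it massages the
// canonical-data JSON into a call to whatever the student implements,
// so students never have to read or produce JSON themselves.
fn run_case(case: Value) {
    let input = case["input"].as_str().unwrap();
    let expected = case["expected"].as_u64().unwrap() as usize;
    assert_eq!(letter_count(input), expected);
}

// stand-in for the function the student implements
fn letter_count(input: &str) -> usize {
    input.chars().count()
}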

The ideal workflow I’m aiming at is one command, one commit, one PR, and the whole repository is perfectly in sync with problem-specifications. Adding a new exercise should take only as much effort as filling in the gaps that cannot be automated, with the tooling pointing out what those gaps are.

What do experienced people think about this plan? Am I missing something? Any stumbling blocks to look out for?

I think you are on a good track.

The only thing that I would change is the order of the tests. If I have understood you correctly, you would have the “special Rust-related” tests first and then the “standard” test cases.

To my knowledge, the way (other) test runners work is from top to bottom, stopping when the first test fails. That might lead to exercises where the edge cases are tested first and the general ones afterwards. For TDD, that might make it hard for students to solve exercises one step at a time.

1 Like

Thanks! That makes sense, I’ll make sure custom tests can be added at the bottom. Shouldn’t be much harder to implement.

Cargo runs tests in parallel, but we still want students to go through the tests from top to bottom, un-ignoring the tests one by one. So the point is important.
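
For reference, the convention looks like this in a test file; students remove the #[ignore] attributes one by one as they work through the suite (the function here is just a stand-in):

#[test]
fn first_case() {
    assert_eq!(letter_count(""), 0);
}

#[test]
#[ignore]
fn second_case() {
    // delete the #[ignore] line above when moving on to this test
    assert_eq!(letter_count("abc"), 3);
}

// stand-in for the function the student implements
fn letter_count(input: &str) -> usize {
    input.chars().count()
}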

1 Like

I think you are off to a good start! :smile:

The UUID of the exercise is also required info in the track config.json. config.json spec here.

A few thoughts:

  1. Practice exercises must have a tests.toml file. See this from the configlet docs for the details (sample one from Python here). See also the practice exercise spec for other requirements.

    Not every test case for every exercise can be implemented in every track, so it’s important to periodically review incoming test cases to see if they “make sense”, and set the ones that don’t to include = false. And when a change is made to an existing test case in prob. specs, it is important to look for the reimplements = <UUID> key. (A rough Rust sketch of reading these keys follows this list.)

  2. Because stuff … happens … I strongly recommend a CI workflow that includes validating existing auto-generated test files against their newly generated versions to ensure things didn’t unexpectedly change/get mutated. Some examples here of what we’ve done on the Python track.

  3. Also recommend testing all example solutions against all test files, in case examples need to change when test cases change.

  4. Because you are already going to have to read the prob-specs JSON, I recommend that additional/custom test data for the track also be written in JSON, so you don’t have to write additional code for processing. An example from Python here

  5. Recommend that you have an option of using a local clone of Problem Specs, in case you are in a situation where you can’t (or don’t want to) connect to GH. Sounds silly, but it’s saved me on several occasions. :smile:
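
Since you’re already working in Rust, the keys from point 1 boil down to something like this sketch (toy UUIDs; it assumes the one-[uuid]-table-per-case tests.toml layout plus the toml and serde crates):

use std::collections::BTreeMap;
use serde::Deserialize;

// One [uuid] table in tests.toml; only the keys mentioned above are modeled.
#[derive(Debug, Deserialize)]
struct CaseFlags {
    description: String,
    include: Option<bool>,
    reimplements: Option<String>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Toy example of the file format (real files use canonical-data UUIDs as keys).
    let toml_src = r#"
        [11111111-1111-1111-1111-111111111111]
        description = "original case"
        include = false

        [22222222-2222-2222-2222-222222222222]
        description = "updated case"
        reimplements = "11111111-1111-1111-1111-111111111111"
    "#;
    let cases: BTreeMap<String, CaseFlags> = toml::from_str(toml_src)?;

    for (uuid, flags) in &cases {
        if flags.include == Some(false) {
            continue; // the track opted out of this canonical case
        }
        if let Some(old) = &flags.reimplements {
            println!("{uuid} replaces {old}; drop the old test");
        }
        println!("generate a test for {uuid} ({})", flags.description);
    }
    Ok(())
}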

1 Like

Thank you, this is very helpful! A couple of follow-up questions:

Because stuff … happens … I strongly recommend a CI workflow that includes validating existing auto-generated test files against their newly generated versions to ensure things didn’t unexpectedly change/get mutated.

Just to make sure I understand this correctly: the intent is to prevent humans from editing tests that were auto-generated? I’m guessing that’s what the --check flag here does:

bin/generate_tests.py --verbose -p .problem-specifications --check
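
If that’s the idea, I imagine my generator’s check mode would just regenerate in memory and diff against what’s on disk, roughly like this sketch (the path layout and names are only illustrative):

use std::{fs, path::Path, process};

// Compare a freshly generated test file against the copy on disk.
fn is_up_to_date(slug: &str, generated: &str) -> std::io::Result<bool> {
    let path = Path::new("exercises/practice")
        .join(slug)
        .join("tests")
        .join(format!("{slug}.rs"));
    let on_disk = fs::read_to_string(path)?;
    Ok(on_disk == generated)
}

fn main() -> std::io::Result<()> {
    // In a real run, `generated` would come from the generator itself.
    let generated = String::new();
    if is_up_to_date("some-exercise", &generated)? {
        Ok(())
    } else {
        eprintln!("some-exercise: test file is out of date, re-run the generator");
        // non-zero exit fails the CI job
        process::exit(1)
    }
}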

Luckily, checks that make sure the example solutions pass all tests are already in place, so that should prevent completely broken test cases from slipping through code review.

Recommend that you have an option of using a local clone of Problem Specs, in case you are in a situation where you can’t (or don’t want to) connect to GH.

Sounds like a great idea! How are you handling that? It looks to me like the most recent version of problem-specifications is cloned in CI. Wouldn’t that break when there is an update upstream, or am I misunderstanding how generate_tests.py --check operates?

We’re doing some work that will add a configlet command to do this (not yet available though).

Are there many of these?

We’re doing some work that will add a configlet command to do this (not yet available though).

Sounds great! I might keep around what I have already until that’s ready. Then I’ll just delete it. Less code is better code.

Are there many of these?

I have no idea. The intention is just to future-proof. I like the approach in the Python repo mentioned by @BethanyG: additional tests are specified in a separate JSON file just like the problem-spec tests.

The thing is, none of the exercises in the Rust repo have a tests.toml. Running configlet sync is just a huge wall of warnings. It’s difficult to say how far the Rust repo has drifted from problem specs.

Here’s a recent example of where I was made aware of custom tests that weren’t upstreamed:

If this is possible, I really like this solution!

Having a test generator can be really helpful then. Once you have it, you’ll just regenerate the exercise (with the tests.toml synced too) and you’ll be up to date :)

1 Like

Note that configlet stashes a copy under ~/.cache/exercism/configlet/ (actually $XDG_CACHE_HOME/exercism/configlet)

1 Like

Yeah, that’s what I’ve been using for now. Bethany’s comment got me thinking about submodules. I’m still not sure, but the way I interpret the CI in the Python repo, it looks like it would break if the problem-spec repo is updated. A submodule is just what came to mind to solve this problem: one could include a pinned version of the problem-spec repo.

What I dislike about $XDG_CACHE_HOME/exercism/configlet/ is that it’s platform-specific. I don’t wanna go out of my way to support proprietary operating systems, but a submodule sounds like a clean and low-effort solution.
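
The lookup order I have in mind is roughly: pinned submodule first, then configlet’s cache as a fallback. A sketch, where the submodule path and the exact cache subdirectory are my assumptions:

use std::path::PathBuf;

// Locate a local problem-specifications checkout: a pinned submodule first,
// then the copy configlet keeps in its cache.
fn problem_specs_dir() -> Option<PathBuf> {
    // 1. A submodule checked out at the repository root (hypothetical path).
    let submodule = PathBuf::from("problem-specifications");
    if submodule.join("exercises").is_dir() {
        return Some(submodule);
    }

    // 2. configlet's cache: $XDG_CACHE_HOME/exercism/configlet (~/.cache on Linux).
    let cache_root = std::env::var_os("XDG_CACHE_HOME")
        .map(PathBuf::from)
        .or_else(|| std::env::var_os("HOME").map(|home| PathBuf::from(home).join(".cache")))?;
    let cached = cache_root.join("exercism/configlet/problem-specifications");
    cached.join("exercises").is_dir().then_some(cached)
}

fn main() {
    match problem_specs_dir() {
        Some(dir) => println!("using problem-specifications at {}", dir.display()),
        None => eprintln!("no local problem-specifications checkout found"),
    }
}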

Apologies for the delay – and for the long message!

That’s part of it, but it’s also a check against someone not having updated their branch recently, or against cases where problem specs has changed or deprecated a test case but the changes have not yet been synced to the repo.

I guess this depends on how you define “break” :wink:. When a mis-match between the existing test files in the branch and the ones generated from problem-specs is detected, an error message about re-generating the mis-matched test file(s) is printed out in the CI run. Any mis-matched test files then cause the CI to fail. When this happens, we go back and use the same script manually in “create” mode to regenerate/update the erroring test files and check them into the branch.

There are two issues with this that I hope to address at some point:

  1. If new test cases are added upstream, the tests.toml file has to be updated with them before the generation. Generation only happens for tests recorded in tests.toml. This means that occasionally, we miss new upstream test cases – although not often.
    Simply sucking in any new test case without review doesn’t quite work. I’m not sure how to handle the review of new cases in a smooth fashion that isn’t manual.

  2. This sucks if the PR in question didn’t touch any of the mismatched files, but usually I can see the error quickly and help anyone who gets confused, either by doing the update myself, having them refresh their branch, or editing their branch directly to refresh/add the files.

Often, I also use this as a reminder to do a general sync to pick up any interim updates/additions to problem specifications metadata or exercise text.

Ideally, the script would do the sync automagically by calling configlet to update tests.toml with a yes flag for any additions, but … I haven’t thought through all the steps of that yet, so it’s half-automated and half-manual :sweat_smile:

But I digress. The scenario where you would work off a local copy of problem specifications is when you are adding an exercise, adding track-specific tests, or updating a bunch of test files and want to do that from the local clone instead of re-cloning from problem specs. The script has a flag for specifying the path to the desired prob-specs repo. But you could also just use the configlet cache and instruct configlet to work offline.

FWIW, Python uses the Jinja templating system for this. Interestingly enough, the author has made a Rust port of it, repo here. It may or may not suit your needs. I know for Python we’ve tried to make a generic template that gets extended for various exercises … but that’s had very mixed adoption over time, so I am now working on trying to “normalize” our templates a bit more. Here is an example of an exercise with a template. And here is our directory with base template and macros.
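
If that port is the minijinja crate (I believe that’s the one), a tiny rendering sketch might look like this; the template and names are just toys:

use minijinja::{context, Environment};

fn main() {
    let mut env = Environment::new();
    // A toy template for a single generated test function.
    env.add_template(
        "test_fn",
        "#[test]\nfn {{ name }}() {\n    assert_eq!(letter_count({{ input }}), {{ expected }});\n}\n",
    )
    .unwrap();

    let rendered = env
        .get_template("test_fn")
        .unwrap()
        .render(context! { name => "empty_input", input => "\"\"", expected => 0 })
        .unwrap();
    println!("{rendered}");
}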

Why doesn’t it work? If the example solution passes the new test cases, it’s probably fine to include, right?

This sucks if the PR in question didn’t touch any of the mismatched files, but usually I can see the error quickly and help

Yeah, I think I want to avoid that. As I’m currently the only maintainer of the Rust track after months of nobody maintaining it, I can’t be sure there will always be someone around to help with surprises like this.

I’ll try the submodule approach to always have a pinned version locally. Thinking ahead, I could add a scheduled (cron) GitHub Action that updates the submodule when there are upstream changes. Like Dependabot, basically.

Python uses the Jinja templating system for this

Oh wow, that looks awesome. I especially like that tests generated with templates like this are probably easier to read for students than ones that do nothing but pass JSON to a different function. Definitely gonna try it out.

Again, thanks for the great insights!

1 Like

:thinking: … this is probably true. Hum. Now you have me thinking about how I can make things better in the test generator :smile: I probably won’t get to it for a bit (currently doing violence to the Python test runner…), but I certainly have some more ideas on my list. Thank you.

ooh. This. An auto sync job to pull in test case changes and warn if any example solutions fail the newly generated test files…or something like that. Hum… I am going to have to play with that!

1 Like

Unfortunately configlet has removed the option to specify a custom path to problem-specifications, which makes it impossible to pin a specific version as a submodule and use that.

49 commits later… anybody wanna volunteer to review? @ErikSchierboom :joy:

@BethanyG I hope it’s ok that I stole the Python track’s readme, it’s so beautiful :sparkles:

:tada: Nice work!!! :clap:

…and only 87 files! :joy: I would volunteer to review, but I haven’t been added to the repo. Besides, Erik will do a much better job than I could. :smile:

More than OK! Happy it’s useful to you :smile:

1 Like

I’ve approved!

1 Like