I’m excited to share that I’ve submitted a pull request for a new practice exercise called “Mazy Mice”!
This exercise challenges students to generate “perfect” mazes with a single solution path, using box-drawing characters for the output. It involves algorithm design for maze generation and handling seeded randomness.
I’ve included a canonical-data.json file with various test cases, including specific maze outputs for given seeds and error handling.
I’d love to get your feedback on the exercise description, instructions, and the canonical data. Please let me know if you have any thoughts or suggestions!
While Mazy Mice is on your mind for a future review, I wanted to raise a point that’s come up: how best to accommodate different implementation approaches across tracks. Specifically, for some languages, it might be more idiomatic for students to return a data structure (e.g., List[List[CellType]] in Scala) rather than a fully formatted string maze. This would allow them to focus on the generation logic itself.
For instance, a Scala 3 approach might look something like this:
// Enum representing elements in the maze's character grid
// (illustrative: a full version would include all specific wall/corner/T-junction types)
enum MazeElement:
  case PATH_SPACE      // e.g., ' '
  case WALL_HORIZONTAL // e.g., '─'
  case WALL_VERTICAL   // e.g., '│'
  case WALL_CORNER_TL  // top-left corner, e.g., '┌'
  // ... other specific wall types like WALL_CORNER_TR, WALL_T_LEFT, etc.
  case ENTRANCE        // represents the entrance marker
  case EXIT            // represents the exit marker

// Function signature for generating the maze structure.
// 'rows' and 'cols' describe the room grid; the output grid is (2*rows+1) x (2*cols+1).
def generateMaze(rows: Int, cols: Int, seed: Option[Int] = None): List[List[MazeElement]] =
  // ... logic to return the structured maze ...
  ???
(A complete MazeElement enum would detail all components corresponding to the box-drawing characters and path/entry/exit points, allowing a track to map these to characters if needed or test the structure directly.)
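To make that mapping concrete, a minimal rendering sketch over just the cases listed above might look like this (illustrative only; the entrance/exit marker characters are a per-track choice, not something defined by the PR):

// Illustrative only: render the partial MazeElement sketch back to characters.
// A complete enum would make this match cover every wall/corner/T-junction case.
def render(maze: List[List[MazeElement]]): String =
  maze
    .map(_.map {
      case MazeElement.PATH_SPACE      => ' '
      case MazeElement.WALL_HORIZONTAL => '─'
      case MazeElement.WALL_VERTICAL   => '│'
      case MazeElement.WALL_CORNER_TL  => '┌'
      case MazeElement.ENTRANCE        => '>' // marker character is a track decision
      case MazeElement.EXIT            => '>' // could differ from the entrance marker
    }.mkString)
    .mkString("\n")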
My main questions, when you have a moment to consider, are:
Should the canonical description.md somehow acknowledge that some tracks might expect a data structure return, even though the description itself will visually showcase the string representation?
Would it be useful to include hints or guidelines for track maintainers on adapting the exercise for a data structure approach (e.g., suggestions for Enum design, how property-based tests apply)? Our current canonical-data.json is property-based and seems compatible.
This also relates to the ongoing thought about ensuring the visual examples in description.md are as robust as possible (e.g., using > for the entrance/exit arrows). Even if some tracks don’t output strings, the description.md is the primary visual reference for the maze’s structure and components.
It’s worth noting that, if I recall correctly, UTF-8 didn’t work very well on the AWK track until the locale settings in the Docker container got updated. There is potential for UTF-8 not working “out of the box” on other tracks.
Batch, Cairo, and probably the assembly languages would have some level of difficulty, but I think it’ll be mostly fine. Batch could swap out UTF-8 for similar ASCII characters. Cairo could do a two-dimensional array of enum variants as well. Both could be surfaced in instructions appends.
The Mazy Mice PR has been progressing nicely, but one thing we’ve struggled with is how random the generator should be. In Mazy Mice by rabestro · Pull Request #2312 · exercism/problem-specifications · GitHub, senekor argues that allowing students to generate fully random mazes would make it harder for tracks to implement the exercise. They suggest instead picking a maze-generation algorithm so that the canonical tests can be much more specific. I’d be curious to hear everyone’s thoughts, but mostly the maintainers’.
Testing the suggested properties (“there is a single path from the entrance to the exit” and the like) makes writing the tests a good TDD exercise in its own right, one that requires a load of concrete test cases. Generating such property tests automatically is impossible, I’d say.
Testing for “uses randomness” (“if the seed parameter is omitted, random mazes should be generated”) is quite impossible, as we often discuss in the forum. Suggesting doing so in problem specifications is a bit weird…
So, if we can’t have concrete test cases provided by problem specifications, we could simply provide a textual description with no canonical-data.json at all (we have a few such exercises, like dot-dsl).
Testing for “uses randomness” (“if the seed parameter is omitted, random mazes should be generated”) is quite impossible…
I use this pattern:
a = generateMaze(x, y)
b = generateMaze(x, y)

asserts:
  a should not be equal to b
  a is a perfect maze of size (x, y)
  b is a perfect maze of size (x, y)
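In Scala, for example, that pattern might look roughly like the following sketch; generateMaze and isPerfectMaze are hypothetical names used only for illustration, not part of the canonical data:

// Hypothetical track-side API; the signatures are assumptions for this sketch.
def generateMaze(rows: Int, cols: Int, seed: Option[Long] = None): String = ???
def isPerfectMaze(maze: String, rows: Int, cols: Int): Boolean = ???

// The pattern above, expressed with plain assertions (any test framework works).
def checkUnseededRandomness(rows: Int, cols: Int): Unit =
  val a = generateMaze(rows, cols)
  val b = generateMaze(rows, cols)
  assert(a != b, "two unseeded mazes should (almost always) differ")
  assert(isPerfectMaze(a, rows, cols), "a must be a perfect maze of the requested size")
  assert(isPerfectMaze(b, rows, cols), "b must be a perfect maze of the requested size")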
I have written such tests for Java and AWK.
I am quite sure it is possible for other languages.
It is an option. Meanwhile, I’m using canonical-data.json to feed an AI to generate test cases for the AWK track (we don’t have a dedicated generator), and I can confirm the AI is smart enough to produce good results. For this particular exercise it may be a little bit tricky, though.
Just in case, here is how I used AI to generate tests for the Scala track: after migrating to Scala 3, the old generator broke and no longer works, so I just ask the AI. I can feed it canonical-data.json or, failing that, the Java tests, and it generates 100% valid tests.
So, we could keep canonical-data.json just as input for AI, or fall back to a textual description.
Note, sometimes a randomizer does return the same value twice in a row ;) If the user does something like randomizer = new Random(seed=getRandom()) then there’s a chance that this would fail.
Anything is possible. That doesn’t mean it’s a reasonable expectation or ask for maintainers to do this.
If an AI is needed to generate tests, that is a bit of a red flag with regard to the test complexity.
I haven’t seen a convincing argument against my idea of making the whole thing deterministic:
pick an interesting maze-generation algorithm
define the set of edge cases we need to test against for implementing that particular algorithm
find mazes where generating them with that algorithm isolates the edge cases
take the list of “random” decisions the algorithm needs to make to produce a maze as input to the test case
(optional) sprinkle randomness-vibes on top by including a test case where the decisions of the algorithm are actually generated randomly
benefits:
higher-quality tests
possible to use existing test generators (instead of AI)
Students are taught the importance of separating the functional core of their software from any randomness / IO to improve testability (see the sketch after this list).
Tracks are more likely to actually implement the exercise, as there is no hurdle of writing tests manually or reviewing AI generated ones.
mix of benefit and downside:
Students cannot cheese the exercise by trivially generating boring-to-solve mazes, but they also cannot choose to implement different maze generation algorithms they might be interested in.
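As a rough sketch of that testability point (the names and types below are mine, not anything from the PR): the pure core takes the decision sequence as data, and a thin wrapper supplies it from a seeded generator.

import scala.util.Random

// Pure core: every "random" decision arrives as data, so a test case can pass
// a fixed list of decisions and get a fully deterministic maze.
def generateMaze(rows: Int, cols: Int, decisions: Iterator[Int]): List[List[Char]] = ???

// Thin shell: ad-hoc use feeds the core from a seeded random number generator.
def generateRandomMaze(rows: Int, cols: Int, seed: Long): List[List[Char]] =
  val rng = new Random(seed)
  generateMaze(rows, cols, Iterator.continually(rng.nextInt(4)))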
I like the determinism in this approach. However, would that result in an overly prescriptive requirement? Most exercises are pretty agnostic to how you go about solving them.
That’s true, but I think it makes sense for this exercise. Firstly, there is already the exercise about the Sieve of Eratosthenes, so it’s not unprecedented to have exercises about specific algorithms. But more importantly, coming up with an interesting maze-generation algorithm is difficult. If users come up with their own algorithm, there’s a good chance it’ll be boring and/or produce boring puzzles. The instructions in the PR even link to Wikipedia, nudging users to pick an interesting algorithm to implement. So, if we want users to pick an existing, well-documented algorithm anyway to make the exercise fun, why not make that decision for them and massively improve the tests in the process?
X_0 is the supplied seed (it appears as an input field in canonical-data.json).
Each X_n specifies a direction, using X_n * 4 div 134456:
  0: right
  1: up
  2: left
  3: down
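In code, that mapping is just an integer division (a sketch; the recurrence that produces the X_n sequence is not repeated here):

// Maps a value X_n, assumed to lie in [0, 134456), to a direction index.
def directionIndex(xn: Int): Int =
  xn * 4 / 134456 // 0 = right, 1 = up, 2 = left, 3 = down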
Maze Generation
The maze is generated using randomized depth-first search (backtracking algorithm).
The width and height of the grid of cells is specified in the input.
We start at the top left corner and choose the next cell to visit by generating a random direction. Whenever the selected direction is not valid (because it would take us outside the grid or to a previously visited cell), we generate another random direction. If all neighbouring cells have been visited, we go back to the most recently visited cell having an unvisited neighbour. We repeat until no cells have an unvisited neighbour.
Note something that should be made clearer: we don’t generate a random direction when a cell has no unvisited neighbours, but otherwise we always do generate one, even when there is only one unvisited neighbour, in which case it might take several attempts to find the only valid direction.
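For reference, here is a rough sketch of that procedure (my own illustration, not the PR's reference implementation). It consumes directions from an iterator, so the same code works with a seeded generator or with a fixed list of decisions:

import scala.collection.mutable

enum Direction:
  case Right, Up, Left, Down

// Sketch of the randomized depth-first search described above. The iterator is
// assumed to supply enough directions (an endless seeded stream, or a
// sufficiently long fixed list). The result maps each cell to the neighbouring
// cells its passages connect to.
def carvePassages(rows: Int, cols: Int,
                  decisions: Iterator[Direction]): Map[(Int, Int), Set[(Int, Int)]] =
  val visited  = mutable.Set((0, 0)) // start in the top-left cell
  val passages = mutable.Map.empty[(Int, Int), Set[(Int, Int)]].withDefaultValue(Set.empty)
  val stack    = mutable.Stack((0, 0))

  def neighbour(cell: (Int, Int), dir: Direction): (Int, Int) =
    val (r, c) = cell
    dir match
      case Direction.Right => (r, c + 1)
      case Direction.Up    => (r - 1, c)
      case Direction.Left  => (r, c - 1)
      case Direction.Down  => (r + 1, c)

  def inGrid(cell: (Int, Int)): Boolean =
    cell._1 >= 0 && cell._1 < rows && cell._2 >= 0 && cell._2 < cols

  while stack.nonEmpty do
    val current = stack.top
    val hasUnvisitedNeighbour = Direction.values.exists { d =>
      val n = neighbour(current, d)
      inGrid(n) && !visited(n)
    }
    if !hasUnvisitedNeighbour then
      stack.pop() // backtrack: no random direction is drawn here
    else
      // Keep drawing directions until a valid one comes up, even if only a
      // single neighbour is still unvisited (this matches the note above).
      val next = Iterator
        .continually(decisions.next())
        .map(neighbour(current, _))
        .find(n => inGrid(n) && !visited(n))
        .get
      passages(current) += next
      passages(next)    += current
      visited += next
      stack.push(next)

  passages.toMap

This sketch only builds the passage graph between cells; rendering it into the (2*rows+1) x (2*cols+1) character grid with the entrance and exit markers would be a separate step.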
Always using a fixed scheme to generate the decisions makes it hard to find a seed that produces a maze that’s well suited for testing a specific edge case. I think it would be much easier to simply store the decisions as an explicit list, at least for the bulk of tests that check for known edge cases on relatively small puzzles.
It would be reasonable to use the random number generator for later tests that involve larger mazes. The two can be mixed.
The list of decisions could also be a string where each character is a decision, if a smaller encoding is desired.
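For example, a test case could spell each decision as one character and decode it back to the 0-3 direction indices used above; the letters here are an arbitrary illustration, not anything agreed in the PR:

// Hypothetical encoding: one character per decision, decoded to the
// direction indices 0 = right, 1 = up, 2 = left, 3 = down.
def decodeDecisions(encoded: String): List[Int] =
  encoded.iterator.map {
    case 'R' => 0
    case 'U' => 1
    case 'L' => 2
    case 'D' => 3
    case other => throw new IllegalArgumentException(s"unknown decision: $other")
  }.toList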
The maze is generated using randomized depth-first search (backtracking algorithm).
I have no strong objections to that, but I think it’s worth discussing a few options for the maze-generation algorithm. We want the algorithm to be as interesting as possible to implement. The difficulty should also be considered.
For example, I think Wilson’s algorithm would be more interesting to implement. It also has the neat property that it “generates an unbiased sample from the uniform distribution over all mazes” (if the decisions made are actually random).