Order tests for Test Driven Development

TL;DR: The order of tests in an exercise is relevant if students are to apply / understand Test First / Test Driven Development. Do we (want to) respect this in problem specifications?

I stumbled upon this question while working on minesweeper. There, the problem specification defines tests in an order that is inappropriate for TDD: the 1-dimensional tests should come before any 2-dimensional test to support TDD well. But I did not find any hint that the order in problem specifications is in any way related to the order in which the tests should be passed for TDD. And while discussing the test generators, the order of the tests was never mentioned either.

In many places, Exercism claims to follow “Test Driven Development”. Python even has a track document on the topic. I think TDD is a good thing to learn. As TDD is a process, not a static thing, I would prefer to call what Exercism actually does “Test First”. But let’s stick with TDD to name the goal.

Exercism also claims to focus on language fluency, not proficiency. And TDD is, as I understand it, an advanced topic of proficiency, but also very much language-agnostic. So it is not the focus of Exercism to teach TDD, but it is the way of thinking we use to provide a learning experience. In other words: we need students to understand TDD at least in part to keep our focus on fluency in languages.

The audience of Exercism is: students who are new to a language but not new to programming in general (I lack a link to a source that defines that. I know there is one…). As TDD is still not broadly adopted and far from being a core component of “programming in general”, we can expect little knowledge about TDD in our audience. This also shows in many forum threads where people have no idea “where input comes from” or “what all these errors shall tell them”.

Students with little or no knowledge of TDD must put a lot of effort into understanding the TDD process on top of the effort of learning their language of choice. Many are also new to Exercism, which adds learning how to use the Exercism website. So we should reduce the effort of learning the relevant TDD concepts as much as possible.

Students only really need to understand one part of TDD for Exercism: “Write a failing test, then make it pass”. This constant refinement, from general to specific tests and from specific to generally applicable production code, is key to solving the exercises. But it is hard to grasp that step-by-step concept from what we present to them.

This step-by-step refinement is nearly impossible to understand because we already have all tests written. The consequences of having all tests done are: many tracks need the code to compile, so at least all required function signatures have to be defined. And in general all tests are run at once, and the result is a wall of failures.

It is advanced TDD knowledge to decide which one of all the failing tests should be made to pass first. No student should have to make such a decision. Instead, the first test should be satisfied first. This would be easy to communicate, aligns with the idea of the “task id” for the syllabus and greatly improves the usability of the test feedback.

There are languages / testing frameworks that could support the iterative refinement of TDD even better, despite all tests being written up front. E.g. PHPUnit can stop at the first failing test, so the wall of errors is avoided. That could help students understand the process of TDD as step-by-step guidance towards their solution.
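For PHP, a hedged sketch of what that could look like: PHPUnit’s `stopOnFailure` configuration setting (also available as the `--stop-on-failure` CLI flag) aborts the run at the first failure, so students only ever see the one test they should make pass next. The test file name here is invented for illustration:

```xml
<!-- phpunit.xml sketch: abort the test run at the first failing test,
     so students see only one failure at a time instead of a wall of errors -->
<phpunit stopOnFailure="true">
    <testsuites>
        <testsuite name="exercise">
            <file>MinesweeperTest.php</file>
        </testsuite>
    </testsuites>
</phpunit>
```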

But in the end, all possible improvements require a sensible order of tests for TDD. Which we currently don’t have. We have a means of grouping by theme (scenarios and titled cases arrays), but these are in no way related to ordering. The only order we have is the order of test cases in the cases arrays. And that is not meant to be an order for TDD.

Is there another mechanism I didn’t see yet? Should we try to have an order of tests for TDD?

I’m not sure this is true. Most tracks do not teach the syntax of the language. If the student is entirely new to a language, solving exercises in that language without a guide to teach the language basics can be extremely difficult.

Thanks for this.

I have one key question of importance (that we should discuss) and two points probably not worth discussing.

Relevant point

Do we specify anywhere what this is meant to be an order for? My suspicion would be that the first specs were ordered from a TDD perspective. I see no reason these shouldn’t be ordered for TDD unless there’s a reason I’m unaware of.

Probably irrelevant points

I’d disagree with this. I think TDD is a way to conceptualise how to solve problems with code. The nature of writing tests and then making each pass is an explicit way of breaking a larger problem down into sub-problems and solving each one. I think that is very much a part of fluency, not proficiency. But I suggest we don’t bikeshed on this as it’s not relevant to the core of the conversation :slight_smile:

This is how the online editor works FYI. But it’s much harder to do this on the CLI route.

See the syllabus docs for examples of how someone totally unaware of the syntax can be introduced to the language.

The ordering gets disrupted / contradicted by groups with a theme (titled cases inside the main cases array). There is no explicit statement in the docs that the order in the cases array has to be the order for TDD. And as pointed out with the minesweeper example, the order in fact is not well suited for TDD.

I don’t instinctively know what a good order for TDD would be here. There’s a risk that a specific ordering introduces an implementation bias (e.g. searching for mines vs searching for spaces). Do you have a strong feeling on what a good ordering would be?

I guess my point is: if we agreed on what the order should be, could we not just reorder the tests here? Then, when tracks update, we’d have a good ordering for TDD. (I appreciate the subgroupings might make that hard in other places, but let’s discuss this “simpler” case first.)

I found that the test "mine surrounded by spaces" requires completely solving the exercise. Before that, returning the input unchanged was enough. To pass that test, you have to iterate over both dimensions and do all the mine counting around every space to fill in the numbers. All cases following it just passed afterwards. A clear sign of a wrong test order.

Moving the test cases "horizontal line" and "vertical line, mines at edges" before that test would require iterating over one dimension first, then over the other. The student’s decision about “counting around spaces or mines” is not touched by that.

Also consider that replacing test cases needs to take the order into account if the sequence in cases is relevant: reimplements cases cannot just be appended to the end of cases as is done, e.g., in list-ops.
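For illustration, a hedged sketch of how a replacement case is marked in canonical-data.json: the "reimplements" field references the uuid of the case it supersedes, so test generators can prefer the newer entry, but the entry itself may sit anywhere in the array (the uuids and values below are invented):

```json
[
  {
    "uuid": "00000000-0000-0000-0000-000000000001",
    "description": "horizontal line",
    "property": "annotate",
    "input": { "minefield": [" *  * "] },
    "expected": ["1*11*1"]
  },
  {
    "uuid": "00000000-0000-0000-0000-000000000002",
    "reimplements": "00000000-0000-0000-0000-000000000001",
    "description": "horizontal line",
    "property": "annotate",
    "input": { "minefield": [" *  * "] },
    "expected": ["1*11*1"]
  }
]
```

If the sequence of cases is meant to carry TDD ordering, a replacement appended at the end (as in list-ops) silently breaks that ordering.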

If there is no clear documentation about the relevance of the order, any system of ordering will not be used. Thus the question: “Do we (want to) respect this in problem specifications?” In my opinion, we currently don’t. If we want to, we need to specify the system of ordering. If there already is one, we need to document it. And apply it.

I wonder if it would be helpful/possible to record how many times each test fails, and/or passes while some prior test fails, each time a student presses the run-tests button. Such stats would probably be useless should, say, PHPUnit quit halfway through, unless it could somehow be persuaded to carry on in “quiet mode” (and they would need wiping whenever the test set is updated).

I fully agree with just manually reordering canonical-data.json, btw.
Every time I’ve encountered nested cases in it, I’ve just added extra loops to flatten them out and simply thrown that grouping information away.
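A minimal sketch of that flattening, assuming the problem-specifications convention that a node containing a "cases" key is a titled group and everything else is a leaf test case; the sample data below is invented:

```python
# Hypothetical sketch: flatten the nested "cases" groups of a
# canonical-data.json structure, keeping leaf test cases in document
# order and discarding the grouping information.

def flatten_cases(cases):
    """Yield leaf test cases in document order, ignoring group nesting."""
    for case in cases:
        if "cases" in case:                  # a titled group of sub-cases
            yield from flatten_cases(case["cases"])
        else:                                # a real test case
            yield case

# Invented sample data mimicking the canonical-data.json shape.
data = {
    "cases": [
        {"description": "no mines", "expected": []},
        {"description": "edge cases",        # grouping node, not a test
         "cases": [
             {"description": "mine surrounded by spaces", "expected": ["..."]},
         ]},
    ],
}

print([case["description"] for case in flatten_cases(data["cases"])])
# prints ['no mines', 'mine surrounded by spaces']
```

Note that flattening preserves whatever order the leaf cases already have, so a TDD-friendly ordering would survive it; only the thematic grouping is lost.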