Function naming for tests

tasx · January 6, 2025, 11:22pm

Function naming for test implementations came up recently in the Clojure thread and in a stale GitHub issue. For those who haven’t been following, I’ll summarize the key points here. I look forward to hearing the opinions of other maintainers. Apologies for the length of the post. All views are much appreciated.

So, here’s what a typical test function looks like:

(deftest does-not-detect-non-anagrams-with-identical-checksum
  (testing "Does not detect non-anagrams with identical checksum"
    (is (= [] (anagram/anagrams-for "mass" ["last"])))))

The part next to deftest on the first line is the function name, and the “testing” line is optional.

Traditionally, on this track, function names are derived from the test description (shown as a string on the next line) by converting all text to lowercase and using a hyphen to separate the words. Here is how this test appears on the website:

Notice how, right below “Test 7”, the function name is displayed. I’ve seen other tracks (JS, iirc) using the test description in that position, which is something I’d like to see implemented here as well.

Unfortunately, I’m not in a position to make that change right now. I haven’t looked into the test runner, and I don’t believe I have either the expertise or the time to modify it.

In the other Clojure thread and on GitHub, I argued that displaying the function name in this way doesn’t serve much of a purpose:

It wastes time by forcing people to parse a function name that often doesn’t make much sense.
It can become very long, particularly in nested descriptions.
It is difficult to maintain manually (yes, we still write the tests manually). Often, tests are copy-pasted, leading to duplicated function names because similar descriptions are used. This is a significant issue because two identical function names cause the second function to override the first, resulting in the first test being skipped without any error notification.
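The override described in the last point can be reproduced with a minimal sketch. The namespace example.duplicate-names and the test name detects-anagram are made up for illustration; the only assumption is plain clojure.test, where deftest defines a var, so a second deftest with the same name rebinds it.

```clojure
(ns example.duplicate-names
  (:require [clojure.test :refer [deftest is run-tests]]))

;; First definition: this assertion would fail if it ever ran.
(deftest detects-anagram
  (is (= 1 2)))

;; Copy-pasted test that kept the same name. It rebinds the var,
;; so the definition above is silently discarded.
(deftest detects-anagram
  (is (= 1 1)))

;; run-tests returns a summary map with :test, :pass, :fail and :error counts.
(def summary (run-tests 'example.duplicate-names))
```

Running this reports a single passing test. The failing assertion never runs, and clojure.test itself raises no error about the lost definition.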

To address this, and given the lack of consensus on function naming conventions, I decided to adopt a test-<uuid> pattern for naming. With this change, the first line of the above test becomes:

(deftest test-1d0ab8aa-362f-49b7-9902-3d0c668d557b
...)

Obviously, this approach isn’t perfect either. The function name still appears on the website, and it doesn’t make sense. However, my reasoning is that people will also quickly learn to ignore that part and focus on expanding the test to view the “code run”.

Additionally, if we reach a point where every test includes a “testing” section that describes the test in plain English, we could eventually replace the displayed function name on the website with the proper test description. This would also allow us to simplify the “code run” by omitting the “testing” line entirely.

For these reasons, I have concluded that the function name doesn’t matter all that much. As long as the names are unique, I’m personally fine with it.

That said, I’m currently the only active maintainer, and I’d like to know how others would handle this situation. What would you consider an appropriate name for the function? Do you think deriving function names from descriptions is good or bad practice? Do you think the test-<uuid> pattern is a good approach?

Please share your thoughts.

BNAndras · January 6, 2025, 11:40pm

You need to change the human-readable test name in the results.json that the test runner creates for the student’s solution (see “The Test Runner Interface” in Exercism’s docs).

The Erlang track numbers its function names, so there’s no duplication. Your example test becomes 7_does_not_detect_non_anagrams_with_identical_checksum_test_ but is reported by the runner with a test name of “does not detect non-anagrams with identical checksum”.

BethanyG · January 7, 2025, 12:54am

The UUID thing is a bit problematic IMHO. These files also get downloaded by students using the CLI, and it feels pretty confusing/noisy to me to have to look for what the test function is testing when scanning the file (especially if the files are inconsistent in their formatting). But I am also brainwashed by Python’s “readability” mandates.

Leaving the naming aside, I’m chiming in for @BNAndras’s method/suggestion.

Python has a test generator, and uses a Jinja template per practice exercise for all but a very few exercises (here is an example of the Jinja template for Anagram).

This avoids the duplication and errors from manually updating test cases for the most part, since generation is based on a tests.toml file that lists test cases by UUID. The only (potential) hiccup is when a bunch of test cases get reimplemented, and we are sloppy in updating the tests.toml file.

We pull the canonical data description as the test function name, and because we’re using the syntax from Python’s built-in unittest module, we have both a class name and test functions that all start with test_ (here’s Anagram’s test file as an example).

This led to extremely long test names and weird wrapping when we first tested it in the V3 UI, so we had the test runner trim and reformat it for the test runner JSON. Here’s the code and here is what that looks like on the site:

…and an expanded test:

Looking at Anagram specifically, we could probably do more in the template to shorten the names…another thing to add to the endless TODO list.

tasx · January 7, 2025, 1:31am

What the test function is testing is indicated by the (testing "Does not detect non-anagrams with identical checksum") part. That’s exactly why it exists. Function names are not supposed to be used as a substitute for test descriptions.

Compare the two versions:

(deftest does-not-detect-non-anagrams-with-identical-checksum
  (testing "Does not detect non-anagrams with identical checksum"
    (is (= [] (anagram/anagrams-for "mass" ["last"])))))

(deftest test-1d0ab8aa-362f-49b7-9902-3d0c668d557b
  (testing "Does not detect non-anagrams with identical checksum"
    (is (= [] (anagram/anagrams-for "mass" ["last"])))))

In the first version you first parse the function name. Then you read the actual test description. You’ve just wasted time parsing two identical things formatted differently. In the second version you still read the first part of the function name, but you make no effort to parse it. You already know that the test description is on the second line.

ErikSchierboom · January 7, 2025, 1:45pm

I was of the same opinion, and I definitely haven’t been brainwashed by Python’s readability mandates.

I honestly don’t see the problem here. Once I’ve seen the first version (with the longer name), any subsequent reads will just make me skip that bit (same as the guid bit).

To me, it looks like we’re optimizing things for the maintainer, not the student, whereas it should really be the other way around.

tasx · January 7, 2025, 2:55pm

Now we are talking. I agree.

It’s a combination of both. Including the description in both places isn’t common practice, and I prefer not to imply that this is how tests are typically written. The (testing ...) part should describe the test, while the function name can be anything. Ideally, it would be a concise description, but that often leads to duplicate names. Moreover, having the description in both places isn’t ideal for user experience, even if someone trains themselves to ignore the function name.

That said, I now have a clearer plan for moving forward. I’ll share my decision here later, but first, I’d like to gather a bit more feedback if possible.

ErikSchierboom · January 7, 2025, 6:08pm

Reading some example code, deftest is often defined once per tested function (e.g. deftest isogram?). IIRC the downside of that was that its inner tests stop executing when the first error occurs?

tasx · January 7, 2025, 6:47pm

No, they execute properly. But only the first failed test per deftest is shown. Take a look at this:

This exercise has all tests in a single deftest. I edited the circled code so that more than one test would fail (I added a zero to the end of each number: 1->10, 2->20, and so on).
Yet only the first failing test case, the “ones”, is shown.

Edit: This appears to be a test runner issue. Can’t replicate locally.

tasx · January 7, 2025, 7:02pm

But if only the first failing test is shown, it might also mean that the inner tests never execute. So I guess the correct reply would be “No, they should execute properly.”

tasx · January 7, 2025, 7:09pm

How is the tested function denoted in the canonical-data? Is it the “property” key?

ErikSchierboom · January 7, 2025, 7:09pm

Right. Then we should aim to fix the test runner.

tasx · January 7, 2025, 7:37pm

Sure, that sounds like a better approach. Then we can have tests that match the structure of the canonical-data, short function names, and less code.

BNAndras · January 7, 2025, 9:13pm

Yeah, property identifies what’s being tested, so in practice it’s more or less synonymous with the function name. You’re not required to use that name as written, though, especially if it’s not idiomatic in your language.

tasx · January 7, 2025, 9:54pm

Alright, I’ll wrap things up. Thanks to everyone who viewed and shared their thoughts. Here’s the plan moving forward. Everything below this line is Clojure-specific:

If the test runner is fixed to display all failing cases, we can consider:

Test Organization:
One deftest per tested function will include all test cases for that function, along with their descriptions. The name of the function will follow the test-<function-name> pattern, where <function-name> matches the name of the function in the stub. For example: test-anagram?.
UUID:
The UUID of each test case will be included as a comment (e.g., ;; <uuid>) before each (testing ...) form so that we can quickly locate each case in the canonical data.
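A minimal sketch of this layout, assuming plain clojure.test. The namespace, the stand-in anagrams-for implementation, and the first test case are hypothetical and exist only to make the example runnable; <uuid> is left as a placeholder where the canonical UUID would go.

```clojure
(ns example.grouped-tests
  (:require [clojure.string :as str]
            [clojure.test :refer [deftest testing is run-tests]]))

;; Hypothetical stand-in for the student's solution, included only so
;; the sketch runs on its own.
(defn anagrams-for [word candidates]
  (let [lower   (str/lower-case word)
        letters (sort lower)]
    (filterv (fn [candidate]
               (let [lc (str/lower-case candidate)]
                 (and (not= lc lower)
                      (= (sort lc) letters))))
             candidates)))

;; One deftest per tested function, named test-<function-name>.
(deftest test-anagrams-for
  ;; <uuid>
  (testing "Detects a simple anagram (illustrative case)"
    (is (= ["tan"] (anagrams-for "ant" ["tan" "stand" "at"]))))
  ;; 1d0ab8aa-362f-49b7-9902-3d0c668d557b
  (testing "Does not detect non-anagrams with identical checksum"
    (is (= [] (anagrams-for "mass" ["last"])))))

;; Summary map with :test, :pass, :fail and :error counts.
(def summary (run-tests 'example.grouped-tests))
```

With this shape, clojure.test counts one test containing two assertions, which is why the per-case reporting in the test runner matters for this option.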

Downside:

This approach increases code density within a single function. It can become unwieldy, especially when implemented tests span 10+ lines. The inherent nesting of Lisp syntax exacerbates this, making it harder for humans to parse compared to having one test case per deftest.

If the test runner isn’t fixed:

Test Organization:
One deftest per test case. Each test case will include its own description. Each deftest will be named using the corresponding description as it appears in the test file. Given that we’ll end up with the same information in both the description and the function name, the name of the deftest will probably be revised. A possible solution would be the <function>-test-<n> pattern, where <function> matches the function name in the stub, and <n> is a number. For example: anagram?-test-1.
UUID:
The UUID of each test case will be included as a comment (e.g., ;; <uuid>) before each deftest form.
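A minimal sketch of the one-deftest-per-case layout with the <function>-test-<n> pattern, assuming plain clojure.test. The namespace, the stand-in anagrams-for implementation, and the first test case are hypothetical; <uuid> is a placeholder for the canonical UUID.

```clojure
(ns example.per-case-tests
  (:require [clojure.string :as str]
            [clojure.test :refer [deftest testing is run-tests]]))

;; Hypothetical stand-in for the student's solution, included only so
;; the sketch runs on its own.
(defn anagrams-for [word candidates]
  (let [lower   (str/lower-case word)
        letters (sort lower)]
    (filterv (fn [candidate]
               (let [lc (str/lower-case candidate)]
                 (and (not= lc lower)
                      (= (sort lc) letters))))
             candidates)))

;; <uuid>
(deftest anagrams-for-test-1
  (testing "Detects a simple anagram (illustrative case)"
    (is (= ["tan"] (anagrams-for "ant" ["tan" "stand" "at"])))))

;; 1d0ab8aa-362f-49b7-9902-3d0c668d557b
(deftest anagrams-for-test-2
  (testing "Does not detect non-anagrams with identical checksum"
    (is (= [] (anagrams-for "mass" ["last"])))))

;; Summary map with :test, :pass, :fail and :error counts.
(def summary (run-tests 'example.per-case-tests))
```

Here each case is its own test, so a failure in one deftest cannot hide a failure in another, and the numbered suffix keeps names unique even when descriptions are similar.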

Descriptions:

I’d prefer that the .toml file keeps the description verbatim from the canonical data. However, the description in the implementation may be modified. For example, if the description references lists but the implementation uses vectors, the implementation’s description should refer to vectors.

Function Name Shown in the Online Editor

We cannot generate them from the .toml descriptions. I’ve encountered many cases where the .toml files were in sync, but the tests had not been implemented.
We cannot generate them from the (testing ...) forms because not every implementation has one.

Bummer!

Next Steps:

If the test runner isn’t fixed, I will update all merged and unmerged cases, remove the UUIDs from the function names, and probably go with the <function>-test-<n> pattern, since I don’t see any compelling reason to have the test case description duplicated in the function name.

@ErikSchierboom Comments or any disagreement?

ErikSchierboom · January 8, 2025, 6:32am

That sounds great to me! I’ll have a look at upgrading the test runner to see if that fixes anything.

iHiD · January 8, 2025, 12:12pm

@tasx Thanks for reaching out to the community for their input here, and thanks everyone for joining in.

ErikSchierboom · January 8, 2025, 12:39pm

@tasx This is the code that the test runner runs: clojure-test-runner/bin/run-exercise-tests.clj at main · exercism/clojure-test-runner · GitHub

It indeed translates this code:

(deftest largest-series-tests-pass
  (testing "can find the largest product of 2 with numbers in order"
    (is (= 72 72)))
  (testing "can find the largest product of 2"
    (is (= 48 48)))
  (testing "finds the largest product if span equals length"
    (is (= 18 18))))

to this JSON:

{
  "name" : "largest-series-tests-pass",
  "status" : "pass",
  "test_code" : "(testing \"can find the largest product of 2 with numbers in order\" (is (= 72 72)))\n(testing \"can find the largest product of 2\" (is (= 48 48)))\n(testing \"finds the largest product if span equals length\" (is (= 18 18)))"
}

tasx · January 9, 2025, 2:03am

Hmm, this looks normal to me. But since you posted it here, it means I’m missing something.

ErikSchierboom · January 9, 2025, 6:15am

There should be three entries instead of one, as it is running three tests.

ErikSchierboom · January 9, 2025, 9:49am

I’ve looked into this and it looks like it might not be possible to fix the test runner, so my preference would be to go with having one deftest per test case and I’m fine with the <function>-test-<n> pattern.