Mismatch between specification and tests for Elixir's Word Count exercise

The instructions for the Word Count exercise on the Elixir track seem to conflict with the tests in terms of the implied specification.

As such, if you try to write a solution based only on the requirements described in the instructions, your code will fail - at least this was my experience, although I may be misunderstanding their intended meaning.

The two (suspected) discrepancies that I’ve found are as described below.

Discrepancy 1: Required Character Sets

From instructions.md:

The subtitles from these dramas use only ASCII characters.

I took that to mean “your solution only needs to be concerned with ASCII characters”.

In word_count_test.exs, however, there are tests that require the solution to deal with German and Polish scripts:
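To illustrate what that means in practice, here is a rough Elixir sketch (not the actual test code, and with made-up German and Polish words) showing how an ASCII-only pattern breaks on such input while a Unicode-aware one handles it:

```elixir
# Rough sketch only - made-up German and Polish words, not the actual test data.
sentence = "strzała größer"

# ASCII-only matching loses the non-ASCII letters:
Regex.scan(~r/[a-z]+/, sentence) |> List.flatten()
# => ["strza", "a", "gr", "er"]

# Unicode-aware matching (\p{L} plus the `u` modifier) keeps whole words:
Regex.scan(~r/\p{L}+/u, sentence) |> List.flatten()
# => ["strzała", "größer"]
```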

Discrepancy 2: Allowed punctuation that doesn’t split words

From instructions.md:

The only punctuation that does not separate words is the apostrophe in contractions.

However, in word_count_test.exs, this test requires that a hyphen be allowed in the middle of a word:

If only apostrophes should be allowed in the middle of words, then I would have expected co and operative to be split into two separate map keys.
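For what it’s worth, a solution that satisfies the tests ends up treating both apostrophes and hyphens as word-internal characters. A minimal sketch of one possible regex-based approach (not the official solution):

```elixir
# Minimal sketch: treat both apostrophes and hyphens as word-internal characters.
count = fn sentence ->
  Regex.scan(~r/[\p{L}\p{N}]+(?:['-][\p{L}\p{N}]+)*/u, String.downcase(sentence))
  |> List.flatten()
  |> Enum.frequencies()
end

count.("The co-operative's co-op")
# => %{"the" => 1, "co-operative's" => 1, "co-op" => 1}
```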

History of exercise (rough)

Having looked at the commit history, I found that the order of changes is basically:

  • 9 years ago: the test requiring hyphens not to split words was added
  • 3 years ago: the tests requiring German and Polish scripts were added
  • 6 months ago: instructions.md was edited to include statements that seem to contradict the requirements described above

So the “easier” specification implied by instructions.md seems to have been introduced later, which implies that the “harder” specification of the tests is the original and therefore arguably the “correct” one.

The exercise is categorized as easy, however, and it also seems very possible that the specification was intentionally simplified at some point, with the tests accidentally left out of the update.

Can anyone tell me which would be the right way to unify these two? Or whether I’m mistaken and there’s no contradiction between them :)

The documentation was reworked here. It sounds like the Elixir version of this exercise may go beyond the canonical data and may benefit from an addendum file and/or matching the canonical tests better and/or not using the canonical docs directly.

I see - instructions.md is based on the canonical data (this repository, which by the looks of it defines exercise specifications intended to be common to all languages: https://github.com/exercism/problem-specifications/blob/main/exercises/word-count/canonical-data.json), but the Elixir implementation diverges from it a little.

Is there a policy on what to do in these cases? E.g. “in cases x, y and z it’s acceptable to go beyond the canonical data”, that kind of thing. I can understand that if there’s something specific to a language, some divergence may be beneficial to learning… although in this case, the two extra cases don’t seem to fit that criterion; they just make the exercise a bit more challenging. Maybe there’s another reason I’m not seeing?

If there isn’t, then it seems like it would make sense to:

  • make all the tests that run on submission match the canonical data
  • separate the tests that don’t into an optional extra file, maybe labelled “going further” or “extra challenge”, something like the sketch below
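For illustration only, the split could in principle be done with ExUnit tags. A rough sketch, where :extra_challenge is a made-up tag name and the module under test is assumed to be WordCount with a count/1 function (adjust to the exercise’s actual names):

```elixir
# Rough sketch, not the track's actual setup.
ExUnit.start(exclude: [:extra_challenge])

defmodule WordCountExtraTest do
  use ExUnit.Case

  # Canonical case from problem-specifications: always runs.
  test "count one word" do
    assert WordCount.count("word") == %{"word" => 1}
  end

  # Non-canonical case: excluded by default, opt in with
  # `mix test --include extra_challenge`.
  @tag :extra_challenge
  test "hyphens do not split words" do
    assert WordCount.count("co-operative") == %{"co-operative" => 1}
  end
end
```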

How does that sound?

No. There is no policy. It’s up to the track maintainers to choose what they want to do on their track.

That’s a good question for a track maintainer. Though having two sets of tests doesn’t really work with the online editor, since there is no way to control which tests run.
@angelikatyborska @neenjaw @jiegillet

Is the root issue here that the instructions didn’t prepare you for the complications on top of the base case? Or that you found it more difficult than advertised, given that it’s labelled an easy exercise?

I don’t think there is any mandate that a track may not specify a test case that does not exist in problem specifications, and I’m unwilling to establish a precedent here without advice from @iHiD, @kytrinyx or @ErikSchierboom (staff).

I would be willing to add an addendum without their input to advise of the additional and non-standard cases.

Is the root issue here that the instructions didn’t prepare you for the complications on top of the base case?

Yes, this exactly.

Just to clarify - I don’t think there’s anything necessarily wrong with the requirements for all cases not being encapsulated in introduction.md per se. Requirements being fleshed out via the test code seems like a logical approach.

In this case, however, the introduction and test cases explicitly contradict each other - i.e. the introduction specifically says “you don’t need to go further than X”, while the tests say “yes you do!” :sweat_smile:

The problem this caused me when trying to write a solution was the dilemma of not knowing which way is actually correct.
If a solution satisfies the tests, it can be considered correct on one level, but the explicit contradictions suggested to me that the exercise may be a WIP, so the tests may change in the future, breaking any potential solution. As a result, I decided to hold off on doing the exercise - presumably not a desirable outcome.

My philosophy of maintaining the Elixir track is to never deviate from the tests defined in problem specifications unless there is an Elixir-specific reason to do so. If the tests that contradict the instructions are not listed there: https://github.com/exercism/problem-specifications/blob/main/exercises/word-count/canonical-data.json, I would appreciate a pull request that removes them from the Elixir track.

As already noted in this thread, this specific exercise went through a lot of drastic changes in the past, and we didn’t notice that more tests needed adjusting when descriptions changed.


Thank you for clarifying!

I would appreciate a pull request that removes them from the Elixir track.

I’m just doing this, and noticed that while most of the test descriptions match the descriptions in the canonical data, a few of them are slightly different.

e.g.:

test file: “count one of each”
canonical data: “count one of each word”

Just to check, would it be ok if, in the same PR, I change the test descriptions to match the canonical descriptions? (e.g. in the above case, change “count one of each” to “count one of each word”)
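Concretely, the change would be just to the test names. An illustrative snippet, not the full test file (assuming the module under test is WordCount with a count/1 function, and using the canonical expected values):

```elixir
defmodule WordCountTest do
  use ExUnit.Case

  # was: test "count one of each"
  test "count one of each word" do
    assert WordCount.count("one of each") == %{"one" => 1, "of" => 1, "each" => 1}
  end
end
```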

Just to check, would it be ok if, in the same PR, I change the test descriptions to match the canonical descriptions?

Yes, absolutely

To weigh in on this specific Exercism-wide question: adding extra tests is fine, but it’s generally advisable to also add an instructions.append.md file to explain what’s happening, so that students who have solved the exercise on other tracks don’t unexpectedly hit extra requirements compared to when they solved it before. That’s my general guidance.

I agree with the above!

Sorry to keep you waiting, PR is here.

Kept me waiting? Two days in the open-source world is top speed, nothing to be sorry about :wink:
