Bug in Python List Ops test

First off, here’s the Python list-ops test file and the canonical data in problem-specifications.

There’s a bug in the Python version: when a person I was mentoring submitted a passing iteration in which foldr only worked for addition (and not for any general function, as it was supposed to), I was surprised. I checked the tests, and it turns out only two tests are implemented for foldr. The first uses a multiplying folding function, but the list is empty, so the function is never called. The second uses an addition function and is functionally the only test that checks whether foldr works properly. Thus, when the student wrote code that handled only addition, all the tests passed.
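
To illustrate the gap (a made-up sketch, not the mentee’s actual code): something like the following never calls the passed-in function and hard-codes addition instead, yet it satisfies both existing foldr tests, since the multiplication test uses an empty list.

# Hypothetical stand-in for the weak submission, not code from the track
def foldr(function, list, initial):
    # never calls `function`; only correct when the function happens to be addition
    result = initial
    for item in reversed(list):
        result = item + result
    return result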

I propose we add more tests to the Python List Ops test files. Even the canonical data for the exercise illustrates this: it contains tests for both addition and division.

I’m not sure why, but this is intentional, not a bug.

Well, it did happen, but we don’t know that it’s intentional. Perhaps we should reverse it, then? Unless there’s a reason behind it.

I’m pretty sure there’s a reason behind it, i.e. that it is intentional, as I said. Bethany is the track maintainer and explicitly made that change. I expect there was a reason for it, though I don’t know what it is. I certainly wouldn’t try to revert something she explicitly changed just because I don’t know why it was changed.

@BethanyG do you remember why those tests got dropped?

I’m fairly swamped at the moment getting ready for Analytic April, so I am going to have to defer looking into/discussing this until after that. Thanks.


Of course not.

We’ll keep this on hold then, thanks!

[8066003b-f2ff-437e-9103-66e6df474844]
description = "folds (reduces) the given list from the right with a function -> direction dependent function applied to non-empty list"
reimplements = "be396a53-c074-4db3-8dd6-f7ed003cce7c"

The test was set to include = false in the toml file because, at the time of the update, the example solution failed it, and no one had time to re-work the solution to make it pass. The most straightforward thing was to not include the test until someone could re-work the example solution, update the toml file, and re-generate the test file.

@IsaacG – if you have time to do so, I would welcome a PR. Please keep in mind this would NOT be an exemplar solution – the examples are only meant to be proof that the problem and test cases are solvable.

I don’t know that I want to add any additional test cases beyond what the track already has right now. But we can discuss it, if we find in re-working the solution that there is a glaring need.

There is a mismatch between the canonical tests and the additional test_foldr_foldr_add_string. The former expect the accumulator to be the first argument to the folding function; the latter expects it to be the second argument. This has been the case since https://github.com/exercism/problem-specifications/pull/1746. See also this comment.
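
To make the mismatch concrete, here is a throwaway illustration (not code from the repo) of two foldr variants that differ only in which argument they treat as the accumulator:

# Illustration only: the same lambda gives different results under the two conventions.
def foldr_el_acc(function, lst, initial):   # current Python-track convention: function(el, acc)
    acc = initial
    for el in reversed(lst):
        acc = function(el, acc)
    return acc

def foldr_acc_el(function, lst, initial):   # canonical convention: function(acc, el)
    acc = initial
    for el in reversed(lst):
        acc = function(acc, el)
    return acc

letters = ["e", "x", "e", "r", "c", "i", "s", "m"]
print(foldr_el_acc(lambda x, y: x + y, letters, "!"))  # exercism!
print(foldr_acc_el(lambda x, y: x + y, letters, "!"))  # !msicrexe

The same lambda x, y: x + y yields "exercism!" under one convention and "!msicrexe" under the other, which is why a single example solution cannot satisfy both the additional string test and the canonical data as currently written.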

The fix will probably break lots of solutions. A good time to try to add all other missing tests?

Perhaps. I have to confess, I am overloaded enough with approaches and other maintenance work, that I’m not tracking as closely as I maybe should.

Keeping it the way it is would sustain the status quo - but clearly, that’s going to create problems down the road, so maybe we “bite the bullet”.

@MatthijsBlom – if you have bandwidth today/tomorrow, perhaps you could start the process to review/fix things and add additional tests if needed? Ping me if you need help with the toml/templating/test regeneration process.

Be warned: the toml file for this exercise has an unusually large number of reimplements, and then there is a .json file for the additional track tests.

I’d love to do anything I can, if you could explain exactly what! I’m quite free, but I’m new to Exercism, and so am unclear as to what exactly I should do here. Specifically, what missing tests should we implement?

I think since this has now gone to discussing an exercise re-design, we leave this as-is until that discussion is concluded.

Per my discussion with @BethanyG, I’ll take a look at adding more tests from the problem specs and exploring the problem space. I’ll report back my findings and/or file a PR once I’ve got things worked out.


And I (we) appreciate the offer of help! I think the problem here is that altering practice exercises at a track level is more complicated than it looks from the outside – especially on a track like Python, where practically all of the practice exercise test files are templated and auto-generated.

Explaining “exactly what” (and then reviewing) often takes more time than … just doing it. For this exercise, where there has been and is a debate on the tests and the implementation – I think it is probably best to have someone like Isaac or Matthijs (who are more familiar with how things are glued together here) do this bit on this exercise for now.

Should we decide to introduce follow-on practice exercises or do some concept exercises, that might be a great time to jump in and learn the process - or have a group session where we walk through the whys and hows.


What is unimplemented?

Unimplemented test cases (which are not deprecated): {'8066003b-f2ff-437e-9103-66e6df474844', 'd7fcad99-e88e-40e1-a539-4c519681f390'}

Both of those missing test cases currently fail when run against the example.py.

Detecting unimplemented
>>> import pathlib
>>> import toml
>>> t = pathlib.Path("tests.toml").read_text()
>>> u = toml.loads(t)
>>> # UUIDs that have been superseded by a reimplementation
>>> old = {v.get('reimplements') for v in u.values() if 'reimplements' in v}
>>> # excluded cases that are not superseded, i.e. genuinely unimplemented
>>> missing = {k for k, v in u.items() if not v.get('include', True) and k not in old}
>>> missing
{'8066003b-f2ff-437e-9103-66e6df474844', 'd7fcad99-e88e-40e1-a539-4c519681f390'}
Generate tests and run tests
» ./bin/generate_tests.py list-ops &&  ./bin/test_exercises.py list-ops
...
FAILED list_ops_test.py::ListOpsTest::test_foldl_direction_dependent_function_applied_to_non_empty_list - ZeroDivisionError: integer division or modulo by zero
FAILED list_ops_test.py::ListOpsTest::test_foldr_direction_dependent_function_applied_to_non_empty_list - AssertionError: 1 != 9

Understanding the first failure

What went wrong with d7fcad99-e88e-40e1-a539-4c519681f390? The reimplemented test explicitly says it requires / to preserve fractions (true division), and that the original test it reimplements should be used if // (integer division) is wanted.

Options?

  1. Use the original “deprecated” test.
  2. Use / and not //.

Suggestion: Since // is used for this test and only this test, the Jinja template can simply drop function.replace("/", "//") from the preprocessor and stick with /. This lets us make fewer changes to the original data and pass more tests with the existing solution.

Details of the first test (comment from the problem spec)

canonical-data.json

{
  "description": "direction dependent function applied to non-empty list",
  "comments": "[...] Expects / to preserve fractions. Integer division will not work here, since it would compute 1 / 24 = 0. Use the original test values (d2cf5644-aee1-4dfc-9b88-06896676fe27) if integer division is expected / required."
}
Suggested Diff
     {% set function = function.replace("modulo", "%") %}
-    {% set function = function.replace("/", "//") %}
     lambda {{function}}
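
As a quick sanity check of why // cannot work here, using the 1 / 24 step called out in the canonical comment above (an illustration, not test code):

# With true division the intermediate 1 / 24 stays a nonzero fraction, so the
# fold can continue. With floor division it collapses to 0, and a later step
# divides by that 0 - the ZeroDivisionError seen in the failing foldl test.
assert 1 / 24 > 0      # true division keeps a fractional value
assert 1 // 24 == 0    # floor division rounds down to zero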

Understanding the second failure

What went wrong with 8066003b-f2ff-437e-9103-66e6df474844? The example.py expects the function to take args (element, accumulator) but the test passes a function which takes args (accumulator, element). If we assume the canonical data is correct ("function": "(acc, el) -> el / acc"), then the example.py has a bug where it swaps the arguments.

We could change the example.py so it implements what the problem spec lays out: a function which takes (acc, el) and not (el, acc).

Issue: there is a Python-specific additional test which relies on the function taking (el, acc). This means the Python-specific test and the canonical data require different behaviors. One (canonical) requires calling acc = func(acc, el) and the other (Python) requires acc = func(el, acc). Bringing this exercise in line with the canonical data will break all current solutions. It is, however, simple to update the Python-specific test to bring it in line with the canonical data.

Suggestion: Bring the Python tests in line with the canonical data, even though it does break existing solutions.

Details of implementation error

The Jinja-generated foldr test uses a lambda with the signature lambda acc, el. The example.py solution makes the call function(list[0], foldr(function, list[1:], initial)). Note that the lambda expects acc, el while the implementation passes el, acc.
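
For reference, the fixed recursion in example.py would look roughly like this sketch (assuming the usual empty-list base case; only the argument order in the recursive call changes from the current solution):

# Sketch of example.py's foldr with the canonical acc-first argument order
def foldr(function, list, initial):
    if not list:
        return initial
    # fold the tail first to get the accumulator, then combine it with the head
    return function(foldr(function, list[1:], initial), list[0])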

Python test which has requirements different from the canonical data
def test_foldr_foldr_add_string(self):
    self.assertEqual(
        foldr(lambda x, y: x + y, ["e", "x", "e", "r", "c", "i", "s", "m"], "!"),
        "exercism!",
    )
Suggested Diff

In line with the unit test, which uses lambda acc, el:

# example.py
-        return function(list[0], foldr(function, list[1:], initial))
+        return function(foldr(function, list[1:], initial), list[0])

# additional_tests.json
       "input": {
         "list": ["e", "x", "e", "r", "c", "i", "s", "m"],
         "initial": "!",
-        "function": "(x, y) -> x + y"
+        "function": "(acc, el) -> el + acc"
       },

Conclusion

One foldl test can be easily added by removing a transformer line from the test generator. This has no impact on existing solutions and allows adding one additional canonical test for foldl.

Suggested Diff
# template.j2
     {% set function = function.replace("modulo", "%") %}
-    {% set function = function.replace("/", "//") %}
     lambda {{function}}

The other missing test is for foldr. This one is sticky. The problem spec says functions take func(acc, el) and enforces this via the missing test. The example.py and the additional Python-specific test enforce func(el, acc). We could bring Python in line with the canonical data and add the missing test, but this will break existing solutions and require a slight modification to the Python specific test.

My personal opinion is that we should bring Python in line with the canonical specs.

Suggested Diff

In line with the unit test, which uses lambda acc, el:

# example.py
-        return function(list[0], foldr(function, list[1:], initial))
+        return function(foldr(function, list[1:], initial), list[0])

# additional_tests.json
       "input": {
         "list": ["e", "x", "e", "r", "c", "i", "s", "m"],
         "initial": "!",
-        "function": "(x, y) -> x + y"
+        "function": "(acc, el) -> el + acc"
       },

I put those changes into a commit. If those changes sound good to you, @BethanyG, I can open a PR for them.

@safwansamsudeen Most of this know-how I have acquired just by reading docs. I guess you can too. For starters see exercism.org/docs/building and the docs in the Python repo. There are other docs elsewhere, but right now I do not know where. Click around in the Exercism org on GitHub. Browse.

To make sure you understand it, you could try to reproduce my finding of the discrepancy between the canonical and the additional tests.

Even going in blind, I was probably done finding the problem quicker than I could have explained how to do it. Finding the problem took 30–60 minutes, and that is without the fix and review.

How do you reconcile these? It seems like an expensive change to me, with unclear payoff.

Reconcile what? For foldr, Python diverged from the problem specs. We can either remain diverged (which means the order-dependent foldr tests from the problem specs won’t work for Python, and never will) or we can update Python to fall in line with the problem specs.

Generally speaking, the problem specs are considered “correct” and/or the canonical approach. There’s an effort to bring tracks as closely in line with the problem specs as possible. Doing so, combined with template-based test generators, has the long-term benefits of inheriting any changes made to the problem specs and of staying uniform across tracks. This makes long-term maintenance of exercises cheaper.

As Bethany mentioned earlier about “biting the bullet”, sometimes it’s painful in the short term to fix things (bring them in line with the problem specs) … but it makes track maintenance easier in the long run.

There is a third option: modify & add the missing tests to be consistent with the additional tests and hence all prior solutions.

There’s already an order-specific test. What would be the value in adding a second test that tests the exact same behavior if the new test isn’t the canonical test? Having it based on the canonical test but modified doesn’t give us the benefits of using the canonical data.

Just to note: I am not ignoring things here. I am … thinking.

And so it doesn’t get lost in my rather long reply, I am going to add here at the top that I like Matthijs’ suggestion of an iterator ops exercise a lot. And I think we may want to create that exercise regardless of what we decide here. We will be making several concept exercises on iterators, generators, itertools, and functools. We’ll need more practice exercises that nudge students to use those features/tools.

Yes – BUT the canonical data (and the exercise descriptions for that matter) are generic for a reason. I’d say that the desire is to make the exercises as consistent as possible, but I am uncomfortable with the term ‘uniform’. Tracks have to adapt those things to what makes sense for their programming language. For example, some languages don’t even have a list or array data structure, nor the same notion of a custom type (aka making a class or struct).

All that to say that the Python track (and others) deviate when it makes sense in support of idioms or features in that language. Or when maintainers feel that they need a certain ‘type’ of exercise (a more functional-focused, or class-focused, or what-have-you-focused one) – it’s why we have a scenario field in problem-specs, and also why we have instruction appends and test appends at the track level. And we also have the ability to construct practice exercises that aren’t in problem specifications and don’t have any canonical data.

IMHO - it’s when we are arbitrary about deviating, or when we deviate without documenting why, that we get into trouble.


SO
I don’t have a clear reason as to why Python deviated in the first place - I can only guess. I do know that both functools.reduce() and itertools.accumulate() have the initializer last in the argument list, and specifically work left to right, so the logic may have been that expectations and tests should follow what is seen in the core lib. And we may want to add an instruction append that talks about that a bit, should we change this exercise.
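
For reference, a quick illustration of that core-lib behavior (stdlib only, nothing here comes from the exercise):

# Both stdlib helpers take the initializer last and fold left to right.
import functools
import itertools
import operator

print(functools.reduce(operator.sub, [1, 2, 3], 10))                    # ((10 - 1) - 2) - 3 == 4
print(list(itertools.accumulate([1, 2, 3], operator.sub, initial=10)))  # [10, 9, 7, 4]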

I agree that for long term maintenance and general ease, it might be best to bring this exercise into line with canonical data, with an appropriate instruction append. Yes - that’s going to invalidate 1500 solutions. But the alternative is to keep something custom that we’re likely to forget about or decide to change down the road (invalidating even more solutions).

The big question for me is that with the proposed iterator ops exercise on the table, does it make sense to even keep this list ops practice exercise at all?

I am thinking yes – as Glenn pointed out in the other thread, should a student choose to accept the challenge, there is ample learning opportunity to be had here. But we could also just freeze this exercise as-is, or deprecate it in favor of iterator ops.

But what are your thoughts, @IsaacG @MatthijsBlom @safwansamsudeen ??