Bug in Python List Ops test

MatthijsBlom · April 7, 2023, 10:47pm

@safwansamsudeen Most of this know-how I have acquired just by reading docs. I guess you can too. For starters see exercism.org/docs/building and the docs in the Python repo. There are other docs elsewhere, but right now I do not know where. Click around in the Exercism org on GitHub. Browse.

To make sure that you understood, you can try to reproduce my finding of the discrepancy between the canonical and additional tests.

Even going in blind I was probably done finding the problem quicker than that I would be able to explain how to do it. Finding the problem took 30–60 minutes, and that is without the fix and review.

MatthijsBlom · April 7, 2023, 10:54pm

How do you reconcile these? It seems like an expensive change to me, with unclear payoff.

IsaacG · April 7, 2023, 11:01pm

Reconcile what? For the foldr exercise, Python diverged from the problem specs. We can either remain diverged (which means the order-dependent foldr problem spec tests won’t work for Python and will continue to never work) or we can update Python to fall in line with the problem specs.

Generally speaking, the problem specs are considered “correct” and/or the canonical approach. There’s an effort to bring tracks as close inline with the problem specs as possible. Doing so, combined with template-base test generators, has the long term benefits or inheriting any changes made to the problem spec and being uniform across tracks. This makes long-term maintenance of exercises cheaper.

As Bethany mentioned earlier about “biting the bullet”, sometimes it’s painful in the short term to fix things (bring them in line with the problem specs) … but it makes track maintenance easier in the long run.

MatthijsBlom · April 8, 2023, 8:10am

There is a third option: modify & add the missing tests to be consistent with the additional tests and hence all prior solutions.

IsaacG · April 8, 2023, 12:53pm

There’s already an order specific test. What would be the value in adding a second test that tests the exact same behavior if the new test isn’t the canonical test? Having it based on the canonical test but modified doesn’t give us the benefits of using the canonical data.

BethanyG · April 9, 2023, 12:05am

Just to note: I am not ignoring things here. I am … thinking.

And so it doesn’t get lost in my rather long reply, I am going to add here at the top that I like Matthijas suggestion of a iterator ops exercise a lot. And I think we may want to create that exercise irregardless of what we decide here. We will be making several concept exercises on iterators, generators, itertools and functools. We’ll need more practice exercises that nudge students to use those features / tools.

Yes – BUT the canonical data (and the exercise descriptions for that matter) are generic for a reason. I’d say that the desire is to make the exercises as consistent as possible, but I am uncomfortable with the term ‘uniform’. Tracks have to adapt those things to what makes sense for their programming language. For example, some languages don’t even have a list or array data structure, nor the same notion of a custom type (aka making a class or struct).

All that to say that the Python track (and others) deviate when it makes sense in support of idioms or features in that language. Or when maintainers feel that they need a certain ‘type’ of exercise (a more functional-focused, or class-focused, or what-have-you focused) – its why we have a scenario field in problem-specs, and also why we have instruction appends and test appends at the track level. And we also have the ability to construct practice exercises that aren’t in problem specifications, and don’t have any canonical data.

IMHO - It’s when we are arbitrary about deviating, or when we deviate without documenting why that we get into trouble.

SO
I don’t have a clear reason as to why Python deviated in the first place - I can only guess. I do know that both functools.reduce() and itertools.accumulate() have the initializer last in the argument list, and specifically work left → right so the logic may have been that expectation and tests should follow what is seen in the core lib. And we may want to add an instruction append that talks about that a bit, should we change this exercise.

I agree that for long term maintenance and general ease, it might be best to bring this exercise into line with canonical data, with an appropriate instruction append. Yes - that’s going to invalidate 1500 solutions. But the alternative is to keep something custom that we’re likely to forget about or decide to change down the road (invalidating even more solutions).

The big question for me is that with the proposed iterator ops exercise on the table, does it make sense to even keep this list ops practice exercise at all?

I am thinking yes – as Glenn pointed out in the other thread, should a student choose to accept the challenge, there is ample learning opportunity to be had here. But we could also just freeze this exercise as-is, or deprecate it in favor of `iterator ops.

But what are your thoughts, @IsaacG @MatthijsBlom @safwansamsudeen ??

IsaacG · April 9, 2023, 12:15am

Re: iter exercise. I think an iterator exercise could be interesting. I don’t think that would necessarily deprecate this exercise, though. This exercise is specifically about lists. Students can solve this exercise pretty concisely or go all in and get pretty messy with it. It is a fairly open ended exercise which can make it fun.

Re: changes here. I believe I’ve shared my thoughts on that already

safwansamsudeen · April 9, 2023, 9:03am

I think it does. Iterators in Python - especially when you get into yield and related syntax - are very different from lists, so there will be a learning benefit in keeping both exercises.

Canonical data: as you said, Bethany, “the Python track (and others) deviate when it makes sense in support of idioms or features in that language.” Considering that functools.reduce and `itertools.accumulate - what Python devs use in real life - implements have the initializer second, it makes in my opinion more sense from the (student’s) long-term benefit to standardize with that, and as Matthijs suggests, modify and add the missing tests to be consistent with the prior solutions. The obvious advantage of this approach is that current solutions aren’t invalidated, but the importance of abiding by Exercism canonical data vs. Python style is what we’ve got to decide.

MatthijsBlom · April 9, 2023, 2:16pm

What I suspect happened:

At some point the Python track might have been in sync with the canonical data.
The Python track then subconsciously adopted the Haskell tradition through an additional test.
Later on, in an effort to cover associativity, the canonical data adopted a different tradition.
Attempting to synchronize, the Python track ran into a conflict and as a quick solution removed the now-offending canonical test.

So I don’t think the Python track truly deviated.

Neither aspects are really the issue here. Instead, it is the parameter order of the accumulating function:

haskell_tradition = lambda acc, new: (acc, new)
racket_tradition  = lambda new, acc: (acc, new)

reduce(haskell_tradition, [1, 2, 3], ())
# ⟹ ((((), 1), 2), 3)
reduce(racket_tradition, [1, 2, 3], ())
# ⟹ (3, (2, (1, ())))

list(accumulate([1, 2, 3], haskell_tradition, initial=()))
# ⟹ [(), ((), 1), (((), 1), 2), ((((), 1), 2), 3)]
list(accumulate([1, 2, 3], racket_tradition, initial=()))
# ⟹ [(), (1, ()), (2, (1, ())), (3, (2, (1, ())))]

But yes, Python follows the Haskell tradition here. Python’s reduce is like Haskell’s foldl, and accumulate is like scanl.

(Why I use the terms «Haskell tradition» and «Racket tradition»: StackOverflow answer.)

By the way: Haskell’s foldr also works left-to-right, and any foldr in Python that works on infinite iterators will work left-to-right as well – though such a foldr might be annoying to construct in Python (because laziness is required).

Given an Iterator Ops I am not in a hurry to deprecate List Ops.

I would like both exercise appends/introductions to point to each other. List Ops could use some fleshing out of the appendage anyway.

I do not see the use of freezing List Ops.

Coming back to the original issue…

The current tests do not cover accumulating functions that are not associative. This does look like a bug to me: associativity is the defining difference between foldl and foldr.

The point of both d7fcad99-e88e-40e1-a539-4c519681f390 (foldl) and 8066003b-f2ff-437e-9103-66e6df474844 (foldr) and their predecessors was to cover these cases. Maybe we should just revert to said predecessors.

BethanyG · April 9, 2023, 10:48pm

That tracks.

Ah! Thank you. This (and the detail that follows) really helps my understanding.

I think fleshing out/adding an instruction append here is definitely needed. I am sure I am not the only person who is fuzzy on the relationships you’ve described.

I agree. But the fact that Python generally follows the Haskell tradition (and in light of Peters excellent note that you referred to earlier) means we either follow that tradition here with an instruction append that makes it clear why and implement the canonical cases to conform – or we follow the Racket tradition and write an instruction append that explains why.

I think I’d prefer to stick with Haskell tradition, unless someone can convince me that there is a compelling idiomatic Python / learning opportunity here.

Either way, this exercise now needs a detailed and clear instruction append.

Ok. Let me think on it a bit more. Thank you all for chiming in.

IsaacG · April 11, 2023, 2:38am

We could swap the foldr args to use func(el, acc) … but foldl uses func(acc, el). The foldr and foldl are not consistent. If we to Python to adhere to one style or the other, foldl or foldr is going to break. Or we could swap the order on foldr to make foldr use func(el, acc) and use the problem specs … but leave foldl as-is using func(acc, el).

IsaacG · April 11, 2023, 2:47am

I updated the PR so foldr uses func(el, acc), but left foldl as-is, calling func(acc, el). This approach (1) uses all the latest problem-spec tests and (2) does not break existing solutions.

BethanyG · April 12, 2023, 8:04pm

@IsaacG I’m way less concerned about breaking the 1500 solutions as I am being inconsistent between foldl and foldr, and then confusing students even further.

So I think we make the JinJa2 change to make d7fcad99-e88e-40e1-a539-4c519681f390 work, and then enable 8066003b-f2ff-437e-9103-66e6df474844 and change the example solution to pass.

As for the additional case - the easiest is to drop it. If you’d like to re-write it to conform to the pattern in 8066003b-f2ff-437e-9103-66e6df474844, be my guest. But I wouldn’t go to a lot of work, as I don’t really think its a necessary case, once we have 8066003b-f2ff-437e-9103-66e6df474844 enabled.

@MatthijsBlom - are you interested in doing the instruction append here? We can also wait, if you’d like to get a start on the iterator-ops exercise first…

MatthijsBlom · April 12, 2023, 9:39pm

I think we established that foldl should be like Python’s reduce / Haskell’s foldl, and so the following should be a valid implementation.

def foldl(f, xs, init):
    acc = init
    for x in xs:
        acc = f(acc, x)
    return acc

(The current example implementation is (extensionally) equivalent.)

As for foldr: there is no precedent in Python-the-language, but the track has required accumulator-second for years. Isaac has adapted the tests to this style, and it has my favour.

There are several applicable notions of ‘consistency’:

If both foldl and foldr take (acc, el) -> acc, then sure their signatures are the same
But if foldl takes (acc, el) -> acc and foldr takes (el, acc) -> acc then
```
foldl(⊕, [a, b, c], x)  ==  ((x ⊕ a) ⊕ b) ⊕ c
foldr(⊕, [a, b, c], x)  ==  a ⊕ (b ⊕ (c ⊕ x))
```
A symmetry all of its own. (And really nice to explain!)

Valid implementations of this foldr include the current example and

def flip(f): return lambda x, y: f(y, x)

# pardon my French
def foldr(f, xs, init):
    return foldl(flip(f), reversed(xs), init)

# pardon my Latin
def foldr(f, xs, init):
    xs = iter(xs)
    for x in xs:
        return f(x, foldr(f, xs, init))
    return init

With Isaac’s patch there is no need to change the example solution. It is entirely backwards compatible.

No need. This old associativity test agrees with the new ones. At worst it is redundant. But it is nicer to debug for a student.

Isaac has already ‘rewritten’ it. The (tiny!) change to additional_tests.json makes me wish (again ) that JSON allowed comments: for days I have thought that it now contains an error, but it doesn’t.

@IsaacG opening the PR would allow comments on code and seeing the tests turn out green.

BethanyG · April 12, 2023, 9:50pm

Can someone just file this PR please?
If not, I will.
And then we are going to close this discussion.
Thanks.

IsaacG · April 12, 2023, 10:07pm

I believe this is in line with what @MatthijsBlom is suggesting.

IsaacG · April 12, 2023, 10:16pm

PR version two: [list-ops]: change // to / and swap foldl/foldr arg order to both call func(el, acc) by IsaacG · Pull Request #3403 · exercism/python · GitHub
This swaps the order of both so foldl and foldr both use func(el, acc)

I have no preference between the two.

MatthijsBlom · April 12, 2023, 10:30pm

It is; I don’t see any fault in this PR.

IsaacG · April 12, 2023, 10:35pm

Awesome! However, our valiant track maintainer indicated a preference with having both folds use the same arg order to reduce student confusion, hence the second PR.

MatthijsBlom · April 12, 2023, 10:44pm

Sure. I’ll start tomorrow morning. Things to write about:

the meaning of foldl and foldr (both ‘direction’ and argument order)
that this exercise is as interesting as you make it yourself; some suggestions for self-imposed restrictions

Anything else?