Yacht doesn't cover all test cases

safwansamsudeen · April 7, 2023, 11:35am

In the Yacht exercise, the instructions say that for “four of a kind”, we must return the sum of the four dice. However, all the tests have the element that’s been repeated four times in the beginning of dice: so many solutions have just put dice[0] * 4 , which doesn’t cover all possible cases or indeed many obvious ones.

I propose, thus, that we add a test with a list like [3, 5, 5, 5, 5]. What do you say?

IsaacG · April 7, 2023, 11:38am

It’s worth addressing all the points listed there with any proposed change.

safwansamsudeen · April 7, 2023, 11:54am

Right.

Adding new tests may break existing solutions. This has the consequence of outdating thousands of people’s work, which is frustrating for them if the extra tests aren’t highlighting any existing bugs.

True, but I feel it is a notable bug in this case.

We therefore deliberately avoid trying to cover every edge-case, forcing lots of input validation, or other such real-world concerns. If your suggested improvement is to catch an edge-case or to check input-validation, it’s unlikely to be accepted unless it makes a substantial difference to the exercise.

This is the gray area: this is an edge case, but is it important enough to be considered for a change? I feel it is: few people are clearly writing wrong code right now, so implementing a new test will essentially catch a bug.

Most importantly, this helps people develop higher-order thinking skills for the exercise. The full house and four of a kind are the toughest bits of Yacht, and implementing this test will ensure that people don’t short cut through four of a kind, and instead develop a full-fledged solution to the problem…

IsaacG · April 7, 2023, 12:52pm

The purpose of tests is to point out what needs implementing. If students want to grab an arbitrary card and use that value for four of a kind, they can just use another index. Tests aren’t there to prevent students from taking shortcuts. They are there to point out what ought to be done.

The only way to prevent students from taking that shortcut is to test every ordering. But that’s not a goal. I don’t think this is worth changing.

safwansamsudeen · April 8, 2023, 9:11am

Hmm, okay.

safwansamsudeen · April 8, 2023, 12:57pm

@IsaacG, don’t you feel this bug is the same type as the thing I reported in List Ops? In that case, we need to implement an extra test to ensure that the student doesn’t bypass using the one test, and it’s the same here.

Adding a test with two different orders will suffice, and it’ll also accurately catch out multiple wrong solutions.

IsaacG · April 8, 2023, 1:07pm

We try to keep exercises consistent with the problem specs when possible. In list ops, the Python exercise is missing tests which are in the canonical data. That’s the primary issue I’m trying to resolve there.

Here, I feel like the tests demonstrate the expected behavior quite well. I believe if any student is taking this shortcut, they are doing so knowing it is a shortcut. Based on the existing problem description and test, I don’t think that’s likely an unintentional shortcut. Currently, they can use 4 of the 5 values as a shortcut. If it is intentional, adding a second test just means they can use 3 of the 5 values. That doesn’t force them to not take the shortcut.

Additionally, there are always various ways students can take shortcuts and bypass the tests. I’ve seen plenty of students hard code test inputs and outputs into their solutions. That’s perfectly fine, though. Exercism isn’t an exam or test. It’s an opportunity. To paraphrase Glenn, you get as much value as you put in work. As Jeremy often says, those students are only cheating themselves.

Preventing students from “cheating the system” or taking intentional shortcuts is not a goal.

BethanyG · April 8, 2023, 1:54pm

We have a general rule on the Python track (other tracks do this as well) that we don’t add additional track-specific tests unless they get at a specific feature of Python, or are for a Python-specific implementation of the exercise (for example, the extra tests we put in place for Clock around __repr__). This proposed change doesn’t fit that definition.

List Ops is a bit of an exception, since those additional tests and changes predate V3, and so are 3 or more years old. And there was something - now lost to time - that prompted a Python track deviation. And now we are thinking through how we might bring it in line (or not) with current canonical data.

If someone feels that additional tests are needed but they are for the exercise in general, we ask that they discuss those changes here, and then propose them in problem-specifications – not the track. That way (should the suggestion be accepted), all tracks can benefit, and the changes can be pulled in by various track maintainers.

safwansamsudeen · April 8, 2023, 2:37pm

Yes, I meant this discussion to be general, not Python-specific.

Ah, OK, got it.

That’s the question: but yes, I agree that it might seem unlikely. Of course, as you say, we can’t (and needn’t) prevent students from taking intentional shortcuts. However - and this is not Python-specific - the List Ops canonical tests had a test for both multiplication and addition so that (the few) students who unintentionally wrote wrong code concerning addition alone would get reminded. That’s what I feel we should do here. A student I mentored recently did get caught unintentionally by this, just as another student I mentored wrote code only for addition in List Ops.

I’m not sure how this is: if we test both [4, 4, 5, 4, 4] and [3, 3, 3, 3, 6], for example, students can’t hardcode values unintentionally. They still could intentionally, but as is clear, that’s beside the point. A possible misleading factor here is that the array provided in the instructions has the first four elements as the element that’s repeated four times (such as the latter example above), so people mistakenly think that they can use dice[0].

IsaacG · April 8, 2023, 3:24pm

I’ve seen solutions where students would approach these inputs with code like so.

if dice == [4, 4, 5, 4, 4]:
    return 16
if dice == [3, 3, 3, 3, 6]:
    return 12

Having both these test cases still allows students to return 4 * dice[0]. Having test cases [4, 4, 4, 4, 5] and [5, 4, 4, 4, 4] would invalidate that solution … but allow students to return 4 * dice[1].

Once again, preventing students from taking shortcuts is not a goal here. I am not remotely convinced that there is a common problem where students read the problem description and tests and mistakenly think that return 4 * dice[0] is the correct solution. Is there is no problem here, there is no need to fix anything.

safwansamsudeen · April 9, 2023, 9:05am

Lol, that’s not useful.

That’s true . You’re right, we’ll forget about this change… unless others have something to add?

MatthijsBlom · April 9, 2023, 12:28pm

At present I do not care for tightening the Yacht tests in the proposed manner. This might change when I am shown instances of the tests’ present weakness being detrimental to students’ learning, such as published faulty solutions (plural).

BethanyG · April 9, 2023, 7:19pm

I agree with Matthijs and Isaaac here. Not convinced that tightening the Yacht tests is warranted.

SleeplessByte · April 11, 2023, 11:40am

In JS things like this we solve with the analyzer. I also wouldn’t change the Yacht tests for that reason and the reasons meantioned.

Perhaps this is easy to check for in the Python Analyzer?

safwansamsudeen · April 11, 2023, 12:06pm

Hi, sorry for the delay, last school week here.

So published solutions, @MatthijsBlom. The third top community solution, although that use [1] instead of [0] . However, I might be mistaken in my understanding: this seems to be a very popular solution and is even applauded by Glenn. Even if it is wrong, Isaac’s point applies here: we’d have to test every ordering to ensure that this doesn’t happen.

I like @SleeplessByte’s solution - would that be possible with the Python analyser? Again, that is only if everybody feels this is a potential error (as opposed to intentional rule-breaking).

IsaacG · April 11, 2023, 12:20pm

That solution is completely fine. It calls sort first to ensure the second element is The value which appears at least four times. It’s a bit complicated, but it is completely valid.

safwansamsudeen · April 11, 2023, 12:21pm

My bad, sorry!

BethanyG · April 11, 2023, 3:25pm

Yes. The Python analyzer could probably check for this. However, this discussion/issue was for the exercise in general, and not the Python version. I can add to my analyzer list investigating adding a check tho.