Make tests more forgiving regarding the exact wording of error messages

While solving exercises in the Python and PowerShell tracks, I noticed that some tests not only check that an expected exception is thrown, but also check the exact wording of the error message - and they expect specific wording that differs from the default messages produced by those languages. This second part is what forces students to write custom error messages instead of relying on the built-in functionality of each language.
It makes students write non-idiomatic code and learn bad patterns.

Some examples to clarify my point:

In a Python exercise I used zip(a, b, strict=True), which throws an error in the cases where the tests expect an error (if a and b are of different length). But the tests expected a different error message, so I had to write cumbersome code to pass the tests, while the ideal solution simply required the use of strict=True.

In a PowerShell exercise I had to do parameter validation like this, to satisfy the tests:

param(
    [ValidateScript({$_ -gt 0}, ErrorMessage='error: Only positive numbers are allowed')]
    [int64] $Number
)

when the “proper” way would have been like this:

param([ValidateRange('Positive')] [int64] $Number)

But this second version doesn’t allow customizing the error message.

I’m not sure if something can be done about this, due to automatic test generation - just wanted to spread awareness of the problem.


Hey there,

For the PowerShell track, I'd love to have any potential error dealt with in parameter validation and have its business done there. It is idiomatic and clean, just like you said.

However, this leads back to the problem of having to match the default error message somewhat, instead of just following the one from the specs.

So there are two ways to solve this:

  1. Change the expected message to match some of the text from the default error message, so it matches however you throw the error.
  2. Let Pester just catch the error without checking the message / with any message text.

I don’t particularly like the second one in general, because inside a function there could be multiple points where an error can happen, and it would be great to know exactly what’s going on.
This is important because we have told new learners repeatedly that the test suite is the authoritative source of information for them to use to solve the exercise. There are only a handful of instances where I chose this option, because the wording of an error message can be customized in many ways, and those mostly belong to harder / later exercises where learners are already adept at throwing errors and reading the test suite.

As for the first option, there are some upsides:
. Many new learners will have no idea about the parameter validation attributes, but all of them can see how the throw keyword is used in every stub and can use that. So changing the text would accept both the default and the custom error message, as long as they contain that part.
. It keeps the test suite coherent and clear for new learners (this exercise comes quite early on).

Potential downsides:
. Could lead to some awkward error messages.
This is the default message when using ValidateRange('Positive') with -1 as input:

Cannot validate argument on parameter 'Number'. The argument "-1" cannot be validated because its value is not greater than zero.

vs

Only positive numbers are allowed

If I have to change it to something that matches, it would probably have to be something like Error: Input value is not greater than zero.

This shouldn’t be a problem for native speakers, but a lot of people learn using English as a second language, so keeping the message easy to understand is key.

. Another downside, which matters a lot to the website, is re-running tests and breaking existing solutions, which is often discouraged unless really needed. Sure, the code might not be idiomatic, but those students didn’t do anything wrong, and breaking their solutions would annoy people.

I can see a potential solution for this: modify the tests so that there are multiple accepted error messages. This way, if any of the accepted messages matches, the test passes, allowing more options for checking the error. But I haven’t done this before, so I will need to check the Pester documentation to see how feasible it is to implement.
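For what it’s worth, the same idea can be sketched in Python with a regex alternation in unittest (the validator and both message wordings here are illustrative; Pester would need its own equivalent):

```python
import unittest


def set_number(n):
    # Hypothetical validator that uses a custom error message
    if n <= 0:
        raise ValueError("Only positive numbers are allowed")


class NumberTest(unittest.TestCase):
    # Either the custom wording or the built-in default wording passes
    ACCEPTED = "Only positive numbers are allowed|not greater than zero"

    def test_rejects_negative(self):
        with self.assertRaisesRegex(ValueError, self.ACCEPTED):
            set_number(-1)


if __name__ == "__main__":
    unittest.main()
```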


Whichever way you choose, if it allows the use of more “out of the box” validation logic, without customizing error messages, I think that’s a win.

However, ideally we would find a solution which improves the situation for all tracks, not just for PowerShell. I’m sure this same issue exists for many languages, not just for Python and PowerShell.

For Racket, if a canonical test expects an error message, we only check if the tested function call fails. In the language, you can raise an exception manually or more idiomatically have a failing contract defined for the tested function. Testing for either possibility is complicated and might be confusing to the student.

This typically isn’t an issue because, over the years, a lot of work has been put into the problem-specs cases, so if a test uses invalid input, it’s invalid for a single reason. An example is phone-number, where we used to test whether the student handles punctuation by putting invalid punctuation in the input, like "123-@:!-7890". However, that input is also invalid because it has an area code starting with 1. In Racket, we wouldn’t be able to tell whether that test passed because the student had handled invalid punctuation or because of the area code starting with 1. The reimplemented case uses the input "523-@:!-7890", so if an exception occurs here, it should be because they handled invalid punctuation. Or it’s failing for a different reason entirely, but hopefully other tests would also fail at that point.

Hi @foobarbarian :wave:

Python track maintainer weighing in here. The TL;DR? This is not a bug or a limitation of the test generation. Specific error message checking was implemented on purpose. There are two reasons for this:

  1. It is considered idiomatic in Python to be specific with error messages. You can see this in the following links:

    • Python 3.13 docs on exceptions. Note how there is not an example on the page that does not include a print() with a specific message. I do think they should include the shorter version that uses the message as an argument, but the specific messages are there nonetheless.
    • This SO post, which is old but still true.
    • This article from Pybites

    Different development teams may have different conventions around this, but including a specific error message that can be searched for in logs or raised from a specific program location is a good starting point.

  2. As @glaxxie mentioned in their response, we tell/teach students that the tests are the spec. Having them match the error message teaches them to read the spec carefully.

    We used to have a generic regex in test files that matched anything in the message/argument portion of raised errors. The only thing this accomplished was a majority of students submitting an error message that was “.” or “_” or " ". All of which are very unidiomatic. As an add-on, many students were unable to understand what that regex was even checking for in the first place. None of it helped them learn how to raise and handle exceptions.

    There are other things like exception groups, exception notes, annotations, custom exceptions, re-raising, and raising from that will need to be covered as specific exceptions and exception-handling concept/concept exercises. But those are quite far down on the list of track improvements at the moment.
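To make the regex point concrete, here is a sketch of the difference (the patterns are illustrative, not the track’s actual ones):

```python
import re

GENERIC = r".+"  # old catch-all: any non-empty message passes
SPECIFIC = r"Strands must be of equal length"  # illustrative spec wording

# Under the generic pattern, a throwaway message like "." is accepted...
assert re.search(GENERIC, ".")
# ...while the specific pattern forces students to read the spec.
assert not re.search(SPECIFIC, ".")
assert re.search(SPECIFIC, "Strands must be of equal length.")
```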

…

A word on how you used zip(strict=True). As stated in the docs for zip, the error is raised only after one of the iterators has been exhausted. That means that if one iterable is 1000 items long and the other is 1002 items long, the code will have to iterate 1000 times before an “unequal length” error is thrown.

This is a bit unnecessary if what you are intending to do is “bail out” if, for example, the strands in Hamming are not of equal length. The intent is to avoid work if the initial conditions aren’t met. So even if zip() is used to iterate over the strands, a len() check and a raised error as a guard is quicker/clearer.
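As a sketch of that guard-clause approach (the function name and message are illustrative, not the exercise’s exact spec):

```python
def hamming_distance(a, b):
    # Fail fast: check lengths up front instead of iterating until
    # zip(strict=True) notices the mismatch.
    if len(a) != len(b):
        raise ValueError("Strands must be of equal length.")
    return sum(x != y for x, y in zip(a, b))
```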


I understand your points.
Obviously, teaching how to write proper error messages is a good idea. I was just hoping that this could be done in a “learning exercise” which specifically focuses on this topic.

I can only half agree with your point about performance. Performance doesn’t seem to be an important part of the exercises: I have never seen it mentioned in any description (in any track) or tested by a test; only the “deep dive” sections go into it. Many “community solutions” reflect that, in how inefficient the algorithms they implement are.

The solution using strict=True is also not equivalent to the solution using an explicit length check beforehand. zip works with any kind of iterable, even those which don’t have a __len__ method. While the tests in question pass in a list, sometimes it is desirable to write code which can handle more generic use cases.

Anyway, now we’re arguing about one specific solution, which is not very relevant in my opinion. My point was about the general strictness of the tests regarding error messages, which forces solutions into a certain shape and removes some flexibility. Please see my PowerShell snippet for an example which doesn’t have any performance implications and is, in my opinion, strictly better than the alternatives with a custom error message.