Exercises(pangram): Improve testing for making behaviour compliant

MarcTCruz · February 21, 2024, 2:22pm

This is about: exercises(pangram): Improve testing for making behaviour compliant by MarcTCruz · Pull Request #384 · exercism/zig · GitHub

Currently, the test cases do not prove that an implementation behaves correctly.

E.g. if an implementation only tracks a-y (or b-z) and treats the remaining letter as always present, it passes `try testing.expect(pangram.isPangram("abcdefghijklmnopqrstuvwxyz"));`

but it would fail a test on the alphabet missing z, such as `try testing.expect(!pangram.isPangram("abcdefghijklmnopqrstuvwxy"));`

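This is reproducible with a number used as a bit set (26 bits, one per letter), where a bit is cleared for each letter seen and the function checks at the end whether the number is zero; using fewer than 26 bits still passes the current tests.
It is also reproducible by writing a function that checks only abcdefghijklmnopqrstuvwx, for example, and returns true if those letters are present.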
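To make the second variant concrete, here is a minimal Zig sketch of a checker that only looks for a through x. The name `isPangramBuggy` is hypothetical and the code assumes Zig 0.11 or later; it exists to show the bug, not to be used.

```zig
const std = @import("std");

// Hypothetical buggy implementation: it only looks for 'a'..'x', so 'y' and
// 'z' are effectively treated as always present.
fn isPangramBuggy(sentence: []const u8) bool {
    var letter: u8 = 'a';
    while (letter <= 'x') : (letter += 1) {
        // Case-insensitive search for the current letter.
        const found = for (sentence) |c| {
            if (std.ascii.toLower(c) == letter) break true;
        } else false;
        if (!found) return false;
    }
    return true;
}

test "the full alphabet passes, and so does one missing y and z" {
    try std.testing.expect(isPangramBuggy("abcdefghijklmnopqrstuvwxyz"));
    // A correct isPangram must return false here; this one returns true.
    try std.testing.expect(isPangramBuggy("abcdefghijklmnopqrstuvwx"));
}
```

Under `zig test` both expectations pass, which is exactly the problem: a full-alphabet test cannot tell this function apart from a correct one.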

This commit adds a new test that addresses the problem mentioned and removes two tests that become irrelevant.

More tests could be removed with this addition (probably only the empty-string check is still needed), but I am leaving them.

IsaacG · February 21, 2024, 3:27pm

Can you provide an example solution/algorithm that this addresses?
You should read this prior to making exercise change suggestions: Suggesting Exercise Improvements | Exercism's Docs
Tests aren’t meant to prove solutions are correct. They are meant to guide users to writing code that’s largely correct, i.e. meets specific requirements.

MarcTCruz · February 26, 2024, 12:19pm

Hello!

Consider the solution below: if alphabetLength is changed from 26 to 24, it still passes the current tests.

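```zig
pub fn isPangram(letters: []const u8) bool {
    const alphabetLength = 26; // 'z' - 'a' + 1
    // One bit per letter, all set initially; a set bit means "not seen yet".
    var missingLetters: u32 = (1 << alphabetLength) - 1;

    for (letters) |letter| {
        // Shift anything below 'a' up by 32 so 'A'..'Z' maps onto 'a'..'z';
        // non-letters land outside 'a'..'z' and are skipped below.
        const loweredCase = if (letter < 'a') letter + ('a' - 'A') else letter;
        if (loweredCase > 'z' or loweredCase < 'a') continue;

        // Clear the bit for this letter.
        const bitPositionToZero: u5 = @intCast(loweredCase - 'a');
        missingLetters &= ~(@as(u32, 1) << bitPositionToZero);
    }

    // Pangram iff every tracked letter was seen.
    return missingLetters == 0;
}
```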
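A sketch of the mutation described above, assuming Zig 0.11 or later. `isPangramTracking` is a hypothetical variant of the posted function that takes the alphabet length as a comptime parameter and uses `std.ascii.toLower` instead of the manual case shift, so 26 and 24 can be compared in one test.

```zig
const std = @import("std");

// Hypothetical variant of the posted isPangram: the alphabet length is a
// comptime parameter so the correct value (26) and the mutated value (24)
// can be run side by side.
fn isPangramTracking(comptime alphabetLength: u5, letters: []const u8) bool {
    // One bit per tracked letter; a set bit means "not seen yet".
    var missingLetters: u32 = (@as(u32, 1) << alphabetLength) - 1;
    for (letters) |letter| {
        const lower = std.ascii.toLower(letter);
        if (lower < 'a' or lower > 'z') continue;
        const bit: u5 = @intCast(lower - 'a');
        // Clearing bits 24 and 25 is a no-op when only 24 bits are tracked.
        missingLetters &= ~(@as(u32, 1) << bit);
    }
    return missingLetters == 0;
}

test "26 bits rejects a missing z, 24 bits does not" {
    const missingZ = "abcdefghijklmnopqrstuvwxy";
    try std.testing.expect(!isPangramTracking(26, missingZ));
    // The mutation only tracks a..x, so it wrongly reports a pangram.
    try std.testing.expect(isPangramTracking(24, missingZ));
}
```

Only an input missing y or z separates the two versions; the full alphabet is accepted by both.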

I understand there is some bureaucracy; I saw it right after committing, and then I thought one correct part is better than none. I start from the premise that everyone should improve what they can, and at least I tried.
As it is troublesome to change tests in Exercism today, I can provide a randomized version of this test to avoid hardcoded solutions, but I would like confirmation that it will be accepted first.
Exercism tests may not be meant for this, but tests in general exist to prove that something works, and a good test is a proof of correctness. We are dealing with programming, and a main problem in programming is not just to program (lots of people can do that) but to program well. Meeting the required specification is the minimum.

If a function is meant to check that every letter of the alphabet is present, and an implementation only needs [a-x] to return true, it is not even implementing the requirement.

IsaacG · February 26, 2024, 5:14pm

I think the edge case you’re pointing out is, more generally, an implementation logic error where the code tracks whether some letters are used and silently ignores others. This could also manifest if a user were to track letters in a set and left some out: `unused = set("abcdefgxyz")`. The only way to ensure the code didn’t skip a letter would be to have 26 tests, each missing exactly one letter. The specific implementation you highlight tracks the letters in a bitfield, so it’s easier to miss letters at the start or end, but it’s a similar class of bug.

Would it make sense to add 26 tests, each one testing for exactly one missing letter?

This is getting into semantics, but no. “Proving” code “correct” is a technical term; see Correctness. Proofs can demonstrate that code will always do the right thing: they show a positive result, that the code is always right. Unit tests are negative tests; they show that, for a given input, the result is correct. Unit tests do not capture all cases; they tend to cover a sampling. The idea is that, if these examples are correct, the code is probably correct for all cases. “Probably correct” and “proven correct” are not the same.

Even with all the additional unit tests, it’s possible that the code may include a snippet such as `if "always_return_true" in input: return True`. It’s unlikely a unit test would capture that edge case, but a proof must capture it.

Unit tests should provide a high degree of confidence in the code being correct. Unit tests do not prove correctness.

MarcTCruz · February 27, 2024, 12:51am

“Would it make sense to add 26 tests, each one testing for exactly one missing letter?”
That is almost what my test does: on each iteration it tests one letter missing, in both lowercase and uppercase.
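Here is a minimal sketch of such a test, assuming Zig 0.11 or later. It is not the code from PR #384: `isPangram` below is a stand-in reference implementation so the file runs on its own, and in the exercise the call would go through the `pangram` module instead.

```zig
const std = @import("std");

// Stand-in reference implementation (same bitmask idea as in the thread).
fn isPangram(letters: []const u8) bool {
    var missingLetters: u32 = (1 << 26) - 1;
    for (letters) |letter| {
        const lower = std.ascii.toLower(letter);
        if (lower < 'a' or lower > 'z') continue;
        const bit: u5 = @intCast(lower - 'a');
        missingLetters &= ~(@as(u32, 1) << bit);
    }
    return missingLetters == 0;
}

test "each single missing letter is detected, in lower and upper case" {
    const lower = "abcdefghijklmnopqrstuvwxyz";
    const upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    var sentence: [25]u8 = undefined;
    for (0..26) |skip| {
        // The lowercase alphabet without the letter at index `skip`.
        @memcpy(sentence[0..skip], lower[0..skip]);
        @memcpy(sentence[skip..], lower[skip + 1 ..]);
        try std.testing.expect(!isPangram(&sentence));

        // The same letter missing from the uppercase alphabet.
        @memcpy(sentence[0..skip], upper[0..skip]);
        @memcpy(sentence[skip..], upper[skip + 1 ..]);
        try std.testing.expect(!isPangram(&sentence));
    }
}
```

Building each 25-letter string from the alphabet keeps this to one loop instead of 26 hand-written cases, and the uppercase pass exercises the case-folding path as well.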

“Unit tests should provide a high degree of confidence in the code being correct. Unit tests do not prove correctness.” That sentence seems different from what is happening currently.

For me, the “proof of correctness” is about validating the desired behavior. Currently, the tests do not check every missing letter, so they don’t test the requirement. As I implied, the tests I provided are not random, so your point about something returning true or false at just the right moment might apply. That is why I offered to randomize it if the Exercism team accepts the addition and wants the randomization: if this is going to change, and changes seem hard because of their impact, better now than never.
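A sketch of what the offered randomized variant might look like, under stated assumptions: Zig 0.11 or later, a stand-in `isPangram` so the file runs alone, and a hand-rolled xorshift64 generator (the hypothetical `XorShift64`) rather than any particular std PRNG API. Each round shuffles a mixed-case alphabet, checks that it is a pangram, then drops one random position and expects `false`.

```zig
const std = @import("std");

// Stand-in reference implementation (same bitmask idea as in the thread).
fn isPangram(letters: []const u8) bool {
    var missingLetters: u32 = (1 << 26) - 1;
    for (letters) |letter| {
        const lower = std.ascii.toLower(letter);
        if (lower < 'a' or lower > 'z') continue;
        const bit: u5 = @intCast(lower - 'a');
        missingLetters &= ~(@as(u32, 1) << bit);
    }
    return missingLetters == 0;
}

// Hypothetical minimal xorshift64 generator, hand-rolled so the sketch does
// not depend on a particular std PRNG API.
const XorShift64 = struct {
    state: u64, // must be nonzero

    fn next(self: *XorShift64) u64 {
        var x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        return x;
    }

    // Returns a value in [0, n); the small modulo bias does not matter here.
    fn below(self: *XorShift64, n: usize) usize {
        return @intCast(self.next() % @as(u64, n));
    }
};

test "randomized: a shuffled, mixed-case alphabet minus one letter is not a pangram" {
    var rng = XorShift64{ .state = 0x2545F4914F6CDD1D }; // fixed seed
    var round: usize = 0;
    while (round < 200) : (round += 1) {
        // Build the alphabet, upper-casing each letter with probability 1/2.
        var letters: [26]u8 = undefined;
        for (&letters, 0..) |*slot, idx| {
            const c: u8 = 'a' + @as(u8, @intCast(idx));
            slot.* = if ((rng.next() & 1) == 1) std.ascii.toUpper(c) else c;
        }

        // Fisher-Yates shuffle so letter order carries no information.
        var k: usize = letters.len - 1;
        while (k > 0) : (k -= 1) {
            const j = rng.below(k + 1);
            std.mem.swap(u8, &letters[k], &letters[j]);
        }
        try std.testing.expect(isPangram(&letters));

        // Drop one random position; the result must not be a pangram.
        const drop = rng.below(letters.len);
        var sentence: [25]u8 = undefined;
        @memcpy(sentence[0..drop], letters[0..drop]);
        @memcpy(sentence[drop..], letters[drop + 1 ..]);
        try std.testing.expect(!isPangram(&sentence));
    }
}
```

The seed is fixed so that a failure can be reproduced; seeding from a clock instead would vary the inputs between runs at the cost of reproducibility.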

Even with all the additional unit tests, it’s possible that the code may include a snippet such as `if "always_return_true" in input: return True`. It’s unlikely a unit test would capture that edge case, but a proof must capture it.

I agree with you. I would add that a proof of correctness does not need to check whether a user will hack the system or insert a glitch unless that is a requirement, and thus expected behavior. Also, correctness is defined within its context: a+a can give one result and be correct in one system and give another result in an alternative one. Perfection is another thing.

Checking if implementation pass with one missing letter at a time seems good.

IsaacG · February 27, 2024, 1:22am

Formal/technical definitions of “proof” and “correctness” aside, the Exercism tests in general do not aim to cover every scenario or to catch every possible implementation bug. The purpose of the tests is to guide people to write code which likely implements the exercise requirements. It is near impossible to write a set of unit tests that would catch every possible implementation bug or prevent students from writing incorrect implementations, so the tests don’t even try to. This is called out explicitly in the docs over here: Suggesting Exercise Improvements | Exercism's Docs (Avoid trending towards entropy). As such, I’m leaning towards not adding a whole lot of unit tests to this exercise.

MarcTCruz · February 27, 2024, 1:35am

I’m leaning towards not adding

Fine

a whole lot of unit tests to this exercise.

It is one test, and some others could be removed.