[C++ / Pangram] passes without isalpha and tolower checks

This code passes all tests although it is not correct:

const int letterMask = 67108863; // 2^26 -1

bool is_pangram(std::string input)
{
	int charBits = 0;

	for (char& c : input)
	{
		charBits |= 1 << (c - 'a');
	}

	return letterMask == (charBits & letterMask);
}

Not using std::tolower actually works (with most compilers) as uppercase characters lead to negatiive numbers which shift from the left and end up at the same place as the lowercase charactes (although it is undefined behaviour).

Leaving out std::isalpha (or a similar range check) does not work correctly though as numbers and special characters end up occupying the same bits as the characters.

This problem is not currently covered by any of the tests.
I wrote a test that catches this issue

TEST_CASE("missing_letters_masked_with_numbers")
{
    REQUIRE(!pangram::is_pangram("abcdefghijklmno0123456789z"));
}

The test fails as 0-9 set the same bits as the letter p-y.

1 Like

I do not think that this is “undefined behaviour”. If you ensure your int charBits is 32 bit wide, then you actually use the ASCII code design that uppercase letters are 32 characters away from their lowercase counterparts. This was done to switch casing by toggling one bit only and so is not “undefined behaviour”.

Regarding the missing test case: We have a common repository of test cases called problem specifications. Would you prepare a Pull Request to add that test there? Link to this forum thread in the PR description, so everybody knows where to find details about the test case.

I’m not sure this test case makes sense to add to the problem set. Is this something that would occur across languages? Should it be tagged with a scenario if it applies to only some languages?

In the first sentence of Pangrams Deep Dive chapter bitwise operations are mentioned. So, yes, there are plenty of possible languages and some approaches using bit shifting.

We don’t have a scenario for such. The test does not test for anything special like unicode or big-integers.

The deep dive has plenty of language specific approaches. I don’t recall seeing any solutions on the Python, awk or bash track that would have this issue. This is why there is usually a discussion and input from maintainers prior to giving the go ahead to make changes to the problem specs.

What do other maintainers think about this?

Also of relevance, we explicitly aren’t trying to cover every possible edge case and approach. Suggesting Exercise Improvements | Exercism's Docs

The discussion for this language should happen first.

Then a discussion rather than a pull request, could happen for Problem Specifications, after which the work to get the change could happen (or not).

Since the goal is not to cover all potential solutions (nor problems) of an exercise, it may be that this “missing test” can remain missing and we keep it as a discussion point when mentoring when it comes up and the student is exploring this style of approach in any language.

1 Like

I don’t think we should reason about code that invokes undefined behavior (UB). With UB anything can happen (e.g. the often quoted “nasal demons” or “pregnant cats”).

I would be in favor of an additional test if that would catch an “obvious” issue, or issues that can be found in multiple accepted solutions.
IMHO this is not an “obvious” issue. Do you happen to know if multiple invalid solutions like the one from your snippet can be found in the community solutions?

1 Like

Can any of the long term maintainers remember the reasoning behind this test case: problem-specifications/exercises/pangram/canonical-data.json at 00fe66c4d57e38f9c88e45831ba1208f95299bd3 · exercism/problem-specifications · GitHub ? To me it looks like some arbitrary leet-code style letter-to-number mapping.

How about replacing that test with the one that actually covers some different possible failure class (as explained in the first post)? We have more “ignore non-alphabet character” tests that are not related to ASCII design issues.

I can see why you would not want cover every possible edgecase.
But I think that missing the isalpha check is something the test should cover.

I have not found any other solutions yet that have this problem.
This might be because most people think about sanatizing the input first.

Most solutions (even in other languages) follow the same structure though:

  1. Check if it is a character
  2. Convert to lower
  3. Add to bitfield or list
  4. Check if bitfield or list covers all characters.

You can just leave out 1. and still pass all the tests.

I think the same problem might occur in other languages depending if its possible to use negative indices (either for bit shifting or for array access)
Python would be a good example when using the negative index to access a list from the end.

Afaik it is because bitshifting by a negative value is not specified. It seems most compilers implement it by shifting right though.
I left it in my solution anyway as I think its a neat little trick :)

I thought about replacing this test. But I read in the guidelines that existing tests shall not be modified (which makes sense).
Thats why I propesed a new one.

Yes, I can create a PR. Should we further discuss it here first or should I just create a PR so we can discuss it on the PR?

Exercism policy is to discuss proposed changes to the problem spec and get maintainer approval prior to opening a PR. Please do follow our policy.