Unicode tests for Anagram don't actually test grapheme clusters

The current tests don't actually check that the application can work with grapheme clusters. They only check that the system can handle Unicode characters, not that a character made of multiple Unicode code points gets treated as a single character.

Given the following assertion, we would expect it to pass, since the two strings have no letters in common:

 Anagram.find("üy", ["uÿ"]).should eq([] of String)

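But if we split these two strings into their individual Unicode code points (with each accented letter written as a base letter plus a combining mark), both leave us with the same three elements: ["u", "y", "<two dots>"]. Once sorted, the two arrays are identical, so the strings are wrongly reported as anagrams.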
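To make this concrete, here is a minimal sketch in Python (not the Crystal track's code). The helper names `code_points` and `clusters` are made up for illustration, and `clusters` is only a rough approximation of grapheme clusters: it glues combining marks onto the preceding character rather than implementing the full Unicode segmentation rules. It also assumes both strings are in decomposed form, which is spelled out with escapes below.

```python
import unicodedata


def code_points(s):
    """Sorted list of the individual code points in s."""
    return sorted(s)


def clusters(s):
    """Sorted list of approximate grapheme clusters in s.

    Assumption: a cluster is a base character plus any combining marks
    that follow it. Real grapheme segmentation has more rules than this.
    """
    out = []
    for ch in s:
        if out and unicodedata.combining(ch):
            out[-1] += ch  # attach the combining mark to the previous character
        else:
            out.append(ch)
    return sorted(out)


a = "u\u0308y"  # ü (u + combining diaeresis) followed by y
b = "uy\u0308"  # u followed by ÿ (y + combining diaeresis)

print(code_points(a) == code_points(b))  # True: looks like an anagram
print(clusters(a) == clusters(b))        # False: not an anagram
```

A solution that compares sorted code points behaves like the first comparison; one that is grapheme-aware behaves like the second.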

The current test cases don't cover a scenario like this.

Excellent point! I think a new test case would make sense.

I noticed this when writing up the implementation for Microblog on the JS track, which is why I also wrote a bunch of approaches to explain that some solutions will fail in some cases.

I think the point for such a test is excellently made, but in the case of Microblog for example, it’ll also invalidate almost all current solutions since almost all of them don’t take grapheme clusters into account.

Well, it’ll only invalidate them if tracks choose to implement it. That’s why we have the tests.toml file, where tracks can opt out of individual test cases.

On the Rust track, some test cases are made optional by hiding them behind feature flags. The test runner ignores those, but people can run them locally. Optional bonus challenges are implemented this way, including handling grapheme clusters. Adding such tests would be backwards-compatible. Maybe a similar approach is possible in other languages.

Do you have an example of such an exercise at hand? I’d like to investigate.

Sure: reverse-string