More unicode tests in problem-specifications

On the Rust track, there are several exercises with unicode-related test cases. These are mostly custom test cases that don't come from problem-specifications. I thought it had to be this way, because many languages make it hard to handle unicode well (out of the box).

But now I see that problem-specifications has a unicode scenario (only used by parallel-letter-frequency for now), meaning such test cases could easily be excluded by test generators of languages where unicode is tedious to deal with. So I think it would be a good idea to upstream these test cases with the scenario.
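To illustrate the exclusion, here is a minimal sketch of how a generator might drop cases tagged with a scenario. The `TestCase` struct and the `exclude_scenarios` function are hypothetical names made up for this example; a real generator would read the cases from canonical-data.json and may be structured quite differently.

```rust
// Hypothetical in-memory model of a canonical test case.
// A real generator would deserialize this from canonical-data.json.
#[derive(Debug, Clone, PartialEq)]
struct TestCase {
    description: String,
    // Scenario tags attached to the case, e.g. "unicode". Empty if untagged.
    scenarios: Vec<String>,
}

/// Keep only the cases that carry none of the excluded scenarios.
fn exclude_scenarios(cases: &[TestCase], excluded: &[&str]) -> Vec<TestCase> {
    cases
        .iter()
        .filter(|case| {
            !case
                .scenarios
                .iter()
                .any(|scenario| excluded.contains(&scenario.as_str()))
        })
        .cloned()
        .collect()
}
```

A track where unicode is tedious would call something like `exclude_scenarios(&cases, &["unicode"])`, while the Rust track would pass an empty list and keep everything.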

One disadvantage that comes to mind is that any language track that incorporates these test cases will slightly increase the difficulty of the exercise and risk invalidating many community solutions. It may be considered a breaking change.

Here’s the list of exercises on the Rust track where we have unicode tests that I consider suitable to be upstreamed. (There are others where I don’t quite see the added value, e.g. a test that unicode characters are simply ignored in scrabble-score.)

  • anagram
  • grep
  • rail-fence-cipher
  • reverse-string

What do you think, should I work on a couple PRs?

I’m +1 on this. I don’t really see any disadvantage to having the tests upstreamed. Tracks should just bear in mind the “risks” of adding them to their track, as you mention. Thanks!

Same here. If you add them with the unicode scenario, we should be good.

Hi! I got a bit of feedback on the related PR for the exercise Anagram. I’ve noticed that the newly added Unicode tests contradict the task statement now, namely:

The target and candidates are words of one or more ASCII alphabetic characters (A-Z and a-z).

Should the file instructions.md be updated to talk about Unicode? That, however, potentially opens the door to all sorts of questions, such as what counts as a letter, how case folding should work, etc.
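To make the concern concrete, here is a rough sketch in Rust. The helpers `sorted_lowercase` and `is_anagram` are hypothetical and are not taken from any track's example solution; they assume that "same letters" means "same `char`s after `str::to_lowercase`", which is only one of several possible definitions.

```rust
/// Hypothetical helper: lowercase the word, then sort its `char`s
/// (Unicode scalar values, not user-perceived letters).
fn sorted_lowercase(word: &str) -> Vec<char> {
    let mut chars: Vec<char> = word.to_lowercase().chars().collect();
    chars.sort_unstable();
    chars
}

/// Hypothetical anagram check: same multiset of lowercased `char`s,
/// but not the same word ignoring case.
fn is_anagram(target: &str, candidate: &str) -> bool {
    target.to_lowercase() != candidate.to_lowercase()
        && sorted_lowercase(target) == sorted_lowercase(candidate)
}
```

This handles a simple Greek example such as `is_anagram("ΑΒΓ", "γβα")`, but the edge cases show up quickly. `sorted_lowercase("STRASSE")` and `sorted_lowercase("straße")` differ, although Unicode full case folding maps `ß` to `ss`. Likewise, a precomposed `"\u{e9}"` is one `char` while `"e\u{301}"` is two, so the answer to "what is a letter" changes the result.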

(I haven’t checked the other mentioned exercises. They might be subject to a similar issue too.)