"Log levels" string format does not capture differences between languages when creating substrings

NachoGoro · July 25, 2024, 11:15am

Thinking of the C++ track at the moment. The Log Levels exercise has students using substr. The strings are in the form "[LEVEL] MESSAGE"

In C++, substr takes the starting position and the length of the substring, while in other languages (e.g. Python) it takes the starting and ending position.

To retrieve LEVEL, since in this exercise the substring starts in position 1, the tests will pass for students which erroneously try to use substr as if it were Python, because:

const size_t startPos = 1;
const size_t endPos = line.find("]") - 1;
const size_t length = endPos - startPos + 1; // Ends up being endPos because startPos == 1!

I have mentored a few students who have their tests passing while using the substr method as if it were Python, because it works by pure chance.

If we changed the format of the string so startPos is not (always?) 1, we would make the difference between C++'s and e.g. Python’s signature more obvious.

iHiD · July 25, 2024, 12:28pm

Hey! Could you give an example of what sample strings would look like please? Thanks

IsaacG · July 25, 2024, 12:46pm

Would that completely break all existing solutions?

NachoGoro · July 25, 2024, 12:56pm

@iHiD To minimize breaking solutions, it could be something like adding blank spaces at the beginning, or dashes. These are a couple of alternatives which would still pass for solutions not assuming an initial [ and using length as the second parameter:

"- [ERROR]: A message"
"=>[INFO]: Another message"

The main thing is that the substring does not start at index 1.

@IsaacG Depending on how the change is implemented, it could break all solutions or only those which assume the string starts with a [ without checking, or those which use “end of string” as the second argument to substr instead of “length” (which is the intended target).

NachoGoro · July 25, 2024, 2:11pm

@IsaacG has made me aware of the following doc in another thread: Suggesting Exercise Improvements | Exercism's Docs

I think this suggestion might have been a good one while designing the track, but with thousands of solutions already I see that changing the spec is not the way to go.

I am withdrawing the suggestion then :) Thank you both for indulging me!

iHiD · July 25, 2024, 3:10pm

One thing that would be nice here, would be to use the automated feedback to automatically give your suggestions after someone solves the exercise. Do you meet the supermentor threshold to do this?

NachoGoro · July 25, 2024, 3:21pm

Ah, I wasn’t aware of that feature! Nope, I just started mentoring a couple of weeks ago, not sure when “supermentor” is reached!

kotp · July 25, 2024, 3:50pm

If you are mentoring this exercise, there is value about talking about this, even if the automation can not yet be done. It will get you closer to being able to do so at some point as well, and by that time more experience in how to express it will also have been attained!

I am glad to see thoughtful mentors talking about these things in the exercises.

When I mentor this exercise (not on the C++ track) I talk about capturing exactly what we are looking for, so leading characters do not matter, trailing characters do not matter, we are looking for what is in the square brackets (if any) and what follows. No square brackets, then by definition nothing follows.

I think that you are saying that they are making assumptions that the first position is a specification, by coincidence, and so putting too much value in thinking that since they have seen it always be that way, they can assume it will always be that way.

I attend to that line of thought as well, by not paying attention to any position in the solutions that I am guiding for in the exercise.

NachoGoro · July 25, 2024, 10:02pm

Hi @ktop! I am not sure I got what you meant in your last two paragraphs, sorry! Just to clarify, my issue is that I see code like this in passing submissions:

return line.substr(line.find("[") + 1, line.find("]") - 1);

From that I take that the person assumes that the second parameter to substr is the position final character, while it should be the length of the substring. But because the first character of the substring is always in position 1 (right after the leading “[”), it works for them despite the misuse.

I think it would have been good for the log strings to have a different format which would cause the tests to fail if someone attempted to use the substr function incorrectly. But I see that ship sailed a long time ago! I have added a paragraph to my Scratchpad about this issue and I paste it whenever I detect it in a solution, so it is not a big issue

kotp · July 25, 2024, 10:25pm

Ah, I see what you are saying now, not that the first character in the line is a [ but that the last character of the substring that is wanted for that may not be the first ] character that is found.

So that “[[level]] some message” would erroneously end up being [level when it should be [level] as it is the thing inside of the brackets when the brackets are nested.

This will happen partially because our tests are not exhaustive for edge cases. And as you have done, making a note about it to let the student consider making it a bit more robust is a good thing.

IsaacG · July 25, 2024, 11:44pm

That sounds like the bracket matching exercise Very much not something we’d want to add to the tests here.

mk-mxp · August 3, 2024, 5:42am

@kotp The point @NachoGoro tries to make is, that the hardcoded 1 in line.find("]") - 1 is an assumption of the result of line.find("[") + 1, because in C++ (and other languages like PHP) this second parameter needs to be the length of the substring: line.find("]") - (line.find("[") + 1). Only for strings starting with [ this assumption holds true. That’s why shifting the pattern away from position 0 would make these assumptions obvious.

PS: I hope I have no off-by-one error in that code.

IsaacG · August 3, 2024, 6:20am

It would be a bit odd if that assumption didn’t hold. One of the nice things about log formats and parsing logs is that the format tends to be pretty reliable

mk-mxp · August 3, 2024, 7:53am

@IsaacG Which is, why @NachoGoro suggested to adjust the format of the log so one can distinguish between “assumption of starting at position 1 to calculate length” and “calculating the end position index backward from ]”.

kotp · August 3, 2024, 7:36pm

I understood that. I am not indicating to not do it, but also that a suggestion to not focus on the index, (such as when extracting information using a regular expression) when mentoring can help the student in avoiding assumptions based on the examples, and problem solving in general.

Since the tests are definitely focused on that “[” but also the “inside ‘[’” when there are more than one. No test examples show this, though, so it is currently something that needs to be intuited.

I am curious, though, why you are interpreting intentions by the original poster. Are you interpreting for them with consent? (I am not seeing “I think someone else is trying to say” language here, and so am wondering if you are interpreting because of some language difficulties in translation.)

mk-mxp · August 4, 2024, 4:29am

@kotp I don’t see me interpreting, but repeating intentions. @NachoGoro clearly states in their first post, that they argue all about having tests to distinguish between “substring end position” and “substring length” as intended arguments to the substring function. And to make such tests, they later suggest moving the string to extract away from the start. And you stated to see their intentions and started to argue about unmatched ] as their intention - which I can’t see in all of @NachoGoro’s writing.

Anyways, yes, I’m no native speaker and so nuances in language may be missing (both in my writing and in my understanding).

kotp · August 4, 2024, 6:32am

I understood and still understand what they were saying, and I was not arguing, but stating that if the focus changes from the index to the pattern that perhaps the index does not need to be the focus.

I thought you were talking with them and they were misunderstanding what I was saying and you were acting as an interpreter.

So I misunderstood your intention. No need to repeat what they have already stated, or rephrase it. I did not misunderstand. And I surely was not arguing. If I presented it in such a way, then I apologize for that.