The canonical data for Hamming, Grains, and Protein Translation all include error handling to varying degrees. Does the test runner have a way of testing whether an error occurred? If so, you could run a test case with an expected error message and check that the method raises an error with the same message. If you can’t compare the messages, you could just check that an error occurred for the tests that expect one. For tests that don’t expect an error, an unexpected error would make them fail normally, so I don’t think you need to assert that they won’t raise one.
In a pinch, the tests expecting errors in Grains and Protein Translation can be excluded since they’re for input validation. For Grains, they make sure you can’t pass in a number that’s too high or too low. For Protein Translation, they make sure incomplete or gibberish sequences can’t be translated, and the readme makes it clear those tests are skippable. Leap, on the other hand, doesn’t have any tests expecting an error, so that’d be an easy one to add.
I made draft PRs for leap, pangram, resistor-color, resistor-color-duo, and triangle since those are fairly straightforward and don’t have any error handling. Once the test runner is working nicely, we should be able to use that Docker image in the CI to test the exercises as they’re being merged in. That’ll speed up the exercise porting.
Thanks for the PRs, I’ll review them soon ;) As for error handling in Godot, today I added a new example to the test runner that shows a function validating its input and returning an error in some cases. Unfortunately, Error in Godot is just an enum, so effectively my test checks whether the function returns a specific integer (in this case, 31). I’ll see if it makes sense to change the format of the test runner files to check whether the return value is an actual error, or if we can keep it as it is.
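To illustrate the problem (a simplified, hypothetical example, not the actual test runner code):

```gdscript
# A function that validates its input and returns an Error value on bad input.
# ERR_INVALID_PARAMETER is just the integer 31, so from the test's point of
# view an "error" result is indistinguishable from a legitimate return of 31.
func parse_square(number: int):
    if number < 1 or number > 64:
        return ERR_INVALID_PARAMETER  # the caller only sees the integer 31
    return number  # a valid result; 31 is also a perfectly valid value here
```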
I’ve added more tests, including some that check for “invalid” scenarios (e.g. the method is called with a wrong number of arguments, or the method throws an exception when it’s executed). Because of GDScript’s limited error-handling mechanisms, I ended up wrapping the actual execution of the user-defined method in several levels of functions/assertions, and if something goes wrong, that execution returns “null”. So, I came up with the following rule: if a tested method returns “null”, it means that something went wrong and the test runner shows a generic message (without any details).
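The core of the idea looks roughly like this (a minimal sketch, not the real runner code, which has more layers around it):

```gdscript
# Call a method from the user's solution and treat "null" as "execution failed".
# If the method crashes (e.g. wrong argument count, division by zero), Godot
# logs the error to stderr and we are left with null as the returned value.
func run_test_method(solution: Object, method_name: String, args: Array):
    if not solution.has_method(method_name):
        return null  # missing method -> generic failure message
    return solution.callv(method_name, args)
```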
Is this OK? There is no “try … catch …” in GDScript, so this seems like the best thing I can achieve for our purpose. It is possible to filter out the error output from Godot’s execution and maybe put this output somewhere in the test runner’s output files. However, that would require mixing the GDScript code that saves the “results.json” files with the bash code that executes our test runner script, which could get messy … Also, my solution doesn’t make sense if there are any exercises on Exercism that expect a method to return “null” in valid scenarios (are there any?). So, I was wondering whether this part of the test runner meets the standards.
Nice work!
As for returning “null”, I can’t really recall any exercises that would require it.
There are a lot that return an empty array, 0, “” or False, though.
If this is the accepted approach for the test runner and you somehow end up with an exercise or function that wants to return null, I think it would be appropriate to tweak the test itself a bit.
So, we can agree that all exercises added to this track will return a non-null value? Any “falsy” values are completely OK with the current implementation, but I’ll add a test just to be sure. In that case, if the test runner encounters a “null”, it will assume that something went wrong and will print a message.
I have only 2 questions at this point:
1. Is it OK for now to print a generic message: “Execution of the method ‘…’ has failed. Please make sure that it accepts a correct number of arguments and has a correct implementation.”? To print more, I’ll have to filter the error output in bash and somehow put it into the results.json files (as mentioned above). That’s the only way I see of providing more information about failed tests. It is complicated, but I also think it would be helpful for our users to know more about what went wrong … On the other hand, we should start this track ASAP, since Unity users are moving to Godot now …
2. It is possible for GDScript methods to return an Error, but Error is just an enum, so e.g. returning ERR_INVALID_PARAMETER is the same as returning the integer 31, which might be a correct value for some executions. We can’t return “null” (because of the point above); we could return “false”, but that won’t work for methods that return a boolean (where “false” might be a correct value). This seems like a rather dirty workaround anyway, so I’d prefer to avoid it. I know there are some exercises that involve raising an error, but could we agree not to add them initially, and instead add them later when I figure out a proper solution?
I think if the test runner can point out which tests fail, with a generic statement like: expecting “this predefined result” but got “user’s result”, that should be plenty good, because that’s how a lot of the tracks operate.
The current implementation does that, but for “failed” tests only. What I have in mind are tests that end with an error (e.g. someone tries to divide by 0 inside the tested method). In that case, Godot will log some details to stderr, but there is no try … catch … in GDScript, so we will end up with “null” as the returned value. Because of that, the runner will print a generic “something went wrong” message for each test that causes an execution error.
I am currently checking whether there is a relatively easy way to put the full value of stderr after each test into the corresponding message in the results.json file. If I find a way, I’ll add it to the initial version of the test runner, as I believe providing more details will be very useful for GDScript newbies ;)
Wow, that would give much more detail. You did mention that:
It is possible for GDScript methods to return an Error, but Error is just an enum, so e.g. returning ERR_INVALID_PARAMETER is the same as returning the integer 31
Is this Error enum a fixed built-in, or can you change its values? Because if it can be changed to custom values, we could come up with a list of error codes and act according to which one is returned.
The Error constants are built into the engine, and even custom enum values would still just be integers, so that wouldn’t really help. Because of that, I am thinking of removing the test that checks for an Error being returned (as it is the same as checking whether a specific integer is returned), and possibly choosing exercises that don’t involve error handling. Or, as an alternative, making a decision about error handling case by case (e.g. if a method returns a number in normal conditions, an incorrect input parameter will cause it to return a string with an error message instead).
On the bright side, I think I’ve figured out how to include the full error output from Godot Engine in our results.json files. That will provide a lot of useful details to our users :) The code is already pushed, and I think this (and error handling) are the last items on my TODO list for the test runner’s core functionality. So, I will now proceed with refactoring.
You don’t have to implement all the tests for an exercise. Just mark the test(s) you’re skipping in the tests.toml file to document that for future maintainers. There’s a test for (I think) gigasecond where the test runner checks whether the date object passed in is mutated. For languages with immutable date objects, the test is redundant and can be skipped.
Oh, that sounds like a good idea! I will check how other tracks configure ignored tests and will try to add something similar for GDScript. Thanks!
So, I realized that skipping tests works for exercises, not for the test runner itself (unless I’ve missed something). Considering that, I decided to temporarily remove the test for “returning an error on purpose” (in my case, it was grains, and it was returning an error if the input value was out of range). With GDScript’s current architecture, I’m afraid that error handling will have to be done case by case when adding exercises (e.g. functions that return an integer can return a string message if they want to raise an error). Or, we can skip such exercises altogether, considering that there are enough exercises without error handling to get the track up and running. Please let me know what you think about this.
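For grains, the case-by-case idea would look roughly like this (hypothetical names and message, just to illustrate the pattern):

```gdscript
# Normally returns an int; for invalid input it returns a String error message,
# which the test can compare against directly (no Error enum involved).
func square(number: int):
    if number < 1 or number > 64:
        return "square must be between 1 and 64"
    return 1 << (number - 1)  # illustrative only; square 64 exceeds the signed 64-bit range
```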
I’ve also added a test for returning a correct value but an incorrect type (e.g. returning the integer 3 instead of the string “3”). That test works fine, except the error message is kinda useless (Expected '3' but got '3'). Did we have similar issues before? Is there a standard way of fixing them, like wrapping the values in something other than quotes, or explicitly stating the type of the value?
I am now working on refactoring the test runner and adding some docs (so far so good). I think we will be ready to merge soon :)
The error-handling business sure is tricky. From my own observation and my work on the PowerShell track (where I often took inspiration from the Python, JavaScript and C# tracks), almost every exercise can have error cases simply from wrong input, but most of the time there were no tests for them.
And I understand why, because quite frankly there is no end to it if we have to go around and cover every single error edge case, so most of the errors that were raised were tied to the actual logic of the exercise itself.
For example, going back to the grains exercise: there is really no need to test for an error if the user input is not an int, or if there are too many ints, because that is not really relevant to the exercise. However, the logic of the exercise would break if the number is out of range, like -1 or 65.
Maybe for the time being, we can avoid them like you said.
About the error message: I understand the need to wrap values in quotes, but IMO you can just do 3 to signify an int, ‘3’ to signify a string, and 3.0 for a float, for example. If you want to be more explicit, perhaps you could wrap them in a typeof() func or GetType() method?
I was thinking about adding the type explicitly, but that would probably clutter the output … Using quotes for strings and no quotes for ints makes sense, but I was worried about other types (Arrays, Dictionaries, etc.). However, on second thought, I think that our exercises won’t use that many types, and the main ones should have no problem with just strings being wrapped in quotes. I’ll try to implement this.
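Something along these lines (a sketch of what I have in mind, using the built-in typeof()):

```gdscript
# Wrap only String values in quotes when building assertion messages, so that
# e.g. the String "3" and the int 3 no longer look identical in the output.
func describe(value) -> String:
    if typeof(value) == TYPE_STRING:
        return "'%s'" % value
    return str(value)

# Usage: "Expected %s but got %s" % [describe(expected), describe(actual)]
```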
Other news: I just finished refactoring the test runner! The PR is still in draft, because I’d like to add 3 more tests:
string vs int output (as discussed above)
a method that returns null (the test runner interprets this as an execution error, but our users might accidentally create such methods, and we need to be prepared)
a test for multiple failures (error messages are stored in a stderr file and then read in Godot; I need to make sure that we don’t read output from previous tests).
That being said, the code is ready for review already! ;) Please check it and let me know about any issues/typos/possible improvements. I’ve also added a short description of the test file format (since we are using a custom unit-test solution here); please check it as well if possible.
Hi! Quick update: I realized that this “return null if something went wrong” approach might actually cost us some valuable error messages, so now I’m trying a slightly different approach in the test runner. The change will involve the test file format, but a lot of the test runner itself will stay the same. I would like to know, however, if we’re OK with test names like test_add_2_numbers in our final results.json file? (The current implementation allows us to put any name we want, e.g. “Add 2 Numbers”, but the new implementation would probably use the names of the test methods.)
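For example, an entry would end up looking something like this (a hypothetical snippet, assuming the usual fields of the results.json format):

```json
{
  "version": 2,
  "status": "pass",
  "tests": [
    { "name": "test_add_2_numbers", "status": "pass" }
  ]
}
```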
Since Exercism encourages students to look at the test files to affirm/understand the problem requirements, I would say returning the names of the test methods in results.json would be helpful/important for their understanding.
In Python, we happen to remove the underscores from test names for results.json but otherwise stick to the formatting of the test method names, and I think it would be perfectly fine if the names included underscores.
Thank you @BethanyG! :) I’ve started the refactoring process. In the end, I will squash all the commits, but for now there is a separate commit just for the refactoring. Could you please take a look and check whether the output format is satisfactory?
I’ve only changed 3 tests so far, but that should give a good idea of what the new tests will look like (both the test file format and the output JSON). There is also a weird trick for reading the error output from the stderr file in that commit. I might find a better way, but if not, I’m afraid we will have to keep it. It shouldn’t really break anything; it’s just that Godot doesn’t have good error-handling mechanisms, so I have to improvise ;)
Alright, I’ve added all the missing tests, refactored the test runner, and improved error message creation ;) To sum up the recent changes:
the test runner will now automatically run all methods in the test file that start with test_ (similar to e.g. pytest); there’s a small sketch of this below. Those methods will be called with the solution script as the only argument. This allows us to test any method from the solution (and also inspect variables, although there are no tests for that yet) and perform any setup steps if needed. Most importantly, this means that the method will be called “normally” and will log any error output, which we can later retrieve and show to our users. This will give them much more detail about what went wrong with their code!
the formatting of error messages has been adjusted a bit: String values are now wrapped in single quotes, but other values are not. This allows us to print a relevant error message if the expected and actual values look the same but have different types (e.g. if our user’s method returns the String “3”, but we expect the int 3).
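To give an idea of the first point, a test file could now contain methods like these (purely illustrative names; returning an [actual, expected] pair here is only a placeholder for whatever reporting convention the runner actually defines):

```gdscript
# Every method whose name starts with "test_" is discovered and run
# automatically, receiving the solution script instance as its only argument.
func test_add_2_numbers(solution):
    return [solution.add(1, 2), 3]

func test_add_negative_numbers(solution):
    return [solution.add(-1, -2), -3]
```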
I still have one TODO left, but I’ve updated the docs already. Please let me know if the runner in its current form is OK, and whether the error messages are good enough (e.g. they include a lot of detail, like the name of the test script; I don’t know if that’s the standard way of presenting errors on Exercism).
I will fix the TODO soon, but I will be a bit busy for the next 2 weeks, preparing for GodotCon 2023 (as a speaker). I think I will still have some time to implement any fixes you suggest, as I want this track to start soon! ;)
I’ve fixed the last TODO and removed the “draft” status from the PR. @ErikSchierboom already approved it, but that was before some major changes (while it was still in draft), so I would be grateful for another review ;) If there is no feedback, let’s merge it! :)