Testing file input/output syntax

Seeing a recent posting about I/O reminded me of something that’s been discussed from time to time, and I was wondering if the status quo is still status-ing.

One of COBOL’s strengths is its file I/O. As far as I can remember, none of the other languages I’ve dabbled with here on Exercism have explicitly tested file I/O knowledge. Is that deliberate, or did I miss something or things? Is it out of scope – getting “out of lane” – to write tests for code that does SORT and MERGE?

awk, bash, and jq make use of STDIN/STDOUT in tests as a way to send data to a script and get output from that script. However, most languages have you write functions, and I/O is generally not the most structured approach to getting data into and out of functions.

This also allows many test runners to show both a function’s return value and any STDOUT generated by the function as separate values – the test result and additional debugging info.
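
As a rough sketch of that separation in Python (the helper and its name are mine, not any particular test runner’s API):

import io
from contextlib import redirect_stdout

def call_and_capture(func, *args, **kwargs):
    # Run a solution function, keeping its return value (the test result)
    # separate from anything it printed (debugging info).
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()

def demo(x):
    print("debug:", x)  # captured, not mixed into the result
    return x * 2

result, printed = call_and_capture(demo, 21)
assert result == 42
assert printed == "debug: 21\n"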

That said, I don’t think there’s any reason you can’t use I/O. However, sorting and merging can be handled in most languages by passing in two lists/arrays of data.
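
In Python, for example, a MERGE-style exercise could hand the solution two sorted lists and expect a single merged one back – a minimal sketch (the function name is mine):

import heapq

def merge(left, right):
    # Merge two already-sorted sequences without touching the file system.
    return list(heapq.merge(left, right))

assert merge([1, 4, 9], [2, 3, 10]) == [1, 2, 3, 4, 9, 10]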

There is the grep exercise, implemented by awk, bash, cairo, csharp, elixir, fsharp, go, java, javascript, jq, lua, phix, powershell, python, racket, roc, ruby, rust, tcl, vbnet, and wren.

For SORT and MERGE, one option would be to add a COBOL-specific concept exercise. The other option would be to propose a new practice exercise suitable across tracks. Being comfortable with file I/O is part of becoming fluent.

We’ve considered file reading/writing for the Python track, and we do have a concept exercise where values are imported from another file for manipulation and testing (Cater-Waiter), although we’re not doing any sort of whole-file comparison there.

But file I/O exercises are way down the list for us. It’s less of a philosophical disagreement than a “put-what’s-on-fire-out” sorta situation. We still only have 17 concept exercises available, and most of our practice exercises are lacking approaches documentation. We are still in the process of upgrading our language versions, and there is a lot of work to be done on the tooling.

We do have context managers, nested context managers, and customized context managers on our syllabus to-do list.

In Python, you most often open and manipulate files with a context manager. So eventually we’ll get to file I/O - at least in a constrained fashion (where file opening and creation happen in the same directory as the exercise code). I haven’t quite thought through whether that means we have an empty file for output or somehow make the exercise directory writable or …??
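
For reference, the constrained flavor might look something like this sketch (the file name is made up; the relative path keeps everything in the exercise directory):

from pathlib import Path

output = Path("output.txt")  # relative path: stays in the exercise directory

# The context manager opens the file and guarantees it is closed afterwards.
with output.open("w", encoding="utf-8") as handle:
    handle.write("first line\n")

with output.open("r", encoding="utf-8") as handle:
    assert handle.read() == "first line\n"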

But there is one thing we probably won’t take a run at: navigating the file system. Since everything is in a Docker container and isolated, we can’t really mimic finding things on file systems in general – not in a realistic way. And it would be quite a bit of work (and alteration of the Exercism ecosystem, I suspect) to mimic hitting an API and getting data back to use for an exercise.

One thing I have seen done on other platforms is to have a dedicated container that serves as an API point for retrieving info/files. Dunno what that looks like in terms of Exercism. Smells like it would be too many extra containers. It would also need to be accessible for students using the CLI, which might be a non-starter.

For Grep in Python, we do a fakeout, and put all of the text in the test file for that one.

Somewhat lame, but it gets the job done - at least for that exercise.

It does do the trick! Though it means pathlib approaches fail :smiley:
Would you be open to a PR that extends the mocking to cover pathlib.Path.read_text()?
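
Something like this sketch might cover it (FILE_TEXT and the file name are borrowed from the current test file; the autospec approach is my assumption, not how the track does it today):

import pathlib
from unittest import mock

FILE_TEXT = {"iliad.txt": "Of Atreus, Agamemnon, King of men.\n"}

def read_text_mock(self, *args, **kwargs):
    # With autospec=True the patched method receives the Path instance,
    # so the fake file contents can be looked up by path.
    try:
        return FILE_TEXT[str(self)]
    except KeyError:
        raise RuntimeError(
            "Expected one of {0!r}: got {1!r}".format(list(FILE_TEXT), str(self))
        )

read_text_patch = mock.patch.object(
    pathlib.Path, "read_text", autospec=True, side_effect=read_text_mock
)

The patcher could then be started in setUp (or applied as a decorator) alongside the existing open/StringIO patches.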

I copied that style for the PowerShell track as well – thank you, Python track :+1: . I thought I had to go the extra-files route at first, but then realized the test framework allows mocking the open content.

I amended the README, basically telling people to grab the content normally with a simplified path and just focus on the logic of the exercise itself.


I would! But it also has a Jinja template, which is not very pretty, so the usual regeneration and retesting rules apply. :grimacing:

I would also be tempted to move the “test file” to an actual file and then mock various paths. I would have to play with it to know whether that would fail existing solutions. I am pretty sure it would.

But maybe we do add pathlib.Path.read_text() as an option for now, and kick the can a bit. :smile:

We could just use tempfile to make a temporary directory and dump the files in there. Then os.chdir(tmpdir). Everything else ought to just work.

I made this change locally to the test file and it seems to work. I can definitely apply it to the Jinja template and run the tests.

+import os
+import pathlib
+import tempfile

-def open_mock(fname, *args, **kwargs):
-    try:
-        return io.StringIO(FILE_TEXT[fname])
-    except KeyError:
-        raise RuntimeError(
-            "Expected one of {0!r}: got {1!r}".format(list(FILE_TEXT.keys()), fname)
-        )
+class GrepTest(unittest.TestCase):

+    def setUp(self):
+        # Remember the starting directory so tearDown can restore it;
+        # cleaning up a temp dir that is still the cwd can fail.
+        self._old_cwd = os.getcwd()
+        self.tmpdir = tempfile.TemporaryDirectory()
+        tmpdir_name = self.tmpdir.name
+        for name, content in FILE_TEXT.items():
+            (pathlib.Path(tmpdir_name) / name).write_text(content)
+        os.chdir(tmpdir_name)
+
+    def tearDown(self):
+        os.chdir(self._old_cwd)
+        self.tmpdir.cleanup()

-@mock.patch("grep.open", name="open", side_effect=open_mock, create=True)
-@mock.patch("io.StringIO", name="StringIO", wraps=io.StringIO)
-class GrepTest(unittest.TestCase):
     # Test grepping a single file
-    def test_one_file_one_match_no_flags(self, mock_file, mock_open):
+    def test_one_file_one_match_no_flags(self):
         self.assertMultiLineEqual(
             grep("Agamemnon", "", ["iliad.txt"]), "Of Atreus, Agamemnon, King of men.\n"
         )

For the Jinja template, you’ll need to run

bin/generate_tests.py grep

to regenerate the test file. The PR should include both the template and the newly generated file, as well as any changes (if needed) to the example.py file.

We’d have to do that in the context of the exercise directory copied to the test-runner Docker container, and then figure out how we’d change it for download via CLI (sub-directory of the exercise directories??). So I need to think on that one for a bit. Hence the “kick the can” comment. :wink:

Edited to add – aaand we’re now threadjacking. :laughing: If you still have admin access, we should take this Python-specific discussion off to a new thread. :smile:

Yes, probably.

File reading/writing falls into the world of proficiency, not fluency, in most cases. There will be languages where I/O is absolutely core and essential to the language, or where it is bizarre not to use it to do normal things. But for most languages it is one of the ways the language interacts with the outside world, and therefore it falls outside the scope of learning the idioms of a language (the same as network calls, or frameworks, or other such things).

I have a slightly related comment, re: I/O and grep. Or maybe very related.

In Ruby, the tests expect a solution in a grep method which takes pattern, flags, and files arguments (search string, array of flags, array of filenames). The student has to open a file but otherwise is just writing a function.

The JavaScript track, on the other hand, invokes the solution as a script, effectively calling node with the .js file and arguments.

I think I prefer the latter approach. The student has to work with script arguments: see how the language accesses them, separate flags from file names, etc. So it’s an opportunity to get acquainted with something new. But is JS the outlier here, or Ruby? If I hadn’t seen the exercises, I would’ve expected Ruby to be the more CLI-leaning one.
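
In Python terms, the script-style entry point would be something like this sketch (the argument layout is assumed, not taken from either track):

import sys

def main(argv):
    # grep.py PATTERN [FLAGS...] FILES... : split flags from file names.
    pattern, rest = argv[0], argv[1:]
    flags = [arg for arg in rest if arg.startswith("-")]
    files = [arg for arg in rest if not arg.startswith("-")]
    print("pattern={0!r} flags={1} files={2}".format(pattern, flags, files))

if __name__ == "__main__":
    main(sys.argv[1:])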

Ruby is a language intended partly as a replacement for Perl, so it can be used as a glue language. The histories of Ruby and JavaScript are different, so it is not surprising that the feelings are different.

Ruby could have presented this exercise much differently than it does. As you note, we could have taken a much more pipe-driven approach with Ruby.