Forth test suite may not be complete

Hello,

I did the Forth exercise in JavaScript homersimpsons's solution for Forth in JavaScript on Exercism I wanted it to be a correct Forth implementation so I checked my implementation for some details against Online Forth Compiler.

After the implementation I checked the community solutions, I found that most of them were incorrect:

Solution Incorrect Forth input Expected output Current output Comment
glennj : toto b ; Unknown command [] Use of unknown symbol b
dav0p : toto 1 ; : b toto ; : toto 2 ; b [1] [] Re-define symbol should not override existing symbol
anurat : toto 2 ; toto [2] Unknown command Simple define and use of a value
cjvolkert : toto 2 ; toto [2] [] Simple define and use of a value
cjvolkert 1 : toto 1 ; toto + [2] [] “Complex program” that add 1 to the stack, creates toto, then add toto to the stack

On top of my head I recall the following missing correctness:

  • It is possible to have multiple instructions on 1 line
  • It is possible to re-define operators, the following program will have the stack [0]:
    : + - ; 1 1 +
    

I do not know how complex this exercise should be, it is flagged as “hard”. I think having “more correct” implementations could be interesting for the students.

In fact such implementations forces to store procedures inside “words” and add some constraints on the parsing code.

Note: I didn’t tag the solutions authors to avoid spam, but feel free to tag them if you think it is relevant.

Suggesting Exercise Improvements | Exercism's Docs is probably worth reading.

1 Like

To defend my own solution, there are no requirements that state a word definition must consist of currently defined words. I applaud you for going above and beyond the tests.

1 Like

Yes, I read it, and the point of this post is to get feedback to either improve the test suite or keep it like this.

The main issue I see with the current test suite is that it does not enforce the basic of the language regarding “words” Forth (programming language) - Wikipedia :

where the user interacts via subroutines called words

And I feel like this is an important concept for the students.

But updating the test suite to enforce this will most likely break almost all existing solutions.

To be more specific I would add the following test cases (maybe not all, with different wordings):

  • : newword undefinedword ;: The language is supposed to compile “subroutines” when defining a new word, here it does not know the subroutine so it should fail.
  • : firstword 1 ; : secondword firstword ; : secondword 2 ; firstword : The language is supposed to compile “subroutines” when defining a new word, hence once a subroutine is compiled it should not be altered by another word override.
  • : myword 2 ; myword: A simple test case to check that we can define and execute a word
  • : + - ; 1 1 +: A simple test case to check that pre-defined words can be overriden
  • : dup dup dup ; 1 dup: A test case to show that we can re-define a word with himself

This is not an attack against published solutions, they pass the test cases which is the only thing that count.
The goal was to showcase that the test suite seems to be lacking some important concepts for the Forth language that should be relevant to learn the track (javascript in this case) language.

I’m not a Forth expert, I discovered this langague with this exercise.

You both contributed a lot to exercism, thank you!

I like this.

Doesn’t this existing test case cover that? javascript/exercises/practice/forth/forth.spec.js at 76988d603acacfe3fb9ae17d042bc513b37e4e91 · exercism/javascript · GitHub

Doesn’t this existing test case cover that? javascript/exercises/practice/forth/forth.spec.js at 76988d603acacfe3fb9ae17d042bc513b37e4e91 · exercism/javascript · GitHub

Doesn’t this existing test case cover that? javascript/exercises/practice/forth/forth.spec.js at 76988d603acacfe3fb9ae17d042bc513b37e4e91 · exercism/javascript · GitHub

Wouldn’t this result in infinite recursion?

I don’t think changing the exercise is worth the cost. The problem spec is used widely and has many solutions. Having the spec more inline with actual Forth syntax IMO doesn’t add much value to the exercise.

Oh, I should have checked the test suite more in depth.

The test case is slightly different:

  • : foo 5 ; : bar foo ; : foo 6 ; bar foo (existing test case)
  • : foo 5 ; : bar foo ; : bar 6 ; bar foo (proposed test case)

But I just re-tested this against dav0p and it works in bost case, the issue seem to be with 1-letter words, in fact : f 1 ; f leads to Unknown command.

This test case does not seem relevant then.

Technically speaking yes, the issue here is the fact that those solutions are expecting to evaluate either a word definition or a the result of an instruction.

  • : countup 1 2 3 ; then countup works
  • : countup 1 2 3 ; countup does not work

So basically a new test case with word declaration and evaluation in the same evaluate call should do the trick. I do not know if this is expected though because this is more about parsing issues.

Technically no, you can test it on the online compiler: : dup dup dup ; 1 dup . . . will output 1 1 1

What happens is that the “new” dup word will be compiled from the existing dup word.

Going further the following program will output 9 1s:
: dup dup dup ; : dup dup dup ; : dup dup dup ; 1 dup . . . . . . . . .
In fact the first dup, will duplicate twice, the second dup will duplicate twice-twice (4th), the third dup will duplicate twice-twice-twice (8th).

Conclusion

So 3 new test case may interesting:

  1. : foo undefinedword ;
  2. : dup dup dup ; then 1 dup: forces “subroutine compilation”
  3. : foo 1 ; foo: stricter parsing (optional)

Maybe only the 2 first suggested new test cases could be relevant then.

  1. I’ve mentored solutions that put the implementation for the “builtin” words in the same data structure as the user-defined words. This will break existing solutions.

    Using 8th, this segfaults (but in an Apple-specific GUI way: “Do you want to report this?” No thanks):

    echo '
    : dup dup dup ;
    1 dup .s
    ' | 8th
    

    If I specify that the definition uses dup from the global namespace, it works showing 1 1 1 on the stack

    echo '
    : dup G:dup G:dup ;
    1 dup .s
    ' | 8th
    

    This test would have to mandate that the user’s words live in a separate namespace from the builtin words. I don’t think the (potentially) extra complexity makes a good addition to the test suite.

  2. Given that the current tests have each user-defined word as a separate line/array element in the input, the stricter parsing is bound to break lots of existing solutions.

I’m not a Forth expert either, but disallowing undefined words feels quite wrong to me, they should only be a problem if still undefined at the time of being executed. A quick search led me to find out that gforth specifically has “defer” as (in my view a rather ugly-kludgy) workaround for such issues, and certainly not suggesting that be added to this exercise. I suspect it is more of an implementation weakness (I saw single-pass compilation being used as a feeble excuse) rather than a language specification, and cannot think of any benefit to deliberately replicating a restriction / prohibition on (say) mutual recursion, no matter how common that one seems to be.

Sorry, when I wrote 8th the goal was more to point that this dup will repeat 8 times.

Not really, in my solution I have a #words property that is populated with system words and that will hold user words too. Then when a word is defined I compile it first then append / overwrite it in the word dictionnary.

I keep reading everywhere that : indicates that the interpreter will run in “compilation” mode, hence it will translate the word definition to actual instructions that will be applied to the stack once called.

Having an “undefined” word prevent such compilation.

In the case of the defer word, my understanding is that gforth will keep a pointer to a dynamic instruction, this instruction will then be set using the IS word.