Issues with Parallel Letter Frequency in JavaScript

Cool-Katt · March 19, 2024, 11:23am

I think that this exercise is still perfectly valid for the track, as you can definitely execute code concurrently and there is a way to execute code in parallel but the point would be to show that Javascript is not really designed as a parallel language.

The exercise introduction also clearly states that JS is a single-threaded language and there is only one way to achieve actual parallelism. It also acknowledges that a solution that doesn’t implement concurrency can pass the tests. My reasoning behind that was that I don’t want to limit students to a particular way of solving the exercise, as I couldn’t really find a proper way to test if the implementation uses Workers. (Worker threads don’t have to be used is asynchronous functions, so an async test wouldn’t necessarily prove if a student has used Worker threads or not.)

As I said earlier, improvements are welcome, as long as they don’t disrupt current solutions and don’t limit ways to solve the exercise.

Cool-Katt · March 19, 2024, 11:31am

Also, for the sake of completeness, if the idea is that you need to await the Promise.all() in a synchronous function, you can do this

(async () => {
    await Promise.all(formatedTexts.map((t) => 
        processSingleText(t, result)))
})()

Not that it helps with the tests. I agree with the point that the current tests do not really apply to implementations using an async function. Maybe we can focus on how to make them better?

@iHiD that part about benchmarking against a synchronous solution seems interesting. Mind sharing an example?

themetar · March 19, 2024, 11:58am

Well, to put it bluntly, you shouldn’t have done that.

Let’s go a level above to the problem spec:

Count the frequency of letters in texts using parallel computation.

The whole reason for this exercise to exist is to do stuff in parallel. That’s what I was trying to express above when Jeremy replied with we should “allow” concurrency. We shouldn’t just allow it. We should expect it in this exercise.

As for JS being inherently single-threaded: With Promises you can fire off requests to separate servers and wait for their responses - parallelism! not necessarily on the same machine. Fibbing on the exact way that the sub-tasks execute is okay. That is not what I and @mk-mxp are arguing about.

Cool-Katt · March 19, 2024, 12:27pm

The goal of the exercise is to show how to execute code in a parallel fashion, on the language level. However, Javascript, as a language, was not designed to work in a parallel fashion, so it lacks the language features that many other languages have in order to handle parallel code execution.

There is, according to the docs and to my knowledge, only one way to execute code in parallel - Worker threads. If we have to be pedantic about things, we should expect a real parallel execution solution to use those and we should test for it.

I wasn’t able to find a way to test for worker threads implementation that would be reliable, so I didn’t do it.

If we point to a regular synchronous solution and say that it doesn’t work since it doesn’t fit the goal, then by extension, a concurrent one also doesn’t fit the goal, since it is not executing code in parallel, no matter how much it may look like it does, so why force students to comply with a requirement, only to tell them that “we know this isn’t parallelism but it’s good enough”?

It is better to teach students why concurrency is not the same as parallelism and why we say that it’s good enough, under the circumstance. Something that i’ve tried to explain as best as I can in the instructions to the exercise.

iHiD · March 19, 2024, 12:47pm

It does. But my point is that the issue is with the tests, not with the example. If the example passes the tests, it’s a valid file. Obviously, discussing the example is a good way to discuss what the tests should do, but I’d like the focus of the discussion not to be on whether the example is good/bad - but rather how the tests should be changed.

Here’s my take. If someone can write some tests that ensure only parallel (or concurrent) solution’s can pass, then we should probably make that change.

@mk-mxp’s set of tests allow concurrent/parallel code to be written, but I’m not sure they force people in that direction. Can anyone come up with some tests that do force concurrency or parallelism?

(Here’s the Ruby Bencmarking comparison test I mentioned - not that it’ll necessarily help, but for completeness).

mk-mxp · March 19, 2024, 1:44pm

I’d prefer sticking with “students can solve it the way it is meant to be solved, mentors teach how to do it properly”. Or, as you said here: “we don’t invest in anti-cheating mechansims”.

Cool-Katt · March 19, 2024, 2:00pm

jest does have mechanisms to test asynchronous code, in particular something like this (from the docs):

test('the data is peanut butter', async () => {
  await expect(fetchData()).resolves.toBe('peanut butter');
});

We could refactor the tests to expect asynchronous functions, but then we are going to invalidate all solutions that use a regular for loop, i.e. regular synchronous solutions. I’m not sure if that’s the desired outcome but I suppose it’s closer to it than what we have now.

I don’t think I follow. Care to elaborate?
If we’re just going to agree to disagree and not limit the solutions in any way, then what was the point of this discussion?

I’m also not sure if there is an easy way to tackle the bigger problem here, which I think should be ‘How do we test if a solution is really using Worker threads to execute code in parallel?’

themetar · March 19, 2024, 2:12pm

one proposal was:

mk-mxp · March 19, 2024, 2:13pm

See Allow synchronous and concurrent and parallel solutions to Parallel Letter Frequency by mk-mxp · Pull Request #2418 · exercism/javascript · GitHub, as posted above. No existing code is invalidated.

mk-mxp · March 19, 2024, 2:18pm

I just ran a benchmarking test as the one used in Ruby, but as I expected: The synchronous solution (~170msec) is >10x faster than Workers (~2300msec), and about 2x faster than Promises (400msec).

Edit: 20 Arrays of 100,000 characters each.

themetar · March 19, 2024, 2:21pm

Try with fewer but longer strings?

Cool-Katt · March 19, 2024, 2:24pm

That’s modern JS for you, optimized AF.

Maybe if the test suite was really large, like one billion characters, or one million arrays, it may make a difference?

Also, I don’t think I explained well my point about invalidating solutions. What I meant was that if we change the tests to expect an asynchronous function, you can no longer pass with a synchronous one, you’d have to always wrap your output in a promise, even if you’re executing code synchronously inside.
Maybe it wouldn’t matter if we tell student to just use an async function, but that’d be limiting them, in a sense

Cool-Katt · March 19, 2024, 2:32pm

If your machine allows for it, it may be faster with a large dataset and many workers, because I suspect there wouldn’t be a measurable difference with one worker since you’re just delegating all the work to another thread, that essentially executes the counting synchronously.

Technically, one worker thread and one main thread is still parallel, so it satisfies the goal, but it’s the same in term of actual work being done.

mk-mxp · March 19, 2024, 2:45pm

The discussion is about:

the current tests narrow students possibilites down to only synchronous solutions
the example solution is a synchronous solution that works by accident
the instructions, that made @ErikSchierboom accept the exercise at all, is elaborating on something, that cannot be done with existing tests
it is easy to widen the spectrum of possible solutions
it is easy to provide a concurrent example that works by design
it is possible to fulfill the promises made by the exercise title and the instructions

If we just agree to disagree leaves the exercise in an artificially narrowed state and no concurrent or parallel solution will ever be submitted. But, I believe we don’t have to enforce concurrent or parallel solutions. This is, where I don’t follow @iHiD.

What I showed in my merge request is, that your synchronous solution passes the asynchronous tests with no changes at all.

iHiD · March 19, 2024, 2:48pm

OK, so just to summarise. I feel like our general positions are:

@mk-mxp is arguing for having the exercise solvable both sequentially and using async.
@themetar is arguing for having the exercise only solvable using async (Based on “Well, to put it bluntly, you shouldn’t have done that.” to allowing a solution that doesn’t implement concurrency to pass the tests).
@Cool-Katt is happy for @mk-mxp’s proposal as long as existing solutions still work.

Am I clear on everyone’s positions before we continue discussing further? (If you could this if so, or clarify in a post if not, that would be ideal )

Cool-Katt · March 19, 2024, 3:48pm

I would be happy with @mk-mxp’s proposed solution, if we can all agree it makes sense.

or

I’d also be happy to change the tests to expect async solutions only, if we can agree that it’s better to force students into using promises, instead of allowing them to cheat with synchronous solutions.

In either case, I’d also really love to be able to test for use of Worker threads if this is possible at all, but I couldn’t figure it out so idk if it is.

SleeplessByte · March 19, 2024, 5:39pm

When I commented on the PR, I didn’t actually look at the contents (apologies @Cool-Katt, you generally have great PRs so I didn’t think about it, I just wanted to comment on the non-paralelistic nature of JS). My comment there was that concurrency is possible but parallelism is not, without workers, and it should be implemented as such.

Looking at the code now I believe the following changes must be made, which is pretty much in line with what @themetar said (albeit a bit strongly):

Expect something async, either via callbacks or promises; this means that the tests must be promise based. You can check the promises exercise to see how to do that, because the test-runner does support it.
Remove the results object passing. If the functions are pure then the promises “make more sense”, hollistically.

I also saw that @mk-mxp did a PR. I have not explored it yet, but given their contribution in this thread I actually think that it will contain exactly what I said above.

A worker solution using node workers will pass the same tests a promise based solution would, so perhaps proof.ci.js can be that, to show you can technically do worker based solutions. I also recommend we update the documentation of this exercise to hint at workers.

themetar · March 19, 2024, 6:35pm

Thank you.

Cool-Katt · March 20, 2024, 12:53pm

I have updated the tests to expect an async function so we can limit synchronous solutions.

I have also reworked the proof.ci.js to use worker threads as a proof that this can be solved with real parallel execution, and I’ve updated the docs to better reflect the proper way of doing it.

@mk-mxp and @themetar, you’re both listed as contributors for this exercise because you’ve contributed quite a bit to this discussion so credit where credit is due. Thanks for pointing out the flaws, even if we did get into a heated debate about it.

Let me know if there’s anything more to be done - @SleeplessByte and the other maintainers.

ErikSchierboom · March 26, 2024, 7:18am

I think that is indeed the best way to go about this. Thanks!