Can we improve the value people get from community solutions?

TL;DR: from @iHiD

I’ve moved this into its own thread as I think it’s an interesting conversation. @tasx suggests that we could try and improve the value a student gets from a solution - maybe by generating explanations of each solution (via ChatGPT or the like) so that students can discover new ways of doing things and have those ways explained. The conversation meanders a little as this was under a different conversation, but if you have any opinions on this idea, please add them to this thread :slight_smile:


Back to @tasx

My personal view: Instead of promoting specific solutions based on reputation or manual selection, it would be more valuable to make every solution a starting point for exploration. Even an inefficient or poorly structured solution highlights an approach that can be learned from. Rather than classifying solutions by some arbitrary metric, the focus should be on encouraging discovery and deeper understanding.

What I would like to see is an automated way to analyze every solution, making exploration and learning more accessible. Analyzers, maybe? That sounds like a much better approach to me.


We have representers and analyzers that largely do this already. We also did lots of work on generating tags for solutions, but never got it finished (largely due to cost constraints in the end).

However, when you have tens of thousands of solutions to an exercise, I’m not sure how this helps highlight the best ones, rather than just grouping them together.

Could you expand on your thinking a little pls?

An analyzer wouldn’t really pick the best one. It would rather analyze the exercise, explain how it works, discuss the pros and cons. Stuff like that. So I was thinking more along the lines of using ChatGPT-like tools.

The goal should be to focus on understanding, rather than simply identifying the best solutions. I’m not even sure if that’s possible, anyway.

Having an explanation of each solution like that is a really nice idea. But I’m not sure how that helps someone work through thousands of approaches. It feels like it’s solving a different problem.

If you mean an explanation of the exercise (rather than the solution), that sounds like you’re suggesting adding Articles/Approaches written by an LLM for each exercise?

If you’re trying to solve the problem of how to feature the best solution, I don’t have much to offer. Mostly because it’s unlikely that any single solution is truly exemplary. In my view, a much better approach than featuring solutions would be to identify noteworthy ones and have someone build an approach around them.

In any case, here’s what I proposed regarding analyzers: Someone solves an exercise and starts browsing community solutions. They come across something they don’t understand or find interesting. They request an automated analysis and get an explanation. They’ve learned something. That’s it. They don’t have to sift through thousands of solutions.
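To make that flow concrete: below is a minimal sketch of what the server side of such a “request an analysis” button might do, assuming the OpenAI chat completions endpoint and GPT-4o mini. The package and function names are hypothetical, not anything that exists in Exercism today.

    // Hypothetical sketch: generate an explanation for one solution on demand.
    // Assumes the OpenAI chat completions API and an OPENAI_API_KEY env var.
    package explain

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
        "os"
    )

    type chatRequest struct {
        Model    string    `json:"model"`
        Messages []message `json:"messages"`
    }

    type message struct {
        Role    string `json:"role"`
        Content string `json:"content"`
    }

    type chatResponse struct {
        Choices []struct {
            Message message `json:"message"`
        } `json:"choices"`
    }

    // ExplainSolution asks the model to explain one published solution.
    func ExplainSolution(language, code string) (string, error) {
        body, err := json.Marshal(chatRequest{
            Model: "gpt-4o-mini",
            Messages: []message{
                {Role: "system", Content: "You explain code to learners, line by line, noting pros and cons."},
                {Role: "user", Content: fmt.Sprintf("Explain this %s solution:\n\n%s", language, code)},
            },
        })
        if err != nil {
            return "", err
        }

        req, err := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewReader(body))
        if err != nil {
            return "", err
        }
        req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return "", err
        }
        defer resp.Body.Close()

        var out chatResponse
        if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
            return "", err
        }
        if len(out.Choices) == 0 {
            return "", fmt.Errorf("no explanation returned (status %d)", resp.StatusCode)
        }
        return out.Choices[0].Message.Content, nil
    }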

This is maybe something ChatGPT might be good at. I just asked it “can you explain this bash code to me” and pasted in one of my solutions, and it did a nice job breaking it down line by line. Not perfectly (it missed one of the subtler syntax items), but overall OK.


Yeah - I like the idea of that, but for a couple of million solutions, it probably won’t be cheap. Maybe someone could price it up for us?

I think, though, that there are still two separate discussions:

  1. Could we improve the way existing solutions are discoverable?
  2. Can we improve the amount of value someone gets from a community solution (e.g. via the LLM explanations)?

Let’s keep this discussion focussed on the first (acknowledging that some people think it’s not worth doing).

I’d happily see the second topic continue on another discussion!

Are we talking about allowing users to discover solutions based on specific criteria, like a concept? Or is this about featuring specific solutions? These are two different things.

I just asked VS Code Copilot Chat for an explanation of some Go code, and it did a pretty darn good job:
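(The code I pasted was a screenshot and isn’t reproduced here. For context, a hypothetical Go solution consistent with the explanation below might look like the sketch that follows. Note it rejects the degenerate (1, 2, 3) case, which is an assumption based on the example output rather than on the wording of the explanation.)

    // Hypothetical reconstruction of the triangle solution being explained.
    package triangle

    // Kind represents the classification of a triangle.
    type Kind string

    const (
        NaT Kind = "not a triangle"
        Equ Kind = "equilateral"
        Iso Kind = "isosceles"
        Sca Kind = "scalene"
    )

    // KindFromSides classifies a triangle from the lengths of its three sides.
    func KindFromSides(a, b, c float64) Kind {
        switch {
        case a <= 0 || b <= 0 || c <= 0:
            return NaT
        case a+b <= c || b+c <= a || a+c <= b:
            return NaT
        case a == b && b == c:
            return Equ
        case a == b || b == c || c == a:
            return Iso
        default:
            return Sca
        }
    }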


And the response:

This Go code defines a package triangle that includes a type Kind and a function KindFromSides to determine the type of a triangle based on the lengths of its sides.

Type and Constants

  • Kind: A custom type defined as a string to represent the type of a triangle.
  • Constants:
    • NaT: Represents “not a triangle”.
    • Equ: Represents an equilateral triangle (all sides are equal).
    • Iso: Represents an isosceles triangle (two sides are equal).
    • Sca: Represents a scalene triangle (all sides are different).

Function: KindFromSides

The KindFromSides function takes three float64 parameters a, b, and c, which represent the lengths of the sides of a triangle. It returns a Kind indicating the type of the triangle.

Logic

  1. Invalid Triangle Check:
    • If any side length is less than or equal to 0, it sets the kind to NaT (not a triangle).
    • If the sum of any two sides is less than the third side, it sets the kind to NaT (not a triangle).
  2. Equilateral Triangle Check:
    • If all three sides are equal (a == b && b == c), it sets the kind to Equ (equilateral).
  3. Isosceles Triangle Check:
    • If any two sides are equal (a == b || b == c || c == a), it sets the kind to Iso (isosceles).
  4. Scalene Triangle Check:
    • If all sides are different (a != b && b != c), it sets the kind to Sca (scalene).

Example Usage

package main

import (
    "fmt"
    "triangle"
)

func main() {
    fmt.Println(triangle.KindFromSides(3, 3, 3)) // Output: equilateral
    fmt.Println(triangle.KindFromSides(3, 4, 5)) // Output: scalene
    fmt.Println(triangle.KindFromSides(3, 3, 4)) // Output: isosceles
    fmt.Println(triangle.KindFromSides(1, 2, 3)) // Output: not a triangle
}

This function ensures that the triangle type is correctly identified based on the properties of its sides.

____

Copilot is free for up to 50 chat queries per month per user (requires a GitHub account).

So one “quick and dirty” change (while we figure out the details on solutions) could be to make students aware of this on community solutions pages, and link to how to set it up for free in their editor. Supported editors are VS Code, Visual Studio, Xcode, and the JetBrains IDEs, although there are other options for access via the command line, etc.

The proposition under discussion is:

Endorsed solutions.

Mentors and maintainers can give endorsements to community solutions so that people browsing can see that “experts” think certain solutions are noteworthy.

The endorsers can optionally give a short blurb about why the solution stands out.

If an endorsed solution gets out of date with new tests, the endorser could receive a notification and can decide to rescind the endorsement.

I don’t think endorsements can be anonymous.

As a mentor I often encourage students to browse the community solutions, even knowing that it can be an unordered sea of daunting code. If students had a way to find the noteworthy solutions, it would really help them find the islands in that sea. Also, the solution owners get a nice prize: “Wow, Erik likes my solution.”
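To make the proposal a little more concrete, here is a rough sketch of the data an endorsement record might carry; the names are hypothetical and nothing like this exists in Exercism today.

    package endorsements

    import "time"

    // Endorsement is a hypothetical record a mentor or maintainer could
    // attach to a published community solution.
    type Endorsement struct {
        SolutionID   string     // the endorsed community solution
        EndorserID   string     // mentor/maintainer giving the endorsement (not anonymous)
        Blurb        string     // optional short note on why the solution stands out
        TestsVersion int        // exercise test-suite version at endorsement time
        CreatedAt    time.Time  // when the endorsement was given
        RescindedAt  *time.Time // set if the endorser withdraws it, e.g. after new tests
    }

Keeping the test-suite version on the record is what would let the site notify the endorser when an endorsed solution goes out of date.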

@ellnix then suggested:


So the discussion here is whether we add some sort of featuring mechanism - either crowd-sourced from high-rep users, or via GH.

If we presume about 500 tokens in per solution and 500 tokens out (that’s the response length from Bethany’s post above):

We have 2M solutions, so 1000M tokens IN, and 1000M tokens out.

Pricing is:

  • $0.150 / 1M tokens IN
  • $0.600 / 1M tokens OUT

So:

  • $150 IN
  • $600 OUT
  • $750 all in.

Can someone check that’s in the right ballpark pls?
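For what it’s worth, a quick back-of-the-envelope check, assuming the GPT-4o mini prices quoted above, lands on the same numbers:

    package main

    import "fmt"

    func main() {
        const (
            solutions    = 2_000_000
            tokensIn     = 500   // assumed input tokens per solution (prompt + code)
            tokensOut    = 500   // assumed output tokens per solution (the explanation)
            priceInPerM  = 0.150 // $ per 1M input tokens (GPT-4o mini)
            priceOutPerM = 0.600 // $ per 1M output tokens (GPT-4o mini)
        )
        inM := float64(solutions*tokensIn) / 1_000_000   // 1000M tokens in
        outM := float64(solutions*tokensOut) / 1_000_000 // 1000M tokens out
        fmt.Println(inM*priceInPerM, outM*priceOutPerM, inM*priceInPerM+outM*priceOutPerM) // 150 600 750
    }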

I might go with higher averages for input and output. I think there are many languages and exercises that are going to have longer solutions.

Looks like they’re claiming a 50% discount if you use the batch API. Would that be a possibility?

FWIW, my example used the GPT-4o (copilot-refined) model. Copilot doesn’t have GPT-4o mini. So we would want to test with GPT-4o mini.

Pricing for GPT-4o is:

  • $2.50 / 1M tokens IN
  • $10.00 / 1M tokens OUT

So even at a 50% discount we have:

  • $1250 IN
  • $5000 OUT
  • $6250 all in.
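The same rough check for GPT-4o with the 50% batch discount (still assuming 500 tokens in and 500 out per solution):

    package main

    import "fmt"

    func main() {
        const (
            tokensInM  = 1000.0 // 2M solutions x 500 input tokens, in millions
            tokensOutM = 1000.0 // 2M solutions x 500 output tokens, in millions
            discount   = 0.5    // claimed batch-API discount
        )
        in := tokensInM * 2.50 * discount    // GPT-4o input:  $1250
        out := tokensOutM * 10.00 * discount // GPT-4o output: $5000
        fmt.Println(in, out, in+out)         // 1250 5000 6250
    }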

Are we talking about pre-generating explanations for each solution, or would it be better if users could generate explanations for the solutions they find interesting?

I’d definitely pre-generate the top x solutions (where “top” is whatever shows on page 1/2 of each exercise). But we have millions of bot page views per day, many from badly behaved bots, so I’d imagine the rest would get generated pretty quickly anyway. So I’d rather deliberately manage the process of pre-generating all of them, so as not to suddenly fill queues etc.

My normal approach is to generate things on publish, so I’d likely do that here for new solutions.
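As a sketch of what deliberately managing that process could look like, here is one hypothetical way to drain a backlog of solutions at a fixed rate instead of letting page views trigger generation. None of these names exist in the real codebase, and a real version would live in whatever background-job system the site already uses.

    package pregen

    import (
        "context"
        "time"
    )

    // Explainer is whatever produces an explanation for one solution
    // (e.g. a wrapper around an LLM API call).
    type Explainer func(ctx context.Context, solutionID string) error

    // PreGenerate walks a backlog of solution IDs at a fixed rate (perMinute
    // must be > 0) so the LLM spend and queue depth stay predictable.
    func PreGenerate(ctx context.Context, ids []string, explain Explainer, perMinute int) error {
        ticker := time.NewTicker(time.Minute / time.Duration(perMinute))
        defer ticker.Stop()

        for _, id := range ids {
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-ticker.C:
                if err := explain(ctx, id); err != nil {
                    // Log and carry on; a single failure shouldn't stall the backfill.
                    continue
                }
            }
        }
        return nil
    }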

So here’s my two cents:

About the mentor endorsements: I think this may be a good idea, but the caveat I see is that on Exercism anyone can become a mentor, so anyone can endorse any solution without moderation, and on large tracks the “community endorsement” often goes to the quick-and-dirty solutions, not the best ones. Overall, maybe not a bad idea.

About bot-generated explanations: ChatGPT and its derivatives are good enough to generate a basic explanation, but I don’t think they’re good or consistent enough to be used as a learning resource. Some of them are also outdated (which may or may not be a problem). So the way I see it, this may still require manual verification before posting, so it’ll still be an involved process.


Maybe a student could press a button to request an explanation from the author (and potentially start a mentoring session)? Or maybe a student could get a prompt to try to explain their own solution after posting, for extra rep points or whatnot?
(As a side-note, I’ve seen that some solutions on the JS track have a comment posted by the author with an explanation.)

Another factor worth considering is that when AI provides an automated analysis of a specific solution, it can take away from the learning process. The student is essentially spoon-fed the explanation instead of making an effort to interpret the code on their own, which makes the learning more passive.

So, simply generating explanations for solutions might not be the best option here.