[Proposal] Adding Tag Detection to Analyzers

Hi everyone. This post outlines our proposal for adding functionality to analyzers that detects what someone is actually doing. There are three main product outcomes we want with this (at least at first - maybe more later!):

  • Solution gets associated with Approaches, so we can help a student understand how they’ve solved it and recommend alternative approaches
  • Solution gets associated with Concepts, so we can check/demonstrate whether (and where) students are using Concepts
  • Allow students to sort through Community Solutions using a set of filters.

The plan is as follows:

  • The analyzer creates a set of tags of different things that people are doing
  • Different config files match those things to Concepts, Approaches, search-filters, and other things.

We don’t want to have to rerun analyzers every time a new Concept or Approach is added, so we think it’s better to get analyzers to output as much upfront as possible and then do the mapping later (much cheaper, both computationally and financially!).

We’ve decided that we think it’s best to split tags into four different categories. These are largely for namespacing purposes, but we also think it’ll just make it a little easier to work with. It will also potentially allow us to do interesting things comparing across tracks. The four categories will be fixed, but tracks can put any values into each category. We will, however, put together a canonical starting list of values for the first three categories, which can be consistent across tracks to allow us to do interesting things. We have also decided to use a "{category}:{thing}" pattern, rather than using JSON objects for this.

A sample set of tags that the analyzer returns might be:

"tags": [
  "paradigm:functional",  // Probably always common across tracks
  "technique:recursion",  // Generally common across tracks
  "construct:if",         // Often common across tracks
  "invokes:add_seconds"   // Totally track specific
]
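
Since tags use the "{category}:{thing}" pattern rather than JSON objects, splitting one back into its parts is trivial. A minimal Ruby sketch (purely illustrative, not part of the proposal):

```ruby
# Split a "{category}:{thing}" tag into its two parts. The second
# argument to split caps the number of fields, so values that
# themselves contain ":" survive intact.
def parse_tag(tag)
  category, thing = tag.split(":", 2)
  { category: category, thing: thing }
end

parse_tag("construct:if")    # => { category: "construct", thing: "if" }
parse_tag("invokes:Time.new")
```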

I’ll now explain those different categories. All the lists in each category below are non-exhaustive; they’re just examples.


Paradigm

These are the highest-level category: programming paradigms.

  • functional
  • oop
  • imperative
  • logic
  • prototypical
  • reflective


Technique

This was the hardest category to name, but these tags capture the different techniques or concepts people use in their code:

  • recursion
  • enumeration
  • algorithm-dijkstra
  • metaprogramming
  • higher-order-functions


Construct

These are the lower-level things that people use in their code: probably the building blocks from which you determine more complex techniques, and later Concepts or Approaches.

  • if-statement
  • for-loop
  • foreach
  • switch-statement
  • structs
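
To make this concrete, an analyzer could derive construct tags by walking the parse tree. Here is a minimal Ruby sketch using the stdlib Ripper parser; the node-to-tag mapping is illustrative, not a spec:

```ruby
require "ripper"

# Illustrative mapping from Ripper node types to construct tags.
# A real analyzer would cover far more node types than this.
NODE_TAGS = {
  if:   "construct:if",
  case: "construct:switch-statement"
}.freeze

# Walk the s-expression produced by Ripper and collect construct
# tags for every node type we recognise.
def tags_for(source)
  tags = []
  walk = lambda do |node|
    if node.is_a?(Array)
      tag = NODE_TAGS[node.first]
      tags << tag if tag
      node.each { |child| walk.call(child) }
    end
  end
  walk.call(Ripper.sexp(source))
  tags.uniq
end

tags_for("if x then 1 end")  # => ["construct:if"]
```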


Invokes

The final category is going to be totally track-specific, and is a list of the different things that are invoked by a solution. These could be methods, functions, classes, keywords, operators, whatever. We chose “invoked” as it’s generic - we are aware that it’s not the “correct” terminology for many languages, but feel it’s clear enough.

  • add_seconds
  • Time.new
  • Math#random

These will be useful specifically for Approaches. For example, you might want to highlight that for Grains someone has used the ** operator by having a tag of invokes:exponent. For Leap you might want to check if someone has used the Time#leap? method.

As a result of this we have a list of tags that we can then use to map onto Concepts and Approaches. We’re proposing adding a tags key to both approaches and concepts, which determines when a solution is marked with that Approach or Concept.

We have three keys: all (every listed tag must be present), any (at least one listed tag must be present), and not (none of the listed tags may be present). In terms of precedence, the order is: not, all, any. If a tag in not is present, it fails to match. If a tag in all isn’t present, it fails to match. Otherwise, at least one tag from any must match. You can put the same tags in both any and all. There must be at least one item in either all or any.
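
The precedence rules above can be expressed as a small predicate. A minimal Ruby sketch (the method name and rule shape are assumptions, not the real implementation):

```ruby
# Returns true if a solution's tags satisfy a rule with optional
# "all" / "any" / "not" keys. Precedence: not, then all, then any.
def tags_match?(solution_tags, rule)
  not_tags = rule.fetch("not", [])
  all_tags = rule.fetch("all", [])
  any_tags = rule.fetch("any", [])

  # not: any forbidden tag present => fail
  return false if not_tags.any? { |t| solution_tags.include?(t) }
  # all: every required tag must be present
  return false unless all_tags.all? { |t| solution_tags.include?(t) }
  # any: if given, at least one must be present
  return false if any_tags.any? && any_tags.none? { |t| solution_tags.include?(t) }

  true
end

tags_match?(["construct:if"],
            { "all" => ["construct:if"], "not" => ["construct:switch"] })
# => true
```

Note how a solution containing both `construct:if` and `construct:switch` would fail the same rule, because not is checked first.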


For Approaches, we’ll put this in the approach’s config.json (exercises/practice/bob/.approaches/config.json).

  "approaches": [
    {
      "uuid": "c0bab2cf-3304-480c-a454-f8dfd274883e",
      "slug": "if",
      "title": "If",
      "blurb": "Use if statements to return the answer.",
      "authors": ["bobahop"],
      "tags": {
        "all": ["construct:if"], // Could be in any or all, but default to all
        "any": [],
        "not": ["construct:switch"] // If a solution contains both `switch` and `if`, the *not* takes precedence.
      }
    }
  ]

You can see an example of which tags you might choose for which approach here (!! this is using an old syntax !!): Add concepts to approaches by ErikSchierboom · Pull Request #2166 · exercism/csharp · GitHub


For concepts, we propose adding this to the track’s main config.json. We also considered creating a new .meta/config.json per concept, but felt that was overkill.

  "concepts": [
    {
      "uuid": "...",
      "slug": "enumerables",
      "name": "Enumerables",
      "tags": {
        "any": ["construct:foreach"],
        "all": [...],
        "not": [...]
      }
    }
  ]

Community Solutions

On the Community Solutions page, we show a dropdown containing tags that a user can filter by. This is powered by a new tag_filters key in the exercise’s config.json file (exercises/practice/bob/.meta/config.json). This maps display names to an array of tags (using any logic). For example:

  "authors": [...],
  "contributors": [...],
  "files": {...},
  "blurb": "...",
  "source": "...",
  "source_url": "...",
  "tag_filters": { "If statements": ["construct:if"] }
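
To illustrate the any logic of a filter, here is a hypothetical sketch of how the website might select matching solutions (the function and data shapes are made up for illustration):

```ruby
# Given the exercise's tag_filters and the tags stored per solution,
# return the solutions matching a chosen filter. A solution matches
# if it has at least one of the filter's tags ("any" logic).
def solutions_for_filter(tag_filters, filter_name, solutions)
  wanted = tag_filters.fetch(filter_name, [])
  solutions.select { |solution| (solution[:tags] & wanted).any? }
end

tag_filters = { "If statements" => ["construct:if"] }
solutions = [
  { id: 1, tags: ["construct:if", "paradigm:imperative"] },
  { id: 2, tags: ["construct:switch-statement"] }
]
solutions_for_filter(tag_filters, "If statements", solutions)
# => only solution 1
```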

(Erik is still thinking on this bit, feedback is particularly welcome here)

To recap:

  1. The analyzer outputs construct:foreach and construct:if for a solution to Bob.
  2. From the track’s config.json we determine that we should therefore assign the enumerables concept.
  3. From the approaches config.json, we determine that we should assign the if Approach.
  4. The exercise’s config.json has a key in the tag_filters block for if statements, so “If statements” shows on the filter dropdown on the Community Solutions page and this solution appears in its results.

The work for maintainers is:

  • Adding tags to the analyzer (the more comprehensive the better)
  • Adding tags to trigger approaches
  • Adding tags to trigger concepts

Nice work if you got to the end of this post. Thoughts/feedback welcome! :)

Did I understand correctly that only the 4th category, “invoked”, can contain track-specific values that aren’t part of the global spec?

No, all four can. We’ll have cross-track suggestions for the first three (as it’ll allow us to do more interesting cross-track things) but we won’t make suggestions for the fourth. I’ll update the OP to clarify. Thanks.

I am thinking we may want to allow hardcoded tags for exercises, for multi-paradigm languages. At least, I try to have exercises using different kinds of data types and paradigms across different exercises. So say I implement a Ruby exercise that has Class.new in the test file: I will be certain that the student will have to use OOP, so I think it would make sense to allow those tags to be hard-coded into exercises. I think an analyzer will be required for most of the “tag” finding, but I just think this would be more “accurate”. It would also allow tracks that do not have an analyzer to have some basic functionality when it comes to tags.

If your analyzer is checking whether a class is used to determine whether it adds the tag, having this extra step strikes me as just extra work. I imagine that for Ruby we’ll just have a class visitor method that adds “invokes:Class.new” to the tags. That said, if you do want to do it that way, you can add arbitrary data (such as tags) to the exercise’s .meta/config.json and merge it in at runtime. Some tracks already use this approach for other similar things.
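
The merge-at-runtime idea could look something like this; a hedged Ruby sketch where the "tags" key in .meta/config.json is hypothetical, not an agreed format:

```ruby
require "json"

# Union analyzer-detected tags with any tags hardcoded in the
# exercise's .meta/config.json. The "tags" key in that file is a
# hypothetical extension, not part of the current spec.
def merged_tags(analyzer_tags, meta_config_json)
  hardcoded = JSON.parse(meta_config_json).fetch("tags", [])
  analyzer_tags | hardcoded
end

merged_tags(["construct:if"], '{"tags": ["paradigm:oop"]}')
# => ["construct:if", "paradigm:oop"]
```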

That said, I don’t really see much practical value in adding the same tag to every solution for an exercise. I can’t see how it would help achieve any of the stated product goals, since every solution to the exercise would get the same tag(s)?

I mean I have no clue how the final product will look or how it will work.

I was thinking it popped up some message like: “Good job! You used X concept when you solved this exercise.”

I am thinking mostly of tracks that don’t have analyzers: could they manually add concepts so such a message pops up? But I am not sure if that is the intention or not.

Yeah, I see that. The aim here is to distinguish when someone is doing something intentionally, though, so I feel giving everyone the same message is probably not something this feature aims to achieve.