Building the Clojure representer: Coming in for a landing

The past several weeks have been an extremely wild ride. It has been incredibly fun, and the process has caused me to level up my skills considerably!

This PR outlines my current progress, which is the result of my long journey of struggling in public while building this. I’ve already received a lot of positive feedback from folks telling me they’ve found this to be a fascinating, deceptively hard problem!

Support external libs by BTowersCoding · Pull Request #27 · exercism/clojure-representer (github.com)


I spent so much time working with Clojure’s AST library while doing this that I ended up catching a somewhat embarrassing typo in a function docstring…

Now, as a track maintainer, I accept PRs for typo fixes regularly, so I don’t really think much about it… it’s a “wink-wink, don’t mention it” kind of thing, so I figured I was just doing my duty as a conscientious user.

But much to my [very pleasant] surprise, I was showered with gratitude, mentioned prominently in the release notes, and made an official documentation contributor for the project :rofl: :heart_eyes:

docs: thank @BTowersCoding for the typo fix! · clj-commons/rewrite-clj@d7be92e (github.com)

I remember feeling a little envious of @ErikSchierboom when I noticed he contributed to xUnit:

I thought, “Gosh, I wonder how dirty one’s hands must get working on Exercism tooling to end up contributing upstream” … I guess now I know.


But much to my [very pleasant] surprise, I was showered with gratitude, mentioned prominently in the release notes, and made an official documentation contributor for the project

And you should! All improvements are valid contributions, typo fixes included!

I remember feeling a little envious of @ErikSchierboom when I noticed he contributed to xUnit:

Well, to be fair, xUnit has a really nice codebase that made it easy to contribute to.


The representer “dogfooding” itself:

(do
 (do
  (clojure.core/in-ns 'main2)
  ((fn*
    PLACEHOLDER-0
    ([]
     (do
      (clojure.lang.Var/pushThreadBindings
       #:clojure.lang.Compiler{LOADER
                               (.getClassLoader
                                (.getClass PLACEHOLDER-0))})
      (try
       (do
        (clojure.core/refer 'clojure.core)
        (clojure.core/require
         '[clojure-representer.analyzer.jvm :as ana.jvm]
         '[clojure-representer.analyzer.passes.jvm.emit-form :as e]
         '[clojure-representer.analyzer.passes.uniquify
           :refer
           [mappings placeholder]]
         '[clojure.java.io :as io]
         '[clojure.string :as str]
         '[rewrite-clj.zip :as z]
         '[clojure.data.json :as json]
         '[clojure.pprint :as pp]))
       (finally (clojure.lang.Var/popThreadBindings)))))))
  (if
   (.equals 'main2 'clojure.core)
   nil
   (do
    (clojure.lang.LockingTransaction/runInTransaction
     (fn*
      ([]
       (clojure.core/commute
        @#'clojure.core/*loaded-libs*
        clojure.core/conj
        'main2))))
    nil)))
 (def
  normalize
  (fn*
   ([PLACEHOLDER-1]
    (str
     (e/emit-hygienic-form
      (ana.jvm/analyze+eval
       (z/sexpr (z/up (z/of-file (str PLACEHOLDER-1))))))))))
 (def
  represent
  (fn*
   ([PLACEHOLDER-2]
    (let*
     [PLACEHOLDER-3
      PLACEHOLDER-2
      PLACEHOLDER-4
      (if
       (clojure.core/seq? PLACEHOLDER-3)
       (if
        (clojure.core/next PLACEHOLDER-3)
        (clojure.lang.PersistentArrayMap/createAsIfByAssoc
         (clojure.core/to-array PLACEHOLDER-3))
        (if
         (clojure.core/seq PLACEHOLDER-3)
         (clojure.core/first PLACEHOLDER-3)
         clojure.lang.PersistentArrayMap/EMPTY))
       PLACEHOLDER-3)
      PLACEHOLDER-5
      (clojure.lang.RT/get PLACEHOLDER-4 :slug)
      PLACEHOLDER-6
      (clojure.lang.RT/get PLACEHOLDER-4 :out-dir)
      PLACEHOLDER-7
      (clojure.lang.RT/get PLACEHOLDER-4 :in-dir)]
     (let*
      [PLACEHOLDER-8
       (str (str/replace PLACEHOLDER-5 "-" "_") ".clj")
       PLACEHOLDER-9
       (reset! placeholder 0)
       PLACEHOLDER-10
       (reset! mappings {})
       PLACEHOLDER-11
       (z/sexpr
        (z/of-string
         (normalize (io/file PLACEHOLDER-7 PLACEHOLDER-8))))]
      (do
       (spit
        (str (io/file PLACEHOLDER-6 "mapping.json"))
        (json/write-str
         (into
          {}
          (map
           (fn*
            ([PLACEHOLDER-12]
             (let*
              [PLACEHOLDER-13
               PLACEHOLDER-12
               PLACEHOLDER-14
               (clojure.lang.RT/nth PLACEHOLDER-13 0 nil)
               PLACEHOLDER-15
               (clojure.lang.RT/nth PLACEHOLDER-13 1 nil)]
              [PLACEHOLDER-15 PLACEHOLDER-14])))
           (deref mappings)))))
       (spit
        (str (io/file PLACEHOLDER-6 "representation.txt"))
        (let*
         [PLACEHOLDER-16 (new java.io.StringWriter)]
         (do
          (clojure.core/push-thread-bindings
           (clojure.core/hash-map #'clojure.core/*out* PLACEHOLDER-16))
          (try
           (do
            (pp/pprint PLACEHOLDER-11)
            (clojure.core/str PLACEHOLDER-16))
           (finally (clojure.core/pop-thread-bindings))))))))))))
 (def
  -main
  (fn*
   ([PLACEHOLDER-17 PLACEHOLDER-18 PLACEHOLDER-19]
    (represent
     {:slug PLACEHOLDER-17,
      :out-dir PLACEHOLDER-19,
      :in-dir PLACEHOLDER-18})))))

Great work :) Please share a link to the videos of the “doing this in public” here and we’ll share them on the socials :)


The bulk of it was in this 6-hour livestream (I added chapter timestamps to help make sense of it all):

The irony is that the approach that ended up winning out was the very first idea I had, which was to utilize the macroexpansion capability of the official Clojure analyzer.
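To illustrate why macroexpansion helps with normalization, here is a minimal sketch. It uses clojure.walk/macroexpand-all as a stand-in, which is an assumption made for illustration; the representer itself goes through the analyzer rather than this function.

```clojure
(require '[clojure.walk :as walk])

;; Two ways a student might write the same computation.
(def threaded '(-> n inc (* 2)))
(def nested   '(* (inc n) 2))

;; After full macroexpansion the threading macro disappears,
;; so both forms become the same data structure.
(def same-after-expansion?
  (= (walk/macroexpand-all threaded)
     (walk/macroexpand-all nested)))

;; `when` likewise expands into the `if` and `do` special forms.
(def expanded-when
  (walk/macroexpand-all '(when ok (println "hi"))))
```

Surface differences that come purely from macro choice vanish this way, while differences in the underlying logic remain.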

Ha, I just noticed I misspelled “maintaining” in the thumbnail :grimacing: I’m probably going to have to fix that. (Done.)

I did some more work on it in the following stream, but by then I figured the struggle was becoming tedious and it would be better to work on it off-stream and come back when I had something more in place…

One month later, I am reminded of a classic Jewish joke about a 13-year-old boy on the day of his Bar Mitzvah, who gets up in front of the congregation after the ceremony and says “Today I am a fountain pen”.

Except in my case it’s “This month I am a representer”.

Yeah, you’ve heard of Programming Dad Jokes, but this one I heard from my grandfather. I didn’t get it, so he had to spell it out for me: the boy was supposed to say “Today I am a man”, but instead says “fountain pen” because that was apparently the stereotypical gift given to a Bar Mitzvah boy… yeah, that’s classic Jewish humor from my grandpappy, RIP.


Thinking out loud here, because I just had an interesting idea for another way to approach writing the representer that might be vastly more powerful than the current way. Perhaps through writing this or receiving comments from any of you, it can be determined whether it could work.

The problem is that although my current macroexpansion-based normalization strategy seems to work, it still leaves far too much variation between normalized solutions to be very useful for separating them into approaches, which is the entire point of a representer. For example, when I excitedly ran it against 500 solutions for armstrong-numbers, it still resulted in something like 470 unique representations! :unamused:

The only obvious way to improve upon this implementation would be to perform additional massaging of the resulting data, such as collapsing redundant do blocks and such. But this feels too much like painstakingly “chipping away” at the code for only minor incremental improvements. So I’ve been brainstorming whether there could be a better way.
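As a rough idea of what that kind of massaging could look like, here is a minimal sketch of collapsing redundant do blocks. The names do-form? and collapse-do are hypothetical and not part of the representer. The sketch also assumes the input is code rather than quoted data, since it would rewrite a literal list that happens to start with do.

```clojure
(require '[clojure.walk :as walk])

(defn do-form?
  "True when x is a list-like form whose first element is the symbol do."
  [x]
  (and (seq? x) (= 'do (first x))))

(defn collapse-do
  "Splices non-empty nested do forms into their parent do,
  and unwraps any do that ends up with a single body form."
  [form]
  (walk/postwalk
   (fn [x]
     (if (do-form? x)
       ;; An empty (do) is left alone, because splicing it away
       ;; could change the value of the enclosing block.
       (let [body (mapcat #(if (and (do-form? %) (next %)) (rest %) [%])
                          (rest x))]
         (if (= 1 (count body))
           (first body)
           (cons 'do body)))
       x))
   form))

(def collapsed
  (collapse-do '(do (do (def x 1)) (do (do (def y 2) (def z 3))))))
```

Here collapsed is a single flat do containing the three def forms, which is the sort of minor incremental cleanup described above.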

Ever since we first started talking about these darn things called representers, a little voice in the back of my head has been whispering,

“Logic programming”.

That sounds kind of cool I guess, but what exactly made me think of that? For one thing, I’ve seen a couple of presentations about logic programming (I think by William Byrd, Nada Amin, and Bodil Stokke) where they performed absolutely wondrous things, including writing code backwards, i.e. giving the compiler a set of inputs and a desired return value and watching the program quite magically return the function. :exploding_head:

I also happen to know of a library called kibit. From the readme:

kibit is a static code analyzer . . . [which] uses core.logic to search for patterns of code that could be rewritten with a more idiomatic function or macro.

This sounds like it could actually be quite useful just on its own, as part of the analyzer perhaps. But I’m thinking much more could probably be learned by studying it to see exactly how core.logic is used to this effect.

But that’s not even the idea that I want to try first… this one was inspired by re-find by Michiel Borkent (the legendary borkdude), which is powered by clojure.spec. It allows you to find functions via reverse-lookup, by providing the desired arguments and return spec.

clojure.spec is a data specification library that serves as Clojure’s answer to not having type annotations. Its generative testing is powered by test.check, a property-based testing library inspired by Haskell’s QuickCheck.

I’m thinking that spec could be used to determine whether a given function in a solution conforms to that of one in another solution.

But how do we do that without specifying the first one?

There’s another library (enough libraries for you?) called spec-provider that does just that: it infers Clojure specs from sample data. Inspired by F#'s type providers, it has experimental support for inferring the spec of functions.

IDK, I might be a bit too optimistic about the possibility of this actually working. But at least I’ll surely learn something while trying…


When comparing the representations of 500 solutions for armstrong-numbers, I noticed that many of them would likely be identical were it not for the fact that the placeholder names were in a different order.

So in this stream I created a sort-placeholders function which normalizes the order of the variable placeholders:

Improving the Exercism Clojure Representer - Twitch

YouTube:

I updated the expected representations for the CI test suite and created a PR: Normalize placeholder names by bobbicodes · Pull Request #33 · exercism/clojure-representer · GitHub
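For readers who want a feel for the idea, here is a minimal sketch of one way placeholder order could be normalized: renumbering placeholders by order of first appearance. This is not the sort-placeholders function from the PR. The names placeholder? and renumber-placeholders are hypothetical, and the traversal order is only guaranteed to match reading order for lists and vectors.

```clojure
(require '[clojure.walk :as walk])

(defn placeholder?
  "True for symbols named like PLACEHOLDER-7."
  [x]
  (and (symbol? x)
       (some? (re-matches #"PLACEHOLDER-\d+" (name x)))))

(defn renumber-placeholders
  "Renames placeholders so they are numbered 0, 1, 2, ...
  in the order they first appear in the form."
  [form]
  (let [in-order (->> (tree-seq coll? seq form) ; depth-first walk
                      (filter placeholder?)
                      distinct)
        renames  (zipmap in-order
                         (map #(symbol (str "PLACEHOLDER-" %)) (range)))]
    (walk/postwalk #(if (placeholder? %) (renames %) %) form)))

;; Two representations that differ only in placeholder numbering.
(def form-a '(fn* ([PLACEHOLDER-3 PLACEHOLDER-1] (+ PLACEHOLDER-1 PLACEHOLDER-3))))
(def form-b '(fn* ([PLACEHOLDER-0 PLACEHOLDER-2] (+ PLACEHOLDER-2 PLACEHOLDER-0))))
```

After renumbering, form-a and form-b compare as equal, so they would be counted as the same representation.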


Hey… this thing actually works! (I’m kind of surprised :smiley:)

I was somewhat disappointed with it because when normalizing 500 solutions to armstrong-numbers, 476 of them were still unique :exploding_head:.

However, I realized that it’s not completely useless, because although most submitted solutions are functionally unique, many of them are not, and can be positively identified as common approaches which can be compared and documented, and have feedback provided for them. When used at scale, this will help many students well into the foreseeable future.

Finding common approaches with the Exercism Clojure Representer (github.com)

I wrote a function that takes all of the submissions for an exercise and sorts them by how many times each representation occurs, to find the most commonly used approaches.

It then builds a hash map for each solution, including its solution number, its source code, the representation, and how many solutions share its representation.

Then we can see all the different ways a solution can be written and still be recognized as following a common approach!
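The counting step can be sketched roughly as follows. This is a simplified illustration rather than the function from the linked repository; the name common-approaches and the keys :id, :code, :representation and :shared-by are assumptions.

```clojure
(defn common-approaches
  "Takes a seq of maps with :id, :code and :representation keys.
  Adds a :shared-by count to each one and sorts the most common
  representations first."
  [solutions]
  (let [counts (frequencies (map :representation solutions))]
    (->> solutions
         (map #(assoc % :shared-by (counts (:representation %))))
         (sort-by :shared-by >))))

;; Tiny made-up dataset: solutions 1 and 3 share a representation.
(def sample
  [{:id 1 :code "(source of solution 1)" :representation "R1"}
   {:id 2 :code "(source of solution 2)" :representation "R2"}
   {:id 3 :code "(source of solution 3)" :representation "R1"}])

(def ranked (common-approaches sample))
```

Grouping the ranked result by :representation then shows the different source texts that normalize to the same approach.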

This is the first time during the whole process of building the representer that I’ve actually gotten to have some fun with it, and appreciate the value it provides :sunglasses:. In particular, it unlocks the value that resides in the rich dataset we’ve collected from all the exercise submissions.

I think it’s about time I write a blog post.


I am going to launch the Clojure Representer live on stream:

Will it work? :grimacing: :sweat_smile:


I must still be missing something, because my recent submissions (made from an account I just made) are not showing up in the automation tab.

@porkostomus There is a very simple explanation for that: we only show representations when there are at least 2 instances of that representation (otherwise there is not much value in adding a representation comment). This also helps with performance.


I’ve got a loose draft for an article here:

BTowersCoding/representer-blog-post: WIP article demonstrating the Exercism Clojure Representer (github.com)

As Jeremy has said, the two-fer exercise is fascinating because it is so simple yet a staggering number of unique solutions exist, and in Clojure this is no exception. So I thought I’d use it as a starting point to introduce the representer by showing it in action.

Feedback would be much appreciated, as it is just beginning to take shape. I plan to turn this into an ongoing series, and the blog will be somewhat interactive, containing live code snippets that the reader can play around with. This is where it will be published: https://porkostomus.gitlab.io/

Update: I published the article here: Bobbi Towers: Introducing the Exercism Clojure Representer

The representer is live :sunglasses:

I now see the first 3 approaches available for feedback.

Thanks @ErikSchierboom for helping get it over the line!


My pleasure! Very cool to see this live.

Pleased to see my blog post on the representer in this week’s official Clojure newsletter: Clojure - Clojure Deref (Dec 22, 2022)

This is the second time this has happened - the first was for the release of the v3 track.


Congrats! That’s a really awesome achievement.

Brilliant. Awesome work!