Extracting concepts from AST

homersimpsons · August 14, 2023, 12:25pm

Context

This topic is about the following video: https://youtu.be/pRJ6H967sd4

I wanted to give some feedback on it, and a YouTube comment was not enough.

Analyzer vs Representer

Should the concepts be extracted by the analyzer or by the representer?

Analyzer: makes sense, since analyzing the code is its job

Representer: should only represent the code, so that similar solutions can be found

Note: Maybe we should merge both the analyzer and the representer into one

In my opinion, a representer is rather straightforward to write when there is a library that gives you the Abstract Syntax Tree (AST) of the code, so adding smarter logic to the representer may not be that easy. On the other hand, the analyzer already analyzes the code and has more context about it, which may help in implementing this concept extraction.
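To illustrate why a plain representer is straightforward once an AST library exists, here is a minimal sketch using Python's built-in `ast` module (Python 3.9+ for `ast.unparse`). It is not the representer of any Exercism track; the `Normalizer` class, the `represent` function and the `PLACEHOLDER_n` naming are hypothetical. It only renames user-chosen identifiers, so two solutions that differ only in naming get the same representation.

```python
import ast
import builtins

# Names such as len, sum or range are kept as-is so they stay distinguishable.
BUILTIN_NAMES = set(vars(builtins))


class Normalizer(ast.NodeTransformer):
    """Hypothetical normalizer: rename user identifiers to PLACEHOLDER_n."""

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}

    def _placeholder(self, name: str) -> str:
        if name in BUILTIN_NAMES:
            return name
        # The same original name always maps to the same placeholder.
        if name not in self.mapping:
            self.mapping[name] = f"PLACEHOLDER_{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = self._placeholder(node.name)
        self.generic_visit(node)  # then rename arguments and body names
        return node

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = self._placeholder(node.arg)
        self.generic_visit(node)  # also visit any type annotation
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        node.id = self._placeholder(node.id)
        return node


def represent(source: str) -> str:
    """Parse the source, normalize identifiers, and print it back as code."""
    tree = Normalizer().visit(ast.parse(source))
    return ast.unparse(tree)


if __name__ == "__main__":
    print(represent("def add(x, y):\n    return x + y\n"))
    print(represent("def plus(first, second):\n    return first + second\n"))
```

Both calls print the same normalized code. Anything smarter, such as recognizing that two differently written loops do the same thing, is exactly the extra logic that does not come for free from the AST library.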

A more general approach

It looks like the ultimate goal is to be able to share a normalized representation between two or more languages. Doing so would help to share feedback across the whole site. While transliterating every language into a single one is not possible (I guess), maybe we could use tools such as Tree-sitter to get a normalized AST across all languages.
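One way to picture a shared representation is at the level of concepts rather than raw syntax. The sketch below, again with Python's `ast` module, maps node types to language-neutral labels; the labels, the `CONCEPTS` table and `extract_concepts` are hypothetical. Each track would need its own table mapping its grammar's node types (from its own parser or from Tree-sitter) onto the same labels.

```python
import ast

# Keys are real Python ast node classes; values are illustrative,
# language-neutral concept labels (not an official Exercism concept list).
CONCEPTS: dict[type, str] = {
    ast.For: "loop",
    ast.While: "loop",
    ast.If: "conditional",
    ast.IfExp: "conditional",
    ast.ListComp: "comprehension",
    ast.Lambda: "anonymous-function",
    ast.Try: "exception-handling",
    ast.FunctionDef: "function",
}


def extract_concepts(source: str) -> set[str]:
    """Return every concept label whose node type appears in the source."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):  # visits every node in the tree
        label = CONCEPTS.get(type(node))
        if label is not None:
            found.add(label)
    return found


if __name__ == "__main__":
    code = "def f(xs):\n    for x in xs:\n        if x:\n            return x\n"
    print(sorted(extract_concepts(code)))
```

The hard part is not this lookup but keeping the per-language tables consistent, since the same idea can be spelled with very different node types in different grammars.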

Tree-sitter already supports a lot of languages, but for example it does not support Ballerina so this won’t directly work for all languages.

Disclaimer: I contributed to exercism/php-representer on GitHub.

Meatball · August 14, 2023, 12:56pm

Maybe we should merge both the analyzer and the representer into one

I think there could be benefits to this kind of approach. It could save resources, since only one tooling repo (and so only one Docker image) would be needed, and it could also save compute, because whenever you want a representation you would likely also want an analysis. The problem is that it would mean a huge infrastructure change, since most tracks have already implemented their tooling in their own way.

It looks like the ultimate goal is to be able to share a normalized representation between two or more languages. Doing so would help to share feedback across the whole site. While transliterating every language into a single one is not possible (I guess), maybe we could use tools such as Tree-sitter to get a normalized AST across all languages.

I think this goal would be very, very hard. My main concern is that Tree-sitter doesn't produce the same AST for code that does the same thing in different languages. It is a great tool for tracks that don't have a native parsing library for their language, but I believe comparing the ASTs it produces across languages is too hard.