Improving the TypeScript representer

chezwicker · June 15, 2024, 7:30pm

Hi all,

I’ve started with automation on the TS track and one of the first things I noticed is that the representer doesn’t currently merge certain things I think might be merged.

I’d like to add at least one / maybe a couple of things to the representer:

Make sure single and double quotes can be used exchangeably
Ignore the explicit return type of functions?
Ignore whether the solution declares a function or assigns an arrow function to a const?
Ignore order of declarations?

(1) I believe is a no brainer, but at the same time, I don’t think the solution used in JavaScript is completely valid as I believe it produces broken results for quotes used inside backticks. My suggestion is rather to follow the approach for identifiers and replace quotes by QUOTE_1, QUOTE_2.
(2) should be valid for all the exercises I’ve mentored so far, but I can’t be sure if there isn’t anything where the return type must be provided explicitly. I think with modern IDEs showing return types as inlays, it’s a “religious” question (like tabs vs. spaces) whether or not return types should be made explicit.
(3) I’m not completely convinced here, I’ve told some mentees the difference between functions and arrow functions, but I would be surprised if there was an exercise where this really makes a difference. And it can still be mentioned to those who request mentoring, right?
(4) Declaration order may matter in some cases, but if the exercise’s tests pass, it probably didn’t.

What does everyone think?

I’ve done the implementation for (1) and will add tests if we agree we should add this. I’ll work on (2)-(4) if we agree we should add them as well - some of these might take a bit more time to complete, though.

P.S. can we add a representer tag? I don’t seem to be able to do that - or am I missing something?
P.P.S. I would have wanted to create this in “Exercism”/“Typescript” since that’s more about extending, but that doesn’t (yet) exist. Should I have chosen “Exercism” or “Exercism”/“Building Exercism”?

mk-mxp · June 16, 2024, 8:20am

(1) Quotes inside backticks make no substantial difference between implementations. In fact, if that string matters, some test checks for quotes being single or double quotes. It may only make a difference when using the string in a mentoring answer. If I understand the current JS solution correctly, it will not alter backticked TemplateLiteral, only Literal.
(2) I would not use “modern IDEs” nor “technically required” as a reason here. Return type declaration is part of the contract specification. If you don’t specify return types, you force users to accept any. That’s OK inside a module, but for a public interface I would not accept it. So, for Exercism, it should be at least possible for mentors and automated feedback to see and comment on return type declarations.
I don’t know enough TypeScript for (3) and (4).

chezwicker · June 16, 2024, 11:00am

Thank you for your reply!

I’m not sure what you mean about (1), what I was trying to say is that we can’t turn e.g. `a_${foo("a")}` into `a_${foo(`a`)}` as I believe does the JS-representer, but it should be equivalent to `a_${foo('a')}`. Also, we need to distinguish between "'" and '"' Thus my suggestion of translating to QUOTE_1 and QUOTE_2. I hope that makes sense.

About (2): In TS, imho a method’s signature does not differ whether or not you make it explicit. And changing the signature (whether implicitly or explicitly) means changing API, which makes it a breaking change. As I said, in my mind, it’s a “religious” question, so either we need to be opinionated about it and ask people in the track docs to always specify return types - at least for exported / public functions - and then add according linter rules (and enable the linter in the online editor!) or ignore them in the representer, don’t you think?

Have a great Sunday!

mk-mxp · June 16, 2024, 1:08pm

About (1): If I understand this function in the representer correctly, it doesn’t touch `a_${foo("a")}` at all. Because it works on AST node type Literal and not on node type TemplateLiteral. But it turns '"' into `"` as well as "'" to `'`. Which preserves the required information, I think.

It may produce invalid code, if someone places backticks into normal string literals ("tablename `db_table`"). But the result is not executed, only used as text. So I think that is not a problem.

About (2): When ignored by the representer, solutions are seen equivalent that have or have not declared a return type. And that means, that what comes first hides what comes later in community solutions. So you less likely find different return type specifications when there are options. Which reduces the value of digging into community solutions to find things you didn’t know, yet.

PS: Quoting backticks in markdown is weird.

chezwicker · June 16, 2024, 1:52pm

Hm, the tests in the code seem to indicate that "a" would be turned into `a`, but to be honest I just found the JS representer after writing my own amendment to the TS one and briefly skimmmed the commit, so I may be wrong.

In any case, my intent is that "a is 'b'" and 'a is "b"' would be represented the same way, and I don’t think this would be the case in JS - am I wrong? If I am, we can also go the JS way, I mostly care about getting representations merged, and less about how we do it (even though I think it would be nice to keep approaches as consistent as possible and would thus favoe creating a mapping such as for identifiers).

For (2), maybe there need to be different representations for different use cases? Maybe one should be able to choose which transformations to apply for a given “Feedback” item? That way we could provide input once where a additional transformation doesn’t change what we’re interested in and omit a transformation where the difference does matter.

Cool-Katt · June 17, 2024, 7:16am

Some time ago I wrote the amendment to the JS representer after facing the same issue where single and double quotes would yield different representation. The idea was to just normalize all matched quotes to backticks so that I wouldn’t have to deal with nesting so in essence what @mk-mxp is completely correct (i think). It is definitely not a perfect ‘catch all’ solution, but at the same time I don’t think it introduces any breaking changes so things should still work regardless of the quotes used (and personally i think that backticks > quotes)

About point 2, I think that return type matters, after all it’s in the spec and it’s part of what makes Typescript special and different from just plain old JS + JSDoc combination. We shouldn’t rely on modern IDE creature comforts and implicit return types, we should try to encourage doing it the ‘right way’.

About point 3, I don’t agree at all here. It’s true that in a lot of the exercises on Exercism arrow functions can be used interchangeably where function declarations can be, and vice versa, but they are two different things and are technically different. For example:

arrow functions do not have a context of their own so you can’t use this bindings inside them.
arrow functions cannot use yield so they cannot be turned into generators
arrow functions cannot be used as constructors

For point 4, I have no preference, if it doesn’t break anything I guess it’s mostly fine…

chezwicker · June 17, 2024, 4:15pm

Hi @Cool-Katt,

With regards to (1), I’m not quite sure I’m following, if there are two different kinds of quotes in a string, are you saying replacing them all by backticks is good enough because if the code wasn’t valid, the tests would not have passed? That may be true, but would anything be worse if we applied the same logic as for identifiers?

For (2), I’m open to “believing in” explicit return types, but then we’d have to be consistently asking for them and providing a cording templates - we’re not:

e.g. eliuds-eggs and diamond specify return types (unknown in the template)
e.g. grains and leap do not

For (3), I totally agree, but it seems to me that whoever created the exercises did not: the two seem to be used interchangeably:

e.g. bob and binary-search have functions
e.g. wordy and resistor-color have arrow functions

So, I think we need to either

make our templates consistently use e.g. functions and explicit return types - and add linters to check return types or
merge the representations for the two

Do you agree? How do we move forward?

BNAndras · June 17, 2024, 6:25pm

I’d consider a separate PR to update those arrow functions to regular functions. We should default to whatever is simpler stub-wise so it doesn’t get in the way of the student being able to focus on the exercise they’re solving. If I don’t know an arrow function, opening up resistor-color and seeing one can be pretty discouraging. I’d have to look up this weird syntax to understand what I’m seeing even before I start writing my code.

If we use regular functions, that’s no longer an issue. If a student wants to use an arrow expression, they can replace the stub themselves. Also, arrow functions can be discussed in a follow-up mentoring discussion as a way to make the student’s existing code more concise. The resistor-color exercise series in particular is designed to ramp up in difficulty and lead to mentoring so there shouldn’t be any significant hurdles in the intro to that series.

Cool-Katt · June 18, 2024, 7:17am

@chezwicker
Normally you can nest single quotes into double quotes and vice versa, but you can’t use the two types of quotes at the same nesting level. The solution to that is to use backticks as the outermost pair of quotes, as this allows you to mix regular single and double quotes in the same string. What the JS representer does is this, It only changes the outermost pair of matched quotes (it was a long time ago that i wrote it so don’t quote me on that), regardless if they’re single or double, to backtick, so if you have two different types of quotes in the string it should work fine without escaping. e.g.

"this is 'fine'" becomes `this is 'fine'`
'this is also "fine"' becomes `this is also "fine"`
"this is "not" fine" becomes `this is "not" fine` (now works)
"this is 'also' "not" fine" becomes `this is 'also' "not" fine` (now works too)

As for the arrow function debate, I totally agree with @BNAndras for this one. It should be a separate PR and we should probably stick to a convention. I like arrow functions, but regular function declarations are less friction for the student.
I feel that we should do the same for return types - pick a convention and use it. I personally am for explicit return types and if the student wants to omit them in their solution then that’s fine and perhaps a good topic for a mentoring discussion.

PS: @mk-mxp I totally see what you meant. Backticks are weird

chezwicker · June 18, 2024, 11:12am

All right, I’m good with that, thanks for your replies.

I’ll thus create the following PRs:

function instead of arrow functions in templates (and example solutions - what else needs touching?) - any preferences with regards to size for reviews? All in one or any splits?
expicit return types
representer merging quoted StringLiteral - so am I right that you prefer the approach taken in JS to the one taken for identifiers (I’ve already got that one pretty much ready, but no big deal discarding it).

Anything else? Is one of you guys a maintainer who is going to review or do we need buy-in from someone else before proceding? thanks!

Cool-Katt · June 18, 2024, 11:17am

As far as PRs are concerned, you’d ~~probably~~ definitely need to consult with either @ErikSchierboom or @SleeplessByte (the maintainer for the JS and TS tracks) before anything can be merged, ideally with both.

The usual approach would be every exercise with a separate PR. I know it’s a lot of work but it’s easier for reviewing.

BNAndras · June 18, 2024, 1:50pm

Example solutions shouldn’t need to be updated. They exist to make sure an exercise can be solved for a track with tour given tests and aren’t student-facing. So this would be potentially one or more PRs for the stubs depending on what the maintainer wants.

SleeplessByte · June 19, 2024, 4:09pm

I have not yet read everything above, but the change we want in both representers is for the represented ast (so the one with the names normalised) to be written back to code in the same way tools such as prettier and eslint do. I would be perfectly okay with us using one of those tools, or whatever is compatible with the estree we are generating.

Why? Because whatever formatting settings are applied there will take care of some of the things mentioned such as quote style, where the commas are, parenthesis usage or no. ESTree just holds too much granular information (we already strip out location to ignore whitespace).

That said, to respond only to the OP:

Yes, see above
No. Having an explicit return type isn’t the same as relying on inference and we may (likely) have exercises about it in the future.
I am hesitant about this because it’s not the same thing. If we do cool things with (this: TypeOfThis, firstArg: string) at some point, this will also completely break.
For exercism except for perhaps grep this works, because we never rely on hoisting and we never rely on side-effects of imports. grep would need an exception though. I have not looked at the linked implementation and wouldn’t look at it. You only need to order the ast children of program body based on the nameLiteral to get this right, if 1 is implemented.

SleeplessByte · July 29, 2024, 9:28pm

I have now read everything and my previous response stands.