Tweak language and formatting in `protein-translation/description.md`

codingthat · April 23, 2025, 3:51pm

I appreciate your comment, but I’m confused: won’t this just be a new thread with the same discussion about the same proposed changes? Or do you mean I should have a separate thread for each change? Or one big thread that discusses the changes sequentially?

codingthat · April 23, 2025, 3:53pm

Sorry, I forgot to mention “by the ribosome.” I removed that too since the word “ribosome” isn’t used anywhere else in the description, so this seems confusing at best for those who aren’t familiar with it. “By the program” would be more consistent with the rest of the description, but it’s also redundant. Hence the removal.

BNAndras · April 23, 2025, 4:40pm

The tests don’t check for a STOP value. It’s up to the student how they want to handle it as long as translation stops and the sequence is returned. That’s the behavior the tests related to stop codons are checking.

I was instead referring to the test descriptions in the canonical test data, since it was unclear whether we should update those as well so that “STOP codon” becomes “stop codon”. The descriptions generally get reused in a track’s test suite in one form or another, so they’re public-facing too. However, we can’t just edit the descriptions of existing tests; we’d need new tests that re-implement the old ones with the updated descriptions. That way maintainers know they have tests to update and can do so. Not all tracks have generators, so that’ll take up maintainer time.

codingthat · April 23, 2025, 4:58pm

Oh — that must be specific to Elixir:

    @tag :pending
    test "identifies stop codons" do
      assert ProteinTranslation.of_codon("UAA") == {:ok, "STOP"}
      assert ProteinTranslation.of_codon("UAG") == {:ok, "STOP"}
      assert ProteinTranslation.of_codon("UGA") == {:ok, "STOP"}
    end

In fact, in the Elixir test descriptions it’s already “stop codon” almost everywhere, except in one.

Anyway, thanks @BNAndras, I see your point. I can remove that part of the PR (or rather, change the other instance of “stop codon” in the description to “STOP codon”) if that’s the final answer.

IsaacG · April 23, 2025, 5:04pm

Rather than using PRs to convey intended changes, can you update this forum thread with the latest proposed change(s) for discussion/approval?

codingthat · April 23, 2025, 5:59pm

Sure, sorry, I didn’t mean I’d put in a new PR until everything was agreed. Here’s where I believe we’re at:

1. Remove the extraneous `=>` for consistency.
2. Change both 'STOP' codon and stop codon to STOP codon (caps but no quotes) to avoid extra test maintenance.
3. Remove “(by the ribosome)” for consistency and accessibility.
4. Remove the redundant “after” (given “subsequent”).
5. Remove the empty line just before the redundant “after”: that line depends heavily on the previous one, so it makes more sense in the same paragraph than as a new idea.

^^ @siebenschlaefer sorry, I missed explaining that last one too earlier. I believe what probably happened is that it used to say “after a STOP codon.” But I think what I propose (removing “after” and also the newline) is both more concise and more readable.

BNAndras · April 23, 2025, 7:49pm

The fat arrow sequence section is difficult for me to parse mentally since the info after each => rephrases what the previous transformation is. So I’m going forward in the sequence when I hit the => and expecting the next transformation, but instead I’m seeing the previous transformation rephrased.

I’d suggest we show the steps like this:

RNA => Three-letter codons => proteins
"AUGUUUUCU" => "AUG", "UUU", "UCU" => "Methionine", "Phenylalanine", "Serine"

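As a rough illustration of the two steps in that sequence, here is a minimal Python sketch. The function names `split_codons` and `to_amino_acids` are made up for this example, and the lookup table holds only the three codons from the example above, not the exercise’s full table.

```python
# Only the three codons from the example above, not the full codon table.
CODON_TO_AMINO_ACID = {
    "AUG": "Methionine",
    "UUU": "Phenylalanine",
    "UCU": "Serine",
}


def split_codons(rna):
    """Step 1: cut the RNA strand into three-letter codons."""
    return [rna[i:i + 3] for i in range(0, len(rna), 3)]


def to_amino_acids(codons):
    """Step 2: look up the amino acid for each codon."""
    return [CODON_TO_AMINO_ACID[codon] for codon in codons]


codons = split_codons("AUGUUUUCU")
print(codons)                  # ['AUG', 'UUU', 'UCU']
print(to_amino_acids(codons))  # ['Methionine', 'Phenylalanine', 'Serine']
```

Each function maps to one `=>` in the sequence, which is roughly how the steps read when every arrow means “next transformation.”
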
BNAndras · April 23, 2025, 7:56pm

There are 64 codons which in turn correspond to 20 amino acids; however, all of the codon sequences and resulting amino acids are not important in this exercise. If it works for one codon, the program should work for all of them. However, feel free to expand the list in the test suite to include them all.

That should read “not all codon sequences and resulting amino acids are important.” However, we should rephrase this since only CLI users would be in a place to run modified tests locally. That whole section can be shortened to:

There are 64 codons which in turn correspond to 20 amino acids; however, not all codons will be used in this exercise.

We later provide a table of the relevant codons so perhaps we don’t even need this line.
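
For anyone wondering where the 64 comes from, it is just every three-letter combination of the four RNA nucleotides. A quick Python check, assuming the three stop codons are UAA, UAG, and UGA as in the Elixir test quoted earlier in the thread (the variable names are my own):

```python
from itertools import product

NUCLEOTIDES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # the three codons from the Elixir test above

# Every possible three-letter sequence over the four nucleotides: 4 ** 3.
all_codons = ["".join(triple) for triple in product(NUCLEOTIDES, repeat=3)]
coding_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(len(all_codons))     # 64
print(len(coding_codons))  # 61
```

The exercise only needs the handful of codons listed in its table, so none of this has to appear in a solution.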

iHiD · April 24, 2025, 8:02am

Just as an FYI, I did some work on this here: website/bootcamp_content/projects/string-puzzles/exercises/protein-translation/introduction.md at 00934f71f43f8efb5cb28d256bf8b46c99b218a7 · exercism/website · GitHub

(I don’t have brainspace to engage with the discussion, but maybe there’s something there that’s useful?)

codingthat · April 24, 2025, 9:09am

Thanks @BNAndras, I agree with both those proposals, except I now see that “proteins” should be “amino acids” for the third step. Could I make it into a horizontal table to align the corresponding elements? (That way it’d still fit a standard 80-char terminal, even for people reading unrendered Markdown.)

| RNA         | Three-letter codons | Amino acids                             |
| ----------- | ------------------- | --------------------------------------- |
| "AUGUUUUCU" | "AUG", "UUU", "UCU" | "Methionine", "Phenylalanine", "Serine" |
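
The 80-column claim is easy to check. Here is a small Python sketch that pads the cells into a pipe table and measures the widest line; it assumes plain double quotes around each value and no backticks, and `render_table` is a name invented for this example.

```python
HEADER = ["RNA", "Three-letter codons", "Amino acids"]
ROW = [
    '"AUGUUUUCU"',
    '"AUG", "UUU", "UCU"',
    '"Methionine", "Phenylalanine", "Serine"',
]


def render_table(header, row):
    """Return the lines of a padded Markdown pipe table with one data row."""
    widths = [max(len(h), len(cell)) for h, cell in zip(header, row)]

    def line(cells):
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    separator = "| " + " | ".join("-" * width for width in widths) + " |"
    return [line(header), separator, line(row)]


lines = render_table(HEADER, ROW)
print("\n".join(lines))
print(max(len(text) for text in lines))  # 79
```

So it fits with one column to spare. Wrapping each cell in backticks would add six characters to the data row and push it past 80.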

And thanks @iHiD, that’s helpful. I’m seeing ideas we can pull from your changes and make consistent (like active voice):

RNA can be broken into three-nucleotide sequences called codons, and then translated to a protein like so:

becomes

You can break an RNA strand into three-nucleotide sequences called codons and then translate them into amino acids to make a protein like so:

and

There are also three terminating codons (also known as ‘STOP’ codons); if any of these codons are encountered (by the ribosome), all translation ends and the protein is terminated.

All subsequent codons after are ignored, like this:

becomes

There are also three STOP codons. If you encounter any of these codons, ignore the rest of the sequence — the protein is complete. For example, UAA is a STOP codon, so ignore any subsequent codons:
… (similar table here) …
(Note that the latter AUG is not translated into another methionine.)
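
In code terms the proposed wording amounts to “stop at the first STOP codon.” A minimal Python sketch, assuming the three STOP codons are UAA, UAG, and UGA (as in the Elixir test quoted earlier) and using only the three amino-acid codons from the example; `translate` is a name chosen for illustration:

```python
# Only the codons needed for this example, not the exercise's full table.
CODON_TO_AMINO_ACID = {
    "AUG": "Methionine",
    "UUU": "Phenylalanine",
    "UCU": "Serine",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(rna):
    """Translate codon by codon, ignoring everything from the first STOP codon on."""
    amino_acids = []
    for start in range(0, len(rna), 3):
        codon = rna[start:start + 3]
        if codon in STOP_CODONS:
            break  # the protein is complete
        amino_acids.append(CODON_TO_AMINO_ACID[codon])
    return amino_acids


print(translate("AUGUUUUCUUAAAUG"))  # ['Methionine', 'Phenylalanine', 'Serine']
```

The trailing AUG is never looked up, which matches the note that it is not translated into another methionine.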

tasx · April 24, 2025, 3:30pm

I haven’t been following the discussion, but just in case it’s helpful, here’s the past PR that updated the instructions

codingthat · April 24, 2025, 3:47pm

Thanks, good point. I see `"property": "proteins"` should probably be fixed in the canonical data too, but since we were avoiding changes to that for “stop codon,” I’m not sure. @BNAndras?

BNAndras · April 24, 2025, 3:50pm

I’m not sure about how changes to properties might play out, but I think we should avoid updating the canonical data if possible and focus on the instructions.

IsaacG · April 24, 2025, 3:52pm

What @BNAndras said. Updating the properties may mean reimplementing all the tests, which seems like a lot.

codingthat · April 24, 2025, 5:29pm

OK, given that, is everyone OK if I PR based on my summary above (Tweak language and formatting in `protein-translation/description.md` - #20 by codingthat)?

IsaacG · April 24, 2025, 5:34pm

I think you should mention that stop codons exist before explaining what to do when encountering them.

codingthat · April 24, 2025, 5:50pm

Good catch, my bad.

Also, my link wasn’t the best. I meant Tweak language and formatting in `protein-translation/description.md` - #16 by codingthat, plus whatever supersedes it in Tweak language and formatting in `protein-translation/description.md` - #20 by codingthat.

IsaacG · April 24, 2025, 5:57pm

Seems reasonable to me. Something like this?

- RNA: `"AUGUUUUCU"` => translates to
- Codons: `"AUG", "UUU", "UCU"`
- => which become a protein with the following sequence =>
- Protein: `"Methionine", "Phenylalanine", "Serine"`
+ RNA `"AUGUUUUCU"` translates to codons `"AUG", "UUU", "UCU"`.
+ These become a protein with the sequence `"Methionine", "Phenylalanine", "Serine"`.

- All subsequent codons after are ignored, like this:
- RNA: `"AUGUUUUCUUAAAUG"` =>
- Codons: `"AUG", "UUU", "UCU", "UAA", "AUG"` =>
- Protein: `"Methionine", "Phenylalanine", "Serine"`
+ All subsequent codons are ignored.
+ For example, RNA `"AUGUUUUCUUAAAUG"` translates to codons `"AUG", "UUU", "UCU"`.
+ These become a protein with the sequence `"Methionine", "Phenylalanine", "Serine"`.

codingthat · April 25, 2025, 10:17am

I was thinking a table would be more readable (see Tweak language and formatting in `protein-translation/description.md` - #20 by codingthat). What do you think?

IsaacG · April 25, 2025, 2:50pm

Got it. That looks good to me.