Protein-translation: wording should be changed in the exercise description

Related to the already open issue “Protein Translation: task description, ‘codon to amino acid’ table improvement” (Issue #1639, exercism/problem-specifications on GitHub).

The update makes complete sense. I wanted confirmation from the community that this is something we want to do.

Not only does it make sense, but the term “protein” is currently used incorrectly. Methionine and the rest are all amino acids.

Even line 30 of the file you are trying to change says “Amino Acids”. It’s just the column label that’s incorrect.

Is this thread for just updating the docs? The issue conversation also refers to updating the tests.toml file.

If we’re just updating the column name, that seems fine to me. If we’re also discussing updating the tests, I believe changing the property name being tested would mean all existing canonical tests should be invalidated and re-implemented as new ones. That’s a good bit of churn for maintainers, since not all tracks have test generators. Adding new tests also means we should rerun existing solutions, almost certainly breaking them all.

As a result, such a substantial change to this exercise might not be well-advised. Perhaps we deprecate protein-translation and create a new amino-acid-translation exercise to replace it. Then, we can cleanly solve both problems mentioned in that GitHub issue.

I wonder if either change is worth the churn. Asking track maintainers to recreate tests for essentially the same exercise may be of limited value.

I think it would be sufficient to update the terminology used in the description. The exercise explanation is simple enough, even though technical details might be inaccurate.

I would also suggest we remove mentions of “polypeptide” from the description, as (although accurate) it only causes confusion - stating that codons form a protein chain is enough.

So instead of:
RNA can be broken into three nucleotide sequences called codons, and then translated to a polypeptide like so:

It should just state:
RNA can be broken into three nucleotide sequences called codons, and then translated to a protein like so:

This would be sufficient technical accuracy for the purposes of the exercise in my opinion. If we were to apply this as described, then there would be no need to update the tests, and other maintainers would have no additional work to do.

I’d also hyphenate “three nucleotide” in that sentence. “RNA can be broken into three-nucleotide sequences called codons, and then translated to a protein like so” makes it clear that each sequence is three nucleotides. Without the hyphen, that sentence can be read as “RNA can be broken into exactly three sequences with an unknown number of nucleotides per sequence.”
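To make the hyphenated reading concrete, here is a minimal Python sketch of splitting a strand into three-nucleotide codons. The function name `codons` is made up for illustration and is not part of the exercise or its tests.

```python
def codons(strand: str) -> list[str]:
    """Split an RNA strand into consecutive three-nucleotide chunks.

    Hypothetical helper for illustration only; a trailing chunk shorter
    than three nucleotides is returned as-is rather than rejected.
    """
    return [strand[i:i + 3] for i in range(0, len(strand), 3)]


print(codons("AUGUACUAA"))  # ['AUG', 'UAC', 'UAA']
```

The number of chunks depends on the strand length; the size of each chunk is what stays fixed at three.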

This PR contains the changes to the description. If the other maintainers agree with the changes, then the PR is ready for review.

This looks like a better solution. The current tests have nothing to do with translating into proteins. For example:

    {
      "uuid": "47bcfba2-9d72-46ad-bbce-22f7666b7eb1",
      "description": "Tyrosine RNA sequence 2",
      "property": "proteins",
      "input": {
        "strand": "UAC"
      },
      "expected": ["Tyrosine"]
    },

This is just a sequence of amino acids: no polypeptides and, consequently, no proteins.
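As a rough sketch of what a solution to that test does, assuming Python and a deliberately partial codon table (the real exercise table has more entries): the function is named `proteins` only to mirror the `property` field above, and the table and stop-codon handling are my own illustration, not a reference implementation.

```python
# Partial codon table for illustration only; the exercise defines more codons.
CODON_TO_AMINO_ACID = {
    "AUG": "Methionine",
    "UAU": "Tyrosine",
    "UAC": "Tyrosine",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def proteins(strand: str) -> list[str]:
    """Return the amino acid names for each codon, stopping at a stop codon."""
    result = []
    for i in range(0, len(strand), 3):
        codon = strand[i:i + 3]
        if codon in STOP_CODONS:
            break
        # Raises KeyError for codons missing from this partial table.
        result.append(CODON_TO_AMINO_ACID[codon])
    return result


print(proteins("UAC"))        # ['Tyrosine']
print(proteins("AUGUACUAA"))  # ['Methionine', 'Tyrosine']
```

Despite the property name, the return value is a plain list of amino acid names.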

If we were writing this exercise for the first time, I’d tend to agree with you and make it as factual as possible. But the exercise is already implemented, and forcing every track to reimplement it only to change some terminology is IMO not worth the trouble.

IMO the exercise “story” that we tell the student exists only to give them some kind of purpose for solving the exercise. Although I think we should aim to be as factual as possible, we should balance this against keeping the story simple and easy to follow. Almost everyone knows what a protein is, the definition we have is clear enough, and the student is ready to solve the exercise.
