Need help with reworking the run-length-encoding exercise

kytrinyx · April 13, 2023, 1:41pm

This is part of the exercise overhaul described here:

Basically, for each exercise, we want to make sure that it is framed within a story. Not an elaborate story, but something that helps you imagine a concrete scenario and makes the exercise just a bit more interesting.

As an example, instead of just “figure out if a sentence is a pangram”, we reframed the exercise to give a reason why you need to do this: you work for a company that makes fonts, and they want to use pangrams to show off the fonts on their website (https://github.com/exercism/problem-specifications/pull/2215).

I’m trying to come up with ideas for the scenario for the run-length-encoding exercise.

This is what we have, currently:

exercism/problem-specifications/blob/main/exercises/run-length-encoding/description.md

# Description

Implement run-length encoding and decoding.

Run-length encoding (RLE) is a simple form of data compression, where runs (consecutive data elements) are replaced by just one data value and count.

For example we can represent the original 53 characters with only 13.

```text
"WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB"  ->  "12WB12W3B24WB"
```

RLE allows the original data to be perfectly reconstructed from the compressed data, which makes it a lossless data compression.

```text
"AABCCCDEEEE"  ->  "2AB3CD4E"  ->  "AABCCCDEEEE"
```

For simplicity, you can assume that the unencoded string will only contain the letters A through Z (either lower or upper case) and whitespace.
This way data to be encoded will never contain any numbers and numbers inside data to be decoded always represent the count for the following character.
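To make the description above concrete, here is a minimal Python sketch of the encoding and decoding it describes. The function names `encode` and `decode` are illustrative choices, not taken from any track's stub, and the sketch relies on the stated assumption that unencoded data never contains digits.

```python
import re
from itertools import groupby


def encode(text: str) -> str:
    """Replace each run of identical characters with its count and the character.

    A run of length 1 is written as the bare character, matching the
    examples in the description ("AAB" -> "2AB").
    """
    parts = []
    for char, run in groupby(text):
        count = len(list(run))
        parts.append(f"{count}{char}" if count > 1 else char)
    return "".join(parts)


def decode(encoded: str) -> str:
    """Expand each optional count followed by a non-digit character.

    Assumes digits only ever appear as counts, as the description states.
    """
    return "".join(
        char * int(count or 1)
        for count, char in re.findall(r"(\d*)(\D)", encoded)
    )
```

Under these assumptions, `encode("AABCCCDEEEE")` gives `"2AB3CD4E"`, and passing that result to `decode` reconstructs the original string.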

The instructions for the exercise will basically be the same; we’re just trying to frame the exercise within a scenario that helps make it feel more concrete and interesting.

The questions to kick this off are:

- What situations would require minimizing data for
  - transmission?
  - storage?
- For a given scenario, what is the data that has lots of sequences of the same letter being repeated, making it ideal for run-length encoding?

Any ideas or suggestions?

(/cc @dreig – You were a huge help with the saddle-points exercise. Pulling you in, just in case this is up your alley)

glennj · April 13, 2023, 2:21pm

A multiple-choice exam would have a sequence of ABCDE answers that could be encoded into an answer key.

MatthijsBlom · April 13, 2023, 2:21pm

See also: History and applications of RLE on Wikipedia.

For RLE to be useful, the uncoded data needs to be monotonous. This suggests measurement to me. (Indeed, a television signal is an example of a measurement: by camera.)

In transmission, compression is only useful 1) if you need to transmit a lot, or 2) if you need to transmit quickly.

I’m thinking of spy submarines and satellites. The former need to dive again as soon as possible; the latter may have only short timespans in which they are positioned above ground stations and can talk to them. The focus on characters does not fit these scenarios very well, however.

glennj · April 13, 2023, 2:23pm

I had this thought as well: something about daily/hourly weather observations.

MatthijsBlom · April 13, 2023, 2:31pm

I’m guessing most LHC data is very boring. Also, there is a great deal of it: it is cheaper for CERN to transmit (some of) it via physical hard drives rather than over a wired network.

IsaacG · April 13, 2023, 2:52pm

This LHC comment reminded me of how Google used to let customers “upload” data to the cloud via courier-delivered hard drives.

BethanyG · April 13, 2023, 11:14pm

Sadly, it’s historical data. But what about the weather on Mars? Here’s more up-to-date weather at Gale Crater. Something around encoding it for transmission, or decoding it for reporting?

glennj · April 13, 2023, 11:55pm

It just occurred to me: this would be a nice scenario for Hamming: count the wrong answers.
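As a rough illustration of that idea, counting wrong answers is a Hamming distance between the answer key and a submission. The sketch below is only an assumption about how such a scenario might look; the function name `count_wrong_answers` and the one-letter-per-question format are hypothetical.

```python
def count_wrong_answers(answer_key: str, submission: str) -> int:
    """Count the positions where the submission differs from the answer key.

    Hypothetical format: one letter (A-E) per question, same length for both.
    """
    if len(answer_key) != len(submission):
        raise ValueError("answer key and submission must have the same length")
    return sum(
        1 for expected, given in zip(answer_key, submission) if expected != given
    )
```

For example, `count_wrong_answers("ABCDE", "ABCDA")` reports one wrong answer, since only the last position differs.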

vaeng · April 14, 2023, 7:32am

What about knitting patterns?

Some examples for b/w.

Colors are encoded in the same way.

You could have a scenario where the user is tasked to transfer and compress the archives of a famous knitting pattern company via RLE?

MatthijsBlom · April 14, 2023, 9:13am

Aye!

You work for a company that produces colored nonogram booklets. You have been tasked with automating the conversion of drawings into puzzles.
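A rough sketch of what that conversion could look like for a single row: the clues of a colored nonogram are the runs of non-background cells, which is the same run-finding step that RLE uses. Everything here is an assumption for illustration, including the function name `row_clues`, the one-character-per-cell row format, and the use of `.` for the background.

```python
from itertools import groupby


def row_clues(row: str, background: str = ".") -> list[tuple[str, int]]:
    """Return (color, run length) clues for one row of a drawing.

    Hypothetical format: each character is one cell, and `background`
    marks an empty cell that produces no clue.
    """
    return [
        (color, len(list(run)))
        for color, run in groupby(row)
        if color != background
    ]
```

With this format, the row `"..RRR.BB.R"` yields the clues `[("R", 3), ("B", 2), ("R", 1)]`.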

kytrinyx · April 14, 2023, 12:08pm

Thank you so much for all the suggestions.

As I was drifting off to sleep last night, I was also thinking about the combination of space and measurements—especially if there are lots of measurements that are the same most of the time, but that when something happens you need to know quickly. So maybe temperature measurements on a tidally locked moon or something. I feel vindicated by @MatthijsBlom and @BethanyG’s suggestions :)

@glennj I love the idea of using Hamming to do test scores. I’m totally going to run with that. It’s way more approachable than DNA (and we already have DNA in other places).

@vaeng I love the knitting patterns! I love that it’s so different, and what we in Norway would call “koselig”. (Also, @MatthijsBlom, thanks for the official name for these types of patterns.)

So, to conclude: I’m stealing the multiple choice thing for Hamming and running with the knitting patterns example for run-length-encoding.

Y’all are wonderful.

MatthijsBlom · April 14, 2023, 12:35pm

Gezellig is often brought up as an ‘untranslatable’ Dutch word. However, it now seems to me that it is shared in Scandinavian as hyggelig and koselig. All three look a bit like cognates to me, but I cannot find proof.

vaeng · April 14, 2023, 12:45pm

Happy to contribute! And I thought I would never need my degree in textile engineering ever again after switching to comp science.

I have to chip in with some German term.

Gemütlichkeit (German pronunciation: [ɡəˈmyːtlɪçkaɪt]) is a German-language word used to convey the idea of a state or feeling of warmth, friendliness, and good cheer. Other qualities encompassed by the term include cosiness, peace of mind, and a sense of belonging and well-being springing from social acceptance. The adjective “gemütlich” is translated as “cosy” so “Gemütlichkeit” could be simply translated as “cosiness.”

deleted-user-33691 · April 14, 2023, 12:46pm

Since we have “gesellig” in German, it sounds like “koselig” could be to things what “gesellig” is to persons.

MatthijsBlom · April 14, 2023, 12:53pm

We have gemoedelijk(heid) as well. In Dutch they are not exactly the same.

kotp · April 14, 2023, 2:47pm

I really was surprised to see the patterns on the screen, and I think it is an awesome example for RLE.

Also, run with the patterns, but never with the knitting hooks. (Safety first!)