Please improve the instructions for the "Variable Length Quantity" exercise

If you don’t have a computer science background, you can’t make heads or tails of this exercise from the current instructions. I asked ChatGPT to write better instructions, which you can copy and paste directly. I can’t contribute to the GitHub repo, so I’m posting it here.

Problem Overview

You are going to write two functions: encode and decode, which work with a way of storing numbers called Variable Length Quantity (VLQ).

VLQ is a method for storing integers using fewer bytes when the number is small. This helps save space when encoding many numbers.

Each number is broken into 7-bit chunks, and each chunk is stored in a single byte. The most significant bit (leftmost) of each byte is used to indicate whether there are more bytes to come:

  • If the most significant bit is 1, it means “more bytes follow.”
  • If the most significant bit is 0, it means “this is the final byte.”

This way, small numbers (0–127) need just one byte, while larger numbers can take more.
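
As a rough illustration of the chunking described above, here is a minimal Python sketch for a single number. The helper name `encode_one` is hypothetical and not part of the exercise's required interface; it assumes a non-negative integer.

```python
def encode_one(number: int) -> list[int]:
    """Encode one non-negative integer as VLQ bytes (hypothetical helper)."""
    # The final byte carries the lowest 7 bits and keeps its top bit clear.
    result = [number & 0x7F]
    number >>= 7
    # Each earlier byte carries the next 7 bits with the top bit set,
    # which signals "more bytes follow".
    while number > 0:
        result.insert(0, (number & 0x7F) | 0x80)
        number >>= 7
    return result


print([hex(b) for b in encode_one(127)])  # ['0x7f']
print([hex(b) for b in encode_one(128)])  # ['0x81', '0x0']
```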

Example Conversions:

| Decimal number | Binary (7-bit chunks) | VLQ bytes (hex) |
| --- | --- | --- |
| 0 | 0000000 | 0x00 |
| 127 | 1111111 | 0x7F |
| 128 | 0000001 0000000 | 0x81 0x00 |
| 8192 | 1000000 0000000 | 0xC0 0x00 |

You will only work with unsigned 32-bit integers in this exercise.
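
One consequence of the 32-bit limit is that an encoded number never needs more than five bytes, since each byte holds 7 payload bits. The sketch below shows that arithmetic. The names `MAX_UINT32` and `vlq_length` are hypothetical, and the range check with its error message is an assumption rather than something the exercise asks for.

```python
MAX_UINT32 = 0xFFFFFFFF  # largest unsigned 32-bit value


def vlq_length(number: int) -> int:
    """Return how many VLQ bytes a value needs (hypothetical helper)."""
    # Assumption: reject values outside the unsigned 32-bit range.
    if not 0 <= number <= MAX_UINT32:
        raise ValueError("number out of unsigned 32-bit range")
    # Each byte holds 7 payload bits; zero still needs one byte.
    return max(1, (number.bit_length() + 6) // 7)


print(vlq_length(127), vlq_length(128), vlq_length(MAX_UINT32))  # 1 2 5
```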


Functions to Implement

encode(numbers: List[int]) -> List[int]

Takes a list of integers and returns a list of bytes (as integers) in VLQ format.
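
A minimal sketch of what such a function might look like in Python, assuming every input is a non-negative integer. It uses the built-in `list` generic in the annotation rather than `List`, and it is one possible approach, not the reference solution.

```python
def encode(numbers: list[int]) -> list[int]:
    """Encode each number as VLQ and concatenate the resulting bytes."""
    output = []
    for number in numbers:
        # Collect 7-bit groups from least to most significant.
        groups = [number & 0x7F]  # final byte: top bit stays 0
        number >>= 7
        while number:
            groups.append((number & 0x7F) | 0x80)  # top bit 1: more follow
            number >>= 7
        # VLQ stores the most significant group first.
        output.extend(reversed(groups))
    return output


print([hex(b) for b in encode([128, 127])])  # ['0x81', '0x0', '0x7f']
```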

decode(bytes: List[int]) -> List[int]

Takes a list of bytes (as integers) and reconstructs the original list of numbers.


Handling Errors

While decoding, you may get a list of bytes that doesn’t end properly — for example, it keeps saying “more bytes are coming,” but no more bytes are actually present.

This is known as an incomplete sequence, and your function must raise a ValueError when it happens:

raise ValueError("incomplete sequence")

This helps catch and report corrupted or invalid input.


What is an “Incomplete Sequence”?

When decoding, a byte with the most significant bit set to 1 tells your code: “there is another byte coming after me.”

If the list ends without a final byte (i.e. one with a leading 0 bit), then the input is incomplete, and it’s impossible to fully decode the number.

Example of invalid input:

decode([0x81])  # invalid — no final byte with MSB = 0
# should raise ValueError("incomplete sequence")
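
Putting the decoding rule and the error case together, a decoder could be sketched roughly as follows. The parameter is named `data` here (an assumption, chosen to avoid shadowing Python's built-in `bytes`), and this is one possible approach rather than the reference solution.

```python
def decode(data: list[int]) -> list[int]:
    """Decode a list of VLQ bytes back into the original numbers."""
    numbers = []
    current = 0
    in_progress = False  # True while a number is still missing its final byte
    for byte in data:
        # Append this byte's 7 payload bits to the number being built.
        current = (current << 7) | (byte & 0x7F)
        if byte & 0x80:
            # Top bit set: more bytes are expected for this number.
            in_progress = True
        else:
            # Top bit clear: this byte completes the number.
            numbers.append(current)
            current = 0
            in_progress = False
    if in_progress:
        raise ValueError("incomplete sequence")
    return numbers


print(decode([0x81, 0x00, 0x7F]))  # [128, 127]
```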

Do we get any clues about what you’re proposing to change or do we need to do a diff ourselves? :smiley:

If you’re proposing a change, it would help to be specific and make it easy for us to see what it is.

27 tracks implement Variable Length Quantity using canonical test data and instructions from a shared repository. So we’d want to update the upstream instructions so all 27 tracks can benefit.

However, first we’ll need to discuss as a group whether the instructions need to be updated and if so, what those changes will look like.

For a start, we shouldn’t reference language-specific details, since those won’t apply to other tracks. If something needs to be clarified for a particular language, each track can append that to the end of the description.

We also need to make sure that any changes improve the clarity for the most people possible. There might be other ways of framing the instructions that we should entertain.

A useful starting point for the conversation is that the current instructions were unclear to you. How so? What possible changes could we consider short of replacing all the text?

(Complete aside: typing.List is deprecated in Python as of 3.9 in favor of the built-in list.)