Variable Length Encoding

Hello,

I hope I’m in the right place in the forum for this. I’m a big newbee in programming world. I’m currently struggling with the VLQ encoding exercise.
I’m not finding yet content to help me understanding the concepts. My first undertanding is that :

  • the function gets a hexadecimal value
  • I should convert to a binary value
  • I should then try to split into 7 bits and if there is more than 1 bit, I have the use 1/0 to indicate if this is the latest one.
  • I found “bitwise operators” in the PHP doc but… I’m not undertanding much of this… due to my poor programming background I guess :slight_smile:

I discovered the concept of big/little endian, the fact that 1 bit can’t exceed 127 (decimal) or 7F (hexadecimal) but still I’m lacking the glue to make this obvious for me.

Can you recommend me some good tutorial/documentation ? What’s I found so far is way too technical for me :frowning:

Thanks in advance !

  1. The function gets an array of numbers. The numbers are internally represented using binary (since computers use bits which are on or off) but generally discussed using decimal notation. To the computer, a number is a number is a number. 20 == 0o24 == 0x14 == 0b10100 (I think I got that right). You can use any notation/base, but a number is a number.
  2. For each number in the array, convert it to the VLQ encoding.
  3. Return the array of VLQ encoded values.

The tricky bit is step 2 – the VLQ encoding. Consider the value 128 * 128 + 128 + 25, also known as 16537 or 0b1_0000000_0000000 + 0b1_0000000 + 25 (which mixes representations!) or 0b1_0000001_0011001. If we split it into chunks of 7 bits, it is large enough to need three “chunks”. In VLQ encoding, this becomes three values (1, 1, 25), one chunk each. All VLQ chunks – except the last one, have the “highest” bit set, ie 0b1000_0000 == 128. The first two chunks, 1 and 1, become 128 (high bit)+ 1 (original value) == 129. The last chunk, 25, does not have the high bit set so it is just 25. Combined, 16537 gets encoded as 129, 129, 25 == 0x81, 0x81, 25 == 0b1000_0001, 0b1000_0001, 25.

If the input is an array with two values, [16537, 16537] then the result should be [129, 129, 25, 129, 129, 25].

Taking a number (eg 16537) and figuring out how to represent that in 7 bit chunks (eg 0b1_0000001_0011001) is easy. Write it out as a binary value then put spaces every 7 characters. To the computer, 16537 and 0b1_0000001_0011001 are the exact same thing! It’s just written slightly different with spaces added in different places for readability. No conversion needed.

However, turning 16537 into three numbers, [1, 1, 25] is where things get a bit tricky. That’s where bitwise operators become very useful. Specifically the bitwise AND and bitwise right-shift operators. As a first step, lets consider if we were dealing with base 10 instead of base 2. Take the number 123456. Can you convert that to [12, 34, 56] somehow? If you can do that, then you can use very similar logic to convert the binary value 0b01_01_01 into [1, 1, 1] or 0b1_0000001_0011001 into [1, 1, 25]. I don’t want to spoil too much so I’ll stop myself here.

1 Like

I want to thank you Isaac for your answer. I think that my background/brain is not yet fully up to speed for this. I’m going to focus on other exercises and come back to this later.