Explain ETL solution?

Could anyone explain this solution to the ETL problem? For example, this one: IsaacG's solution for ETL in jq on Exercism

The part I’m trying to understand is how key: (.value[] | ascii_downcase) works with multiple values.

Hey, that’s my solution! You can run the pipeline in pieces to see what each step outputs :)

» jq -c . input

» jq -c '.legacy | to_entries' input

» jq -c '.legacy | to_entries | map(.)' input

» jq -c '.legacy | to_entries | map(.value[])' input

» jq -c '.legacy | to_entries | map(.value[] | ascii_downcase)' input

» jq -c '.legacy | to_entries | map(.key | tonumber)' input

» jq -c '.legacy | to_entries | map({key: (.value[] | ascii_downcase), value: (.key | tonumber)})' input

» jq -c '.legacy | to_entries | map({key: (.value[] | ascii_downcase), value: (.key | tonumber)}) | sort | from_entries' input

Does that help? Is there a specific part you would like explained?
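
For readers more at home in Python, the jq pipeline above can be sketched step by step. This is only a rough analogy, not how jq works internally. The sample legacy data is made up for illustration (the real exercise input may differ), and to_entries / from_entries here are hypothetical helpers that imitate the jq builtins of the same name.

```python
# Rough Python analogy of the jq pipeline (a sketch, not jq's implementation).
# ASSUMPTION: the sample data below is invented for illustration.
data = {"legacy": {"1": ["A", "E"], "2": ["D", "G"]}}


def to_entries(obj):
    """Imitate jq's to_entries: {"k": v} -> [{"key": "k", "value": v}]."""
    return [{"key": k, "value": v} for k, v in obj.items()]


def from_entries(entries):
    """Imitate jq's from_entries: [{"key": k, "value": v}] -> {k: v}."""
    return {e["key"]: e["value"] for e in entries}


entries = to_entries(data["legacy"])

# jq: map({key: (.value[] | ascii_downcase), value: (.key | tonumber)})
# The inner "for letter in ..." loop plays the role of the .value[] stream:
# each letter produces its own entry, all sharing the same numeric value.
# (str.lower() also lowercases non-ASCII letters, unlike ascii_downcase.)
mapped = [
    {"key": letter.lower(), "value": int(entry["key"])}
    for entry in entries
    for letter in entry["value"]
]

# jq: sort | from_entries (sorting by key only is a simplification here)
result = from_entries(sorted(mapped, key=lambda e: e["key"]))
print(result)  # {'a': 1, 'd': 2, 'e': 1, 'g': 2}
```

The nested comprehension is where the "one object per letter" behaviour comes from: in jq the second loop is implied by .value[], while in Python it has to be written out.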

The part that is confusing me is how {key: <array of values>, value: <static variable>} breaks out into multiple objects.

When one field of the object you are constructing is a “stream” of values, jq makes one object per value in the stream.

In the first command, I set out to .b, which is a single value (the array [4, 5]). As a result, I get back a list with one object.

In the second command, out is set to .b[], which is a stream of values. jq reuses the same .a value for in and makes one object for each value in the stream.

» jq -nc '[{"a": 3, "b": [4, 5]}] | map({"in": .a, "out": .b})'
    [{"in":3,"out":[4,5]}]

» jq -nc '[{"a": 3, "b": [4, 5]}] | map({"in": .a, "out": .b[]})'
    [{"in":3,"out":4},{"in":3,"out":5}]

Does that make sense?

I didn’t fully understand how jq handles streaming data, but seeing this opens up jq even more :) Thanks for explaining this.

The syllabus docs mention streams a bunch.

The word stream shows up:

It wasn’t so much the streaming data that I didn’t understand; I just didn’t realize that when given a stream inside an object, jq would pop out multiple objects like that.

echo '{"a": 3, "b": [4, 5]}' | jq '{"key": .b[], "value": .a}'

{
  "key": 4,
  "value": 3
}
{
  "key": 5,
  "value": 3
}

Now I’m going to be sad when I go back to python and it doesn’t do this for me.
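
For what it's worth, Python can get close if the loop is spelled out. The following is a minimal sketch of the same expansion, assuming the same {"a": 3, "b": [4, 5]} input; expand is a hypothetical helper name, not anything from jq or the standard library.

```python
# Sketch: emulating jq's {"key": .b[], "value": .a} with a Python generator.
doc = {"a": 3, "b": [4, 5]}


def expand(doc):
    """Yield one dict per element of doc["b"], reusing doc["a"] each time."""
    for item in doc["b"]:  # plays the role of the .b[] stream
        yield {"key": item, "value": doc["a"]}


objects = list(expand(doc))
for obj in objects:
    print(obj)
# {'key': 4, 'value': 3}
# {'key': 5, 'value': 3}
```

The difference is that jq infers the loop from .b[] appearing inside the object, whereas Python needs the for written explicitly.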


jq does a lot of “implicit iteration” when processing a stream.

For example,

$ jq -cn '[1,2,3,4,5] | [.[] * 2]'
    [2,4,6,8,10]

The * operator is applied to each element of the stream .[], and the outer [...] then collects the resulting stream into an array.

That’s implicit iteration.
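
As a loose comparison, here is roughly what [.[] * 2] looks like when the iteration is written out in Python. This is only an analogy for the two stages (produce a stream, then collect it), not a description of jq's internals.

```python
# Sketch: jq's [.[] * 2] with the iteration made explicit in Python.
numbers = [1, 2, 3, 4, 5]

# Like ".[] * 2": a lazy stream of doubled values, one per input element.
doubled_stream = (n * 2 for n in numbers)

# Like the outer "[...]": collect the stream into a list.
doubled = list(doubled_stream)
print(doubled)  # [2, 4, 6, 8, 10]
```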