Explain ETL solution?

Could anyone explain this solution to the ETL problem? For example, this one: IsaacG's solution for ETL in jq on Exercism

The part I’m trying to understand is how this key: (.value[] | ascii_downcase) works with multiple values.

Hey, that’s my solution! You can run the pipeline in pieces to see what each step outputs :)

» jq -c . input
{"legacy":{"1":["A","E"],"2":["D","G"]}}

» jq -c '.legacy | to_entries' input
[{"key":"1","value":["A","E"]},{"key":"2","value":["D","G"]}]

» jq -c '.legacy | to_entries | map(.)' input
[{"key":"1","value":["A","E"]},{"key":"2","value":["D","G"]}]

» jq -c '.legacy | to_entries | map(.value[])' input
["A","E","D","G"]

» jq -c '.legacy | to_entries | map(.value[] | ascii_downcase)' input
["a","e","d","g"]

» jq -c '.legacy | to_entries | map(.key | tonumber)' input
[1,2]

» jq -c '.legacy | to_entries | map({key: (.value[] | ascii_downcase), value: (.key | tonumber)})' input
[{"key":"a","value":1},{"key":"e","value":1},{"key":"d","value":2},{"key":"g","value":2}]

» jq -c '.legacy | to_entries | map({key: (.value[] | ascii_downcase), value: (.key | tonumber)}) | sort | from_entries' input
{"a":1,"d":2,"e":1,"g":2}
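As an aside, jq's with_entries(f) builtin is shorthand for to_entries | map(f) | from_entries, so the same idea can be sketched more compactly. This is only an illustration, not the solution as submitted: the input is inlined so no input file is needed, the variable name short is made up, and because there is no sort step the keys may come out in a different order than above.

```shell
# Inline copy of the sample input from this thread (assumption: same data as `input`).
# with_entries(f) expands to: to_entries | map(f) | from_entries
short=$(jq -nc '{"legacy":{"1":["A","E"],"2":["D","G"]}}
  | .legacy
  | with_entries({key: (.value[] | ascii_downcase), value: (.key | tonumber)})')
printf '%s\n' "$short"
```

The mapping of letters to numbers should match the final step above; only the key order is not guaranteed to be sorted.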

Does that help? Is there a specific part you would like explained?

The part that is confusing me is how the {key: <array of values>, value: <static variable>} breaks out into multiple objects.

By defining the mapped object to have a “stream” of values, jq makes one object per stream value.

In the first command below, I set out to .b, which is a single value (the array [4, 5]). As a result, I get back a list with one object.

In the second command, out is set to .b[], which is a stream of values. jq reuses the same .a value for in and makes one object for each value in the stream.

» jq -nc '[{"a": 3, "b": [4, 5]}] | map({"in": .a, "out": .b})'
[{"in":3,"out":[4,5]}]

» jq -nc '[{"a": 3, "b": [4, 5]}] | map({"in": .a, "out": .b[]})'
[{"in":3,"out":4},{"in":3,"out":5}]
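The same rule applies when more than one field is a stream: jq builds one object for every combination of values. Here is a minimal sketch of that, with made-up values (1, 2) and (3, 4) and a made-up variable name pairs. I'm only showing the count rather than the full output, since the order of the combinations is easiest to check by running it yourself.

```shell
# Both fields are streams, so jq builds one object per combination (2 x 2).
pairs=$(jq -nc '[{"in": (1, 2), "out": (3, 4)}]')
printf '%s\n' "$pairs"

# Count how many objects were collected into the array; prints 4.
printf '%s\n' "$pairs" | jq 'length'
```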

Does that make sense?

I didn’t fully understand how jq handles streaming data, but seeing this opens up jq even more :) Thanks for explaining this.

The syllabus docs mention streams a bunch.

The word stream shows up:

It wasn’t so much the streaming data that I didn’t understand; I just didn’t realize that an object given a stream for one of its fields would pop out multiple objects like that.

echo '{"a": 3, "b": [4, 5]}' | jq '{"key": .b[], "value": .a}'

{
  "key": 4,
  "value": 3
}
{
  "key": 5,
  "value": 3
}

Now I’m going to be sad when I go back to Python and it doesn’t do this for me.

jq does a lot of “implicit iteration” when processing a stream.

For example,

$ jq -cn '[1,2,3,4,5] | [.[] * 2]'
[2,4,6,8,10]

The * operator is applied to each element of the stream .[], and the outer [...] then collects the resulting stream into an array.

That’s implicit iteration.
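This is also what map relies on: the jq manual defines map(f) as [.[] | f], so if f emits more than one value per element, all of them end up in the resulting array. A small sketch to compare the two spellings (the filter ., . * 10 and the variable names with_map and explicit are just for illustration):

```shell
# The filter (., . * 10) emits two values per element: the element itself, then ten times it.
with_map=$(jq -nc '[1,2,3] | map(., . * 10)')
explicit=$(jq -nc '[1,2,3] | [.[] | (., . * 10)]')

# Both lines print [1,10,2,20,3,30]
printf '%s\n%s\n' "$with_map" "$explicit"
```

That is the same mechanism that turns each entry into several {key, value} objects in the ETL solution.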