Could anyone explain this solution to the ETL problem? For example, this one: IsaacG's solution for ETL in jq on Exercism.
The part I’m trying to understand is how this key: (.value[] | ascii_downcase) works with multiple values.
Hey, that’s my solution! You can run the pipeline in pieces to see what each step outputs:
» jq -c . input
{"legacy":{"1":["A","E"],"2":["D","G"]}}
» jq -c '.legacy | to_entries' input
[{"key":"1","value":["A","E"]},{"key":"2","value":["D","G"]}]
» jq -c '.legacy | to_entries | map(.)' input
[{"key":"1","value":["A","E"]},{"key":"2","value":["D","G"]}]
» jq -c '.legacy | to_entries | map(.value[])' input
["A","E","D","G"]
» jq -c '.legacy | to_entries | map(.value[] | ascii_downcase)' input
["a","e","d","g"]
» jq -c '.legacy | to_entries | map(.key | tonumber)' input
[1,2]
» jq -c '.legacy | to_entries | map({key: (.value[] | ascii_downcase), value: (.key | tonumber)})' input
[{"key":"a","value":1},{"key":"e","value":1},{"key":"d","value":2},{"key":"g","value":2}]
» jq -c '.legacy | to_entries | map({key: (.value[] | ascii_downcase), value: (.key | tonumber)}) | sort | from_entries' input
{"a":1,"d":2,"e":1,"g":2}
Does that help? Is there a specific part you would like explained?
The part that is confusing me is how {key: <array of values>, value: <static variable>} breaks out into multiple objects.
By defining the mapped object to have a “stream” of values, jq makes one object per stream value.
In the first command below, I set out to .b, which is a single value (the whole array). As a result, I get back a list with one object.
In the second command, out is set to .b[], which is a stream of values. jq uses the same .a value for in and makes an object for each value in the stream.
» jq -nc '[{"a": 3, "b": [4, 5]}] | map({"in": .a, "out": .b})'
[{"in":3,"out":[4,5]}]
» jq -nc '[{"a": 3, "b": [4, 5]}] | map({"in": .a, "out": .b[]})'
[{"in":3,"out":4},{"in":3,"out":5}]
Does that make sense?
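One way to make this concrete: the jq manual defines map(f) as [.[] | f], so map collects every output of the filter, not one output per input element. Here is a minimal sketch, assuming jq is on the PATH; the shell variable names (sample, with_map, by_hand) are only for illustration.

```shell
# Same sample data as in the two commands above.
sample='[{"a": 3, "b": [4, 5]}]'

# map(f) collects every output of f into one array...
with_map=$(printf '%s' "$sample" | jq -c 'map({"in": .a, "out": .b[]})')

# ...which is what [.[] | f] does when written out by hand.
by_hand=$(printf '%s' "$sample" | jq -c '[.[] | {"in": .a, "out": .b[]}]')

echo "$with_map"   # [{"in":3,"out":4},{"in":3,"out":5}]
echo "$by_hand"    # [{"in":3,"out":4},{"in":3,"out":5}]
```

Both forms print the same array: one input element, two output objects, because the object constructor produced two results.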
I didn’t fully understand how jq handles the streaming data, but seeing this opens up jq even more :) Thanks for explaining this.
The syllabus docs mention streams a bunch. The word stream shows up, for example, where .[] is described as outputting a stream.
It wasn’t so much the streaming data that I didn’t understand, I just didn’t realize that when given a stream in an object it would pop out multiple objects like that.
echo '{"a": 3, "b": [4, 5]}' | jq ' {"key": .b[], "value": .a}'
{
  "key": 4,
  "value": 3
}
{
  "key": 5,
  "value": 3
}
Now I’m going to be sad when I go back to Python and it doesn’t do this for me.
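Taking that example one step further, here is a small sketch of what happens when both fields are streams: jq emits an object for every combination. It assumes jq is on the PATH. The input object and the variable names (input, count, combos) are made up for illustration, and the result is sorted so the listing does not depend on the order in which jq generates the combinations.

```shell
# Hypothetical input: this time .a is an array too.
input='{"a": [1, 2], "b": [4, 5]}'

# Both .b[] and .a[] are streams, so the constructor yields 2 x 2 objects.
count=$(printf '%s' "$input" | jq -c '[{"key": .b[], "value": .a[]}] | length')
echo "$count"    # 4

# Sorted, so the listing is the same regardless of generation order.
combos=$(printf '%s' "$input" | jq -c '[{"key": .b[], "value": .a[]}] | sort')
echo "$combos"
# [{"key":4,"value":1},{"key":4,"value":2},{"key":5,"value":1},{"key":5,"value":2}]
```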
jq does a lot of “implicit iteration” when processing a stream.
For example,
$ jq -cn '[1,2,3,4,5] | [.[] * 2]'
[2,4,6,8,10]
The * operator operates on each element of the stream .[]. And then the outer [...] collects the resulting stream into an array.
That’s implicit iteration.
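To see the two halves separately, you can drop the outer brackets and compare. A quick sketch, assuming jq is on the PATH (the variable names stream and array are just for illustration):

```shell
# Without the outer brackets, each product is a separate output, one per line.
stream=$(jq -cn '[1,2,3,4,5] | .[] * 2')

# With them, the same results are collected into a single array.
array=$(jq -cn '[1,2,3,4,5] | [.[] * 2]')

echo "$stream"   # five lines: 2, 4, 6, 8, 10
echo "$array"    # [2,4,6,8,10]
```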