glennj
1
gawk version 5.3 was recently released. It now has a --csv switch!
$ cat file.csv
one,"two","three,four","""five"" and six"
$ gawk --csv '{
    print NF
    for (i=1; i<=NF; i++) printf "%d\t>%s<\n", i, $i
}' file.csv
4
1 >one<
2 >two<
3 >three,four<
4 >"five" and six<
$ gawk --version
GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.2.1, GNU MP 6.3.0)
Copyright (C) 1989, 1991-2023 Free Software Foundation.
...
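Since --csv only exists from 5.3 on, a script may want to probe for it before relying on it. This is a minimal sketch, not part of the original post: the function name gawk_has_csv is made up, and it assumes that an older gawk rejects the unknown option with a non-zero exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if a gawk that accepts --csv is on PATH.
# Assumption: a gawk without CSV support exits non-zero on the unknown option.
gawk_has_csv() {
    command -v gawk >/dev/null 2>&1 &&
        gawk --csv 'BEGIN { exit 0 }' </dev/null >/dev/null 2>&1
}

if gawk_has_csv; then
    echo "gawk --csv is available"
else
    echo "no gawk with --csv found; fall back to another parser"
fi
```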
glennj
3
Previous versions required an arcane FPAT pattern, and it doesn't remove the quotes from the fields, so they have to be stripped by hand:
$ gawk -v FPAT="([^,]*)|(\"[^\"]+\")" '{
    print NF
    for (i=1; i<=NF; i++) {
        value = gensub(/^"(.*)"$/, "\\1", 1, $i)
        gsub(/""/, "\"", value)
        printf "%d\t>%s<\t>%s<\n", i, $i, value
    }
}' file.csv
4
1 >one< >one<
2 >"two"< >two<
3 >"three,four"< >three,four<
4 >"""five"" and six"< >"five" and six<
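The unquoting done by the gensub/gsub pair is two steps: strip one pair of enclosing quotes, then collapse each doubled quote into a single one. As a rough sketch, the same two steps in plain bash (the function name unquote_csv_field is made up for illustration, and it expects a field that has already been split out of the line):

```shell
#!/usr/bin/env bash
# Hypothetical helper: undo CSV quoting on one already-split field.
unquote_csv_field() {
    local value=$1 dq='"'
    # Step 1: strip one pair of enclosing double quotes, if present.
    if [[ ${#value} -ge 2 && $value == "$dq"*"$dq" ]]; then
        value=${value:1:${#value}-2}
    fi
    # Step 2: collapse each doubled quote ("") into a single quote (").
    value=${value//"$dq$dq"/$dq}
    printf '%s\n' "$value"
}

unquote_csv_field '"""five"" and six"'    # prints: "five" and six
```

This only cleans up a field; splitting the line on the right commas is still the hard part, which is what FPAT (or --csv) takes care of.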
glennj
4
Starting with bash version 5.2, bash can do this too, using the dsv loadable builtin:
$ declare -p BASH_VERSION
declare -- BASH_VERSION="5.2.21(1)-release"
$ command -v bash
/home/linuxbrew/.linuxbrew/bin/bash
$ declare -p BASH_LOADABLES_PATH
declare -- BASH_LOADABLES_PATH="/home/linuxbrew/.linuxbrew/lib/bash"
$ enable dsv
$ while IFS= read -r line; do
    dsv -a fields "$line"
    declare -p fields
done < file.csv
declare -a fields=([0]="one" [1]="two" [2]="three,four" [3]="\"five\" and six")