GNU awk and CSV

gawk version 5.3 was recently released. It now has a --csv switch!

$ cat file.csv
one,"two","three,four","""five"" and six"

$ gawk --csv '{
    print NF
    for (i=1; i<=NF; i++) printf "%d\t>%s<\n", i, $i
}' file.csv
4
1	>one<
2	>two<
3	>three,four<
4	>"five" and six<

$ gawk --version
GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.2.1, GNU MP 6.3.0)
Copyright (C) 1989, 1991-2023 Free Software Foundation.
...

Doc link: "Working With Comma Separated Value Files" in the gawk manual

Previous versions required an arcane FPAT pattern that leaves the quotes in the fields, so they have to be stripped by hand:

$ gawk -v FPAT="([^,]*)|(\"[^\"]+\")" '{
    print NF
    for (i=1; i<=NF; i++) {
        value = gensub(/^"(.*)"$/, "\\1", 1, $i)
        gsub(/""/, "\"", value)
        printf "%d\t>%s<\t>%s<\n", i, $i, value
    }
}' file.csv
4
1       >one<   >one<
2       >"two"< >two<
3       >"three,four"<  >three,four<
4       >"""five"" and six"<    >"five" and six<
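
If you are stuck on an older gawk, the unquoting can be tucked into a helper so the main loop stays readable. This is only a sketch: csv_fields and unquote are names made up for illustration, the FPAT is the same simple pattern as above, and it only unquotes fields that are actually wrapped in quotes.

```shell
# Hypothetical wrapper around the FPAT approach (needs gawk 4.0+ for FPAT).
# Reads CSV from stdin or from the files given as arguments.
csv_fields() {
    gawk -v FPAT='([^,]*)|("[^"]+")' '
    # unquote() is an illustrative helper, not something gawk provides.
    function unquote(s) {
        if (s ~ /^".*"$/) {
            s = substr(s, 2, length(s) - 2)   # drop the enclosing quotes
            gsub(/""/, "\"", s)               # collapse doubled quotes
        }
        return s
    }
    {
        for (i = 1; i <= NF; i++) printf "%d\t>%s<\n", i, unquote($i)
    }' "$@"
}
```

Calling it as csv_fields file.csv should print the same four numbered fields as the --csv run, minus the NF line. Like the one-liner above, it does not cope with fields that contain embedded newlines.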

Starting with bash version 5.2, bash can do this too, using the dsv loadable builtin:

$ declare -p BASH_VERSION
declare -- BASH_VERSION="5.2.21(1)-release"

$ command -v bash
/home/linuxbrew/.linuxbrew/bin/bash

$ declare -p BASH_LOADABLES_PATH
declare -- BASH_LOADABLES_PATH="/home/linuxbrew/.linuxbrew/lib/bash"

$ enable dsv

$ while IFS= read -r line; do
    dsv -a fields "$line"
    declare -p fields
done < file.csv
declare -a fields=([0]="one" [1]="two" [2]="three,four" [3]="\"five\" and six")
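
To get the same report as the gawk run out of bash, something like the following might do. It is a rough sketch: print_fields is a made-up helper name, and the dsv part is guarded because the loadable is only there if your bash was built and installed with it.

```shell
# Hypothetical helper: print the field count, then each field numbered
# from 1, in the same layout as the gawk example.
print_fields() {
    local i=0 field
    printf '%d\n' "$#"
    for field in "$@"; do
        i=$((i + 1))
        printf '%d\t>%s<\n' "$i" "$field"
    done
}

# Only attempt the dsv loop if the loadable builtin can be enabled.
if enable dsv 2>/dev/null; then
    while IFS= read -r line; do
        dsv -a fields "$line"
        print_fields "${fields[@]}"
    done <<'EOF'
one,"two","three,four","""five"" and six"
EOF
else
    printf 'dsv loadable builtin not available in this bash\n' >&2
fi
```

Passing the array as "${fields[@]}" keeps each field as a single argument, so the embedded comma and quotes survive untouched.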