# Because Unix:
cat <<EOF >A.csv
TABLE1, ABC_STRING
TABLE2, ABC_STRING
TABLE3, ABC_STRING
EOF
cat <<EOF >B.csv
TABLEA,SOMEVALUE,ABC_STRING,NULL,ABC_STRING
TABLEB,SOMEVALUE,ABC_STRING,NULL,ABC_STRING
TABLE1,SOMEVALUE,ABC_INT,NULL,ABC_INT
TABLEC,SOMEVALUE,ABC_STRING,NULL,ABC_STRING
TABLE2,SOMEVALUE,ABC_INT,NULL,ABC_INT
TABLE3,SOMEVALUE,ABC_INT,NULL,ABC_INT
EOF
# join the first file on the first field
# with the second file on the second field
# print unmatched lines from the second file
# the unknown matches are substituted with '##'
# the output is complicated - we output the matched, corrected line (6 fields)
# and after it we output the original line from the second file (6 fields)
# when there is no match, the last field of the corrected line
# is the empty-field placeholder '##'
# we can filter it later with sed
join -1 1 -2 2 -t, -a 2 -e '##' -o 2.1,2.2,2.3,1.2,2.5,1.2,2.1,2.2,2.3,2.4,2.5,2.6 <(
# dunno what the spaces are doing in A.csv, remove them
<A.csv tr -d ' ' |
# sort the file on the first field
sort -k1 -t,
) <(
# add a number of the lines to the second file
# so we can sort it like the original file later
<B.csv nl -s, -w1 |
# sort it on the second field (the first field is the number now)
sort -k2 -t,
) |
# here the output looks like:
#    6,TABLE3,SOMEVALUE,ABC_STRING,NULL,ABC_STRING,6,TABLE3,SOMEVALUE,ABC_INT,NULL,ABC_INT
#    1,TABLEA,SOMEVALUE,##,NULL,##,1,TABLEA,SOMEVALUE,ABC_STRING,NULL,ABC_STRING
# remove the first 6 fields from the lines containing '##,'
# because those lines were not matched
sed 's/.*,##,//' |
# keep only the first 6 fields: less data to sort, and the operation is cheap
cut -d, -f1-6 |
# sort numerically on the first field, i.e. the line numbers we inserted into the second file
sort -k1 -t, -n |
# drop the line number and keep the 5 original fields
cut -d, -f2-6
</code>
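
To see what the `-a 2 -e '##' -o ...` combination does in isolation, here is a minimal sketch on two made-up inputs. The temporary file names `a.csv` and `b.csv` and the three-column layout are assumptions for illustration only, not the data from the question; both inputs are already sorted on their join field.

```shell
# Hypothetical miniature inputs written to a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'TABLE1,ABC_STRING' > "$tmp/a.csv"
printf '%s\n' '1,TABLE1,ABC_INT' '2,TABLEA,ABC_STRING' > "$tmp/b.csv"

# -1 1 -2 2 : join field 1 of a.csv with field 2 of b.csv
# -a 2      : also print lines of b.csv that have no partner in a.csv
# -e '##'   : placeholder for fields listed in -o that are missing
# -o ...    : explicit output layout (number, name, type from a, type from b)
result=$(join -t, -1 1 -2 2 -a 2 -e '##' -o 2.1,2.2,1.2,2.3 "$tmp/a.csv" "$tmp/b.csv")
printf '%s\n' "$result"

rm -rf "$tmp"
```

The matched line gets the type from the first file in its third field, while the unmatched line gets `##` there. The full pipeline relies on that marker so `sed` can cut away the half of the line that was not matched.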