从您的输入开始,使用Miller( http://johnkerl.org/miller/doc/index.html )
mlr --nidx --fs "|" put '$nf=NF' then cut -f nf then cat -n input
这些是每行的字段数
1|17 2|17 3|17 4|17 5|19 6|17 7|17 8|18 9|18 10|17 11|19 12|18 13|18
如果只想要包含17个字段的行
mlr --nidx --fs "|" put '$nf=NF' then filter '$nf==17' then cut -x -f nf input
给你
$~$TRN_FILE_DT$~$|$~$TRN_BANK_STATE_ID$~$|$~$ACCT_NUM_FULL$~$|$~$TRN_TRANSACTION_CD$~$|$~$TRN_TRANSACTION_DT$~$|$~$TRN_TRANSACTION_AMT$~$|$~$TRN_BAT_NUM$~$|$~$TRN_SEQ_NUM$~$|$~$TRN_LOCUSORTYPE$~$|$~$TRN_CITYST$~$|$~$TRN_PURPOSE$~$|$~$TRN_ATM_LOC_CD$~$|$~$TRN_AMT_LOC_ON_US$~$|$~$TRN_AMT_REMOTE_BR$~$|$~$TRN_ATM_GL_RC$~$|$~$TRN_POST_SEQ$~$|$~$TRN_POSTING_PRIORITY$~$ $~$2019-01-01 00:00:00$~$|$~$001$~$|$~$0xBFACD988EAF6ABE515C16CE33C10F0860B33C83D$~$|$~$0129$~$|$~$2018-12-31 00:00:00$~$|$~$1425.00$~$|$~$5912 $~$|$~$13312$~$|$~$TO CHECKING$~$|$~$$~$|$~$TO CHECKING$~$|$~$$~$|$~$$~$|$~$$~$|$~$$~$|$~$00001$~$|$~$35$~$ $~$2019-01-01 00:00:00$~$|$~$001$~$|$~$0xDFC9ACE6A089C648E74524847A7273763475655D$~$|$~$0170$~$|$~$2018-12-31 00:00:00$~$|$~$3503.71$~$|$~$7200 $~$|$~$90542$~$|$~$Morgan Stanley$~$|$~$$~$|$~$ACH CREDIT$~$|$~$$~$|$~$$~$|$~$$~$|$~$$~$|$~$00001$~$|$~$20$~$ $~$2019-01-01 00:00:00$~$|$~$001$~$|$~$0xDFC9ACE6A089C648E74524847A7273763475655D$~$|$~$0136$~$|$~$2018-12-31 00:00:00$~$|$~$34.00$~$|$~$8888 $~$|$~$51279$~$|$~$Village Nails & Sp$~$|$~$CRANSTON RI$~$|$~$DBT PURCHASE$~$|$~$7230$~$|$~$$~$|$~$$~$|$~$$~$|$~$00001$~$|$~$35$~$ $~$2019-01-01 00:00:00$~$|$~$001$~$|$~$0xDFC9ACE6A089C648E74524847A7273763475655D$~$|$~$0136$~$|$~$2018-12-31 00:00:00$~$|$~$75.19$~$|$~$8888 $~$|$~$66089$~$|$~$P.J.'S PUB$~$|$~$NARRAGANSETT RI$~$|$~$DBT PURCHASE$~$|$~$5812$~$|$~$$~$|$~$$~$|$~$$~$|$~$00003$~$|$~$35$~$ $~$2019-01-01 00:00:00$~$|$~$001$~$|$~$0xDFC9ACE6A089C648E74524847A7273763475655D$~$|$~$0136$~$|$~$2018-12-31 00:00:00$~$|$~$9.34$~$|$~$8888 $~$|$~$66093$~$|$~$Amazon.com*M26TD1R$~$|$~$Amzn.com/billWA$~$|$~$DBT PURCHASE$~$|$~$5942$~$|$~$$~$|$~$$~$|$~$$~$|$~$00004$~$|$~$35$~$ $~$2019-01-01 00:00:00$~$|$~$001$~$|$~$0xDFC9ACE6A089C648E74524847A7273763475655D$~$|$~$0136$~$|$~$2018-12-31 00:00:00$~$|$~$5.92$~$|$~$8888 $~$|$~$66091$~$|$~$AMZN Mktp US*M28O5$~$|$~$Amzn.com/billWA$~$|$~$DBT PURCHASE$~$|$~$5942$~$|$~$$~$|$~$$~$|$~$$~$|$~$00007$~$|$~$35$~$
正如我的评论中所解释的,我经常使用awk来测试分隔符问题。如果你有字符串封装字符,这可能会有点难看,但如果这不是问题,那么它的工作非常简单:
$ cat testfile.csv a,1,2,3,4,5,6 b,1,2,3,4,5 c,1,2,3,4,5,6,7 d,1,2,3,4,5,6 e,1,2,3,4,5,6 $ awk -F"," 'BEGIN{fieldcount=7}NF!=fieldcount{print $0>FILENAME"_bad";next}{print $0}' testfile.csv > testfile_good.csv $ cat testfile.csv_bad b,1,2,3,4,5 c,1,2,3,4,5,6,7 $ cat testfile_good.csv a,1,2,3,4,5,6 d,1,2,3,4,5,6 e,1,2,3,4,5,6