The file has a numeric value toward the end of the first field:
" La la la, bla bla bla 123 bla bla ", "ksdjf"
" La , Bla 123 la la blala bla", "ksdjkjf"
I want to check the -3 word (third word from the end) of the first field. If it is numeric, add "," before the -3 word to delimit a new field. If not, check the -4 word; if it is numeric, add "," before it. This will isolate the numeric word and the text that follows it in field 2. It needs to work from the end of the field.
Sed? Gawk? Awk? Grep?
A possible solution with 'awk'
-----------
awk '
{
    # Get field 1 (up to and including its closing quote)
    if (match($0, /^"[^"]*",/) == 0) {
        print $0;
        next;
    }
    field1 = substr($0, 1, RLENGTH-1);

    # Search for a number in word 3 or 4, counting from the end of field1
    if (match(field1, /[0-9]+ +[^ ]+ +[^ ]+ *"$/) == 0) {
        if (match(field1, /[0-9]+ +[^ ]+ +[^ ]+ +[^ ]+ *"$/) == 0) {
            print $0;
            next;
        }
    }

    # Insert "," before the number
    print substr($0, 1, RSTART-1) "\",\"" substr($0, RSTART);
}
' input_file
-----------
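The script depends on match() setting RSTART (start position of the match, 0 if none) and RLENGTH (length of the match, -1 if none). Here is a minimal sketch to see those values for the "number followed by two words" pattern. The helper name show_match is made up for illustration and is not part of the solution:

```shell
# Hypothetical helper: prints the return value of match(), RSTART and RLENGTH
# for the "number + two words + closing quote" pattern used in the script.
show_match() {
  printf '%s\n' "$1" | awk '{
    # match() returns the 1-based start of the leftmost match, or 0.
    pos = match($0, /[0-9]+ +[^ ]+ +[^ ]+ *"$/)
    print pos, RSTART, RLENGTH
  }'
}

show_match '"a 12 b c"'    # prints: 4 4 7
show_match '"a b c"'       # prints: 0 0 -1
show_match '"abc12 b c"'   # prints: 5 5 7
```

The last call shows that the pattern does not require the digits to start a word, so a word like abc12 would be split in the middle. If only whole numeric words should count, one option (an assumption about the requirement, not something the question states) is to prefix the pattern with [ "] and insert at RSTART+1 instead of RSTART.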
If your version of awk supports interval expressions, you can rewrite the last two if statements as:
if (match(field1,/[0-9]+( +[^ ]+){2} *"$/) == 0) {
if (match(field1,/[0-9]+( +[^ ]+){3} *"$/) == 0) {
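Whether {2} and {3} are treated as intervals depends on the awk you have. A quick probe could look like the sketch below. The function name awk_has_intervals is made up, and it assumes that an awk without interval support simply fails to match "aa" against /^a{2}$/:

```shell
# Hypothetical helper: reports whether the local awk honours interval expressions.
awk_has_intervals() {
  # With interval support, /^a{2}$/ matches exactly "aa".
  if printf 'aa\n' | awk '{ if ($0 ~ /^a{2}$/) exit 0; exit 1 }'; then
    echo yes
  else
    echo no
  fi
}

awk_has_intervals
```

If it prints "no", stay with the two spelled-out patterns from the script.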
With the following input data:
" La la la, bla bla bla 123 bla bla ", "ksdjf"
" La , Bla 123 lala blala bla", "ksdjkjf"
" La , Bla 123 la la blala bla", "ksdjkjf"
The result is (the third line is left unchanged because its number is the fifth word from the end):
" La la la, bla bla bla ","123 bla bla ", "ksdjf"
" La , Bla ","123 lala blala bla", "ksdjkjf"
" La , Bla 123 la la blala bla", "ksdjkjf"