Unix: extracting data from a .dat file and inserting into SQL database?
This is not a good problem for grep and sed. I recommend awk. An untested first cut, assuming each tag is followed by whitespace and a single-word value (so the value is the second field):
awk '/<Name>/ {name = $2} /<Email>/ {emails[name] = $2} END {for (n in emails) print n, emails[n]}' *.dat
To print the sqlite commands instead, you could also try this END block:
END {for (n in emails) {print "sqlite db.sql INSERT INTO users VALUES (" n "," emails[n] ");"}}
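If the values may contain spaces or quotes, a slightly longer variant might be safer. The following is only a sketch under assumptions: the function name dat_to_sql is made up, each tag is assumed to start its own line and be followed by whitespace and the value, and the users table is assumed to have exactly two columns. It emits plain SQL in file order rather than shell commands, and doubles single quotes so each value is a valid SQL string literal.

```shell
#!/bin/sh
# Hypothetical helper: turn "<Name> value" / "<Email> value" lines
# into INSERT statements, one per <Email> line, in file order.
dat_to_sql() {
    awk '
        # Strip the leading tag and surrounding whitespace, then double
        # any single quote (\047) so the value is a valid SQL literal.
        function value(line) {
            sub(/^[ \t]*<[^>]+>[ \t]*/, "", line)
            sub(/[ \t\r]+$/, "", line)
            gsub(/\047/, "\047\047", line)
            return line
        }
        /<Name>/  { name = value($0) }
        /<Email>/ {
            printf "INSERT INTO users VALUES (\047%s\047, \047%s\047);\n", name, value($0)
        }
    ' "$@"
}
```

You could inspect the output first with dat_to_sql *.dat and then, assuming the sqlite3 command-line client is what you have installed, feed it in one go with dat_to_sql *.dat | sqlite3 db.sql.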
Seems like you are a great fan of grep. Give it a try:
grep -Po '(?<=(Name|mail)>[\t\s])(.*)$' file | `xargs -n2 printf "sqlite db.sql INSERT INTO users VALUES (%s, %s)\n"`
The first part uses a positive lookbehind to fetch the relevant info. The lookbehind doesn't support variable-length patterns, that's why mail is used instead of Email. It outputs:
Name_1
Email_1
Name_2
Email_2
The xargs -n2 combines each name and email pair as follows:
Name_1 Email_1
Name_2 Email_2
This is formatted by the printf and then executed. Hope it helps.
Now please don't tell me your grep doesn't support -P ;-)
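Before letting anything execute, it may be worth doing a dry run that only prints the generated commands. A minimal sketch, assuming GNU grep with -P support; the function name preview_inserts is made up for illustration, and -h is added so that file names are not prefixed when several files are given:

```shell
#!/bin/sh
# Hypothetical helper: print one sqlite command per Name/Email pair
# without running any of them.
preview_inserts() {
    # Lookbehind keeps only the text after "Name>" or "mail>" plus one
    # whitespace character; xargs -n2 hands printf two values at a time.
    grep -hPo '(?<=(Name|mail)>\s).*$' "$@" |
        xargs -n2 printf 'sqlite db.sql INSERT INTO users VALUES (%s, %s)\n'
}
```

For a file holding the two records above, preview_inserts file should print one line per record, e.g. sqlite db.sql INSERT INTO users VALUES (Name_1, Email_1). Note that xargs splits on whitespace and interprets quotes, so values containing spaces or quotes would come out mangled with this approach.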
You can do it in (GNU) sed, although the awk script is much simpler.
dat2sql.sed:
/<NAME>/I H    # store name
/<EMAIL>/I {
  H;    # store email
  g     # get stored strings
  s/<[^>]+>\s+//gI;    # remove <NAME> and <EMAIL>
  s/^\n/sqlite db.sql INSERT INTO users VALUES ("/;
  s/\n/", "/;
  s/$/" );/;
  p     # print results
  s/.*//g; x;    # clear hold space
}
Use it like this: sed -rn -f dat2sql.sed your_file
The prerequisite is that Name is before Email for each record in the file.
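If you are not sure the file satisfies that prerequisite, a quick check beforehand might help. This is only a sketch: check_order is a made-up name, and it assumes, like the sed script, that tags are matched case-insensitively and that each record has exactly one Name line followed by one Email line.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report every <Email> line that is not
# preceded by its own <Name> line, and exit non-zero if any was found.
check_order() {
    awk '
        tolower($0) ~ /<name>/  { have_name = 1; next }
        tolower($0) ~ /<email>/ {
            if (!have_name) {
                printf "%s:%d: <Email> without a preceding <Name>\n", FILENAME, FNR
                bad = 1
            }
            have_name = 0    # the next record needs its own <Name>
        }
        END { exit bad + 0 }
    ' "$@"
}
```

Something like check_order your_file && sed -rn -f dat2sql.sed your_file would then only run the conversion when the order looks right.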