What is an efficient way to replace list of strings with another list in Unix file?

bash unix scripting file-io

This will do it in one pass. It reads listA and listB into awk arrays; then, for each line of the input, it examines each word, and if the word is found in listA, it is replaced by the corresponding word in listB.

    awk '
        FILENAME == ARGV[1] { listA[$1] = FNR; next }
        FILENAME == ARGV[2] { listB[FNR] = $1; next }
        {
            for (i = 1; i <= NF; i++) {
                if ($i in listA) {
                    $i = listB[listA[$i]]
                }
            }
            print
        }
    ' listA listB filename > filename.new
    mv filename.new filename

I'm assuming the strings in listA do not contain whitespace (awk's default field separator).


Make one call to sed that writes the sed script, and another to use it? If your lists are in files listA and listB, then:

    paste -d : listA listB | sed 's/\([^:]*\):\([^:]*\)/s%\1%\2%/' > sed.script
    sed -f sed.script files.to.be.mapped.*

I'm making some sweeping assumptions about 'words' not containing either colon or percent symbols, but you can adapt around that. Some versions of sed have upper bounds on the number of commands that can be specified; if that's a problem because your word lists are big enough, then you may have to split the generated sed script into separate files that are applied one after another, or switch to something without the limit (Perl, for example).
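If you do hit such a limit, one way to split the generated script is sketched below. The chunk size of 2 lines, the temporary directory, and the sample script and data are all made up for illustration; a real limit (if any) depends on your sed. Each part is applied to the output of the previous one, so the overall order of commands is preserved.

```shell
# Hypothetical sample: a three-command sed script and a one-line data file.
tmp=$(mktemp -d)
printf 's%%a%%1%%\ns%%b%%2%%\ns%%c%%3%%\n' > "$tmp/sed.script"
printf 'a b c\n' > "$tmp/data"

# Split the script into parts of at most 2 commands each (part.aa, part.ab, ...).
split -l 2 "$tmp/sed.script" "$tmp/part."

# Apply the parts in order; the glob expands in sorted order.
for part in "$tmp"/part.*; do
    sed -f "$part" "$tmp/data" > "$tmp/data.new" && mv "$tmp/data.new" "$tmp/data"
done

result=$(cat "$tmp/data")
printf '%s\n' "$result"    # 1 2 3
rm -r "$tmp"
```

This reads the data once per part, so it trades the single pass for staying under the limit.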

Another item to be aware of is the sequence of changes. If you want to swap two words, you need to craft your word lists carefully. In general, if you map (1) wordA to wordB and (2) wordB to wordC, it matters whether the sed script does mapping (1) before or after mapping (2).
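A small illustration of the ordering problem, using a made-up input line and a made-up swap of "cat" and "dog". The sed commands run one after another on the same line, so the second one undoes part of the first; a single-pass lookup (here in awk, one field at a time) does not have that problem. Note that the `g` flag is my addition so that every occurrence on the line is affected.

```shell
input='cat chases dog'

# Sequential sed commands: the second substitution also rewrites
# the "dog" that the first substitution just produced.
sequential=$(printf '%s\n' "$input" | sed -e 's%cat%dog%g' -e 's%dog%cat%g')

# Single pass in awk: each field is looked up exactly once in the map.
single_pass=$(printf '%s\n' "$input" | awk '
    BEGIN { map["cat"] = "dog"; map["dog"] = "cat" }
    { for (i = 1; i <= NF; i++) if ($i in map) $i = map[$i]; print }
')

printf 'sequential:  %s\n' "$sequential"    # cat chases cat
printf 'single pass: %s\n' "$single_pass"   # dog chases cat
```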

The script shown is not careful about word boundaries; you can make it careful about them in various ways, depending on the version of sed you are using and your criteria for what constitutes a word.
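As one possible way to do that, here is a sketch that assumes GNU sed, which accepts `\<` and `\>` as start-of-word and end-of-word anchors (other seds may spell these differently or not support them at all). The sample lists and input are hypothetical, and the `g` flag is my addition.

```shell
# Hypothetical one-entry word lists: map "cat" to "dog".
tmp=$(mktemp -d)
printf 'cat\n' > "$tmp/listA"
printf 'dog\n' > "$tmp/listB"

# Generate commands of the form s%\<cat\>%dog%g (GNU sed word anchors).
paste -d : "$tmp/listA" "$tmp/listB" |
    sed 's/\([^:]*\):\([^:]*\)/s%\\<\1\\>%\2%g/' > "$tmp/sed.script"

# Without anchors, "cat" is also replaced inside longer words.
unbounded=$(printf 'cat concat cats\n' | sed 's%cat%dog%g')
# With anchors, only the standalone word is replaced.
bounded=$(printf 'cat concat cats\n' | sed -f "$tmp/sed.script")

printf '%s\n' "$unbounded"    # dog condog dogs
printf '%s\n' "$bounded"      # dog concat cats
rm -r "$tmp"
```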


I needed to do something similar, and I wound up generating sed commands based on a map file:

    $ cat file.map
    abc => 123
    def => 456
    ghi => 789
    $ cat stuff.txt
    abc jdy kdt
    kdb def gbk
    qng pbf ghi
    non non non
    try one abc
    $ sed `cat file.map | awk '{print "-e s/"$1"/"$3"/"}'`<<<"`cat stuff.txt`"
    123 jdy kdt
    kdb 456 gbk
    qng pbf 789
    non non non
    try one 123

Make sure your shell supports as many parameters to sed as you have in your map.
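If the map is too large for the command line, a variation is to write the generated commands to a file and pass it with `sed -f`. The sketch below assumes the same `from => to` map format; the temporary directory, the `map.sed` file name, and the shortened sample data are made up for illustration, and the `g` flag is my addition.

```shell
# Hypothetical sample map and input, in a scratch directory.
tmp=$(mktemp -d)
printf 'abc => 123\ndef => 456\nghi => 789\n' > "$tmp/file.map"
printf 'abc jdy kdt\nkdb def gbk\n' > "$tmp/stuff.txt"

# One sed command per map line, written to a script file
# instead of being passed as -e arguments.
awk '{ print "s/" $1 "/" $3 "/g" }' "$tmp/file.map" > "$tmp/map.sed"

result=$(sed -f "$tmp/map.sed" "$tmp/stuff.txt")
printf '%s\n' "$result"
rm -r "$tmp"
```

As with the `-e` version, this assumes the map entries contain no slashes or regex metacharacters.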
