Shell script - search and replace text in multiple files using a list of strings


I'd convert your changesDictionary.txt file to a sed script, using sed itself:

$ sed -e 's/^"\(.*\)" = "\(.*\)"$/s\/\1\/\2\/g/' \
      changesDictionary.txt > changesDictionary.sed

Note that any characters in your dictionary that are special in regular expressions or in sed expressions (such as ., *, / or &) will be interpreted by sed rather than matched literally. So either your dictionary can contain only the simplest search-and-replace pairs, or you'll need to maintain the sed file by hand with valid expressions. Unfortunately, sed has no easy way to turn off regular expressions and use plain string matching, or to quote your searches and replacements as "literals".
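
If you'd rather generate the escaping than maintain it by hand, something along these lines might work. It's only a sketch: the function names escape_search, escape_replacement and make_sed_script are made up for illustration, and it assumes the dictionary uses exactly the "search" = "replace" format, that sed is using basic regular expressions, and that / is the delimiter.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the original answer).

# Escape characters that are special in a basic regular expression,
# plus the "/" delimiter used by the generated s commands.
escape_search() {
    printf '%s\n' "$1" | sed -e 's|[][\.*^$/]|\\&|g'
}

# Escape characters that are special in the replacement part: \ & and /.
escape_replacement() {
    printf '%s\n' "$1" | sed -e 's|[\&/]|\\&|g'
}

# Read dictionary lines of the form "search" = "replace" on stdin and
# print one literal-matching s/.../.../g command per line.
make_sed_script() {
    while IFS= read -r line; do
        search=$(printf '%s\n' "$line" | sed -n 's/^"\(.*\)" = "\(.*\)"$/\1/p')
        replace=$(printf '%s\n' "$line" | sed -n 's/^"\(.*\)" = "\(.*\)"$/\2/p')
        # Skip lines that do not match the expected format.
        [ -n "$search" ] || continue
        printf 's/%s/%s/g\n' \
            "$(escape_search "$search")" "$(escape_replacement "$replace")"
    done
}

# Example: a search string containing regex metacharacters.
printf '%s\n' '"a.b*c" = "x&y/z"' | make_sed_script
```

You would then run it as make_sed_script < changesDictionary.txt > changesDictionary.sed. Searches or replacements that themselves contain the sequence `" = "` would still be split in the wrong place.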

With the resulting sed script, use find and xargs (rather than find -exec ... \;) to convert your files as quickly as possible: xargs hands many files to each sed invocation instead of starting one sed process per file.

$ find somedir -type f -print0 \
    | xargs -0 sed -i -f changesDictionary.sed

Note that the -i option of sed edits files "in place", so be sure to make backups first, or use -i~ to have sed keep a tilde-suffixed backup of each file.
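
To make the backup advice concrete, here is a small sketch run against a throwaway directory. It assumes a sed that accepts a suffix attached directly to -i (GNU sed does); the directory, the file names and the one-line sed script are all made up for the demonstration.

```shell
#!/bin/sh
# Demonstration in a scratch directory; all names here are illustrative.
workdir=$(mktemp -d)
printf 'fix me\n' > "$workdir/target.txt"
printf 's/fix/broken/g\n' > "$workdir/changes.sed"

# Edit in place, keeping the original as target.txt~
find "$workdir" -type f -name '*.txt' -print0 \
    | xargs -0 sed -i~ -f "$workdir/changes.sed"

# Review what changed; diff exits non-zero when the files differ.
diff "$workdir/target.txt~" "$workdir/target.txt" || true

# To undo, move the backup over the edited file:
#   mv "$workdir/target.txt~" "$workdir/target.txt"
```

Note that the find pattern only matches *.txt, so a second run would not touch the ~ backups, but it would overwrite them with the already-edited files.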

A final note: search-and-replace can have unintended consequences. Do you have searches that are substrings of other searches? Here's an example.

$ cat changesDictionary.txt
"fix" = "broken"
"fixThat" = "Fixed"
$ sed -e 's/^"\(.*\)" = "\(.*\)"$/s\/\1\/\2\/g/' changesDictionary.txt \
    | tee changesDictionary.sed
s/fix/broken/g
s/fixThat/Fixed/g
$ mkdir subdir
$ echo fixThat > subdir/target.txt
$ find subdir -type f -name '*.txt' -print0 \
    | xargs -0 sed -i -f changesDictionary.sed
$ cat subdir/target.txt
brokenThat

Should "fixThat" have become "Fixed" or "brokenThat"? Order matters in a sed script. Similarly, a replacement can itself be replaced again: text changed from "a" to "b" may later be changed from "b" to "c" by another rule.
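
One way to deal with the substring case is to sort the dictionary so that longer searches come first before turning it into a sed script. This is a sketch, assuming the dictionary format shown above; sort_longest_first is a made-up name, and it does nothing about the second problem (a replacement being replaced again by a later rule).

```shell
#!/bin/sh
# Sort dictionary entries so that longer search strings come first.
# Assumes every line has the form "search" = "replace".
sort_longest_first() {
    awk '{
        # Position of the separator between the two quoted strings.
        sep = index($0, "\" = \"")
        # Prefix each line with the length of its search string.
        printf "%d\t%s\n", sep - 2, $0
    }' | sort -rn | cut -f2-
}

printf '%s\n' '"fix" = "broken"' '"fixThat" = "Fixed"' \
    | sort_longest_first \
    | sed -e 's/^"\(.*\)" = "\(.*\)"$/s\/\1\/\2\/g/'
```

With the rules in this order, "fixThat" is rewritten to "Fixed" before the shorter "fix" rule gets a chance to match.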

Perhaps you've already considered both of these, but I mention them because I've tried what you're doing before and didn't think of them at the time. I don't know of anything that simply does the right thing when doing multiple search-and-replacements at once, so you need to program it to do the right thing yourself.
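
If you do end up programming it yourself, one possible approach is to apply all the rules in a single pass, so that text produced by one replacement is never scanned again and the longest search wins at each position. Here is a rough sketch using awk from the shell. The function name replace_all_at_once is made up, it treats both sides as literal strings, and it assumes the same "search" = "replace" dictionary format. It is not fast, since it checks every rule at every character.

```shell
#!/bin/sh
# Hypothetical helper: apply all dictionary entries in a single pass.
# usage: replace_all_at_once DICTIONARY < input > output
replace_all_at_once() {
    awk -v dict="$1" '
    BEGIN {
        # Load "search" = "replace" pairs as literal strings.
        while ((getline line < dict) > 0) {
            sep = index(line, "\" = \"")
            if (sep < 3) continue
            key = substr(line, 2, sep - 2)
            val = substr(line, sep + 5, length(line) - sep - 5)
            map[key] = val
        }
        close(dict)
    }
    {
        out = ""
        rest = $0
        while (length(rest) > 0) {
            best = ""
            # Pick the longest search string that matches at this position.
            for (key in map)
                if (length(key) > length(best) && substr(rest, 1, length(key)) == key)
                    best = key
            if (best != "") {
                # Emit the replacement and skip past the matched text.
                out = out map[best]
                rest = substr(rest, length(best) + 1)
            } else {
                # No rule matches here: copy one character unchanged.
                out = out substr(rest, 1, 1)
                rest = substr(rest, 2)
            }
        }
        print out
    }'
}

dict=$(mktemp)
printf '%s\n' '"a" = "b"' '"b" = "c"' '"fix" = "broken"' '"fixThat" = "Fixed"' > "$dict"
printf '%s\n' 'a b fixThat fix' | replace_all_at_once "$dict"
rm -f "$dict"
```

Here "a" becomes "b" and stays "b", because the output is not fed back through the "b" to "c" rule.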


Here are the basic steps I would follow:

  1. Copy the changesDictionary.txt file
  2. In the copy, turn each "a"="b" entry into the equivalent sed line, e.g. (using $1 for the file name):

    sed -e 's/a/b/g' "$1"

    (You could write a script to do this, or just do it by hand if you only need to do it once and the file isn't too big.)

  3. If the files are all in one directory, then you can do something like:

    for f in *.txt; do ./scriptFromStep2.sh "$f"; done

  4. If they are in subdirs, use a find to call that script on all of the files, something like

    find . -type f -name '*.txt' -exec ./scriptFromStep2.sh {} \;

These aren't exact, so do some experiments to make sure you get it right. It's just the approach I would use.
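
For what it's worth, here is roughly how the steps above could fit together. It's a sketch under a few assumptions: the script from step 2 is written as a function (apply_changes, a made-up name) so the whole thing can be tried in one file, the two -e expressions stand in for whatever your dictionary generates, and the result is written back over the original file, which the bare sed line in step 2 doesn't do by itself. File names containing newlines would confuse the loop.

```shell
#!/bin/sh
# Sketch of scriptFromStep2.sh as a function; in practice the body
# would live in its own script and take the file name as $1.
apply_changes() {
    file=$1
    tmp=$(mktemp) || return 1
    # Run all substitutions in one sed call into a temporary file,
    # then overwrite the original only if sed succeeded.
    if sed -e 's/fixThat/Fixed/g' \
           -e 's/fix/broken/g' "$file" > "$tmp"; then
        cat "$tmp" > "$file"   # keeps the original file's permissions
    fi
    rm -f "$tmp"
}

# Scratch files for the demonstration (names are illustrative).
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'fixThat\n' > "$demo/one.txt"
printf 'fix\n' > "$demo/sub/two.txt"

# Step 4: visit every .txt file, including those in subdirectories.
find "$demo" -type f -name '*.txt' | while IFS= read -r f; do
    apply_changes "$f"
done

cat "$demo/one.txt" "$demo/sub/two.txt"
```

Passing several -e expressions to one sed call means each file is read once, rather than once per dictionary entry.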

(But if you can, just use Perl. It would be a lot simpler.)


Use this tool, which is written in Perl and has quite a lot of bells and whistles. It's an oldie, but a goodie:

http://unixgods.org/~tilo/replace_string/

Features:

  • do multiple search-replace or query-search-replace operations
  • search-replace expressions can be given on the command line or read from a file
  • processes multiple input files
  • recursively descend into directories and do multiple search/replace operations on all files
  • user-defined Perl expressions are applied to each line of each input file
  • optionally run in paragraph mode (for multi-line search/replace)
  • interactive mode
  • batch mode
  • optionally backup files and backup numbering
  • preserve modes/owner when run as root
  • ignore symbolic links, empty files, write-protected files, sockets, named pipes, and directory names
  • optionally replace lines only matching / not matching a given regular expression

This script has been used quite extensively over the years with large data sets.