Find and replace html code for multiple files within multiple directories


You have three separate sub-problems:

  1. replacing text in a file
  2. coping with special characters
  3. selecting files to apply the transformation to

1. The canonical text replacement tool is sed:

sed -e 's/PATTERN/REPLACEMENT/g' <INPUT_FILE >OUTPUT_FILE

If you have GNU sed (e.g. on Linux or Cygwin), pass -i to transform the file in place. You can act on more than one file in the same command line.

sed -i -e 's/PATTERN/REPLACEMENT/g' FILE OTHER_FILE…

If your sed doesn't have the -i option, you need to write to a different file and move that into place afterwards. (This is what GNU sed does behind the scenes.)

sed -e 's/PATTERN/REPLACEMENT/g' <FILE >FILE.tmp
mv FILE.tmp FILE
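
As a minimal sketch of that temp-file approach, the two commands can be wrapped in a small function. The name replace_in_place and the sample file are hypothetical, chosen only for illustration, and mktemp is assumed to be available (it is not required by POSIX).

```shell
# Hypothetical helper: apply a sed script to one file without relying on -i.
# Usage: replace_in_place 'SED_SCRIPT' FILE
replace_in_place() {
    script=$1
    file=$2
    # Create the temporary file next to the target so that mv is a rename.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if sed -e "$script" <"$file" >"$tmp"; then
        mv "$tmp" "$file"
    else
        # Leave the original untouched if sed failed.
        rm -f "$tmp"
        return 1
    fi
}

# Demonstration on a throwaway file.
demo_dir=$(mktemp -d)
printf '%s\n' 'hello world' 'world peace' >"$demo_dir/sample.txt"
replace_in_place 's/world/planet/g' "$demo_dir/sample.txt"
cat "$demo_dir/sample.txt"
```

Note that the replaced file ends up with the permissions of the temporary file, so this sketch is not suitable where the file mode or ownership matters.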

2. To replace a literal string with a literal string, prefix every special character with a backslash. In sed patterns, the special characters are .\[^$* plus the separator of the s command (usually /). In sed replacement text, the special characters are \ and &, plus the separator and newlines. You can use sed itself to turn a string into a suitable pattern or replacement text.

pattern=$(printf %s "$string_to_replace" | sed -e 's![.\[^$*/]!\\&!g')
replacement=$(printf %s "$replacement_string" | sed -e 's![\&/]!\\&!g')
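
To see the escaping at work, here is a small sketch with made-up strings (they are not taken from the question) that contain most of the troublesome characters. It assumes / as the separator of the s command and therefore also escapes / in the replacement text.

```shell
# Illustrative strings only; both contain characters that are special to sed.
string_to_replace='price: $5.00 [a/b]*'
replacement_string='R&D <b>new</b> C:\dir'

# Backslash-escape the characters that are special in a pattern ...
pattern=$(printf %s "$string_to_replace" | sed -e 's![.\[^$*/]!\\&!g')
# ... and those that are special in replacement text (including the separator).
replacement=$(printf %s "$replacement_string" | sed -e 's![\&/]!\\&!g')

# Apply the substitution to one sample line.
result=$(printf '%s\n' 'Sale price: $5.00 [a/b]* today' |
    sed -e "s/$pattern/$replacement/g")
printf '%s\n' "$result"
```

The sample line should come out with the literal string swapped in, ampersand, slash and backslash included. Strings containing newlines are not handled by this sketch.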

3. To act on multiple files directly in one or more directories, use shell wildcards. Your requirements don't seem completely consistent; I think these are the patterns you're looking for, but be sure to review them.

/www/mysite/board/today/[rsh][0-9][0-9][0-9]/index.html
/www/mysite/person/[0-9]/[0-9]/[0-9]/card/2011/[rsh][0-9].html

This will match files like /www/mysite/board/today/r012/index.html and /www/mysite/person/4/5/6/card/2011/h7.html, but not /www/mysite/board/today/subdir/s012/index.html or /www/mysite/board/today/r1234/index.html.
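
A quick way to review the wildcards before running sed is to try them on a scratch copy of the layout. The sketch below builds a throwaway tree under a temporary directory (the root variable and the tree are illustrative stand-ins for /www/mysite) and lists what the two patterns match.

```shell
# Build a scratch tree that mimics the layout under /www/mysite.
root=$(mktemp -d)
mkdir -p "$root/board/today/r012" \
         "$root/board/today/r1234" \
         "$root/board/today/subdir/s012" \
         "$root/person/4/5/6/card/2011"
touch "$root/board/today/r012/index.html" \
      "$root/board/today/r1234/index.html" \
      "$root/board/today/subdir/s012/index.html" \
      "$root/person/4/5/6/card/2011/h7.html"

# Expand the two wildcard patterns relative to the scratch root.
matched=$(cd "$root" && printf '%s\n' \
    board/today/[rsh][0-9][0-9][0-9]/index.html \
    person/[0-9]/[0-9]/[0-9]/card/2011/[rsh][0-9].html)
printf '%s\n' "$matched"
```

Only the r012 and h7 files are listed; the r1234 directory and the file under subdir are skipped, as described above.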

If you need to act on files in subdirectories recursively, use find. It doesn't seem to be in your requirements and this answer is long enough already, so I'll stop here.
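
For reference, a minimal sketch of the recursive case, run on a scratch tree rather than the real site. The directory layout, the *.html name filter and the old/new substitution are all assumptions made for the demonstration; the inner loop uses the temp-file approach so that it does not depend on sed -i.

```shell
# Scratch tree with HTML files at two depths and one non-HTML file.
site=$(mktemp -d)
mkdir -p "$site/a/b"
printf '%s\n' '<p>old</p>' >"$site/index.html"
printf '%s\n' '<p>old</p>' >"$site/a/b/index.html"
printf '%s\n' 'old' >"$site/a/notes.txt"

# Visit every regular *.html file below $site, at any depth,
# and rewrite it through a temporary file.
find "$site" -type f -name '*.html' -exec sh -c '
    for f in "$@"; do
        sed -e "s/old/new/g" <"$f" >"$f.tmp" && mv "$f.tmp" "$f"
    done
' sh {} +

cat "$site/index.html" "$site/a/b/index.html" "$site/a/notes.txt"
```

With GNU sed, the inner loop can be replaced by -exec sed -i -e 's/old/new/g' {} +.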

4. Putting it all together:

string_to_replace='(div id="id")[code](/div)<--#include="(path)"-->(div id="id")[more code](/div)'
replacement_string='(div id="id")<--include="(path)"-->(/div)'
# Escape the characters that are special in a sed pattern or replacement.
pattern=$(printf %s "$string_to_replace" | sed -e 's![.\[^$*/]!\\&!g')
replacement=$(printf %s "$replacement_string" | sed -e 's![\&/]!\\&!g')
sed -i -e "s/$pattern/$replacement/g" \
  /www/mysite/board/today/[rsh][0-9][0-9][0-9]/index.html \
  /www/mysite/person/[0-9]/[0-9]/[0-9]/card/2011/[rsh][0-9].html

Final note: you seem to be working on HTML with regular expressions. That's often not a good idea.


Finding the files can easily be done using find -regex:

find www/mysite/board/today -regex ".*[rsh][0-9][0-9][0-9]/index\.html"
find www/mysite/person -regex ".*[0-9]/[0-9]/[0-9]/card/2011/[rsh][0-9][0-9][0-9]\.html"
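
Unlike a shell wildcard, -regex is matched against the whole path and .* also spans slashes, so files in nested directories are found too. A small sketch on a scratch tree shows the difference; it assumes a find that supports -regex (GNU and BSD find do, POSIX does not require it), and the tree itself is made up for the demonstration.

```shell
# Scratch tree: one direct match, one nested match, one non-match.
tree=$(mktemp -d)
mkdir -p "$tree/today/r012" "$tree/today/sub/s345" "$tree/today/r1234"
touch "$tree/today/r012/index.html" \
      "$tree/today/sub/s345/index.html" \
      "$tree/today/r1234/index.html"

# The leading .*/ anchors the directory name at a slash;
# the dot in index.html is escaped so that it matches literally.
found=$(cd "$tree" &&
    find today -regex '.*/[rsh][0-9][0-9][0-9]/index\.html' | sort)
printf '%s\n' "$found"
```

The file under today/sub/s345 is listed alongside today/r012, while today/r1234 is not, because its directory name has four digits.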

Due to the nature of HTML, replacing the content might not be very easy with sed, so I would suggest using an HTML or XML parsing library in a Perl script. Can you provide a short sample of an actual HTML file and the result of the replacements?