Find and replace html code for multiple files within multiple directories
You have three separate sub-problems:
- replacing text in a file
- coping with special characters
- selecting files to apply the transformation to
1. The canonical text replacement tool is sed:
sed -e 's/PATTERN/REPLACEMENT/g' <INPUT_FILE >OUTPUT_FILE
If you have GNU sed (e.g. on Linux or Cygwin), pass -i to transform the file in place. You can act on more than one file in the same command line.
sed -i -e 's/PATTERN/REPLACEMENT/g' FILE OTHER_FILE…
If your sed doesn't have the -i option, you need to write to a different file and move that into place afterwards. (This is what GNU sed does behind the scenes.)
sed -e 's/PATTERN/REPLACEMENT/g' <FILE >FILE.tmp
mv FILE.tmp FILE
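As a minimal sketch of that fallback, the snippet below runs the two steps on a throwaway file. The directory and the file name page.html are made up for the demonstration; the && is a small addition so that the original is only replaced if sed succeeded.

```shell
# Throwaway working area; page.html is a hypothetical file name.
workdir=$(mktemp -d)
file="$workdir/page.html"
printf 'PATTERN here\n' > "$file"

# Write the transformed text to a temporary file next to the original,
# then move it into place only if sed exited successfully.
sed -e 's/PATTERN/REPLACEMENT/g' <"$file" >"$file.tmp" && mv "$file.tmp" "$file"

content=$(cat "$file")
printf '%s\n' "$content"
rm -rf "$workdir"
```

Keeping the temporary file in the same directory as the original means the mv is a rename on the same filesystem rather than a copy.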
2. If you want to replace a literal string by a literal string, you need to prefix all special characters by a backslash. For sed patterns, the special characters are .\[^$* plus the separator for the s command (usually /). For sed replacement text, the special characters are \& and newlines, plus the same separator. You can use sed to turn a string into a suitable pattern or replacement text.
pattern=$(printf %s "$string_to_replace" | sed -e 's![.\[^$*/]!\\&!g')
replacement=$(printf %s "$replacement_string" | sed -e 's![\&/]!\\&!g')
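To see the escaping at work, here is a small sketch on made-up strings (the link text and paths are purely illustrative). It assumes / as the separator of the s command, so it escapes / in the replacement as well as in the pattern, since the sample replacement contains slashes.

```shell
# Literal strings containing characters that are special to sed
# (illustrative values, not taken from any real site).
string_to_replace='<a href="/old/path">$5.00 [sale]</a>'
replacement_string='<a href="/new/path">A&B</a>'

# Escape pattern metacharacters and the / separator.
pattern=$(printf %s "$string_to_replace" | sed -e 's![.\[^$*/]!\\&!g')
# Escape \ and & in the replacement, plus the / separator.
replacement=$(printf %s "$replacement_string" | sed -e 's![\&/]!\\&!g')

# Apply the substitution to one sample line.
result=$(printf '%s\n' 'x <a href="/old/path">$5.00 [sale]</a> y' |
  sed -e "s/$pattern/$replacement/g")
printf '%s\n' "$result"
```

Neither string contains a newline here; multi-line strings would need extra handling that this sketch does not attempt.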
3. To act on multiple files directly in one or more directories, use shell wildcards. Your requirements don't seem completely consistent; I think these are the patterns you're looking for, but be sure to review them.
/www/mysite/board/today/[rsh][0-9][0-9][0-9]/index.html
/www/mysite/person/[0-9]/[0-9]/[0-9]/card/2011/[rsh][0-9].html
This will match files like /www/mysite/board/today/r012/index.html and /www/mysite/person/4/5/6/card/2011/h7.html, but not /www/mysite/board/today/subdir/s012/index.html or /www/mysite/board/today/r1234/index.html.
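One way to review the patterns before running sed is to let the shell expand them and print the result. The sketch below does that against a throwaway tree built with mktemp; the tree only mimics the layout described above and stands in for /www/mysite.

```shell
# Build a throwaway tree that mimics the layout (stand-in for /www/mysite).
root=$(mktemp -d)
mkdir -p "$root/board/today/r012" "$root/board/today/r1234" \
         "$root/board/today/subdir/s012" "$root/person/4/5/6/card/2011"
touch "$root/board/today/r012/index.html" \
      "$root/board/today/r1234/index.html" \
      "$root/board/today/subdir/s012/index.html" \
      "$root/person/4/5/6/card/2011/h7.html"

# Expand the wildcards without touching any file; one path per line.
matches=$(printf '%s\n' \
  "$root"/board/today/[rsh][0-9][0-9][0-9]/index.html \
  "$root"/person/[0-9]/[0-9]/[0-9]/card/2011/[rsh][0-9].html)
printf '%s\n' "$matches"
rm -rf "$root"
```

Only the r012 and h7 files are listed; the r1234 and subdir/s012 files are skipped. On the real tree, the same printf with the /www/mysite patterns gives a dry run of what sed would be handed.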
If you need to act on files in subdirectories recursively, use find. It doesn't seem to be in your requirements and this answer is long enough already, so I'll stop here.
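For completeness, a rough sketch of the recursive variant follows. It assumes GNU sed for -i, and the tree, the file names and the old/new substitution are invented for the demonstration.

```shell
# Throwaway tree with files at two depths (hypothetical names).
root=$(mktemp -d)
mkdir -p "$root/a/b"
printf 'old text\n' > "$root/a/index.html"
printf 'old text\n' > "$root/a/b/index.html"
printf 'old text\n' > "$root/a/b/notes.txt"

# Recurse from $root, keep regular files named index.html, and pass them
# to sed in batches. The -i option assumes GNU sed.
find "$root" -type f -name 'index.html' -exec sed -i -e 's/old/new/g' {} +

deep=$(cat "$root/a/b/index.html")
untouched=$(cat "$root/a/b/notes.txt")
printf '%s\n%s\n' "$deep" "$untouched"
rm -rf "$root"
```

The {} + form hands many file names to a single sed invocation, which matches the earlier point that sed -i can act on several files in one command line.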
4. Putting it all together:
string_to_replace='(div id="id")[code](/div)<--#include="(path)"-->(div id="id")[more code](/div)'
replacement_string='(div id="id")<--include="(path)"-->(/div)'
# Escape pattern metacharacters and the / separator.
pattern=$(printf %s "$string_to_replace" | sed -e 's![.\[^$*/]!\\&!g')
# Escape \ and & in the replacement, plus the / separator.
replacement=$(printf %s "$replacement_string" | sed -e 's![\&/]!\\&!g')
sed -i -e "s/$pattern/$replacement/g" \
  /www/mysite/board/today/[rsh][0-9][0-9][0-9]/index.html \
  /www/mysite/person/[0-9]/[0-9]/[0-9]/card/2011/[rsh][0-9].html
Final note: you seem to be working on HTML with regular expressions. That's often not a good idea.
Finding the files can easily be done using find -regex, which matches the expression against the whole path (hence the leading .*):
find www/mysite/board/today -regex ".*[rsh][0-9][0-9][0-9]/index\.html"
find www/mysite/person -regex ".*[0-9]/[0-9]/[0-9]/card/2011/[rsh][0-9][0-9][0-9]\.html"
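A quick way to check such an expression is to run it against a small test tree. The sketch below uses a throwaway directory in place of www/mysite and assumes a find that supports -regex (GNU and BSD find do; it is not in POSIX). It also puts a / before the [rsh] class, a slightly stricter variant, so that the directory name has to begin with r, s or h.

```shell
# Throwaway tree standing in for www/mysite/board/today.
root=$(mktemp -d)
mkdir -p "$root/board/today/r012" "$root/board/today/r1234"
touch "$root/board/today/r012/index.html" "$root/board/today/r1234/index.html"

# -regex must match the entire path, so the expression starts with .*
# and the dot in index.html is escaped to match a literal dot only.
found=$(find "$root/board/today" -regex '.*/[rsh][0-9][0-9][0-9]/index\.html')
printf '%s\n' "$found"
rm -rf "$root"
```

Only the r012 file is printed; r1234 has one digit too many for the three [0-9] classes.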
Due to the nature of HTML, replacing the content might not be very easy with sed, so I would suggest using an HTML or XML parsing library in a Perl script. Can you provide a short sample of an actual HTML file and the result of the replacements?