How to replace all XHTML/HTML line breaks (<br>) with new lines? [php]

How to replace all XHTML/HTML line breaks (<br>) with new lines?


I would generally say "don't use regex to work with HTML", but in this case I would probably go with a regex, since <br> tags generally look like either:

  • <br>
  • or <br/>, with any number of spaces before the /


Something like this should do the trick:

```php
$html = 'this <br>is<br/>some<br />text <br    />!';
$nl = preg_replace('#<br\s*/?>#i', "\n", $html);
echo $nl;
```

A couple of notes on the pattern:

  • it starts with <br
  • followed by any number of whitespace characters: \s*
  • optionally, a /: /?
  • and, finally, a >
  • the match is case-insensitive (#i), since <BR> is also valid in HTML
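To check the pattern against the variants described above, here is a small sketch that wraps it in a helper. The name br2nl is hypothetical (PHP ships nl2br, but no built-in inverse). Note that this pattern does not match a <br> that carries attributes, such as <br class="x">.

```php
<?php
// Hypothetical helper: replaces <br>, <br/>, <br /> (any spacing, any case)
// with "\n", using the same pattern as above.
function br2nl(string $html): string
{
    return preg_replace('#<br\s*/?>#i', "\n", $html);
}

echo br2nl('this <br>is<br/>some<br />text <br    />!'), "\n";
// Upper case also matches thanks to the i modifier.
echo br2nl('one<BR>two'), "\n";
// A tag with attributes is left untouched by this pattern.
echo br2nl('a<br class="x">b'), "\n";
```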


You should use the PHP_EOL constant, which holds the correct end-of-line sequence for the platform PHP is running on ("\r\n" on Windows, "\n" elsewhere).

In my opinion, using non-regexp functions whenever possible makes the code more readable.

```php
$newlineTags = array(
    '<br>',
    '<br/>',
    '<br />',
);
$html = str_replace($newlineTags, PHP_EOL, $html);
```

I am aware this solution has some flaws (for example, it is case-sensitive and only matches the three spellings listed), but I wanted to share it anyway.
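One flaw of the str_replace version is that it is case-sensitive, so <BR> is left alone. A sketch of one way to fix that while still avoiding regexes is str_ireplace, the case-insensitive counterpart of str_replace. It assumes the same three spellings are the only ones you need to handle; something like <br  /> with extra spaces is still missed.

```php
<?php
// The three spellings handled by the str_replace answer.
$newlineTags = array(
    '<br>',
    '<br/>',
    '<br />',
);

$html = 'one<BR>two<br/>three<Br />four';

// str_ireplace matches case-insensitively, so <BR> and <Br /> are replaced too.
// PHP_EOL is "\n" on Unix-like systems and "\r\n" on Windows.
$text = str_ireplace($newlineTags, PHP_EOL, $html);

echo $text;
```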


If the document is well-formed (or at least well-formed-ish), you can use the DOM extension and XPath to find all br elements and replace each one with a "\n" text node.

```php
$in = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">'
    . '<html><head><title>...</title></head><body>abc<br />def<p>ghi<br />jkl</p></body></html>';
$doc = new DOMDocument;
$doc->loadHTML($in);
$xpath = new DOMXPath($doc);

// First pass: collect all <br> nodes.
$toBeReplaced = array();
foreach ($xpath->query('//br') as $node) {
    $toBeReplaced[] = $node;
}

// Second pass: replace each <br> with its own copy of a "\n" text node.
$linebreak = $doc->createTextNode("\n");
foreach ($toBeReplaced as $node) {
    $node->parentNode->replaceChild($linebreak->cloneNode(), $node);
}
echo $doc->saveHTML();
```

This prints:

```
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head><title>...</title></head><body>abc
def<p>ghi
jkl</p></body></html>
```

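Edit: a shorter version with only one iteration (this works because the node list returned by DOMXPath::query() is not live, so nodes can be replaced while iterating over it):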

```php
$in = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">'
    . '<html><head><title>...</title></head><body>abc<br />def<p>ghi<br />jkl</p></body></html>';
$doc = new DOMDocument;
$doc->loadHTML($in);
$xpath = new DOMXPath($doc);
$linebreak = $doc->createTextNode("\n");
foreach ($xpath->query('//br') as $node) {
    // Replace (not just remove) each <br> with a copy of the "\n" text node.
    $node->parentNode->replaceChild($linebreak->cloneNode(), $node);
}
echo $doc->saveHTML();
```
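The single-pass DOM approach can be wrapped in a reusable function that replaces each br element with a "\n" text node. This is a sketch: the name replaceBrWithNewlines is hypothetical, and the libxml_use_internal_errors(true) call is an added assumption so that parser warnings on sloppy markup are collected instead of printed.

```php
<?php
// Hypothetical helper: parses an HTML document, replaces every <br>
// element with a "\n" text node, and serialises the result.
function replaceBrWithNewlines(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // The list returned by query() is a snapshot, so replacing nodes
    // while iterating over it does not skip any <br>.
    foreach ($xpath->query('//br') as $node) {
        $node->parentNode->replaceChild($doc->createTextNode("\n"), $node);
    }

    return $doc->saveHTML();
}

$in = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">'
    . '<html><head><title>...</title></head>'
    . '<body>abc<br />def<p>ghi<br />jkl</p></body></html>';

echo replaceBrWithNewlines($in);
```

Creating a fresh text node per br is equivalent to cloning one prepared node; either way each replacement needs its own node, because a DOM node can only sit in one place in the tree.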