preg_match() + regex does not work in TXT file


Copying and pasting the output text manually appears to have changed its contents. Based on the pastebin output, the direct-to-file version contains non-breaking space characters rather than regular spaces. A non-breaking space is code point U+00A0 (hex 0xA0, decimal 160), whereas a regular space is hex 0x20, decimal 32 in ASCII.

In fact, it looks as though all the space characters in the direct to file example are non-breaking 0xA0 spaces.

To adapt your regular expression to accept either type of space, place the hex escape \xA0 in a [] character class together with the regular space character ' ', as in [ \xA0]. You will also need the /u flag so that the pattern and the subject are treated as UTF-8 and \xA0 matches the code point U+00A0 rather than a single byte.

    $regex = [
        'mora_dia' => '/R\$[ \xA0][0-9]{1,}\.[0-9]{1,}/iu',
        'multa'    => '/R\$[ \xA0][0-9]{1,},[0-9]{1,}/iu'
    ];

(Note: the comma does not require backslash-escaping.)

This works correctly, using your raw pastebin as input:

    $str = file_get_contents('http://pastebin.com/raw.php?i=H7D5xJBH');
    preg_match('/R\$[ \xa0][0-9]{1,}\.[0-9]{1,}/ui', $str, $matches);
    var_dump($matches);
    // Prints:
    // array(1) {
    //   [0] =>
    //   string(8) "R$ 3.44"
    // }

A different solution might be to replace the non-breaking spaces with regular spaces in the entire text before applying your original regular expression:

    // Replace all non-breaking spaces with regular spaces in the
    // text string read from the file...
    // In UTF-8 the non-breaking space (U+00A0) is encoded as the two
    // bytes 0xC2 0xA0, and both are needed to replace it successfully.
    $dataTxt = str_replace("\xC2\xA0", " ", $dataTxt);
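As a rough sketch of this normalize-then-match approach, the snippet below assumes the text is UTF-8 (so the non-breaking space is the byte pair 0xC2 0xA0). The helper name `normalizeSpaces`, the sample line, and the plain-space pattern standing in for your original regular expression are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: replace UTF-8 non-breaking spaces with regular spaces.
function normalizeSpaces(string $text): string
{
    return str_replace("\xC2\xA0", ' ', $text);
}

// Sample line assumed for illustration; "\xC2\xA0" is the UTF-8 NBSP.
$dataTxt = 'Mora/dia: R$' . "\xC2\xA0" . '3.44';

// Assumed stand-in for the original pattern: regular space only, no /u flag.
$pattern = '/R\$ [0-9]{1,}\.[0-9]{1,}/i';

// The raw text does not match because of the non-breaking space.
var_dump(preg_match($pattern, $dataTxt));                      // int(0)

// After normalization the same pattern matches.
var_dump(preg_match($pattern, normalizeSpaces($dataTxt), $m)); // int(1)
var_dump($m[0]);                                               // string(7) "R$ 3.44"
```

Normalizing once up front keeps every later pattern simple, at the cost of losing the distinction between the two kinds of space in the stored text.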

Whenever you have input that you expect to be identical, and that looks identical, be sure to inspect it with a tool capable of displaying each character's hex code. In this case, I copied your samples from pastebin into files and inspected them with Vim, where I have set up hex and ASCII display for the character under the cursor.
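If an editor with hex display is not at hand, the same check can be approximated in PHP itself. This is only a minimal sketch: `hexDump` is a hypothetical helper name, and the two sample strings are assumed stand-ins for the copied and direct-to-file versions, with the second one assumed to be UTF-8.

```php
<?php
// Hypothetical helper: render each byte of a string as two hex digits.
function hexDump(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// Assumed samples: one with a regular space, one with a UTF-8 NBSP.
$regular = 'R$ 3.44';
$nbsp    = 'R$' . "\xC2\xA0" . '3.44';

echo hexDump($regular), "\n"; // 52 24 20 33 2e 34 34
echo hexDump($nbsp), "\n";    // 52 24 c2 a0 33 2e 34 34
```

The two strings look the same when printed, but the dump shows 20 in one and c2 a0 in the other.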


    $PDFParse = '';
    foreach ($pages as $page) {
        $PDFParse = $PDFParse . $page->getText();
    }

If $PDFParse is a string, try calling fflush($file) after fwrite().
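A minimal sketch of that suggestion follows. The placeholder text, the temporary file path, and the read-back check are assumptions added for illustration; in your script `$PDFParse` would hold the concatenated `getText()` output and `$file` would be your own handle.

```php
<?php
// Placeholder text standing in for the concatenated $page->getText() output.
$PDFParse = 'R$ 3.44' . "\n";

// Temporary file used only for this illustration.
$path = tempnam(sys_get_temp_dir(), 'pdf');
$file = fopen($path, 'w');

if (is_string($PDFParse)) {
    fwrite($file, $PDFParse);
    fflush($file); // push buffered output to the file before reading it back
}

// Read the file back while the handle is still open.
$written = file_get_contents($path);

fclose($file);
unlink($path);

var_dump($written === $PDFParse);
```

Flushing only matters if something reads the file before the handle is closed, since fclose() also writes out any pending data.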