RegEX (preg_match_all) to retrieve Authenticity Token from Hidden Login Form


I'd avoid a regex here and instead extract the token from the DOM using PHP's built-in DOMDocument class:

$doc = new DOMDocument();
$doc->loadHTML($html);  // $html: the page markup as a string
$xpath = new DOMXPath($doc);
// Inputs whose name contains "authenticity_token", directly inside the login form
$query = '//form[contains(@class, "user_session_form")]/input[contains(@name, "authenticity_token")]';
$inputs = $xpath->query($query);
foreach ($inputs as $input) {
    echo $input->getAttribute('value');
}

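The $query variable is an XPath query: it selects every input whose name contains "authenticity_token" and that is a direct child of a form whose class contains "user_session_form".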
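As a self-contained variant of the DOM approach, here is a minimal sketch wrapped in a function. The helper name find_authenticity_token and the sample markup are assumptions for illustration, and the query is loosened to match the input anywhere in the document, since some pages nest the hidden field inside a wrapper element. It also silences libxml warnings, which real-world HTML tends to trigger.

```php
<?php
// Hypothetical helper: returns the token value, or null when no matching input exists.
function find_authenticity_token(string $html): ?string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: any hidden input named exactly "authenticity_token" is the one we want.
    $inputs = $xpath->query('//input[@type="hidden"][@name="authenticity_token"]');
    if ($inputs === false || $inputs->length === 0) {
        return null;
    }

    return $inputs->item(0)->getAttribute('value');
}

// Sample markup (made up for this example).
$sample = '<form class="user_session_form"><div>'
        . '<input type="hidden" name="authenticity_token" value="abc123">'
        . '</div></form>';
var_dump(find_authenticity_token($sample));
```

Returning null rather than an empty string lets the caller distinguish "no token field found" from "token field with an empty value".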


You could use this regex to get the authenticity token.
The value comes out in capture group 4.

The order of the attributes doesn't matter: the lookaheads find type, name,
and value anywhere inside a valid input tag.

(?s)<input(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\stype\s*=\s*(?:(['"])\s*hidden\s*\1))(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*(?:(['"])\s*authenticity_token\s*\2))(?=(?:[^>"']|"[^"]*"|'[^']*')*?\svalue\s*=\s*(?:(['"])\s*(.*?)\s*\3))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>

https://regex101.com/r/NCjFxc/1

Quoting

Single-quoted PHP string, tilde as regex delimiter:
'~(?s)<input(?=\s)(?=(?:[^>"\']|"[^"]*"|\'[^\']*\')*?\stype\s*=\s*(?:([\'"])\s*hidden\s*\1))(?=(?:[^>"\']|"[^"]*"|\'[^\']*\')*?\sname\s*=\s*(?:([\'"])\s*authenticity_token\s*\2))(?=(?:[^>"\']|"[^"]*"|\'[^\']*\')*?\svalue\s*=\s*(?:([\'"])\s*(.*?)\s*\3))\s+(?:"[\S\s]*?"|\'[\S\s]*?\'|[^>]*?)+>~'

Double-quoted PHP string, tilde as regex delimiter:
"~(?s)<input(?=\\s)(?=(?:[^>\"']|\"[^\"]*\"|'[^']*')*?\\stype\\s*=\\s*(?:(['\"])\\s*hidden\\s*\\1))(?=(?:[^>\"']|\"[^\"]*\"|'[^']*')*?\\sname\\s*=\\s*(?:(['\"])\\s*authenticity_token\\s*\\2))(?=(?:[^>\"']|\"[^\"]*\"|'[^']*')*?\\svalue\\s*=\\s*(?:(['\"])\\s*(.*?)\\s*\\3))\\s+(?:\"[\\S\\s]*?\"|'[\\S\\s]*?'|[^>]*?)+>~"

Readable version

 (?s)
 # Begin Input tag
 < input                # input tag
 (?= \s )
 (?=                    # Type Hidden (a pseudo atomic group)
      (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
      \s type \s* = \s*      # Type
      (?:
           ( ['"] )               # (1), Quote
           \s* hidden \s*         # Hidden
           \1
      )
 )
 (?=                    # Name authenticity_token
      (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
      \s name \s* = \s*      # Name
      (?:
           ( ['"] )               # (2), Quote
           \s* authenticity_token \s*   # "Authenticity Token"
           \2
      )
 )
 (?=                    # Value of authenticity_token
      (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
      \s value \s* = \s*     # Value
      (?:
           ( ['"] )               # (3), Quote
           \s*
           ( .*? )                # (4), Authenticity Token Value
           \s*
           \3
      )
 )
 # Have the Authenticity Token, just match the rest of tag
 \s+
 (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
 >                      # End tag
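To show how this answer's pattern might be used from PHP, here is a minimal sketch that passes the single-quoted, tilde-delimited form to preg_match_all and reads capture group 4. The function name extract_tokens_with_regex and the sample markup are assumptions for illustration; the two sample inputs deliberately use different attribute orders and quote styles.

```php
<?php
// Hypothetical helper: returns every authenticity_token value found in $html.
function extract_tokens_with_regex(string $html): array
{
    // The single-quoted, tilde-delimited pattern from this answer.
    $pattern = '~(?s)<input(?=\s)(?=(?:[^>"\']|"[^"]*"|\'[^\']*\')*?\stype\s*=\s*(?:([\'"])\s*hidden\s*\1))(?=(?:[^>"\']|"[^"]*"|\'[^\']*\')*?\sname\s*=\s*(?:([\'"])\s*authenticity_token\s*\2))(?=(?:[^>"\']|"[^"]*"|\'[^\']*\')*?\svalue\s*=\s*(?:([\'"])\s*(.*?)\s*\3))\s+(?:"[\S\s]*?"|\'[\S\s]*?\'|[^>]*?)+>~';

    // Default PREG_PATTERN_ORDER: $matches[4] lists what group 4 captured in each match.
    preg_match_all($pattern, $html, $matches);

    return $matches[4] ?? [];
}

// Sample markup (made up for this example): two tags, different attribute order and quoting.
$sample = '<input type="hidden" name="authenticity_token" value="abc123" />'
        . "<input value='def456' name=\"authenticity_token\" type=\"hidden\">";
print_r(extract_tokens_with_regex($sample));
```

Groups 1 to 3 only hold the quote characters, so the caller normally needs nothing but index 4 of the result.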


You're trying to match

<input type="hidden" name="authenticity_token" value="{$token}"/>

Your pattern is:

"/<input type=\"hidden\" value=(.*?)\" name=\"authenticity_token\">/i"

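Do you see it? The pattern expects value before name, omits the opening quote before the capture group (so the quote would end up in the capture), and omits the "/" before the closing ">".

It should be: "/<input type=\"hidden\" name=\"authenticity_token\" value=\"([^\"]+)\"\/>/i"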
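A quick way to check the difference is to run both patterns against the tag being matched. This is a minimal sketch; the token value abc123 is a made-up placeholder, and the corrected pattern is written in a single-quoted string so the double quotes need no escaping.

```php
<?php
// The tag we are trying to match, with a placeholder token value.
$html = '<input type="hidden" name="authenticity_token" value="abc123"/>';

// Pattern from the question: wrong attribute order, no "/" before ">".
$original = "/<input type=\"hidden\" value=(.*?)\" name=\"authenticity_token\">/i";

// Fixed-order pattern that mirrors the tag exactly.
$corrected = '/<input type="hidden" name="authenticity_token" value="([^"]+)"\/>/i';

$originalHits  = preg_match($original, $html);        // number of matches (0 or 1)
$correctedHits = preg_match($corrected, $html, $m);
$token = $m[1] ?? null;                                // group 1 holds the token

var_dump($originalHits);  // int(0)
var_dump($token);         // string(6) "abc123"
```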

Edit: If being able to match without being constrained to a particular attribute order is important:

<input (?:(?:type=\"hidden\"|name=\"authenticity_token\"|value=\"([^"]+)\"|(?!(?:name|type|value))[^=]+=\"[^"]+\")\s*)+

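This consumes, without capturing, any attribute (and its value) that is not named type, name, or value. The type and name attributes are only accepted as type="hidden" and name="authenticity_token", and if the value attribute is encountered, its value is captured in capture group 1. Note that the alternation does not force type and name to be present, so check that the match really is the input you expect.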
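Here is a minimal sketch of the order-independent expression in use, wrapped in ~ delimiters. The helper name match_token_any_order and the sample tags are assumptions for illustration. The last call shows a limitation worth knowing: because every alternative is optional, a tag that only carries a value attribute still matches.

```php
<?php
// Hypothetical helper: returns the captured value, or null when the pattern does not match.
function match_token_any_order(string $html): ?string
{
    $pattern = '~<input (?:(?:type="hidden"|name="authenticity_token"|value="([^"]+)"|(?!(?:name|type|value))[^=]+="[^"]+")\s*)+~';

    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }

    // Group 1 is unset when no value attribute was consumed.
    return $m[1] ?? null;
}

// Attributes in a different order, plus an unrelated id attribute.
var_dump(match_token_any_order('<input name="authenticity_token" id="tok" value="abc123" type="hidden">'));

// A type other than "hidden" stops the match.
var_dump(match_token_any_order('<input type="text" name="q" value="search">'));

// Limitation: type and name are not actually required.
var_dump(match_token_any_order('<input value="x">'));
```

If that looseness matters, one option is to verify afterwards that the matched text contains both type="hidden" and name="authenticity_token".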

Edit 2: preg_match(), preg_replace(), etc. require delimiters at the beginning and end of the pattern: http://php.net/manual/en/regexp.reference.delimiters.php

So you would simply encapsulate the expression like so: "/<expression>/" or "~<expression>~" where <expression> is your regex.
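To illustrate the delimiter point, here is a minimal sketch that runs the same expression bare, wrapped in slashes, and wrapped in tildes. The expression and sample tag are shortened, made-up examples. With "/" as the delimiter, any literal slash inside the expression has to be escaped; with "~" it does not.

```php
<?php
$html = '<input type="hidden" name="authenticity_token" value="abc123"/>';

// A bare expression with no delimiters (shortened for the example).
$expression = 'name="authenticity_token" value="([^"]+)"/>';

// Without delimiters PHP rejects the pattern: preg_match() returns false
// (the warning it raises is suppressed here with @).
$bare = @preg_match($expression, $html);

// "/" as delimiter: the literal "/" in the expression must be escaped.
$withSlash = '/' . str_replace('/', '\/', $expression) . '/';
preg_match($withSlash, $html, $a);

// "~" as delimiter: no escaping needed, since the expression contains no "~".
$withTilde = '~' . $expression . '~';
preg_match($withTilde, $html, $b);

var_dump($bare, $a[1] ?? null, $b[1] ?? null);
```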