JavaScript parser in Python [closed]
Nowadays, there is at least one better tool, called slimit:
SlimIt is a JavaScript minifier written in Python. It compiles JavaScript into more compact code so that it downloads and runs faster.
SlimIt also provides a library that includes a JavaScript parser, lexer, pretty printer and a tree visitor.
Demo:
Imagine we have the following JavaScript code:
$.ajax({ type: "POST", url: 'http://www.example.com', data: { email: 'abc@g.com', phone: '9999999999', name: 'XYZ' }});
And now we need to get the email, phone and name values from the data object.
The idea here would be to instantiate a slimit parser, visit all nodes, filter all assignments and put them into a dictionary:
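    from slimit import ast
    from slimit.parser import Parser
    from slimit.visitors import nodevisitor

    data = """$.ajax({ type: "POST", url: 'http://www.example.com', data: { email: 'abc@g.com', phone: '9999999999', name: 'XYZ' }});"""

    parser = Parser()
    tree = parser.parse(data)

    # Each "key: value" pair of an object literal is an ast.Assign node.
    # Nodes without a plain value (such as the nested "data" object) map to ''.
    fields = {getattr(node.left, 'value', ''): getattr(node.right, 'value', '')
              for node in nodevisitor.visit(tree)
              if isinstance(node, ast.Assign)}

    # The parenthesized form works on both Python 2 and Python 3.
    print(fields)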
It prints:
{'name': "'XYZ'", 'url': "'http://www.example.com'", 'type': '"POST"', 'phone': "'9999999999'", 'data': '', 'email': "'abc@g.com'"}
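Note that the values in that output still carry their JavaScript quotes (for example "'XYZ'"), because the parser keeps the raw source text of each string literal. A minimal sketch of one way to clean them up follows; it starts from the printed dictionary rather than calling slimit, the `unquote` helper is a hypothetical name introduced here, and `ast` in this snippet is Python's standard library module, not `slimit.ast`. It assumes simple literals without JavaScript-specific escape sequences, since `ast.literal_eval` follows Python's string rules rather than JavaScript's.

```python
import ast  # Python's standard library module, not slimit.ast

# Output copied from the slimit demo above: each value is raw source text.
fields = {
    'name': "'XYZ'",
    'url': "'http://www.example.com'",
    'type': '"POST"',
    'phone': "'9999999999'",
    'data': '',
    'email': "'abc@g.com'",
}


def unquote(raw):
    """Hypothetical helper: turn the raw text of a simple string literal
    into a plain Python string. Anything unquoted is returned unchanged."""
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        # Assumption: the literal uses only escapes that Python also accepts.
        return ast.literal_eval(raw)
    return raw


# Keep only the three keys the question asks about.
wanted = {key: unquote(fields[key]) for key in ('email', 'phone', 'name')}
print(wanted)
```

For input that may contain JavaScript-only escapes, a dedicated unescaping step would be safer than `ast.literal_eval`.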
ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.
The ANTLR site provides many grammars, including one for JavaScript.
As it happens, there is a Python API available - so you can call the lexer (recognizer) generated from the grammar directly from Python (good luck).
I have translated esprima.js to Python:
https://github.com/PiotrDabkowski/pyjsparser
    >>> from pyjsparser import parse
    >>> parse('var $ = "Hello!"')
    {
    "type": "Program",
    "body": [
        {
            "type": "VariableDeclaration",
            "declarations": [
                {
                    "type": "VariableDeclarator",
                    "id": {
                        "type": "Identifier",
                        "name": "$"
                    },
                    "init": {
                        "type": "Literal",
                        "value": "Hello!",
                        "raw": '"Hello!"'
                    }
                }
            ],
            "kind": "var"
        }
    ]
    }
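Since the result is built from plain dictionaries and lists, it can be walked with ordinary Python. The sketch below is one possible way to do that, not part of pyjsparser: `iter_nodes` is a hypothetical helper, and the tree is copied from the output shown above so the snippet runs without the library installed.

```python
# Tree copied from the pyjsparser output shown above.
tree = {
    "type": "Program",
    "body": [
        {
            "type": "VariableDeclaration",
            "declarations": [
                {
                    "type": "VariableDeclarator",
                    "id": {"type": "Identifier", "name": "$"},
                    "init": {"type": "Literal", "value": "Hello!", "raw": '"Hello!"'},
                }
            ],
            "kind": "var",
        }
    ],
}


def iter_nodes(node):
    """Hypothetical helper: yield every dict that has a "type" key,
    parents before children (pre-order)."""
    if isinstance(node, dict):
        if "type" in node:
            yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


# Map each declared variable name to its literal initial value, if it has one.
declared = {
    node["id"]["name"]: (node.get("init") or {}).get("value")
    for node in iter_nodes(tree)
    if node["type"] == "VariableDeclarator"
}
print(declared)
```

The same walker should apply to any ESTree-shaped result that consists of nested dictionaries and lists.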
It's a manual translation, so it's very fast: it takes about 1 second to parse the angular.js file (so about 100k characters per second). It supports the whole of ECMAScript 5.1 and parts of version 6, for example arrow functions, const, and let.
If you need support for all the newest ES6 features, you can translate esprima on the fly with Js2Py:
    import js2py

    esprima = js2py.require("esprima@4.0.1")
    esprima.parse("a = () => {return 11};")
    # {'body': [{'expression': {'left': {'name': 'a', 'type': 'Identifier'},
    #   'operator': '=', 'right': {'async': False, 'body': {'body': [{'argument':
    #   {'raw': '11', 'type': 'Literal', 'value': 11}, 'type': 'ReturnStatement'}],
    #   'type': 'BlockStatement'}, 'expression': False, 'generator': False,
    #   'id': None, 'params': [], 'type': 'ArrowFunctionExpression'},
    #   'type': 'AssignmentExpression'}, 'type': 'ExpressionStatement'}],
    #   'sourceType': 'script', 'type': 'Program'}