How Pony (ORM) does its tricks?


Pony ORM author is here.

Pony translates a Python generator into a SQL query in three steps:

  1. Decompiling the generator bytecode and rebuilding the generator AST (abstract syntax tree)
  2. Translating the Python AST into "abstract SQL", a universal list-based representation of a SQL query
  3. Converting the abstract SQL representation into a specific database-dependent SQL dialect

The most complex part is the second step, where Pony must understand the "meaning" of Python expressions. It seems you are most interested in the first step, so let me explain how decompiling works.

Let's consider this query:

    >>> from pony.orm.examples.estore import *
    >>> select(c for c in Customer if c.country == 'USA').show()

Which will be translated into the following SQL:

    SELECT "c"."id", "c"."email", "c"."password", "c"."name", "c"."country", "c"."address"
    FROM "Customer" "c"
    WHERE "c"."country" = 'USA'

And below is the result of this query which will be printed out:

    id|email              |password|name          |country|address
    --+-------------------+--------+--------------+-------+---------
    1 |john@example.com   |***     |John Smith    |USA    |address 1
    2 |matthew@example.com|***     |Matthew Reed  |USA    |address 2
    4 |rebecca@example.com|***     |Rebecca Lawson|USA    |address 4

The select() function accepts a Python generator as its argument and then analyzes its bytecode. We can get the bytecode instructions of this generator using the standard Python dis module:

    >>> gen = (c for c in Customer if c.country == 'USA')
    >>> import dis
    >>> dis.dis(gen.gi_frame.f_code)
      1           0 LOAD_FAST                0 (.0)
            >>    3 FOR_ITER                26 (to 32)
                  6 STORE_FAST               1 (c)
                  9 LOAD_FAST                1 (c)
                 12 LOAD_ATTR                0 (country)
                 15 LOAD_CONST               0 ('USA')
                 18 COMPARE_OP               2 (==)
                 21 POP_JUMP_IF_FALSE        3
                 24 LOAD_FAST                1 (c)
                 27 YIELD_VALUE
                 28 POP_TOP
                 29 JUMP_ABSOLUTE            3
            >>   32 LOAD_CONST               1 (None)
                 35 RETURN_VALUE
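The offsets and opcode names in this listing depend on the Python version that produced it; other CPython releases emit different instructions for the same generator. As a minimal sketch that does not need Pony at all, the snippet below lists the opcode names of a generator expression with dis.get_instructions() (available since Python 3.4). The customers list and the instruction_names() helper are made up for illustration.

```python
import dis

def instruction_names(gen):
    """Return the opcode names of a generator's code object, in order."""
    return [instr.opname for instr in dis.get_instructions(gen.gi_code)]

# Hypothetical stand-in data; a real Pony query iterates over an entity class.
customers = [{"country": "USA"}, {"country": "Canada"}]
gen = (c for c in customers if c["country"] == "USA")

names = instruction_names(gen)
print(names)  # the exact list varies between Python versions
```

Whatever the version, the list should still contain a loop instruction, a comparison and a yield, which are the pieces a decompiler has to recognize.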

Pony ORM has the function decompile() in the module pony.orm.decompiling, which can restore an AST from the bytecode:

    >>> from pony.orm.decompiling import decompile
    >>> ast, external_names = decompile(gen)

Here, we can see the textual representation of the AST nodes:

    >>> ast
    GenExpr(GenExprInner(Name('c'), [GenExprFor(AssName('c', 'OP_ASSIGN'), Name('.0'),
    [GenExprIf(Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]))])]))

Let's now see how the decompile() function works.

The decompile() function creates a Decompiler object, which implements the Visitor pattern. The decompiler instance receives the bytecode instructions one by one. For each instruction, the decompiler object calls one of its own methods, whose name is the same as the name of the current bytecode instruction.

When Python evaluates an expression, it uses a stack that stores the intermediate results of the calculation. The decompiler object also has its own stack, but this stack stores the AST node for each expression rather than its computed value.

When the decompiler method for the next bytecode instruction is called, it takes AST nodes from the stack, combines them into a new AST node, and then puts this node on top of the stack.

For example, let's see how the subexpression c.country == 'USA' is processed. The corresponding bytecode fragment is:

                  9 LOAD_FAST                1 (c)
                 12 LOAD_ATTR                0 (country)
                 15 LOAD_CONST               0 ('USA')
                 18 COMPARE_OP               2 (==)

So, the decompiler object does the following:

  1. Calls decompiler.LOAD_FAST('c'). This method puts the Name('c') node on top of the decompiler stack.
  2. Calls decompiler.LOAD_ATTR('country'). This method takes the Name('c') node from the stack, creates the Getattr(Name('c'), 'country') node and puts it on top of the stack.
  3. Calls decompiler.LOAD_CONST('USA'). This method puts the Const('USA') node on top of the stack.
  4. Calls decompiler.COMPARE_OP('=='). This method takes two nodes (Getattr and Const) from the stack, and then puts Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]) on top of the stack.
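The four steps above can be sketched as a toy decompiler. This is an illustration only, not Pony's actual Decompiler class: the ToyDecompiler name, the tuple-based node format and the hand-written instruction list are all assumptions made to keep the example self-contained and independent of the Python version.

```python
class ToyDecompiler:
    """Rebuilds a tuple-based AST from (opname, argument) pairs."""

    def __init__(self):
        self.stack = []  # holds AST nodes, not runtime values

    def decompile(self, instructions):
        for opname, arg in instructions:
            # Dispatch: the handler method is named after the instruction.
            handler = getattr(self, opname)
            handler(arg)
        if len(self.stack) != 1:
            raise ValueError("expected exactly one node left on the stack")
        return self.stack.pop()

    def LOAD_FAST(self, name):
        self.stack.append(("Name", name))

    def LOAD_ATTR(self, attr):
        obj = self.stack.pop()
        self.stack.append(("Getattr", obj, attr))

    def LOAD_CONST(self, value):
        self.stack.append(("Const", value))

    def COMPARE_OP(self, op):
        right = self.stack.pop()  # pushed last, so popped first
        left = self.stack.pop()
        self.stack.append(("Compare", left, [(op, right)]))


# The bytecode fragment from the walkthrough, written out by hand.
fragment = [
    ("LOAD_FAST", "c"),
    ("LOAD_ATTR", "country"),
    ("LOAD_CONST", "USA"),
    ("COMPARE_OP", "=="),
]
node = ToyDecompiler().decompile(fragment)
print(node)
# ('Compare', ('Getattr', ('Name', 'c'), 'country'), [('==', ('Const', 'USA'))])
```

Looking the handler up with getattr() is what lets the method name double as the dispatch key, so supporting a new instruction in this sketch only means adding one more method.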

After all bytecode instructions are processed, the decompiler stack contains a single AST node which corresponds to the whole generator expression.

Since Pony ORM needs to decompile only generators and lambdas, this is not that complex, because the instruction flow for a generator is relatively straightforward: it is just a bunch of nested loops.

Currently Pony ORM covers the entire generator instruction set except for two things:

  1. Inline if expressions: a if b else c
  2. Compound comparisons: a < b < c
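One plausible way to see what sets these two forms apart is to check their bytecode for jump instructions, since a branch inside an expression breaks the simple "pop nodes, push a node" flow. The sketch below uses only the standard dis module; the has_jump() helper and the three lambdas are hypothetical examples, not part of Pony.

```python
import dis

def has_jump(func):
    """True if the function's bytecode contains any jump instruction."""
    return any("JUMP" in instr.opname for instr in dis.get_instructions(func))

simple = lambda a, b: a < b                    # plain comparison
chained = lambda a, b, c: a < b < c            # compound comparison
inline_if = lambda a, b, c: a if b else c      # inline if expression

print(has_jump(simple), has_jump(chained), has_jump(inline_if))
```

The plain comparison compiles to straight-line code, while the other two need a conditional jump in the middle of the expression.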

If Pony encounters such an expression, it raises a NotImplementedError exception. But even in this case you can make it work by passing the generator expression as a string. When you pass a generator as a string, Pony doesn't use the decompiler module. Instead, it gets the AST using the standard Python compiler.parse function.
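The compiler package exists only in Python 2; in Python 3 the standard ast module plays the same role. As a minimal sketch (not Pony's own code), the snippet below shows that a generator expression given as a string can be parsed straight into an AST, with no bytecode involved. The Product entity and its price attribute are hypothetical names.

```python
import ast

source = "(p for p in Product if 10 < p.price < 100)"

# mode="eval" parses a single expression; nothing is executed,
# so Product does not need to exist.
tree = ast.parse(source, mode="eval")
genexp = tree.body

condition = genexp.generators[0].ifs[0]
print(type(genexp).__name__)                        # GeneratorExp
print(type(condition).__name__)                     # Compare
print([type(op).__name__ for op in condition.ops])  # ['Lt', 'Lt']
```

In the parsed tree the compound comparison is a single Compare node with two operators, so there are no jumps to untangle.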

Hope this answers your question.