Efficient Array Storage for Binary Tree

arrays algorithm data-structures binary-tree

One method I like is to store the preorder traversal and include the 'null' nodes in it. Storing the 'null' nodes removes the need to also store the inorder traversal of the tree.

Some advantages of this method:

It uses less storage than the pre/post + inorder method in most practical cases.
Serialization takes just one traversal.
Deserialization can be done in one pass.
The inorder traversal can be obtained in one pass without constructing the tree, which might be useful if the situation calls for it.

For example, say you have a binary tree of 64-bit integers. You can store an extra bit after each node saying whether the next entry is a null node or not (the first node is always the root). Null nodes can be represented by a single bit.

So if there are n nodes, the space usage would be 8n bytes + (n-1) indicator bits + (n+1) bits for null nodes = 64n + (n-1) + (n+1) = 66n bits.

With the pre/post + inorder method you end up using 16n bytes = 128n bits.

So you save 128n - 66n = 62n bits over the pre/post + inorder method.
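The arithmetic above can be checked with a small sketch. The function names and the `value_bits` parameter are made up for illustration; the formulas are the ones stated in this answer.

```python
def null_marked_preorder_bits(n, value_bits=64):
    """Bits for a preorder-with-nulls encoding of n nodes:
    n values + (n - 1) indicator bits + (n + 1) null-node bits."""
    return value_bits * n + (n - 1) + (n + 1)


def two_traversal_bits(n, value_bits=64):
    """Bits for storing two full traversals (pre/post + inorder)."""
    return 2 * value_bits * n


n = 1000
print(null_marked_preorder_bits(n))                          # 66000
print(two_traversal_bits(n))                                 # 128000
print(two_traversal_bits(n) - null_marked_preorder_bits(n))  # 62000
```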

Consider the tree

         100
        /   \
       /     \
      /       \
    10         200
   /  \       /   \
  .    .    150    300
            / \    / \
           .   .  .   .

where the '.' are the null nodes.

You will serialize it as 100 10 . . 200 150 . . 300 . .
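A minimal sketch of this serialization, assuming a simple `Node` class (hypothetical, not from the answer) and using `None` to stand for the '.' null marker:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def serialize(root):
    """Preorder traversal that also records null children as None."""
    out = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None:
            out.append(None)
            continue
        out.append(node.value)
        # Push right first so the left subtree is visited first.
        stack.append(node.right)
        stack.append(node.left)
    return out


tree = Node(100, Node(10), Node(200, Node(150), Node(300)))
print(serialize(tree))
# [100, 10, None, None, 200, 150, None, None, 300, None, None]
```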

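Now every 'preorder traversal with nulls' (including that of each subtree) has the property that number of null nodes = number of nodes + 1. This is because n nodes have 2n child slots in total, and n-1 of those slots are filled by the non-root nodes, leaving n+1 nulls.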

This allows you to create the tree from the serialized version in one pass: the first node is the root of the tree, and the nodes that follow are the left subtree followed by the right subtree. The grouping can be viewed like this:

100 (10 . .) (200 (150 . .) (300 . .))
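The one-pass reconstruction could look roughly like this. It is a sketch under the assumption that the serialized form is a list with `None` for each '.', and the `Node` class is again hypothetical. It uses recursion, so very deep trees would need an explicit stack instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def deserialize(tokens):
    """Rebuild the tree by consuming the tokens exactly once, left to right."""
    it = iter(tokens)

    def build():
        value = next(it)
        if value is None:      # a null marker ends this branch
            return None
        left = build()         # the left subtree comes first...
        right = build()        # ...then the right subtree
        return Node(value, left, right)

    return build()


tokens = [100, 10, None, None, 200, 150, None, None, 300, None, None]
root = deserialize(tokens)
print(root.value, root.left.value, root.right.left.value)  # 100 10 150
```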

To create the inorder traversal, you use a stack: push when you see a node, and pop (onto a list) when you see a null. The very last null finds the stack empty and is simply skipped. The resulting list is the inorder traversal (a detailed explanation for this can be found here: C++/C/Java: Anagrams - from original string to target).
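A short sketch of the stack trick, assuming the serialized form is a list with `None` for each null node (the function name is made up for illustration):

```python
def inorder_from_serialized(tokens):
    """Derive the inorder sequence from a preorder-with-nulls list
    without building the tree."""
    stack, result = [], []
    for tok in tokens:
        if tok is not None:
            stack.append(tok)           # a node: push it
        elif stack:
            result.append(stack.pop())  # a null: pop onto the output list
        # The final null arrives with an empty stack and is ignored.
    return result


tokens = [100, 10, None, None, 200, 150, None, None, 300, None, None]
print(inorder_from_serialized(tokens))  # [10, 100, 150, 200, 300]
```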


Think about XML. It's a kind of tree serialization. For example:

<node id="1">
    <node id="2">                       1
    </node>                           /   \
    <node id="3">                    2     3
        <node id="4">                     / \
        </node>                          4   5
        <node id="5">
        </node>
    </node>
</node>

So why keep the spaces and the tag names? We can omit them, step by step:

<1>
   <2></>
   <3>
     <4></>
     <5></>
   </>
</>

Remove the spaces: <1><2></><3><4></><5></></></>.

Remove the angle brackets: 12/34/5///

Now the problem is: what if a node has an empty left subtree and a non-empty right subtree? Then we can use another special character, '#', to represent an empty left subtree.

For example:

    1
   / \
      2
     / \
    3

This tree can be serialized as: 1#23///.
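As a rough sketch, the tag-stripped encoding could be produced like this. The `Node` class and function name are hypothetical, and single-character labels are assumed so that the output stays unambiguous.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def serialize_tags(node):
    """Emit the label, the left subtree, the right subtree, then '/'.
    A '#' stands in for an empty left subtree when a right one exists."""
    if node is None:
        return ""
    if node.left is not None:
        left = serialize_tags(node.left)
    elif node.right is not None:
        left = "#"
    else:
        left = ""
    return f"{node.label}{left}{serialize_tags(node.right)}/"


full = Node("1", Node("2"), Node("3", Node("4"), Node("5")))
skewed = Node("1", None, Node("2", Node("3")))
print(serialize_tags(full))    # 12/34/5///
print(serialize_tags(skewed))  # 1#23///
```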


The 2i, 2i+1 (binary heap) method, which puts the children of the node at index i at indices 2i and 2i+1 (1-based), is indeed the best way if you have a (nearly) complete tree.

Otherwise you won't escape storing a ParentId (parent index) with each node.
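To illustrate why the heap layout only pays off for (nearly) complete trees, here is a minimal sketch. It uses the 0-based variant of the same idea (children of index i at 2i+1 and 2i+2), and the `Node` class and function name are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_heap_array(root):
    """Place each node at its implicit heap index; unused slots stay None."""
    slots = {}
    stack = [(root, 0)]
    while stack:
        node, i = stack.pop()
        if node is None:
            continue
        slots[i] = node.value
        stack.append((node.left, 2 * i + 1))
        stack.append((node.right, 2 * i + 2))
    size = max(slots) + 1 if slots else 0
    return [slots.get(i) for i in range(size)]


balanced = Node(100, Node(10), Node(200, Node(150), Node(300)))
chain = Node(1, None, Node(2, None, Node(3)))
print(to_heap_array(balanced))  # [100, 10, 200, None, None, 150, 300]
print(to_heap_array(chain))     # [1, None, 2, None, None, None, 3]
```

The right-leaning chain of 3 nodes already needs 7 slots, and the array size grows exponentially with depth for such shapes.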
