Json Encoder AND Decoder for complex numpy arrays


Here is my final solution, adapted from hpaulj's answer and from his answer to this thread: https://stackoverflow.com/a/24375113/901925

This will encode/decode arrays of any dtype, nested to arbitrary depth inside dictionaries.

```python
import base64
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        """If the input object is an ndarray, convert it into a dict holding
        the dtype, shape and the data, base64 encoded."""
        if isinstance(obj, np.ndarray):
            data_b64 = base64.b64encode(obj.data)
            return dict(__ndarray__=data_b64.decode('ascii'),  # str, so it is JSON-serializable under Python 3
                        dtype=str(obj.dtype),
                        shape=obj.shape)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, obj)


def json_numpy_obj_hook(dct):
    """Decodes a previously encoded numpy ndarray with proper shape and dtype.

    :param dct: (dict) json encoded ndarray
    :return: (ndarray) if input was an encoded ndarray
    """
    if isinstance(dct, dict) and '__ndarray__' in dct:
        data = base64.b64decode(dct['__ndarray__'])
        return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])
    return dct


# Overload dump/load to use this behavior by default.
def dumps(*args, **kwargs):
    kwargs.setdefault('cls', NumpyEncoder)
    return json.dumps(*args, **kwargs)

def loads(*args, **kwargs):
    kwargs.setdefault('object_hook', json_numpy_obj_hook)
    return json.loads(*args, **kwargs)

def dump(*args, **kwargs):
    kwargs.setdefault('cls', NumpyEncoder)
    return json.dump(*args, **kwargs)

def load(*args, **kwargs):
    kwargs.setdefault('object_hook', json_numpy_obj_hook)
    return json.load(*args, **kwargs)


if __name__ == '__main__':
    data = np.arange(3, dtype=complex)
    one_level = {'level1': data, 'foo': 'bar'}
    two_level = {'level2': one_level}
    dumped = dumps(two_level)
    result = loads(dumped)
    print('\noriginal data', data)
    print('\nnested dict of dict complex array', two_level)
    print('\ndecoded nested data', result)
```

Which yields output:

```
original data [0.+0.j 1.+0.j 2.+0.j]

nested dict of dict complex array {'level2': {'level1': array([0.+0.j, 1.+0.j, 2.+0.j]), 'foo': 'bar'}}

decoded nested data {'level2': {'level1': array([0.+0.j, 1.+0.j, 2.+0.j]), 'foo': 'bar'}}
```
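To check the "any datatype" claim, here is a condensed round trip (my own compaction of the code above, not part of the original answer) using a non-complex dtype and a 2-D shape:

```python
import base64
import json

import numpy as np

# Condensed from the full solution above; under Python 3 the base64 bytes
# must be decoded to str before json can serialize them.
class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return {'__ndarray__': base64.b64encode(obj.data).decode('ascii'),
                    'dtype': str(obj.dtype),
                    'shape': obj.shape}
        return json.JSONEncoder.default(self, obj)

def hook(dct):
    if isinstance(dct, dict) and '__ndarray__' in dct:
        data = base64.b64decode(dct['__ndarray__'])
        return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])
    return dct

# a float32 2-D array, nested two dictionaries deep
data = np.arange(6, dtype=np.float32).reshape(2, 3)
nested = {'outer': {'inner': data, 'foo': 'bar'}}
result = json.loads(json.dumps(nested, cls=NumpyEncoder), object_hook=hook)
restored = result['outer']['inner']
print(restored.dtype, restored.shape)   # float32 (2, 3)
print(np.array_equal(restored, data))   # True
```

Both the dtype and the shape survive the round trip, not just the values.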


The accepted answer is great but has a flaw. It only works if your data is C_CONTIGUOUS. If you transpose your data, that will not be true. For example, test the following:

```python
A = np.arange(10).reshape(2, 5)
A.flags
# C_CONTIGUOUS : True
# F_CONTIGUOUS : False
# OWNDATA : False
# WRITEABLE : True
# ALIGNED : True
# UPDATEIFCOPY : False

A = A.transpose()
# array([[0, 5],
#        [1, 6],
#        [2, 7],
#        [3, 8],
#        [4, 9]])

loads(dumps(A))
# array([[0, 1],
#        [2, 3],
#        [4, 5],
#        [6, 7],
#        [8, 9]])

A.flags
# C_CONTIGUOUS : False
# F_CONTIGUOUS : True
# OWNDATA : False
# WRITEABLE : True
# ALIGNED : True
# UPDATEIFCOPY : False
```

To fix this, wrap the object in `np.ascontiguousarray()` before passing it to `b64encode`. Specifically, change:

```python
data_b64 = base64.b64encode(obj.data)
```

TO:

```python
data_b64 = base64.b64encode(np.ascontiguousarray(obj).data)
```

If I understand the function correctly, it takes no action if your data is already C_CONTIGUOUS, so the only performance hit is when you have F_CONTIGUOUS data (a contiguous copy is made).
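A quick check of that claim (a minimal sketch, independent of the JSON code): `np.ascontiguousarray` returns the input object itself when it is already C-contiguous, and makes a contiguous copy of the values otherwise:

```python
import numpy as np

A = np.arange(10).reshape(2, 5)
C = np.ascontiguousarray(A)      # already C-contiguous: the same object comes back
print(C is A)                    # True

F = A.T                          # the transpose is F-contiguous, not C-contiguous
G = np.ascontiguousarray(F)      # a contiguous copy is made
print(G.flags['C_CONTIGUOUS'])   # True
print(np.array_equal(G, F))      # True: the values are preserved
```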


It's unclear just how much help you need with json encoding/decoding, or with working with numpy. For example, how did you create the complex array in the first place?

What your encoding has done is render the array as a list of lists. The decoder then has to convert that back to an array of the appropriate dtype. For example:

```python
d = json.loads(encoded)
a = np.dot(d['some_key'], np.array([1, 1j]))
# array([ 1.+1.j,  2.+5.j,  3.-4.j])
```

This isn't the only way to create such an array from this list, and it probably fails with more general shapes, but it's a start.
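One variant that does generalize to other shapes (my own sketch, not from the original answer) is to rebuild the complex values from the trailing axis instead of using `np.dot`:

```python
import json

import numpy as np

# a JSON payload of [real, imag] pairs, as produced by the list-of-lists encoding
encoded = json.dumps([[1, 1], [2, 5], [3, -4]])

v = np.array(json.loads(encoded), dtype=float)
# works for any shape whose last axis has length 2
a = v[..., 0] + 1j * v[..., 1]
print(a.shape, a.dtype)
```

`v[..., 0]` and `v[..., 1]` select the real and imaginary parts whatever the leading dimensions are, so the same two lines handle 1-D, 2-D, or deeper arrays.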

The next task is figuring out when to use such a routine. If you know you are going to receive such an array, then just do this decoding.

Another option is to add one or more keys to the dictionary that mark the variable as a complex numpy array. One key might also encode its shape (though that is also deducible from the nesting of the list of lists).
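A sketch of that idea (the key names `__complex_ndarray__` and `shape` are my own choices, not an established convention): tag the dict so the decoder only converts entries it explicitly recognizes:

```python
import json

import numpy as np

def encode_complex(a):
    # store [real, imag] pairs plus explicit markers for the type and shape
    return {'__complex_ndarray__': np.stack([a.real, a.imag], axis=-1).tolist(),
            'shape': list(a.shape)}

def hook(dct):
    # only dicts carrying the marker key are turned back into arrays
    if '__complex_ndarray__' in dct:
        v = np.array(dct['__complex_ndarray__'], dtype=float)
        return (v[..., 0] + 1j * v[..., 1]).reshape(dct['shape'])
    return dct

a = np.array([1 + 1j, 2 + 5j, 3 - 4j])
s = json.dumps({'some_key': encode_complex(a)})
b = json.loads(s, object_hook=hook)['some_key']
print(np.allclose(a, b))   # True
```

Unlike the guess-by-shape hooks below, this never mistakes an ordinary N x 2 list for a complex array.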

Does this point in the right direction? Or do you need further help with each step?


One of the answers to this 'SimpleJSON and NumPy array' question

https://stackoverflow.com/a/24375113/901925

handles both the encoding and decoding of numpy arrays. It encodes a dictionary with the dtype and shape, and the array's data buffer. So the JSON string does not mean much to a human, but it does handle general arrays, including ones with a complex dtype.

The expected array and the dumped string print as:

```
[ 1.+1.j  2.+5.j  3.-4.j]
{"dtype": "complex128", "shape": [3], "__ndarray__": "AAAAAAAA8D8AAAAAAADwPwAAAAAAAABAAAAAAAAAFEAAAAAAAAAIQAAAAAAAABDA"}
```

The custom decoding is done with an object_hook function, which takes a dict and returns an array (if possible).

```python
json.loads(dumped, object_hook=json_numpy_obj_hook)
```

Following that model, here's a crude hook that would transform every JSON array into a np.array, and every one with 2 columns into a 1d complex array:

```python
def numpy_hook(dct):
    jj = np.array([1, 1j])
    for k, v in dct.items():
        if isinstance(v, list):
            v = np.array(v)
            if v.ndim == 2 and v.shape[1] == 2:
                v = np.dot(v, jj)
            dct[k] = v
    return dct
```

It would be better, I think, to encode some dictionary key to flag a numpy array, and another to flag a complex dtype.


I can improve the hook to handle regular lists, and other array dimensions:

```python
def numpy_hook(dct):
    jj = np.array([1, 1j])
    for k, v in dct.items():
        if isinstance(v, list):
            # try to turn the list into a numpy array
            v = np.array(v)
            if v.dtype == object:
                # not a normal array; don't change it
                continue
            if v.ndim > 1 and v.shape[-1] == 2:
                # guess that it is a complex array;
                # this information should be more explicit in the encoding
                v = np.dot(v, jj)
            dct[k] = v
    return dct
```

It handles this structure:

```python
A = np.array([1+1j, 2+5j, 3-4j])
B = np.arange(12).reshape(3, 4)
C = A + B.T
test = {'id': 'stream id',
        'arrays': [{'A': A}, {'B': B}, {'C': C}]}
```

returning:

```
{u'arrays': [{u'A': array([ 1.+1.j,  2.+5.j,  3.-4.j])},
             {u'B': array([[ 0,  1,  2,  3],
                           [ 4,  5,  6,  7],
                           [ 8,  9, 10, 11]])},
             {u'C': array([[  1.+1.j,   6.+5.j,  11.-4.j],
                           [  2.+1.j,   7.+5.j,  12.-4.j],
                           [  3.+1.j,   8.+5.j,  13.-4.j],
                           [  4.+1.j,   9.+5.j,  14.-4.j]])}],
 u'id': u'stream id'}
```

Any more generality requires, I think, modifications to the encoding to make the array identity explicit.