How to unpack pkl file?

Generally

Your pkl file is, in fact, a serialized pickle file, which means it has been dumped using Python's pickle module.

To un-pickle the data you can:

    import pickle

    with open('serialized.pkl', 'rb') as f:
        data = pickle.load(f)
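To see the whole cycle in one place, here is a minimal round-trip sketch. The sample dictionary and the temporary file location are made up for illustration; only `pickle.dump` and `pickle.load` come from the standard library as described above.

```python
import os
import pickle
import tempfile

# Hypothetical sample object; any picklable Python object works.
original = {"name": "example", "values": [1, 2, 3]}

# Write to a throwaway directory so the sketch leaves nothing behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "serialized.pkl")

# Serialize: pickle files must be opened in binary mode ('wb').
with open(path, "wb") as f:
    pickle.dump(original, f)

# Deserialize: binary mode again ('rb').
with open(path, "rb") as f:
    data = pickle.load(f)

print(type(data).__name__, data == original)
```

Keep in mind that `pickle.load` can run arbitrary code while loading, so only unpickle files from a source you trust.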

For the MNIST data set

Note gzip is only needed if the file is compressed:

    import gzip
    import pickle

    with gzip.open('mnist.pkl.gz', 'rb') as f:
        train_set, valid_set, test_set = pickle.load(f)

Each set can be further split into inputs and labels (e.g. for the training set):

    train_x, train_y = train_set

Those would be the inputs (digits) and outputs (labels) of your sets.
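Since gzip is only needed when the file is compressed, a small helper can check for that itself. The sketch below is not part of the original answer: `load_pickle` is a hypothetical name, and it detects gzip by the two magic bytes that start every gzip stream. The `encoding` argument is passed straight to `pickle.load`; it is an assumption that you will need `'latin1'` there when a file was pickled under Python 2, which is reportedly the case for the commonly distributed `mnist.pkl.gz`.

```python
import gzip
import pickle

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def load_pickle(path, encoding="ASCII"):
    """Load a pickle from `path`, decompressing it first if it is gzipped.

    `encoding` is forwarded to pickle.load. It only matters for pickles
    written by Python 2 (assumption: use 'latin1' for those).
    """
    # Peek at the first two bytes to decide how to open the file.
    with open(path, "rb") as f:
        is_gzip = f.read(2) == GZIP_MAGIC

    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as f:
        return pickle.load(f, encoding=encoding)
```

With that in place, loading the data set would look like `train_set, valid_set, test_set = load_pickle('mnist.pkl.gz', encoding='latin1')`, and the same call works unchanged on an uncompressed `.pkl` file.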

If you want to display the digits:

    import matplotlib.cm as cm
    import matplotlib.pyplot as plt

    plt.imshow(train_x[0].reshape((28, 28)), cmap=cm.Greys_r)
    plt.show()

The other alternative would be to look at the original data:

http://yann.lecun.com/exdb/mnist/

But that will be harder, as you'll need to write a program to read the binary data in those files. So I recommend using Python and loading the data with pickle. As you've seen, it's very easy. ;-)

Handy one-liner

    pkl() (
      python -c 'import pickle,sys;d=pickle.load(open(sys.argv[1],"rb"));print(d)' "$1"
    )
    pkl my.pkl

This prints the __str__ representation of the unpickled object.

The generic problem of visualizing an object is of course undefined, so if __str__ is not enough, you will need a custom script.
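As a starting point for such a custom script, here is one possible sketch that summarizes the structure of an unpickled object instead of printing all of it. The function name `summarize`, its parameters, and the sample data are all hypothetical; what counts as a useful summary depends on your object.

```python
def summarize(obj, depth=0, max_depth=3, max_items=3):
    """Return indented lines describing the structure of `obj`."""
    label = type(obj).__name__
    if hasattr(obj, "shape"):            # e.g. NumPy arrays
        label += f" shape={obj.shape}"
    elif hasattr(obj, "__len__"):        # str, list, tuple, dict, ...
        label += f" len={len(obj)}"
    lines = ["  " * depth + label]

    # Stop descending once the nesting limit is reached.
    if depth >= max_depth:
        return lines

    # Only look at the first few children so huge containers stay readable.
    if isinstance(obj, dict):
        children = list(obj.values())[:max_items]
    elif isinstance(obj, (list, tuple)):
        children = list(obj[:max_items])
    else:
        children = []

    for child in children:
        lines.extend(summarize(child, depth + 1, max_depth, max_items))
    return lines


# Hypothetical stand-in for an unpickled object.
sample = {"images": [[0, 1], [2, 3]], "label": "digits"}
print("\n".join(summarize(sample)))
```

For the MNIST pickle, `summarize(pickle.load(f))` would show a tuple of three sets, each holding an input array and a label array, without dumping thousands of pixel values to the terminal.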

In case you want to work with the original MNIST files, here is how you can deserialize them.

If you haven't downloaded the files yet, do that first by running the following in the terminal:

    wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
    wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
    wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
    wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz

Then save the following as deserialize.py and run it.

    import gzip

    import numpy as np

    IMG_DIM = 28


    def decode_image_file(fname):
        n_bytes_per_img = IMG_DIM * IMG_DIM
        with gzip.open(fname, 'rb') as f:
            bytes_ = f.read()
        data = bytes_[16:]  # skip the 16-byte header
        if len(data) % n_bytes_per_img != 0:
            raise ValueError('Image data is not a whole number of images')
        # Count images from the payload, not from the header-inclusive length.
        return np.frombuffer(data, dtype=np.uint8).reshape(
            len(data) // n_bytes_per_img, n_bytes_per_img)


    def decode_label_file(fname):
        with gzip.open(fname, 'rb') as f:
            bytes_ = f.read()
        # Skip the 8-byte header; the rest is one label byte per image.
        return np.frombuffer(bytes_[8:], dtype=np.uint8)


    train_images = decode_image_file('train-images-idx3-ubyte.gz')
    train_labels = decode_label_file('train-labels-idx1-ubyte.gz')
    test_images = decode_image_file('t10k-images-idx3-ubyte.gz')
    test_labels = decode_label_file('t10k-labels-idx1-ubyte.gz')

The script doesn't normalize the pixel values the way the pickled file does. To do that, divide them by 255:

    train_images = train_images/255
    test_images = test_images/255
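The offsets 16 and 8 in deserialize.py simply skip the IDX headers. If you would rather check the header than skip it blindly, something like the following sketch may help. `read_idx_header` is a hypothetical helper, not part of the original script; it relies on the IDX layout, where a big-endian magic number (2051 for the image files, 2049 for the label files) is followed by one 32-bit size per dimension.

```python
import gzip
import struct

IMAGE_MAGIC = 2051  # 0x00000803: unsigned bytes, 3 dimensions
LABEL_MAGIC = 2049  # 0x00000801: unsigned bytes, 1 dimension


def read_idx_header(fname):
    """Return (magic, dims) read from the header of a gzipped IDX file."""
    with gzip.open(fname, "rb") as f:
        # Magic number: two zero bytes, a type code, then the dimension count.
        (magic,) = struct.unpack(">I", f.read(4))
        n_dims = magic & 0xFF
        # One big-endian unsigned 32-bit size per dimension.
        dims = struct.unpack(">" + "I" * n_dims, f.read(4 * n_dims))
    return magic, dims
```

For the training images this should give a magic number equal to `IMAGE_MAGIC` and dimensions of 60000, 28 and 28, which accounts for the 16 header bytes (4 for the magic number plus 3 × 4 for the sizes). The label files have a single dimension, hence 8 bytes.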
