TensorFlow create dataset from numpy array


The DataSet class is part of the MNIST tutorial code, not the core TensorFlow library.

You can see where it is defined here:

GitHub Link

The constructor accepts images and labels arguments, so presumably you can pass your own values there.
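To illustrate the idea, here is a minimal, hypothetical sketch of such a wrapper over images and labels arrays. This is not the tutorial's actual DataSet class (the class name `SimpleDataSet` and its behavior are assumptions for illustration), just the pattern it follows, written with plain NumPy:

```python
import numpy as np

class SimpleDataSet:
    """Hypothetical stand-in for the tutorial's DataSet: wraps paired arrays."""

    def __init__(self, images, labels):
        # Each row of images must correspond to the same row of labels.
        assert images.shape[0] == labels.shape[0]
        self._images = images
        self._labels = labels
        self._index = 0

    def next_batch(self, batch_size):
        # Return the next batch_size rows, advancing an internal cursor.
        start = self._index
        self._index = (self._index + batch_size) % self._images.shape[0]
        end = start + batch_size
        return self._images[start:end], self._labels[start:end]

# Example: 100 flattened 28x28 images with integer class labels.
images = np.random.rand(100, 784).astype(np.float32)
labels = np.random.randint(0, 10, size=100)
ds = SimpleDataSet(images, labels)
batch_x, batch_y = ds.next_batch(32)
```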


Recently, TensorFlow added support to its Dataset API for consuming NumPy arrays. See here for details.

Here is the snippet that I copied from there:

# Load the training data into two NumPy arrays, for example using `np.load()`.
with np.load("/var/data/training_data.npy") as data:
  features = data["features"]
  labels = data["labels"]

# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]

features_placeholder = tf.placeholder(features.dtype, features.shape)
labels_placeholder = tf.placeholder(labels.dtype, labels.shape)

dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))
# [Other transformations on `dataset`...]
dataset = ...
iterator = dataset.make_initializable_iterator()

sess.run(iterator.initializer, feed_dict={features_placeholder: features,
                                          labels_placeholder: labels})
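Note that on TensorFlow 2.x, where eager execution is the default, the placeholder and initializable-iterator steps are no longer needed: NumPy arrays can be passed straight to from_tensor_slices. A sketch with small illustrative arrays (the array contents here are made up; the arrays are embedded in the graph, so this suits data that fits in memory):

```python
import numpy as np
import tensorflow as tf

features = np.arange(8, dtype=np.float32).reshape(4, 2)
labels = np.array([0, 1, 1, 0], dtype=np.int64)

# Each element of the dataset is one (feature_row, label) pair.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

for f, l in dataset:
    print(f.numpy(), l.numpy())
```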


As an alternative, you may use the function tf.train.batch() to create batches of your data and, at the same time, eliminate the need for tf.placeholder. Refer to the documentation for more details.

>>> images = tf.constant(X, dtype=tf.float32)  # X is a np.array
>>> labels = tf.constant(y, dtype=tf.int32)    # y is a np.array
>>> batch_images, batch_labels = tf.train.batch([images, labels], batch_size=32, capacity=300, enqueue_many=True)
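Be aware that tf.train.batch belongs to the older queue-runner input pipeline, which was deprecated in favour of tf.data. The equivalent batching can be sketched as follows (the array shapes and buffer/batch sizes are illustrative, not prescribed):

```python
import numpy as np
import tensorflow as tf

X = np.random.rand(100, 10).astype(np.float32)
y = np.random.randint(0, 2, size=100).astype(np.int32)

# batch() replaces tf.train.batch; shuffle() takes over the mixing role
# that the queue's capacity played in the old pipeline.
dataset = tf.data.Dataset.from_tensor_slices((X, y)).shuffle(buffer_size=100).batch(32)

for batch_X, batch_y in dataset.take(1):
    pass  # batch_X has shape (32, 10), batch_y has shape (32,)
```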