TensorFlow create dataset from numpy array
The DataSet class is part of the MNIST tutorial code, not of the core TensorFlow library.
You can see where it is defined here:
Its constructor accepts images and labels arguments, so presumably you can pass your own values there.
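To illustrate the pattern behind that class (pairing two arrays and serving shuffled mini-batches through a next_batch method), here is a minimal NumPy sketch. The SimpleDataSet name and its internals are hypothetical stand-ins, not the tutorial's actual implementation:

```python
import numpy as np

class SimpleDataSet:
    """Minimal stand-in for the MNIST tutorial's DataSet: pairs images
    with labels and serves shuffled mini-batches via next_batch()."""

    def __init__(self, images, labels):
        assert images.shape[0] == labels.shape[0]
        self._images = images
        self._labels = labels
        self._num_examples = images.shape[0]
        self._index = 0
        self._order = np.random.permutation(self._num_examples)

    def next_batch(self, batch_size):
        if self._index + batch_size > self._num_examples:
            # Reshuffle and restart once the current epoch is exhausted.
            self._order = np.random.permutation(self._num_examples)
            self._index = 0
        idx = self._order[self._index:self._index + batch_size]
        self._index += batch_size
        return self._images[idx], self._labels[idx]

# Usage: pass your own arrays instead of the tutorial's MNIST data.
images = np.random.rand(100, 784).astype(np.float32)
labels = np.random.randint(0, 10, size=100)
ds = SimpleDataSet(images, labels)
batch_x, batch_y = ds.next_batch(32)
print(batch_x.shape, batch_y.shape)  # (32, 784) (32,)
```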
Recently, TensorFlow added a feature to its Dataset API for consuming NumPy arrays. See here for details.
Here is the snippet that I copied from there:
# Load the training data into two NumPy arrays, for example using `np.load()`.
with np.load("/var/data/training_data.npy") as data:
  features = data["features"]
  labels = data["labels"]

# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]

features_placeholder = tf.placeholder(features.dtype, features.shape)
labels_placeholder = tf.placeholder(labels.dtype, labels.shape)

dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))
# [Other transformations on `dataset`...]
dataset = ...
iterator = dataset.make_initializable_iterator()

sess.run(iterator.initializer, feed_dict={features_placeholder: features,
                                          labels_placeholder: labels})
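If it helps to see what from_tensor_slices conceptually does with a (features, labels) tuple, here is a hedged NumPy-only sketch: it yields one (row, label) pair per index along the first axis. The function name mirrors the TensorFlow one but this is an illustrative emulation, not the real API:

```python
import numpy as np

def from_tensor_slices(features, labels):
    """Hypothetical NumPy sketch of tf.data.Dataset.from_tensor_slices
    applied to a (features, labels) tuple: slice both arrays along the
    first axis and yield one (row, label) pair per example."""
    assert features.shape[0] == labels.shape[0]
    for i in range(features.shape[0]):
        yield features[i], labels[i]

features = np.arange(12, dtype=np.float32).reshape(4, 3)
labels = np.array([0, 1, 0, 1])
pairs = list(from_tensor_slices(features, labels))
print(len(pairs))            # 4
print(pairs[0][0].tolist())  # [0.0, 1.0, 2.0]
```

The placeholders in the snippet above exist so the arrays are fed in at initialization time rather than baked into the graph as constants, which matters for large datasets.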
As an alternative, you can use the function tf.train.batch() to create batches of your data and at the same time eliminate the use of tf.placeholder. Refer to the documentation for more details.
>>> images = tf.constant(X, dtype=tf.float32)  # X is a np.array
>>> labels = tf.constant(y, dtype=tf.int32)    # y is a np.array
>>> batch_images, batch_labels = tf.train.batch([images, labels], batch_size=32, capacity=300, enqueue_many=True)
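The enqueue_many=True flag tells tf.train.batch to treat the inputs as a whole dataset (one example per row) rather than as a single example. A rough NumPy sketch of that grouping behavior, under the assumption that only full batches are emitted (the default when allow_smaller_final_batch is False); the batch_many helper is hypothetical:

```python
import numpy as np

def batch_many(images, labels, batch_size):
    """Hypothetical sketch of tf.train.batch(..., enqueue_many=True):
    treat each row of the inputs as one example and group consecutive
    runs of batch_size rows into batches, dropping the remainder
    (assumed default behavior without allow_smaller_final_batch)."""
    n = (images.shape[0] // batch_size) * batch_size
    for start in range(0, n, batch_size):
        yield images[start:start + batch_size], labels[start:start + batch_size]

X = np.random.rand(100, 784).astype(np.float32)
y = np.random.randint(0, 10, size=100)
batches = list(batch_many(X, y, batch_size=32))
print(len(batches))         # 3 full batches from 100 examples
print(batches[0][0].shape)  # (32, 784)
```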