What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?
`from_tensors` combines the input and returns a dataset with a single element:
```python
>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensors(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[1, 2],
       [3, 4]], dtype=int32)>]
```
`from_tensor_slices` creates a dataset with a separate element for each row of the input tensor:
```python
>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensor_slices(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>,
 <tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>]
```
1) The main difference between the two is that the nested elements passed to `from_tensor_slices` must all have the same size in the 0th dimension:
```python
# exception: ValueError: Dimensions 10 and 9 are not compatible
dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([10, 4]), tf.random_uniform([9])))

# OK, first dimension is the same
dataset2 = tf.data.Dataset.from_tensors(
    (tf.random_uniform([10, 4]), tf.random_uniform([10])))
```
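For completeness, here is a small sketch of the working `from_tensor_slices` case, once the leading dimensions match. It uses the TF 2.x API (`tf.random.uniform` and eager execution by default) rather than the TF 1.x calls above:

```python
import tensorflow as tf

# Hypothetical data: 10 feature rows and 10 matching labels
features = tf.random.uniform([10, 4])
labels = tf.random.uniform([10])

# Leading dimensions match (10 and 10), so slicing succeeds:
# the dataset yields 10 elements, each a pair of shapes ((4,), ())
ds = tf.data.Dataset.from_tensor_slices((features, labels))
print(ds.element_spec)
print(ds.cardinality().numpy())  # 10
```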
2) The second difference arises when the input to the dataset is a Python list of tensors. For example:
```python
dataset1 = tf.data.Dataset.from_tensor_slices(
    [tf.random_uniform([2, 3]), tf.random_uniform([2, 3])])
dataset2 = tf.data.Dataset.from_tensors(
    [tf.random_uniform([2, 3]), tf.random_uniform([2, 3])])
print(dataset1)  # shapes: (2, 3)
print(dataset2)  # shapes: (2, 2, 3)
```
In the above, `from_tensors` stacks the list into a single 3D tensor, while `from_tensor_slices` merges the input tensors into a dataset of 2D elements. This can be handy if you have different sources for the different channels of an image and want to combine them into a single RGB image tensor.
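As a sketch of that channel-combining idea (TF 2.x API; the batch size and image size here are made up), each channel source becomes its own dataset, and `Dataset.zip` plus `map` stacks them into RGB images:

```python
import tensorflow as tf

# Hypothetical setup: a batch of 4 single-channel 28x28 images per source
red = tf.random.uniform([4, 28, 28])
green = tf.random.uniform([4, 28, 28])
blue = tf.random.uniform([4, 28, 28])

# Each from_tensor_slices call yields 4 elements of shape (28, 28);
# zip pairs them up, and map stacks each triple into one RGB image
rgb_ds = tf.data.Dataset.zip((
    tf.data.Dataset.from_tensor_slices(red),
    tf.data.Dataset.from_tensor_slices(green),
    tf.data.Dataset.from_tensor_slices(blue),
)).map(lambda r, g, b: tf.stack([r, g, b], axis=-1))

print(rgb_ds.element_spec)  # elements of shape (28, 28, 3)
```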
3) As mentioned in the previous answer, `from_tensors` converts the input into one big tensor, so the dataset holds a single element:
```python
import tensorflow as tf
tf.enable_eager_execution()

dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([4, 2]), tf.random_uniform([4])))
dataset2 = tf.data.Dataset.from_tensors(
    (tf.random_uniform([4, 2]), tf.random_uniform([4])))

for i, item in enumerate(dataset1):
    print('element: ' + str(i + 1), item[0], item[1])

print(30 * '-')

for i, item in enumerate(dataset2):
    print('element: ' + str(i + 1), item[0], item[1])
```
output:
```
element: 1 tf.Tensor(... shapes: ((2,), ()))
element: 2 tf.Tensor(... shapes: ((2,), ()))
element: 3 tf.Tensor(... shapes: ((2,), ()))
element: 4 tf.Tensor(... shapes: ((2,), ()))
------------------------------
element: 1 tf.Tensor(... shapes: ((4, 2), (4,)))
```
Try this:
```python
import tensorflow as tf  # 1.13.1
tf.enable_eager_execution()

t1 = tf.constant([[11, 22], [33, 44], [55, 66]])

print("\n========= from_tensors ===========")
ds = tf.data.Dataset.from_tensors(t1)
print(ds.output_types, end=' : ')
print(ds.output_shapes)
for e in ds:
    print(e)

print("\n========= from_tensor_slices ===========")
ds = tf.data.Dataset.from_tensor_slices(t1)
print(ds.output_types, end=' : ')
print(ds.output_shapes)
for e in ds:
    print(e)
```
output:
```
========= from_tensors ===========
<dtype: 'int32'> : (3, 2)
tf.Tensor(
[[11 22]
 [33 44]
 [55 66]], shape=(3, 2), dtype=int32)

========= from_tensor_slices ===========
<dtype: 'int32'> : (2,)
tf.Tensor([11 22], shape=(2,), dtype=int32)
tf.Tensor([33 44], shape=(2,), dtype=int32)
tf.Tensor([55 66], shape=(2,), dtype=int32)
```
The output is pretty much self-explanatory, but as you can see, `from_tensor_slices()` slices (what would be the output of) `from_tensors()` along its first dimension. You can also try it with:
```python
t1 = tf.constant([[[11, 22], [33, 44], [55, 66]],
                  [[110, 220], [330, 440], [550, 660]]])
```
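With that 3D tensor, slicing again happens only along the first dimension. A quick sketch using the TF 2.x API (where `output_types`/`output_shapes` are replaced by `element_spec`):

```python
import tensorflow as tf  # 2.x, eager by default

t1 = tf.constant([[[11, 22], [33, 44], [55, 66]],
                  [[110, 220], [330, 440], [550, 660]]])  # shape (2, 3, 2)

# from_tensors: one element holding the whole (2, 3, 2) tensor
ds_full = tf.data.Dataset.from_tensors(t1)
print(ds_full.element_spec.shape)    # (2, 3, 2)

# from_tensor_slices: two elements of shape (3, 2), one per outer row
ds_slices = tf.data.Dataset.from_tensor_slices(t1)
print(ds_slices.element_spec.shape)  # (3, 2)
```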