
Why does TensorFlow use channel-last ordering instead of row-major?


Here's the explanation:

https://www.tensorflow.org/performance/performance_guide#use_nchw_image_data_format

Image data format refers to the representation of batches of images. TensorFlow supports NHWC (TensorFlow default) and NCHW (cuDNN default). N refers to the number of images in a batch, H refers to the number of pixels in the vertical dimension, W refers to the number of pixels in the horizontal dimension, and C refers to the channels (e.g. 1 for black and white, 3 for RGB, etc.) Although cuDNN can operate on both formats, it is faster to operate in its default format.
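
To make the two layouts concrete, here is a small illustrative numpy sketch (the batch size and image dimensions are made up for the example):

    import numpy as np

    # A hypothetical batch of 32 RGB images, 224x224, in NHWC (TensorFlow's default).
    nhwc = np.zeros((32, 224, 224, 3), dtype=np.float32)

    # The same data in NCHW (cuDNN's preferred layout) just permutes the axes:
    # N stays first, C moves ahead of H and W.
    nchw = np.transpose(nhwc, (0, 3, 1, 2))

    print(nhwc.shape)  # (32, 224, 224, 3) -> batch, height, width, channels
    print(nchw.shape)  # (32, 3, 224, 224) -> batch, channels, height, width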

The best practice is to build models that work with both NCHW and NHWC as it is common to train using NCHW on GPU, and then do inference with NHWC on CPU.
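
As a rough sketch of what "support both formats" can look like in practice (the helper name conv_block is made up for illustration; the data_format argument itself belongs to tf.nn.conv2d):

    import tensorflow as tf

    def conv_block(x, kernel, data_format="NHWC"):
        # Illustrative helper: the caller chooses the layout once and every
        # layout-sensitive op receives the same data_format string.
        return tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding="SAME",
                            data_format=data_format)

    # The filter layout is the same for both formats: [height, width, in, out].
    kernel = tf.zeros([3, 3, 3, 16])
    y_nhwc = conv_block(tf.zeros([8, 32, 32, 3]), kernel, data_format="NHWC")
    # NCHW convolutions generally need a GPU/cuDNN build, matching the advice above:
    # y_nchw = conv_block(tf.zeros([8, 3, 32, 32]), kernel, data_format="NCHW")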

The very brief history of these two formats is that TensorFlow started by using NHWC because it was a little faster on CPUs. Then the TensorFlow team discovered that NCHW performs better when using the NVIDIA cuDNN library. The current recommendation is that users support both formats in their models. In the long term, we plan to rewrite graphs to make switching between the formats transparent.

Moreover, digging into the code, we can see that when the input is in NHWC format, TensorFlow converts it to NCHW for you:

    if (data_format == FORMAT_NHWC) {
      // Convert the input tensor from NHWC to NCHW.
      TensorShape nchw_shape =
          ShapeFromFormat(FORMAT_NCHW, in_batch, in_rows, in_cols, in_depths);
      if (in_depths > 1) {
        Tensor transformed_input;
        OP_REQUIRES_OK(ctx, ctx->allocate_temp(DataTypeToEnum<T>::value,
                                               nchw_shape, &transformed_input));
        functor::NHWCToNCHW<GPUDevice, T, 4>()(
            ctx->eigen_device<GPUDevice>(),
            const_cast<const Tensor&>(input).tensor<T, 4>(),
            transformed_input.tensor<T, 4>());
        input = transformed_input;
      } else {
        // If depth <= 1, then just reshape.
        CHECK(input.CopyFrom(input, nchw_shape));
      }
    }
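
At the Python level, the permutation that this kernel code performs amounts to a transpose of the channel axis to position 1; roughly (a sketch, not the actual TF source):

    import tensorflow as tf

    # Roughly what the NHWCToNCHW functor above does: move C from last to second
    # position. perm=[0, 3, 1, 2] maps (N, H, W, C) -> (N, C, H, W).
    nhwc_input = tf.zeros([8, 32, 32, 3])           # example shape, made up
    nchw_input = tf.transpose(nhwc_input, perm=[0, 3, 1, 2])
    print(nchw_input.shape)                         # (8, 3, 32, 32)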

You can specify the data format you want to use for every operation, but by default TensorFlow uses NHWC rather than NCHW. That is why even the TF developers often stick with NHWC: it saves having to specify the format on every single operation.
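
If you go through Keras (bundled with recent TensorFlow versions), there is also a global default you can inspect or change instead of passing the format everywhere; note this only affects Keras layers, not raw tf.nn ops:

    import tensorflow as tf

    # Keras spells the same choice as 'channels_last' (NHWC) vs 'channels_first' (NCHW).
    print(tf.keras.backend.image_data_format())     # 'channels_last' by default

    # Flip the global default so Keras layers assume NCHW input:
    tf.keras.backend.set_image_data_format("channels_first")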


Your question is based on a misunderstanding.

There is no contradiction between row-major and NHWC. Row-major means that the rightmost index is the one that causes the smallest jumps in memory when it changes, and changes in the leftmost index cause the biggest jumps. In row-major, the last dimension is contiguous; in column-major, the first one is. See https://en.wikipedia.org/wiki/Row-_and_column-major_order#Address_calculation_in_general for how to calculate memory offsets for an arbitrary number of dimensions.
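
You can verify that address calculation with numpy strides; the array shape below is just an example:

    import numpy as np

    a = np.zeros((2, 4, 4, 3), dtype=np.float32)   # NHWC, row-major (C order) by default
    print(a.strides)   # (192, 48, 12, 4) bytes: the rightmost index moves 4 bytes,
                       # the leftmost 192, so the last dimension (C) is contiguous.

    # The offset formula from the Wikipedia link, for index (n, h, w, c), in elements:
    # offset = ((n * 4 + h) * 4 + w) * 3 + c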

So, TF's memory IS laid out in row-major. The differences in order of the indexes are subtle (some people even prefer CHWN - see https://github.com/soumith/convnet-benchmarks/issues/66#issuecomment-155944875). NCHW is popular because it's what cudnn does best. But basically every common memory layout in DL is row-major.