Why does PyTorch officially use mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225] to normalize images?


Using the mean and std of ImageNet is common practice. They were calculated from millions of images. If you want to train from scratch on your own dataset, you can calculate a new mean and std for it. Otherwise, if you use an ImageNet-pretrained model, normalizing with the ImageNet mean and std is recommended.
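For reference, these constants feed a per-channel `(x - mean) / std` applied to images already scaled to [0, 1]. Below is a minimal NumPy sketch of that arithmetic. The function name `normalize_chw` is hypothetical; in torchvision the corresponding step is `transforms.Normalize(mean, std)`, typically applied after `transforms.ToTensor()`.

```python
import numpy as np

# Per-channel (R, G, B) ImageNet statistics quoted in the question.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_chw(image, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize a float image of shape (C, H, W) with values in [0, 1]."""
    image = np.asarray(image, dtype=np.float32)
    # Reshape the statistics to (C, 1, 1) so they broadcast over H and W.
    return (image - mean[:, None, None]) / std[:, None, None]

# A 2x2 image whose every pixel equals the ImageNet mean maps to all zeros.
mean_image = np.broadcast_to(IMAGENET_MEAN[:, None, None], (3, 2, 2))
normalized = normalize_chw(mean_image)
```

After this step, a dataset whose statistics match the constants has roughly zero mean and unit variance per channel, which is the input distribution the pretrained weights were fitted to.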

In that example, they are using the mean and stddev of ImageNet, but if you look at their MNIST examples, the mean and stddev are 1-dimensional (since the inputs are greyscale, with no RGB channels).

Whether or not to use ImageNet's mean and stddev depends on your data. Assuming your data are ordinary photos of "natural scenes" (people, buildings, animals, varied lighting/angles/backgrounds, etc.), and assuming your dataset is biased in the same way ImageNet is (in terms of class balance), then it's OK to normalize with ImageNet's scene statistics. If the photos are "special" somehow (color filtered, contrast adjusted, uncommon lighting, etc.) or show an "unnatural subject" (medical images, satellite imagery, hand drawings, etc.), then I would recommend computing the statistics of your own dataset and normalizing with those before model training!*

Here's some sample code to get you started:

    import os
    from time import time

    import torch
    from torchvision import datasets, transforms
    from tqdm.notebook import tqdm

    N_CHANNELS = 1

    dataset = datasets.MNIST("data", download=True,
                             train=True, transform=transforms.ToTensor())
    # Default batch_size is 1, so every batch below is a single image.
    full_loader = torch.utils.data.DataLoader(dataset, shuffle=False,
                                              num_workers=os.cpu_count())

    before = time()
    mean = torch.zeros(N_CHANNELS)
    std = torch.zeros(N_CHANNELS)
    print('==> Computing mean and std..')
    for inputs, _labels in tqdm(full_loader):
        for i in range(N_CHANNELS):
            # Accumulate per-image statistics for channel i.
            mean[i] += inputs[:, i, :, :].mean()
            std[i] += inputs[:, i, :, :].std()
    # Average over images. Note that the averaged per-image std is only
    # an approximation of the std taken over all pixels of the dataset.
    mean.div_(len(dataset))
    std.div_(len(dataset))
    print(mean, std)

    print("time elapsed: ", time() - before)
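If you want the dataset-wide standard deviation rather than an average of per-batch values, one option is to accumulate per-channel sums and sums of squares and take the square root once at the end. Below is a minimal NumPy sketch, assuming batches shaped (N, C, H, W). `channel_stats` is a hypothetical helper, not a torchvision function, and it computes the population std (no Bessel correction).

```python
import numpy as np

def channel_stats(batches):
    """Return per-channel (mean, std) over all pixels of all batches.

    `batches` is any iterable yielding float arrays shaped (N, C, H, W).
    """
    total = None
    total_sq = None
    count = 0
    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)
        if total is None:
            total = np.zeros(batch.shape[1])
            total_sq = np.zeros(batch.shape[1])
        # Sum over the batch, height and width axes, keeping channels.
        total += batch.sum(axis=(0, 2, 3))
        total_sq += (batch ** 2).sum(axis=(0, 2, 3))
        count += batch.shape[0] * batch.shape[2] * batch.shape[3]
    mean = total / count
    # Population variance E[x^2] - E[x]^2, clipped to guard against round-off.
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)

# Synthetic data split into two batches, standing in for a data loader.
rng = np.random.default_rng(0)
data = rng.random((8, 3, 4, 4))
mean, std = channel_stats([data[:4], data[4:]])
```

Because only running totals are kept, the result does not depend on the batch size, and the whole dataset never has to fit in memory.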

In computer vision, "natural scene" has a specific meaning that isn't related to nature vs. man-made; see https://en.wikipedia.org/wiki/Natural_scene_perception

* Otherwise you run into optimization problems due to elongations in the loss function; see my answer here.
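To make the "elongation" concrete, here is a small illustrative sketch (not taken from the linked answer). For a linear model trained with mean squared error, the Hessian is proportional to X^T X / n, and its condition number measures how stretched the loss surface is. The synthetic feature scales and the helper name `hessian_condition_number` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two independent features on very different scales (illustrative values).
X = np.column_stack([rng.normal(0.0, 1.0, 1000),
                     rng.normal(0.0, 100.0, 1000)])

def hessian_condition_number(features):
    """Condition number of X^T X / n, which is the Hessian of the mean
    squared error of a linear model up to a constant factor."""
    hessian = features.T @ features / len(features)
    return np.linalg.cond(hessian)

# Standardize each feature to zero mean and unit variance.
X_standardized = (X - X.mean(axis=0)) / X.std(axis=0)

cond_raw = hessian_condition_number(X)
cond_standardized = hessian_condition_number(X_standardized)
```

A large condition number means gradient descent needs a small step size along the steep direction and therefore crawls along the shallow one; after standardization the two directions have comparable curvature.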

I wasn't able to calculate the standard deviation as planned, but did it using the code below. The mean and standard deviation of the grayscale ImageNet training set are (round as much as you like):

Mean: 0.44531356896770125

Standard Deviation: 0.2692461874154524

    import multiprocessing
    import os

    import numpy as np
    # Assumption: imread is OpenCV's, which returns channels in BGR order,
    # matching the blue/green/red indexing used below.
    from cv2 import imread

    TRAIN_DIR = "/home/imagenet/ILSVRC/Data/CLS-LOC/train"

    def calcSTD(d):
        """Return (sum of squared errors, pixel count) for one class folder."""
        meanValue = 0.44531356896770125  # grayscale mean reported above
        squaredError = 0
        numberOfPixels = 0
        for f in os.listdir(os.path.join(TRAIN_DIR, str(d))):
            if f.endswith(".JPEG"):
                image = imread(os.path.join(TRAIN_DIR, str(d), str(f)))
                # Transform to gray if not already gray
                matrix = np.array(image)
                if matrix.ndim == 3:
                    blue = matrix[:, :, 0] / 255
                    green = matrix[:, :, 1] / 255
                    red = matrix[:, :, 2] / 255
                    gray = 0.2989 * red + 0.587 * green + 0.114 * blue
                else:
                    gray = matrix / 255
                # Vectorized equivalent of looping over every pixel
                squaredError += np.sum((gray - meanValue) ** 2)
                numberOfPixels += gray.size
        return (squaredError, numberOfPixels)

    if __name__ == "__main__":
        folders = [f.name for f in os.scandir(TRAIN_DIR) if f.is_dir()]
        with multiprocessing.Pool() as a_pool:
            resultStD = a_pool.map(calcSTD, folders)
        StD = (sum(r[0] for r in resultStD) / sum(r[1] for r in resultStD)) ** 0.5
        print(StD)
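The code above needs the mean from an earlier pass. If each worker instead returns a sum, a sum of squares and a count, the mean and standard deviation can both be recovered from a single pass by merging the partial results. Below is a minimal standard-library sketch. `partial_stats` and `combine` are hypothetical helpers, and the two short lists stand in for the pixel values of two folders.

```python
import math

def partial_stats(values):
    """Return (sum, sum of squares, count) for one chunk of pixel values."""
    values = list(values)
    return (sum(values), sum(v * v for v in values), len(values))

def combine(partials):
    """Merge per-chunk partial results into a global (mean, std)."""
    total = sum(p[0] for p in partials)
    total_sq = sum(p[1] for p in partials)
    count = sum(p[2] for p in partials)
    mean = total / count
    # Population variance E[x^2] - E[x]^2, clipped to guard against round-off.
    variance = max(total_sq / count - mean ** 2, 0.0)
    return mean, math.sqrt(variance)

# Two toy "folders" of grayscale values in [0, 1].
chunks = [[0.0, 1.0], [0.5, 0.5]]
mean, std = combine([partial_stats(chunk) for chunk in chunks])
```

In a multiprocessing setup, `partial_stats` would play the role of the function passed to `Pool.map`, and `combine` would run once in the parent process.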

Source: https://stackoverflow.com/a/65717887/7156266