How to load a list of numpy arrays to pytorch dataset loader?



I think what DataLoader actually requires is an input that subclasses Dataset. You can either write your own dataset class that subclasses Dataset, or use TensorDataset as I have done below:

import torch
import numpy as np
from torch.utils.data import TensorDataset, DataLoader

my_x = [np.array([[1.0, 2], [3, 4]]), np.array([[5., 6], [7, 8]])]  # a list of numpy arrays
my_y = [np.array([4.]), np.array([2.])]  # another list of numpy arrays (targets)

tensor_x = torch.Tensor(my_x)  # transform to torch tensor
tensor_y = torch.Tensor(my_y)

my_dataset = TensorDataset(tensor_x, tensor_y)  # create your dataset
my_dataloader = DataLoader(my_dataset)  # create your dataloader
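For completeness, a quick sketch of iterating over that dataloader (reusing the names from the snippet above; the shapes follow from the toy data):

for batch_x, batch_y in my_dataloader:
    # with the default batch_size=1, batch_x has shape (1, 2, 2) and batch_y has shape (1, 1)
    print(batch_x.shape, batch_y.shape)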

Works for me. Hope it helps you.


Since you have images, you probably want to perform transformations on them, so TensorDataset is not the best option here. Instead, you can create your own Dataset. Something like this:

import torch
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
import numpy as np
from PIL import Image

class MyDataset(Dataset):
    def __init__(self, data, targets, transform=None):
        self.data = data
        self.targets = torch.LongTensor(targets)
        self.transform = transform

    def __getitem__(self, index):
        x = self.data[index]
        y = self.targets[index]

        if self.transform:
            x = Image.fromarray(self.data[index].astype(np.uint8).transpose(1, 2, 0))
            x = self.transform(x)

        return x, y

    def __len__(self):
        return len(self.data)

# Let's create 10 RGB images of size 128x128 and 10 labels {0, 1}
data = list(np.random.randint(0, 255, size=(10, 3, 128, 128)))
targets = list(np.random.randint(2, size=(10)))

transform = transforms.Compose([transforms.Resize(64), transforms.ToTensor()])
dataset = MyDataset(data, targets, transform=transform)
dataloader = DataLoader(dataset, batch_size=5)
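If it helps, here is a sketch of what a batch from that dataloader looks like (assuming the snippet above ran as-is):

for batch_x, batch_y in dataloader:
    print(batch_x.shape)  # torch.Size([5, 3, 64, 64]) after Resize(64) and ToTensor()
    print(batch_y.shape)  # torch.Size([5])
    break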


A PyTorch DataLoader needs a Dataset, as you can check in the docs. The right way to do that is to use:

torch.utils.data.TensorDataset(*tensors)

Which is a Dataset for wrapping tensors, where each sample will be retrieved by indexing tensors along the first dimension. The parameter *tensors means tensors that have the same size in the first dimension.
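To illustrate the same-first-dimension requirement, here is a minimal sketch (the variable names are mine, not from the docs):

import torch
from torch.utils.data import TensorDataset

features = torch.randn(100, 5)        # 100 samples, 5 features each
labels = torch.randint(0, 2, (100,))  # 100 labels
ds = TensorDataset(features, labels)  # OK: both tensors have size 100 in dim 0
print(ds[0])                          # (features[0], labels[0])
# TensorDataset(torch.randn(100, 5), torch.randn(99)) would raise an
# AssertionError, because the first dimensions do not match.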

The other class torch.utils.data.Dataset is an abstract class.

Here is how to convert numpy arrays to tensors:

import torch
import numpy as np

n = np.arange(10)
print(n)  # [0 1 2 3 4 5 6 7 8 9]

t1 = torch.Tensor(n)  # as torch.float32
print(t1)  # tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

t2 = torch.from_numpy(n)  # as torch.int32
print(t2)  # tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=torch.int32)
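One difference worth knowing (my addition, not part of the original answer): torch.from_numpy shares memory with the source array, while torch.Tensor makes a copy. A quick sketch:

n = np.arange(5)
t = torch.from_numpy(n)
n[0] = 99
print(t[0])  # prints 99 -- the tensor sees the change because the memory is shared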

The accepted answer used the torch.Tensor constructor. If you have an image with pixels in the range 0-255 you may use this:

timg = torch.from_numpy(img).float()

Or the torchvision to_tensor method, which converts a PIL Image or numpy.ndarray to a tensor.
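For example, a sketch using to_tensor on a fake H x W x C uint8 image (my own toy data):

import numpy as np
from torchvision.transforms.functional import to_tensor

img = np.random.randint(0, 255, size=(128, 128, 3), dtype=np.uint8)
t = to_tensor(img)
print(t.dtype, t.shape)  # torch.float32 torch.Size([3, 128, 128]), values scaled to [0, 1]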


But here is a little trick: you can put your numpy arrays in directly.

import numpy as np
from torch.utils.data import DataLoader

x1 = np.array([1, 2, 3])
d1 = DataLoader(x1, batch_size=3)

This also works, but if you print the type of d1.dataset:

print(type(d1.dataset)) # <class 'numpy.ndarray'>

However, we actually need tensors to work with CUDA, so it is better to feed the DataLoader tensors.
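A sketch of the tensor-based version of the same trick (the name d2 is mine):

import numpy as np
import torch
from torch.utils.data import DataLoader

x1 = np.array([1, 2, 3])
d2 = DataLoader(torch.from_numpy(x1), batch_size=3)
print(type(d2.dataset))  # <class 'torch.Tensor'>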