
Efficient way of inputting large raster data into PyTables


I do not have a GeoTIFF file, so I fiddled around with a normal TIFF file. You may have to omit the 3 in the shape and the leading slice when writing the data to the PyTables file. Essentially, I loop over the array without reading everything into memory in one go. You have to adjust n_chunks so that the chunk read in one go does not exceed your system memory.

from osgeo import gdal
import numpy as np
import tables as tb

ds = gdal.Open('infile.tif')
x_total, y_total = ds.RasterXSize, ds.RasterYSize
n_chunks = 100

f = tb.openFile('myhdf.h5', 'w')   # open_file() in PyTables >= 3
dataset = f.createCArray(f.root, 'mydata', atom=tb.Float32Atom(),
                         shape=(3, y_total, x_total))  # create_carray() in PyTables >= 3

# prepare the chunk indices
x_offsets = np.linspace(0, x_total, n_chunks).astype(int)
x_offsets = list(zip(x_offsets[:-1], x_offsets[1:]))
y_offsets = np.linspace(0, y_total, n_chunks).astype(int)
y_offsets = list(zip(y_offsets[:-1], y_offsets[1:]))

# read one window at a time from the raster and write it into the CArray
for x1, x2 in x_offsets:
    for y1, y2 in y_offsets:
        dataset[:, y1:y2, x1:x2] = ds.ReadAsArray(xoff=x1, yoff=y1,
                                                  xsize=x2 - x1, ysize=y2 - y1)

f.close()
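
For completeness, here is a minimal sketch of reading a window back out of the resulting HDF5 file to check the write. It assumes the file and node names used above ('myhdf.h5', 'mydata') and the PyTables 3.x naming (open_file):

import tables as tb

# Reopen the file read-only and pull one window back out; slicing the CArray
# only loads the requested region into memory, not the whole raster.
with tb.open_file('myhdf.h5', 'r') as f:
    carray = f.root.mydata            # node created as 'mydata' above
    window = carray[:, 0:512, 0:512]  # (bands, rows, cols) window
    print(window.shape, window.dtype)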