Reading hdf files into R and converting them to geoTIFF rasters Reading hdf files into R and converting them to geoTIFF rasters r r

Reading hdf files into R and converting them to geoTIFF rasters


Ok, so my MODIS hdf files were hdf4 rather than hdf5 format. It was surprisingly difficult to discover this, MODIS don't mention it on their website but there are a few hints in various blogs and stack exchange posts. In the end I had to download HDFView to find out for sure.

R doesn't do hdf4 files and pretty much all the packages (like rgdal) only support hdf5 files. There are a few posts about downloading drivers and compiling rgdal from source but it all seemed rather complicated and the posts were for MAC or Unix and I'm using Windows.

Basically gdal_translate from the gdalUtils package is the saving grace for anyone who wants to use hdf4 files in R. It converts hdf4 files into geoTIFFs without reading them into R. This means that you can't manipulate them at all e.g. by cropping them, so its worth getting the smallest tiles you can (for MODIS data through something like Reverb) to minimise computing time.

Here's and example of the code:

library(gdalUtils)# Provides detailed data on hdf4 files but takes agesgdalinfo("MOD17A3H.A2000001.h21v09.006.2015141183401.hdf")# Tells me what subdatasets are within my hdf4 MODIS files and makes them into a listsds <- get_subdatasets("MOD17A3H.A2000001.h21v09.006.2015141183401.hdf")sds[1] "HDF4_EOS:EOS_GRID:MOD17A3H.A2000001.h21v09.006.2015141183401.hdf:MOD_Grid_MOD17A3H:Npp_500m"   [2] "HDF4_EOS:EOS_GRID:MOD17A3H.A2000001.h21v09.006.2015141183401.hdf:MOD_Grid_MOD17A3H:Npp_QC_500m"# I'm only interested in the first subdataset and I can use gdal_translate to convert it to a .tifgdal_translate(sds[1], dst_dataset = "NPP2000.tif")# Load and plot the new .tifrast <- raster("NPP2000.tif")plot(rast)# If you have lots of files then you can make a loop to do all this for youfiles <- dir(pattern = ".hdf")files [1] "MOD17A3H.A2000001.h21v09.006.2015141183401.hdf" "MOD17A3H.A2001001.h21v09.006.2015148124025.hdf" [3] "MOD17A3H.A2002001.h21v09.006.2015153182349.hdf" "MOD17A3H.A2003001.h21v09.006.2015166203852.hdf" [5] "MOD17A3H.A2004001.h21v09.006.2015099031743.hdf" "MOD17A3H.A2005001.h21v09.006.2015113012334.hdf" [7] "MOD17A3H.A2006001.h21v09.006.2015125163852.hdf" "MOD17A3H.A2007001.h21v09.006.2015169164508.hdf" [9] "MOD17A3H.A2008001.h21v09.006.2015186104744.hdf" "MOD17A3H.A2009001.h21v09.006.2015198113503.hdf"[11] "MOD17A3H.A2010001.h21v09.006.2015216071137.hdf" "MOD17A3H.A2011001.h21v09.006.2015230092603.hdf"[13] "MOD17A3H.A2012001.h21v09.006.2015254070417.hdf" "MOD17A3H.A2013001.h21v09.006.2015272075433.hdf"[15] "MOD17A3H.A2014001.h21v09.006.2015295062210.hdf"filename <- substr(files,11,14)filename <- paste0("NPP", filename, ".tif")filename[1] "NPP2000.tif" "NPP2001.tif" "NPP2002.tif" "NPP2003.tif" "NPP2004.tif" "NPP2005.tif" "NPP2006.tif" "NPP2007.tif" "NPP2008.tif"[10] "NPP2009.tif" "NPP2010.tif" "NPP2011.tif" "NPP2012.tif" "NPP2013.tif" "NPP2014.tif"i <- 1for (i in 1:15){  sds <- get_subdatasets(files[i])  gdal_translate(sds[1], dst_dataset = filename[i])}

Now you can read your .tif files into R using, for example, raster from the raster package and work as normal. I've checked the resulting files against a few I converted manually using QGIS and they match so I'm confident the code is doing what I think it is. Thanks to Loïc Dutrieux and this for the help!


The following worked for me. It's a short program and just takes in the input folder name. Make sure you know which sub data you want. I was interested in sub data 1.

library(raster)library(gdalUtils)inpath <- "E:/aster200102/ast_200102"setwd(inpath)filenames <- list.files(,pattern=".hdf$",full.names = FALSE)for (filename in filenames){     sds <- get_subdatasets(filename)     gdal_translate(sds[1], dst_dataset=paste0(substr(filename, 1, nchar(filename)-4) ,".tif"))}


This script has been very useful and I managed to convert a batch of 36 files using it. However, my problem is that the conversion does not seem correct. When I do it using ArcGIS 'Make NetCDF Raster Layer tool', I get different results + I am able to convert the numbers to C from Kelvin using simple formula: RasterValue * 0.02 - 273.15. With the results from R conversion I don't get the right results after conversion which leads me to believe ArcGIS conversion is good, and R conversion returns an error.

library(gdalUtils)library(raster)setwd("D:/Data/Climate/MODIS")# Get a list of sds namessds <- get_subdatasets('MOD11C3.A2009001.006.2016006051904.hdf')# Isolate the name of the first sdsname <- sds[1]filename <- 'Rasterinr.tif'gdal_translate(sds[1], dst_dataset = filename)# Load the Geotiff created into Rr <- raster(filename)# Identify files to read:rlist=list.files(getwd(), pattern="hdf$", full.names=FALSE)# Substract last 5 digits from MODIS filename for use in a new .img filenamesubstrRight <- function(x, n){        substr(x, nchar(x)-n+1, nchar(x))}filenames0 <- substrRight(rlist,9)# Suffixes for MODIS files for identyfication:filenamessuffix <- substr(filenames0,1,5)listofnewnames <- c("2009.01.MODIS_","2009.02.MODIS_","2009.03.MODIS_","2009.04.MODIS_","2009.05.MODIS_",                    "2009.06.MODIS_","2009.07.MODIS_","2009.08.MODIS_","2009.09.MODIS_","2009.10.MODIS_",                    "2009.11.MODIS_","2009.12.MODIS_",                    "2010.01.MODIS_","2010.02.MODIS_","2010.03.MODIS_","2010.04.MODIS_","2010.05.MODIS_",                    "2010.06.MODIS_","2010.07.MODIS_","2010.08.MODIS_","2010.09.MODIS_","2010.10.MODIS_",                    "2010.11.MODIS_","2010.12.MODIS_",                    "2011.01.MODIS_","2011.02.MODIS_","2011.03.MODIS_","2011.04.MODIS_","2011.05.MODIS_",                    "2011.06.MODIS_","2011.07.MODIS_","2011.08.MODIS_","2011.09.MODIS_","2011.10.MODIS_",                    "2011.11.MODIS_","2011.12.MODIS_")# Final new names for converted files:newnames <- vector()for (i in 1:length(listofnewnames)) {        newnames[i] <- paste0(listofnewnames[i],filenamessuffix[i],".img")}# Loop converting files to raster from NetCDFfor (i in 1:length(rlist)) {        sds <- get_subdatasets(rlist[i])        gdal_translate(sds[1], dst_dataset = newnames[i])}