MODIS: the many rastered beast, a how to guide

I’ve finally bitten the bullet and brought my data sources up to 2014 using snow cover data from the MODIS project. MODIS flies on two satellites, Aqua and Terra with the latter being space-borne from 2000 and Aqua following two years later. Each satellite collects data from Scotland at least once a day and these are processed to gridded images of various variables. Obviously I am interested in the snow outputs.

Given there are lots of satellites collecting lots of data I thought it would be a relatively straightforward thing to process, albeit requiring a respectable amount of computing time. It turns out not, so here’s my step by step guide to getting something useful out of MODIS data. Many parts of this tutorial will work with other raster time series data, adapt as you see fit!

First to get MODIS data: download tools are available, but these limit you to 1000 files at a time and are a bit of a faff. You can adapt the following script to download via ftp and run it from the command line. For this to work you’ll need lftp installed (sudo apt-get install lftp) and I’m assuming you’re on a Linux box. Obviously you’ll need to adjust the ftp and location depending on your requirements. The following script downloads two files per day: h17v02 and h17v03, to add more just put more numbers in the square brackets.

# download Aqua
lftp -c 'open -e "mirror -I *h17v0[23]*hdf . ." ftp://n5eil01u.ecs.nsidc.org/SAN/MOSA/MYD10A1.005/'
# download Terra
lftp -c 'open -e "mirror -I *h17v0[23]*hdf . ." ftp://n5eil01u.ecs.nsidc.org/SAN/MOST/MOD10A1.005/'

Be warned: the above download can take a long time! For both satellites it took longer than overnight.

The next jobs you might have are:

  1. Translating to a useful format, hdf4 is a total pain!
  2. Merging the two daily images together (you can skip this step if your subject area is contained on one tile)
  3. Using a local projection
  4. Clipping to subject area to reduce file size.

For these I used the excellent gdal package. This is a bedrock of most GIS: QGIS, GRASS, ArcGIS for reading and writing data, but is a standalone tool which you can run from the command line. I chose to run it from R (as it is what I know) but you could also script this in Python, bash, etc.. I recommend trialling this on a data subset first! To make life easier rename the Terra folder Terr, so it is four characters long, like Aqua… crude – but it works. Thanks for hints from this StackExchange page.

# ---------------------------------------
# Raster merge script
# ---------------------------------------

# ---------------------------------------
# Set up
# ---------------------------------------
setwd("~/Directory/MODIS/data/")
install.packages(c("rgdal", "raster", "gdalUtils"))
library(rgdal)
library(gdalUtils)
library(raster)

# ---------------------------------------
# Merge all
# ---------------------------------------

# Find all directories
d = list.dirs(".", recursive=T, full.names=T)
d = d[nchar(d)>10]

# Define subject area in target projection
coord = c(048900, 527000, 475000, 1260953.928)

# Write hdf to single OSGB tif
# Set up a timer so you can estimate how long it will take (based on a subset)
ptm = proc.time()
# Loop through directories
lapply(d, function(i){
   # Progress indicator
   print(i)
   # File list
   f = list.files(i, full.names=T, recursive=T)
   # Check folder has enough files
   if(length(f)==2){
   # Define new file name
   n = paste0(substr(f[1], 1, 18), substr(f[1], 28, 34))
   # Process each folder
   for(j in 1:2){
      # Convert to tif
      # use gdal_info to check which hdf layer you need and adjust sd_index
      gdal_translate(f[j], paste0(n, j, ".tif"), sd_index=1)
   }
   # Merge
   system(paste0("gdal_merge.py ", paste0(n, 1, ".tif"), " ", paste0(n, 2, ".tif"), " -o ", paste0(n, ".tif")))
   # Reproject to OSGB and clip to subject area
   gdalwarp(paste0(n, ".tif"), paste0(n, "OSGB", ".tif"), s_srs="+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs", t_srs="EPSG:27700", srcnodata=-3000, dstnodata=-3000, te=coord)
   # Delete mid step tifs
   f = list.files(i, full.names=T, recursive=T, pattern="tif")
   # Vary this depending on your directory length and choice of file names
   f = f[nchar(f)<31]
   file.remove(f)
   } else {
      print("no data")
   }
})
proc.time() - ptm
rm(ptm)

# ---------------------------------------
# Remove files to rerun test
# ---------------------------------------
f = list.files(".", recursive=T, pattern="tif", full.names=T)
file.remove(f)
rm(f)

So now you should have a lot less disk space and a lot more useable files! Next step is to put them somewhere you can access them. For this I recommend GRASS as a 30+ year mature GIS. Before completing the next section be sure to set up your GRASS location and mapset. To import data I again wrote my scripting in R, purely because it’s what I know. You could easily do this in bash or Python. Key thing is to run GRASS from the command line and then call R from inside GRASS (just ‘R’ and return at the prompt will do it). You can then source your own version of the following code to read all your new tif files into GRASS: source(“~/Directory/MODIS/repo/yourfile.R”)

# ---------------------------------------
# Write to GRASS
# ---------------------------------------

# ---------------------------------------
# Set up
# ---------------------------------------
install.packages("spgrass6")
library(spgrass6)

# ---------------------------------------
# List files
# ---------------------------------------
f = list.files("~/Documents/MODIS/data", recursive=T, full.names=T, pattern="tif")

# ---------------------------------------
# Write to GRASS
# ---------------------------------------
lapply(f, function(i){
   # Write to GRASS
   system(paste0("r.in.gdal input=", i, " output=", substr(i, 32, 35), substr(i, 37, 46)))
})

If you’ve chosen to work with both satellites you might want to merge them together, to reduce cloud pixels for example. You can do this with the GRASS tool r.mapcalc. Here’s an example to only show cloud cells (hence improvement of combining satellite), again scripted from R:

# ---------------------------------------
# Combine satellites
# ---------------------------------------

# ---------------------------------------
# Set up
# ---------------------------------------
library(spgrass6)

# List dates with data from both satellites
# use intern=T to write to R variable
f = execGRASS("g.mlist", parameters = list(type = "rast"), intern=T)
f = as.data.frame(table(substr(f, 5, 14)))
f = f[f$Freq == 2, 1]

# Recode to reduce cloud cells
lapply(f, function(i){
   system(paste0("r.mapcalc 'Comb", i, " = if((Aqua", i, " == 50 && Terr", i, " == 50), 50)
})

Great! But you’re still faced with 1000s of rasters, how can you get anything out of them? Well you have access to the full catelogue of GRASS raster functions. One way is to use r.stats and reduce each raster to a count of cell types. To do this I’m writing the output from each raster to a SQLite database. Again using R from within GRASS:

# ---------------------------------------
# Write raster summary to a database
# ---------------------------------------

# ---------------------------------------
# Set up
# ---------------------------------------
#install.packages("RSQLite")
library(spgrass6)
library(RSQLite)

# ---------------------------------------
# Generate file list
# ---------------------------------------
f = execGRASS("g.mlist", parameters = list(type = "rast"), intern=T)

# ---------------------------------------
# Connect to database (and create table)
# ---------------------------------------
db = dbConnect("SQLite", dbname="~/Directory/MODIS/data/MODIS.sqlite")

dbSendQuery(conn=db,
   "CREATE TABLE data
   (Date DATETIME,
   Sat TEXT,
   Obs TEXT,
   Area INT)
")

# ---------------------------------------
# Loop through rasters
# ---------------------------------------
lapply(f, function(i){
   # Raster summary
   x = execGRASS("r.stats", flags="a", parameters=list(input=i, nv="NA"), intern=T)
   x = as.data.frame(do.call("rbind", strsplit(x, " ")))
   # Convert m2 to km2
   x[,2] = round(as.numeric(levels(x[,2])[x[,2]])/1000000)
   # Generate db entry
   x = data.frame(Date=substr(i, 5, 14), Sat=substr(i, 1, 4), Obs=x[,1], Area=x[,2])
   dbWriteTable(conn=db, name="data", x, append=T, row.names=F)
   }
})

# ---------------------------------------
# Convert Date to date format
# ---------------------------------------
dbSendQuery(conn=db, "
   UPDATE Snow SET Date=replace(Date, '.', '-')
")

# ---------------------------------------
# Remove mask and disconnect from db
# ---------------------------------------
system("r.mask -r")
dbDisconnect(db)

That’s it, you’re basically done! All you need to do now is query your new database and plot the results (running R from wherever):

# ---------------------------------------
# Plot results
# ---------------------------------------

# ---------------------------------------
# Set up
# ---------------------------------------
setwd("~/Directory/MODIS/")
MODIS.db = dbConnect("SQLite", dbname="~/Directory/MODIS/data/MODIS.sqlite")

# ---------------------------------------
# Read sum of cloud cells per month from db
# ---------------------------------------
Cloud = dbGetQuery(MODIS.db, "
   SELECT Date, Sat, Sum(Area) AS Area FROM data
   WHERE Obs=50
   GROUP BY strftime('%Y-%m', Date), Sat
")
# Make Date a Date object
Cloud$Date = as.Date(Cloud$Date)

# ---------------------------------------
# Plot Cloud cell scatter
# ---------------------------------------
x = merge(Cloud[Cloud$Sat=="Comb",], Cloud[Cloud$Sat=="Terr",], by ="Date")
x = merge(x, Cloud[Cloud$Sat=="Aqua",], by ="Date")
x = x[,c(1,3,5,7)]
colnames(x) = c("Date", "Comb", "Terr", "Aqua")

# Mean % improvement per month
y = (100* mean(x$Comb - x$Terr)) / mean(x$Terr)

par(mfrow=c(2,1), mar=c(5, 4, 2, 2))
# Terra vs Aqua
plot(Aqua ~ Terr, x, xlab="Terra (monthly area sum)", ylab="Aqua (monthly area sum)")
# Add 1:1 line
abline(0,1)
# Terra and Aqua vs Comb
plot(Comb ~ Terr, x, xlab="Terra or Aqua (monthly area sum)", ylab="Combined (monthly area sum)", col="red")
points(Comb ~ Aqua, x, col="blue")
# Add 1:1 line
abline(0,1)
legend("topleft", c("Terra", "Aqua"), pch=1, col=c("red", "blue"))
text(1800000, 480000, paste("% improvement = ", round(y)))

And you’re done, your plot should look something like this: merging the satellites reduced the cloud cells by ~13%. Good luck with your own satellite work.

MODIS cloud observations of Scotland, Terra, Aqua and a combination.

MODIS cloud observations of Scotland, Terra, Aqua and a combination.

Advertisements