Lubridate/ggplot date helpers

This post collates a couple of functions to help with dates. I often work with daily data that span multiple years, but want to visualise annual patterns. To do this I can extract the day of the year (often loosely called the Julian day) for each date. Here are a couple of ways to do this:

# Olden days (returns a zero-padded character string)
format(Sys.Date(), format="%j")

# Tidyverse
library(lubridate)
yday(Sys.Date())

This site is a great resource for more date formats. Otherwise, you can view the lubridate website for guides.
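
The two approaches return different types, which matters once the value goes on a numeric axis. A minimal sketch, using two fixed dates (chosen here purely for illustration) instead of Sys.Date() so the result is reproducible:

```r
library(lubridate)

# 1 March in a common year and in a leap year (illustrative dates)
d = as.Date(c("2019-03-01", "2020-03-01"))

# Base R: %j yields a zero-padded character string, so convert it before plotting
doy_base = as.integer(format(d, "%j"))

# lubridate: yday() returns a number directly
doy_lub = yday(d)

doy_base  # 60 61
doy_lub   # 60 61
```

The same calendar date lands one day later in a leap year, which is worth remembering when choosing a reference year for axis breaks.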

So far, so good. However, most readers like their axis labels spelt out for them and prefer month labels on an annual axis to a numeric day of year. Let’s grab some data to demonstrate. Here’s the citation, with download and cleaning below:

Fetterer, F., K. Knowles, W. N. Meier, M. Savoie, and A. K. Windnagel. 2017, updated daily. Sea Ice Index, Version 3. [N seaice extent daily]. Boulder, Colorado USA. NSIDC: National Snow and Ice Data Center. doi: https://doi.org/10.7265/N5K072F8. [2020-04-24].

library(tidyverse)

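# Create the target folder first, otherwise download.file() fails with "No such file or directory"
dir.create("Downloads", showWarnings = FALSE)
download.file("ftp://sidads.colorado.edu/DATASETS/NOAA/G02135/north/daily/data/N_seaice_extent_daily_v3.0.csv",
              "Downloads/arctic_ice.csv")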

# Read the header row on its own to recover the column names
df_names = read_csv("Downloads/arctic_ice.csv",
                    col_names = FALSE,
                    n_max = 1)

# Skip the first two (non-data) rows, build a date, and blank out extents flagged as missing
df = read_csv("Downloads/arctic_ice.csv",
              skip = 2,
              col_names = as.character(df_names)) %>%
  janitor::clean_names() %>%
  mutate(date = paste(year, month, day, sep = "-"),
         date = as.Date(date),
         extent = replace(extent, missing > 0, NA)) %>%
  select(date, extent)

wrap_width = scales::wrap_format(150)
ice_cite = wrap_width("Fetterer, F., K. Knowles, W. N. Meier, M. Savoie, and A. K. Windnagel. 2017, updated daily. Sea Ice Index, Version 3. [N seaice extent daily]. Boulder, Colorado USA. NSIDC: National Snow and Ice Data Center. doi: https://doi.org/10.7265/N5K072F8. [2020-04-24].")

We’ve now got a two-column tibble containing date and sea ice extent. We can check this by plotting the data. (Something similar can be achieved with base graphics using plot(df)):

ggplot(df, aes(date, extent)) +
  geom_line() +
  labs(title = "N. hemisphere sea ice extent",
       x = "Year",
       y = "Extent (10^6 sq km)",
       caption = ice_cite) +
  theme(text = element_text(size = 15))

What do our data look like when we overlay the years, i.e. the problem posed at the beginning of this post? Here the date is replaced by its day of year, and each year is drawn as its own line:

df %>% 
  mutate(year = year(date),
         date = yday(date)) %>% 
  ggplot(aes(date, extent,
             group = year,
             colour = year)) +
  geom_line() +
  scale_colour_viridis_c() +
  labs(title = "Annual fluctuation in N. hemisphere sea ice extent",
       subtitle = "Day of year on x axis",
       x = "Day of year",
       y = "Extent (10^6 sq km)",
       colour = "Year",
       caption = ice_cite) +
  theme(text = element_text(size = 15))

The above looks fine and is perfect for exploratory data analysis: the annual pattern is immediately visible. However, other viewers may wish or expect to see month labels on the x axis. We can do this by setting up a tibble of the day-of-year values we’d like on the axis together with their month names, then passing these as the breaks and labels for the axis (no doubt there is a neater, function-based way of doing this). It’s not perfect: the labels sit at the start of each month, and because months differ in length it’s not easy to place them in the middle (one option is to use the 15th of each month). Grid lines spaced to match the month boundaries could make sense, but they would be awkward to implement and unexpected to viewers!

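# 2016 is a leap year, so days after February are one higher than in a common year
x = as.Date(c("2016-02-01",
              "2016-04-01",
              "2016-06-01",
              "2016-08-01",
              "2016-10-01"))

# Month label (as plain text) and day of year for each axis break
doy = tibble(mon = as.character(month(x, label = TRUE)),
             jul = yday(x))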

df %>% 
  mutate(year = year(date),
         date = yday(date)) %>% 
  ggplot(aes(date, extent,
             group = year,
             colour = year)) +
  geom_line() +
  scale_x_continuous(breaks = doy$jul, labels = doy$mon) +
  scale_colour_viridis_c() +
  labs(title = "Annual fluctuation in N. hemisphere sea ice extent",
       subtitle = "Month on x axis",
       x = "",
       y = "Extent (10^6 sq km)",
       colour = "Year",
       caption = ice_cite) +
  theme(text = element_text(size = 15))
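
The hand-typed vector of dates can also be generated. The sketch below is one possible way to do it, under a few assumptions: month_breaks() is a hypothetical helper name (it is not part of lubridate or ggplot2), the reference year 2019 is an arbitrary common year, and the labels come from base R's month.abb. Setting day = 15 gives the mid-month option mentioned above.

```r
# Hypothetical helper: day-of-year breaks and month labels for an annual axis.
# `year` only matters for leap-year handling; `day` is the day of month to label.
month_breaks = function(year = 2019, day = 1, months = 1:12) {
  d = as.Date(sprintf("%d-%02d-%02d", year, months, day))
  data.frame(mon = month.abb[months],
             jul = as.integer(format(d, "%j")),
             stringsAsFactors = FALSE)
}

starts = month_breaks()                                   # 1st of every month
mids = month_breaks(day = 15)                             # 15th of every month
every_other = month_breaks(months = seq(2, 10, by = 2))   # Feb, Apr, ..., Oct
```

Any of these can be passed to the axis in the same way as before, e.g. scale_x_continuous(breaks = starts$jul, labels = starts$mon).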

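The above is OK, but as mentioned the label position is problematic: each label is centred on the tick that marks the start of its month. We can solve this by hacking at the ggplot theme, lengthening the ticks and nudging the labels up and to the right so that each one sits inside its month. We could also label the beginning of every month with this solution, but I haven’t here.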

df %>% 
  mutate(year = year(date),
         date = yday(date)) %>% 
  ggplot(aes(date, extent,
             group = year,
             colour = year)) +
  geom_line() +
  scale_x_continuous(breaks = doy$jul, labels = doy$mon) +
  scale_colour_viridis_c() +
  labs(title = "Annual fluctuation in N. hemisphere sea ice extent",
       subtitle = "Month on x axis",
       x = "",
       y = "Extent (10^6 sq km)",
       colour = "Year",
       caption = ice_cite) +
  theme(text = element_text(size = 15),
        axis.ticks.length.x = unit(0.5, "cm"),
        axis.text.x = element_text(vjust = 5.5,
                                   hjust = -0.2))
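
An alternative to nudging the text is to put the labels at the middle of each month and draw the grid lines at the month boundaries, which is the variable grid spacing mentioned earlier. This is only a sketch under stated assumptions: the reference year 2019 is arbitrary, the names starts, mids and toy are made up for illustration, and the curve is a synthetic placeholder rather than the sea ice data.

```r
library(ggplot2)

# Day of year of the 1st of each month in a common year (2019 chosen arbitrarily)
starts = as.integer(format(as.Date(sprintf("2019-%02d-01", 1:12)), "%j"))

# Midpoint of each month: halfway between consecutive month starts
mids = (starts + c(starts[-1], 366)) / 2

# Synthetic placeholder curve, NOT the sea ice data
toy = data.frame(doy = 1:365, value = cos(2 * pi * (1:365) / 365))

p = ggplot(toy, aes(doy, value)) +
  geom_line() +
  # labels at mid-month, minor grid lines at the month boundaries
  scale_x_continuous(breaks = mids, labels = month.abb, minor_breaks = starts) +
  # hide the major grid lines and ticks that would otherwise mark the mid-month points
  theme(panel.grid.major.x = element_blank(),
        axis.ticks.x = element_blank())
```

The trade-off is the one noted above: the grid lines are unevenly spaced, which some viewers may not expect.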

Finally, we can apply this idea to seasons:

season_lab = tibble(jul = yday(as.Date(c("2019-03-01",
                                         "2019-06-01",
                                         "2019-09-01",
                                         "2019-12-01"))),
                    lab = c("Spring", "Summer", "Autumn", "Winter"))

df %>% 
  mutate(year = year(date),
         date = yday(date)) %>% 
  ggplot(aes(date, extent,
             group = year,
             colour = year)) +
  geom_line() +
  scale_x_continuous(breaks = season_lab$jul, labels = season_lab$lab) +
  scale_colour_viridis_c() +
  labs(title = "Annual fluctuation in N. hemisphere sea ice extent",
       subtitle = "Season on x axis",
       x = "",
       y = "Extent (10^6 sq km)",
       colour = "Year",
       caption = ice_cite) +
  theme(text = element_text(size = 15),
        axis.ticks.length.x = unit(0.5, "cm"),
        axis.text.x = element_text(vjust = 5.5,
                                   hjust = -0.2))

We can even write a function to convert dates into seasons for fancy plotting/tables/etc.:

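season = function(in_date){
  # First day of each meteorological season, as day of year in a common year
  br = yday(as.Date(c("2019-03-01",
                      "2019-06-01",
                      "2019-09-01",
                      "2019-12-01")))
  x = yday(in_date)
  # right = FALSE gives intervals [start, end), so 1 March (day 60) falls in Spring
  # (in leap years every boundary lands one day early)
  x = cut(x, breaks = c(1, br, 367), right = FALSE)
  # January-February and December share a label, so the two Winter bins merge
  levels(x) = c("Winter", "Spring", "Summer", "Autumn", "Winter")
  x
}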

df %>% 
  mutate(year = year(date),
         sea = season(date)) %>% 
  group_by(year, sea) %>% 
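  summarise(obs = n(),
            # na.rm = TRUE because extent was set to NA wherever data were flagged missing
            q25 = quantile(extent, 0.25, na.rm = TRUE),
            q50 = quantile(extent, 0.5, na.rm = TRUE),
            q75 = quantile(extent, 0.75, na.rm = TRUE)) %>% 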
  filter(obs > 40) %>% 
  ggplot(aes(year, q50)) +
  geom_pointrange(aes(ymin = q25, ymax = q75)) +
  facet_wrap(~sea, scales = "free_y") +
  labs(title = "Seasonal change in N. hemisphere sea ice extent",
       subtitle = "Showing median and interquartile range",
       x = "Year",
       y = "Extent (10^6 sq km)",
       caption = ice_cite) +
  theme(text = element_text(size = 15),
        axis.ticks.length.x = unit(0.5, "cm"),
        axis.text.x = element_text(vjust = 5.5,
                                   hjust = -0.2))
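
Because season() works from day-of-year cut points, its boundaries shift by a day in leap years. One way around this is to key on the month instead. The following is a minimal sketch of that alternative, not a replacement for the approach above; season_by_month is a hypothetical name and the month-to-season mapping assumes the same meteorological seasons used in this post.

```r
# Hypothetical alternative: assign meteorological seasons from the month number,
# which sidesteps leap-year shifts in day of year.
season_by_month = function(in_date) {
  mon = as.integer(format(in_date, "%m"))
  lab = c("Winter", "Winter",
          "Spring", "Spring", "Spring",
          "Summer", "Summer", "Summer",
          "Autumn", "Autumn", "Autumn",
          "Winter")
  factor(lab[mon], levels = c("Winter", "Spring", "Summer", "Autumn"))
}

res = season_by_month(as.Date(c("2020-02-29", "2020-03-01", "2019-12-01")))
as.character(res)  # "Winter" "Spring" "Winter"
```

It drops into the pipeline above in the same place, e.g. mutate(sea = season_by_month(date)).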

5 Comments

  1. ## very useful post but try this for month labelled x-axis
    ## DoY needs to be a date
    ## then scale_x_date(date_breaks = "months", date_labels = "%b") …

    df %>%
      mutate(year = year(date),
             # added origin for plotting code further down; subtract 1 so day 1 maps to 1 January
             DoY = as.Date(yday(date) - 1, "1970-01-01")) %>%
      ggplot(aes(DoY, extent,
                 group = year,
                 colour = year)) +
      theme_bw() +
      geom_line() +
      scale_colour_viridis_c() +
      labs(title = "annual fluctuation in northern hemisphere sea ice extent",
           x = "\nmonth\n",
           y = "extent (10^6 sq km)\n",
           colour = " year\n",
           caption = ice_cite) +
      theme(text = element_text(size = 15)) +
      scale_x_date(date_breaks = "months", date_labels = "%b") +
      theme(panel.grid.minor = element_blank()) +
      theme(plot.title = element_text(hjust = 0.5), plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm")) +
      theme(axis.text = element_text(size = 12), axis.title = element_text(size = 12), title = element_text(size = 14)) +
      theme(panel.border = element_rect(colour = "black", fill = NA, size = 0.6)) +
      theme(axis.text.y = element_text(margin = unit(c(0, 2, 0, 0), "mm"))) +
      theme(axis.text.x = element_text(margin = unit(c(2, 0, 0, 0), "mm"))) +
      theme(axis.text.y = element_text(angle = 90, hjust = 0.5))

  2. When I try to run the R script I get this:

    cannot open destination file ‘Downloads/arctic_ice.csv’, reason ‘No such file or directory’

    Please help me …
