Writing to a database (R and SQLite)

Some time ago I wrote a post on preparing data for a database. Since then I’ve not been idle, just busy not writing for my blog! This article is a follow up and describes how to add data to a database. For this I’ll use R for the leg work and SQLite as the database. In reality as SQLite uses SQL you could also used any other SQL based database and it wouldn’t take much work to use something like Python instead of R to write data.

Once you’ve got R installed (it’s free and runs on pretty much all operating systems) you should make sure you can access SQLite appropriately. This will almost certainly already be installed on your computer as it’s a backend to web browsers and other software. There’s also an excellent plugin, SQLite-manager, for Firefox that lets you access SQLite. The best place to get this is through Firefox add-ons.

To access SQLite through R you’ll need the RSQLite package, add this to the start of your script (first line you only need to do once):

install.packages("RSQLite")
library(RSQLite)

I created the shell of my database using SQLite-manager, i.e. set up the most basic tables, but you can equally do this through RSQLite. From the proviso that you’ve already created your database file, connect to it from R by:

db = dbConnect(SQLite(), dbname="~/directory/your-db.sqlite")

If you haven’t already created your database, the dbConnect line above will create it for you and you can set up a table using the below. There is included provision to set a primary key and also specify data types (INT = interger). Avoid using . in column or table names – this will confuse SQLite.

dbSendQuery(conn=db,
   "CREATE TABLE SSGB_obs
   (Date DATETIME,
   Station TEXT,
   Snowline TEXT,
   PRIMARY KEY (Date, Station))
")

You can then do basic tasks like checking what tables are available, or what columns are in a table:

# List tables in your database
dbListTables(db)

# List columns in a table
dbListFields(db, "SSGB_obs")

Super basic, but what about writing to the database, as promised with the blog title?! For my work I have 60+ csv files of Snow Survey data with columns belonging to different stations and rows for different days. To make matters more complicated the station columns don’t always appear in the same order and the same stations don’t occur each year.

After a discussion with the excellent Magi Hagdorn I settled on a table format with columns of Date, Station and Snowline. This format makes most sense for database querying and managing data, and it’s relatively simple to transform data to a format I need.

To deal with this I wrote a short script to read each csv file and write it to the SQLite database. This script works by first listing all the files in the directory, then reading in each file in that list and within that file processing each column at a time to be written to the database in the required format.

# List files
f = list.files("./csv/clean/")
# Read csvs
# Format for database
# Write to database
for (i in f){
  x = read.table(paste0("./csv/clean/", i), sep ="|", stringsAsFactors=F)
  for (j in names(x[-1])){
    Date = x[-1,1]
    Station = x[1,j]
    Snowline = x[-1,j]
    data = cbind.data.frame(Date,Station,Snowline)
    dbWriteTable(conn=db, name="SSGB_obs", data, append=T, row.names=F)
  }
}

In a handful of lines of code I went from nearly unuseable data to only needing a few lines of SQL to be able to use all my data! In future blog posts I’ll post some tips for working with your new SQLite database through R, and perhaps more excitingly – Scottish snow cover trends from all the Snow Survey sites.

As a teaser, here’s how you query (all data) from you SSGB_obs table in your new database:

x = dbGetQuery(db, "
   SELECT * FROM SSGB_obs
"))
Advertisements