Operations on giant files

I’ve been preparing a paper recently on the Snow Survey of Great Britain. As part of this I’ve needed to work with some elevation data. I’d usually do this using a GIS like QGIS or GRASS.

In this case I needed to check some GRASS results. My aim was to open the file in R and sum some values. Unfortunately the raw data, in x,y,z form, was ~4 GB. R tends to need roughly twice the file size in RAM, and my work machine has 8 GB. Needless to say, with other system processes needing memory as well, it didn’t work out well!

Thankfully I didn’t require the x and y columns in the data file. I had a quick search online and came up with the following line of code to keep the third column but ditch the other two:

awk -F " " '{print $3}' ~/filepath/originalfile > ~/filepath/newfile

I ran this in the Linux terminal, which worked through the 4 GB file in ~30 seconds. Impressive!

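Note: the " " after -F is the field delimiter, in this case a space. A single space is also awk’s default, and it treats any run of spaces or tabs as one separator.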
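Since my end goal was only to sum some values, the same single-pass approach could in principle do the summing too, without R ever loading the file. What follows is a minimal sketch rather than what I actually ran: the three-line sample file is made up to stand in for the real x,y,z data, and it assumes a plain sum of the third column is all that is needed.

```shell
# Hypothetical sample standing in for the real x,y,z file
tmpfile=$(mktemp)
printf '1 2 10.5\n3 4 20.25\n5 6 0.25\n' > "$tmpfile"

# Add up the third column line by line; only the running total is held in memory
total=$(awk -F " " '{ sum += $3 } END { printf "%.2f\n", sum }' "$tmpfile")
echo "$total"

# Tidy up the sample file
rm -f "$tmpfile"
```

The printf format is there because awk’s plain print rounds large non-integer totals to about six significant digits by default, which could hide detail when summing a column that size.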

Another useful quick line is the following, which displays the first few lines of the file (ten by default) so you can check the file structure.

head ~/filepath/file
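A couple of small variations on this can be handy before committing to a column number. The sketch below uses a made-up four-line sample file (not my real data) and assumes space-delimited columns: it shows a chosen number of lines with -n, then asks awk how many fields it sees on the first line.

```shell
# Hypothetical sample standing in for the real file
sample=$(mktemp)
printf '1 2 10.5\n3 4 20.25\n5 6 0.25\n7 8 1.0\n' > "$sample"

# Show only the first two lines instead of the default ten
first_lines=$(head -n 2 "$sample")
echo "$first_lines"

# Count the fields awk sees on the first line (NF is awk's built-in field count)
fields=$(head -n 1 "$sample" | awk '{ print NF }')
echo "$fields"

# Tidy up the sample file
rm -f "$sample"
```

If the field count is not what you expect, the delimiter passed to -F is probably wrong for that file.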

As scary as the terminal is, it’s enormously powerful and thankfully other people have cracked its dark ways so a quick google often throws up the code you’re looking for. A minor alteration later and you’ve saved loads of time!

For MS Windows users out there, don’t despair. You can download a Linux OS image and run it from a USB drive or DVD! Amongst other things this is an excellent way to:

  • Remove difficult viruses
  • Access your file system if you’ve forgotten your password (provided your hard drive isn’t encrypted)
  • Try out a flavour of Linux to see if you like it and it runs easily on your machine
  • Have a secure OS when you’re using computers in strange places…
  • Scare your friends/colleagues by accessing their computer without a password

My Linux choice is Ubuntu, for the simple reasons that I was introduced to it by friends and colleagues, and I like it!

Give it a shot, what are you waiting for?!
