Electronic organisation of projects

I’ve been working on a number of different small projects recently, so I have been improving my file management. When I’m setting up a new project/thesis chapter/paper/etc. I’ve been trying to do it in a uniform way, so that it’s easy to switch between them.

This is what I’ve found to work well for my folder structure:

[Figure: File_structure, the project folder layout]

Most of the folders are self-explanatory; repo contains the code and scripts I use to process and plot data. I might also add a GIS folder, but I don’t tend to do much point-and-click GIS work these days.

More usefully, I’ve also been thinking about how to structure code, mainly to make it more manageable and to reduce repetition. Because I work with the R language, I generally settle for the following default files within a repo directory:

  • calls.R calls data.R, munging.R and plots.R, so I don’t have to type out each one
  • data.R contains functions/scripts to load data
  • munging.R contains functions to process data
  • plots.R contains plotting functions
  • paper.R calls plots for the paper
  • poster.R calls plots for the poster
  • presentation.R calls plots for the presentation
  • summary.Rmd contains analysis notes
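A minimal shell sketch for creating that default repo directory in a new project might look like this. It assumes calls.R loads the other scripts with R’s source() function and that R is started from inside the repo folder; the project name my_project is a hypothetical placeholder.

```shell
#!/usr/bin/env bash
# Sketch: scaffold a project's repo directory with the default R files.
# Assumptions: the project folder is given as the first argument
# (defaulting to the hypothetical "my_project"), and calls.R uses
# source() with paths relative to the repo folder.
set -euo pipefail

project="${1:-my_project}"
repo="$project/repo"
mkdir -p "$repo"

# Empty placeholders; touch leaves any existing file untouched
for f in data.R munging.R plots.R paper.R poster.R presentation.R summary.Rmd; do
  touch "$repo/$f"
done

# calls.R loads the data, munging and plotting code in one go.
# Only written if missing, so rerunning the script won't clobber edits.
if [ ! -e "$repo/calls.R" ]; then
  cat > "$repo/calls.R" <<'EOF'
# Load everything needed for a session (run from the repo folder)
source("data.R")
source("munging.R")
source("plots.R")
EOF
fi

ls "$repo"
```

Keeping the file list in one loop means a new project always starts with the same layout, which is the point of the uniform structure.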

The paper.R, poster.R and presentation.R files all do the same thing, but produce plots with different graph and font sizes. With this in mind, I make sure the plotting functions in plots.R let me specify those sizes as well as the dataset!

I use the summary.Rmd file like a lab notebook. It is written in R Markdown, which lets you format text and include code snippets and plot outputs. You can see an example here. The best thing about it is that you can output to an HTML file, which any computer can open with nothing more than a web browser. That makes it really easy to share your ideas!
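As a rough illustration, the sketch below writes a minimal summary.Rmd: a YAML header asking for HTML output, a line of notes and one R code chunk. The title, the notes and the use of R’s built-in cars dataset are placeholders. The three backticks that open and close the R chunk are held in a variable only so the snippet can sit inside this code block; in a real .Rmd file you type them directly.

```shell
#!/usr/bin/env bash
# Sketch: create a minimal R Markdown "lab notebook" file.
# Title, notes and the built-in "cars" dataset are placeholders.
set -euo pipefail

# Three backticks mark an R chunk; stored in a variable only to avoid
# clashing with the surrounding code block.
fence='```'

cat > summary.Rmd <<EOF
---
title: "Analysis notes"
output: html_document
---

## First look at the data

Notes on what I tried today, with the code and its output inline.

${fence}{r}
summary(cars)
plot(cars)
${fence}
EOF
```

Assuming the rmarkdown package (and pandoc) is installed, running Rscript -e 'rmarkdown::render("summary.Rmd")' in the same folder should produce summary.html, the file you can share with anyone who has a web browser.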

I’ve recently started using version control for my repo directories and haven’t looked back. It’s great not having to stress about making big changes to code, and having a fully documented change log. My tool of choice is Git, using the Bitbucket service. The main reason I choose Bitbucket over GitHub is that you get free private repositories. You can view or download my template repo files here: https://bitbucket.org/mikerspencer/repo

This is a great resource for getting started with Git. Once I have set up a repo, the three Git commands I use most are:

# Stage new and changed files in the current directory and below
git add .
# Commit these to be uploaded with a description of the update
git commit -m "What I did to my code/document"
# Push it to the remote storage
git push origin master
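The three commands above assume the repo already exists and has a remote. A minimal sketch of that one-off setup, followed by a single add/commit/push cycle, might look like the following. It assumes a local bare repository (remote.git) standing in for Bitbucket so it runs offline; the folder names, file, commit message and identity are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: first-time repo setup, then one add/commit/push cycle.
# A local bare repo stands in for the hosted (e.g. Bitbucket) remote.
set -euo pipefail

git init --bare --quiet remote.git        # stand-in for the hosted remote
mkdir -p project && cd project
git init --quiet
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"       # placeholder identity, this repo only
git config user.email "user@example.com"
git remote add origin ../remote.git       # with Bitbucket, use its repo URL

echo 'source("data.R")' > calls.R
git add .                                 # stage everything in this directory
git commit --quiet -m "Add calls.R"       # record it with a description
git push --quiet origin master            # upload to the remote
cd ..
```

After the setup lines have run once, only the last three Git commands need repeating as the work progresses.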

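I’m also using version control for my writing, in this case with LaTeX. The key is to keep each sentence on its own line: Git tracks changes line by line, so rewording one sentence shows up as a change to just that line rather than to a whole paragraph. My generic article template is available to fork here: https://bitbucket.org/mikerspencer/article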
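To see why one sentence per line helps, the sketch below commits a short LaTeX fragment with each sentence on its own line, rewords one sentence and asks Git how many lines changed. The folder, file name and sentences are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: one sentence per line keeps LaTeX diffs small.
# Folder, file name and sentences are placeholders.
set -euo pipefail

mkdir -p article && cd article
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

# Each sentence on its own line
cat > paper.tex <<'EOF'
This is the first sentence.
This sentence will be edited.
This is the last sentence.
EOF
git add paper.tex
git commit --quiet -m "First draft"

# Reword a single sentence (the .bak suffix works with GNU and BSD sed)
sed -i.bak 's/will be edited/has been edited/' paper.tex && rm paper.tex.bak

# Lines added, lines deleted, file name
git diff --numstat
cd ..
```

Here git diff --numstat should report one line added and one deleted for paper.tex. If all three sentences shared one line, Git would still report a single changed line, but that line would be the whole paragraph, which makes edits much harder to review.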

So I strongly recommend making your life easier by arranging projects in a consistent, logical way and tracking them with version control!
