Chapter 4 Computer File Systems and Working Directories

In my experience, the thing students learning R have the hardest time with actually doesn’t have much to do with R. The biggest issue is often locating and managing files on their computer. When using R, you often have to access data from files on your computer and save results to other files on your computer. Knowing where these files are and how to get to them is important if you are ever going to get off the ground. This topic isn’t quite as important for students using RStudio Cloud since the file system there is much simpler, but it will still help answer many questions.

Computer Files and Directories

Every file on your computer has a name, an extension, and a location. The file name is straightforward - this is the name you give to the file when you save it.

File extensions are the letters that come after the file name and are separated from the file name by a period. They tell you what type of file it is. For example, files ending in .docx (or .doc for older files) are Microsoft Word files. Similarly, .xlsx (or .xls) are Microsoft Excel files. Many internet websites are saved as .html (hypertext markup language). It is common to have datasets in comma separated value files, which have the extension .csv. Though .csv files usually open in Excel, they are not Excel files and do not support much of the formatting that is available in Excel, such as bold or colored text, cell borders, or cell fill colors.

R script (code) files have a .R extension and R Markdown files have a .Rmd extension. Most R data files have a .RData or .rda extension. Knowing the extensions tells you which program the file will likely be opened in.
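If you ever want R to report the extension for you, the sketch below is one way to do it. The file names are made up for illustration, and the two helper functions come from the tools package that ships with R.

```r
# Hypothetical file names used only for illustration
files <- c("report.docx", "nhis2019.csv", "analysis.R", "notes.Rmd", "results.RData")

# tools::file_ext() returns the text after the last period
extensions <- tools::file_ext(files)

# tools::file_path_sans_ext() returns the file name without its extension
names_only <- tools::file_path_sans_ext(files)

# Show the pieces side by side
data.frame(file = files, name = names_only, extension = extensions)
```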

The location of a file is where in the computer’s file system the file can be found. We often think of files as residing in folders because of the common folder icon. The formal name for a file folder on a computer is directory. As an example, all the files for this guide are in a directory on my computer named uris. This in turn is in a directory projects which is in my Documents directory, and so on. The full location of these files is C:\Users\travi\OneDrive\Documents\projects\uris\, where each back-slash (\) separates directory names, each directory containing the next in the list. The back-slash convention is specific to the Windows operating system. R and most other systems use forward-slashes, so for R to interpret the location above, I would need to replace the back-slashes with forward-slashes: C:/Users/travi/OneDrive/Documents/projects/uris/.
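As a small sketch of the back-slash issue, the code below writes the location of this guide both ways. Inside an R string a single back-slash starts an escape sequence, so a Windows-style path has to be typed with doubled back-slashes. The variable names are my own and only for illustration.

```r
# A Windows-style path typed as an R string needs doubled back-slashes
windows_path <- "C:\\Users\\travi\\OneDrive\\Documents\\projects\\uris"

# The same location written with forward-slashes
r_path <- "C:/Users/travi/OneDrive/Documents/projects/uris"

# gsub() with fixed = TRUE swaps every back-slash for a forward-slash
converted <- gsub("\\", "/", windows_path, fixed = TRUE)

# Check that the converted path matches the forward-slash version
identical(converted, r_path)
```

In practice it is usually easiest to type forward-slashes from the start.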

R Working Directories

Whenever you start R, it looks in a particular folder for files. This folder is called the working directory and can be changed as needed. You can get the current working directory by running the code

getwd()

(standing for get working directory) in the console.

To change the working directory, you can either type out the new directory using the setwd function such as

setwd("path/to/new/working/directory")

or you can use the RStudio GUI menus and go to Session -> Set Working Directory and choose from the options there. Choose Directory might be the most straightforward option to use for now, even if it isn’t the most efficient.
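Here is a minimal sketch of changing the working directory and then changing it back. It uses tempdir() (a temporary folder R creates for each session) as a stand-in for a real project folder, which is an assumption made only so the code runs on any computer.

```r
# tempdir() is used here only as a stand-in for a real project folder
new_dir <- tempdir()

# setwd() invisibly returns the previous working directory, so it can be saved
old_dir <- setwd(new_dir)

# The working directory now points at the temporary folder
getwd()

# Switch back to where we started
setwd(old_dir)
getwd()
```

Saving the old directory makes it easy to undo the change when you are done.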

While it’s possible to change your working directory any time you need to get to a file that isn’t in the current working directory, that is time consuming and might be difficult to repeat at another time. Instead, you can ask R to open files from other directories without changing the working directory.
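As a sketch of that idea, the code below writes a tiny made-up dataset to a temporary folder and then reads it back by giving read.csv() the path to the file. The folder, file name, and data are all hypothetical. The working directory is never changed.

```r
# Build a small stand-in data file in a temporary folder (hypothetical data)
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
csv_path <- file.path(data_dir, "example.csv")
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)), csv_path, row.names = FALSE)

# Remember the working directory so we can confirm it does not change
wd_before <- getwd()

# Give read.csv() the path to the file instead of calling setwd()
example <- read.csv(csv_path)

# The working directory is the same as before
identical(getwd(), wd_before)
example
```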

Absolute and Relative Paths

Because we can have multiple files on a computer with the same name, if we want to access any file on our computer we need to tell R (or any other program) the location of the file as well as the name and extension. There are two ways to do this.

First, let’s suppose my computer has a file system that starts in the C drive and looks like the location I gave above for the files for this guide. Within the C drive, there is my user directory, travi, a directory Documents, and a directory projects. Now let’s say within projects, I have a directory for every analysis I am doing. For example, maybe a directory named NHIS-PHQ for one project and YRBS-PA for another. Within each of the project folders, I have a data folder where I store the original data and any codebooks and a code folder where I put my R code. The structure would start with C:/travi/Documents/ then look something like:

  • projects
    • NHIS-PHQ
      • code
        • analysis.R
      • data
        • nhis2019.csv
    • YRBS-PA
      • code
        • analysis.R
      • data
        • yrbs2019.csv
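If you want to experiment with this layout, the sketch below builds the same structure with empty placeholder files. It puts everything inside R’s temporary folder rather than C:/travi/Documents, which is an assumption made so the code is safe to run anywhere.

```r
# Use a temporary folder as a stand-in for C:/travi/Documents
root <- file.path(tempdir(), "Documents")

# One code folder and one data folder per project
dirs <- file.path(root, "projects",
                  rep(c("NHIS-PHQ", "YRBS-PA"), each = 2),
                  c("code", "data"))
for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Empty placeholder files with the names used in the listing above
files <- file.path(root, "projects",
                   c("NHIS-PHQ/code/analysis.R", "NHIS-PHQ/data/nhis2019.csv",
                     "YRBS-PA/code/analysis.R", "YRBS-PA/data/yrbs2019.csv"))
invisible(file.create(files))

# List every file under projects, shown relative to the projects folder
list.files(file.path(root, "projects"), recursive = TRUE)
```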

The absolute path to a file is the full location of the file on your computer. You can think of the absolute path as a latitude and longitude of a place on earth. The latitude and longitude identify a specific place no matter where you are on earth at the time. Similarly, an absolute path tells you where a file exists on your computer and allows you to access the file from anywhere. In Windows, the absolute path will likely start with the C drive, as it does above in the location of the files for this guide. If I were working on the NHIS project and wanted to refer to the data using an absolute path, it would look like C:/travi/Documents/projects/NHIS-PHQ/data/nhis2019.csv.
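One way to put an absolute path together in R is with file.path(), which joins the pieces with forward-slashes. The sketch below rebuilds the example path using the folder names from the listing above. The path only exists on the example computer, so file.exists() will most likely return FALSE on yours.

```r
# Build the example absolute path piece by piece
nhis_data <- file.path("C:", "travi", "Documents", "projects",
                       "NHIS-PHQ", "data", "nhis2019.csv")
nhis_data

# basename() gives the file name, dirname() gives the directory holding it
basename(nhis_data)
dirname(nhis_data)

# file.exists() reports whether anything is actually at that location
file.exists(nhis_data)
```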

Absolute paths are pretty straightforward, but they can take a while to write out. Additionally, if you move the NHIS-PHQ folder somewhere else, like an archives folder, the absolute path won’t work anymore: all the files have a new location and you will need to update your code to point to this location.

The relative path to a file is a set of directions to get from the directory you are in (R’s working directory) to a specific file. It follows the same approach as the absolute path, with directories separated by forward slashes, but it starts in the working directory rather than the drive or highest-level directory. Think of the relative path like you would think of driving directions: they depend on where you want to go, but also where you are starting. One additional point in relative paths is going up a level. In absolute paths, you are digging one level deeper at each step. In a relative path, you may need to back up a level or two before you can get to your file. To back up a level, use ../ and if you need to back up more levels, add additional ../.

For example, say you are working on the NHIS-PHQ project and your working directory is the code directory. In order to access the dataset nhis2019.csv, you need to go up to the NHIS-PHQ directory, then down into the data directory to get to the data. The relative path from the code directory to the nhis2019.csv file is ../data/nhis2019.csv.

If you wanted to get from the NHIS-PHQ/code directory to yrbs2019.csv, the relative path would be ../../YRBS-PA/data/yrbs2019.csv. The sequence ../../ moves you from the code directory up to NHIS-PHQ then up to projects. From there, you go down into YRBS-PA and so on.
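The two relative paths above can be tried out with the sketch below. It rebuilds a small version of the example folders inside R’s temporary folder (an assumption so the code runs anywhere), sets the working directory to the NHIS code folder, and checks that both relative paths lead to a file.

```r
# Rebuild a minimal version of the example folders in a temporary location
projects <- file.path(tempdir(), "projects")
dir.create(file.path(projects, "NHIS-PHQ", "code"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(projects, "NHIS-PHQ", "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(projects, "YRBS-PA", "data"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(projects, "NHIS-PHQ", "data", "nhis2019.csv")))
invisible(file.create(file.path(projects, "YRBS-PA", "data", "yrbs2019.csv")))

# Start in the NHIS-PHQ code folder, remembering where we were before
old_dir <- setwd(file.path(projects, "NHIS-PHQ", "code"))

# Up one level, then down into data
same_project <- file.exists("../data/nhis2019.csv")

# Up two levels, then down into the other project
other_project <- file.exists("../../YRBS-PA/data/yrbs2019.csv")

# normalizePath() shows the absolute path a relative path resolves to
resolved <- normalizePath("../data/nhis2019.csv", winslash = "/")

# Go back to the original working directory
setwd(old_dir)

c(same_project = same_project, other_project = other_project)
resolved
```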

Relative paths are a bit more complicated than absolute paths. Unlike absolute paths, relative paths depend on where you start. It’s important to remember this is your working directory, which isn’t necessarily where your code file is located.[1]

Relative paths have a few benefits. They are usually shorter to type, and if you move an entire project’s directory (such as to an archives folder like in the previous example), the relative paths will still work. Relative paths are also better if you are working on a cloud system that syncs to multiple computers. Since the different computers may have different file structures, the absolute paths might differ from one computer to the next. However, the relative paths should be the same as long as they never leave the synced directory.
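To see the point about moving a project, here is a rough sketch that moves a stand-in project folder into an archives folder. Everything lives in R’s temporary folder, and the move-demo folder name is made up for this example. After the move, the old absolute path no longer finds the file, but the relative path from the code folder still does.

```r
# Start from a clean, temporary sandbox (hypothetical folder name)
base <- file.path(tempdir(), "move-demo")
unlink(base, recursive = TRUE)
dir.create(file.path(base, "projects", "NHIS-PHQ", "code"), recursive = TRUE)
dir.create(file.path(base, "projects", "NHIS-PHQ", "data"))
dir.create(file.path(base, "archives"))

# Absolute path to the data file before the move
old_abs <- file.path(base, "projects", "NHIS-PHQ", "data", "nhis2019.csv")
invisible(file.create(old_abs))

# Move the whole project folder into archives
moved <- file.rename(file.path(base, "projects", "NHIS-PHQ"),
                     file.path(base, "archives", "NHIS-PHQ"))

# The old absolute path no longer points at anything
abs_still_works <- file.exists(old_abs)

# From the moved code folder, the same relative path still finds the data
old_dir <- setwd(file.path(base, "archives", "NHIS-PHQ", "code"))
rel_still_works <- file.exists("../data/nhis2019.csv")
setwd(old_dir)

c(absolute = abs_still_works, relative = rel_still_works)
```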

IMPORTANT: Absolute paths are like latitude and longitude of point A. Relative paths are like driving directions from point A to point B.

Overall, the choice between absolute and relative paths might come down to preference (with the exception of cloud-synced directories). Absolute paths are simpler to understand but more tedious to type out, whereas relative paths are quicker to type but more likely to break when you start in a different working directory.


  1. If you start R by opening a code file, your working directory is the location of your file. This is how I prefer to work so I always set my relative paths assuming the working directory and code file location are the same. It’s not perfect, but it works 95% of the time and when it doesn’t it’s an easy fix.