Chapter 5 Loading and Saving Data

Most of what you need to know about loading and saving data is covered in Computer File Systems and Working Directories. When you want to access a dataset or save information to your computer, you need to go outside of R to read a file from, or write a file to, a specific location. Being comfortable with absolute and relative paths is therefore essential.
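As a minimal sketch of the difference between the two kinds of path, the code below builds a throwaway project folder inside R's temporary directory. The folder name my-project and the empty placeholder file are assumptions made purely for illustration; they are not part of any real project.

```r
# Hypothetical setup: a temporary folder stands in for a project directory
project <- file.path(tempdir(), "my-project")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)

# Empty placeholder file, only so that there is something to point at
file.create(file.path(project, "data", "dataset1.RData"))

old_wd <- setwd(project)   # make the project folder the working directory

rel_path <- file.path("data", "dataset1.RData")  # relative to the working directory
abs_path <- normalizePath(rel_path)              # full path from the file system root

rel_exists <- file.exists(rel_path)   # found, because we are inside the project folder
abs_exists <- file.exists(abs_path)   # found from any working directory

setwd(old_wd)              # restore the original working directory
```

The relative path only resolves while the working directory is the project folder, whereas the absolute path keeps working after setwd(old_wd).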

Beyond locating a file, how to work with data files depends on file types. While there are built-in functions for loading and saving common types of data, there are also extension packages for other file types. Some of the more popular ones are readxl for working with Microsoft Excel files and haven for working with data from other statistical packages such as SAS, Stata, and SPSS. Functions in these packages work similarly to read.csv and write.csv discussed below, with special options particular to the various data file types.

Loading and Saving RData Files

Files with the extension .RData hold data and other information already formatted for R, so they are the easiest to open and save, using the self-explanatory functions load and save.

Suppose I am working in a directory that includes a sub-directory called data, which holds all of my data, including a file dataset1.RData. To load the RData file into my R session, I simply type

load('data/dataset1.RData')

Notice that I am using the relative path from my working directory to the file, and that I am enclosing that path in quotation marks. The data I load into my R session may not be named dataset1; that is the name of the file containing the dataset, not of the dataset itself. An RData file can hold multiple datasets, so the file name and dataset name do not need to match.
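One way to find out what an RData file actually contains is to capture the value returned by load, which is a character vector of the names of the objects it restored. The sketch below assumes a hypothetical object called survey stored in a file called dataset1.RData inside R's temporary directory.

```r
# Hypothetical example: the file is dataset1.RData, but the object inside is named survey
rdata_file <- file.path(tempdir(), "dataset1.RData")
survey <- data.frame(id = 1:3, score = c(10, 20, 30))
save(survey, file = rdata_file)
rm(survey)                        # remove it to mimic a fresh R session

# load() invisibly returns the names of the objects it restored
loaded_names <- load(rdata_file)
print(loaded_names)               # [1] "survey"
```

After this runs, the dataset is available as survey, not as dataset1.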

If I have results from an analysis that I want to save into an RData file, I use the function save. Say I have a dataset named mydata, which I have updated to a new version, mydata2, and I want to save both datasets into one RData file. I would type

save(file='data/clean-data.RData', mydata, mydata2)

This code tells R to save mydata and mydata2 in a file called clean-data.RData that sits inside the data folder. Now when I re-open R and load clean-data.RData, both mydata and mydata2 will be available.
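The round trip can be checked with a small self-contained sketch. The contents of mydata and mydata2 here are made up for illustration, and the file is written to R's temporary directory rather than to a data folder.

```r
# Made-up datasets standing in for mydata and its updated version
mydata  <- data.frame(id = 1:3, height = c(150, 160, 170))
mydata2 <- transform(mydata, height_m = height / 100)

out_file <- file.path(tempdir(), "clean-data.RData")
save(mydata, mydata2, file = out_file)   # several objects go into one file

# Keep copies for comparison, then remove the originals to mimic a new session
original  <- mydata
original2 <- mydata2
rm(mydata, mydata2)

load(out_file)                           # restores both objects under their own names
both_back <- isTRUE(all.equal(mydata, original)) &&
  isTRUE(all.equal(mydata2, original2))
```

Here both_back is TRUE when both datasets come back unchanged.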

Loading and Saving csv Files

CSV (comma-separated values) files are one of the most common and universally accepted ways to store and share data, so R has built-in functions for working with them. The main ones are read.csv, to read data into R, and write.csv, to save an R dataset to a csv file.

Let’s say you have the same setup as above with another file, dataset3.csv, in the data folder. To read in the data, use

mydata3 = read.csv('data/dataset3.csv', header=TRUE)

This will read the file dataset3.csv and assign it to an R object named mydata3. Since the dataset isn’t already an R dataset, we need to use mydata3 = to assign an R name to the dataset. In addition, the header=TRUE option, which is the default for read.csv, reads in the first row of the file as the column names rather than as a row of data. If the first row of the file is data, use header=FALSE instead. There are other options in the read.csv function, but this should cover about 80-90% of cases.
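To see what the header option changes, the sketch below writes a tiny made-up csv file to R's temporary directory and reads it both ways. The file contents are an assumption for illustration only.

```r
# A made-up three-line csv file: one header row and two data rows
csv_file <- file.path(tempdir(), "dataset3.csv")
writeLines(c("id,score", "1,10", "2,20"), csv_file)

with_header    <- read.csv(csv_file, header = TRUE)
without_header <- read.csv(csv_file, header = FALSE)

names(with_header)      # "id" "score"
nrow(with_header)       # 2
names(without_header)   # "V1" "V2"
nrow(without_header)    # 3
```

With header=FALSE, the first row is treated as data, so R invents the column names V1 and V2 and the dataset gains an extra row.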

If you’ve done some work on a dataset and want to save it as a csv, say to share it with others, you use write.csv. Unlike the save function, write.csv can only handle one dataset at a time. To save mydata3 in a new csv file, use

write.csv(file='data/dataset3-updated.csv', x=mydata3, row.names=FALSE)

This code will save mydata3 as a new csv file in the data folder named dataset3-updated.csv. The row.names=FALSE option is almost always what you want. If row.names=TRUE, then the first column of the new csv file will be the row names (or row numbers if the rows aren’t explicitly named, which is most likely). This adds a column to the dataset that didn’t exist before and is probably redundant, since R and other software already use row numbers for any dataset. If the row names of your dataset are important, they should be saved as a column of the dataset instead of as row names.
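The effect of row.names is easiest to see by writing the same dataset both ways and reading each file back. This is a minimal sketch with a made-up mydata3 and hypothetical file names in R's temporary directory.

```r
# Made-up dataset with default (numeric) row names
mydata3 <- data.frame(id = 1:2, score = c(10, 20))

with_rn    <- file.path(tempdir(), "with-row-names.csv")
without_rn <- file.path(tempdir(), "without-row-names.csv")

write.csv(x = mydata3, file = with_rn,    row.names = TRUE)
write.csv(x = mydata3, file = without_rn, row.names = FALSE)

cols_with    <- names(read.csv(with_rn))      # "X" "id" "score"
cols_without <- names(read.csv(without_rn))   # "id" "score"
```

The extra column X holds the old row numbers; it appears because the row-name column is written with an empty header, which read.csv then has to rename.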