Chapter 9 Functions

If objects are the “nouns” of the R language, then functions are the “verbs.” This analogy isn’t perfect because, technically, functions are objects too, but they are a special type of object. Functions don’t hold data or information; they do things.

It’s nearly impossible to write R code without using functions, so there are numerous examples of functions in the previous chapters. As described in R Language and Syntax, functions can usually be identified by their use of parentheses to contain their inputs. In this chapter we will look at functions in a little more detail in the hope of demystifying their inner workings, without digging too deep into their actual code.

Using Functions

We use functions for everything from installing and loading packages to reading in data, analyzing and graphing data, and saving results. With so many functions for so many purposes, it is important that we don’t forget they all have very similar structures.

Nearly all functions require some number of inputs, or arguments, in order to run. Many times the arguments have default settings so that we do not need to provide values for every possible argument every time. You can learn about the arguments for any function by reading the function’s help file, which you can access by running the code ?function_name in the R console. For example, to learn about rnorm, which generates observations of random variables from a normal distribution, run ?rnorm. This particular help file is a bit more complicated because it combines the help files for dnorm, pnorm, qnorm, and rnorm, all of which are functions for working with the normal distribution and use many of the same arguments.

At any rate, under the Usage section, we see rnorm(n, mean = 0, sd = 1), telling us that rnorm takes three arguments: n, mean, and sd. Reading into the Arguments section, we see what those arguments represent:

n: number of observations
mean: vector of means
sd: vector of standard deviations
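The same Usage information can be checked from the console without opening the help file. The sketch below uses args() and formals(), both part of base R, to list the arguments of rnorm and their defaults. The object name rnorm_args is made up for this illustration.

```r
# Print the argument list of rnorm, matching the Usage section
args(rnorm)

# formals() returns the arguments as a named list
rnorm_args = formals(rnorm)
names(rnorm_args)   # "n" "mean" "sd"
rnorm_args$mean     # 0
rnorm_args$sd       # 1
```

This is a quick reminder of argument names and defaults; the help file is still the place to learn what each argument means.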

So, n tells R how many observations we want to generate, mean is the mean of the underlying distribution, and sd is the standard deviation of the underlying distribution. When we see mean = 0 and sd = 1 in the Usage section, we are reading the defaults. If we don’t provide a mean or standard deviation when we run the function, R will assume values of 0 and 1, respectively. For example:

rnorm(n=5)

## [1] 2.1866493 2.7822521 0.3528365 0.0514373 0.4489899

rnorm(n=5, mean=0, sd=1)

## [1] 2.1866493 2.7822521 0.3528365 0.0514373 0.4489899

We see the function produces the same values with and without the mean and sd arguments specified⁸. I can choose my own values for the arguments, though. For example, IQ scores are approximately normally distributed with mean 100 and standard deviation 15, so if I wanted to generate seven values from this distribution, I could run

rnorm(n=7, mean=100, sd=15)

## [1] 134.84225 106.60976 116.56707  91.85262 119.12863 112.35178 101.05142
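Output from rnorm changes from run to run unless the random number generator is given a fixed starting point. The sketch below uses set.seed() to show that leaving out mean and sd is the same as supplying their defaults. The seed value 123 is an arbitrary choice for illustration, not the seed used to produce the output in these notes, and the object names are made up.

```r
# Fix the seed, then rely on the defaults mean = 0 and sd = 1
set.seed(123)
with_defaults = rnorm(n=5)

# Reset to the same seed and spell the defaults out
set.seed(123)
with_arguments = rnorm(n=5, mean=0, sd=1)

# TRUE: same seed and same arguments give the same values
identical(with_defaults, with_arguments)

# Without resetting the seed, the next call draws new values
another_draw = rnorm(n=5)
another_draw
```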

R help files can be difficult to read. Being able to access the help files and at least read and understand the Usage and Arguments sections will save hours of heartache over the course of a semester.

Function Outputs

Most R functions output some result from their analysis of data. These outputs are objects of various classes. In an intro class, most of these classes will be among those described in the Outputs section of More Data Structures. If you simply run a function, most of the time the output will be printed to the screen, but then it is forgotten. If you want to recall outputs later, you need to save them to a named object. Again, this is something we’ve seen throughout these notes. Suppose I want to generate a bunch of observations from a theoretical IQ distribution and then plot them in a histogram. This will require me (or R) to: (1) generate the values, (2) save them as an object in the environment, and (3) plot the new object. In the code below, I (1) generate 1,000 observations from a normal distribution with mean 100 and standard deviation 15, (2) save these 1,000 values to a numeric vector named my_iqs⁹, and (3) create a histogram of the values saved in my_iqs.

my_iqs = rnorm(1000, mean=100, sd=15)
hist(my_iqs)
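Once an output is saved, it can be inspected like any other object. The sketch below checks the class and length of my_iqs, and also saves the output of hist, which is itself an object. The name my_hist is made up for this illustration, and plot=FALSE is used only so that nothing is drawn.

```r
# Generate and save the values, as in the example above
my_iqs = rnorm(1000, mean=100, sd=15)

class(my_iqs)    # "numeric"
length(my_iqs)   # 1000

# hist() also returns an object; plot=FALSE skips drawing it
my_hist = hist(my_iqs, plot=FALSE)
class(my_hist)        # "histogram"
sum(my_hist$counts)   # 1000, since every value falls in some bin
```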

Functions and Packages

As we described in Packages, one of R’s most powerful aspects is the ability of users to contribute their own functions through packages. Apart from functions you write yourself, all functions in R are housed in packages. Many packages come pre-installed, so you do not need to access CRAN to download them. For example, packages like base, utils, and stats include functions that are so necessary for R to run or to perform basic statistical analysis that they are assumed to be needed all the time. They are installed when you download R and loaded into every R session.

You can determine which package a function is in through the help file. In the first line of the help file, above the title, you will see a line with the function name or something similar, followed by curly brackets, {}. The package the function is in is given between the brackets. For example, ?rnorm will show that rnorm comes from the stats package, while ?mean will show that mean comes from the base package. Similarly, ?install.packages shows the function is in the utils package, and ?hist shows that hist is in the graphics package.
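The package can also be looked up from the console. A minimal sketch, assuming the functions are ordinary R functions defined inside a package: environment() returns where a function was defined, and environmentName() turns that into a package name.

```r
# Look up the package in which each function is defined
environmentName(environment(rnorm))             # "stats"
environmentName(environment(mean))              # "base"
environmentName(environment(hist))              # "graphics"
environmentName(environment(install.packages))  # "utils"
```

This approach does not work for every function. Some low-level functions, such as sum, have no environment to report, so the help file remains the more dependable place to look.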

Sometimes multiple packages will have functions with the same name. These functions may or may not calculate the same thing and may or may not require the same arguments as inputs. When you use a function name that is ambiguous (functions with that name are in two or more packages loaded into your current R session), R chooses the function from the most recently loaded package. If you want to ensure you are using the function from a specific package, you can use the code package_name::function_name. For example, if you want to ensure you are using the mean function from the base package, instead of just typing mean(my_iqs) you can use base::mean(my_iqs). This takes a lot more work in typing and identifying packages, so most package writers try to choose reasonably unique function names and avoid replicating functions from other packages. For example, there is no need to write a new function to calculate the mean when base::mean works just fine.
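A name clash can be imitated without installing anything. The sketch below defines a hypothetical function that reuses the name mean, standing in for a second package that happens to use the same name, and shows that the package_name::function_name form still reaches the original. The three values in my_iqs are made up for the illustration.

```r
my_iqs = c(85, 100, 115)

# Hypothetical function that reuses the name "mean" (illustration only)
mean = function(x) "this is not the real mean"

mean(my_iqs)         # calls the definition above
base::mean(my_iqs)   # 100, always the version from the base package

# Remove the hypothetical function so mean() refers to base again
rm(mean)
mean(my_iqs)         # 100
```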

If you are ever in a situation where R tells you a function doesn’t exist (and you’re sure the spelling is correct), chances are pretty high that you have not loaded the package that contains the function. You can read more about this error and others in Common Error Messages.

⁸ Functions like rnorm should generate random, or technically “pseudo-random,” values, so they shouldn’t produce the same numbers in consecutive runs. You are able to re-create random values by setting the “seed” of the random number generator. I did that here but hid the code. If you run those two lines of code successively in R, you will get two different results (and results which differ from what I got).
⁹ I didn’t tell R that my_iqs is a numeric vector; R is able to determine that from the output that rnorm creates.