August 23, 2013

Debugging R Functions

Knowing how to debug functions is a critical skill if you want to work proficiently in R. In my experience, many graduate student R users aren’t aware of the debugging options. Maybe this stems from how many people learn R. They attend an introductory course or work their way through an introductory book. These courses or books typically touch on using functions in R but don’t go into detail (but see Hadley Wickham’s upcoming book Advanced R programming). These R users then develop their own expertise.

However, one of the insidious elements of self-teaching a programming language is that if a problem isn’t obvious then it’s hard to seek a solution. In the context of debugging, I’d bet that many users, and in particular those not familiar with other high-level programming languages, don’t know how much easier their R coding lives could be.

Read on for an introduction to the basic R debugging strategies that every user should be familiar with. To paraphrase Hadley from a talk he gave in Vancouver: if you don’t know these strategies, you’re coding blind.

Old-school strategies

These are the lowest-level options. Many users figure these out for themselves or learn them from lower-level programming languages whose debugging options are less advanced. They are useful strategies, but only as part of a complete debugging toolbox.

Trial and error

See an error. Guess the solution. Run the code again. A fast solution for simple errors. But if the solution was obvious we wouldn’t be talking about debuggers, right?

Make your function global

Turn the arguments of a function into global objects. Step through the code and figure out the problem. For example, turn this:

f <- function(x = 1) x * 2

into this:

x <- 1
x * 2

This strategy breaks down for functions that receive many arguments. It also
creates many global objects that match function argument names, which can
become confusing. This strategy is rarely the best solution given the options
I'll describe below.

## Add print statements

Add `print()` around key objects within your function and read the value of
the objects as the function runs. This is a robust strategy and one that users
familiar with lower-level languages (e.g. C, C++, or Fortran) often fall back
on. Keep this strategy handy, but realize that its key disadvantage is that it
isn't interactive. All you see are object values, but you can't interact with
them to deduce the cause of a problem.
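
As a minimal sketch of this strategy, the hypothetical function below
(`running_mean()` is an illustrative name, not something from a package)
prints its loop counter and running total on every iteration so you can watch
where the values go wrong:

```r
# Hypothetical example function with temporary print() calls added.
running_mean <- function(x) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
    # Temporary debugging output: remove once the problem is found.
    print(c(i = i, total = total))
  }
  total / length(x)
}

result <- running_mean(c(2, 4, 6))
```

The printed values show the state of the function as it runs, but asking a
different question means editing the function and running it again.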

# When things get complicated

The following set of functions and options can dramatically improve your
ability to solve complicated coding problems.

## traceback

When you encounter an error, your first course of action should be to run the
function `traceback()`. `traceback()` will list the chain of functions that
led to the error. This might be enough to figure out the solution, and if not,
you'll at least get a good idea of which function to debug.
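
To make this concrete, here is a small sketch with a hypothetical chain of
three functions (`outer_fn()`, `middle_fn()`, and `inner_fn()` are made-up
names for illustration). The error is raised in the innermost function, and
`traceback()` would show how R got there:

```r
# Hypothetical chain of functions; the names are illustrative only.
inner_fn <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  x * 2
}
middle_fn <- function(x) inner_fn(x)
outer_fn <- function(x) middle_fn(x)

# At an interactive prompt you would run these two lines in turn:
# outer_fn("a")   # fails inside inner_fn()
# traceback()     # lists the calls leading to the error, most recent first
```

Reading the listing from the bottom up takes you from the call you typed down
to the function that actually failed.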

## browser

Similar to Portlandia's "Put a bird on it", my go-to solution when I'm
debugging a function is to "Put a `browser()` on it".

To use `browser()` you add the line `browser()` somewhere in your function. For example:

times_two <- function(x) {
  browser()  # execution pauses here
  x * 2
}

When you call the function, for example with times_two(5), you’ll pop out in the function wherever you put the browser() statement and have access to the local workspace:

Called from: times_two(5)

We can now access the local value of x:

Browse[1]> x
[1] 5

You can either work through the function line by line by typing n and hitting enter in the R console, or you can send lines of code to the console one at a time. To let the function continue running until it finishes or reaches another browser() statement, type c and hit enter. Importantly, to get out, type the letter Q and hit enter. Note that you can nest browser() statements if you need to; just keep track of which function you’re inside.
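
One variation that can help inside loops is to wrap browser() in a condition so that it only triggers on the case you care about. The sketch below is an illustration rather than part of the example above: double_values() is a hypothetical function, and the interactive() check is an added assumption so that the code does not pause when run as a batch script.

```r
# Hypothetical example: pause only on the iteration that looks suspicious.
double_values <- function(x) {
  for (i in seq_along(x)) {
    # interactive() keeps batch runs (e.g. Rscript) from pausing here.
    if (interactive() && is.na(x[i])) browser()
    x[i] <- x[i] * 2
  }
  x
}

out <- double_values(c(1, NA, 3))
```

At an interactive prompt this drops you into the browser on the second iteration only, where you can inspect i and x before continuing.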

Turn errors to debuggers

If you set the following option before calling your function:

options(error = recover)

then the R console will pop out in debugging mode when it encounters an error. This is similar to placing a browser() statement at precisely the location where the error occurs, except that recover() first asks which of the active function calls you want to browse.

If the problem is associated with a warning instead of an error, then you can also set the following option, which converts warnings to errors so that the debugger will start on warnings:

options(warn = 2)
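
Both options stay in effect for the rest of the session, so it helps to restore them once you are done. The following is a small sketch of that pattern; the as.numeric("abc") call is just a convenient way to trigger a warning, and try() is used here only so the example keeps running after the converted error.

```r
# Remember the current setting so it can be restored afterwards.
old_opts <- options(warn = 2)

# as.numeric("abc") normally returns NA with a warning; with warn = 2 the
# warning is converted to an error, which try() captures here.
res <- try(as.numeric("abc"), silent = TRUE)
warning_became_error <- inherits(res, "try-error")

# Restore the previous warning behaviour.
options(old_opts)

# To stop entering the debugger on errors (after options(error = recover)):
# options(error = NULL)
```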

Debugging installed packages

What if the error stems from a function in an installed R package? There are various ways to approach this problem.

A first approach might be to set options(error = recover) and then work through the code line by line with n.

For complicated issues, it might help to have a copy of the function code in front of you. For most R functions you can start by printing the name of the function in the R console (without parentheses or arguments) and hitting enter. If the code is written in R then this will print out the function code. You can then copy and paste this into a text editor. In many cases you can edit this code directly as if it were your own code, load it into the global workspace (source() it in R), and debug it.

This won’t work in all cases because of namespace issues. If all else fails, try Googling for the package and downloading the source code. You are then free to read, modify (e.g. add browser() or print() statements), and install it as needed. You might want to consider the devtools package and the load_all() or install() functions to make this process faster.
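
The copy-and-source approach and its namespace caveat can be sketched with a function from the stats package. This is only an illustration: sd_copy is a hypothetical name, and the deparse()/parse() round trip stands in for copying the printed code into an editor and sourcing it.

```r
# Printing a function's name without parentheses shows its R source, e.g.:
# sd

# Stand-in for copying the printed source and source()-ing it yourself.
sd_copy <- eval(parse(text = deparse(stats::sd)))

# The copy computes the same result for this simple case...
same_result <- isTRUE(all.equal(sd_copy(c(1, 2, 3)), stats::sd(c(1, 2, 3))))

# ...but the original lives in the package namespace, while the copy lives
# wherever you defined it, so it cannot see the package's internal functions.
original_env <- environmentName(environment(stats::sd))
copy_env <- environment(sd_copy)
```

A copy works here because sd() only calls exported functions. A function that relies on unexported helpers from its package would fail when copied this way, which is when working from the package source becomes the better option.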

Error handling and recovery

What about the case where you don’t want your code to fail on an error? Maybe you’re running a series of models but not all models converge. Or maybe some iterations of a loop contain a missing value that creates an error during data manipulation. These are cases where try() and tryCatch() can be useful. try() is a simplified wrapper function for tryCatch() that prints the error and continues. tryCatch() will let you handle the error cases yourself and optionally let the function continue.

Let’s start with a simple function that throws an error if the input value is 1 and returns the input value otherwise:

f <- function(x) {
  if(x == 1) {
    stop("x cannot equal 1")
  } else {
    x
  }
}

If we feed this function the values 1 through 3 then the function will stop with an error on the first value:

y <- sapply(1:3, function(i) f(i))
Error in f(i) : x cannot equal 1

We can use try() to capture the error message but continue running the function:

y <- sapply(1:3, function(i) try(f(i)))
y
[1] "Error in f(i) : x cannot equal 1\n" "2" "3"

Alternatively, we could use tryCatch() to do something when we get an error. Here, we’ll return an NA value:

y <- sapply(1:3, function(i) tryCatch(f(i), error = function(e) NA))
y
[1] NA  2  3

You could run any simple function you’d like on an error to figure out what’s wrong. You could also dump the value of an object with save() or maybe even dump the whole workspace into a file with save.image().
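
As a rough sketch of that idea, the handler below saves the offending input and the error message to a file before returning NA. It assumes f() behaves as described above (it is redefined here so the example is self-contained); dump_file, bad_input, and error_message are hypothetical names chosen for illustration.

```r
# f() as described above: error on 1, otherwise return the input.
f <- function(x) {
  if (x == 1) stop("x cannot equal 1")
  x
}

dump_file <- tempfile(fileext = ".rda")  # hypothetical dump location

y <- sapply(1:3, function(i) {
  tryCatch(f(i), error = function(e) {
    # Record what went wrong and which input caused it, then carry on.
    bad_input <- i
    error_message <- conditionMessage(e)
    save(bad_input, error_message, file = dump_file)
    NA
  })
})

# Later, load the saved objects into their own environment to inspect them.
dumped <- new.env()
load(dump_file, envir = dumped)
```

After the loop finishes, dumped$bad_input and dumped$error_message hold the details of the failing iteration. If more than one iteration fails, this version keeps only the last one, so you may want a separate file per iteration.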


Hopefully you find these debugging strategies useful. With practice, debugging small R functions should become easier than debugging a large chunk of global code. Many of the options serve a similar purpose. As a result, you may choose to become familiar with one or two of them. Just keep the other strategies in your debugging toolbox for when you’re stuck. Often, approaching the same problem from multiple angles is just what you need to understand the problem and find a solution.