R Recipes: A Problem-Solution Approach (2014)

Chapter 15. Dealing with Big Data

Datasets are big and getting bigger, as has been mentioned previously in this book. People talk of the five Vs of big data: volume, variety, velocity, veracity (or lack thereof), and value (or lack thereof). The value gleaned from big data may be short-lived, simply because of the velocity with which the data can change. To deal with bigger data, we need faster processing for larger datasets. To accomplish this, we can use parallel processing, speed up the operations themselves, use more efficient algorithms, or some combination of these approaches. In this chapter, you will learn how to use R packages to perform parallel processing, how to extend (and speed up) the capabilities of the traditional R data frame by using data tables instead, and how to speed up computations in R by using compiled code from C++ as well as by preallocating result objects.

Recipe 15-1. Parallel R

Problem

Computing requirements increase when datasets become larger and computations become more intensive. One way to increase computing speed is to distribute the computing tasks to two or more processing units. This is the basic idea behind dual-core and multi-core processors. To the extent that the processing task can be divided among different processors without creating interference or requiring extensive communication among the processors, parallel processing can be more efficient than sequential processing. The ideal is what is known as “embarrassingly parallel” or “perfectly parallel” processing, in which little or no effort is required to separate the problem into a number of parallel problems. In the ideal situation, there is no overhead associated with parallel processing, so a single sequential process that takes 20 seconds to run should take 5 seconds to run with four parallel processing units.

Parallel processing can be accomplished in different ways. We can use a multi-core processor on a single machine, combine multiple workstations (as in a computer lab), or use computer clusters, as in a supercomputer. In R, parallel processing works by establishing a cluster of processing units, each of which is an instance of R. One of these is the primary (the “master” process), and the others are the secondary (the “worker” processes).

Parallel R involves the creation of a cluster of R processes on the same or different machines. The master processor is the one in which we create the cluster, and the worker processors wait for and process the parts of the processing task distributed to them. The master processor consolidates the results from the separate workers and produces the final output. In the real world, unlike the ideal one, there is overhead associated with parallel processing, so some tasks do not benefit from parallel processing because of overhead, while computation-intensive processes can benefit substantially.
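The gap between the ideal and the real can be stated more precisely. If a fraction p of a program's running time can be parallelized across N workers, Amdahl's law (a standard result, not specific to R) bounds the achievable speedup:

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}
% With p = 1 (perfectly parallel) and N = 4, S(4) = 4: the ideal 20-second
% task finishes in 5 seconds.
% With p = 0.9 and N = 4, S(4) = 1 / (0.1 + 0.225) \approx 3.08: the serial
% fraction and overhead erode the gain.
```

This is why the computation-intensive, easily divided tasks in this recipe benefit most, while tasks dominated by sequential work or communication overhead may benefit little or not at all.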

Solution

Parallel computing can be either implicit or explicit, depending on the level at which the hardware and software come together. For example, the pnmath package by Luke Tierney uses the OpenMP parallel processing directives of recent compilers for implicit parallelism by replacing some internal R functions with substitute functions that can make use of multiple cores without any explicit requests from the user. There are several other such packages that provide implicit parallelism, but our discussion will focus on explicit parallelism.

Unlike implicit parallelism, explicit parallelism requires a communications layer. The first R package providing this capability was rpvm by Na Li and Tony Rossini. This package used the PVM (Parallel Virtual Machine) standard and libraries. More recently, the MPI (Message Passing Interface) standard has become more popular. It is supported in R via the Rmpi package written by Hao Yu. There are other standards, including NWS (NetWorkSpaces) and sockets. A socket cluster is a set of copies of R running in parallel, often on the same machine, though it is also possible to specify the host names on which the worker processes are to run. If the sockets are to be formed on the same machine, a positive integer specifies the number of worker processes. For example, if we use the snow package written by Luke Tierney, and we have a quad-core machine, we can set up a cluster of four workers as follows. The original R process, that is, the one in which we are typing the commands, is the master (or manager) process. We are setting up four new worker processes, one on each of our processors. Each process is a separate R session, so we are creating the ability to run multiple R sessions at the same time (or in parallel). Note that each of these worker processes is running on localhost, the standard network name for the local machine.

> install.packages("snow")
> library(snow)
> cl <- makeCluster(4, type = "SOCK")
> cl
[[1]]
$con
        description               class                mode                text
"<-Larry2in1:11720"          "sockconn"               "a+b"            "binary"
             opened            can read           can write
           "opened"               "yes"               "yes"

$host
[1] "localhost"

$rank
[1] 1

attr(,"class")
[1] "SOCKnode"

[[2]]
$con
        description               class                mode                text
"<-Larry2in1:11720"          "sockconn"               "a+b"            "binary"
             opened            can read           can write
           "opened"               "yes"               "yes"

$host
[1] "localhost"

$rank
[1] 2

attr(,"class")
[1] "SOCKnode"

[[3]]
$con
        description               class                mode                text
"<-Larry2in1:11720"          "sockconn"               "a+b"            "binary"
             opened            can read           can write
           "opened"               "yes"               "yes"

$host
[1] "localhost"

$rank
[1] 3

attr(,"class")
[1] "SOCKnode"

[[4]]
$con
        description               class                mode                text
"<-Larry2in1:11720"          "sockconn"               "a+b"            "binary"
             opened            can read           can write
           "opened"               "yes"               "yes"

$host
[1] "localhost"

$rank
[1] 4

attr(,"class")
[1] "SOCKnode"

attr(,"class")
[1] "SOCKcluster" "cluster"

The acronym snow stands for Simple Network of Workstations. The snow package uses a “scatter and gather” approach in which the manager process distributes “chunks” of the task to the worker processes and then gathers the results and combines them. In addition to socket clusters, snow supports MPI, NWS, and PVM, as discussed earlier. There are now many other R packages that support either explicit or implicit parallelism. The snowfall package written by Jochen Knaus is built on snow and uses its network and cluster capabilities. The snowfall package provides the ability to connect directly to R-specific clusters via the sfCluster function and gives the user additional flexibility, such as command-line arguments for configuration. More recently, the parallel package, which incorporates code from snow, has become part of the base R distribution, and there are several additional packages for parallel computing available, including the doParallel and doSNOW packages written by Steve Weston at Revolution Analytics, who is also the author of the foreach package. The foreach package provides an idiom for iterating over the elements in a collection without the use of explicit loop counting. The main intention is to produce a return value, and in that sense, foreach is similar to lapply, which we have discussed previously. The doParallel package provides support for the foreach package and acts as an interface between foreach and parallel.
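To make the connection to lapply concrete, here is a minimal sketch (assuming the foreach package is installed) in which both idioms produce the same list. With the %do% operator, the iterations run sequentially in the current R session, so no backend is required:

```r
library(foreach)

# lapply builds a list by applying a function to each element
squares_lapply <- lapply(1:5, function(i) i^2)

# foreach builds the same list without writing an explicit loop body
# around a counter; the expression after %do% is evaluated once per i
squares_foreach <- foreach(i = 1:5) %do% i^2

identical(squares_lapply, squares_foreach)  # the two lists are the same
```

The difference, as we will see, is that replacing %do% with %dopar% lets the same code run in parallel once a backend is registered, which lapply cannot do on its own.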

Among the many choices available, we will begin with the use of the doParallel package.  The doParallel package requires the user to register a parallel “backend.” A backend in this context is the explicit notification to the %dopar% operator that there is a cluster of worker processes available to it. If the backend is not registered, the foreach package will run the tasks sequentially rather than in parallel. After installing doParallel and loading it, we create a cluster, register the backend, and then use foreach to distribute the parts of the task to the “worker” processors. The %dopar% operator makes the processing parallel, while the %do% operator makes it sequential. First, let us make sure we have the required package and load it:

> install.packages("doParallel")
> library(doParallel)
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: iterators
Loading required package: parallel

Observe that the doParallel package requires the foreach, iterators, and parallel packages. Let us begin with a very simple “proof of concept” demonstration before getting down to more serious computations. Simply loading the doParallel package is not sufficient. If you load the package and then use the %dopar% operator, the processing will still be sequential, as shown next. You must register the backend, once per session, after which the cluster you have created will work in parallel. If the backend is not registered, the processing will continue to be sequential, though the warning to that effect will appear only once per session. As we discussed earlier, we must use makeCluster with a positive integer (or with a list of host names) to specify the worker processes.

> cluster <- makeCluster(2)
> foreach(i=1:5) %dopar% i^(i+1)
[[1]]
[1] 1

[[2]]
[1] 8

[[3]]
[1] 81

[[4]]
[1] 1024

[[5]]
[1] 15625

Warning message:
executing %dopar% sequentially: no parallel backend registered

Once we register the backend, which we need to do only once per session, we have parallel processing with our proof-of-concept example:

> clusters <- makeCluster(2)
> registerDoParallel(clusters)
> foreach(i=1:5) %dopar% i^(i+1)
[[1]]
[1] 1

[[2]]
[1] 8

[[3]]
[1] 81

[[4]]
[1] 1024

[[5]]
[1] 15625

The defaults in doParallel are reasonable, and will work for most users most of the time. To find out how many worker processes you have access to, you can use the getDoParWorkers function:

> getDoParWorkers()
[1] 2

You can also use doParallel with multi-core functionality, but you will be limited to the number of physical cores available to you. For example, the Toshiba laptop computer on which I am working at the moment has a dual-core processor. See what happens when I try to use more than two processors:

> options(cores = 2)
> getDoParWorkers()
[1] 2
> options(cores = 4)
> getDoParWorkers()
[1] 2

Having changed offices in the middle of the day, I am now working on a machine with a quad-core processor. Note that I can now use all four cores:

> options(cores = 4)
> getDoParWorkers()
[1] 4
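Rather than discovering the core count by trial and error as I just did, you can query the machine directly with the detectCores function in the parallel package. A minimal sketch (the leave-one-core-free convention is a common practice, not a requirement):

```r
library(parallel)

# Ask the operating system how many logical cores are available;
# detectCores() can return NA on unusual platforms, so guard for that
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1

# A common convention is to leave one core free for the master process
# and the operating system
n_workers <- max(1, n_cores - 1)
n_workers
```

You can then pass n_workers to makeCluster so the same script adapts to whichever machine it runs on.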

Now that you understand the basics, we will use a less trivial example. As those who have worked with large datasets or computer-intensive calculations well know, R is not a speed demon compared to compiled languages. The use of vectorized operations and functions yields speed efficiencies, but there are times when R is slow compared to other languages. Remember that R is interpreted, not compiled. You can connect R with compiled code from other languages, such as C++, a subject upon which we will touch very lightly in Recipe 15-3. Meanwhile, let us examine the use of parallel R and determine what kinds of speed improvements we might experience.

Adapting Weston’s “toy example” in the doParallel documentation, we will use two clusters to perform a bootstrapping problem, and compare the results of sequential and parallel processing by using the system.time function. We will run 10,000 bootstrap iterations in parallel and compare the results to the same task performed sequentially. The system.time function will give us the elapsed time, and the [3] index will return it at the end of the processing. Here is our code. First, let us use the iris dataset and select the 100 rows of data for the two species, versicolor and virginica. We will use the lm function to predict sepal length using the species as the predictor.

> x <- iris[which(iris[,5] != "setosa"), c(1, 5)]
> head(x)
   Sepal.Length    Species
51          7.0 versicolor
52          6.4 versicolor
53          6.9 versicolor
54          5.5 versicolor
55          6.5 versicolor
56          5.7 versicolor
> tail(x)
    Sepal.Length   Species
145          6.7 virginica
146          6.7 virginica
147          6.3 virginica
148          6.5 virginica
149          6.2 virginica
150          5.9 virginica

Now, we can run the process in parallel using two clusters:

> clusters <- makeCluster(2)
> registerDoParallel(clusters)
> trials <- 10000
> parallelTime <- system.time({
+         r <- foreach(icount(trials), .combine=cbind) %dopar% {
+                 ind <- sample(100, 100, replace = TRUE)
+                 result1 <- lm(x[ind,1]~x[ind,2])
+                 coefficients(result1)
+         }
+ })[3]
> parallelTime
elapsed
  25.59

By changing %dopar% to %do%, we can process sequentially for comparison purposes:

> trials <- 10000
> sequentialTime <- system.time({
+         r <- foreach(icount(trials), .combine=cbind) %do% {
+                 ind <- sample(100, 100, replace = TRUE)
+                 result1 <- lm(x[ind,1]~x[ind,2])
+                 coefficients(result1)
+         }
+ })[3]
> sequentialTime
elapsed
  33.89

Parallel processing with two clusters was about 24.5% faster than sequential processing in this case. Remember that this is simply the performance of my particular machine and its dual-core processor. Your results are very likely to be different.
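One housekeeping detail the timings above leave out: when you are finished with a cluster, shut down its worker processes with stopCluster so the extra R sessions do not linger and hold memory. A minimal sketch of the full life cycle:

```r
library(doParallel)

# Create two worker R sessions and register them as the foreach backend
clusters <- makeCluster(2)
registerDoParallel(clusters)

# ... parallel work with foreach(...) %dopar% ... goes here ...

# Release the worker R sessions when the parallel work is done
stopCluster(clusters)
```

Forgetting this is harmless for a short interactive session, since the workers die when the master R session exits, but in long-running scripts it is good practice to stop each cluster you create.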

The doSNOW package works in a similar way to doParallel, allowing the user to register the snow parallel backend for the foreach package. In this case, we will specify socket clusters. In this particular instance, doSNOW produced slightly faster parallel processing than doParallel:

> clusters <- makeCluster(2, type = "SOCK")
> registerDoSNOW(clusters)
> trials <- 10000
> parallelTime <- system.time({
+         r <- foreach(icount(trials), .combine=cbind) %dopar% {
+                 ind <- sample(100, 100, replace = TRUE)
+                 result1 <- lm(x[ind,1]~x[ind,2])
+                 coefficients(result1)
+         }
+ })[3]
> parallelTime
elapsed
  22.36

Recipe 15-2. Using Data Tables

Problem

One of the drawbacks of R is that the traditional data frame becomes cumbersome and slow with very large datasets. The data.table package provides additional features to the data frame that make it faster and easier to use. The data table created by the function is also a data frame, so all the things you can do with data frames work with tables as well, with a few exceptions as noted next.

Solution

Those familiar with creating data frames and subsetting them, as we have discussed in this book, will find the data table to be intuitive. We create data tables in the same way that we create data frames. Observe that the data table uses colons to separate the row numbers from the first column of data.

> install.packages("data.table")
> library(data.table)
> DataFrame <- data.frame(x = c("b","b","b","a","a"),v = rnorm(5))
> DataFrame
  x          v
1 b -1.1006306
2 b -1.3294956
3 b -0.2006320
4 a  1.6582277
5 a  0.1527988
> DataTable <- data.table(x = c("b","b","b","a","a"),v = rnorm(5))
> DataTable
   x          v
1: b -0.1329364
2: b  0.6700842
3: b  0.4488872
4: a  0.2374386
5: a  1.5203938

Instead of creating a data table from scratch as before, you can convert any existing data frame to a data table. To illustrate, we will download the SPSS version of the General Social Survey data discussed earlier and then import it using the Hmisc package. The result will be the expected data frame, which we can then convert to a data table.

> install.packages("Hmisc")
> library(Hmisc)
> GSSdataframe <- spss.get("GSS2012Merged_R5.sav", use.value.labels = TRUE)

As we did before, we receive warning messages while importing the data into R because of the treatment of missing values in the GSS data. You can look at the GSS data in a spreadsheet-like format by using the View() function. Although not big by the standards of today, our dataset is large enough, with 4,820 rows and 1,069 columns. We convert the data frame to a data table simply by using the data.table function. The data table uses 22 MB of storage. By contrast, the following CARS table is a data table created from the cars dataset that ships with R, and it uses only 1 MB of storage.

> CARS <- data.table(cars)
> View(GSSdataframe)
> nrow(GSSdataframe)
[1] 4820
> ncol(GSSdataframe)
[1] 1069
> GSSdataTable <- data.table(GSSdataframe)
> tables()
Loading required package: portfolio
Loading required package: nlme
     NAME          NROW  NCOL MB
[1,] CARS            50     2  1
[2,] DataTable        5     2  1
[3,] GSSdataTable 4,820 1,069 22
     COLS
[1,] speed,dist
[2,] x,v
[3,] year,id,wrkstat,hrs1,hrs2,evwork,wrkslf,wrkgovt,OCC10,INDUS10,marital,divorce,wi
     KEY
[1,]
[2,]
[3,]
Total: 24MB

Instead of dealing with row names as in data frames, we work with keys in data tables. You can set a key by using setkey(). The key can be one or more columns in the data table, but each data table may have only one key. Let us experiment with the DataTable object in working memory. After we set the key, the table is sorted by it. When we run the tables() command again, we see that DataTable indeed has a key, while we have not yet assigned keys to the CARS or the GSS data tables.

> setkey(DataTable, x)
> DataTable
   x          v
1: a  0.2374386
2: a  1.5203938
3: b -0.1329364
4: b  0.6700842
5: b  0.4488872
> tables()
     NAME          NROW  NCOL MB
[1,] CARS            50     2  1
[2,] DataTable        5     2  1
[3,] GSSdataTable 4,820 1,069 22
     COLS
[1,] speed,dist
[2,] x,v
[3,] year,id,wrkstat,hrs1,hrs2,evwork,wrkslf,wrkgovt,OCC10,INDUS10,marital,divorce,wi
     KEY
[1,]
[2,] x
[3,]
Total: 24MB

Now, we can use the key to retrieve specific rows of data. For example, we can use the key to retrieve only the rows in the data table with the value of "b" in the x column:

> DataTable["b",]
   x          v
1: b -0.1329364
2: b  0.6700842
3: b  0.4488872

The comma shown in the previous command is optional:

> DataTable["b"]
   x          v
1: b -0.1329364
2: b  0.6700842
3: b  0.4488872

Let us examine the speed improvements we can get with the use of keys in a data table as compared to the use of comparison operators in a data frame. To make the illustration more meaningful, we will simulate some data. Building on the example in the documentation for the data.table package, we will create a data frame with approximately 10 million rows and 676 groups. We generate two columns—one with capital letters and one with lowercase letters—to create the groups, and then create a vector of z scores using rnorm():

> grpsize <- ceiling(1e7/26^2)
> tt <- system.time(DF <- data.frame(
+ x = rep(LETTERS, each=26*grpsize),
+ y = rep(letters, each=grpsize),
+ z = rnorm(grpsize*26^2),
+ stringsAsFactors = FALSE))[3]
> tt
elapsed
   1.89

Now, let’s extract a group from the data frame, the cases in which the value of x is "L" and the value of y is "p". We use the standard approach to subsetting the data, which includes comparison operators for the two variables. The time is not terrible. It took my system 2.14 seconds to extract the 14,793 records that match the criteria. R is using vectorized searching here:

> tt <- system.time(answer1 <- DF[DF$x == "L" & DF$y == "p",])[3]
> tt
elapsed
   2.14

For comparison’s sake, let’s convert the data frame to a data table and use the key to extract the same information as before:

> DT <- as.data.table(DF)
> setkey(DT, x, y)
> tt <- system.time(answer2 <- DT[list("L", "p")])[3]
> tt
elapsed
   0.01
> head(answer1)
        x y           z
4452694 L p  0.48961111
4452695 L p -0.09609168
4452696 L p  0.26847516
4452697 L p  1.11715638
4452698 L p -0.77547270
4452699 L p  0.19132284
> head(answer2)
   x y           z
1: L p  0.48961111
2: L p -0.09609168
3: L p  0.26847516
4: L p  1.11715638
5: L p -0.77547270
6: L p  0.19132284

> identical(answer1$z, answer2$z)
[1] TRUE

We can use the identical() function to verify that the two answers are equivalent. Both sets of retrieved information are the same. The extraction of exactly the same information from the data.table was more than 200 times faster than the traditional data frame approach. The reason the data table was so much faster is that with the table we used a binary search rather than a vector search, as we did with the data frame. Here is an important lesson, paraphrased from a footnote in the data.table documentation: If you use parallel techniques to run vectorized operations, your efficiency may be much lower than simply using a data table.

What we just did was to “join” DT to the 1 row, 2 column table returned by list("L", "p"). Because this is a frequent activity, there is an alias, namely .(), which is a “dot” command followed by parentheses enclosing the search criteria for the key (in this case, the x and y columns of the table).

> identical(DT[list("L", "p"),], DT[.("L", "p"),])
[1] TRUE

The data table allows us to do very fast grouping as well. The second argument to the data table may be one or more expressions whose arguments are column names (without quotes). The third argument is a by expression. Let us compare the use of the tapply() function to the use of the by argument in data.table. We see something on the order of a tenfold increase in speed.

> tt <- system.time(total1 <- tapply(DT$z, DT$x, sum))[3]; tt
elapsed
   1.65
> tt <- system.time(total2 <- DT[,sum(z), by = x])[3]; tt
elapsed
   0.17

We can also group on two columns should we need to do so. This is also much faster than using tapply(). I will leave the performance comparisons to the interested reader.

> ttt <- system.time(total3 <- DT[,sum(z), by = "x,y"]); ttt
   user  system elapsed
   0.18    0.03    0.22
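As a starting point for the comparison left to the reader, note that tapply can also group on two factors at once by passing a list of grouping vectors. The sketch below (which uses a small simulated dataset rather than the 10-million-row one, so that it runs instantly) verifies that the two approaches produce the same grouped sums:

```r
library(data.table)

set.seed(42)
DF <- data.frame(x = rep(c("A", "B"), each = 4),
                 y = rep(c("a", "b"), times = 4),
                 z = rnorm(8),
                 stringsAsFactors = FALSE)
DT <- as.data.table(DF)

# Base R: tapply over a list of two grouping vectors returns a matrix
# with one cell per (x, y) combination
base_result <- tapply(DF$z, list(DF$x, DF$y), sum)

# data.table: group the sum of z by the same two columns
dt_result <- DT[, sum(z), by = "x,y"]

# The grouped sums agree; check the (A, a) cell as an example
all.equal(base_result["A", "a"], dt_result[x == "A" & y == "a", V1])
```

For the timing itself, simply wrap each call in system.time on the large DT, as we did above for the single-column case.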

Recipe 15-3. Compiled Code and Preallocation

Problem

Working with big data is becoming more and more the norm. There are many suggestions for increasing the speed of processing, input and output operations, and data management. Some of these are obvious, such as upgrading hardware, upgrading software, parallel computing (where it makes sense), using data tables as illustrated in Recipe 15-2, and using other languages besides R for time-critical functions. We will touch briefly in Recipe 15-3 on the last item, using another language. Because C++ is compiled, we can take advantage of that fact and use compiled code to speed up computations. We will also use another quick tip to make your code execute more quickly, which is as simple as preallocating a result object before filling it.

Solution

Although we have been using system.time to determine the elapsed time for executing our code, we will use the microbenchmark package for more precise comparisons in Recipe 15-3. Let us download and install the Rcpp package as well, which will provide a seamless integration of R and C++. We can write C++ equivalents to R functions, compile them, and then use them as if they were R functions. Unlike R, C++ performs looping very efficiently, and a compiled C++ object in R can be much more efficient than even vectorized R functions.

The first step for a Windows user is to install Rtools, if you do not have it, and to make sure that Rtools is in the path so that R can find it. This will enable you to compile C++ code from within the Rcpp package. Rtools is available from CRAN at http://cran.r-project.org/bin/windows/Rtools/. Follow the installation instructions, and if you like, you can have the installer add Rtools to your path rather than editing the path manually. If you have a Mac system, you will need Xcode to compile C++ code. Users of Linux systems will have a C++ compiler with their Linux distribution.

After installing Rcpp and having Rtools available, you will be able to write small segments of C++ code that can be used in place of R functions, and which work in basically the same way, except that the C++ code will be compiled rather than interpreted. As C++ is quite a “big” language, we will focus on just a few examples to get you started. If this is a subject you find interesting or useful in your work, you will find Dirk Eddelbuettel’s book, Seamless R and C++ Integration with Rcpp (Springer, 2013), to be very helpful. As a couple of pointers (or reminders), R is not “typed,” but C++ is a strongly typed language, and you must declare the type of a variable before you can use it. Also, unlike R, indexing in C++ begins at 0, not at 1. The for loop in C++ is different from the one in R, and unlike R’s, C++’s operations are not vectorized.

In addition to compiling C++ code from files saved with the *.cpp extension, you can use C++ inline in R for small code segments. As this is the simplest way to get started, we will look at a couple of examples. The more useful approach will be saving the C++ source file and then “sourcing it” in R using the sourceCpp function in Rcpp.

Adapting an example from Eddelbuettel, let’s compare an R function to calculate the nth number in a Fibonacci sequence to an inline function we create using Rcpp and C++. Recall that the Fibonacci sequence begins with 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55. Each successive number is the sum of the previous two numbers. A rather straightforward R function can be written to find the Fibonacci number for any position in the sequence. The Fibonacci sequence can be defined mathematically as follows, with seed values F0 = 0 and F1 = 1.

Fn = Fn-1 + Fn-2

We can program this in R with a relatively simple function:

fibR <- function(n) {
        if (n == 0) return(0)
        if (n == 1) return(1)
        return(fibR(n-1)+fibR(n-2))
}

To make a C++ function work in R, we can use Rcpp as follows. In this particular case, the C++ function is almost a literal translation of the R code. The question is whether the C++ function is faster than the R function. We will answer that question using the microbenchmark package after verifying that the two functions produce the same results. Simply execute the following script to create and compile the function fibCpp. Note that lines of C++ code must be terminated by a semicolon (;):

library(Rcpp)
cppFunction('int fibCpp(int n) {
        if(n == 0) return(0);
        if(n == 1) return(1);
        return(fibCpp(n - 1) + fibCpp(n - 2));
}
' )

The functions produce the same result:

> fibCpp(20)
[1] 6765
> fibR(20)
[1] 6765

However, the compiled C++ function is much faster, as the microbenchmark indicates:

> library(microbenchmark)
> microbenchmark(fibCpp(20), fibR(20))
Unit: microseconds
       expr      min        lq        mean     median        uq        max
 fibCpp(20)    42.92    44.319    51.75065    49.4505    57.848     99.367
   fibR(20) 72050.87 72297.183 72963.34808 72483.5550 72632.372 109907.356
 neval
   100
   100

Be aware that although the newly created function is in your workspace for repeated use in the current session, the compiled code is not saved, and you will need to compile the function again to use it in a new session. As I mentioned, it is more useful to write the C++ function, save it in a *.cpp file, and source it when you need it. Here is an example. We will take a commonly used R function and produce a C++ version of it. In this case, we will see how much of a savings we might get by calculating the variance of a dataset with C++ as compared to using R’s built-in var function. You can use the R Editor or any other text editor to write the following code and save it as a *.cpp file. I chose the self-explanatory name varC.cpp. The function uses a two-pass algorithm to find the mean and then the variance using the definitional formula for the variance:

s² = Σ(xᵢ − x̄)² / (n − 1)

As before, we must declare the type for x, n, and the other variables in our function. The use of differing forms of commenting allows us to embed R code, which the compiler sees as a comment and ignores. The size() method in C++ determines the length of the x vector. In C++, it is common to declare a variable’s type and assign an initial value to it at the same time. Here are the contents of the varC.cpp file, which I created using the R Editor.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double varC(NumericVector x) {
  int n = x.size();
  double sum1 = 0;
  double sum2 = 0;
  double mean = 0;
  double var = 0;

  for(int i = 0; i < n; ++i) {
    sum1 += x[i];
  }
  mean = sum1 / n;

  for(int i = 0; i < n; ++i) {
    sum2 += (x[i]-mean)*(x[i]-mean);
  }
  var = sum2 / (n-1);
  return var;
}

/*** R
library(microbenchmark)
x <- rnorm(1e6)
microbenchmark(
  var(x),
  varC(x)
)
*/

To compile and execute this function, use the sourceCpp function in Rcpp. We create a vector of 1,000,000 random normal deviates and see how long it takes our C++ function to find the variance as compared to the var function in R. Although the time savings are not as dramatic in this case, we still achieve a decent improvement, even though the R function is already vectorized.

> library(Rcpp)
> sourceCpp("varC.cpp")

> library(microbenchmark)

> x <- rnorm(1e6)

> microbenchmark(
+   var(x),
+   varC(x)
+ )
Unit: milliseconds
    expr      min       lq     mean   median       uq      max neval
  var(x) 6.947759 6.992777 7.041822 7.020535 7.077449 7.377417   100
 varC(x) 3.140090 3.179044 3.200733 3.190008 3.207968 3.311300   100

As a last illustration of making code more efficient, note the difference between the time required to fill a result object when you preallocate it as compared to simply creating the object. This is a way to save more time, especially when working with loops (which are sometimes unavoidable). Here, we create a loop that produces 100,000 random normal deviates. In the first case, we simply declare x1 to be a numeric vector and then fill it using the loop. In the second case, we preallocate the vector and then fill it. Notice the substantial speed difference the preallocation makes.

> timer <- system.time({
+ x1 <- numeric()
+ for(i in 1:100000) x1[i] = rnorm(1)
+ })[3]
> timer
elapsed
  10.03
> timer <- system.time({
+ x1 <- numeric(100000)
+ for(i in 1:100000) x1[i] = rnorm(1)
+ })[3]
> timer
elapsed
   0.52
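Preallocation helps, but when a vectorized function exists, skipping the loop entirely is faster still. A sketch, assuming the goal really is just 100,000 normal deviates rather than a loop that must run element by element:

```r
# One vectorized call replaces the loop entirely: rnorm generates all
# 100,000 deviates in a single call to compiled code
timer <- system.time({
  x1 <- rnorm(100000)
})[3]
timer  # typically a small fraction of even the preallocated loop's time
```

The general lesson: preallocate when you must loop, but prefer a vectorized function when one is available.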

We have scratched the surface of the very powerful and flexible Rcpp package, but I hope the interested reader has enough information from this recipe to get started and to continue to learn how to work with the package.