Haghish, E. F. (2014). Applied Statistics Using Stata.


Updated on 12 July 2016

Rcall : seamless R in Stata


The Rcall package integrates R within Stata and allows automated data communication between R and Stata. Stata macros, scalars, matrices, and data sets can be automatically transported to R; similarly, R objects of different classes (data.frame, list, matrix, vector, logical, and NULL) can be automatically imported to Stata and updated in real time. This level of integration also allows Stata users to benefit from other programming languages that can be executed interactively in the R environment, such as C++ through the Rcpp package or JavaScript through the V8 package. Moreover, Rcall allows embedding R in Stata ado programs, not only to call R functions and packages, but also to develop Stata packages that communicate between Stata and R.

Rcall is designed around interactive use. This follows the same philosophy as the markdoc package: statistics demands interactive tools. Therefore, in contrast to other attempts at embedding R in other languages or statistical software, Rcall communicates data automatically, which considerably improves its interactive functionality. The idea behind the package is discussed in the following journal article:

Seamless R and Stata integration

Installation

You can install the package from GitHub by executing the following command:

  . net install Rcall, replace from("https://raw.githubusercontent.com/haghish/Rcall/master/")

You also need to make sure that the R statistical software is installed on your machine. The Rcall package includes the default paths of R on Microsoft Windows, Mac, and Linux. If you have installed R in a different location, you can define the path to the R executable using the setpath subcommand, as shown below. The setpath subcommand stores the path to R permanently.

Before defining the R path permanently, make sure Rcall cannot already access R. You can check this by printing "Hello World" in R, for example with R: print("Hello World"). If R is not accessible, or if you have several versions of R installed and would like to call a particular one, define the path to R as shown below:

  .  Rcall setpath "/usr/bin/R"

Finally, to pass Stata data sets to R automatically, you need to install the foreign R package, which you can do from within Stata:

  .  R: install.packages("foreign", repos="http://cran.uk.r-project.org")

Syntax

In general, the syntax of the Rcall package can be summarized as follows:

Rcall [mode] [:] R-command         // calling R in Stata
Rcall [subcommand]                 // managing R 
Rcall [list] [:] namelist          // controlling data communication

The syntax of the package is further explained in the following sections.

Rcall Modes

Rcall can embed R in several modes: interactive (the default, used when no mode is specified), vanilla (non-interactive), and sync, an extended interactive mode with object synchronization. The interactive and sync modes can also be used within the console mode, which simulates the R console in Stata. These modes are further explained below.

console mode

To enter the R console mode within Stata, type Rcall (the example below uses the abbreviation R:). This runs R in Stata interactively, similar to the Mata environment. However, with every R command you execute, Stata obtains the objects from R immediately. Note that, as with Mata, you cannot execute R commands from the Do-File Editor while the console is running. To execute R from the Do-File Editor, call R using the Rcall command instead. Nevertheless, the st.scalar(), st.matrix(), st.data(), and load.data() functions continue to work while the R console is running.

  . scalar a = 999
  . R:

	------------------------------------------------- R (type end to exit) ----------------
        . a <- 2*(st.scalar(a))
        . a
        [1] 1998
        . end
	---------------------------------------------------------------------------------------
  . display r(a)
        1998

The console mode also supports multi-line code; the + continuation prompt is added automatically:

  . R:

	------------------------------------------------- R (type end to exit) ----------
        . myfunction <- function(x) {
        +
        . if (is.numeric(x)) {
            +
        .   return(x^2)
            +
        . }
        +
        . }
        . (a <- myfunction(199))
        [1] 39601
        . end
	---------------------------------------------------------------------------------------
  . display r(a)
        39601

Interactive mode

The Rcall package runs R interactively. That is, when you define an object in R, it remains in R's memory and is accessible to the next command. For example:

  . R: rm(list=ls())
  . R: a <- 10
  . R: (a^2)
      [1] 100

vanilla mode

The vanilla mode runs R non-interactively, which can be imagined as opening R, executing a script, and closing R without saving anything. This mode is only useful if you want to source() a script file in R.
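A minimal sketch of that sourcing workflow follows. It assumes that Rcall runs R in Stata's current working directory and that vanilla mode prints R's output in the Results window; the script name myscript.R is hypothetical, and the first lines only create it with Stata's file commands so the example is self-contained.

```stata
* Create a small, hypothetical R script in the working directory
tempname fh
file open `fh' using "myscript.R", write replace text
file write `fh' "x <- c(2, 4, 6)" _n
file write `fh' "print(mean(x))" _n
file close `fh'

* Run the script in a fresh, non-interactive R session;
* per the description above, nothing it defines is kept afterwards
R vanilla: source("myscript.R")
```

Because the session is discarded, vanilla mode suits self-contained scripts; use the interactive mode when later commands need the objects.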

sync mode

By default, Rcall returns rclass objects from R to Stata and allows passing Stata objects to R using several functions. However, the package also has a sync mode that automatically synchronizes the global environments of Stata and R, so that an object changed in either environment is immediately replaced in the other.

The sync mode provides the most interactive experience for numeric and string scalars and for matrices. It does not synchronize global macros. In the example below, the value of the scalar a changes from 1 to 0 after it is altered in R:

  . scalar a = 1
  . R sync: (a = 0)
        [1] 0
  . display a
        0

The same example is repeated without sync mode:

  . scalar a = 1
  . R: (a = 0)
        [1] 0
  . display a
        1

The sync mode also replaces matrices in R and Stata whenever a matrix changes in either environment. New matrices are synchronized too:

  . mat drop _all
  . mat define A = (1,2,3 \ 4,5,6)
  . Rcall sync: B = A
  . mat list B
  
        B[2,3]
            c1  c2  c3 
        r1   1   2   3
        r2   4   5   6 
  . mat C = B/2
  . R sync: C
  
             [,1] [,2] [,3] 
        [1,]  0.5  1.0  1.5 
        [2,]  2.0  2.5  3.0 
        

As shown in the examples, any change made to a matrix, whether in R or in Stata, is instantly available in the other environment. While such a level of integration between the two languages is exciting, it requires a lot of caution and testing. It is an exploratory feature rather than a mainstream approach to calling a foreign language from a programming language.

The Rcall command can also be abbreviated as R, and the colon sign ":" is optional. For the rest of the examples, I call R by typing R: instead of Rcall:.
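As a quick check of the equivalent forms, the sketch below sends three assignments using the full command, the abbreviation with a colon, and the abbreviation without one. It assumes, as in the return list example further down, that the interactive session keeps every object so all three appear in r() after the last call; the names x, y, and z are only illustrative.

```stata
* Three equivalent ways to send a command to R
Rcall: x <- 1
R: y <- 2
R z <- 3

* All three objects live in the same interactive R session
return list
```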

Rcall Lists

sendlist

returnlist

synclist

Data communication

The biggest advantage of the Rcall package is that it allows data communication between Stata and R. Objects defined in R can be accessed in Stata automatically, as returned rclass scalars, macros, and matrices. For example, I create a numeric variable, a character variable, a numeric vector, a matrix, and a list in R, and retrieve the results in Stata immediately, as shown below.

Numeric
  . R: a <- 99
  . display r(a)
      99
      
      

Character

  . R: b <- "hello world"
  . display r(b)
      hello world
      
      

Vector

Note that the vector is returned as a string macro in Stata, but you can destring it easily. Stata does not return rclass numeric lists (to my knowledge). Nevertheless, an R vector is still accessible in Stata this way:

  . R: c <- c(1:5)
  . display r(c)
      1 2 3 4 5
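One way to turn that string into numbers is sketched below, using only built-in Stata features: copy r(c) into a local macro right away (any later r-class command replaces it), then fill a row vector word by word with the word() and real() functions. The matrix name V is arbitrary.

```stata
R: c <- c(1:5)

* Copy the returned string before another r-class command replaces it
local cvec "`r(c)'"
local n : word count `cvec'

* Fill a 1 x n row vector with the numeric values
matrix V = J(1, `n', .)
forvalues i = 1/`n' {
    matrix V[1, `i'] = real(word("`cvec'", `i'))
}
matrix list V
```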
      
      

Matrix

You can also create a matrix in R and access it immediately in Stata, every time you change it. For example:

  . R: A = matrix(1:6, nrow=2, byrow = TRUE) 
  . R: A
           [,1] [,2] [,3]
      [1,]    1    2    3
      [2,]    4    5    6
      
      

Now view the matrix in Stata. It is that simple:

  .  mat list r(A)
      r(A)[2,3]
          c1  c2  c3
      r1   1   2   3
      r2   4   5   6
      
      

List

Accessing lists is trickier, but still automatic. Rcall returns each element of a list as a separate rclass scalar or macro. Because rclass names cannot include the $ sign, Rcall replaces it with an underscore, so mylist$x is returned as r(mylist_x).

  . R: mylist <- list(x="character", y=c(1:10))
  . display  r(mylist_x)
      character
      
  . display  r(mylist_y)
      1 2 3 4 5 6 7 8 9 10
      
      

rclasses

As noted earlier, unless you use the vanilla mode, R is executed interactively within Stata and "most" R objects are accessible in Stata. You can see the available objects by typing return list, which shows the returned matrices, macros, and scalars:

  .  return list
      scalars:
                        r(a) =  99
      
      macros:
                 r(mylist_y) : "1 2 3 4 5 6 7 8 9 10"
                 r(mylist_x) : "character"
                        r(b) : "hello world"
                        r(c) : "1 2 3 4 5"
      
      matrices:
                        r(A) :  2 x 3
      
      

Passing data from Stata to R

So far I have documented how R objects can be accessed within Stata. This package is under constant development, and I will add automatic import for more R classes (currently it imports numeric, character, matrix, and list objects).

Now I show how to pass data from Stata to R. In general, passing local and global macros is the simplest, because Stata expands the macro before the command is sent to R:

  .  global a 2016
  . R: a <- $a
  . display r(a) 
      2016
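The same substitution works for local macros. Because Stata expands the macro before R sees the command, a string macro has to be wrapped in quotes so that R receives a character literal rather than an object name. The names n and name below are only illustrative.

```stata
* A numeric local macro is substituted as a bare number
local n 5
R: n <- `n'

* A string local macro must be quoted for R
local name "Stata"
R: name <- "`name'"

display r(n)
display r(name)
```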
      
      

Scalars, matrices, and data sets are more complicated. Similar to passing them to Mata, Rcall defines three functions for passing these classes to R, plus load.data() for loading an R data frame back into Stata:

Function       Description
st.scalar()    passes a numeric or string scalar to R
st.matrix()    passes a matrix to R
st.data()      passes a Stata data set to R
load.data()    loads an R data frame in Stata

Below, I demonstrate how to use these functions.

st.scalar() function

  .  scalar a = 999
  . R: (a <- st.scalar(a))
      [1] 999
      
  .  scalar a = "String Scalar"
  . R: (a <- st.scalar(a))
      [1] "String Scalar"
      
      

st.matrix() function

As shown in the example below, you can pass Stata matrices to R, manipulate them, and automatically get the resulting matrix back in Stata:

  .  matrix A = (1,2\3,4) 
  . matrix B = (96,96\96,96)                
  . R: C <- st.matrix(A) + st.matrix(B)   
  . R: C
           [,1] [,2]
      [1,]   97   98
      [2,]   99  100
      
  . mat list r(C)                                    //Matrix C in Stata
      r(C)[2,2]
           c1   c2
      r1   97   98
      r2   99  100
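Any R function that returns a matrix can be used the same way. As a sketch, the inverse of A computed with R's solve() comes back as r(Ainv) and can be copied into a regular Stata matrix; the names Ainv and AinvS are illustrative. Mathematically the inverse of (1,2\3,4) is (-2,1\1.5,-0.5), which the result should match up to rounding.

```stata
matrix A = (1,2\3,4)

* Invert the matrix in R; Rcall returns the result as r(Ainv)
R: Ainv <- solve(st.matrix(A))

* Keep a copy before another r-class command replaces r()
matrix AinvS = r(Ainv)
matrix list AinvS
```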
      
      

st.data() function

Finally, you can also pass a Stata data set to R. If the data set is saved on your machine, provide the relative or absolute path to the file. For example, the absolute path to auto.dta on my machine is:

  .  R: mydata <- st.data(/Applications/Stata/ado/base/a/auto.dta)
  . R: head(mydata)
                 make price mpg rep78 headroom trunk weight length turn displacement
      1   AMC Concord  4099  22     3      2.5    11   2930    186   40          121
      2     AMC Pacer  4749  17     3      3.0    11   3350    173   40          258
      3    AMC Spirit  3799  22    NA      3.0    12   2640    168   35          121
      4 Buick Century  4816  20     3      4.5    16   3250    196   40          196
      5 Buick Electra  7827  15     4      4.0    20   4080    222   43          350
      6 Buick LeSabre  5788  18     3      4.0    21   3670    218   43          231
        gear_ratio  foreign
      1       3.58 Domestic
      2       2.53 Domestic
      3       3.08 Domestic
      4       2.93 Domestic
      5       2.41 Domestic
      6       2.73 Domestic
      
      

If you leave the st.data() function empty, it passes the data set currently loaded in Stata to R. For example:

  .  sysuse auto, clear
      (1978 Automobile Data)
      
  . keep price mpg
  . R: mydata <- st.data()
  . R: head(mydata)
        price mpg
      1  4099  22
      2  4749  17
      3  3799  22
      4  4816  20
      5  7827  15
      6  5788  18
      
      

load.data() function

You can also load a data frame from R into Stata. This automatically clears any data currently loaded in Stata, so be careful! Nevertheless, the function can be very useful for quickly passing a data frame from R to Stata. It exports a Stata version 11 data set using the foreign R package and loads it in Stata:

  . clear
  . R: mydata <- data.frame(cars)
  . R: load.data(mydata)

The mydata data frame is now loaded in Stata, and you can continue your analysis there:

  . list in 1/2

        +--------------+
        | speed   dist |
        |--------------|
     1. |     4      2 |
     2. |     4     10 |
        +--------------+
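Combining st.data() and load.data() gives a round trip: send the data in memory to R, derive a variable there, and bring the modified data frame back. This is a sketch that assumes the foreign package is installed; as noted above, load.data() replaces the data in Stata's memory. Stata would read d$price as the global macro $price, so the sketch uses R's transform() instead of the $ operator. The variable name price_per_mpg is illustrative.

```stata
sysuse auto, clear
keep make price mpg

* Send the data in memory to R and derive a new column there
R: d <- st.data()
R: d <- transform(d, price_per_mpg = price / mpg)

* Replace the Stata data with the modified data frame
R: load.data(d)
describe
summarize price_per_mpg
```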

It's your turn to test the package: fork it on GitHub and contribute to it. Connecting R to Stata at such a level of integration can really ease the process of running a computation in R and passing the results or variables back to Stata.

Rcall subcommands

setpath

The package requires R to be installed on the machine, and it detects R in the default paths for each operating system. The easiest way to see whether R is accessible is to execute a command in R:

  . R: print("Hello World") 
        [1] "Hello World" 

If R is not accessible, you can permanently set up the path to R using the setpath subcommand. For example, the path to R on Mac OS X 10.10 could be:

  . Rcall setpath "/usr/bin/R"

clear

When you work with Rcall interactively (without the vanilla mode), everything you do in R is memorized and saved in an .RData file automatically, even if you quit R using the q() function. To clear the memory and erase everything defined in R, remove the objects and delete the .RData file:

  . R: rm(list=ls())
  . R: unlink(".RData") 	

However, the commands above do not remove attached packages and data sets. You can view the objects attached to your R environment using the search() function. To detach packages or objects, use the detach() function. Note that packages are named "package:name". Here is an example of detaching a data set and a package:

  . R:
  
	------------------------------------------------- R (type end to exit) ----------
        . attach(cars)
        . library(Rcpp)               # make sure you have it installed
        . search()                    # Output is omitted ...
        .
        . detach(cars)
        . detach("package:Rcpp")
	---------------------------------------------------------------------------------------

history

describe

site