Haghish, E. F. (2014). Practical Stata Programming: Calculating Adjusted R Squared.
Updated on November 23rd, 2014

r2_a package: Adjusted R Squared


Quick Tips

The r2_a package was written for Stata version 6 to calculate the Adjusted R Squared after running a regression analysis. Newer versions of Stata include the Adjusted R Squared in the regression output, but in this article I review the package to see how the program was written.

Introduction

The r2_a package was written in 2001 by Jeff Pitblado to calculate the Adjusted R Squared after running a regression analysis. In newer versions of Stata, the Adjusted R Squared is included in the regression output, so there is no need to install a user-written package. But given the simplicity of the package, reviewing how the program was written can be instructive for beginners learning Stata programming. The program can be found and installed by typing findit r2_a. Let's load the auto.dta dataset, run a regression, and try the r2_a command.

sysuse auto, clear /* loading the auto data set */
regress price mpg
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =   20.26
       Model |   139449474     1   139449474           Prob > F      =  0.0000
    Residual |   495615923    72  6883554.48           R-squared     =  0.2196
-------------+------------------------------           Adj R-squared =  0.2087
       Total |   635065396    73  8699525.97           Root MSE      =  2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------

As shown in the output of the regression command, Stata 13 (and, I assure you, every Stata license you have owned during the last 10 years) returns the Adjusted R Squared. Now I try the r2_a program:

r2_a /* running r2_a after regression command */
Adj R-Squared = 0.2087

As expected, the values returned by r2_a and the regression command are identical. Now, let's find out how the program works and how the problem was solved!

Algorithm

How can we calculate the Adjusted R Squared after running a regression analysis? The package follows this procedure to answer the question:

  • Define the program as rclass, so that it can return results in r()
  • Make sure that the command is used right after a regression analysis; otherwise, return an error and exit
  • Obtain the R Squared, the residual degrees of freedom, and the model degrees of freedom from the regression analysis
  • Calculate the Adjusted R Squared according to the formula and store it in a scalar
  • Print the value of the Adjusted R Squared
  • End the program


r2_a program

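program define r2_a, rclass
    version 6

    /* stop unless the last estimation command was regress */
    if "`e(cmd)'" != "regress" {
        di in red "estimates not from -regress-"
        exit 198
    }

    /* compute the Adjusted R Squared in a temporary scalar */
    tempname r2_a
    scalar `r2_a' = 1 - (1 - e(r2))*( e(df_r) + e(df_m) )/( e(df_r) )

    /* print the result and return it in r(r2_a) */
    di in green "Adj R-Squared = " %-6.4f `r2_a'
    return scalar r2_a = `r2_a'

end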


Analysis of r2_a package

Next, I explain the program code step by step to show how the program works.

Syntax processing

program define r2_a, rclass
version 6

if "`e(cmd)'" != "regress" {
    di in red "estimates not from -regress-"
    exit 198
}

The program begins by defining its name, r2_a, and the version of Stata under which it should be run, version 6. The program definition carries the rclass option. rclass indicates that the program returns its results in r(), which can be inspected with the return list command (see [P] return). Without the rclass option, the program is not allowed to set or replace results in r() with the return command.
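
To make the role of rclass concrete, here is a minimal sketch of another rclass program. The program name show_mean and the returned name r(mean) are hypothetical and chosen only for illustration; they are not part of r2_a.

```stata
capture program drop show_mean      /* allow the example to be re-run */
program define show_mean, rclass
    /* `1' is the first word typed after the command, here a variable name */
    quietly summarize `1'
    /* copy the mean left behind by summarize into this program's r() */
    return scalar mean = r(mean)
end

sysuse auto, clear
show_mean price
return list                         /* lists r(mean) */
```

If the rclass option is removed from the program define line, Stata refuses the return scalar statement, which is the restriction described above.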

However, the program does not use Stata's syntax command, because it takes no arguments. Instead, r2_a must be run after a regression analysis, because it uses several e-class results returned by the regression. e(cmd) is a macro that holds the name of the last estimation command. The condition if "`e(cmd)'" != "regress" is true when that command is not regress, and it is used to make sure the program is only applied after a regression. If the command is not used after a regression, the error message is printed and the program exits with error 198, which indicates invalid syntax.
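
The check on e(cmd) can be tried in isolation. The sketch below wraps the same test in a small program, here called check_regress (a hypothetical name used only for this illustration), and runs it after a logit model so that the error branch is taken. capture noisily lets the error message print while keeping the return code in _rc.

```stata
capture program drop check_regress  /* allow the example to be re-run */
program define check_regress
    /* same test as in r2_a: was the last estimation command regress? */
    if "`e(cmd)'" != "regress" {
        di in red "estimates not from -regress-"
        exit 198
    }
    di in green "estimates are from -regress-"
end

sysuse auto, clear
quietly logit foreign mpg           /* e(cmd) is now "logit" */
capture noisily check_regress       /* prints the error message */
display _rc                         /* 198 */
```

Running quietly regress price mpg before check_regress takes the other branch and leaves _rc at 0.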


Adjusted R Squared

The formula for calculating the Adjusted R Squared is as follows:

\[ R^2_{adj} = 1 - (1 - R^2)\,\frac{N - 1}{N - p - 1} \]

where:

    • R2 : Sample R Squared
    • p : Number of predictors
    • N : Sample size

We can obtain these scalars from the regression model by typing ereturn list. In the r2_a package, instead of (N - 1), the program adds up the degrees of freedom of the model and of the residual, which sum to (N - 1) for a model with a constant. This is done by summing the e(df_r) and e(df_m) scalars returned by the regression. In addition, instead of (N - p - 1), the program uses the e(df_r) scalar, which holds the residual degrees of freedom. The scalars taken from the regression model are listed below:

    • e(r2) : Sample R Squared
    • e(df_r) : Residual degrees of freedom
    • e(df_m) : Model degrees of freedom
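
As a quick check of the substitution described above, the following sketch uses the same auto.dta regression as the rest of the article. The scalar name adj is an assumption made for this example; e(N) and e(r2_a) are results that regress itself stores.

```stata
sysuse auto, clear
quietly regress price mpg

/* the two degrees of freedom add up to N - 1 */
display e(df_r) + e(df_m)           /* 72 + 1 = 73 */
display e(N) - 1                    /* 74 - 1 = 73 */

/* the formula exactly as r2_a writes it */
scalar adj = 1 - (1 - e(r2))*( e(df_r) + e(df_m) )/( e(df_r) )
display %6.4f adj                   /* 0.2087 */

/* the value regress stores on its own */
display %6.4f e(r2_a)               /* 0.2087 */
```

The identity between the degrees of freedom and N - 1 relies on the model having a constant, as it does here.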

Calculating Adjusted R Squared

tempname r2_a
scalar `r2_a' = 1 - (1 - e(r2))*( e(df_r) + e(df_m) )/( e(df_r) )

tempname creates a local macro holding a temporary name that can be used for a scalar or a matrix. Because the name is temporary, the scalar stored under it is dropped automatically at the end of the program. In this program, the tempname is used to define the scalar that holds the Adjusted R Squared, i.e. r2_a. The rest simply applies the scalars returned by the regress command to calculate the Adjusted R Squared.
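
A small sketch can show what tempname actually puts in the local macro. The program name temp_demo, the macro name x, and the value 42 are all hypothetical and serve only as an illustration.

```stata
capture program drop temp_demo      /* allow the example to be re-run */
program define temp_demo
    tempname x
    /* `x' expands to a name generated by Stata, not to the letter x */
    scalar `x' = 42
    display "temporary scalar name: `x'"
    display "value: " `x'
end

temp_demo
scalar list                         /* the temporary scalar is no longer listed */
```

Because the generated name cannot clash with a scalar the user has already defined, the program leaves the user's own scalars untouched.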

Printing the command output

di in green "Adj R-Squared = " %-6.4f `r2_a'
return scalar r2_a = `r2_a'
end

After calculating the Adjusted R Squared, the output of the package is prepared. The %-6.4f format controls how the value of the scalar is displayed. Numeric formats, which are described in the [U] manual, begin with the % sign. The hyphen is optional and makes the result left-aligned. The first number (before the point) sets the width of the result. The second number (after the point) sets the number of digits that follow the decimal point. The "f" selects the fixed format, which is one of the available formats for numeric values. The value of the Adjusted R Squared is printed on a single line, and the r2_a scalar is returned in r(), which means that if you type return list after running the r2_a program, you will get something like:

*** Example ***
sysuse auto, clear
quietly regress price mpg
r2_a
return list
scalars:
               r(r2_a) =  .2087437291901012
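
To see the effect of each part of the display format used by r2_a, the sketch below prints the same number with a few variations. The brackets are only there to make the padding visible, and the widths 10 and 2 are arbitrary choices for the illustration.

```stata
/* the format used in r2_a: width 6, 4 decimals, left-aligned */
display "[" %-6.4f 0.2087437 "]"

/* a wider field: the hyphen pads on the right */
display "[" %-10.4f 0.2087437 "]"

/* without the hyphen the result is right-aligned, padded on the left */
display "[" %10.4f 0.2087437 "]"

/* fewer decimals: the value is rounded to 2 digits */
display "[" %6.2f 0.2087437 "]"
```

With a width of 6 and 4 decimals, the number fills the field exactly, so the hyphen makes no visible difference in r2_a itself; it only matters when the field is wider than the number.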
               

Exercise

  • Try to calculate the Adjusted R Squared using different scalars returned by the regression
  • Create a table that presents all the elements used in calculating the Adjusted R Squared and also includes the Adjusted R Squared itself

Download r2_a commented ado

The example ado file below is the commented version of r2_a.ado that you can download. Right-click and select save as... to download the ado file.
r2_a.ado




