Some Brief R Information

by Jeffrey S. Rosenthal

This page provides brief information about the free-software package "R", which is used for statistical computing in various statistics courses at the University of Toronto.

Running R:

To install R on your own computer (absolutely free!), go to a CRAN (Comprehensive R Archive Network) mirror site and, under "Precompiled Binary Distributions", download R for your computer type (Linux, Mac, Windows, ...). [For Windows users, select "Download R for Windows", then choose the "base" package, and then download and run the first link, i.e. the "Download for Windows" .exe install file.]

Alternatively, Statistics Dept graduate students can run R on the utstat research computer servers.

Undergraduate students can run R on the computers in the FASIIT lab rooms in SS 561 and RW 107/109 (see schedules) by logging in with their UTORid (at least after "checking" it). There are also computers with R available for student use in the Data Library lab.

In a pinch, R can also be run remotely online.

Alternatively, you can install and use RStudio, which is an integrated package designed to make R simpler and more intuitive to use. (I don't use RStudio myself -- I always use traditional R -- but some people prefer using the RStudio interface.)

Examples of using R:

Once you have started R, then you can type many different R commands after the R prompt ("> ").

Simple arithmetic may be done directly, including simple expressions like:

3 + 4

or more complicated expressions like:

3.7 + 11 * exp(2) - 17^2 + sqrt(5) / 3

Variables may be assigned by typing "=", e.g. to assign the variable "w" the value 7.2, type:

w = 7.2

Alternatively, the "=" may be replaced by "<-" (i.e. "less-than" followed by "dash"), as in "w <- 7.2".
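The arithmetic and assignment commands above can be combined into a short session. The following is just a minimal sketch; the second variable name "v" is made up for illustration.

```r
# Arithmetic is evaluated directly at the prompt
3 + 4                      # gives 7

# Both assignment forms store the same value
w = 7.2
v <- 7.2                   # "v" is a hypothetical second variable
identical(w, v)            # TRUE

# Variables can then be used in further expressions
w * 2 + 1                  # gives 15.4
```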

Lists of data may be entered directly using the "c( )" function, for example:

x = c(2, 4, 1.7, -3)

Once this is done, then typing just "x" will output x as a list (i.e. vector). Or, typing e.g. "x[3]" will output the 3rd element, 1.7. Also "length(x)" is 4, so e.g. "x[length(x)]" will output -3.

Once x is a list, then its sum, mean (x-bar), variance (s^2), standard deviation (s), etc. can be computed directly:

sum(x)
mean(x)
var(x)
sd(x)

[Of course, "sd(x)" is the same as "sqrt(var(x))".] If you wish, you can assign these values to other variables, e.g. "mu = mean(x)", "s = sd(x)", etc.
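To see what these commands compute, here is a small sketch that indexes into x and recomputes the sample variance by hand, dividing by n - 1 (which is what "var" does). The names "n" and "s2" are just illustrative choices.

```r
x = c(2, 4, 1.7, -3)
n = length(x)                        # 4

x[3]                                 # third element: 1.7
x[n]                                 # last element: -3

# Sample variance computed by hand, dividing by n - 1
s2 = sum((x - mean(x))^2) / (n - 1)
all.equal(s2, var(x))                # TRUE
all.equal(sqrt(s2), sd(x))           # TRUE
```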

Lists can themselves be operated on. In the example above, "x^2" would produce the list (4, 16, 2.89, 9), so that e.g. "mean(x^2)" would give 7.9725. Equivalently, you could first do "z = x^2", and then "mean(z)" would also be 7.9725.
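The same elementwise behaviour applies to other arithmetic operations too. A brief sketch, using the list x from above:

```r
x = c(2, 4, 1.7, -3)

z = x^2              # squares each element: 4, 16, 2.89, 9
mean(z)              # 7.9725

# Other operations work elementwise in the same way
x + 1                # 3, 5, 2.7, -2
2 * x                # 4, 8, 3.4, -6
abs(x)               # 2, 4, 1.7, 3
```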

When typing commands in R, you can use the up-arrow key to retrieve previous commands, and the left-arrow key to edit a command that you've already typed.

R can also generate pseudorandom values. For example, to generate 50 i.i.d. draws from a standard normal distribution, type "rnorm(50)". To save them as a list called "y", type "y = rnorm(50)". Then you can compute their mean etc. by "mean(y)", "var(y)", "sd(y)", etc.
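Since the values are pseudorandom, each run normally gives different numbers. If you want reproducible results, one option (not discussed above) is to call "set.seed" first. A minimal sketch, where the seed value 123 and the name "y2" are arbitrary choices:

```r
set.seed(123)        # the seed value 123 is arbitrary
y = rnorm(50)        # 50 i.i.d. standard normal draws
length(y)            # 50
mean(y)              # should be near 0
sd(y)                # should be near 1

# Resetting the same seed reproduces exactly the same draws
set.seed(123)
y2 = rnorm(50)       # "y2" is a hypothetical second list
identical(y, y2)     # TRUE
```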

To generate from other distributions, use e.g. "rbinom(50, 10, 0.3)" for the Binomial(10, 0.3) distribution, "rpois(50, 14)" for the Poisson(14), "runif(10, 2, 4)" for Uniform[2,4], "rexp(50, 0.3)" for Exponential(0.3) (i.e. with rate 0.3, hence mean 1/0.3), etc. In each case, the first argument is the number of draws. To compute cdfs, use e.g. "pnorm(1.2)" for the standard normal, "pexp(1.2, 0.3)" for Exponential(0.3), "pchisq(74.22, 100)" for the chi-squared(100) distribution, "pt(2.228, 10)" for the t(10) distribution, "pf(5.7, 2, 10)" for the F(2, 10) distribution, etc.; for densities, use "dnorm(1.2)", "dexp(1.2, 0.3)", etc.
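As a quick check of what these functions return, the sketch below compares a few of them against the usual textbook formulas. The variable name "p" and the sample size 5000 are just illustrative choices.

```r
# Standard normal cdf at 1.2, and the corresponding upper tail
p = pnorm(1.2)
1 - p                          # P(Z > 1.2)

# The standard normal density agrees with its formula
all.equal(dnorm(1.2), exp(-1.2^2 / 2) / sqrt(2 * pi))    # TRUE

# The Exponential(0.3) cdf at 1.2 is 1 - exp(-0.3 * 1.2)
all.equal(pexp(1.2, 0.3), 1 - exp(-0.3 * 1.2))           # TRUE

# Sample means should be near the theoretical means
mean(rbinom(5000, 10, 0.3))    # near 10 * 0.3 = 3
mean(rexp(5000, 0.3))          # near 1 / 0.3
```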

R also has excellent plotting features. For example, "plot(x)" will plot the individual values of x above, while "hist(x)" will display a histogram. Also, "pie(x^2)" will display a pie-chart of x^2 (or any other non-negative list). Try them! [If you prefer, you can save your plot as a pdf file, by typing "pdf()", then "plot(x)" (or whatever), and then "dev.off()". If you want a png or jpeg or postscript file instead of pdf, then type "png()" or "jpeg()" or "postscript()" instead of "pdf()".]
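Here is a sketch of saving a plot to a named pdf file. The file name "myplot.pdf" and the use of a temporary directory are assumptions made for illustration. (If no file name is given, "pdf()" writes to a default file, normally "Rplots.pdf" in the current directory.)

```r
x = c(2, 4, 1.7, -3)

# "myplot.pdf" is a hypothetical file name, placed in a temporary directory
outfile = file.path(tempdir(), "myplot.pdf")

pdf(outfile)         # open a pdf graphics device writing to that file
plot(x)              # the plot goes to the file, not the screen
dev.off()            # close the device, which finishes writing the file

file.exists(outfile) # TRUE
```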

Once you have typed "y = rnorm(50)", then you can e.g. get a histogram of this sample by typing

hist(y)

To then e.g. get a second histogram, of a fresh sample, overlaid on the first, in a different colour, you could type

hist(rnorm(50), add=TRUE, border=2)

Points and lines can be added to plots using the commands "points" and "lines", respectively.
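For example, the following sketch draws a histogram on the density scale, overlays the standard normal density curve using "lines", and marks the sample mean using "points". The seed, the name "xs", and the colour and symbol choices are all arbitrary.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
y = rnorm(50)

# freq=FALSE puts the histogram on the density scale
hist(y, freq=FALSE)

# Overlay the standard normal density as a curve
xs = seq(-4, 4, by=0.1)              # "xs" is a hypothetical name
lines(xs, dnorm(xs), col=2)

# Mark the sample mean on the horizontal axis
points(mean(y), 0, pch=19, col=4)
```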

Longer sequences of R commands (e.g. full computer programs) can be saved to a file, and then executed by typing

source("filename")

The number sign # indicates a comment (useful for explaining what you are doing); everything from # to the end of the line is ignored by R.
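As a small illustration, the sketch below writes a two-line program (including a comment) to a file and then runs it with "source". Normally you would create the file in a text editor instead; the file name "myscript.R" and the use of a temporary directory are assumptions made for this example.

```r
# "myscript.R" is a hypothetical file name, placed in a temporary directory
script = file.path(tempdir(), "myscript.R")

# Write a two-line program to the file (normally done in a text editor)
writeLines(c(
  "# compute the mean of a small list",
  "m = mean(c(2, 4, 1.7, -3))"
), script)

source(script)       # runs the commands in the file
m                    # 1.175
```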

When you are done with your computations, type "quit()" or "q()" to quit R. (It might then ask you if you want to save your workspace image; I always reply "n", but either option is fine.)

Finding Out More:

R has lots of built-in help available. For example, to find out more about the "hist" function, simply type "help(hist)" or "?hist" after the R prompt. Similarly, type "?plot" to find out about the many plotting options. Or, for an interactive R-help web interface, type "help.start()".

In addition, there is lots of documentation about R available on the web, along with lots of free online instructional videos.

Finally, R is a full-featured programming language (with "if", "for", "while", etc.), and has a huge number of other features and options not mentioned here. There is lots to learn and read and investigate and use -- try it out, and have fun!
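To give a small taste of the programming constructs, here is a minimal sketch using "for", "if", and "while". The variable names are just illustrative.

```r
# Sum the positive elements of a list using "for" and "if"
x = c(2, 4, 1.7, -3)
total = 0
for (v in x) {
  if (v > 0) {
    total = total + v
  }
}
total                # 7.7

# Count how many doublings of 1 are needed to exceed 100, using "while"
k = 0
val = 1
while (val <= 100) {
  val = 2 * val
  k = k + 1
}
k                    # 7
```

Of course, the first computation could be done more directly as "sum(x[x > 0])".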


-- Jeffrey S. Rosenthal, Department of Statistics, University of Toronto