Some Brief R Information

by Jeffrey S. Rosenthal, University of Toronto

This page provides brief information about the free-software package "R", which is used for statistical computing in various statistics courses at the University of Toronto.

Running R:

To install R on your own computer (absolutely free!), go to a CRAN (Comprehensive R Archive Network) download site and, under "Precompiled Binary Distributions", download R for your computer type (Linux, Mac, Windows, ...). [For Windows users: select "Download R for Windows", then choose "base", and then download and run the first link, i.e. the "Download for Windows" .exe install file.]

Alternatively, you can install and use RStudio, an integrated development environment designed to make R simpler and more intuitive to use. (RStudio runs on top of R, so R itself must still be installed.)

RStudio can also be run remotely online.

There are also computers with R available for student use in the University of Toronto Data Library.

Examples of using R:

Once you have started R, then you can type many different R commands after the R prompt ("> ").

Simple arithmetic may be done directly, including simple expressions like:

3 + 4
or more complicated expressions like:
3.7 + 11 * exp(2) - 17^2 + sqrt(5) / 3
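As a small illustration (this example is an addition, not part of the original page), R evaluates such expressions with the usual precedence rules: "^" first, then "*" and "/", then "+" and "-". Parentheses override that order. The names "a" and "b" below are just for illustration.

```r
# The longer expression from above
a = 3.7 + 11 * exp(2) - 17^2 + sqrt(5) / 3

# The same expression with R's implicit grouping written out explicitly
b = 3.7 + (11 * exp(2)) - (17^2) + (sqrt(5) / 3)
print(all.equal(a, b))   # TRUE

# Parentheses change the order of evaluation
3 + 4 * 2      # 11
(3 + 4) * 2    # 14
-2^2           # -4, because ^ is applied before the minus sign
```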
Variables may be assigned by typing "=", e.g. to assign the variable "w" the value 7.2, type:
w = 7.2
Alternatively, the "=" may be replaced by "<-" (i.e. "less-than" followed by "dash"), as in "w <- 7.2".
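For example (an added sketch; the second variable name "v" is arbitrary), both forms of assignment give exactly the same result:

```r
w = 7.2    # assignment with "="
v <- 7.2   # the same assignment with "<-"
identical(w, v)   # TRUE
w * 2             # 14.4; once assigned, w can be used in further calculations
```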

Lists of data may be entered directly using the "c( )" function, for example:

x = c(2, 4, 1.7, -3)
Once this is done, then typing just "x" will output x as a list (i.e. vector). Or, typing e.g. "x[3]" will output the 3rd element, 1.7. Also "length(x)" is 4, so e.g. "x[length(x)]" will output -3. You can use "1:5" as shorthand for "c(1,2,3,4,5)", etc.
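Here is a short sketch (added for illustration) collecting these indexing commands. Note that R counts positions starting from 1, and that a range like "2:3" can also be used inside the square brackets.

```r
x = c(2, 4, 1.7, -3)
x              # prints the whole vector
x[3]           # 1.7, the 3rd element (R indexes from 1, not 0)
length(x)      # 4
x[length(x)]   # -3, the last element
x[2:3]         # the 2nd and 3rd elements, 4 and 1.7
all(1:5 == c(1, 2, 3, 4, 5))   # TRUE: 1:5 is shorthand for c(1,2,3,4,5)
```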

Once x is a list, then its sum, mean (x-bar), variance (s^2), standard deviation (s), etc. can be computed directly:

sum(x)
mean(x)
var(x)
sd(x)
[Of course, "sd(x)" is the same as "sqrt(var(x))".] If you wish, you can assign these values to other variables, e.g. "mu = mean(x)", "s = sd(x)", etc.
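To see what these functions compute, here is a sketch (an addition; the names "n", "xbar" and "s2" are just for illustration) that builds the sample mean and sample variance from their formulas and compares them with R's built-in versions. R's "var" uses the divisor n - 1.

```r
x = c(2, 4, 1.7, -3)
n = length(x)
xbar = sum(x) / n                   # sample mean, 1.175
s2 = sum((x - xbar)^2) / (n - 1)    # sample variance with divisor n - 1
all.equal(xbar, mean(x))            # TRUE
all.equal(s2, var(x))               # TRUE
all.equal(sqrt(s2), sd(x))          # TRUE
```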

Lists can themselves be operated on. In the example above, "x^2" would produce the list (4, 16, 2.89, 9), so that e.g. "mean(x^2)" would give 7.9725. Equivalently, you could first do "z = x^2", and then "mean(z)" would also be 7.9725.
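A few more element-by-element operations, as an added sketch: arithmetic on a list applies to each entry, and two lists of the same length are combined position by position.

```r
x = c(2, 4, 1.7, -3)
x^2                      # 4, 16, 2.89, 9
mean(x^2)                # 7.9725
z = x^2
mean(z)                  # also 7.9725
2 * x + 1                # 5, 9, 4.4, -5: each element doubled, then 1 added
x + c(10, 20, 30, 40)    # 12, 24, 31.7, 37: added position by position
```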

When typing commands in R, you can use the up-arrow key to retrieve previous commands, and the left-arrow key to edit a command that you've already typed.

R can also generate pseudorandom values. For example, to generate 50 i.i.d. draws from a standard normal distribution, type "rnorm(50)". To save them as a list called "y", type "y = rnorm(50)". Then you can compute their mean etc. by "mean(y)", "var(y)", "sd(y)", etc.
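Each call to "rnorm" gives different values. If you want the same "random" values every time (e.g. so that results can be reproduced), you can first call "set.seed" with any number. This function is not mentioned above; the seed value 2024 below is arbitrary.

```r
set.seed(2024)     # fix the random-number seed (any integer will do)
y = rnorm(50)
set.seed(2024)     # reset to the same seed...
y2 = rnorm(50)     # ...and the same 50 values are generated again
identical(y, y2)   # TRUE
c(mean(y), sd(y))  # close to 0 and 1, but not exactly
```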

To generate from other distributions, use e.g. "rbinom(50, 10, 0.3)" for the Binomial(10, 0.3) distribution, "rgeom(50, 0.2)" for the Geometric(0.2), "rpois(50, 14)" for the Poisson(14), "runif(50, 2, 4)" for Uniform[2,4], "rexp(50, 0.3)" for Exponential(0.3) (with rate 0.3), etc. To compute cdfs, use e.g. "pnorm(1.2)" for the standard normal, "pexp(1.2, 0.3)" for Exponential(0.3), "pchisq(74.22, 100)" for the chi-squared(100) distribution, "pt(2.228, 10)" for the t(10) distribution, "pf(5.7, 2, 10)" for the F(2, 10) distribution, etc. For densities, use "dnorm(1.2)", "dexp(1.2, 0.3)", etc.
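The following added sketch checks some of these functions against known values. It assumes the standard R parametrizations: the second argument of "pexp"/"dexp" is the rate, and "rgeom" counts the number of failures before the first success (so its mean is (1 - p)/p). The seed and simulation sizes are arbitrary choices.

```r
pnorm(1.2)            # P(Z <= 1.2) for a standard normal, about 0.885
pt(2.228, 10)         # about 0.975: 2.228 is the familiar 97.5% point of t(10)
pchisq(74.22, 100)    # about 0.025

# Exponential(0.3): cdf is 1 - exp(-0.3 t) and density is 0.3 exp(-0.3 t)
all.equal(pexp(1.2, 0.3), 1 - exp(-0.3 * 1.2))     # TRUE
all.equal(dexp(1.2, 0.3), 0.3 * exp(-0.3 * 1.2))   # TRUE

# Monte Carlo check: the fraction of draws <= 1.2 should be close to pnorm(1.2)
set.seed(1)
sim = mean(rnorm(100000) <= 1.2)

# rgeom counts failures before the first success, so its mean is (1 - 0.2)/0.2 = 4
geo = mean(rgeom(100000, 0.2))
c(sim, geo)
```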

R also has excellent plotting features. For example, "plot(x)" will plot the individual values of x above, while "hist(x)" will display a histogram. Also, "pie(x^2)" will display a pie-chart of x^2 (or any other non-negative list). Try them! [If you prefer, you can save your plot as a pdf file, by typing pdf("yourfilename.pdf"), then plot(x) (or whatever), and then "dev.off()". If you want a png or jpeg or postscript file instead of pdf, then type "png" or "jpeg" or "postscript" instead of "pdf".]
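For instance, the pdf recipe just described might look like this (an added sketch; the file name "myplot.pdf" is hypothetical, and it is placed in R's temporary directory via "tempdir()" only so that nothing is left in your working directory):

```r
x = c(2, 4, 1.7, -3)
out = file.path(tempdir(), "myplot.pdf")   # hypothetical file name; use any path you like
pdf(out)           # send plots to this pdf file instead of the screen
plot(x)            # first page
hist(x)            # a second plot becomes a second page of the same pdf
dev.off()          # close the file so that it is complete and readable
file.exists(out)   # TRUE
```

To plot on the screen again, just type further plotting commands after "dev.off()".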

Once you have typed "y = rnorm(50)", then you can e.g. get a histogram of this sample by typing

hist(y)
To then e.g. get a second histogram, of a fresh sample, overlaid on the first, in a different colour, you could type
hist(rnorm(50), add=TRUE, border=2)
Points and lines can be added to plots using the commands "points" and "lines", respectively.
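As an added example combining these, the sketch below draws a histogram on the density scale ("freq = FALSE"), adds the standard normal density curve with "lines", and marks the individual observations with "points". The seed, the grid of 200 points, and the temporary file name "overlay.pdf" are arbitrary choices; omit the "pdf(...)" and "dev.off()" lines to draw on the screen instead.

```r
set.seed(3)
y = rnorm(50)
out = file.path(tempdir(), "overlay.pdf")   # hypothetical file name
pdf(out)
hist(y, freq = FALSE)                      # histogram scaled as a density
grid = seq(-3, 3, length.out = 200)        # x-values at which to draw the curve
lines(grid, dnorm(grid), col = 2)          # N(0,1) density curve, in colour 2 (red)
points(y, rep(0, length(y)), pch = "|")    # a tick mark for each observation
dev.off()
```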

R can also read data files, with commands like "read.table" and "read.csv". You have to specify the file to read, as either a local file like read.table("bicycles.txt") or a url like read.table("http://www.reallyusefulsite.com/data/bicycles.txt"). I recommend including the argument "as.is=TRUE", which keeps text columns as character strings instead of converting them to factors. To read a local file, you may need to first adjust your local working directory with the "setwd" command (or perhaps from a File menu).
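Here is a self-contained sketch (added for illustration): it first writes a tiny, made-up csv file into R's temporary directory, then reads it back with "read.csv". The file name "bicycles.csv", its columns and its values are all hypothetical.

```r
# Create a small example file (hypothetical data) so the code runs anywhere
csvfile = file.path(tempdir(), "bicycles.csv")
writeLines(c("brand,price,colour",
             "Acme,350,red",
             "Zoom,420,blue"), csvfile)

bikes = read.csv(csvfile, as.is = TRUE)   # first line is used as column names
str(bikes)           # 2 rows; brand and colour are text, price is numeric
mean(bikes$price)    # 385; "$" picks out a column by name
```

Note that "read.csv" assumes the first line holds column names, while "read.table" does not; for a file with a header line, use read.table("...", header=TRUE).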

Longer sequences of R commands (e.g. full computer programs) can be saved to a file, and then executed by typing

source("filename")

The number sign # indicates a comment (useful for explaining what you are doing); everything from # to the end of the line is ignored by R.
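As an added sketch, the code below writes a short script file (with comments) and then runs it with "source". Normally you would create the file in a text editor; writing it from R just keeps the example self-contained. The file name "myscript.R" and the variable names are hypothetical.

```r
# Write a small script to a file (hypothetical name) in R's temporary directory
script = file.path(tempdir(), "myscript.R")
writeLines(c(
  "# This whole line is a comment and is ignored by R",
  "set.seed(5)",
  "n = 1000               # number of draws (text after # is ignored)",
  "u = runif(n)",
  "frac = mean(u < 0.5)   # fraction of draws below 0.5"
), script)

source(script)   # runs every command in the file
frac             # roughly 0.5
```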

When you are done with your computations, type "quit()" or "q()" to quit R. (It might then ask you if you want to save your workspace image; I always reply "n", but either option is fine.)

Finding Out More:

R has lots of built-in help available. For example, to find out more about the "hist" function, simply type "help(hist)" or "?hist" after the R prompt. Similarly, type "?plot" to find out about the many plotting options. Or, for an interactive R-help web interface, type "help.start()".

In addition, there is lots of information about R available on the web, including many tutorials, R video series, and various online courses.

Finally, R is a full-featured programming language (with "if", "for", "while", etc.), and has a huge number of other features and options not mentioned here. There is lots to learn and read and investigate and use -- try it out, and have fun!
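To give a small taste of this (an added sketch, not from the original page; all variable names are illustrative), here are "for", "if" and "while" in action:

```r
x = c(2, 4, 1.7, -3)

# for: add up the squares one at a time (same result as sum(x^2))
total = 0
for (v in x) {
  total = total + v^2
}

# if/else: label each value by its sign
signs = character(length(x))
for (i in seq_along(x)) {
  if (x[i] > 0) {
    signs[i] = "positive"
  } else {
    signs[i] = "non-positive"
  }
}

# while: count how many doublings it takes for 1 to exceed 1000
k = 0
value = 1
while (value <= 1000) {
  value = value * 2
  k = k + 1
}
c(total, k)   # 31.89 and 10 (since 2^10 = 1024)
```

Of course, for simple computations like the first loop, the vectorized "sum(x^2)" is shorter and faster.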


-- Jeffrey S. Rosenthal, Department of Statistics, University of Toronto