The R programming language

Textbook: John Verzani, Using R for Introductory Statistics

R is a computer language for statistical computing.

The benefits of R:
* R is free. R is open-source and runs on UNIX, Windows and Macintosh.
* R has a good built-in help system.
* R has good graphing capabilities.
* Students can easily migrate to the commercially supported S-Plus program if commercial software is desired.
* R's language has many built-in statistical functions.
* The language is easy to extend with user-written functions.
* R is a computer programming language.

What is R lacking compared to other software solutions?
* It has a limited graphical interface (S-Plus has a good one). This means it can be harder to learn at the outset.
* There is no commercial support (although one can argue that the international mailing list is even better).
* The command language is a programming language, so students must learn to deal with syntax issues and the like.

R is maintained by the R core-development team, an international team of volunteer developers. The R project web page, http://www.r-project.org, is the main site for information on R. The site has directions for obtaining the software, accompanying packages, and other sources of documentation.

The home page for the Verzani text is www.math.csi.cuny.edu/UsingR. It has solutions to selected problems, data sets, and R functions.

Verzani, Preface page xv, gives instructions for installing the UsingR package on your computer. Start R, and then enter this command:

    > install.packages("UsingR")

After you install the UsingR package on your computer, the command

    > library(UsingR)

will load the package into R. For more details on how to install R on your computer, see the instructions in Verzani, Appendix A, page 359.

Examples of R code from Verzani Chapter 1. Data

To start R in Windows, click on the R icon on the desktop or find the R program under the Start menu. R prints a startup message followed by a prompt:

    R : Copyright 2005, The R Foundation for Statistical Computing
    Version 2.2.1 (2005-12-20 r36812)
    ISBN 3-900051-07-0

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

    Natural language support but running in an English locale

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

    >

The ">" symbol (a greater-than sign) is the R prompt.

Using R as a calculator:

    2 + 3
    3^2
    (1 - 2) * 3

Many mathematical and statistical functions are available in R.

    # Square root of 9
    sqrt(9)
    # Natural log of 10
    log(10)

1.2.3 Assignment of values to variables

Assign the value 2 to the variable a:

    a = 2

Remove the variable a:

    rm(a)

Assignment can also be done using "<-", which is equivalent to "=" for assignments like these.

    b <- 2
    rm(b)

1.2.4 Using c() to collect data

Suppose the yearly number of whales beached in Texas during the period 1990 to 1999 is

    74 122 235 111 292 111 211 133 156 79

What are the mean, the variance, and the standard deviation of the number of whales? Review mean, variance, and standard deviation.
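As a quick review, the three quantities can be written as follows. The notation (x_1, ..., x_n for the data, n for the number of observations, \bar{x} for the mean, s^2 for the variance) is chosen here for illustration and may differ from the textbook's. The variance is shown with the n - 1 denominator, which is the sample variance that R's var() computes.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
s = \sqrt{s^2}
```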
Create a variable, whales, to contain the data as a data vector:

    whales = c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)
    mean(whales)
    var(whales)
    sd(whales)
    # The R function for standard deviation is sd, not std.
    # The next line fails with an error because base R has no function named std:
    std(whales)
    # We can calculate the standard deviation ourselves:
    sqrt(var(whales))
    sqrt(sum((whales - mean(whales))^2) / (length(whales) - 1))

1.2.5 Using functions on a data vector

We can apply R functions to the variable (data vector) whales:

    sum(whales)
    length(whales)
    mean(whales)
    sort(whales)
    max(whales)
    min(whales)
    range(whales)
    # calculate successive differences
    whales
    diff(whales)
    length(whales)
    length(diff(whales))
    # calculate the cumulative sum (running total)
    whales
    cumsum(whales)

Functions that work on a vector in one step (vectorization)

Most R functions can perform their action on all the entries in a data vector in a single step:

    sorted.whales = sort(whales)
    sorted.whales
    # calculate the differences from the mean
    sorted.whales - mean(sorted.whales)

Functions: Read pages 11 to 14 on help(), data.entry(), and edit().

Create a sequence of numbers

    1:10
    rev(1:10)
    seq(1, 9, by=2)
    rep(2, 5)
    rep(1:3, 2)
    rep(c("long", "short"), 3)

Read pages 16 to 21 on data indices, logical values and managing the work environment. Study these examples:

    x = 1:5
    x < 5
    x > 1
    x > 1 & x < 5
    x == 1
    x != 1

Missing values (NA)

    shuttle = c(0, 1, 0, NA, 0, 0)
    mean(shuttle)
    mean(shuttle, na.rm=TRUE)

Read pages 23 to 29 on reading in programs and data. Study these examples:

    library()
    library(UsingR)
    data()
    data(package = .packages(all.available = TRUE))
    library(MASS)
    geyser
    names(geyser)

Use the source() function to read in data and commands. Here is an example to illustrate source(yourfile), where yourfile is a file in a local directory.

1. Put the following commands into a file named "Rexample1.txt":
       xx = rnorm(100)
       hist(xx)
2. Save the file "Rexample1.txt" in your folder named "My Documents".
3. Edit the following command, replacing "Michael Walker" so that the path points to your documents directory:
       source("C:/Documents and Settings/Michael Walker/My Documents/Rexample1.txt")
4. Enter the edited source command in R.
5. R should plot a histogram of the variable xx.
6. Examine the value of the variable xx.
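The steps above can also be tried without editing a path by hand. The following is a minimal sketch, not part of the Verzani text: it assumes a temporary file created with tempfile() in place of the "My Documents" location, and the variable name script is made up for this illustration.

```r
# Sketch: write the two example commands to a temporary file, then run
# them with source(). The file location comes from tempfile(); it is a
# stand-in for the "My Documents" path, not the path used in the steps.
script <- tempfile(fileext = ".R")
writeLines(c("xx = rnorm(100)", "hist(xx)"), script)

# source() reads the file and evaluates its commands in order,
# so xx is created and the histogram is drawn.
source(script)

# xx now exists in the workspace; examine it.
length(xx)
summary(xx)
```

Because rnorm() draws random numbers, the values of xx and the shape of the histogram will differ from run to run.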