
A Course in Statistics with R (eBook)

eBook Download: EPUB
2016
John Wiley & Sons (publisher)
978-1-119-15275-0 (ISBN)


A Course in Statistics with R - Prabhanjan N. Tattar, Suresh Ramaiah, B. G. Manjunath
Integrates the theory and applications of statistics using R. A Course in Statistics with R has been written to bridge the gap between theory and applications and explain how mathematical expressions are converted into R programs. The book has been primarily designed as a useful companion for a Masters student during each semester of the course, but will also help applied statisticians in revisiting the underpinnings of the subject. With this dual goal in mind, the book begins with R basics and quickly covers visualization and exploratory analysis. Probability and statistical inference, inclusive of classical, nonparametric, and Bayesian schools, is developed with definitions, motivations, mathematical expression and R programs in a way which will help the reader to understand the mathematical development as well as R implementation. Linear regression models, experimental designs, multivariate analysis, and categorical data analysis are treated in a way which makes effective use of visualization techniques and the related statistical techniques underlying them through practical applications, and hence helps the reader to achieve a clear understanding of the associated statistical models.

Key features:

  • Integrates R basics with statistical concepts
  • Provides graphical presentations inclusive of mathematical expressions
  • Aids understanding of limit theorems of probability with and without the simulation approach
  • Presents detailed algorithmic development of statistical models from scratch
  • Includes practical applications with over 50 data sets


Prabhanjan Tattar, Business Analysis Senior Advisor at Dell International Services, Bangalore, India. Professor Tattar is a statistician providing analytical solutions to business problems inclusive of statistical models and machine learning as appropriate.

Suresh Ramaiah, Assistant Professor of Statistics at Dharwad University, Dharwad, India.

B G Manjunath, Business Analysis Advisor at Dell International Services, Bangalore, India.



"Integrates the theory and applications of statistics using R. The book has been written to bridge the gap between theory and applications and explain how mathematical expressions are converted into R programs. The book has been primarily designed as a useful companion for a Masters student during each semester of the course, but will also help applied statisticians in revisiting the underpinnings of the subject." (Zentralblatt MATH 2016)

Chapter 1
Why R?


Package(s): UsingR
Dataset(s): +AD1-9

1.1 Why R?


Welcome to the world of Statistical Computing! During the first quarter of the previous century, Statistics started growing at a great speed under the schools led by Sir R.A. Fisher and Karl Pearson. Statistical computing replicated similar growth during the last quarter of that century. The first part laid the foundations and the second part made the founders proud of their work. Interestingly, the beginning of this century is also witnessing a mini revolution of its own. The R Statistical Software, developed and maintained by the R Core Team, may be considered a powerful tool for the statistical community. That the software is Free Open Source Software is simply icing on the cake.

R is evolving as the preferred companion of the Statistician. The reasons are aplenty. To begin with, this software has been developed by a team of Statisticians. Ross Ihaka and Robert Gentleman laid the basic framework for R, and later a group was formed that is responsible for its current growth and state. R is command-line software and thus powerful, with a lot of options for the user.

The legendary Prasanta Chandra Mahalanobis delivered one of the important essays in the annals of Statistics, namely, “Why Statistics?” It appears that Indian mathematicians were skeptical of the idea of including Statistics as a legitimate branch of science in general, and mathematics in particular. This essay addresses some of those concerns and establishes the scientific reasoning through the concepts of random samples, importance of random sampling, etc.

Naturally, we ask ourselves the question “Why R?” Of course, the magnitude of the question is oriented in a completely different and (probably) insignificant way, and we hope the reader will excuse us for this idiosyncrasy. The most important reason for the choice of R is that it is open source software. This means that the functioning of the software can be traced down to the first line of code, which steamrolls into powerful utilities. As an example, we can trace how exactly the important mean function works.

    # File src/library/base/R/mean.R
    # Part of the R package, http://www.R-project.org
    #
    # A copy of the GNU General Public License is available at
    # http://www.r-project.org/Licenses/

    mean <- function(x, ...) UseMethod("mean")

    mean.default <- function(x, trim = 0, na.rm = FALSE, ...)
    {
        if(!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
            warning("argument is not numeric or logical: returning NA")
            return(NA_real_)
        }
        if (na.rm)
            x <- x[!is.na(x)]
        if(!is.numeric(trim) || length(trim) != 1)
            stop("'trim' must be numeric of length one")
        n <- length(x)
        if(trim > 0 && n > 0) {
            if(is.complex(x))
                stop("trimmed means are not defined for complex data")
            if(trim >= 0.5) return(stats::median(x, na.rm=FALSE))
            lo <- floor(n*trim)+1
            hi <- n+1-lo
            x <- sort.int(x, partial=unique(c(lo, hi)))[lo:hi]
        }
        .Internal(mean(x))
    }

    mean.data.frame <- function(x, ...) sapply(x, mean, ...)

Note that the listing records the location of the mean function, src/library/base/R/mean.R. The user can go to that location in the R source tree and open mean.R in any text editor. Now, if you find that the mean function does not work according to your requirement, modifications and new functions can be defined easily. For instance, the default setting of the mean function is na.rm=FALSE, that is, if there are missing observations in a vector (see Section 2.3), the mean function will return NA as the answer. It is very simple to define a modified function whose default setting is na.rm=TRUE.

    > x <- c(10,11,NA,13,14)
    > mean(x)
    [1] NA
    > mean_new <- function(...,na.rm=TRUE) mean(...,na.rm=na.rm)
    > mean_new(x)
    [1] 12
    > mean(x,na.rm=TRUE)
    [1] 12

It is as simple as that. Thus, there are no restrictions imposed by the software on the user. The authors strongly believe that this freedom is priceless. If the decision to acquire the software is dictated by economic considerations, it is convenient that R comes free of charge.
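The trimming branch of the mean.default listing can be checked by hand in the same spirit. The following is a minimal sketch and not part of the original text: the vector x and the helper name trimmed_mean_by_hand are hypothetical, and the helper only repeats the lo/hi index arithmetic from the listing for a numeric vector without missing values and 0 < trim < 0.5.

```r
# Hypothetical helper: repeats the lo/hi arithmetic of mean.default
# for a numeric vector without NAs and 0 < trim < 0.5.
trimmed_mean_by_hand <- function(x, trim) {
  n  <- length(x)
  lo <- floor(n * trim) + 1   # first index kept after sorting
  hi <- n + 1 - lo            # last index kept after sorting
  mean(sort(x)[lo:hi])
}

x <- c(1, 2, 3, 4, 5, 6, 7, 100)   # illustrative data with one outlier

by_hand  <- trimmed_mean_by_hand(x, trim = 0.25)
built_in <- mean(x, trim = 0.25)

# Both values should agree, since the helper mirrors the listing.
print(c(by_hand = by_hand, built_in = built_in))
```

With trim = 0.25 and eight observations, the two smallest and the two largest values are dropped before averaging, which is why the outlier has no influence on the result.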

Computational complexity is one reason why software is needed. As modern statistical methods are inherently complex, it becomes a challenge for the developers of the methodology to complement the applications with appropriate computer programs. It has been our observation that many statisticians tend to address this dimension with relevant R packages. Venables and Ripley (2002) developed a very useful package MASS, an abbreviation for the title of their book Modern Applied Statistics with S. This package is shipped along with the software and is “recommended” as a priority package. In Section 1.8 we will see how many statisticians have adopted R as the language of their statistical computations.
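The “recommended” status of MASS can be inspected from within R. The snippet below is a small sketch rather than anything from the book; it assumes a standard installation in which the recommended packages were installed together with R, and it falls back to NA if MASS is missing from the library. The variable names has_mass and mass_priority are illustrative.

```r
# Assumption: a standard R installation; some minimal installs omit
# the recommended packages, hence the guard.
has_mass <- requireNamespace("MASS", quietly = TRUE)

# The Priority field comes from the DESCRIPTION file of the package.
mass_priority <- if (has_mass) {
  utils::packageDescription("MASS")$Priority
} else {
  NA_character_
}
print(mass_priority)

# Names of all installed packages whose priority is "recommended".
print(rownames(utils::installed.packages(priority = "recommended")))
```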

1.2 R Installation


The website http://cran.r-project.org/ hosts all versions of R available for a variety of Operating Systems. CRAN is an abbreviation for Comprehensive R Archive Network. Incidentally, R has been developed entirely over the Internet.

The R software can be installed on a variety of platforms such as Linux, Windows, and Macintosh, among others. There is also an option of choosing 32- or 64-bit versions of the software. For a Linux user on a Debian-based distribution, under appropriate privileges, R may be easily installed from the terminal using the command sudo apt-get install r-base. Ubuntu operating system users can find more help regarding R installation at the link http://ubuntuforums.org/showthread.php?t=639710.

After the installation is complete, the user can start the software by simply keying in R at the terminal. If the user is a beginner and not too familiar with Linux environments, she may be disappointed with its appearance, as she cannot find much help there. Furthermore, the Linux expert may find this too trivial to explain to a beginner. Some help for the beginner is available at http://freshmeat.net/articles/view/2237/.

A user of Windows first needs to download the latest version's executable file, currently R-3.0.2-win32.exe, and then merely double-click her way to completing the installation process. Similarly, Macintosh users can easily find the related files and methods for installation. The web links “R MacOS X FAQ” and “R Windows FAQ” should further be useful to the reader. The authors have developed the R codes used in this book and verified them for Linux and Windows versions. We are confident that they will run without errors on Macintosh too.
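Once R starts, a quick check of the installation can be made from the prompt. The lines below are an illustrative sketch, not taken from the book: they report the installed version and whether the running build is 32- or 64-bit. The install.packages call for UsingR, the package named in the chapter header, is left commented out because it assumes Internet access and a reachable CRAN mirror.

```r
# Version string of the running R installation
print(R.version.string)

# Pointer size in bytes: 4 on a 32-bit build, 8 on a 64-bit build
bits <- 8 * .Machine$sizeof.pointer
cat("This is a", bits, "bit build of R on", R.version$platform, "\n")

# Needs Internet access, so the call is left commented out here.
# install.packages("UsingR")
```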

1.3 There is Nothing such as PRACTICALS


The reader is absolutely free to differ from our point of view that “There is nothing such as PRACTICALS” and may skip this section altogether. There are two points of view from the authors which will be put forward here. First, with the decreasing cost of computers and availability of Open Source Software, OSS, see Appendix A, there is no need for calculator-based practicals. Also, within the purview of a computer lab, a Statistics student/expert needs to be more familiar with software such as R and SAS, among others. Our second point of view is that the integration of theory with applications can be seamlessly achieved using the software modules.

It is clear, given the exponential growth of technology, that the days of separate sessions for practicals are a bygone era, and it is not an intelligent proposition to hang onto a weak rope and blame it for our fall. It has been observed that in many of the developed Departments of the subject, calculator-based computation/practical sessions have been done away with altogether. It is also noticed that many Statistical institutes do not teach the C++/Fortran programming languages even in a graduate course, and a reason for this may be that statisticians need not necessarily be software programmers. There are many additional reasons for this reluctance. A practical reason is that computers have become very much cheaper, and even if they are not within the financial reach of the students (especially in the developing countries), computing machines are easily available in most of their institutes. It is more often the case that the student has access to at least a couple of hours per week at her institute.

The availability of subject-specific interpretative software has also minimized the need for writing explicit programs for most of the standard practical methods in that subject. For example, in our Statistics subject, there are many software packages such as SAS, SYSTAT, STATISTICA, etc. Each of these contains inbuilt modules/menus which enable the user to perform most of these standard computations in a jiffy, and as such the user need not develop the programs for the statistical techniques in applied areas such as Linear Regression Analysis and Multivariate Statistics, among other topics of the subject.

It is true that one of the driving themes of this book is to convey as many ideas and concepts, both theoretical and practical, through a mixture of software programs and mathematical rigor. This aspect will become clear as the reader goes deeper into the book and especially through the asterisked sections or subsections. In short, this...

Publication date (per publisher) 15.3.2016
Language English
Subject areas Mathematics / Computer Science, Computer Science
Mathematics / Computer Science, Mathematics, Computer Programs / Computer Algebra
Mathematics / Computer Science, Mathematics, Statistics
Mathematics / Computer Science, Mathematics, Probability / Combinatorics
Keywords categorical data analysis • Computational & Graphical Statistics • Data Analysis • Data Visualization • Experimental Design • Exploratory data analysis • linear regression model • Monte Carlo Markov Chain • multivariate analysis • Probability Theory • r basics • R (program) • Statistical Inference • Statistical Software / R • Statistics
ISBN-10 1-119-15275-5 / 1119152755
ISBN-13 978-1-119-15275-0 / 9781119152750