Introduction to R

What is R?

R is an open-source, interpreted programming language and software environment for statistical computing and graphics. It is widely used by statisticians, data miners, and data scientists to develop statistical software and to analyze data. R supports procedural programming with functions and, for some tasks, object-oriented programming through generic functions.
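The two styles can be sketched in a few lines of R. This is an illustration only: the names area_of_circle, describe, and the "dog" class are hypothetical, and the generic uses S3 dispatch through UseMethod(), which is one of several object systems available in R.

```r
# Procedural style: a plain function that takes input and returns a value
area_of_circle <- function(r) {
  pi * r^2
}

# Object-oriented style: a generic function that dispatches on the class of
# its argument (S3). "describe" and the "dog" class are hypothetical examples.
describe <- function(obj) UseMethod("describe")

# Method chosen when obj has class "dog"
describe.dog <- function(obj) paste(obj$name, "says woof")

# Fallback method chosen for any other class
describe.default <- function(obj) "an unknown object"

rex <- structure(list(name = "Rex"), class = "dog")

area_of_circle(2)   # 4 * pi, about 12.566
describe(rex)       # "Rex says woof"
describe(42)        # "an unknown object"
```

Built-in functions such as print() and summary() are generics of the same kind, which is why they behave differently for vectors, factors, and data frames.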

RStudio

Data types

In R, variables are not declared with a data type. Instead, a variable is assigned an R object, and the data type of that object becomes the data type of the variable. There are many types of R objects, such as:

  • Vectors

x<-c(3,4,5)

  • Lists

x=list(c(2,3,4), c("aa","bb","cc"), c(TRUE,FALSE,TRUE), 3)

  • Matrices

a = matrix(c(2,3,4), nrow = 3, ncol = 3, byrow = TRUE)   # the 3 values are recycled, so every row is 2 3 4

  • Arrays

res <- array(c(c(5,6,3), c(10,12,13,14)), dim = c(3,3,2))   # the 7 values are recycled to fill all 3 x 3 x 2 = 18 cells

  • Factors

fdata = factor(c(1,2,3,4,4,5))

  • Data frames

df = data.frame(num = c(1,2,3), chr = c("aa","bb","cc"), lgl = c(TRUE,FALSE,TRUE))   # column names are illustrative
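Because a variable takes its type from the object assigned to it, class() is a quick way to check what each object above actually is. The sketch below rebuilds the same objects under illustrative names (x_vec, x_list, and so on, plus hypothetical column names num, chr, lgl for the data frame) and inspects them.

```r
# Rebuild each kind of R object (names are illustrative only)
x_vec  <- c(3, 4, 5)
x_list <- list(c(2, 3, 4), c("aa", "bb", "cc"), c(TRUE, FALSE, TRUE), 3)
x_mat  <- matrix(c(2, 3, 4), nrow = 3, ncol = 3, byrow = TRUE)
x_arr  <- array(c(c(5, 6, 3), c(10, 12, 13, 14)), dim = c(3, 3, 2))
x_fac  <- factor(c(1, 2, 3, 4, 4, 5))
x_df   <- data.frame(num = c(1, 2, 3),
                     chr = c("aa", "bb", "cc"),
                     lgl = c(TRUE, FALSE, TRUE))

class(x_vec)    # "numeric"
class(x_list)   # "list"
class(x_mat)    # "matrix" "array" (R 4.0.0 and later; older versions give "matrix")
class(x_arr)    # "array"
class(x_fac)    # "factor"
class(x_df)     # "data.frame"

x_mat[1, ]      # 2 3 4: byrow = TRUE fills each row with the recycled values
dim(x_arr)      # 3 3 2
levels(x_fac)   # "1" "2" "3" "4" "5": the distinct values become levels
```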

Variables

A variable provides named storage that a program can manipulate. In R, a variable can store an atomic vector, a group of atomic vectors, or a combination of many R objects. A valid variable name consists of letters, numbers, and the dot (.) or underscore (_) characters. It must start with a letter, or with a dot that is not followed by a number.
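The naming rule can be checked with base R's make.names(), which converts any string into a syntactically valid name (for example by prepending "X"). A string is therefore already valid exactly when make.names() returns it unchanged. The helper is_valid_name() and the candidate names are hypothetical, for illustration only.

```r
# A name is valid when make.names() leaves it unchanged
# (is_valid_name is a hypothetical helper, not part of base R)
is_valid_name <- function(x) x == make.names(x)

candidates <- c("var.1", "var_2", ".var", "2var", ".2var", "_var")

is_valid_name(candidates)
# TRUE TRUE TRUE FALSE FALSE FALSE

# make.names() shows how each invalid name would be repaired
make.names(candidates)
# "var.1" "var_2" ".var" "X2var" "X.2var" "X_var"
```

Because make.names() also appends a dot to reserved words such as if, this helper rejects those names as well.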

Variables can be assigned values with the leftward (<-), rightward (->), or equal-to (=) operator. The value of a variable can be printed with the print() or cat() function; cat() combines multiple items into one continuous printed output.

var.1 = c(0,1,2,3)

var.2 <- c("learn","R")

c(TRUE,1) -> var.3   # TRUE is coerced to 1, so var.3 holds 1 1

print(var.1)

cat("var.1 is",var.1,"\n")

cat("var.2 is",var.2,"\n")

cat("var.3 is",var.3,"\n")
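The following sketch illustrates three points from the code above: the three assignment operators produce identical objects, c() coerces mixed values to a common type, and print() and cat() format output differently. The variable names a, b, and d are illustrative.

```r
# All three assignment operators store the same object
a <- c(0, 1, 2, 3)
b = c(0, 1, 2, 3)
c(0, 1, 2, 3) -> d
identical(a, b) && identical(b, d)   # TRUE

# c() coerces mixed types to a common type: TRUE becomes 1
var.3 <- c(TRUE, 1)
class(var.3)                         # "numeric"

# print() adds the index prefix [1]; cat() writes the bare values
print(a)                             # [1] 0 1 2 3
cat("a is", a, "\n")                 # a is 0 1 2 3
```

cat() separates its arguments with a space by default, which is why "a is" and the numbers print on one line without any extra formatting.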

Data categories

When working with statistics, it is important to recognize the different types of data: numerical (discrete and continuous), categorical, and ordinal. Data are the actual pieces of information that you collect through your study. For example, if you ask five of your friends how many pets they own, they might give you the following data: 0, 2, 1, 4, 18. Not all data are numbers; let’s say you also record the gender of each of your friends, getting the following data: male, male, female, male, female.
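The example study data can be stored directly as R vectors. This short sketch (the variable names pets and gender are illustrative) shows that arithmetic makes sense for the numerical data, while the categorical data can only be counted per category.

```r
# The example study data, stored as vectors
pets   <- c(0, 2, 1, 4, 18)                               # numerical
gender <- c("male", "male", "female", "male", "female")  # categorical

is.numeric(pets)     # TRUE
is.numeric(gender)   # FALSE

mean(pets)           # 5: averaging a count is meaningful
table(gender)        # female 2, male 3: categories can only be tallied
```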

Most data fall into one of two groups: numerical or categorical.

Numerical data. These data have meaning as a measurement, such as a person’s height, weight, IQ, or blood pressure; or they are a count, such as the number of stock shares a person owns, how many teeth a dog has, or how many pages you can read of your favorite book before you fall asleep. (Statisticians also call numerical data quantitative data.)

Numerical data can be further broken into two types: discrete and continuous.

Discrete data represent items that can be counted; they take on possible values that can be listed out. The list of possible values may be fixed (also called finite), or it may go from 0, 1, 2, on to infinity.

Continuous data represent measurements. Their possible values cannot be counted and can only be described using intervals on the real number line. 

> x = 10.5

> x

[1] 10.5

> class(x)

[1] "numeric"

> k = 1

> k

[1] 1

> class(k)

[1] "numeric"
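As the transcript shows, R stores both 10.5 and 1 as "numeric" (double precision) by default, so R does not enforce the discrete/continuous distinction by itself. One optional way to mark whole-number counts is the integer type, created with the L suffix or as.integer(). The sketch below assumes that convention; the names n_shares and height_m are hypothetical.

```r
# Plain numeric literals are doubles, even when they are whole numbers
k <- 1
typeof(k)             # "double"

# The L suffix stores a whole number as an integer, a natural fit for counts
n_shares <- 100L
class(n_shares)       # "integer"

# A continuous measurement stays a double
height_m <- 1.75
typeof(height_m)      # "double"

# A vector of counts converted to integers
pets <- as.integer(c(0, 2, 1, 4, 18))
class(pets)           # "integer"
```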

Categorical data represent characteristics such as a person’s gender, marital status, hometown, or the types of movies they like. 

Categorical data can take on numerical values, but those numbers do not have mathematical meaning. 

There is also ordinal data, which mixes numerical and categorical data: the data fall into categories, but the numbers placed on the categories have meaning (for example, a 1-to-5 star rating, where a higher number means a better rating).

> x = "cat1"

> x

[1] "cat1"

> class(x)

[1] "character"

> x = "string"

> x

[1] "string"

> class(x)

[1] "character"
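A plain string has class "character"; R has no class named "categorical". Categorical data are usually stored as a factor, and ordinal data as an ordered factor whose levels follow a meaningful order. In the sketch below, gender reuses the example data from this section, while rating and its levels low/medium/high are hypothetical.

```r
# Categorical data: a factor stores each value as one of a fixed set of levels
gender <- factor(c("male", "male", "female", "male", "female"))
class(gender)            # "factor"
levels(gender)           # "female" "male" (sorted alphabetically by default)

# Ordinal data: an ordered factor with an explicit level order
rating <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)
class(rating)            # "ordered" "factor"
rating[1] < rating[2]    # TRUE: "low" comes before "high"
as.integer(rating)       # 1 3 2 1: the position of each value's level
```

Comparisons such as < are meaningful only for ordered factors; for an unordered factor like gender, R returns NA with a warning.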