-
Notifications
You must be signed in to change notification settings - Fork 2
Expand file tree
/
Copy path02-Objects.Rmd
More file actions
467 lines (335 loc) · 14.8 KB
/
02-Objects.Rmd
File metadata and controls
467 lines (335 loc) · 14.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
---
title: "2.1 - Objects"
author: "Jonny Saunders"
date: "October 5, 2017"
output:
md_document:
preserve_yaml: true
toc: true
toc_depth: 2
order: 2
---
# Objects
## What are Objects?
Objects are, roughly, data (or more generally a stored state) that knows what it can do.
We know what happens when we put this troublesome `+` guy between numbers
```{r}
1 + 1 # ud better b sitting down
```
But it's less clear what it means to `+` letters
```{r, error=TRUE}
"a" + "b"
```
Let's see what `typeof` variables `1` and `"a"` are :
```{r}
typeof(1)
typeof("a")
```
>(note this is a little misleading, `typeof` determines the base object class that an R object is stored as. All R objects are composed of base objects, we'll get to the types of objects in the next section)
R has a useful package `pryr` for inspecting objects and other meta-linguistic needs. Let's get that now.
```{r}
# install.packages("pryr")
```
### Object terminology
A **class** is the description, or 'blueprint' of how individual **objects** or **instances** are made, including their **attributes** - which data should be kept and what it should be named, and **methods**, the functions that they are capable of calling on their stored data or attributes. Objects can have a nested structure, and sub-classes can **inherit** the attributes and methods of their parent classes.
For example: As a class, trucks have attributes like engine_size, number_of_wheels, or number_of_jumps_gone_off. Trucks have the method go_faster(), but only individual instances of trucks can go_faster() - the concept/class of trucks can't. As a subclass, monster_trucks also have the attributes engine_size, etc. and the method go_faster(), but they also have additional attributes like mythical_backstory and methods like monster_jam().
## Objects in R
>"In R functions are objects and can be manipulated in much the same way as any other object." - *R language guide 2.1.5*
>"S3 objects are functions that call the functions of their objects" - *Also R*
### Object Systems
R has base types and three object-oriented systems (also called **types**). We'll spend more time on Base types and S3 objects in this lesson, and return to S4 and reference classes when we start building bigger code.
* **Base types:** Low-level C types. Build the other object systems.
* **S3 - "Casual objects":** Objects that use **generic functions**. S3 methods "belong to" functions, not classes. Functions contain the **UseMethod("function_name", object)** function (see `?UseMethod`).
* **S4 - "Formal objects":** Formal classes with inheritance and means by which methods can be shared between classes. S4 methods still "belong to" functions, but classes are more rigorously defined.
* **Reference classes:** Objects that use **message passing** - or the method finally 'belongs to' the class rather than a function.
The easiest way to see everything about an object is to use the str() function, short for structure. For example we can see everything about the lamest linear model ever
```{r}
lame_model <- lm(c(1,2,3) ~ c(4,5,6))
str(lame_model)
```
We can query any object's base type with pryr's `otype`
```{r}
pryr::otype(c(1,2,3,4,5)) # A vector is a base object
pryr::otype(data.frame(x=c(1,2,3,4,5))) # A dataframe is an S3 object
pryr::otype(lame_model) # as is our lame model.
```
and its class with `class`
```{r}
class(c(1,2,3,4,5))
class(lame_model)
```
Confusingly, R's object system means that a given object will have both a **class** and a **type**, for example:
```{r}
pryr::otype(c(1)) # The vector "c(1)" is a base type object
class(c(1)) # whose class is 'numeric'
```
### Attributes
Object can also have arbitrarily many **attributes**. The most important and common are
* **`names`** - which give the object the ability to refer to its elements by name. for example:
```{r}
# We can construct named vectors like this
named_vector <- c("apples"=1, "bananas"=2, "cherries" = 3)
names(named_vector)
named_vector["apples"]
# or this
named_vector <- c(1,2,3)
names(named_vector) <- c("apples", "bananas", "cherries")
```
* **`class`** - which is used by the S3 object system, we'll see that in a moment
* **`dim`** - short for dimensions, which is used by multidimensional base objects. We'll see that in a moment too.
You can query a specific attribute with `attr`
```{r}
attr(named_vector, "names")
```
or list all attributes with `attributes`
```{r}
attributes(named_vector)
```
## Base Types
Every R object is built out of basic C structures that define how it is stored and managed in memory.
This table from [Advanced R](http://adv-r.had.co.nz/Data-structures.html#data-structures) summarizes them:
| | Homogenous data | Heterogenous data |
|------------|----------------| ------------------|
| 1-Dimensional | Atomic Vector | List |
| 2-Dimensional | Matrix | Data frame |
| N-Dimensional | Array | |
Recall that we can use `typeof()` to find an object's base type
```{r}
typeof(1)
typeof(list(1,2,3))
```
### Vectors
Vectors are sequences, the most basic data type in R. They have two varieties: **atomic vectors** (with homogenous values) and **lists** (with ... heterogenous values).
R has no 0-dimensional, scalar types, so individual characters or numbers are length=one atomic vectors. They are:
| Atomic Vector Type | Example |
| ------------------ | ------- |
| Logical | `booleans <- c(TRUE, FALSE, NA)` |
| Integer | `integers <- c(1L, 2L, 3L)` |
| Double (== `numeric`) | `doubles <- c(1, 2.5, 0.005)` |
| Character | `characters <- c("apple", "banana")` |
`raw` and `complex` types also exist, but they are rare.
Vectors are constructed with `c()`. When heterogeneous vectors are constructed with `c()`, they are **coerced** to the most permissive vector type (an integer can be both a double (floating point numbers with decimal points) and character "1") - the table above is ordered from least to most permissive.
```{r}
vect_1 <- c(1L, 2L, 3L)
vect_2 <- c(1L, 2L, 3)
vect_3 <- c(1L,2,"3")
typeof(vect_1)
typeof(vect_2)
typeof(vect_3)
# We select elements of vectors with [] notation
vect_1[1]
vect_3[1]
```
Each of the different atomic vector types has different methods (we'll come back to how methods work in a bit), which explains why we can `1 + 1` but not `"1" + "1"`. Notice how the `integer` class has a set of methods called "Arith" (see `?Arith`, an S4 group of generic functions, something we won't talk about until section 5) but `character` doesn't.
```{r}
methods(class="integer")
methods(class="character")
```
To make a vector that preserves the types of its elements, make a `list` instead
```{r}
a_list <- list(1L,2,"3")
a_list
typeof(a_list[[1]])
typeof(a_list[[2]])
typeof(a_list[[3]])
```
Notice the double bracket notation `[[]]`. Lists are commonly recursive, ie. they store other lists. Since the elements of our list are themselves lists, single bracket indexing `[]` returns lists, and `[[]]` returns the the elements in that list.
```{r}
is.recursive(a_list)
a_list[1]
typeof(a_list[1])
# Indexing recursive lists
b_list <- list(1:3, c("apple", "banana", "cucumber"))
b_list
b_list[1] # gets the first list
b_list[1][1] # doesn't work
b_list[[1]] # gets the contents of the first list
b_list[[1]][1] # gets the first element of the contents of the first list
```
Similarly to coersion among atomic vectors, vectors that contain lists will be coerced to lists.
```{r}
c(1,2,3)
c(c(1),c(2,3)) # vectors can't be recursive, so they get flattened
c(c(1),list(2,3)) # if one element is a list, the whole object will become a list
list(c(1,2,3), c("a","b","c"))
# Unlist turns lists back into (flat) atomic vectors
unlist(list(c(1,2,3), c("a","b","c")))
```
Because they are the most general form of vector, lists are used as the base type for many derived classes, like data frames
```{r}
typeof(data.frame(c(1,2,3)))
```
### Matrices & Arrays
**Arrays** are atomic vectors with a `dim` attribute. **Matrices** are arrays with `dim = 2`.
```{r}
# General way of making arrays
array_1 <- array(1:24, dim=c(2,3,4))
array_1
typeof(array_1)
attributes(array_1)
# Matrices have their own syntax
array_2 <- matrix(1:24, ncol=3, nrow=8)
array_2
# A vector can be made an array afterwards by setting the 'dim' attribute
array_3 <- c(1:24)
dim(array_3) <- c(2,3,4)
# or attr(array_3, "dim") <- c(2,3,4)
array_3
```
In higher dimensions, c() becomes `cbind(), rbind()`, and `abind()`; column and row bind for matrices and array bind for arrays.
```{r}
by_columns <- cbind(c(1,2,3), c(4,5,6), c(7,8,9))
by_columns
by_rows <- rbind(c(1,2,3), c(4,5,6), c(7,8,9))
by_rows
abind::abind(by_columns, by_rows, along=1)
abind::abind(by_columns, by_rows, along=2)
abind::abind(by_columns, by_rows, along=3)
```
Arrays and matrices also have new methods that lists and vectors dont.
```{r}
methods(class="list")
methods(class="matrix")
```
### Data Frames
Data frames are one of the gems of R. A data frame is a list of equal length vectors.
```{r}
df <- data.frame(little_ones = c(0,1,2,3,4),
big_ones = c(5,6,7,8,9))
df
attributes(df)
```
data frames can be used like lists of vectors
```{r}
df[1]
df[[1]]
df[[1]][1] # as above
```
Or using `names` with the `$` operator (see `?Extract` for more information).
```{r}
names(df)
colnames(df)
rownames(df)
df$little_ones
df$big_ones
```
Data frames also inherit the methods of lists and vectors
```{r}
df2 <- data.frame(medium_ones = c(3,4,5,6,7))
cbind(df, df2)
df_squared <- cbind(df2, df2)
names(df_squared) <- names(df)
rbind(df,df_squared)
```
### Etc.
Functions, environments, and other stuff that we'll learn about in our section on Functions are also base objects, but we'll discuss them then.
## S3 Objects
S3 objects "belong to" functions, which become their methods. S3 classes don't really "exist," but are assigned as an object's "class" attribute. S3 classes are one of the worst things about R, but are also responsible for some of its flexibility.
```{r, echo=TRUE}
x <- 1
attr(x, "class")
class(x) <- "letters"
attr(x, "class")
```
One can find an articulation of the reasoning behind this "function-and-class" programming can be found here: https://developer.r-project.org/howMethodsWork.pdf. We'll talk more about this in later sections.
S3 objects are defined by a series of functions that themselves contain the `UseMethod()` function - this is described briefly above, try `?UseMethod` for more detail. These functions extend the generic function, typically using the syntax `generic.class()` as in the case of `mean.Date()` for taking the mean of dates. One can list the objects that have a generic method, and the methods that an object has with `methods()`
```{r}
methods(mean) # list the classes that have this method
methods(class="Date") # list the methods that this class has
```
By default, the source code of S3 methods is not visible to R, one can retreive it with `utils::getS3method``
The `plot` base function is an s3 generic method.
```{r}
pryr::ftype(plot) # get a function's type
```
By default, if the first argument is a base type compatible with being points on a scatterplot, the actual function that is called is `plot.default`, whose source behaves like you'd expect:
```{r}
plot.default
```
If the first argument to `plot` has its own `plot` method (ie. that it is exported by the object's package namespace, more about this in section 5), that function is called instead. That's why
```{r}
aq <- datasets::airquality
plot(lm(Ozone ~ Month, data=aq))
```
is different than this nonsensical model
```{r}
plot(lme4::lmer(Ozone ~ 0 + (Day | Month), data=aq))
```
### Example: Extending S3 Objects
> http://adv-r.had.co.nz/OO-essentials.html "Creating new methods and generics"
Using a class's method is what allows us to do sensible computations on different types of objects with the same command.
```{r, error=TRUE}
pryr::ftype(mean) # mean is an s3 generic function
x <- 1
class(x) <- "just_one"
# We give our "just_one" class a mean method:
mean.just_one <- function(x, ...) print("that's just a one you maniac")
# Mean behaves like it should for numbers and lists of numbers
mean(1)
mean(c(1,1.5))
mean(x) # Even though "x" is just 1, because we have given it a "class" attribute it calls mean.just_one rather than mean.default
# Other objects have their own mean() method
methods(mean)
# like Date objects
dates <- c("01jan2000","15jan2000")
attr(dates,"class")
mean(dates) # Don't work
# turn it into "Date" object
dates <- as.Date(dates, "%d%b%Y") # base has a set of "as" methods to convert types
attr(dates,"class")
mean(dates) # will call its method
mean.Date(dates) # which can also be called directly
```
## S4 Objects
S4 objects have a single class definition with specifically defined fields and functions. They are too complicated for us to cover in much detail yet, so we will return to them again later.
We could pretend for awhile we're another class with S3 objects
```{r, error=TRUE}
a <- data.frame(a="test")
class(a)
class(a) <- "lm"
a
summary(a)
```
Not so with S4 objects.
We can finally implement our truck classes
```{r, eval=FALSE}
setClass("truck",
slots = list(engine_size = "numeric",
n_wheels = "numeric",
n_jumps = "numeric"))
setClass("monster_truck",
slots = list(mythical_backstory = "character"),
contains = "truck")
getClass("monster_truck")
```
S4 objects have `slots`, accessible with `@` (which behaves like `$`) or `slot()`. We create new instances of S4 objects with `new()`
```{r, eval=FALSE}
my_truck <- new("truck", engine_size = 4, n_wheels = 4, n_jumps = 40)
my_truck@engine_size
slot(my_truck, "n_jumps")
```
S4 Methods are a headache (and we will skip them in the class). One has to create a generic function if it does not yet exist with `setGeneric()`, then set the method, classes and function separately with `setMethod()`. An example for your edification:
```{r, eval=FALSE}
setGeneric("go_faster", function(which_truck, how_fast) {standardGeneric("go_faster")})
setMethod("go_faster",
signature = c(which_truck = "truck",
how_fast = "character"),
function(which_truck, how_fast){
print("your truck is now going:")
print(how_fast)
print("in MPH:")
print(which_truck@engine_size * 4)
}
)
go_faster(my_truck, "2 fast 4 u 2 c")
```
Try extending that to have the monster trucks tell their mythical_backstory as they accelerate.
We will return to S4 objects in more detail in section 5.
## Reference Classes
References classes are a "truly" object oriented system in R, but we are going to skip them entirely for now because they are rare enough that you aren't likely to encounter them yet. See here for more information: http://adv-r.had.co.nz/OO-essentials.html#rc
## References
* http://manuals.bioinformatics.ucr.edu/home/programming-in-r#TOC-Object-Oriented-Programming-OOP-
* http://www.stat.ucla.edu/%7Ecocteau/stat202a/resources/docs/S4Objects.pdf
* http://adv-r.had.co.nz/OO-essentials.html
* http://adv-r.had.co.nz/Data-structures.html
-------------------