R-Codes/MakeUse_Coefficients.Rmd at master · DuggalM/R-Codes · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
---
title: "Make and Use Coefficients"
author: "Mausam Duggal"
date: "September 2, 2016"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r init, echo=FALSE, message=FALSE}
library(dplyr); library(knitr); library(reshape2); library(ggplot2)

wd <- setwd("c:/personal/r")
```

### Calculate the Make and Use coefficients for the Commodity Flow Model.

Initially, the intention was to be able to use Tom's IO excel sheet to develop the tables. This proved to be difficult as the raw files provided by Tom were not readily usable for creating the tables. Further, Tom's raw data would not have allowed the creation of the Use table.

Following the above, we decided to use the Make and Use coefficients developed for the Oregon model, with some modification to make it applicable to **TRESO**.

Rick and I allocated NAICS 2 categories against the **Activity** column in the Oregon excel file. Further, we then created a new column called **Comm**. In this column, all records that in the **Commodity** field had a value that did not start with *SCTG_* were given the *SCTG260* code, as these represent non-transportable goods.

```{r I am sick and tired of NAs so a function to set them for good}

f_rep <- function(df) {
  # this function is used to set all NA values to zero in a dataframe
df[is.na(df)] <- 0
return(df)
}

```


```{r read Oregon MakeUse file}
oregon <- read.csv("MakeUse_Oregon.csv", stringsAsFactors = FALSE)
```

#### Make the two tables.

The tables contain categories such as *hh* and *SCTG60* (explained before), which have been removed to generate tables that only have NAICS and standard SCTG2 defintions.

```{r USE Table }
#' creat USE table first
use <- oregon %>% subset(., MorU == 'U') %>% group_by(Comm, NAICS.Code) %>%
  summarise(coeff = sum(AbsCoeff)) %>% subset(., !grepl("^S", .$NAICS.Code)) %>%
  subset(., NAICS.Code !='hh' & Comm != "SCTG60")
#' set NAs to zero
use <- f_rep(use)

# cast into wide format
use_wide <- dcast(use, Comm ~ NAICS.Code, value.var = "coeff")
use_wide <- f_rep(use_wide)

#' test that the use rows sum to 1 or less than that.
use_wide$Total <- rowSums(use_wide[2:20])

#' the row sums need to sum up to 1. Given that the Oregon data also had non-transportation SCTG2 codes, I have scaled the values to sum to 1.
use_wide1 <- use_wide[, -1]
rownames(use_wide1) <- use_wide[, 1]

use_wide2 <- use_wide1/use_wide1$Total
use_wide2$Total <- rowSums(use_wide2[1:19])

```


```{r MAKE Table }
#' creat MAKE Table
make <- oregon %>% subset(., MorU == 'M') %>% group_by(Comm, NAICS.Code) %>%
  summarise(coeff = sum(AbsCoeff))  %>% subset(., !grepl("^S", .$NAICS.Code)) %>%
  subset(., NAICS.Code !='hh' & Comm != "SCTG60")

#' set NAs to zero
make <- f_rep(make)

# cast into wide format
make_wide <- dcast(make, Comm ~ NAICS.Code, value.var = "coeff")
make_wide <- f_rep(make_wide)

# transpose to get NAICS on the rows and SCTG2 on columns
make_wide <- as.data.frame(t(make_wide))

# get column names and then set them in the dataset
cols <- t(make_wide[1,])

make_wide1 <- make_wide[-1, ]
colnames(make_wide1) <- t(cols)

```

### Now Plot the results.

**USE** table first.

```{r}
# plot the Use table. This plot includes the commodities as well in the columns, as opposed to only NAICS
use1 <- subset(use, Comm != "SCTG60" & NAICS.Code != "hh")
ggplot(use1, aes(y = coeff, x = factor(Comm), fill = factor(NAICS.Code))) + geom_bar(stat = "identity") + coord_flip()

```

The second plot strips out all **SCTG** records from the NAICS. Thus, this is the purest form of the USE table i.e SCTG2 commodities on the rows and NAICS in the columns

```{r}
# plot the Use table. This plot strips out any SCTG2 in the NAICS.Code column
use2 <- subset(use1, !grepl("^S", use1$NAICS.Code))
ggplot(use2, aes(y = coeff, x = factor(Comm), fill = factor(NAICS.Code))) + geom_bar(stat = "identity") + coord_flip()

```


#### **Make** table second.

The first plot excludes the MAKE for the **hh (households)** category of NAICS, and the **SCTG60 (non-transportable)** commodity

```{r}
# plot the Make table. This plot includes the non-transportable commodities as well in the rows, as opposed to only NAICS
make1 <- subset(make, Comm != "SCTG60" & NAICS.Code != "hh")
ggplot(make1, aes(y = coeff, x = factor(NAICS.Code), fill = factor(Comm))) + geom_bar(stat = "identity") + coord_flip()

```

The second plot strips out the MAKE for **SCTG** commodities. Thus, this is the purest form of the MAKE table i.e NAICS in the rows and SCTG2 commodities on the columns

```{r}
# plot the Use table. This plot strips out any SCTG2 in the NAICS.Code column
make2 <- subset(make1, !grepl("^S", make1$NAICS.Code))
ggplot(make2, aes(y = coeff, x = factor(NAICS.Code), fill = factor(Comm))) + geom_bar(stat = "identity") + coord_flip()

```