Configuration Matrices
This guide is also available as a vignette in the R console: vignette("Configuration-Matrices").
As configuration matrices (Pfister 1988) depend on information
about acts and scenes, we need to load the texts in such a way that this
information is present. The function load.text2() does exactly that.
Alternatively, one can use the included dataset rksp.0, whose mtext component (rksp.0$mtext) already contains this information.
require(DramaAnalysis)
data(rksp.0)
colnames(rksp.0$mtext)
## [1] "corpus" "drama"
## [3] "begin.Act" "end.Act"
## [5] "Number.Act" "begin.Scene"
## [7] "end.Scene" "Number.Scene"
## [9] "begin" "end"
## [11] "Speaker.figure_surface" "Speaker.figure_id"
## [13] "Token.surface" "Token.pos"
## [15] "Token.lemma" "length"
## [17] "Mentioned.figure_surface" "Mentioned.figure_id"
In addition to the regular text and speakers, this table also contains information about the act and scene in which each token is spoken.
c <- configuration(rksp.0$mtext)
c$matrix
## 1 2 3 4 5
## [1,] 2947 0 965 809 832
## [2,] 42 0 0 0 0
## [3,] 764 0 0 0 0
## [4,] 1062 527 1857 1377 837
## [5,] 106 0 0 0 0
## [6,] 0 1264 641 232 0
## [7,] 0 366 0 0 0
## [8,] 0 622 0 765 2011
## [9,] 0 416 262 0 0
## [10,] 0 1189 417 0 757
## [11,] 0 1133 0 0 0
## [12,] 0 0 195 8 0
## [13,] 0 0 0 2962 0
c$figure
## [1] DER PRINZ DER KAMMERDIENER CONTI MARINELLI
## [5] CAMILLO ROTA CLAUDIA GALOTTI PIRRO ODOARDO GALOTTI
## [9] ANGELO EMILIA APPIANI BATTISTA
## [13] ORSINA
## 13 Levels: ANGELO APPIANI BATTISTA CAMILLO ROTA CLAUDIA GALOTTI ... PIRRO
This creates a basic configuration matrix with one row per figure (in the order given by c$figure) and one column per act. Instead of just recording the presence or absence of a figure, each cell contains the number of tokens that figure speaks in that act.
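To illustrate what these numbers are, here is a minimal base R sketch that counts tokens per figure and act from a toy token table. The data frame and its columns (figure, act) are hypothetical and much simpler than rksp.0$mtext, and this is not necessarily how configuration() is implemented internally; it only shows the idea of cross-tabulating tokens by speaker and act.

```r
# Hypothetical toy token table: one row per spoken token,
# with the speaking figure and the act in which the token is spoken
tokens <- data.frame(
  figure = c("A", "A", "B", "A", "C", "C", "B"),
  act    = c(1, 1, 1, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are figures, columns are acts,
# each cell holds the number of tokens the figure speaks in that act
m <- xtabs(~ figure + act, data = tokens)
print(m)
```

Figure C does not speak in act 1, so its cell there is 0, just like the zeros in the matrix above.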
The configuration matrix makes it easy to create a stacked bar chart that shows the distribution visually (with 13 figures, you probably need a color palette with more than ten colors):
par(mar=c(2,2,2,10))
barplot(c$matrix,
        legend.text = c$figure,              # legend text
        args.legend = list(cex=0.5,          # legend font size
                           x=7.5,            # legend x position
                           y=max(colSums(c$matrix)) # legend y pos
                          ),
        col=qd.colors)
Since the acts differ in length, it is useful to normalize each column by the total number of tokens spoken in that act. This way, we can display the relative active presence of each figure in each act. We normalize by dividing each column by its sum.
c$matrix <- scale(c$matrix, center=FALSE, scale=colSums(c$matrix))
c$matrix
## 1 2 3 4 5
## [1,] 0.598862020 0.00000000 0.22250404 0.131480579 0.1875141
## [2,] 0.008534851 0.00000000 0.00000000 0.000000000 0.0000000
## [3,] 0.155252997 0.00000000 0.00000000 0.000000000 0.0000000
## [4,] 0.215809795 0.09552293 0.42817616 0.223793272 0.1886410
## [5,] 0.021540337 0.00000000 0.00000000 0.000000000 0.0000000
## [6,] 0.000000000 0.22911002 0.14779802 0.037705184 0.0000000
## [7,] 0.000000000 0.06634040 0.00000000 0.000000000 0.0000000
## [8,] 0.000000000 0.11274243 0.00000000 0.124329595 0.4532342
## [9,] 0.000000000 0.07540330 0.06041042 0.000000000 0.0000000
## [10,] 0.000000000 0.21551568 0.09614941 0.000000000 0.1706108
## [11,] 0.000000000 0.20536523 0.00000000 0.000000000 0.0000000
## [12,] 0.000000000 0.00000000 0.04496196 0.001300179 0.0000000
## [13,] 0.000000000 0.00000000 0.00000000 0.481391191 0.0000000
## attr(,"scaled:scale")
## 1 2 3 4 5
## 4921 5517 4337 6153 4437
par(mar=c(2,2,2,10))
barplot(c$matrix,
        legend.text=c$figure,                # set legend text
        args.legend = list(cex=0.5,          # legend font size
                           x=7.5,            # legend x position
                           y=max(colSums(c$matrix)) # legend y pos
                          ),
        col=qd.colors)
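Dividing each column by its sum is the same as computing column proportions. The following self-contained sketch uses a small made-up matrix rather than the Emilia Galotti data. It shows that scale() with center=FALSE and prop.table() with margin 2 give the same values, and that every column of the result sums to 1.

```r
# Made-up token counts: 3 figures (rows) x 2 acts (columns)
m <- matrix(c(30, 10, 0,
              20, 20, 40),
            nrow = 3,
            dimnames = list(c("A", "B", "C"), c("1", "2")))

# Variant used above: divide each column by its sum
scaled <- scale(m, center = FALSE, scale = colSums(m))

# Equivalent base R alternative: column-wise proportions
props <- prop.table(m, margin = 2)

# Same numbers; scale() additionally stores the divisors
# in the "scaled:scale" attribute, which as.vector() drops
all.equal(as.vector(scaled), as.vector(props))  # TRUE

# Each act now sums to 1
colSums(props)
```

prop.table() does not attach the "scaled:scale" attribute, which can make the printed result slightly tidier.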
Configuration matrices are also often used to get an overview of who is co-present on stage. First, we create a configuration matrix that only records whether each figure is present or absent, and we switch from acts to scenes. Since there are many more scenes than acts, the resulting matrix has many more columns.
c <- configuration(rksp.0$mtext, onlyPresence = TRUE, by="Scene")
# to see the matrix (not shown here):
# c$matrix
Creating a co-occurrence matrix is a simple matter of matrix multiplication:
rksp.0.co <- c$matrix %*% t(c$matrix)
rksp.0.co
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] 17 2 2 9 1 0 0 2 0 2 0 1 1
## [2,] 2 2 0 0 0 0 0 0 0 0 0 0 0
## [3,] 2 0 2 0 0 0 0 0 0 0 0 0 0
## [4,] 9 0 0 19 0 3 1 4 1 4 2 3 3
## [5,] 1 0 0 0 1 0 0 0 0 0 0 0 0
## [6,] 0 0 0 3 0 13 3 3 0 3 4 2 1
## [7,] 0 0 0 1 0 3 4 1 1 0 1 0 0
## [8,] 2 0 0 4 0 3 1 12 0 2 0 0 3
## [9,] 0 0 0 1 0 0 1 0 2 0 0 0 0
## [10,] 2 0 0 4 0 3 0 2 0 7 1 1 0
## [11,] 0 0 0 2 0 4 1 0 0 1 5 0 0
## [12,] 1 0 0 3 0 2 0 0 0 1 0 4 0
## [13,] 1 0 0 3 0 1 0 3 0 0 0 0 6
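Why does the product count shared scenes? In a presence matrix M with figures as rows and scenes as columns (1 = present, 0 = absent), cell (i, j) of M %*% t(M) sums M[i, s] * M[j, s] over all scenes s, and each term is 1 only if both figures are on stage in scene s. The diagonal therefore holds the number of scenes each figure appears in. A toy example with three hypothetical figures and four scenes:

```r
# Hypothetical presence matrix: 1 = figure is on stage in that scene
M <- matrix(c(1, 1, 0, 1,   # figure A
              1, 0, 0, 1,   # figure B
              0, 1, 1, 0),  # figure C
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), paste0("scene", 1:4)))

# Figure x figure matrix of shared scenes
co <- M %*% t(M)   # tcrossprod(M) computes the same thing
print(co)
# A shares 2 scenes with B and 1 with C; B and C never meet;
# the diagonal (3, 2, 2) counts the scenes of each figure
```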
# add figure names
rownames(rksp.0.co) <- c$figure
colnames(rksp.0.co) <- c$figure
This can be visualised in a simple heat map:
# since the matrix is symmetric, we don't need the bottom right
# triangle (as drawn by image()) or the diagonal.
rksp.0.co[lower.tri(rksp.0.co,diag=TRUE)] <- NA
par(mar=c(10,10,1,1)) # plot margins
image(rksp.0.co,
      # 256 shades of one blue, from fully transparent to opaque
      col = rgb(64, 111, 184, alpha=seq(0,255),
                maxColorValue = 255),
      xaxt = "n",   # no x axis
      yaxt = "n",   # no y axis
      frame = TRUE  # print a frame around the heatmap
)
# add the x axis
axis(1, at = seq(0,1,length.out = length(c$figure)), labels = c$figure, las=3)
# add the y axis
axis(2, at = seq(0,1,length.out = length(c$figure)), labels = c$figure, las=1)
The heat map suggests that Marinelli and Der Prinz share the most scenes. Marinelli also shares at least one scene with most of the other figures, which shows up as a vertical bar.
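Rather than reading the strongest pair off the heat map, one can also look it up in the matrix. The sketch below uses a small hypothetical co-occurrence matrix that, like rksp.0.co after the heat map step, has NA in the lower triangle and on the diagonal; the figure names A, B, C are placeholders.

```r
# Hypothetical symmetric co-occurrence matrix with figure names
co <- matrix(c(3, 2, 1,
               2, 2, 0,
               1, 0, 2),
             nrow = 3,
             dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
co[lower.tri(co, diag = TRUE)] <- NA   # as done for the heat map

# Position of the largest remaining value (NA cells are ignored by which())
idx <- which(co == max(co, na.rm = TRUE), arr.ind = TRUE)
pair <- c(rownames(co)[idx[1, "row"]], colnames(co)[idx[1, "col"]])
print(pair)

# Number of other figures each figure shares at least one scene with
full <- co
full[is.na(full)] <- 0
full <- full + t(full)   # restore the symmetric off-diagonal part
partners <- rowSums(full > 0)
print(partners)
```

Assuming the figure names have been added as above, the same which() call should work on rksp.0.co directly; if several pairs tie for the maximum, idx has more than one row.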
The co-occurrences can also be visualised as a network, using the package igraph. A good
introduction to igraph, particularly for literary networks, can be found in
Arnold and Tilton (2015).
require(igraph)
## Warning: package 'igraph' was built under R version 3.4.4
Technically, the co-occurrence matrix we created is a weighted adjacency
matrix: each cell gives the number of scenes two figures share. It is therefore
simple to convert it into a graph, and igraph offers the function
graph_from_adjacency_matrix() for this.
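# the heat map step set the lower triangle and the diagonal to NA;
# set them back to 0 (= no edge) before building the graph
rksp.0.co[is.na(rksp.0.co)] <- 0
g <- graph_from_adjacency_matrix(rksp.0.co,
  weighted=TRUE,   # weighted graph
  mode="upper",    # undirected, using only the upper triangle
  diag=FALSE       # no looping edges
)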
# Now we plot
plot.igraph(g,
layout=layout_with_gem, # how to lay out the graph
main="Co-Occurrence Network: Emilia Galotti", # title
vertex.label.cex=0.6, # label size
vertex.label.color="black", # font color
vertex.color=qd.colors[4], # vertex color
vertex.frame.color=NA, # no vertex border
edge.width=E(g)$weight # scale edges according to their weight
) 
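To see what the conversion does with the weights, one can build the corresponding weighted edge list by hand: every non-zero cell above the diagonal becomes one undirected edge whose weight is the number of shared scenes. This is a base R sketch on a hypothetical matrix, not igraph's implementation, and igraph may order its edges differently.

```r
# Hypothetical co-occurrence matrix (symmetric, named rows and columns)
co <- matrix(c(3, 2, 1,
               2, 2, 0,
               1, 0, 2),
             nrow = 3,
             dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Keep only non-zero cells above the diagonal (no loops, no duplicates)
keep <- upper.tri(co) & co > 0
idx <- which(keep, arr.ind = TRUE)

# which() and logical indexing both walk the matrix column by column,
# so rows of idx line up with the values in co[keep]
edges <- data.frame(
  from   = rownames(co)[idx[, "row"]],
  to     = colnames(co)[idx[, "col"]],
  weight = co[keep],
  stringsAsFactors = FALSE
)
print(edges)
```

B and C never share a scene, so there is no edge between them, while the edge A to B gets weight 2; this weight is what edge.width=E(g)$weight uses for drawing.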
As a final step, one might want to work on the graph further using Gephi or other tools. To do so, one can export the graph into a suitable file format, such as GraphML:
write_graph(g,
"rksp.0.graphml",
format="graphml")
This results in a file called rksp.0.graphml that starts like this:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<!-- Created by igraph -->
<key id="v_name" for="node" attr.name="name" attr.type="string"/>
<key id="e_weight" for="edge" attr.name="weight" attr.type="double"/>
<graph id="G" edgedefault="undirected">
<node id="n0">
<data key="v_name">DER KAMMERDIENER</data>
</node>
<node id="n1">
<data key="v_name">DER PRINZ</data>
</node>
...
This file can be opened with Gephi.
Arnold, Taylor, and Lauren Tilton. 2015. Humanities Data in R. Springer International Publishing. https://doi.org/10.1007/978-3-319-20702-5.
Pfister, Manfred. 1988. The Theory and Analysis of Drama. Translated by John Halliday. European Studies in English Literature. Cambridge University Press. https://doi.org/10.1017/CBO9780511553998.