
Configuration Matrices

Nils Reiter edited this page May 25, 2018 · 37 revisions

2.1.0

This guide is also available as a vignette in the R console: vignette("Configuration-Matrices").


Matrices

Configuration matrices (Pfister 1988) depend on information about acts and scenes, so the texts must be loaded in a way that preserves this information. The function load.text2() does exactly that. Alternatively, one can use the mtext table of the bundled dataset rksp.0, accessed as rksp.0$mtext.

require(DramaAnalysis)
data(rksp.0)

colnames(rksp.0$mtext)
##  [1] "corpus"                   "drama"                   
##  [3] "begin.Act"                "end.Act"                 
##  [5] "Number.Act"               "begin.Scene"             
##  [7] "end.Scene"                "Number.Scene"            
##  [9] "begin"                    "end"                     
## [11] "Speaker.figure_surface"   "Speaker.figure_id"       
## [13] "Token.surface"            "Token.pos"               
## [15] "Token.lemma"              "length"                  
## [17] "Mentioned.figure_surface" "Mentioned.figure_id"

In addition to the regular text and speakers, this table also contains information about the scenes and acts in which each token is spoken.

c <- configuration(rksp.0$mtext)
c$matrix
##          1    2    3    4    5
##  [1,] 2947    0  965  809  832
##  [2,]   42    0    0    0    0
##  [3,]  764    0    0    0    0
##  [4,] 1062  527 1857 1377  837
##  [5,]  106    0    0    0    0
##  [6,]    0 1264  641  232    0
##  [7,]    0  366    0    0    0
##  [8,]    0  622    0  765 2011
##  [9,]    0  416  262    0    0
## [10,]    0 1189  417    0  757
## [11,]    0 1133    0    0    0
## [12,]    0    0  195    8    0
## [13,]    0    0    0 2962    0
c$figure
##  [1] DER PRINZ        DER KAMMERDIENER CONTI            MARINELLI       
##  [5] CAMILLO ROTA     CLAUDIA GALOTTI  PIRRO            ODOARDO GALOTTI 
##  [9] ANGELO           EMILIA           APPIANI          BATTISTA        
## [13] ORSINA          
## 13 Levels: ANGELO APPIANI BATTISTA CAMILLO ROTA CLAUDIA GALOTTI ... PIRRO

This creates a basic configuration matrix, but instead of just containing the presence or absence of a figure, it contains the number of spoken tokens for each act for each figure.
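
To illustrate what these counts represent, the following sketch builds a comparable matrix by hand from a tiny, made-up token table using base R. The data frame toy and its values are invented for illustration; only the column names Speaker.figure_surface and Number.Act are taken from the table above. This is not necessarily how configuration() works internally, just one way to see what the cells contain.

```r
# Hypothetical toy data: one row per spoken token (values invented).
toy <- data.frame(
  Speaker.figure_surface = c("A", "A", "B", "A", "B", "B", "C"),
  Number.Act             = c(1, 1, 1, 2, 2, 2, 2)
)

# Cross-tabulate speaker by act: each cell is the number of tokens
# a figure speaks in that act.
counts <- table(toy$Speaker.figure_surface, toy$Number.Act)
toy.matrix <- unclass(counts)

# A presence/absence version is obtained by thresholding the counts.
toy.presence <- (toy.matrix > 0) * 1

toy.matrix
toy.presence
```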

We can easily use this to create a stacked bar chart that shows the distribution visually (you will probably need a color palette with more than ten colors, since there are 13 figures).

par(mar=c(2,2,2,10))
barplot(c$matrix, 
        legend.text = c$figure, # legend text
        args.legend = list(cex=0.5, # legend font size
                           x=7.5, # legend x position
                           y=max(colSums(c$matrix)) # legend y pos
                        ), 
        col=qd.colors)

Since the acts differ in length, it is useful to normalize each column by the total number of tokens spoken in that act. This way, we can display the relative active presence of each figure in each act. We normalize by dividing each column by its sum.

c$matrix <- scale(c$matrix, center=FALSE, scale=colSums(c$matrix))
c$matrix
##                 1          2          3           4         5
##  [1,] 0.598862020 0.00000000 0.22250404 0.131480579 0.1875141
##  [2,] 0.008534851 0.00000000 0.00000000 0.000000000 0.0000000
##  [3,] 0.155252997 0.00000000 0.00000000 0.000000000 0.0000000
##  [4,] 0.215809795 0.09552293 0.42817616 0.223793272 0.1886410
##  [5,] 0.021540337 0.00000000 0.00000000 0.000000000 0.0000000
##  [6,] 0.000000000 0.22911002 0.14779802 0.037705184 0.0000000
##  [7,] 0.000000000 0.06634040 0.00000000 0.000000000 0.0000000
##  [8,] 0.000000000 0.11274243 0.00000000 0.124329595 0.4532342
##  [9,] 0.000000000 0.07540330 0.06041042 0.000000000 0.0000000
## [10,] 0.000000000 0.21551568 0.09614941 0.000000000 0.1706108
## [11,] 0.000000000 0.20536523 0.00000000 0.000000000 0.0000000
## [12,] 0.000000000 0.00000000 0.04496196 0.001300179 0.0000000
## [13,] 0.000000000 0.00000000 0.00000000 0.481391191 0.0000000
## attr(,"scaled:scale")
##    1    2    3    4    5 
## 4921 5517 4337 6153 4437
par(mar=c(2,2,2,10))
barplot(c$matrix, 
        legend.text=c$figure, # set legend text
        args.legend = list(cex=0.5, # legend font size
                           x=7.5, # legend x position
                           y=max(colSums(c$matrix)) # legend y pos
                        ),
        col=qd.colors)
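
The effect of the scale() call can be checked on a small example. The sketch below uses an invented 3-by-2 count matrix m (not taken from the play) and compares scale() with two base R alternatives, sweep() and prop.table(), which should yield the same proportions.

```r
# Hypothetical token counts: 3 figures (rows) by 2 acts (columns).
m <- matrix(c(30, 10, 0,
              20, 20, 60),
            nrow = 3,
            dimnames = list(c("A", "B", "C"), c("1", "2")))

# Same call as above: divide every column by its column sum.
m.scaled <- scale(m, center = FALSE, scale = colSums(m))

# Two base R alternatives that compute the same proportions.
m.sweep <- sweep(m, 2, colSums(m), "/")
m.prop  <- prop.table(m, margin = 2)

# After normalization every column sums to 1.
colSums(m.scaled)
```

Unlike scale(), the two alternatives do not attach a "scaled:scale" attribute to the result, which keeps the printed matrix a little cleaner.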

Copresence

Configuration matrices are also often used to get an overview of who is copresent on stage. First, we create a configuration matrix that only represents presence or absence of a figure (and we switch to scenes). Obviously, the resulting matrix has many more columns.

c <- configuration(rksp.0$mtext, onlyPresence = TRUE, by="Scene")

# to see the matrix (not shown here):
# c$matrix 

Creating a co-occurrence matrix is a simple matter of matrix multiplication:

rksp.0.co <- c$matrix %*% t(c$matrix)
rksp.0.co
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
##  [1,]   17    2    2    9    1    0    0    2    0     2     0     1     1
##  [2,]    2    2    0    0    0    0    0    0    0     0     0     0     0
##  [3,]    2    0    2    0    0    0    0    0    0     0     0     0     0
##  [4,]    9    0    0   19    0    3    1    4    1     4     2     3     3
##  [5,]    1    0    0    0    1    0    0    0    0     0     0     0     0
##  [6,]    0    0    0    3    0   13    3    3    0     3     4     2     1
##  [7,]    0    0    0    1    0    3    4    1    1     0     1     0     0
##  [8,]    2    0    0    4    0    3    1   12    0     2     0     0     3
##  [9,]    0    0    0    1    0    0    1    0    2     0     0     0     0
## [10,]    2    0    0    4    0    3    0    2    0     7     1     1     0
## [11,]    0    0    0    2    0    4    1    0    0     1     5     0     0
## [12,]    1    0    0    3    0    2    0    0    0     1     0     4     0
## [13,]    1    0    0    3    0    1    0    3    0     0     0     0     6
# add figure names
rownames(rksp.0.co) <- c$figure
colnames(rksp.0.co) <- c$figure
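
Why the multiplication yields co-occurrence counts is easiest to see on a small case. The following sketch uses an invented presence matrix p with three figures and four scenes; the names and values are assumptions made for illustration only.

```r
# Hypothetical presence matrix: 3 figures (rows) by 4 scenes (columns).
p <- matrix(c(1, 1, 0, 1,
              1, 0, 0, 1,
              0, 1, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), paste0("scene", 1:4)))

# Entry [i, j] sums p[i, s] * p[j, s] over all scenes s, so it counts
# the scenes in which both figure i and figure j are present.
co <- p %*% t(p)

co
```

The diagonal of the result holds the number of scenes in which each figure appears at all, and the matrix is symmetric because sharing a scene has no direction.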

As Heatmap

This can be visualised in a simple heat map:

# since the matrix is symmetric, we don't need the lower triangle
# or the diagonal.
rksp.0.co[lower.tri(rksp.0.co,diag=TRUE)] <- NA

par(mar=c(10,10,1,1)) # plot margins
image(rksp.0.co, 
      col = rgb(64,111,184, alpha=(seq(0,255)),
                maxColorValue = 255),
      xaxt= "n",  # no x axis
      yaxt= "n",  # no y axis
      frame=TRUE  # print a frame around the heatmap
      )
# add the x axis
axis(1, at = seq(0,1,length.out = length(c$figure)), labels = c$figure, las=3)
# add the y axis
axis(2, at = seq(0,1,length.out = length(c$figure)), labels = c$figure, las=1)

Marinelli and Der Prinz share the most scenes (nine, according to the co-occurrence matrix). Marinelli also shares at least one scene with most other figures, which shows up as a vertical bar in the heat map.
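
Such observations can also be read off the matrix programmatically. The sketch below works on an invented co-occurrence matrix co for four figures (all values are assumptions) and looks up the pair with the most shared scenes as well as the number of distinct partners per figure.

```r
# Hypothetical symmetric co-occurrence counts for four figures.
co <- matrix(c(5, 3, 1, 0,
               3, 6, 2, 2,
               1, 2, 4, 0,
               0, 2, 0, 2),
             nrow = 4, byrow = TRUE,
             dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Keep only the upper triangle, as in the heat map above.
co[lower.tri(co, diag = TRUE)] <- NA

# Position of the largest remaining value, i.e. the pair of figures
# with the most shared scenes.
top <- which(co == max(co, na.rm = TRUE), arr.ind = TRUE)
rownames(co)[top[, "row"]]
colnames(co)[top[, "col"]]

# Number of distinct partners per figure (non-zero entries,
# counting each pair from both sides).
partners <- rowSums(co > 0, na.rm = TRUE) + colSums(co > 0, na.rm = TRUE)
partners
```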

As Network

The co-occurrence matrix can also be visualised as a network, using the package igraph. A nice introduction to igraph, particularly for literary networks, can be found in Arnold and Tilton (2015).

require(igraph)
## Warning: package 'igraph' was built under R version 3.4.4

Technically, the matrix we created before is an adjacency matrix. It is therefore simple to convert it to a graph, and igraph offers the function graph_from_adjacency_matrix() for this.

g <- graph_from_adjacency_matrix(rksp.0.co, 
                                 weighted=TRUE,     # weighted graph
                                 mode="undirected", # no direction
                                 diag=FALSE         # no looping edges
                                )

# Now we plot
plot.igraph(g, 
            layout=layout_with_gem,       # how to lay out the graph
            main="Co-Occurrence Network: Emilia Galotti",  # title
            vertex.label.cex=0.6,         # label size
            vertex.label.color="black",   # font color
            vertex.color=qd.colors[4],    # vertex color
            vertex.frame.color=NA,        # no vertex border
            edge.width=E(g)$weight        # scale edges according to their weight
            )  
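
To make explicit what the adjacency matrix encodes, the following base R sketch lists the weighted edges of a small, invented matrix adj by hand. It is meant only as an illustration under assumed values; igraph performs this conversion itself, and the code here is not its implementation.

```r
# Hypothetical symmetric co-occurrence matrix for three figures.
adj <- matrix(c(3, 2, 1,
                2, 2, 0,
                1, 0, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Each non-zero cell above the diagonal corresponds to one undirected
# edge; the diagonal (self-loops) is skipped.
idx <- which(upper.tri(adj) & adj > 0, arr.ind = TRUE)

edges <- data.frame(
  from   = rownames(adj)[idx[, "row"]],
  to     = colnames(adj)[idx[, "col"]],
  weight = adj[idx]
)
edges
```

Pairs with a count of zero produce no edge, so figures that never share a scene remain unconnected in the plot.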

Graph Export

As a final step, one might want to work on the graph further in Gephi or other tools. To do so, one can export the graph to an appropriate file:

write_graph(g, 
            "rksp.0.graphml",
            format="graphml")

This results in a file called rksp.0.graphml, which starts similarly to this:

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
         http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<!-- Created by igraph -->
  <key id="v_name" for="node" attr.name="name" attr.type="string"/>
  <key id="e_weight" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="undirected">
    <node id="n0">
      <data key="v_name">DER KAMMERDIENER</data>
    </node>
    <node id="n1">
      <data key="v_name">DER PRINZ</data>
    </node>
    ...

This file can be opened with Gephi.

References

Arnold, Taylor, and Lauren Tilton. 2015. Humanities Data in R. Springer International Publishing. https://doi.org/10.1007/978-3-319-20702-5.

Pfister, Manfred. 1988. The Theory and Analysis of Drama. Translated by John Halliday. European Studies in English Literature. Cambridge University Press. https://doi.org/10.1017/CBO9780511553998.
