<!DOCTYPE html>
<html lang="en">
<head>
<script
src="https://code.jquery.com/jquery-3.1.1.min.js"
integrity="sha256-hVVnYaiADRTO2PzUGmuLJr8BLUSjGIZsDYGmIJLv2b8="
crossorigin="anonymous">
</script>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta charset="UTF-8">
<title> Minna K-T </title>
<link href="./happysadsleepymad.css" rel="stylesheet" type="text/css">
<link rel="icon"
type="image/png"
href="./images/favicon/favicon.ico" />
</head>
<body>
<div class = "wrapper">
<div class = "confetti"> </div>
<div class = "container">
<div class = "header">
<div class = "name">
<a href="./index.html">MINNA KIMURA-THOLLANDER</a>
</div>
</div>
<div class = "content">
<div class="intro">
<div class = "title-big">HappySadSleepyMad 😀😢😴😤</div>
<div class = "date"> May 2019 | Data Analysis and Data Viz</div>
<div class = "summary">
If you've been on my page for a little while, you might have noticed that
I really like emojis. Emojis have become a staple of modern conversation,
used all over the world and across languages.
</div>
<div class = "summary">
HappySadSleepyMad was a semester-long project for Brown's "Data Science" course.
Our goal was not merely to compare the meaning of emojis across different
languages (because, let's face it, a given emoji probably means much the same thing
around the world) but to detect the nuances in emoji usage.
</div>
<div class = "summary">
Our team consisted of 4 people, each of us representing one emoji in the project name!
<ul>
<li> Katherine Sang: 😀</li>
<li> Iris (YunYun) Yao: 😢</li>
<li> Maggie Wu: 😴</li>
<li> Me: 😤</li>
</ul>
</div>
</div>
<div class = "paragraph">
<div class = "title">Data Collection</div>
<div class = "summary">
Data collection was probably the most important part of the project, because it forms
the foundation for everything that follows. Though we originally planned to scrape
from multiple sources such as Facebook, Instagram, and Twitter, we quickly learned that
Facebook's data policy imposes strict restrictions, which ruled out both Facebook and Instagram.
</div>
<div class = "summary">
In the end, we settled on Twitter, which also required us to request an API key.
We never received a response during the entire semester, so we ended up acquiring an API key through
some less-than-scrupulous methods... but we had good intentions!
</div>
<div class = "summary">
We scraped Twitter for tweets containing emojis for one month from March to April, collecting nearly 300,000 tweets
during that time. We then inserted our data into a SQLite DB, and it looked a little like this:
</div>
<img src = "./images/happysadsleepymad/database.PNG" width="800px">
<div class = "summary">
As you can see, we collected information such as username, region, sentiment, and interaction
counts. You can also see that the data is not exactly clean, because not every Twitter user uses
geo-tagging... so we also had to deal with that.
</div>
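<div class = "summary">
For readers curious about the storage step, here is a minimal sketch of how tweets like these could be
written to SQLite with Python's built-in <code>sqlite3</code> module. The table name, the column names, and the
helper functions are assumptions made for illustration (based only on the fields mentioned above), not our actual schema.
</div>

```python
import sqlite3

# Hypothetical schema: the column names are guesses based on the fields named
# in the write-up (username, region, sentiment, interaction counts).
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    id        INTEGER PRIMARY KEY,
    username  TEXT NOT NULL,
    text      TEXT NOT NULL,
    region    TEXT,            -- NULL when the user has no geo-tag
    sentiment REAL,
    retweets  INTEGER DEFAULT 0,
    likes     INTEGER DEFAULT 0
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_tweets(conn, rows):
    """Insert (username, text, region, sentiment, retweets, likes) tuples."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO tweets "
            "(username, text, region, sentiment, retweets, likes) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )


def count_missing_region(conn):
    """Count tweets with no region, i.e. the rows that still need cleaning."""
    return conn.execute(
        "SELECT COUNT(*) FROM tweets WHERE region IS NULL"
    ).fetchone()[0]
```

<div class = "summary">
Storing a missing geo-tag as NULL, rather than as an empty string, keeps the untagged rows easy to count and filter later.
</div>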
</div>
<div class = "paragraph">
<div class = "title"> Sentiment Analysis</div>
<div class = "summary">
Working with only English-language tweets, we used TextBlob (a sentiment analysis library)
and K-Means with 3 clusters to determine which emojis are positive, negative,
and neutral.
</div>
<div class="summary"> Here are our results:</div>
<div class = "split-box">
<div class = "kmeans">
<img src="./images/happysadsleepymad/kmeans.png" width="400" >
</div>
<div class = "summary-box">
<div class = "summary">
Positive: <br/>
π©π₯πππππππππππππͺπ
πππππ°ππΆπππ₯πππ¨πππ»π€
ππ€πππ€πβπ³π¨β‘π±ππ»πππ
</div>
<div class= "summary">
Negative: <br/>
πππ€£π€π©π€·π€ ππππ’π€π€¦ππ‘π
</div>
<div class = "summary">
Neutral: <br/>
π¨β¨πππΈππ¦π§ππ¦ππ³ππΉπππ°πππ
π«π¨ππ±ππ
π«ππΆπ²ππ·πΆπ³πΎ
</div>
</div>
</div>
<div class = "summary">
Isn't it interesting that emojis like 🤣 are considered negative? Perhaps they're used
in a lot of sarcastic contexts, but that's a whole different data science project!
</div>
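<div class = "summary">
The clustering step can be sketched roughly as follows. This is a simplified, pure-Python stand-in, not our
actual pipeline: it assumes each emoji has already been reduced to a single mean polarity score between -1 and 1
(for example, the average TextBlob polarity of the tweets containing it), and the names <code>kmeans_1d</code> and
<code>label_emojis</code> are made up for this example.
</div>

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Cluster scalar values into k groups (k of at least 2) with Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the cluster index of values[i].
    """
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted data.
    centroids = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels


def label_emojis(mean_polarity):
    """Map each emoji to 'negative', 'neutral', or 'positive'."""
    emojis = list(mean_polarity)
    centroids, labels = kmeans_1d([mean_polarity[e] for e in emojis], k=3)
    # Rank clusters by centroid so the lowest one is named 'negative'.
    order = sorted(range(3), key=lambda c: centroids[c])
    names = dict(zip(order, ["negative", "neutral", "positive"]))
    return {e: names[lab] for e, lab in zip(emojis, labels)}
```

<div class = "summary">
Because the clusters are named by the rank of their centroids, "negative" here only means "lowest average polarity of the three groups",
which is one way an emoji that looks cheerful can still end up in the negative cluster.
</div>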
</div>
<div class = "paragraph">
<div class = "title"> Context Analysis</div>
<div class = "summary">
Following sentiment analysis, we performed context analysis case studies on English, Japanese, and Spanish.
These were simply the languages for which we had collected the most tweet data!
</div>
<div class = "summary">
To clarify, context analysis means finding which words are used in similar contexts. Note that this is slightly different
from finding synonyms, although synonyms would definitely be used in very similar contexts. For example, if you have the
sentences "Mary walks to the bedroom" and "John goes to the bathroom," then "Mary" and "John" are words
used in similar contexts, but they aren't synonyms.
</div>
<div class = "summary">
We used our tweet data to construct sparse word vectors, which we then used to discover words that were
used in a similar context for each emoji in 😀😢😴😤. Once we had the most similar words for every emoji,
we created radial graphs using the JavaScript library D3.
</div>
<div class = "image-grid">
<div class="box1"> <p>Happy</p> </div>
<div class="box2"> <p>Sad</p> </div>
<div class="box3"> <p>Sleepy</p> </div>
<div class="box4"> <p>Mad</p> </div>
</div>
<div class = "summary">
Note that <span style="color: #f27a7d; font-family:'Europa-Bold'">RED</span > = English,
<span style="color:#fec83e; font-family:'Europa-Bold'" >YELLOW</span> = Spanish, and
<span style="color: #1fbbee; font-family:'Europa-Bold'">BLUE</span> = Japanese.
The distance between the words and the emoji in the center represents the cosine similarity of contexts.
</div>
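<div class = "summary">
As a rough illustration of the idea, the sketch below builds sparse co-occurrence vectors from tokenized text and
compares them with cosine similarity. It is a simplified example, not our project code: the window size, the
function names, and the choice of raw co-occurrence counts as vector entries are all assumptions.
</div>

```python
import math
from collections import Counter, defaultdict


def build_context_vectors(sentences, window=2):
    """Build sparse co-occurrence vectors: token -> Counter of nearby tokens.

    Each sentence is a list of tokens; emojis can be tokens like any word.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, token in enumerate(tokens):
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            vectors[token].update(neighbours)
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, vectors, top_n=5):
    """Rank every other token by context similarity to the target."""
    # Look the target up first so the dictionary is not modified while iterating.
    target_vec = vectors.get(target, Counter())
    scores = [
        (other, cosine_similarity(target_vec, vec))
        for other, vec in vectors.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]
```

<div class = "summary">
With the two example sentences from earlier, "mary" and "john" share only the neighbour "to", so their vectors overlap partially.
On a real corpus, very common words tend to dominate rankings like this, so filtering stop words first is usually worthwhile.
</div>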
</div>
<div class = "paragraph">
<div class = "title">Poster</div>
<div class="summary">
All of this culminated in a final poster, shown below:
</div>
<img src = "./images/happysadsleepymad/hpsm.png" width="800px">
<div class = "summary">
You can also read about our project <a target="_blank" href="https://medium.com/@happysadsleepymad/final-blog-post-f92f24c3d6c5"> here</a>.
<br/> <br/>
Side note: our poster received the "Best Visualization" award out of 40 groups!
</div>
</div>
<div class = "paragraph">
<div class = "title">Reflection</div>
<div class = "summary">
This was one of my first long-term, project-based classes at Brown, and it was a really
rewarding experience! It was a perfect intersection between data and design, and if I could,
I would do it all over again.
</div>
<div class="summary">
Had we had more time, we would have liked to create a deep learning model that could
predict the emojis from the text of a tweet. We did attempt this with KNN, SVM, and
decision tree models, but they reached an accuracy of at most 3 percent, which is pretty terrible.
Deep Learning is in itself a different field, but it would be fun to see where this
could take us!
</div>
</div>
<div class = "back2top">
<a href="#top">Back to top of page</a>
</div>
</div>
</div>
</div>
</body>
</html>