Title: Crawling Wikipedia Graph
Collaborators: Maria Mitkina (Foster Business School) and Chase Gottlich (Institute of Health Metrics & Evaluation)
Abstract:
Mining large graphs reveals information about their structure, and temporal snapshots of the same graphs reveal how they evolve. However, running novel algorithms on graphs of this size can be computationally expensive. We therefore need sampling methods that yield an unbiased sample representative of the underlying network. In this work, we evaluate different random walks by crawling a large online editing network: Wikipedia.
Findings:
- Clustering in the graph is associated with high growth on the platform.
- Simple Random Walk is ineffective for sampling graphs with heavy-tailed distributions.
- Re-Weighted Random Walk outperforms other methods for graph sampling.
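
To illustrate the last two findings, here is a minimal Python sketch of why a simple random walk over-samples high-degree nodes and how re-weighting can correct for it. It is not the code used in this project. It assumes that "Re-Weighted Random Walk" means the common correction that weights each visited node by the inverse of its degree, and the function names and the toy star graph (one hub connected to many leaves) are hypothetical, chosen only to make the bias easy to see.

```python
import random


def simple_random_walk(adj, start, steps, rng):
    """Take `steps` moves, each to a uniformly chosen neighbor, and record the nodes visited."""
    node = start
    visited = []
    for _ in range(steps):
        node = rng.choice(adj[node])
        visited.append(node)
    return visited


def naive_mean_degree(adj, sample):
    """Unweighted sample mean of degree; biased toward high-degree nodes under a random walk."""
    return sum(len(adj[v]) for v in sample) / len(sample)


def reweighted_mean_degree(adj, sample):
    """Mean degree with each visit weighted by 1/degree (assumed re-weighting scheme).

    With weights w = 1/deg, sum(w * deg) / sum(w) simplifies to n / sum(1/deg).
    """
    return len(sample) / sum(1.0 / len(adj[v]) for v in sample)


def star_graph(n_leaves):
    """Hypothetical toy graph: node 0 is a hub linked to leaves 1..n_leaves."""
    adj = {0: list(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        adj[leaf] = [0]
    return adj


adj = star_graph(100)
sample = simple_random_walk(adj, start=0, steps=1000, rng=random.Random(0))

true_mean = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
print("true mean degree:      ", true_mean)
print("naive walk estimate:   ", naive_mean_degree(adj, sample))
print("re-weighted estimate:  ", reweighted_mean_degree(adj, sample))
```

On this toy graph the walk alternates between the hub and a leaf, so the naive estimate lands near 50 while the true mean degree is just under 2. The re-weighted estimate recovers the true value. Real crawls of Wikipedia would behave less cleanly, but the direction of the bias is the same.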