GitHub - mingchungx/Curio: Curio is a research-based, unsupervised topic modelling pipeline for social media, written in Swift.

Curio

Curio is a research-based, unsupervised topic modelling pipeline for social media, written in Swift. It draws on existing libraries to support data collection, document encoding (e.g., CoreML, Model2Vec, Apple's Natural Language framework), dimensionality reduction (e.g., PCA, t-SNE, UMAP), clustering (e.g., HDBSCAN, K-Means), and topic modelling. Our goal is to provide a modular, efficient set of tools that works across a variety of data sources. We use modern Swift concurrency and libraries like MLX to provide performant and safe implementations that run well on commodity Mac hardware. Curio is intended to enable the development of new qualitative data analysis tools for edge devices such as laptops, tablets, and smartphones.

Roadmap

Data Collection
- Reddit API Endpoints
- Pushshift Reddit Archives
- Additional data sources (e.g., X, Steam, GitHub)
Encoding
- Static Embeddings
  - GloVe
  - Apple Natural Language
- Contextual Embeddings (e.g., Sentence-Transformers)
  - OpenAI API
  - CoreML Models (e.g., All-MiniLM-L6)
Dimensionality Reduction
- PCA
- t-SNE
- Spherical t-SNE
- UMAP
Clustering
- K-Means
- DBSCAN
- HDBSCAN
Topic Models
- c-TF-IDF Keyword Generation
- Evaluation Metrics (Cosine Similarity, Topic Diversity)
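
The two evaluation metrics named in the roadmap can be illustrated with a minimal, self-contained sketch. This is not Curio's API: the function names `cosineSimilarity` and `topicDiversity`, their signatures, and the sample keywords are all hypothetical. It assumes topic diversity means the fraction of unique words among the top-k keywords of all topics, which is one common definition.

```swift
import Foundation

/// Cosine similarity between two equal-length vectors.
/// Returns 0 when either vector has zero norm (a convention chosen for this sketch).
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have the same length")
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator == 0 ? 0 : dot / denominator
}

/// Topic diversity: unique words divided by total words,
/// counted over the top-k keywords of every topic.
func topicDiversity(_ topics: [[String]], topK: Int) -> Double {
    let topWords = topics.flatMap { $0.prefix(topK) }
    guard !topWords.isEmpty else { return 0 }
    return Double(Set(topWords).count) / Double(topWords.count)
}

// Hypothetical keyword lists for two topics; "play" appears in both.
let topics = [
    ["game", "steam", "play"],
    ["health", "sleep", "play"],
]

print(cosineSimilarity([1, 0], [0, 1]))
print(topicDiversity(topics, topK: 3))
```

A diversity close to 1 suggests the topics share few keywords, while a value close to 0 suggests heavily overlapping topics.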

Installation

To install Curio with Swift Package Manager, add the following entry to the dependencies in your Package.swift:

.package(url: "https://git.uwaterloo.ca/jrWallac/curio.git", from: "0.0.8")
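
For context, a complete manifest using that dependency line might look like the sketch below. The package and target name `MyTopicApp` is hypothetical, and the product name `Curio` is an assumption based on the `Sources/Curio` directory; check the package's own manifest for the actual product name and for any `platforms` requirement.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyTopicApp", // hypothetical name for your own package
    dependencies: [
        // Dependency line from the installation instructions above.
        .package(url: "https://git.uwaterloo.ca/jrWallac/curio.git", from: "0.0.8"),
    ],
    targets: [
        .executableTarget(
            name: "MyTopicApp",
            dependencies: [
                // Assumed product name; the package identity "curio"
                // is derived from the last component of the URL.
                .product(name: "Curio", package: "curio"),
            ]
        ),
    ]
)
```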

Contributing

This project is developed by a team of researchers from the Human-Computer Interaction and Health Lab at the University of Waterloo. The project is led by Prof. Jim Wallace, with contributions from:

Jason Zhao
Nicole Mathis
Peter Li
Adrian Davila
Henry Tian
Jean Nordmann
Mingchung Xia
Abhinav Jain
George Wang
Ali Raza Zaidi

If you would like to contribute to the project, contact Prof. Wallace with "Curio" in the subject line, and mention one or more of the roadmap items above that you would like to work on.

License

All original code is released under the MIT license for commercial and non-commercial use.

