Releases: jager47X/VibeMap
Experimental Version Released
Emotion Classification with Hybrid Learning Pipeline
This release introduces a semi-supervised, hybrid sentiment-analysis framework for classifying emotions in text at scale and at low cost. The system combines supervised, unsupervised, and prototype-based learning strategies to improve classification performance.
Current Workflow
[Raw Tweets from CSV]
↓
[Embedding Generation]
↓
[Human Annotation of Emotions]
↓
[Emotion Prototypes → Prototype Matching]
↓
[Supervised Learning (weighted)]
↓
[Unsupervised Residual K-Means]
↓
[Enriched Tweets in MongoDB]
↓
[3D Visualization]
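The prototype-matching step in the workflow above can be sketched as follows. This is a minimal, hypothetical illustration (not the project's actual code): it assumes each emotion prototype is the mean of its synonym embeddings and that a tweet is assigned the emotion whose prototype has the highest cosine similarity to the tweet's embedding. The toy 2-D vectors stand in for real sentence embeddings.

```python
# Hypothetical sketch of the prototype-matching stage; vector values are toy
# stand-ins for real high-dimensional sentence embeddings.
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def build_prototypes(synonym_embeddings):
    """{emotion: [synonym embeddings]} -> {emotion: mean prototype vector}."""
    return {emotion: mean_vector(vecs) for emotion, vecs in synonym_embeddings.items()}

def match(tweet_embedding, prototypes):
    """Return (best_emotion, similarity) for the most similar prototype."""
    return max(((e, cosine(tweet_embedding, p)) for e, p in prototypes.items()),
               key=lambda pair: pair[1])

# Toy example: "joy" synonyms cluster near (1, 0), "anger" near (0, 1).
protos = build_prototypes({
    "joy":   [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2]],
    "anger": [[0.1, 0.9], [0.0, 1.0], [0.2, 0.8]],
})
label, score = match([0.95, 0.05], protos)
```

In the full pipeline, matches above a confidence threshold could serve as pseudo-labels for the supervised stage, while low-confidence tweets fall through to the unsupervised residual clustering.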
Highlights
- Expanded Semantic Encoding
  Embeddings are generated from 100 synonyms per emotion across 10 distinct emotional categories, rather than from single base words such as "happy". This approach improved both clustering consistency and classification accuracy.
- Hybrid Learning Architecture
  The pipeline integrates:
  - Supervised learning on 100–200 manually annotated samples.
  - Prototype matching for early-stage labeling based on semantic similarity.
  - Unsupervised learning via K-Means to capture latent emotional groupings.
  This hybrid pipeline proved more effective than traditional emotion-only embedding techniques.
- Semi-Supervised Scalability
  The system is designed to scale to large datasets: each run incorporates new labels or pseudo-labels, so model performance improves incrementally. Current classification accuracy is approximately 60–70%.
- Benchmarking and Fine-Tuning
  In parallel, performance is being compared with transformer-based models such as BERTweet. Fine-tuning experiments are ongoing to close the gap in cases where contextual understanding is crucial.
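The unsupervised residual K-Means stage named above can be sketched as below. This is a hypothetical, self-contained illustration rather than the project's implementation: it assumes that tweets left unlabeled (or labeled with low confidence) by prototype matching are clustered to surface latent emotional groupings. A real pipeline would more likely use scikit-learn's `KMeans` on high-dimensional embeddings; a plain-Python version keeps the mechanics visible.

```python
# Minimal K-Means sketch for the residual clustering stage.
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain K-Means over lists of floats; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from the data points
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Recompute each centroid as the mean of its assigned points.
        for c in range(k):
            members = [p for i, p in enumerate(points) if assignments[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# Toy "residual" embeddings: two latent groups the prototypes did not cover.
residuals = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.95], [0.85, 1.0]]
centroids, labels = kmeans(residuals, k=2)
```

The resulting cluster assignments, together with the prototype matches and supervised predictions, are what get written to MongoDB as enriched tweets for the 3D visualization step.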
Findings and Limitations (as of April 16, 2025)
- The system currently has difficulty recognizing sarcasm and passive-aggressive tones, which typically require deeper linguistic and contextual understanding.
- Future directions include exploring more context-aware models, potentially without relying on full transformer-based architectures, to balance performance with scalability and cost.
This version is a foundation for upcoming research and optimization efforts. Contributions and feedback are welcome.