Bandung, West Java • rrayhka@gmail.com
Data Scientist with experience in ML, NLP, and information retrieval. Skilled at building end-to-end pipelines that improve accuracy and efficiency. Strong background in Python, deep learning, and applied data science.
Data Scientist – PT Kazee Digital Indonesia | Feb–Jun 2025
- Boosted sentiment/topic classifiers (NusaBERT, LLaMA, Qwen) → 92% accuracy.
- Built clustering pipeline (BERTopic + KMeans/HDBSCAN) → 92.7% mapping accuracy.
- Optimized document retrieval (BM25, TF-IDF, VSM) → 30% faster.
- Automated transcription/surveys (Whisper + FastAPI) → 50% time saved.
- OCR logo analyzer → 95% precision / 90% recall.
- Prototyped RAG chatbot & face recognition (99.2%).
Dataset Curator – Cagliostro Research Lab | Oct 2024–Jul 2025
- Filtered 1,000+ images for Animagine XL dataset, improving training quality.
ML Mentor – GDG on Campus UTM | Aug 2024–Present
- Trained 12 students on supervised learning & CNNs with live coding projects.
- Gambling Site Classification – SURF + Random Forest → 91% accuracy, >85% precision/recall.
- Supreme Court Decision IE – Extracted key fields with 95% accuracy.
- CourtQuery Search Engine – BERT + BM25 semantic search for court rulings.
- Dirty Vote Sentiment Analysis – BiLSTM + Attention on 60k texts, Flask web app.
- Indonesian spaCy NER – Fine-tuned NER for low-resource Indonesian entities.
- Distributed Movie Recommender – Content-based filtering; Dask for scalability, Streamlit for interactive UI.
(More on GitHub)
B.Sc. Computer Science – University of Trunojoyo Madura | GPA: 3.68 / 4.0
- Merit Award, Inter Varsity Innovation Challenge 2024 (EcoCraft AI – GANs for waste-to-product).
- Led workshops on Image Classification & GANs.
ML/NLP: TensorFlow, PyTorch, scikit-learn, spaCy, BERTopic, OCR, Whisper
Programming: Python, PHP, JS | Data: Pandas, NumPy, Matplotlib, MySQL
Soft Skills: Communication, Teamwork, Time Management
- CCNP: Core Networking – Cisco, 2025
- ML with Apache Spark 3.0 – Udemy, 2025
- AI Fundamentals – Dicoding, 2025
Indonesian (Native), Javanese (Native), English (Working proficiency)