Selected Work
Projects with momentum.
Each card is engineered for stable interaction: subtle 3D tilt, highlight-follow, and readable tags.
Large-Scale Occupational Coding (15M titles → 1,636 categories)
Built a pipeline that maps 15M job titles to 1,636 occupational categories using pandas, jieba, rapidfuzz, and Sentence Transformer embeddings; improved precision and recall by 18% and enabled downstream clustering and active learning.
Python · pandas · jieba · rapidfuzz · Sentence Transformers · Active Learning
LLM-based Cannabis Use Disorder Detection (Early-stage research)
Labeled thousands of Reddit posts with DSM-based guidelines; wrote Python scripts for cleaning, inter-rater reliability checks, and baseline modeling to speed up fine-tuning & evaluation.
LLM · NLP · Python · Data Labeling · IRR · Baselines
VIOLA — Text-to-Music Retrieval for CRM SaaS (MVP)
Architected a text-to-music retrieval pipeline using CLAP embeddings, ChromaDB, and emotion-aware re-ranking; improved search efficiency by 80% and raised the top-10 relevant hit rate by 30%.
PyTorch · CLAP · ChromaDB · Retrieval · Re-ranking · Precision@k