Selected Work

Projects with momentum.

Each card is engineered for stable interaction: subtle 3D tilt, highlight-follow, and readable tags.

Large-Scale Occupational Coding (15M titles → 1,636 categories)

Built an occupational coding pipeline (pandas, jieba, rapidfuzz, and Sentence Transformer embeddings) that maps 15M job titles to 1,636 categories, improving precision and recall by 18% and enabling downstream clustering and active learning.

Python, pandas, jieba, rapidfuzz, Sentence Transformers, Active Learning
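The fuzzy-matching stage of a pipeline like this can be sketched in a few lines. This is only an illustration, not the project's code: it uses the standard library's `difflib` as a stand-in for rapidfuzz, and the category list, the `code_title` helper, and the 0.8 cutoff are all hypothetical.

```python
import difflib

# Hypothetical category labels, for illustration only.
CATEGORIES = ["software engineer", "data analyst", "registered nurse"]


def code_title(raw_title, categories=CATEGORIES, cutoff=0.8):
    """Return the closest category for a raw job title, or None if nothing is close."""
    # Normalize case and collapse repeated whitespace before matching.
    normalized = " ".join(raw_title.lower().split())
    # difflib stands in for rapidfuzz here; cutoff is a similarity ratio in [0, 1].
    matches = difflib.get_close_matches(normalized, categories, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(code_title("  Sofware   Engineer "))
print(code_title("zzz"))
```

Titles that fall below the cutoff return `None`, which is one plausible way to route hard cases to an embedding-based matcher or to an active-learning queue.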

LLM-based Cannabis Use Disorder Detection (Early-stage research)

Labeled thousands of Reddit posts with DSM-based guidelines; wrote Python scripts for cleaning, inter-rater reliability checks, and baseline modeling to speed up fine-tuning & evaluation.

LLM, NLP, Python, Data Labeling, IRR, Baselines
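An inter-rater reliability check of this kind is often reported as Cohen's kappa for two annotators. The sketch below assumes that metric and a simple two-rater setup; the project's actual scripts and statistic are not described here, and the function name and sample labels are illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in all_labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels for illustration only.
rater_1 = ["pos", "pos", "neg", "neg"]
rater_2 = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(rater_1, rater_2))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label dominates the data.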

VIOLA — Text-to-Music Retrieval for CRM SaaS (MVP)

Architected a text-to-music retrieval pipeline using CLAP embeddings + ChromaDB + emotion-aware reranking; improved search efficiency by 80% and boosted top-10 relevant hit rate by 30%.

PyTorch, CLAP, ChromaDB, Retrieval, Re-ranking, Precision@k
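A top-k hit rate like the one reported above is commonly computed as the share of queries with at least one relevant item among the first k results. The sketch below assumes that definition; the function name, data layout, and sample IDs are hypothetical rather than taken from the project.

```python
def hit_rate_at_k(ranked_results, relevant, k=10):
    """Fraction of queries with at least one relevant item in the top k results.

    ranked_results: dict mapping query -> list of item IDs, best first.
    relevant: dict mapping query -> set of relevant item IDs.
    """
    if not ranked_results:
        raise ValueError("ranked_results must contain at least one query.")
    hits = 0
    for query, ranked in ranked_results.items():
        top_k = ranked[:k]
        # A query counts as a hit if any of its top-k items is relevant.
        if any(item in relevant.get(query, set()) for item in top_k):
            hits += 1
    return hits / len(ranked_results)


# Made-up rankings for illustration only.
ranked = {"calm piano": ["t1", "t2", "t3"], "upbeat pop": ["t4", "t5", "t6"]}
gold = {"calm piano": {"t3"}, "upbeat pop": {"t9"}}
print(hit_rate_at_k(ranked, gold, k=3))
```

Running the same function on rankings taken before and after reranking gives a like-for-like comparison at a fixed k.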

Blue Sky LA Climate Projects Platform

An interactive civic platform that makes Blue Sky LA climate projects visible, searchable, and engaging for the public.

Next.js, Cloudflare, Mapbox, Search UX

Voices Beyond Assault Anonymous Forum

A secure, anonymous online forum supporting 100+ survivors to share experiences and seek help.

Security UX, Web App, Community, Reliability

Intelligent Data Analysis Platform (BI + AIGC)

A full-stack BI platform: upload datasets, generate visualizations and insights, and accelerate analysis with AIGC narratives.

React, Spring Boot, Ant Design, Async, AIGC

Automated Chemistry Lab Workflows

PDF extraction + reporting automation, MSDS expiration alerts, and procurement forecasting to reduce manual effort by 40%.

Python, pandas, PyPDF2, VBA, Task Scheduler

Methanol Synthesis Ratio Optimization

Multi-objective optimization + feature importance to identify operational ratios and suppress side reactions, improving catalyst lifetime.

Optimization, ML Modeling, Feature Importance

Kaggle AMP® Parkinson’s Progression Prediction

Time-series + feature engineering + CV/HPO for MDS-UPDRS progression forecasting; ranked Top 30% among 1,805 teams.

Time-series, Feature Engineering, Ensembles, CV/HPO