Howdy! I’m a Ph.D. candidate in Statistics at Boston University, co-advised by Prof. Debarghya Mukherjee and Prof. Luis Carvalho, and I also collaborate with Prof. Nabarun Deb. Before BU, I earned my M.A. in Statistics from Columbia University and my B.S. in Mathematics from Shandong University, including a year of joint training at the Academy of Mathematics and Systems Science (AMSS), Chinese Academy of Sciences. My research sits at the intersection of statistics and machine learning, where I develop theoretically grounded transfer-learning and representation-learning methods, spanning optimal transport, graph mining, and multimodal learning, for structured, heterogeneous data in low-sample, high-dimensional, and non-IID settings.

The question that keeps me up (in a good way):

How can we reuse past knowledge when the world—and the data—won’t sit still?

In statistical learning, this is about transferring geometry or smoothness from a well-understood source distribution to a smaller, noisier target under shift. In reinforcement learning, the source might be prior trajectories, simulators, or related tasks, while the target is the evolving environment, so we need principled rules for what to keep, what to adapt, and what to forget. And yes! LLMs/VLMs make this even more exciting (and tricky): they already contain a lot of cross-domain knowledge, but the real challenge is extracting and specializing it safely for downstream tasks without overfitting, hallucination, or misalignment.

What I build

THEORY Theory that supports practice
Minimax rates · oracle inequalities · regret bounds · safe-transfer criteria under covariate or structural shift.
GRAPHS Graph-structured transfer
Aligning and transporting information across graphs and manifolds — robust transfer when correspondence is messy or unknown.
RL & BANDITS RL & bandits under drift
Warm-started policies with uncertainty-aware adaptation for reliable sequential decision-making in changing environments.
LLMs & VLMs Transfer for LLMs / VLMs
Controlled adaptation · domain grounding · structure-preserving fine-tuning — so models adapt without getting sloppy.

Curious about my research? I’ve put together beginner-friendly slide decks on my main research directions: transfer learning, graph learning, optimal transport, and LLMs for time series.

Along my academic journey, I have been deeply fortunate to study and conduct research under the guidance of inspiring scholars, including Prof. Zhanxing Zhu, whose influential work includes Spatio-Temporal Graph Convolutional Networks (STGCN) for traffic forecasting, and Prof. Yongshun Gong. Their perspectives on deep learning, representation learning, and structured spatio-temporal systems have profoundly shaped how I think about evolving, heterogeneous data, and have guided my pursuit of principled transfer learning methods.

Beyond theory and modeling, I am drawn to building AI applications that reflect how I see people and the world. I have always felt that human beings are more than their outward forms, that something of the spirit, memory, and inner life exceeds the body that temporarily carries it. That is why I am especially fascinated by cinema, atmosphere, and emotionally resonant digital experiences ✨

🔥 News

  • 2025.09: 🎉 My first-author paper “Transfer Learning on Edge Connecting Probability Estimation Under Graphon Model” was accepted at NeurIPS 2025!
  • 2025.08: 🎉 My co-authored paper “Cross-Domain Hyperspectral Image Classification via Mamba-CNN and Knowledge Distillation” was accepted by IEEE TGRS!

📝 Publications

Leading Author

GTrans NeurIPS 2025 Transfer Learning on Edge Connecting Probability Estimation Under Graphon Model  Paper Poster Slides Code
  • First graphon-level transfer without node correspondence — aligns graphs via Gromov–Wasserstein and transfers edge structure nonparametrically.
  • Residual smoothing unlocks small/sparse targets with convergence & stability guarantees; SOTA on link prediction and graph classification.
Phase Transition Under Review Phase Transition in Nonparametric Minimax Rates for Covariate Shifts on Approximate Manifolds  arXiv Poster Slides Code
  • New minimax theory for "near-manifold" shift: exposes a sharp phase transition controlled by the support gap between target and source neighborhoods — unifying multiple geometric-transfer regimes.
  • Ratio-free, adaptive estimator: achieves near-optimal, dimension-adaptive rates without density ratios and without assuming known geometry (works under approximate manifold mismatch).
TESS Under Review From Text to Forecasts: Bridging Modality Gap with Temporal Evolution Semantic Space  arXiv Slides
  • Bridges the text–time-series modality gap: introduces a Temporal Evolution Semantic Space (TESS) that distills free-form text into interpretable temporal primitives (mean shift, volatility, shape, lag), instead of directly fusing noisy token embeddings.
  • LLM-guided yet numerically grounded forecasting: uses structured prompting + confidence-aware gating to inject reliable semantic signals as prefix tokens into a Transformer forecaster, yielding robust gains under event-driven non-stationarity (up to 29% error reduction).
SCOT Under Review SCOT: Multi-Source Cross-City Transfer with Optimal-Transport Soft-Correspondence Objectives  arXiv Slides
  • Sinkhorn entropic-OT coupling enables many-to-many region alignment across cities — no node matching required.
  • OT-weighted contrastive loss + target-aware prototype hub prevents collapse and scales cleanly to multi-source heterogeneity.
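The entropic-OT coupling at the heart of SCOT can be illustrated with a minimal Sinkhorn sketch. This is a generic textbook implementation, not the paper's code; the "regions", cost matrix, marginals, and hyperparameters below are toy values chosen for illustration:

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.5, n_iters=200):
    """Entropic-regularized OT coupling via Sinkhorn scaling.

    C: (n, m) cost matrix; a, b: source/target marginals.
    Returns a soft, many-to-many coupling P (rows sum to a, columns to b).
    """
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                 # scale to match column marginals
        u = a / (K @ v)                   # scale to match row marginals
    return u[:, None] * K * v[None, :]

# Toy example: 3 "source regions" vs. 4 "target regions" in feature space.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(3, 2)), rng.normal(size=(4, 2))
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared-Euclidean cost
C /= C.max()                              # normalize so exp(-C/eps) stays stable
a, b = np.full(3, 1 / 3), np.full(4, 1 / 4)
P = sinkhorn(C, a, b)
```

Unlike a hard node matching, each row of `P` spreads its mass over several columns, which is exactly the kind of soft correspondence an OT-weighted loss can consume.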
INCM Under Review INCM: INConsistency-aware Multi-modal Recommendation with Cross-Modal Hard Negatives
  • Inconsistency-aware multimodal ranking: studies how cross-modal discrepancies may provide complementary ranking evidence or degrade fusion quality — explicitly modeled in training.
  • Cross-modal hard negatives + synergy-aware ranking loss: proposes CHNS to mine modality-specific hard negatives across branches, and a Synergy-aware BPR loss to ensure the fused branch achieves stronger preference margins than unimodal branches.

Co-author

MKDNet IEEE TGRS 2025 Cross-Domain Hyperspectral Image Classification via Mamba-CNN and Knowledge Distillation  IEEE Slides
  • Hybrid spectral–spatial modeling for domain shift: integrates a Mamba-based global spectral encoder with CNN local feature extraction, capturing long-range dependencies while preserving fine-grained spatial structure.
  • Dual-level transfer via distillation + graph alignment: performs teacher–student knowledge distillation for distribution alignment and OT-guided graph consistency across domains, yielding robust cross-domain generalization under severe spectral mismatch.
SSGP Under Review Semantic Scientific Graph Pruning for Reliable Agentic Paper Reproduction  arXiv
  • SSGP prunes dense scientific graphs into task-adaptive subgraphs via rank-based ensemble scoring — drastically shrinks agent search space.
  • Reuse–patch execution + confidence-weighted aggregation boosts reproducibility, stability, and success rate of LLM scientific agents.

🤖 LLM Engineering Projects

ALIGN AlignDPO Code PDF
DPO · IPO · KTO · QLoRA · Mistral-7B · HH-RLHF
RAG RAGAudit: Hallucination Detection Code PDF
BM25+FAISS · NLI · SelfCheckGPT · sem. entropy · Mistral-7B
CAUSAL Congestion Pricing Analyzer Code PDF
TWFE · CS-DiD · Synth DiD · Double ML · 12M+ NYC TLC
AGENT CausalLens: LLM-Augmented Causal Pipeline Code
DoWhy · Double ML · Causal Forest · Claude API · Streamlit
RAG GraphRAG: Multimodal RAG Code
dense + entity graph + CLIP · FastAPI · ChromaDB
RAG Adaptive RAG Code
query routing · iterative retrieval · self-check · FastAPI
CORE DraftVerify: Speculative Decoding Code
draft + verifier · latency · throughput · acceptance
CORE HQQ: 1-bit Quantization Code
1–8 bit · proximal opt · W1G64: 12.7× · >4× speedup
CAUSAL Causal Promotion Optimization Code PDF
AIPW · LightGBM · DRLearner CATE · OR-Tools · FastAPI
ML Demand Forecasting Code PDF
Seasonal Naive · LightGBM · TFT · M5 · 28-day · store-SKU
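As background for the DraftVerify entry above, the standard speculative-decoding acceptance rule can be sketched in a few lines: the verifier accepts a drafted token with probability min(1, p/q) and otherwise resamples from the renormalized residual max(p − q, 0), which provably preserves the verifier's output distribution. The 4-token vocabulary and the distributions below are toy values, not the project's models:

```python
import numpy as np

def speculative_accept(p, q, token, rng):
    """Accept or reject one drafted token.

    p, q: verifier and draft next-token distributions (vectors).
    Accept with prob min(1, p[token]/q[token]); on rejection, resample
    from the residual max(p - q, 0), renormalized. The marginal output
    distribution is exactly p.
    """
    if rng.random() < min(1.0, p[token] / q[token]):
        return token, True
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p), p=residual), False

rng = np.random.default_rng(0)
p = np.array([0.5, 0.2, 0.2, 0.1])       # verifier distribution
q = np.array([0.25, 0.25, 0.25, 0.25])   # draft distribution
counts = np.zeros(4)
for _ in range(20000):
    draft_tok = rng.choice(4, p=q)       # draft proposes a token
    tok, _ = speculative_accept(p, q, draft_tok, rng)
    counts[tok] += 1
print(counts / counts.sum())             # empirically close to p
```

In this toy setup the expected acceptance rate is Σ min(p, q) = 0.75, i.e. about three in four drafted tokens are kept without ever querying anything but the verifier's distribution.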

📖 Education

  • 2021.09 – Now: Ph.D. in Statistics, Boston University

  • 2019.09 – 2020.05: M.A. in Statistics (Data Science Track), Columbia University

  • 2018.05 – 2019.06: Joint training in Mathematics, Academy of Mathematics and Systems Science (AMSS), Chinese Academy of Sciences (Jointly Supervised Talent Program)

  • 2015.09 – 2019.06: B.S. in Mathematics, Shandong University

💻 Internships

Plymouth Rock

Data Scientist Intern · Plymouth Rock Insurance
📍 Boston, MA  ·  🗓️ May 2025 – Aug 2025

  • Architected an end-to-end AWS SageMaker pipeline for property-level loss prediction using an XGBoost Tweedie model on multi-million-policy data, lifting Gini by +4.3% over the production baseline and directly improving underwriting risk segmentation.

  • Pioneered an LLM-powered visual risk scoring system combining GPT-4o multimodal reasoning with Google Street View imagery to capture previously unobservable property features (roof condition, surroundings, hazards); integrated outputs into downstream actuarial pricing models as a novel signal layer.

  • 📎 For a high-level, non-confidential summary of this work, see the Home Insurance slides.

✨ My Apps

A quiet collection of cinematic, atmospheric, and emotionally resonant side projects — part digital keepsakes, part memory-keepers.  See all →

Wilderness
🌲 Wilderness
MBTI Vibe
MBTI Vibe
What If Cinema
🎬 What If Cinema
Letters from the Screen
✉️ Letters from Screen
If You Disappeared
✈️ If You Disappeared
Souvenirs
🎟️ Souvenirs
The Map of Me
🗺️ Map of Me
A Room in Macondo
🦋 A Room in Macondo
Say It Like a Classic
✒️ Say It Like a Classic
The Boston Archive
🏛️ Boston Archive

🎖 Honors

  • 2025: Student Travel Grant, Boston University
  • 2025: Ralph B. D’Agostino Endowed Fellowship, Boston University
  • 2025: Outstanding Teaching Fellow Award, Boston University

  • 2019: Outstanding Graduate, Shandong University

  • 2018: Hua Loo-Keng Scholarship, Chinese Academy of Sciences
  • 2018: National Gold Award, Internet+ Innovation & Entrepreneurship Competition
  • 2018: First-Class Scholarship, Shandong University
  • 2018: Outstanding Student Leader, Shandong University

📂 DS Projects

CV Dog Classification Code Demo
VGG16 · ResNet50 · Flask · 75.48%
ML Credit Risk Code
XGBoost · SMOTE · AUC 0.976
CV Pedestrian Detection Code
Fast R-CNN · Siamese · few-shot
NLP Financial Sentiment Code Demo
DistilBERT · 85% · 30%↑ speed
CV Mask Detection Code
ResNet50 · Grad-CAM · 94%
NLP Spam Detection Code
TF-IDF · NB · P 96 / R 94
APP Airbnb Dashboard Code Demo
R Shiny · maps · filtering
STATS Bayesian Logistic Code Demo
RStan · Spike-and-Slab · MCMC
STATS A/B Testing Code
Bootstrap · power · +15% conv.
TS Time Series Forecast Code Demo
SARIMA · ETS · Prophet
ML Movie Recommendation Code
ALS · SVD · +15% / −20%
ML Customer Segmentation Code
K-Means · elbow · silhouette

📝 Service & Teaching

Presentations  ·  CIKM 2024, NeurIPS 2025
Reviewer  ·  CIKM 2025, ICME 2026, ICML 2026, KDD 2026
Instructor @ Boston University  ·  MA 582 Mathematical Statistics, MA 113 Elementary Statistics
TA @ Boston University  ·  MA 575 Generalized Linear Models, MA 582, MA 415 Data Science in R, MA 214 Applied Stats

🎨 Interests

🎵 Mandarin R&B loyalist: Leehom Wang, David Tao, Khalil Fong 🦋, Dean Ting

🎹 Trained in piano, calligraphy, and ink painting

🏞️ National park lover · 🫧 lake admirer · 🌅 opacarophile — welcome to my Gallery