I’m Yuyao Wang, a Ph.D. candidate in Statistics at Boston University, co-advised by Debarghya Mukherjee and Luis Carvalho.
My research spans transfer learning, graph mining, nonparametric statistics, and reinforcement learning, where I develop theoretically grounded methods for structured, heterogeneous data in low-sample, high-dimensional, and non-IID settings. Theoretical statistics has a reputation for difficulty, yet within that struggle lies the quiet beauty of uncovering order in chaos. My work values the elegance of fundamental theory while pursuing its reach into real-world applications, bridging statistics with computer science and beyond. My research asks a simple, practical question:
How can we reuse what we’ve already learned when the world keeps changing?
That’s what transfer learning is all about.
In classical statistics, it means borrowing structure or smoothness from a well-studied “source” dataset to help a smaller, noisier “target.”
In reinforcement learning, the source is prior experience—offline logs, simulators, or related tasks/policies; the target is the live environment/task. The magic is figuring out what to keep and what to forget, so models stay robust even when everything shifts.
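As a toy illustration of that keep-vs-forget tradeoff (a sketch written for this page, not a method from any paper below), a shrinkage rule can blend a source-derived estimate with target data and discount the source as the observed shift grows; the specific weighting formula here is an assumption for illustration:

```python
import statistics

def transfer_mean(source_value, target_samples, shift_penalty=1.0):
    """Blend a source-derived estimate with a target sample mean.

    The source gets less weight when (a) the target sample is large or
    (b) the estimated source-target gap is big -- a crude
    "what to keep vs. what to forget" rule, for illustration only.
    """
    n = len(target_samples)
    target_mean = statistics.fmean(target_samples)
    gap = abs(target_mean - source_value)          # observed shift
    w_source = 1.0 / (1.0 + shift_penalty * n * gap ** 2)
    return w_source * source_value + (1.0 - w_source) * target_mean

# Small shift: the source estimate is largely kept.
print(transfer_mean(0.50, [0.52, 0.49, 0.51]))
# Large shift: the source is mostly forgotten in favor of target data.
print(transfer_mean(0.50, [2.0, 2.1, 1.9]))
```

With no shift the source is kept in full; as the gap or the target sample size grows, the estimate slides toward the target mean.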
What I build:
- Theory-first tools: minimax/oracle-type guarantees, regret bounds, and safe transfer criteria under distribution shift.
- Graph-aware transfer: aligning graphs so that signal can be shared across domains.
- RL & bandits under drift: warm-start policies, uncertainty-aware adaptation, and “what-to-keep vs. what-to-forget” rules.
🔥 News
- 2025.09: 🎉 My first-author paper “Transfer Learning on Edge Connecting Probability Estimation Under Graphon Model” has been accepted at NeurIPS 2025!
- 2025.08: 🎉 My co-authored paper “Cross-Domain Hyperspectral Image Classification via Mamba-CNN and Knowledge Distillation” has been accepted by IEEE TGRS!
📝 Publications

Transfer Learning on Edge Connecting Probability Estimation Under Graphon Model NeurIPS 2025 · Code
Yuyao Wang, Yu-Hung Cheng, Debarghya Mukherjee, Huimin Cheng
Boston University
- We propose GTRANS, the first graphon transfer learning method that requires no node correspondence, combining Gromov-Wasserstein optimal transport with residual smoothing.
- Theoretical guarantee: we prove stability and convergence of the transport-based alignment under nonparametric assumptions.
- Applications: GTRANS improves link prediction and graph classification, especially when the target graph is small or sparse.
- Achieves state-of-the-art (SOTA) performance on both synthetic and real-world datasets.
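For intuition only, here is a minimal pure-Python sketch of correspondence-free transfer: degree-rank matching stands in for the Gromov-Wasserstein alignment, and a fixed blend stands in for residual smoothing. This is not the GTRANS algorithm, and it assumes equally sized graphs:

```python
def degree_sorted_alignment(adj_src, adj_tgt):
    """Match nodes by degree rank -- a crude stand-in for the
    Gromov-Wasserstein alignment used by correspondence-free methods."""
    def order(adj):
        return sorted(range(len(adj)), key=lambda i: sum(adj[i]))
    # Pair the i-th lowest-degree target node with the i-th lowest-degree
    # source node: (target node, source node) pairs.
    return list(zip(order(adj_tgt), order(adj_src)))

def transfer_edge_probs(p_src, adj_tgt, adj_src, w_src=0.7):
    """Estimate target edge probabilities by blending the aligned
    source estimate p_src with the observed target adjacency."""
    match = dict(degree_sorted_alignment(adj_src, adj_tgt))
    n = len(adj_tgt)
    return [[w_src * p_src[match[i]][match[j]] + (1 - w_src) * adj_tgt[i][j]
             for j in range(n)] for i in range(n)]
```

The real method replaces both stand-ins: optimal transport gives a soft node coupling rather than a hard degree-rank match, and residual smoothing adapts the blend instead of using a fixed weight.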

Phase Transition in Nonparametric Minimax Rates for Covariate Shifts on Approximate Manifolds · Code · Poster · Presentation
Yuyao Wang, Nabarun Deb, Debarghya Mukherjee
Boston University; The University of Chicago Booth School of Business
- We establish new minimax rates for estimating Hölder-smooth regression functions under covariate shift when the target distribution lies near, but not on, a source manifold.
- Introduces a novel phase transition phenomenon: the minimax rate depends sharply on the proximity between the target and source support, unifying prior results under a generalized Hölder framework.
- Addresses settings where density ratios are ill-defined, rendering classical density-ratio-based transfer techniques inapplicable.
- Our estimator adapts to unknown manifold dimension and achieves near-optimal rates without prior geometric knowledge.

Cross-Domain Hyperspectral Image Classification via Mamba-CNN and Knowledge Distillation · Presentation
Aoyan Du, Guixin Zhao, Yuyao Wang, Aimei Dong, Guohua Lv, Yongbiao Gao, Xiangjun Dong
Shandong Computer Science Center; Boston University
- We propose MKDnet, a cross-domain HSI classification framework combining Mamba-CNN hybrid architecture with knowledge distillation and graph alignment via optimal transport.
- Effectively captures both global context and local detail via Mamba + CNN dual-stream encoder.
- Aligns source–target distributions through domain-level knowledge distillation and graph OT-based subgraph matching.
- Achieves SOTA performance across multiple public hyperspectral benchmarks under domain shift.
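The distillation ingredient can be illustrated with the standard temperature-scaled loss (a generic sketch of knowledge distillation, not MKDnet's exact objective):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: higher T flattens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 -- the standard distillation loss."""
    p = softmax(teacher_logits, T)   # teacher "soft targets"
    q = softmax(student_logits, T)   # student predictions
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The temperature exposes the teacher's relative confidences across classes ("dark knowledge") rather than only its top prediction.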

Multi-scale based Cross-modal Semantic Alignment Network for Radiology Report Generation
Zhihao Zhang, Long Zhao, Yuyao Wang, Dun Lan, Linfeng Jiang, Xiangjun Dong
Shandong Computer Science Center; Boston University
- We propose MCSANet, a radiology report generation framework that enhances cross-modal semantic alignment between medical images and diagnostic text.
- Introduces a Multi-scale Visual Feature Extraction (MVE) module with multi-head local sparse attention (MLSA) to capture image semantics and abnormalities across different spatial scales.
- Incorporates a Cross-modal Semantic Alignment (CSA) module with a learnable matrix, gating mechanism, and multi-label contrastive loss for precise image–text fusion.
- Combined with a Transformer-based report generator, MCSANet achieves SOTA performance on IU-Xray and MIMIC-CXR, surpassing prior models such as CAMANet and XPRONET.
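The cross-modal alignment ingredient can be sketched generically with an InfoNCE-style contrastive loss over paired image/text embeddings (illustrative only, not MCSANet's exact loss):

```python
import math

def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def contrastive_alignment_loss(img_embs, txt_embs, tau=0.1):
    """InfoNCE-style loss: each image should score its own report
    higher than every other report in the batch."""
    n = len(img_embs)
    loss = 0.0
    for i in range(n):
        sims = [cosine(img_embs[i], t) / tau for t in txt_embs]
        m = max(sims)  # log-sum-exp trick for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += log_z - sims[i]   # -log softmax at the matched index
    return loss / n
```

Correctly paired batches yield a lower loss than shuffled ones, which is what pushes matched image and text embeddings together during training.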
🎖 Honors and Awards
- 2025.05: Ralph B. D’Agostino Endowed Fellowship, Boston University
- 2025.04: Outstanding Teaching Fellow Award, Boston University
- 2019.06: Outstanding Graduate, Shandong University
- 2018.10: Hua Loo-Keng Scholarship, Chinese Academy of Sciences
- 2018.09: First-Class Scholarship, Shandong University
- 2018.09: Outstanding Student Leader, Shandong University
📖 Education
- 2021.09 – 2026.05 (expected): Ph.D. in Statistics, Boston University
- 2019.09 – 2020.05: M.A. in Statistics (Data Science Track), Columbia University
- 2015.09 – 2019.06: B.S. in Mathematics, Shandong University
💻 Internships
- 2025.05 – 2025.08: Data Scientist Intern, Plymouth Rock Insurance (Boston, MA)
  - Built an AWS SageMaker pipeline for property-level loss prediction; boosted Gini by +4.3% using an XGBoost Tweedie model.
  - Developed LLM-powered image risk scoring with GPT-4o and Google Street View imagery; integrated outputs into actuarial models.
🎨 Interests
🎵 Mandarin R&B loyalist — Leehom Wang, David Tao, Khalil Fong, Dean Ting
🎹 Trained in piano, calligraphy, and ink painting