My research sits at the intersection of statistical learning, causal inference, and computational biology, with a particular focus on building principled models that connect high-dimensional molecular data to interpretable biological and clinical questions.
Rather than developing methods in isolation, I organize my work around recurring scientific challenges that arise across genomics, immunology, and clinical research: heterogeneity, confounding, limited experimental control, and the need to generalize across populations, conditions, and data modalities.
Below are the main themes that define my current research program.
1. Causal Inference in Complex Biological Systems
A central goal of my work is to move beyond associative modeling toward causal explanations of biological mechanisms. In modern biomedical data, causal questions are complicated by latent confounding, selection bias, and distributional mismatch between studies.
I develop causal inference and causal discovery methods that are designed to operate under these realistic constraints, with particular emphasis on:
- combining experimental and observational data,
- learning structured causal relationships in high dimensions,
- and quantifying uncertainty in causal conclusions.
These methods are motivated by applications in functional genomics, immunology, and clinical decision-making, where interventions are costly and causal assumptions must be made explicit and testable.
2. Data Fusion Across Experimental and Observational Studies
Randomized controlled trials provide strong internal validity but are often limited in scope, size, and representativeness. Observational studies are abundant but often confounded. A recurring theme in my work is how to integrate these complementary data sources in a principled way.
My research develops frameworks for:
- borrowing information across studies with different covariates and populations,
- improving the efficiency of causal effect estimation,
- and maintaining robustness when standard identifiability assumptions fail.
This line of work connects ideas from transfer learning, domain adaptation, and semiparametric inference, and is motivated by problems in clinical trials, regulatory science, and real-world evidence.
3. Multiscale Modeling of Immune and Regulatory Systems
Biological systems operate across multiple scales, from molecular interactions to cellular populations and tissues. My work aims to develop models that explicitly connect these scales, rather than treating them independently.
Recent efforts focus on:
- regulatory molecules and transcriptional control,
- immune cell interactions and signaling,
- and ecological perspectives on host–pathogen co-evolution.
A unifying idea in this work is that persistent structure in biological systems can be revealed by studying constrained, reusable components—whether regulatory programs, immune niches, or conserved interaction patterns.
4. Keystone Structures and Ecological Perspectives in Immunology
An emerging direction in my research explores ecological and systems-level perspectives on immunity, including the concept of keystone components that disproportionately shape immune responses.
This work investigates:
- how long-lived pathogens and conserved epitopes shape immune memory,
- how immune responses generalize or fail across contexts,
- and how these structures can explain both protection and adverse immune reactions.
The goal is to develop computational frameworks that link molecular specificity to system-level behavior, with implications for vaccine design, immune-mediated adverse events, and personalized risk assessment.
5. High-Dimensional Learning with Structure and Constraints
Many biomedical inference problems are high-dimensional, undersampled, and structured. A recurring methodological theme in my work is leveraging that structure to make learning both feasible and interpretable.
This includes:
- structured regularization and multitask learning,
- learning under sparsity, low-rank, or hierarchical constraints,
- and designing estimators that remain stable across environments.
These ideas appear throughout my work, from transcriptomic modeling to causal discovery and data fusion.
6. Statistical Software and Reproducible Tools
Across all research themes, I place strong emphasis on reproducibility and accessibility. Many of my projects include open-source software that implements new methods in a transparent and reusable way.
This includes:
- statistical packages for high-dimensional inference,
- tools for causal analysis and data integration,
- and pipelines designed to be extensible to new datasets and domains.
Working with Me
I collaborate closely with researchers in biostatistics, biomedical informatics, immunology, and clinical sciences, and regularly mentor students, postdocs, and interns from diverse quantitative backgrounds.
If you are a student or trainee interested in:
- causal inference,
- computational biology,
- statistical learning for biomedical data,
- or systems-level modeling of immune and regulatory processes,
feel free to reach out. I am especially interested in collaborations that combine theory, data, and real scientific questions.