Selected Projects

My work applies machine learning and statistical modeling to study biological and clinical heterogeneity and treatment response, with applications in pharmacogenomics, molecular stratification, and large-scale biomedical data.


Genotype Embeddings for Pharmacogenomics (Current)

I am currently developing representation learning approaches that model genotype data directly. The goal is to use a variational autoencoder (VAE) to learn genotype embeddings that support downstream tasks such as patient stratification and drug response analysis.
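A minimal sketch of what a VAE objective for genotype data can look like, written in NumPy for illustration only. It assumes allele dosages coded 0/1/2, a single tanh hidden layer, and a Binomial(2, p) reconstruction likelihood per SNP. The architecture, the `vae_forward` and `init_params` names, and the random weights are hypothetical, and the training loop (which would need automatic differentiation, e.g. PyTorch or JAX) is omitted. The latent means `mu` play the role of the genotype embeddings.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def vae_forward(G, params, rng):
    """One forward pass of a minimal (hypothetical) genotype VAE; no training step.

    G: (n_samples, n_snps) allele dosages in {0, 1, 2}.
    Returns the latent means (used as embeddings) and the per-sample negative ELBO.
    """
    h = np.tanh((G / 2.0) @ params["enc"])                # encoder hidden layer on scaled dosages
    mu, logvar = h @ params["mu"], h @ params["logvar"]   # Gaussian posterior q(z | g)
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # reparameterization trick
    p = np.clip(sigmoid(z @ params["dec"]), 1e-6, 1 - 1e-6)        # per-SNP allele probability
    # Binomial(2, p) log-likelihood of each dosage, dropping the constant log C(2, g)
    log_lik = (G * np.log(p) + (2.0 - G) * np.log(1.0 - p)).sum(axis=1)
    # Closed-form KL divergence between q(z | g) and the standard normal prior
    kl = -0.5 * (1.0 + logvar - mu**2 - np.exp(logvar)).sum(axis=1)
    return mu, kl - log_lik

def init_params(n_snps, hidden, latent, rng, scale=0.1):
    """Random (untrained) weights for the hypothetical architecture above."""
    shapes = {"enc": (n_snps, hidden), "mu": (hidden, latent),
              "logvar": (hidden, latent), "dec": (latent, n_snps)}
    return {k: rng.normal(scale=scale, size=s) for k, s in shapes.items()}

# Usage on synthetic dosages: 8 samples, 50 SNPs, 4-dimensional embeddings
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(8, 50)).astype(float)
params = init_params(50, 16, 4, rng)
emb, neg_elbo = vae_forward(G, params, rng)
```

With random weights the embeddings carry no information until the negative ELBO is minimized. A Binomial likelihood is one natural choice for dosages; a categorical likelihood over the three genotype classes would be an equally reasonable assumption.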

This work aims to move beyond traditional single-variant or polygenic risk score (PRS)-based approaches toward learned molecular representations.
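To make the contrast concrete (notation introduced here for illustration): a PRS compresses patient i's genotype vector into a single weighted sum over p variants, with weights β̂_j typically taken from prior GWAS summary statistics, whereas an encoder μ_φ with learned parameters φ maps the whole vector g_i to a d-dimensional embedding.

$$
\mathrm{PRS}_i = \sum_{j=1}^{p} \hat{\beta}_j \, g_{ij}, \qquad g_{ij} \in \{0, 1, 2\}
$$

$$
\mathbf{z}_i = \mu_\phi(\mathbf{g}_i) \in \mathbb{R}^{d}
$$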


Endophenotype-driven GWAS Framework

To address heterogeneity in asthma, I developed machine learning-based endophenotypes that capture clinically meaningful variation in lung function and treatment response. These endophenotypes were incorporated into subtype-specific genetic association analyses.
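One of the representative publications describes a PCA-based endophenotype definition. As a hedged illustration of that idea, not the published pipeline, the sketch below z-scores a matrix of clinical measurements (for example lung-function and treatment-response variables), takes the leading principal-component scores as continuous endophenotypes, and splits patients into subgroups by quantiles of one score. The quantile rule, the number of components, and the function names are assumptions.

```python
import numpy as np

def pca_endophenotypes(X, n_components=2):
    """PC scores of standardized clinical measurements. X: (n_patients, n_variables)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)      # z-score each clinical variable
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)      # PCA via SVD of the standardized matrix
    scores = Z @ Vt[:n_components].T                      # per-patient scores = candidate endophenotypes
    explained = S[:n_components] ** 2 / np.sum(S ** 2)    # fraction of variance per component
    return scores, Vt[:n_components], explained

def stratify(score, n_groups=3):
    """Assign patients to subgroups by quantiles of one endophenotype score (hypothetical rule)."""
    cuts = np.quantile(score, np.linspace(0, 1, n_groups + 1)[1:-1])
    return np.digitize(score, cuts)                       # labels 0 .. n_groups - 1

# Usage on synthetic data: 100 patients, 5 correlated clinical variables
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] += X[:, 0]
scores, loadings, explained = pca_endophenotypes(X)
groups = stratify(scores[:, 0])
```

Principal-component signs are arbitrary, so the subgroup labelled 0 in one run may be labelled 2 in another; in practice the loadings would be inspected to give each component a clinical interpretation.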

This framework improves interpretability and statistical power in pharmacogenomic studies by aligning genetic signals with biologically coherent patient subgroups.
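The publications also mention ANOVA for endophenotype-specific GWAS. One plausible reading, sketched here with hypothetical names: within each patient subgroup, test every SNP with a one-way ANOVA of a continuous outcome (for example an endophenotype score) across the three genotype classes. Only the F statistic is computed; a p-value would come from the F distribution with (k - 1, n - k) degrees of freedom, e.g. via `scipy.stats.f.sf`.

```python
import numpy as np

def anova_f(y, groups):
    """One-way ANOVA F statistic for a continuous outcome y across discrete groups."""
    labels = np.unique(groups)
    k, n = len(labels), len(y)
    if k < 2 or n <= k:
        return np.nan                                     # test undefined for a monomorphic SNP
    grand = y.mean()
    ss_between = sum(np.sum(groups == g) * (y[groups == g].mean() - grand) ** 2 for g in labels)
    ss_within = sum(np.sum((y[groups == g] - y[groups == g].mean()) ** 2) for g in labels)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def subtype_scan(G, y, subtype):
    """Per-SNP ANOVA of y across genotype classes (0/1/2), run separately in each subtype.

    G: (n_patients, n_snps) dosages; returns {subtype: array of F statistics, one per SNP}.
    """
    out = {}
    for s in np.unique(subtype):
        mask = subtype == s
        out[s] = np.array([anova_f(y[mask], G[mask, j]) for j in range(G.shape[1])])
    return out
```

Running the scan within subgroups rather than in the pooled sample is what makes the association subtype-specific: a variant whose effect is confined to one subgroup is not diluted by patients in which it has no effect.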

Representative publications: PCA-based endophenotype definition, ANOVA for endophenotype-specific GWAS, Asthma pharmacogenetics through subtype-specific associations

GitHub


miRNA-based Biomarkers of Drug Response

I analyzed the role of microRNAs as modulators of inhaled corticosteroid response in asthma, focusing on miR-584-5p across two independent cohorts (CAMP and GACRS).

The analysis involved normalization and filtering of miRNA read counts, cohort-specific outcome definitions, and interaction modeling to assess how treatment modifies regulatory effects on asthma exacerbations.
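The preprocessing and interaction steps can be sketched as follows, under assumptions that go beyond the text: counts are normalized to log2 counts per million (CPM) with illustrative filtering thresholds, the exacerbation outcome is treated as binary, and effect modification is modelled as a miRNA × ICS interaction term in a logistic regression fitted by Newton-Raphson. The actual analysis may have used a different normalization (e.g. TMM) or a count model for exacerbations; all function names are hypothetical.

```python
import numpy as np

def log_cpm_filter(counts, min_cpm=1.0, min_frac=0.5):
    """Normalize raw miRNA read counts to log2 CPM and drop lowly expressed miRNAs.

    counts: (n_mirnas, n_samples). A miRNA is kept if CPM >= min_cpm in at least
    min_frac of samples (both thresholds are illustrative assumptions).
    """
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6   # scale by each sample's library size
    keep = (cpm >= min_cpm).mean(axis=1) >= min_frac
    return np.log2(cpm[keep] + 1.0), keep

def logistic_newton(X, y, n_iter=25):
    """Maximum-likelihood logistic regression; X includes an intercept column.
    Returns coefficients and Wald standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])           # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))    # Newton step
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])               # information at the final estimate
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))

def ics_interaction(mirna, ics, exacerbation):
    """Fit exacerbation ~ mirna + ics + mirna:ics (log-odds scale).

    The last coefficient is the interaction: how ICS treatment modifies the
    association between miRNA level and exacerbation risk.
    """
    X = np.column_stack([np.ones_like(mirna), mirna, ics, mirna * ics])
    return logistic_newton(X, exacerbation)
```

A nonzero interaction coefficient means the miRNA-exacerbation association differs between treated and untreated patients, which is the sense in which treatment "modifies" the regulatory effect.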

This work provides mechanistic insight into regulatory pathways underlying corticosteroid resistance.

Representative publications: Micro-RNA-584-5p as a key modulator of ICS resistance (Full paper in progress)


Proteomics Analysis of Pediatric Long COVID

I conducted a proteomic analysis of Olink NPX data to study children with Long COVID presenting with neurological symptoms. The project involved extensive data harmonization and careful handling of cases and controls that originated from different clinical centers.

Although cohort differences limited definitive conclusions, this work emphasized rigorous QC, normalization, and sensitivity analysis in high-dimensional proteomic studies.
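The QC steps are not spelled out above; the sketch below shows two checks that are common with Olink NPX data, under stated assumptions. The first filters proteins by the fraction of values below each assay's limit of detection (LOD), with a hypothetical 50% threshold. The second asks whether case/control status can be separated from center in a linear model: if every case comes from one center and every control from another, status is collinear with the center indicators and no adjustment can distinguish the two, which is one way cohort differences can limit conclusions. Function names are hypothetical.

```python
import numpy as np

def lod_filter(npx, lod, max_below=0.5):
    """Keep proteins whose NPX values fall below the LOD in at most max_below of samples.

    npx: (n_samples, n_proteins); lod: (n_proteins,). Returns a boolean keep-mask.
    """
    frac_below = (npx < lod).mean(axis=0)
    return frac_below <= max_below

def status_estimable(status, center):
    """True if case/control status is not collinear with center indicators, i.e. a
    model NPX ~ status + center can separate the two effects."""
    centers = np.unique(center)
    dummies = (center[:, None] == centers[None, 1:]).astype(float)  # first center is the reference
    X_reduced = np.column_stack([np.ones(len(status)), dummies])
    X_full = np.column_stack([X_reduced, status])
    return np.linalg.matrix_rank(X_full) > np.linalg.matrix_rank(X_reduced)
```

When the check fails, sensitivity analyses (for example restricting to centers that contributed both cases and controls, if any exist) are the main remaining option.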


Machine Learning Analysis of Pediatric COVID-19 Clinical Data

I applied random forest models to clinical data from pediatric COVID-19 cohorts to perform case–control prediction and examine factors associated with infection status in children. The analysis focused on how demographic and clinical variables contributed to predictive performance.
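A minimal scikit-learn sketch of this kind of analysis, assuming a numeric feature matrix with one row per child and a binary case/control label. The number of trees, the 5-fold stratified cross-validation, and the use of impurity-based importances are illustrative choices, not details from the published study; `rf_case_control` is a hypothetical name.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def rf_case_control(X, y, feature_names, seed=0):
    """Cross-validated AUC and ranked feature importances for a random forest classifier.

    X: (n_children, n_features) demographic/clinical variables; y: 1 = case, 0 = control.
    """
    clf = RandomForestClassifier(n_estimators=300, random_state=seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")   # one AUC per held-out fold
    clf.fit(X, y)                                                # refit on all data to inspect importances
    ranked = sorted(zip(feature_names, clf.feature_importances_), key=lambda t: -t[1])
    return auc, ranked
```

Impurity-based importances can favor continuous or high-cardinality variables; `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.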

Representative publications: Case control prediction using clinical data. GitHub


Context-aware Testing of Mobile Applications

During my PhD, I developed methods to model event-driven data using probabilistic and neural approaches. This included sequence modeling with conditional random fields, neural network–based component discovery, and combinatorial methods for large search spaces.
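As background on the sequence-modeling piece: in a linear-chain conditional random field, the highest-scoring label sequence for a sequence of events is found exactly with the Viterbi algorithm. The sketch below assumes the per-position (unary) and label-to-label (transition) scores have already been computed, for example from learned feature weights; it illustrates standard CRF decoding and is not the PhD implementation.

```python
import numpy as np

def viterbi(unary, transition):
    """Highest-scoring label sequence in a linear-chain CRF.

    unary: (T, K) score of each label at each position; transition: (K, K) score of label i -> j.
    Returns (best_path, best_score).
    """
    T, K = unary.shape
    score = unary[0].copy()                    # best score of any path ending in each label at t = 0
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transition + unary[t][None, :]  # cand[i, j]: previous label i -> label j
        backptr[t] = cand.argmax(axis=0)       # best predecessor for each current label
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):              # follow back-pointers from the last position
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(score.max())
```

Dynamic programming keeps decoding at O(T·K²) instead of enumerating all K^T label sequences.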

This work shaped how I approach modeling problems involving dependent observations, structured outputs, and constrained search spaces. These ideas continue to influence my applied machine learning work in biomedical domains.

Representative publications and tools: