Statistician · Population Health Methodologist · RWE Platform Architect
Twenty years designing population-based studies, integrating multi-source health data, and building the platform infrastructure that makes evidence reproducible and auditable. From PHIA biomarker surveys and the WHO/UNICEF immunization estimates for 195 countries, to the svy ecosystem and svyLab — a multi-tenant analytics platform built to model RWE-grade reproducibility and tenant safety.
import svy

design = svy.Design(stratum="sdmvstra", psu="sdmvpsu", wgt="wtmec2yr")
sample = svy.Sample(data=nhanes, design=design)

# mean SBP among adults with diabetes, by age group
sbp = sample.estimation.mean(
    y="sys_bp",
    by="age_group",
    domain="diabetes == 1",
)
About
I'm Mamadou S. Diallo — a Ph.D. statistician based in Hoboken, NJ. For two decades I've generated population-level health evidence from complex real-world data: observational study design, causal inference, multi-source data integration, and small area estimation, across health surveillance, immunization, HIV epidemiology, and chronic disease.
Lead sampling statistician on the Population-based HIV Impact Assessment surveys (8 countries, with CDC/PEPFAR). Designed the NHANES 2012 compositing strategy at Westat, directed the WHO/UNICEF immunization estimates (WUENIC) for 195 countries at UNICEF, and developed sampling methodology for Statistics Canada's national health surveillance system (CCHS) earlier in my career. Author of Samplics (265K+ downloads, JOSS) and architect of the open-source svy ecosystem.
I also build the infrastructure evidence runs on. Solo architect of svyLab — a multi-tenant analytics platform with architectural sandbox isolation, lifecycle and classification governance, AI-assisted analysis with full provenance, and a 327-test invariant suite. Built deliberately to model the reproducibility, auditability, and tenant-safety properties that regulated RWE work requires.
Currently extending the svy ecosystem into pharmacoepidemiology (ISPE 2026 poster, Milan) and open to senior roles where population-based methods, causal inference, and platform engineering intersect.
Ph.D. Statistics, Carleton University
(advisor: J.N.K. Rao)
M.Sc. Statistics, Université Laval
ISPE (2026) · ASA (since 2010) · AAPOR
Guest Editor, JSSAM 2025
English & French — fluent in both speaking and writing
Hoboken, NJ — open to remote and on-site engagements
Currently focused on pharmacoepidemiology methods and RWE infrastructure.
Services
I help pharma RWE teams, platform companies, research institutes, and international agencies generate decision-grade evidence from complex real-world data — and build the infrastructure that makes that evidence reproducible and auditable.
Observational study design, causal inference for population-based research, and survey-weighted methods (inverse probability of treatment weighting, doubly robust estimation) for pharmacoepidemiology and population health. The svy-causal module is under active development, and a poster on it has been accepted for ISPE 2026 in Milan.
Harmonizing administrative, survey, biomarker, and surveillance data into decision-ready evidence — the discipline I built at WHO/UNICEF for 195 countries. Fit-for-purpose data assessment, discrepancy resolution, and reporting under quality frameworks (GATHER, AAPOR, STROBE).
Reliable subnational estimates when direct samples are too small — Fay–Herriot, unit-level, spatial Bayesian models. Refined through years of applied work with national statistics offices and through the Healthy Illinois Analytics platform (NORC, all 102 counties).
Production-grade analytics platforms with multi-tenant isolation, AI-assisted analysis with provenance, lifecycle governance, and reproducibility primitives — the kind of infrastructure RWE and HEOR teams need but rarely build well. Architect of svyLab and the open-source svy ecosystem.
End-to-end design of complex multi-stage household surveys — frame development, sample size, stratification, cluster selection, weighting, calibration, variance estimation. Compliant with international standards. PHIA, NHANES, MICS, DHS, LSMS, and custom programs.
Long-term technical backstopping for national statistics offices and research teams — sampling design, weighting, small area estimation, reproducible analysis. Delivered in English or French. Recent: NBS Tanzania (World Bank, 2026), INSBU Burundi (2025), Ethiopia, Senegal, Vietnam.
Selected work
A selection of the platforms I've built, the methods I've developed, and the population-based studies that ground both.
Multi-tenant analytics platform for health-data evidence generation
Sole architect and developer. Architectural sandbox isolation (cross-org reads return 404 by design, not just by access control), lifecycle and classification governance with audit logs, AI-assisted analysis where every output persists with full provenance — model, prompt, code, tokens, cost, timestamp. 327 passing tests including structural CI invariants, cross-org sandbox enforcement, existence-leak contracts, and quota correctness. Built deliberately to model the reproducibility and tenant-safety properties that regulated RWE work requires.
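To make the isolation and provenance properties above concrete, here is a minimal sketch in plain Python. It is not svyLab's code: the class names, field names, and in-memory store are all hypothetical, chosen only to illustrate the idea that a lookup scoped by the caller's organization cannot distinguish another tenant's record from a record that does not exist.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


class NotFound(Exception):
    """Raised for missing and foreign-org resources alike (would map to HTTP 404)."""


@dataclass(frozen=True)
class AnalysisOutput:
    """Hypothetical provenance record for one AI-assisted output."""
    output_id: str
    org_id: str
    model: str
    prompt: str
    code: str
    tokens: int
    cost_usd: float
    created_at: datetime


class OutputStore:
    """Illustrative in-memory store keyed by (org_id, output_id)."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], AnalysisOutput] = {}

    def save(self, row: AnalysisOutput) -> None:
        self._rows[(row.org_id, row.output_id)] = row

    def get(self, caller_org: str, output_id: str) -> AnalysisOutput:
        # The caller's org is part of the key, so another tenant's row
        # fails the lookup exactly like a row that was never created.
        try:
            return self._rows[(caller_org, output_id)]
        except KeyError:
            raise NotFound(output_id) from None


store = OutputStore()
store.save(AnalysisOutput(
    output_id="out-1", org_id="org-a", model="example-model",
    prompt="mean SBP by age group", code="sample.estimation.mean(...)",
    tokens=512, cost_usd=0.01, created_at=datetime.now(timezone.utc),
))
```

In this sketch, `store.get("org-b", "out-1")` and `store.get("org-a", "missing")` raise the same `NotFound`, which is the kind of contract an existence-leak test would assert.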
svy-causal: survey-weighted causal inference for pharmacoepidemiology, covering inverse probability of treatment weighting (IPTW), stabilized weights, and doubly robust estimation, validated against NHANES. Accepted as a poster at the ISPE 42nd Annual Meeting, Milan (August 2026): 'Design-Based Causal Inference for Pharmacoepidemiology.'
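As a rough illustration of how stabilized IPTW can be combined with survey weights, the sketch below multiplies each unit's survey weight by a stabilized inverse-probability weight and takes a difference in weighted means. This is one common textbook approach, assumed here for illustration; it does not use or describe the svy-causal API. The function names and toy data are hypothetical, the propensity scores are taken as given, and design-based variance estimation is omitted.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * v) / sum(w)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def stabilized_iptw(treated, propensity, survey_wt):
    """Survey weight times stabilized inverse-probability-of-treatment weight."""
    # Marginal treatment prevalence, estimated with the survey weights.
    p_treat = weighted_mean(treated, survey_wt)
    combined = []
    for a, e, w in zip(treated, propensity, survey_wt):
        # Stabilized weight: P(A = a) / P(A = a | X).
        sw = p_treat / e if a == 1 else (1 - p_treat) / (1 - e)
        combined.append(w * sw)
    return combined


def ate_estimate(outcome, treated, propensity, survey_wt):
    """Difference in weighted outcome means, treated minus untreated."""
    w = stabilized_iptw(treated, propensity, survey_wt)
    y1 = [y for y, a in zip(outcome, treated) if a == 1]
    w1 = [wi for wi, a in zip(w, treated) if a == 1]
    y0 = [y for y, a in zip(outcome, treated) if a == 0]
    w0 = [wi for wi, a in zip(w, treated) if a == 0]
    return weighted_mean(y1, w1) - weighted_mean(y0, w0)


# Toy data (hypothetical): four units, two treated, two untreated.
outcome = [3.0, 5.0, 1.0, 2.0]
treated = [1, 1, 0, 0]
propensity = [0.8, 0.4, 0.5, 0.5]
survey_wt = [2.0, 1.0, 1.0, 2.0]

ate = ate_estimate(outcome, treated, propensity, survey_wt)
```

In practice the propensity model would itself be fitted with the survey design in mind, and standard errors would need to reflect strata and PSUs; neither is shown here.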
Samplics: production-grade Python library for sample selection, weighting, estimation, and small area estimation. Published in the Journal of Open Source Software (JOSS, 2021). The reference implementation for survey statistics in Python.
Lead sampling statistician at Westat for the Population-based HIV Impact Assessment surveys across Cameroon, Côte d'Ivoire, Malawi, Namibia, Tanzania, Uganda, Zambia, and Zimbabwe, in collaboration with ICAP at Columbia and CDC/PEPFAR. Multi-stage probability samples for HIV prevalence, incidence, viral-load suppression, and ART coverage at national and subnational levels.
At UNICEF, led the annual production of WHO/UNICEF immunization coverage estimates for 14 vaccines across all 195 countries. Integrated administrative reporting (DHIS2), household-survey microdata (DHS, MICS), and programme surveillance under GATHER reporting standards — with fit-for-purpose assessments across data modalities.
At NORC, led development of the Healthy Illinois Analytics Platform — a Python-based platform producing county- and community-level health estimates across all 102 Illinois counties using small area estimation. Integrated American Community Survey administrative data with survey microdata under disclosure control.
At Westat, developed model-based small area estimation for state-level crime rates using 15 years of National Crime Victimization Survey data. Published the R package sae2 on CRAN — used for federal subnational statistics.
Publications
Peer-reviewed work in observational and survey methodology, small area estimation, population health, and machine learning.
Get in touch
Open to senior RWE / pharmacoepidemiology / HEOR roles at pharma companies, RWE platform vendors, and CROs — full-time or contract. Particularly interested in roles that combine observational methodology with platform engineering.
Also selectively consulting on survey design, small area estimation, and statistical capacity for international agencies and research institutions. Bilingual English/French. I respond to all inquiries within two business days.