Statistician · Population Health Methodologist · RWE Platform Architect

I generate evidence from complex health data: surveys, observational studies, and real-world data.

Twenty years designing population-based studies, integrating multi-source health data, and building the platform infrastructure that makes evidence reproducible and auditable. From PHIA biomarker surveys and the WHO/UNICEF immunization estimates for 195 countries, to the svy ecosystem and svyLab — a multi-tenant analytics platform built to model RWE-grade reproducibility and tenant safety.

Currently extending svy into pharmacoepidemiology — ISPE 2026 (Milan)
Causal inference · Observational study design · Real-world data integration · Small area estimation · Multi-tenant platforms · Reproducible analytics · Survey Design and Analysis · Python · svy · R · Rust
svylab.com / lab / nhanes_2017
Question
"Mean systolic BP among adults with diabetes, by age group."
svy-agents — translation
import svy

design = svy.Design(stratum="sdmvstra", psu="sdmvpsu", wgt="wtmec2yr")
sample = svy.Sample(data=nhanes, design=design)

# mean SBP among adults with diabetes, by age group
sbp = sample.estimation.mean(
  y="sys_bp",
  by="age_group",
  domain="diabetes == 1",
)
Result · Taylor-linearization 95% CI
Age group    Mean systolic BP
18–39        122.4
40–59        131.7
60–79        138.2
80+          142.9
provenance: model · prompt · code · weights · CI audited
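
For readers unfamiliar with the variance method named in the result above, here is a minimal NumPy sketch of Taylor linearization for a weighted domain mean under a stratified, clustered design. It is an illustration only, not the svy implementation: the function name `domain_mean_taylor` and its arguments are hypothetical, PSUs are treated as sampled with replacement within strata, and strata with a single PSU are skipped.

```python
import numpy as np

def domain_mean_taylor(y, w, stratum, psu, in_domain):
    """Weighted domain mean and its Taylor-linearized standard error.

    Hypothetical helper for illustration; assumes with-replacement
    sampling of PSUs within strata.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    d = np.asarray(in_domain, dtype=float)
    stratum = np.asarray(stratum)
    psu = np.asarray(psu)

    wd = w * d                                # weight is zero outside the domain
    mean = np.sum(wd * y) / np.sum(wd)        # ratio (Hajek) estimator of the mean

    # Linearized variable for the ratio: its design-based total has
    # approximately the same variance as the estimated mean.
    z = wd * (y - mean) / np.sum(wd)

    var = 0.0
    for h in np.unique(stratum):
        in_h = stratum == h
        psus = np.unique(psu[in_h])
        n_h = len(psus)
        if n_h < 2:
            continue                          # assumption: skip single-PSU strata
        totals = np.array([z[in_h & (psu == j)].sum() for j in psus])
        var += n_h / (n_h - 1) * np.sum((totals - totals.mean()) ** 2)

    return mean, np.sqrt(var)
```

Units outside the domain stay in the computation with a zero contribution, so every PSU still counts toward the between-PSU variance. A 95% interval would then typically use a t quantile with degrees of freedom equal to the number of PSUs minus the number of strata.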
20 · Years generating health evidence from real-world data
195 · Countries, multi-source data integration (WUENIC)
265K+ · Samplics downloads (Python, JOSS)
327 · Tests passing on the svyLab platform
UNICEF · WHO · Westat · NORC · CDC / PEPFAR · Statistics Canada · Gates Foundation · Gavi

About

Statistician by training.
Builder by instinct.

I'm Mamadou S. Diallo — a Ph.D. statistician based in Hoboken, NJ. For two decades I've generated population-level health evidence from complex real-world data: observational study design, causal inference, multi-source data integration, and small area estimation, across health surveillance, immunization, HIV epidemiology, and chronic disease.

I was lead sampling statistician on the Population-based HIV Impact Assessment (PHIA) surveys (8 countries, with CDC/PEPFAR). I designed the NHANES 2012 compositing strategy at Westat, directed the WHO/UNICEF immunization estimates (WUENIC) for 195 countries at UNICEF, and, earlier in my career, developed sampling methodology for Statistics Canada's national health surveillance system (CCHS). I am the author of Samplics (265K+ downloads, JOSS) and the architect of the open-source svy ecosystem.

I also build the infrastructure evidence runs on. I am the sole architect of svyLab, a multi-tenant analytics platform with architectural sandbox isolation, lifecycle and classification governance, AI-assisted analysis with full provenance, and a 327-test invariant suite. I built it deliberately to model the reproducibility, auditability, and tenant-safety properties that regulated RWE work requires.

Currently extending the svy ecosystem into pharmacoepidemiology (ISPE 2026 poster, Milan) and open to senior roles where population-based methods, causal inference, and platform engineering intersect.

Education

Ph.D. Statistics, Carleton University
(advisor: J.N.K. Rao)
M.Sc. Statistics, Université Laval

Affiliations

ISPE (2026) · ASA (since 2010) · AAPOR
Guest Editor, JSSAM 2025

Languages

English & French — fluent in both speaking and writing

Based in

Hoboken, NJ — open to remote and on-site engagements

Currently focused on pharmacoepidemiology methods and RWE infrastructure.

Services

How I work

I help pharma RWE teams, platform companies, research institutes, and international agencies generate decision-grade evidence from complex real-world data — and build the infrastructure that makes that evidence reproducible and auditable.

Real-World Evidence & Observational Methods

Observational study design, causal inference for population-based research, and survey-weighted methods (IPTW, doubly robust estimation) for pharmacoepidemiology and population health. The svy-causal module is in active development, and a poster on it has been accepted for ISPE 2026 in Milan.

Multi-Source Health Data Integration

Harmonizing administrative, survey, biomarker, and surveillance data into decision-ready evidence — the discipline I built at WHO/UNICEF for 195 countries. Fit-for-purpose data assessment, discrepancy resolution, and reporting under quality frameworks (GATHER, AAPOR, STROBE).

Small Area Estimation

Reliable subnational estimates when direct samples are too small — Fay–Herriot, unit-level, spatial Bayesian models. Refined through years of applied work with national statistics offices and through the Healthy Illinois Analytics platform (NORC, all 102 counties).
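
As a rough illustration of why an area-level model helps when direct samples are small, the sketch below computes Fay–Herriot composite estimates with NumPy. It is a simplified teaching example, not code from any of the projects mentioned: the function name `fay_herriot_eblup` is hypothetical, and the model variance `sigma2_v` is treated as known, whereas in practice it is estimated (for example by REML).

```python
import numpy as np

def fay_herriot_eblup(direct, X, psi, sigma2_v):
    """Composite (shrinkage) estimates under a basic Fay-Herriot model.

    direct   : direct survey estimates, one per area
    X        : area-level covariate matrix (m areas x p covariates)
    psi      : sampling variances of the direct estimates
    sigma2_v : model variance, assumed known in this sketch
    """
    direct = np.asarray(direct, dtype=float)
    X = np.asarray(X, dtype=float)
    psi = np.asarray(psi, dtype=float)

    v = sigma2_v + psi                       # total variance per area
    XtVinv = X.T / v                         # X' V^{-1} for diagonal V
    beta = np.linalg.solve(XtVinv @ X, XtVinv @ direct)  # GLS coefficients

    gamma = sigma2_v / v                     # weight given to the direct estimate
    estimate = gamma * direct + (1.0 - gamma) * (X @ beta)
    return estimate, gamma
```

Areas with a large sampling variance get a small `gamma`, so their estimate leans on the regression prediction. Areas with precise direct estimates stay close to the survey value.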

Reproducible Analytics Infrastructure

Production-grade analytics platforms with multi-tenant isolation, AI-assisted analysis with provenance, lifecycle governance, and reproducibility primitives — the kind of infrastructure RWE and HEOR teams need but rarely build well. Architect of svyLab and the open-source svy ecosystem.

Population-Based Survey Design

End-to-end design of complex multi-stage household surveys — frame development, sample size, stratification, cluster selection, weighting, calibration, variance estimation. Compliant with international standards. PHIA, NHANES, MICS, DHS, LSMS, and custom programs.

Statistical Capacity Building

Long-term technical backstopping for national statistics offices and research teams — sampling design, weighting, small area estimation, reproducible analysis. Delivered in English or French. Recent: NBS Tanzania (World Bank, 2026), INSBU Burundi (2025), Ethiopia, Senegal, Vietnam.

Selected work

Platforms, methods, and the studies they shape

A selection of the platforms I've built, the methods I've developed, and the population-based studies that ground both.

Platform · 327 tests · Multi-tenant

svyLab

Multi-tenant analytics platform for health-data evidence generation

Sole architect and developer. Architectural sandbox isolation (cross-org reads return 404 by design, not just by access control), lifecycle and classification governance with audit logs, AI-assisted analysis where every output persists with full provenance — model, prompt, code, tokens, cost, timestamp. 327 passing tests including structural CI invariants, cross-org sandbox enforcement, existence-leak contracts, and quota correctness. Built deliberately to model the reproducibility and tenant-safety properties that regulated RWE work requires.
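
The "404 by design" idea can be sketched in a few lines of Python. This is a hypothetical illustration of the existence-leak contract, not svyLab's code: `Dataset`, `NotFound`, and `get_dataset` are invented names, and the store is a plain dictionary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dataset:
    dataset_id: str
    org_id: str

class NotFound(Exception):
    """Maps to an HTTP 404 response (hypothetical)."""
    status_code = 404

def get_dataset(store, requester_org, dataset_id):
    """Return a dataset only if it belongs to the requesting organization.

    A missing dataset and a dataset owned by another organization raise
    the same error, so a caller cannot learn that the other
    organization's dataset exists.
    """
    ds = store.get(dataset_id)
    if ds is None or ds.org_id != requester_org:
        raise NotFound("dataset not found")
    return ds
```

Returning a 403 for the cross-organization case would confirm that the identifier is valid. Collapsing both cases into one response is what makes the lookup safe against existence leaks.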

Sandbox isolation · AI-with-provenance · Lifecycle governance · Reproducibility primitives · Fail-closed auth
Python · In development

svy-causal

Survey-weighted causal inference for pharmacoepidemiology — IPTW, stabilized weights, doubly robust estimation, validated against NHANES. Accepted for poster at ISPE 42nd Annual Meeting, Milan (August 2026): 'Design-Based Causal Inference for Pharmacoepidemiology.'
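
To make the stabilized-weight idea concrete, here is a minimal NumPy sketch that combines survey weights with stabilized inverse-probability-of-treatment weights. It is not the svy-causal API: the function name `stabilized_iptw_ate` is hypothetical, propensity scores are assumed to be already estimated and strictly between 0 and 1, and multiplying the survey weight by the IPT weight is one common convention rather than the only option.

```python
import numpy as np

def stabilized_iptw_ate(y, a, ps, w):
    """Survey-weighted ATE using stabilized IPT weights (illustrative sketch).

    y  : outcome
    a  : binary treatment indicator (0/1)
    ps : estimated propensity scores P(A=1 | X), assumed in (0, 1)
    w  : survey weights
    """
    y, a, ps, w = (np.asarray(v, dtype=float) for v in (y, a, ps, w))

    p_treat = np.sum(w * a) / np.sum(w)      # survey-weighted marginal P(A=1)

    # Stabilized weight: marginal probability of the received treatment
    # divided by its conditional probability given covariates.
    sw = np.where(a == 1, p_treat / ps, (1.0 - p_treat) / (1.0 - ps))

    cw = w * sw                              # combined survey x IPT weight
    mu1 = np.sum(cw * a * y) / np.sum(cw * a)
    mu0 = np.sum(cw * (1.0 - a) * y) / np.sum(cw * (1.0 - a))
    return mu1 - mu0, sw
```

Stabilizing by the marginal treatment probability keeps the weights centered near one, which usually reduces the influence of extreme propensity scores compared with unstabilized weights.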

Open source · 265K+ downloads

Samplics

Production-grade Python library for sample selection, weighting, estimation, and small area estimation. Published in the Journal of Open Source Software (JOSS, 2021). The reference implementation for survey statistics in Python.

Population-based research

PHIA — HIV biomarker surveys, 8 countries

Lead sampling statistician at Westat for the Population-based HIV Impact Assessment surveys across Cameroon, Côte d'Ivoire, Malawi, Namibia, Tanzania, Uganda, Zambia, Zimbabwe — in collaboration with ICAP at Columbia and CDC/PEPFAR. Multi-stage probability samples for HIV prevalence, incidence, viral-load suppression, and ART coverage at national and subnational levels.

Multi-source RWD integration

WUENIC — global immunization estimates

At UNICEF, led the annual production of WHO/UNICEF immunization coverage estimates for 14 vaccines across all 195 countries. Integrated administrative reporting (DHIS2), household-survey microdata (DHS, MICS), and programme surveillance under GATHER reporting standards — with fit-for-purpose assessments across data modalities.

Subnational estimation

HILDX — county-level health estimates

At NORC, led development of the Healthy Illinois Analytics Platform — a Python-based platform producing county- and community-level health estimates across all 102 Illinois counties using small area estimation. Integrated American Community Survey administrative data with survey microdata under disclosure control.

U.S. federal · CRAN package

NCVS subnational crime estimates

At Westat, developed model-based small area estimation for state-level crime rates using 15 years of National Crime Victimization Survey data. Published the R package sae2 on CRAN — used for federal subnational statistics.

Publications

Selected publications

Peer-reviewed work in observational and survey methodology, small area estimation, population health, and machine learning.

Submitted / Accepted

  • 2026
    Design-Based Causal Inference for Pharmacoepidemiology: Survey-Weighted IPTW and Doubly Robust Estimation in Python With NHANES Validation
    ISPE 42nd Annual Meeting, Milan — Poster (accepted)

Methodology & Statistical Methods

Population Health & Real-World Evidence

Get in touch

Let's talk.

Open to senior RWE / pharmacoepidemiology / HEOR roles at pharma companies, RWE platform vendors, and CROs — full-time or contract. Particularly interested in roles that combine observational methodology with platform engineering.

Also selectively consulting on survey design, small area estimation, and statistical capacity for international agencies and research institutions. Bilingual English/French. I respond to all inquiries within two business days.
