Platform · svyLab
svyLab is a multi-tenant analytics platform I designed and built solo. It exists to model the reproducibility, auditability, and tenant-safety properties that regulated real-world evidence work requires — built on top of the open-source svy ecosystem.
This page is my narrative on what I built and why it matters for RWE infrastructure thinking. For product information, see svylab.com.
Core architectural principles
These are not features. They are commitments — when a specific design decision conflicts with one of these, the principle wins. Together they're what makes the platform's properties hold up to scrutiny rather than just look good on a marketing page.
Cross-org reads return 404 at every endpoint, by design: a user from org B requesting a dataset in org A gets the same response as if the dataset did not exist. Data flows between orgs only through explicit user action. Other orgs' data is architecturally invisible, not just access-controlled.
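As an illustration only, and not svyLab's actual code, the "same response as if it did not exist" behavior can be sketched as a lookup that is always scoped to the caller's org. The names `NotFound`, `Dataset`, `DATASETS`, and `get_dataset` are hypothetical, and the in-memory dict stands in for a real data layer.

```python
from dataclasses import dataclass


class NotFound(Exception):
    """Hypothetical error that a web layer would map to HTTP 404."""
    status_code = 404


@dataclass(frozen=True)
class Dataset:
    id: str
    org_id: str
    name: str


# Hypothetical in-memory store standing in for the real data layer.
DATASETS = {
    "ds-1": Dataset(id="ds-1", org_id="org-a", name="Household survey 2023"),
}


def get_dataset(dataset_id: str, caller_org_id: str) -> Dataset:
    """Return the dataset only if it belongs to the caller's org.

    A missing dataset and a dataset owned by another org raise the same
    error with the same message, so the response cannot be used to probe
    for the existence of another org's data.
    """
    dataset = DATASETS.get(dataset_id)
    if dataset is None or dataset.org_id != caller_org_id:
        raise NotFound("dataset not found")
    return dataset
```

The key design choice in this sketch is that there is no separate "forbidden" branch: the ownership check and the existence check collapse into one condition.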
A dataset's maturity (in_production → team_finalized → org_validated → archived) reflects how far it has moved through the QA pipeline. Its access classification (public / restricted) reflects who in the org can see its data. These axes are independent and governed by different authorities — the decoupling is deliberate, designed to prevent the silent-authorization-expansion bugs that leak sensitive data on promotion.
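A minimal sketch of the two independent axes, assuming they are stored as separate fields and changed by separate operations. The enum values come from the text above; `DatasetState`, `promote`, and `reclassify` are hypothetical names, and treating the move to `archived` as just another promotion step is an assumption made for brevity.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Maturity(Enum):
    IN_PRODUCTION = "in_production"
    TEAM_FINALIZED = "team_finalized"
    ORG_VALIDATED = "org_validated"
    ARCHIVED = "archived"


class Access(Enum):
    PUBLIC = "public"
    RESTRICTED = "restricted"


# Pipeline order, taken from the enum's definition order.
PIPELINE = list(Maturity)


@dataclass(frozen=True)
class DatasetState:
    maturity: Maturity
    access: Access


def promote(state: DatasetState) -> DatasetState:
    """Advance maturity by one step; access classification is untouched."""
    idx = PIPELINE.index(state.maturity)
    if idx == len(PIPELINE) - 1:
        raise ValueError("dataset is already at the last maturity stage")
    return replace(state, maturity=PIPELINE[idx + 1])


def reclassify(state: DatasetState, access: Access) -> DatasetState:
    """Change access classification; maturity is untouched."""
    return replace(state, access=access)
```

Because `promote` never writes the `access` field, a restricted dataset stays restricted however far it moves through QA, which is the property the decoupling is meant to protect.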
svy-agents translate natural-language questions into validated svy code. Every translation is shown before execution. Every output persists with full provenance — model, prompt, code, tokens, cost, timestamp. No silent AI-authored data modifications; every transformation is traceable to a user decision. The user finishes their work understanding what they did and why.
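One way to picture the provenance record is as an immutable structure that can only be created once a user has approved the generated code. This is a sketch under stated assumptions, not the platform's schema: the field list follows the text (model, prompt, code, tokens, cost, timestamp), while `approved_by`, the `cost_usd` unit, and the function name `record_run` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    model: str
    prompt: str
    code: str
    tokens: int
    cost_usd: float   # assumed unit; the text only says "cost"
    timestamp: str    # ISO 8601, UTC
    approved_by: str  # hypothetical field tying the run to a user decision


def record_run(model: str, prompt: str, code: str, tokens: int,
               cost_usd: float, approved_by: str) -> ProvenanceRecord:
    """Build a provenance record; refuse if no user approved the code."""
    if not approved_by:
        raise PermissionError("generated code must be approved before it runs")
    return ProvenanceRecord(
        model=model,
        prompt=prompt,
        code=code,
        tokens=tokens,
        cost_usd=cost_usd,
        timestamp=datetime.now(timezone.utc).isoformat(),
        approved_by=approved_by,
    )


def to_json(record: ProvenanceRecord) -> str:
    """Serialize the record for persistence alongside the output."""
    return json.dumps(asdict(record), sort_keys=True)
```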
Immutable dataset versions; versioned survey designs with full replicate-weight support; analysis lineage with rerun-against-current-version; soft delete with selective restore; Typst-based audit reporting. Every numerical result on the platform can be reconstructed from its inputs. Built-in, not bolted on.
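To make "reconstructed from its inputs" concrete, here is a hedged sketch of a lineage key: a hash over the identifiers of everything a result depends on. The function names and the choice of SHA-256 over canonical JSON are assumptions for illustration, not a description of how svyLab stores lineage.

```python
import hashlib
import json


def lineage_key(dataset_version: str, design_version: str, code: str) -> str:
    """Derive a stable key from the inputs that determine a result.

    Identical inputs always give the same key; changing any input
    (a new dataset version, a revised design, edited code) gives a new one.
    """
    payload = json.dumps(
        {
            "dataset_version": dataset_version,
            "design_version": design_version,
            "code": code,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_rerun(recorded_dataset_version: str, current_dataset_version: str) -> bool:
    """Flag an analysis whose recorded dataset version is no longer current."""
    return recorded_dataset_version != current_dataset_version
```

Since dataset versions are immutable, a stored key stays meaningful indefinitely, and rerunning against the current version simply produces a second record with a different key.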
Why this matters for RWE
svyLab was built around survey data, not claims or EHR data. But the architectural properties (sandbox isolation, reproducibility, governance, AI with provenance) are exactly what regulated RWE work requires, and hands-on experience building them is unusual among candidates for senior methodology roles. This page explains the mapping.
Stack
The stack runs from the Rust extensions in the svy core, through the Python backend and async SQLAlchemy data layer, to the Astro/Svelte frontend and the AI provenance layer. All of it was built by one person acting as designer, engineer, and accountable methodologist.
Open source
svyLab sits on top of an open-source Python ecosystem for survey design and analysis — also designed and maintained by me. The platform is the polish; the ecosystem is the foundation.
Comprehensive Python library for sample selection, weighting, estimation, and small area estimation. Published in JOSS (2021).
Next-generation survey design and inference. Taylor linearization; replicate variance (BRR, jackknife, bootstrap). Built with Rust/PyO3 and Polars.
JAX-powered Fay–Herriot and unit-level small area estimation for subnational disease burden and prevalence.
High-speed I/O for SAS, SPSS, and Stata files — preserving variable labels, value labels, and user-defined missing codes.
Survey-weighted causal inference for pharmacoepi — IPTW, stabilized weights, doubly robust estimation, validated against NHANES.
R package for model-based small area estimation, developed for the Bureau of Justice Statistics.
Get in touch
Open to senior roles where this thinking is useful — and happy to talk about the tradeoffs even if a role isn't open.