Platform · svyLab

A platform for evidence — reproducible, auditable, isolated.

svyLab is a multi-tenant analytics platform I designed and built solo on top of the open-source svy ecosystem. It exists to model the reproducibility, auditability, and tenant-safety properties that regulated real-world evidence (RWE) work requires.

This page is my narrative on what I built and why it matters for RWE infrastructure thinking. For product information, see svylab.com.

327 tests passing, including structural CI invariants and cross-org sandbox enforcement
3 backend hardening phases completed (auth, guards, lifecycle/library)
0 cross-org reads possible, by architecture
5 packages in the svy ecosystem (svy, svy-sae, svy-io, svy-causal, samplics)

Core architectural principles

Four design choices that shape everything.

These are not features. They are commitments — when a specific design decision conflicts with one of these, the principle wins. Together they're what makes the platform's properties hold up to scrutiny rather than just look good on a marketing page.

01

Every organization is a complete sandbox

Cross-org reads return 404 at every endpoint. This follows from the architecture rather than from an access-control check: a user from org B who requests a dataset in org A gets the same response as if the dataset did not exist. Data flows between orgs only through explicit user action. Other orgs' data is architecturally invisible, not just access-controlled.
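A minimal sketch of how a lookup can make "belongs to another org" and "does not exist" indistinguishable. This is not svyLab's code: `DatasetStore`, `get_for_org`, and `NotFound` are hypothetical names, and an in-memory dict stands in for the real data layer.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


class NotFound(Exception):
    """Hypothetical error that a web layer would map to HTTP 404."""

    status_code = 404


@dataclass(frozen=True)
class Dataset:
    id: str
    org_id: str
    name: str


class DatasetStore:
    """In-memory stand-in for a data layer keyed by (org_id, dataset_id)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Dataset] = {}

    def add(self, dataset: Dataset) -> None:
        self._rows[(dataset.org_id, dataset.id)] = dataset

    def get_for_org(self, caller_org_id: str, dataset_id: str) -> Dataset:
        # The caller's org is part of the lookup key, so a dataset owned by
        # another org and a dataset that never existed take the same path
        # and raise the same error with the same message.
        row = self._rows.get((caller_org_id, dataset_id))
        if row is None:
            raise NotFound("dataset not found")
        return row
```

Because there is no separate "forbidden" branch in this sketch, a caller from another org cannot tell from the response whether the dataset exists.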

02

Lifecycle and classification are orthogonal

A dataset's maturity (in_production → team_finalized → org_validated → archived) reflects how far it has moved through the QA pipeline. Its access classification (public / restricted) reflects who in the org can see its data. These axes are independent and governed by different authorities — the decoupling is deliberate, designed to prevent the silent-authorization-expansion bugs that leak sensitive data on promotion.
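One way to picture the decoupling is two independent fields with two independent transitions. The sketch below assumes the four lifecycle stages form a simple linear order; `DatasetState`, `promote`, and `reclassify` are illustrative names, not the platform's API.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Lifecycle(Enum):
    IN_PRODUCTION = "in_production"
    TEAM_FINALIZED = "team_finalized"
    ORG_VALIDATED = "org_validated"
    ARCHIVED = "archived"


class Classification(Enum):
    PUBLIC = "public"
    RESTRICTED = "restricted"


# Assumption: stages advance strictly in declaration order.
_ORDER = list(Lifecycle)


@dataclass(frozen=True)
class DatasetState:
    lifecycle: Lifecycle
    classification: Classification


def promote(state: DatasetState) -> DatasetState:
    """Advance one lifecycle stage; classification is carried over unchanged."""
    idx = _ORDER.index(state.lifecycle)
    if idx == len(_ORDER) - 1:
        raise ValueError("dataset is already at the final stage")
    return replace(state, lifecycle=_ORDER[idx + 1])


def reclassify(state: DatasetState, new: Classification) -> DatasetState:
    """Change classification only; lifecycle is carried over unchanged."""
    return replace(state, classification=new)
```

In this sketch `promote` has no code path that touches `classification`, so a restricted dataset stays restricted at every stage unless `reclassify` is called explicitly.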

03

AI makes methodology accessible without hiding it

svy-agents translate natural-language questions into validated svy code. Every translation is shown before execution. Every output persists with full provenance — model, prompt, code, tokens, cost, timestamp. No silent AI-authored data modifications; every transformation is traceable to a user decision. The user finishes their work understanding what they did and why.
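As a rough illustration of the provenance fields listed above, here is a sketch of an immutable record plus an approval gate. It is not the svy-agents API: `ProvenanceRecord`, `record_translation`, `execute_if_approved`, and the `cost_usd` unit are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    model: str
    prompt: str
    code: str
    tokens: int
    cost_usd: float  # assumed unit; the source only says "cost"
    timestamp: str


def record_translation(
    model: str,
    prompt: str,
    code: str,
    tokens: int,
    cost_usd: float,
    now: Optional[datetime] = None,
) -> ProvenanceRecord:
    """Capture everything needed to trace a translation back to its request."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return ProvenanceRecord(model, prompt, code, tokens, cost_usd, ts)


def to_json(record: ProvenanceRecord) -> str:
    """Serialize with sorted keys so stored records are stable to diff."""
    return json.dumps(asdict(record), sort_keys=True)


def execute_if_approved(
    record: ProvenanceRecord, approved: bool, runner: Callable[[str], object]
) -> object:
    # Nothing runs until the user has seen the generated code and approved it.
    if not approved:
        raise PermissionError("translation was not approved by the user")
    return runner(record.code)
```

The record is frozen so that what was shown to the user is exactly what gets stored and executed.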

04

Reproducibility is a primitive, not a feature

Immutable dataset versions; versioned survey designs with full replicate-weight support; analysis lineage with rerun-against-current-version; soft delete with selective restore; Typst-based audit reporting. Every numerical result on the platform can be reconstructed from its inputs. Built-in, not bolted on.
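To make "immutable versions" and "rerun against the current version" concrete, here is a toy sketch using an append-only version list and SHA-256 content hashes. The class and function names are hypothetical, and the comma-separated payload and mean statistic are stand-ins for real datasets and svy analyses.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetVersion:
    number: int
    content_hash: str
    payload: bytes


class VersionedDataset:
    """Append-only history: committing never rewrites an earlier version."""

    def __init__(self) -> None:
        self._versions: List[DatasetVersion] = []

    def commit(self, payload: bytes) -> DatasetVersion:
        version = DatasetVersion(
            number=len(self._versions) + 1,
            content_hash=hashlib.sha256(payload).hexdigest(),
            payload=payload,
        )
        self._versions.append(version)
        return version

    def get(self, number: int) -> DatasetVersion:
        if not 1 <= number <= len(self._versions):
            raise KeyError(f"no version {number}")
        return self._versions[number - 1]

    @property
    def current(self) -> DatasetVersion:
        return self._versions[-1]


@dataclass(frozen=True)
class AnalysisRun:
    dataset_version: int
    content_hash: str
    result: float


def run_mean(ds: VersionedDataset, version: Optional[int] = None) -> AnalysisRun:
    """Toy analysis: the result is stored with the exact inputs it came from."""
    v = ds.current if version is None else ds.get(version)
    values = [float(x) for x in v.payload.decode().split(",")]
    return AnalysisRun(v.number, v.content_hash, sum(values) / len(values))
```

Pinning `version` reconstructs an old result from its original inputs, while omitting it reruns the same analysis against the current version.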

Why this matters for RWE

The architecture is the argument.

svyLab was built around survey data, not claims or EHR data. But the architectural properties (sandbox isolation, reproducibility, governance, AI with provenance) are exactly what regulated RWE work requires, and hands-on experience building them is unusual among candidates for senior methodology roles. This page explains the mapping.

Tenant isolation
RWE teams handle sensitive patient data across multiple sponsors and studies. Architectural sandbox isolation — not access control over a shared schema — is the safer foundation.
Reproducible workflows
Regulatory submissions require full reproducibility. Every analysis on svyLab produces svy code, provenance metadata, and versioned results, and rebuilding a result takes one call.
AI with methodology guardrails
AI in regulated analytics has to be auditable. Every svy-agents call logs model, version, prompt, context hash, tokens, and cost. The user sees the translation; nothing happens silently.
Lifecycle governance
Datasets progress through maturity stages. Classification (public/restricted) is governed orthogonally by a separate authority. This maps cleanly to RWE data-governance requirements.
Institutional library
Finalized datasets enter a curated organizational library for secondary analysis — exactly what RWE teams managing multiple studies across therapeutic areas need.

Stack

Built solo, end to end.

From the Rust extensions in the svy core, through the Python backend and async SQLAlchemy data layer, to the Astro/Svelte frontend and the AI provenance layer — one designer, one engineer, one accountable methodologist.

Frontend
Astro · Svelte 5 · Tailwind v4
Backend
Python · Litestar · async SQLAlchemy
Storage
PostgreSQL · DuckDB (analytic) · Redis (sessions, rate limiting)
Computation
svy ecosystem · Rust/PyO3 · JAX · Polars
AI layer
svy-agents · multiple LLM providers · cost tracking
Reporting
Typst CLI · structured artifacts
Infrastructure
Docker · CI-enforced structural invariants

Open source

The svy ecosystem.

svyLab sits on top of an open-source Python ecosystem for survey design and analysis — also designed and maintained by me. The platform is the polish; the ecosystem is the foundation.

Production · 265K+ downloads

Samplics

Comprehensive Python library for sample selection, weighting, estimation, and small area estimation. Published in JOSS (2021).

Beta

svy

Next-generation survey design and inference. Taylor linearization; replicate variance (BRR, jackknife, bootstrap). Built with Rust/PyO3 and Polars.

Beta

svy-sae

JAX-powered Fay–Herriot and unit-level small area estimation for subnational disease burden and prevalence.

Alpha

svy-io

High-speed I/O for SAS, SPSS, and Stata files — preserving variable labels, value labels, and user-defined missing codes.

In development · ISPE 2026

svy-causal

Survey-weighted causal inference for pharmacoepi — IPTW, stabilized weights, doubly robust estimation, validated against NHANES.

CRAN · production

sae2

R package for model-based small area estimation, developed for the Bureau of Justice Statistics.

Get in touch

Building or buying RWE infrastructure?

Open to senior roles where this thinking is useful — and happy to talk about the tradeoffs even if a role isn't open.