Platform · svyLab

A platform for evidence: reproducible, auditable, isolated.

svyLab is a multi-tenant analytics platform I designed and built solo. It exists to make survey-correct estimation reproducible and auditable by construction, built on top of the open-source svy ecosystem.

This page is my narrative on what I built and why it matters for any serious estimation program. For product information, see svylab.com.

Core architectural principles

Four design choices that shape everything.

These are not features. They are commitments: when a specific design decision conflicts with one of these, the principle wins. Together they're what makes the platform's properties hold up to scrutiny rather than just look good on a marketing page.

Every organization is a complete sandbox

Cross-org reads return 404 at every endpoint. This is a structural property, not an access-control rule layered over shared data: a user from org B requesting a dataset in org A gets the same response as if the dataset did not exist. The only way data flows between orgs is via explicit user action. Other orgs' data is architecturally invisible, not just access-controlled.
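One way to picture this property is a lookup in which the caller's org is part of the key itself, so a cross-org read and a read of a nonexistent dataset take the same code path. The sketch below is illustrative only: `DatasetStore`, `NotFound`, and `fetch_status` are hypothetical names, and an in-memory dict stands in for whatever storage svyLab actually uses.

```python
from dataclasses import dataclass


class NotFound(Exception):
    """Hypothetical error that a web handler would map to HTTP 404."""
    status_code = 404


@dataclass(frozen=True)
class Dataset:
    id: str
    org_id: str
    name: str


class DatasetStore:
    """Toy in-memory store keyed by (org_id, dataset_id). Assumption, not svyLab's schema."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], Dataset] = {}

    def add(self, dataset: Dataset) -> None:
        self._rows[(dataset.org_id, dataset.id)] = dataset

    def get(self, caller_org_id: str, dataset_id: str) -> Dataset:
        # The caller's org is part of the lookup key, so a dataset owned by
        # another org is simply absent from the caller's view of the store.
        try:
            return self._rows[(caller_org_id, dataset_id)]
        except KeyError:
            raise NotFound(f"dataset {dataset_id} not found") from None


store = DatasetStore()
store.add(Dataset(id="d1", org_id="org_a", name="household survey"))


def fetch_status(caller_org_id: str, dataset_id: str) -> int:
    """Return the HTTP-style status a handler would send for this read."""
    try:
        store.get(caller_org_id, dataset_id)
        return 200
    except NotFound as exc:
        return exc.status_code
```

In this sketch there is no separate "forbidden" branch to forget or misconfigure: a request from `org_b` for `d1` and a request for an id that does not exist anywhere both end in the same `NotFound`.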

Lifecycle and classification are orthogonal

A dataset's maturity (in_production → team_finalized → org_validated → archived) reflects how far it has moved through the QA pipeline. Its access classification (public / restricted) reflects who in the org can see its data. These axes are independent and governed by different authorities, and the decoupling is deliberate, designed to prevent the silent-authorization-expansion bugs that leak sensitive data on promotion.
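The independence of the two axes can be sketched as two separate enums on one record, with promotion touching only maturity. This is a minimal illustration under assumptions: the stage and classification values come from the text above, while `DatasetState`, `promote`, and `reclassify` are hypothetical names, and the separate authorities are not modeled.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Maturity(Enum):
    IN_PRODUCTION = "in_production"
    TEAM_FINALIZED = "team_finalized"
    ORG_VALIDATED = "org_validated"
    ARCHIVED = "archived"


class Classification(Enum):
    PUBLIC = "public"
    RESTRICTED = "restricted"


# Enum iteration preserves definition order, which matches the lifecycle order.
_ORDER = list(Maturity)


@dataclass(frozen=True)
class DatasetState:
    maturity: Maturity
    classification: Classification


def promote(state: DatasetState) -> DatasetState:
    """Advance maturity by one stage; classification is deliberately untouched."""
    idx = _ORDER.index(state.maturity)
    if idx == len(_ORDER) - 1:
        raise ValueError("archived datasets cannot be promoted further")
    return replace(state, maturity=_ORDER[idx + 1])


def reclassify(state: DatasetState, new: Classification) -> DatasetState:
    """Change classification only; maturity is deliberately untouched."""
    return replace(state, classification=new)
```

Because `promote` has no way to write to `classification`, a restricted dataset stays restricted at every stage unless someone explicitly calls `reclassify`.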

AI makes methodology accessible without hiding it

svy-agents translate natural-language questions into validated svy code. Every translation is shown before execution. Every output persists with full provenance: model, prompt, code, tokens, cost, timestamp. No silent AI-authored data modifications; every transformation is traceable to a user decision. The user finishes their work understanding what they did and why.
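As a rough picture of what "full provenance" could look like, the sketch below builds a record with the fields listed above (model, prompt, code, tokens, cost, timestamp) and only produces one when the user has approved the code they were shown. Every name here (`ProvenanceRecord`, `run_with_review`, `cost_usd`) is hypothetical and is not the svy-agents API. The generated code is treated as an opaque string and is not executed.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional
import json


@dataclass(frozen=True)
class ProvenanceRecord:
    model: str
    prompt: str
    code: str
    tokens: int
    cost_usd: float   # assumption: cost tracked in USD
    timestamp: str    # ISO 8601, UTC


def run_with_review(
    prompt: str,
    generated_code: str,
    approved: bool,
    *,
    model: str,
    tokens: int,
    cost_usd: float,
) -> Optional[ProvenanceRecord]:
    """Return a provenance record only if the user approved the shown code.

    A real system would execute the code and persist the record with its
    output; this sketch only builds the record.
    """
    if not approved:
        return None
    return ProvenanceRecord(
        model=model,
        prompt=prompt,
        code=generated_code,
        tokens=tokens,
        cost_usd=cost_usd,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


def to_json(record: ProvenanceRecord) -> str:
    """Serialize a record for storage alongside the analysis output."""
    return json.dumps(asdict(record), sort_keys=True)
```

The record is frozen so it cannot be edited after the fact, which is the minimum an audit trail needs from its entries.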

Reproducibility is a primitive, not a feature

Immutable dataset versions; versioned survey designs with full replicate-weight support; analysis lineage with rerun-against-current-version; soft delete with selective restore; Typst-based audit reporting. Every numerical result on the platform can be reconstructed from its inputs. Built-in, not bolted on.
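To make "reconstructed from its inputs" concrete, here is a minimal sketch of immutable versions using content hashing: each committed dataset gets an id derived from its contents, and an analysis can be rerun against any stored version. This is an assumption-laden illustration, not svyLab's implementation. `VersionStore`, `version_id`, and `weighted_mean` are hypothetical, and plain Python lists stand in for real survey data.

```python
import hashlib
import json


def version_id(rows: list[dict]) -> str:
    """Derive a short id from a canonical JSON encoding of the rows."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class VersionStore:
    """Toy append-only store: a committed version is never modified."""

    def __init__(self) -> None:
        self._versions: dict[str, str] = {}

    def commit(self, rows: list[dict]) -> str:
        vid = version_id(rows)
        # Store a serialized snapshot so later caller-side mutation of
        # `rows` cannot alter what was committed.
        self._versions.setdefault(vid, json.dumps(rows))
        return vid

    def load(self, vid: str) -> list[dict]:
        # Deserialize a fresh copy on every load.
        return json.loads(self._versions[vid])


def weighted_mean(rows: list[dict], value: str, weight: str) -> float:
    """Weighted mean of `value` using `weight`; a stand-in for a real estimator."""
    total_w = sum(r[weight] for r in rows)
    return sum(r[value] * r[weight] for r in rows) / total_w


store = VersionStore()
rows = [{"y": 1.0, "w": 2.0}, {"y": 3.0, "w": 2.0}]
v1 = store.commit(rows)

rows[0]["y"] = 5.0          # edit the working copy
v2 = store.commit(rows)     # a new version; v1 is unchanged
```

Rerunning `weighted_mean(store.load(v1), "y", "w")` after the edit still gives the original result, while the same call on `v2` reflects the change. A result stored with its version id can therefore always be traced back to the exact inputs that produced it.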

Why this matters

The architecture is the argument.

The properties that make svyLab trustworthy (sandbox isolation, reproducibility, governance, and AI with provenance) are exactly what any regulated or official-statistics analytics program needs, and they are unusual to find built by the same person who develops the methods. This section explains the mapping.

Tenant isolation

Statistics offices and health agencies handle sensitive microdata across many programs and partners. Architectural sandbox isolation, not access control over a shared schema, is the safer foundation.

Reproducible workflows

Official statistics and regulated analysis require full reproducibility. Every analysis on svyLab produces svy code, provenance metadata, and versioned results. Rebuild is one call.

AI with methodology guardrails

AI in serious analytics has to be auditable. Every svy-agents call logs model, version, prompt, context hash, tokens, and cost. The user sees the translation; nothing happens silently.

Lifecycle governance

Datasets progress through maturity stages. Classification (public/restricted) is governed orthogonally by a separate authority. This maps cleanly to disclosure-control and data-governance requirements.

Institutional library

Finalized datasets enter a curated organizational library for secondary analysis, which is exactly what agencies managing many surveys and estimation programs need.

Stack

Built solo, end to end.

From the Rust extensions in the svy core, through the Python backend and async SQLAlchemy data layer, to the Astro/Svelte frontend and the AI provenance layer: one designer, one engineer, one accountable methodologist.

Frontend

Astro · Svelte 5 · Tailwind v4

Backend

Python · Litestar · async SQLAlchemy

Storage

PostgreSQL · DuckDB (analytic) · Redis (sessions, rate limiting)

Computation

svy ecosystem · Rust/PyO3 · JAX · Polars

AI layer

svy-agents · multiple LLM providers · cost tracking

Reporting

Typst CLI · structured artifacts

Infrastructure

Docker · CI-enforced structural invariants

Open source

The svy ecosystem.

svyLab sits on top of an open-source Python ecosystem for survey design and analysis, also designed and maintained by me. The platform is the polish; the ecosystem is the foundation.

Production · 270K+ downloads

Samplics

Comprehensive Python library for sample selection, weighting, estimation, and small area estimation. Published in JOSS (2021).

Beta

svy

Next-generation survey design and inference. Taylor linearization; replicate variance (BRR, jackknife, bootstrap). Built with Rust/PyO3 and Polars.

Beta

svy-sae

JAX-powered Fay–Herriot and unit-level small area estimation for subnational disease burden and prevalence.

Alpha

svy-io

High-speed I/O for SAS, SPSS, and Stata files, preserving variable labels, value labels, and user-defined missing codes.

In development · paper in preparation

svy-causal

Survey-weighted causal inference: IPTW, stabilized weights, and doubly robust estimation, validated against NHANES.

CRAN · production

sae2

R package for model-based small area estimation, developed for the Bureau of Justice Statistics.