UltraCompress Track B evidence matrix

This is the Track B evidence matrix — architectural-compression evidence only. It is separate from the Track A 2.798-bpw v0.1 reference-artifact benchmark in the project README on GitHub. Do not compare retention numbers across tracks as a single quality curve — Track A is post-training quantization (shipping now); Track B is architectural compression (v0.2, Q3 2026).

  • Track: B (Fractal Residual Recursion; USPTO 64/049,517, patent pending)
  • Availability: evidence now; product availability in v0.2 (Q3 2026)
  • Customer ship status: not yet downloadable; pre-compressed reference models for Track B release in v0.2

  • Cohort size: 6 models
  • Operating point: uniform across the cohort; method-internal parameters held constant under NDA

Per-model results

Every row is labeled with its experiment family and customer ship status, so a screenshot of any single row carries its own firewall context.

| Model | Model family | Experiment family | Customer ship status | Params (B) | Track | Availability | bpw | Compression vs FP16 | T1 retention | T10 retention | T1 agreement | T10 agreement | PPL FP16 | PPL compressed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TinyLlama-1.1B | Llama-2 | Track B evidence | Not yet downloadable | 1.1 | B | v0.2 | 2.3988 | 1.6985x | 83.61% | 91.73% | 65.01% | 94.17% | 17.0142 | 28.8989 |
| OLMo-2-1B | OLMo-2 | Track B evidence | Not yet downloadable | 1.485 | B | v0.2 | 2.3906 | 1.7898x | 82.75% | 90.83% | 62.76% | 93.06% | 20.1537 | 36.0711 |
| SmolLM2-1.7B | SmolLM2 | Track B evidence | Not yet downloadable | 1.812 | B | v0.2 | 2.3906 | 1.8988x | 80.84% | 90.18% | 62.57% | 93.20% | 18.0321 | 34.2397 |
| Qwen3-1.7B | Qwen3 | Track B evidence | Not yet downloadable | 1.7 | B | v0.2 | 2.4017 | 1.788x | 84.65% | 90.68% | 64.04% | 93.88% | 33.21 | 59.4 |
| Mistral-7B-v0.3 | Mistral | Track B evidence | Not yet downloadable | 7.248 | B | v0.2 | 2.3942 | 1.6274x | 86.21% | 93.19% | 69.69% | 95.06% | 12.3569 | 20.1093 |
| Qwen3-8B | Qwen3 | Track B evidence | Not yet downloadable | 8.19 | B | v0.2 | 2.3967 | 1.3859x | 91.85% | 95.83% | 73.56% | 96.98% | 20.6963 | 28.6829 |

bpw vs compression-ratio denominator note: the bpw column measures bits per weight on the compressed artifact; the Compression vs FP16 column measures total artifact size relative to the FP16 baseline. The two columns use different denominators (per-weight cost vs total-artifact accounting) and should not be reconciled against each other without consulting the manifest. The same compressed artifact can show a low bpw (per-weight) alongside a modest compression ratio (per-artifact) when the architectural-compression component dominates the savings.
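As a minimal sketch of why the two denominators can diverge, consider a hypothetical artifact in which only part of the weights are compressed while the rest stay at FP16. All sizes below are invented for illustration; the matrix's actual accounting lives in the NDA manifest.

```python
def bits_per_weight(artifact_bytes: int, n_params: int) -> float:
    """Per-weight cost: on-disk bytes (including codebooks, scales,
    zero points, metadata) over the parameter count, in bits."""
    return artifact_bytes * 8 / n_params

def compression_vs_fp16(artifact_bytes: int, fp16_bytes: int) -> float:
    """Per-artifact accounting: FP16 baseline size over compressed
    artifact size; >1.0 means the compressed artifact is smaller."""
    return fp16_bytes / artifact_bytes

# Hypothetical model (invented numbers): 1.0B params compressed
# at 2.4 bpw, plus 0.5B params left at FP16 in the same artifact.
compressed_part = int(1_000_000_000 * 2.4 / 8)   # bytes for compressed layers
fp16_part       = 500_000_000 * 2                # bytes for FP16 layers
artifact_bytes  = compressed_part + fp16_part

fp16_baseline   = 1_500_000_000 * 2              # full model at FP16

# bpw measured on the compressed part alone is 2.4, yet the
# per-artifact ratio is nowhere near 16 / 2.4 = 6.67x:
print(bits_per_weight(compressed_part, 1_000_000_000))              # 2.4
print(round(compression_vs_fp16(artifact_bytes, fp16_baseline), 2)) # 2.31
```

This is one possible accounting, not the method's actual one; it only shows that a per-weight metric and a per-artifact metric need not agree.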

Cohort summary (Track B at the same operating point)

  • Median bpw: 2.40
  • Median T1 retention: 84.13%
  • Median T10 retention: 91.28%
  • Median compression ratio vs FP16: 1.74x

Envelope across the cohort

  • bpw range: 2.39 – 2.40
  • Compression ratio range: 1.39x – 1.90x vs FP16
  • T10 retention floor: 90.18% (worst-case across the cohort)
  • T10 agreement floor: 93.06%
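The medians and envelope floors above can be recomputed directly from the per-model rows, for example:

```python
from statistics import median

# Values copied from the per-model rows of this matrix:
bpw       = [2.3988, 2.3906, 2.3906, 2.4017, 2.3942, 2.3967]
ratio     = [1.6985, 1.7898, 1.8988, 1.788, 1.6274, 1.3859]
t1_ret    = [83.61, 82.75, 80.84, 84.65, 86.21, 91.85]
t10_ret   = [91.73, 90.83, 90.18, 90.68, 93.19, 95.83]
t10_agree = [94.17, 93.06, 93.20, 93.88, 95.06, 96.98]

print(f"median bpw: {median(bpw):.2f}")                # 2.40
print(f"median compression ratio: {median(ratio):.2f}")# 1.74
print(f"median T1 retention: {median(t1_ret):.2f}")    # 84.13
print(f"median T10 retention: {median(t10_ret):.2f}")  # 91.28
print(f"T10 retention floor: {min(t10_ret)}")          # 90.18
print(f"T10 agreement floor: {min(t10_agree)}")        # 93.06
```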

Track A (separate experiment — referenced for context only)

The README headline numbers (95.63% median T1 retention, zero catastrophic failures, 30% smaller than NF4) are the Track A benchmark at 2.798 bpw. Track A and Track B use different methods, operate at different points, and target different bpw. Customer evaluations should pick a track based on use case:

  • Track A (shipping now) — drop-in replacement for bitsandbytes / GPTQ / AWQ / HQQ. Pre-compressed reference artifacts roll out on Hugging Face Hub through April–May 2026.
  • Track B (v0.2) — architectural compression layered on top of (or in place of) standard quantization. Higher absolute compression, narrower architecture support at v0.2 launch.

Field definitions

  • bpw — bits per weight; effective on-disk per-parameter cost including all overhead (codebooks, scales, zero points, metadata).
  • Compression vs FP16 — compressed-artifact size relative to FP16 baseline; >1.0 means smaller. Different denominator than bpw; see footnote above.
  • Experiment family — which experimental track this row belongs to. All rows in this matrix are Track B evidence.
  • Customer ship status — whether artifacts from this experiment are downloadable today. All rows in this matrix are not yet downloadable; release in v0.2.
  • PPL FP16 — WikiText-103 perplexity of the FP16 teacher.
  • PPL compressed — WikiText-103 perplexity of the compressed model at this operating point.
  • T1 / T10 agreement — fraction of tokens where compressed top-k matches teacher top-k.
  • T1 / T10 retention — compressed top-k accuracy / teacher top-k accuracy, expressed as a percentage.
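As a sketch of how the agreement and retention fields compose: the archive's exact matching rule (ordered vs set top-k match) is not public, so this assumes a set match, and the function and variable names are illustrative only.

```python
def topk_agreement(teacher_topk, compressed_topk, k):
    """Fraction of token positions where the compressed model's top-k
    token-id set matches the teacher's (set match is an assumption)."""
    hits = sum(set(t[:k]) == set(c[:k])
               for t, c in zip(teacher_topk, compressed_topk))
    return hits / len(teacher_topk)

def retention(compressed_acc, teacher_acc):
    """Compressed top-k accuracy over teacher top-k accuracy,
    expressed as a percentage (units of the inputs cancel)."""
    return 100.0 * compressed_acc / teacher_acc

# Toy example: hypothetical top-2 token ids at three positions.
teacher    = [[12, 7], [3, 9], [40, 2]]
compressed = [[12, 8], [3, 9], [40, 2]]
print(topk_agreement(teacher, compressed, k=1))  # 1.0, all top-1 ids match
print(topk_agreement(teacher, compressed, k=2))  # first position's top-2 set differs
print(retention(62.0, 80.0))                     # 77.5
```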

Provenance

  • Source: internal Sipsa Labs benchmark archive. SHA-256-verified manifest available under NDA — email legal@sipsalabs.com.
  • Extracted: 2026-04-27.
  • Method: direct field copy of public-safe fields only; method internals (operating-point parameters, codebook sizes, calibration constants) deliberately excluded — those live with the filed patent specifications and are accessible only under NDA.
  • No hand-entered values. Each row is a direct copy from the source archive with rounding only.
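For readers who do obtain the manifest under NDA, the SHA-256 check itself is routine. A minimal sketch, assuming a simple JSON manifest that maps relative paths to hex digests (the archive's real manifest format is not public):

```python
import hashlib
import json
from pathlib import Path

def verify_manifest(manifest_path: str) -> bool:
    """Recompute SHA-256 for every file listed in the manifest and
    compare against the recorded digest. Assumed manifest format:
    {"files": {"relative/path": "<sha256 hex digest>"}}."""
    manifest_file = Path(manifest_path)
    manifest = json.loads(manifest_file.read_text())
    root = manifest_file.parent
    for rel_path, expected in manifest["files"].items():
        digest = hashlib.sha256((root / rel_path).read_bytes()).hexdigest()
        if digest != expected:
            return False
    return True
```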

Notes

  • All numeric values are direct field copies from the source archive; no hand-entered values.
  • Models without ppl_fp16 / ppl_compressed ran the agreement/retention pipeline but not the perplexity pipeline (all six rows in this matrix have perplexity values).
  • Cohort medians and envelope are computed by this extractor; readers can recompute from the per-model rows.
  • This is the Track B evidence matrix — architectural-compression evidence. Track A v0.1 reference-artifact benchmarks are in the README. Do not combine.

Both matrix.md (this file) and matrix.json are direct field copies from the source-of-truth file in the internal Sipsa Labs benchmark archive. Method-internal fields (operating-point parameters, codebook sizes, calibration constants) are deliberately excluded; the patent specifications cover those.