# Compression methods overview
UltraCompress combines two complementary patent-pending tracks:
- Track A — post-training row-overlay quantization (USPTO 64/049,511) — shipping in v0.1
- Track B — Fractal Residual Recursion (USPTO 64/049,517) — v0.2 (Q3 2026)
This page is a high-level conceptual overview. For implementation specifics, contact legal@sipsalabs.com for an NDA-gated technical deep dive.
## Track A — sub-3-bpw weight representation (v0.1, shipping)
Quantization is the standard approach to model compression: take a 16-bit floating-point weight and store it in fewer bits. The traditional ceiling for "good quality" was 8 bits per weight (bpw), i.e. int8. bitsandbytes pushed it to 4 bits with NF4 in 2023, and HQQ pushed it slightly further with group-wise schemes, but every public method we measured falls off a quality cliff below 4 bpw.
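To make "fewer bits per weight" concrete, here is a minimal group-wise uniform quantizer in NumPy. This is an illustrative sketch only: NF4 uses a non-uniform codebook rather than the uniform grid shown here, and Track A's representation is proprietary and not depicted. All function names are ours.

```python
import numpy as np

def quantize_groupwise(weights: np.ndarray, bits: int = 4, group_size: int = 64):
    """Uniform affine quantization with one (scale, offset) pair per group.

    Illustrative only -- not NF4's codebook and not Track A's scheme.
    """
    flat = weights.astype(np.float32).ravel()
    pad = (-len(flat)) % group_size          # pad so groups divide evenly
    flat = np.pad(flat, (0, pad))
    groups = flat.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    levels = (1 << bits) - 1                 # e.g. 15 codes for 4-bit
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    q = np.clip(np.round((groups - lo) / scale), 0, levels).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo, shape):
    """Reconstruct an approximate float tensor from the quantized groups."""
    deq = q.astype(np.float32) * scale + lo
    return deq.ravel()[: int(np.prod(shape))].reshape(shape)
```

Note the trade-off this makes visible: smaller `group_size` improves reconstruction accuracy but adds per-group metadata, which is why effective bpw (code bits plus amortized scales/offsets) is always somewhat higher than the nominal bit width.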
Track A is a novel post-training weight representation. In our 6-model benchmark cohort at 2.798 bits per weight:
- ~30% smaller than bitsandbytes NF4 at equivalent retention
- Zero catastrophic failures: of the public methods we evaluated at this compression frontier, Track A was the only one with that property in our test cohort
- Per-task retention curves (T1, T10, T32, T64, T128, T256) ship in the per-model card on each artifact's Hugging Face Hub repository
For the actual measured numbers and their cohort scope, see evidence/matrix.md.
## Track B — architectural compression (v0.2, Q3 2026)
Where Track A compresses weights, Track B compresses the architecture: restructuring the transformer block to retain expressive capacity with substantially fewer trainable parameters.
Public detail on Track B is intentionally limited until v0.2 ships. Public-safe Track B evidence is at evidence/matrix.md. The Q3 2026 v0.2 release timing is gated on patent prosecution; early-access design partners can engage now via the pilot program.
What "catastrophic failure" means¶
We use a published T_cat threshold: any cohort member whose perplexity exceeds 10× its FP16 baseline is a catastrophic failure. HQQ at 2-bit and lower produces models that cross this threshold; Track A at 2.798 bpw does not, in the cohort we tested. See catastrophic-failures.md.
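The threshold rule above is simple enough to state as code. A minimal sketch, assuming per-model perplexities are already measured; the function name is ours, not from any UltraCompress codebase:

```python
def is_catastrophic(ppl_compressed: float, ppl_fp16: float, t_cat: float = 10.0) -> bool:
    """Flag a cohort member as a catastrophic failure when its perplexity
    exceeds t_cat times the FP16 baseline (T_cat = 10 per the published rule)."""
    return ppl_compressed > t_cat * ppl_fp16

def catastrophic_count(pairs: list[tuple[float, float]]) -> int:
    """Count failures across a cohort of (compressed, fp16) perplexity pairs."""
    return sum(is_catastrophic(c, b) for c, b in pairs)
```

"Zero catastrophic failures" in the Track A claim means `catastrophic_count` over the six-model cohort returns 0.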
## What we share publicly vs. under NDA
| Information | Public | NDA |
|---|---|---|
| Cohort-level compression and retention numbers | ✅ | ✅ |
| Validation cohort + benchmark methodology summary | ✅ | ✅ |
| Per-model retention envelope | ✅ (model cards) | ✅ (full breakdown) |
| Reproducibility manifest (SHA-256 file index) | reference only | full manifest |
| Track A operating-point parameters and codebook structure | — | ✅ |
| Track B mechanism and architectural specification | — | ✅ |
| Patent specifications (filed April 2026) | — | ✅ when public record |
If you need NDA access to the technical deep dive, email legal@sipsalabs.com.
## Reproducibility
Every public number ships with:
- A deterministic seed (default seed = 42 across all runs)
- A full sample count (no cherry-picked best-of-N)
- A multi-model cohort (no single-model fluke results)
- A SHA-256-verified manifest of the input artifacts
This is increasingly a procurement gate for enterprise customers. See reproducibility.md for the full commitment.
## What's next
- The Track A supplement extends Track A claim scope; details available under NDA
- Track B v0.2 ships in Q3 2026, gated on patent prosecution timing
- Future research is under active patent strategy; out of scope for public discussion