UltraCompress¶
Extreme compression for large language models.
UltraCompress shrinks transformer language models below the 4-bits-per-weight floor that has stumped every prior open-source method, with zero catastrophic failures on a 6-model head-to-head benchmark.
v0.1 alpha
Pre-compressed reference models are uploading to the Hugging Face Hub throughout April–May 2026. Run `uc list` for the live catalog at any time.
The compression methods are the subject of pending U.S. patent applications (USPTO 64/049,511 and 64/049,517, filed 2026-04-25). This CLI is the open-source distribution layer; the pre-compressed reference models themselves are published through the sipsalabs organization on the Hugging Face Hub.
Install¶
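The CLI is distributed on PyPI (see the links at the bottom of this page). A minimal install sketch, assuming a standard pip workflow:

```shell
pip install ultracompress
```

The quickstart below assumes the `uc` entry point is then on your PATH.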
60-second quickstart¶
```shell
# Browse the official catalog
uc list

# Download a pre-compressed model
uc pull sipsalabs/<model-id>

# Inspect what's in it
uc info ./models/<model-id>

# Run downstream benchmarks
uc bench ./models/<model-id> --tasks hellaswag --limit 500
```
What's in a pre-compressed artifact¶
Each artifact is a directory with:
- `model.safetensors` — quantized weights in our compressed format
- `ultracompress.json` — provenance manifest (bpw, base model ID, SHA-256 of weights, license, method version)
- `tokenizer/` — pre-loaded tokenizer matching the base model
- `LICENSE` — the per-model license (research-free or commercial-paid; contact legal@sipsalabs.com)
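Since the manifest records a SHA-256 of the weights, you can verify an artifact locally before loading it. A minimal sketch; the manifest key name (`"sha256"`) is an assumption, so check your `ultracompress.json` for the actual field:

```python
import hashlib
import json
import pathlib

def verify_artifact(artifact_dir: str) -> bool:
    """Return True if model.safetensors matches the SHA-256 recorded
    in the ultracompress.json manifest (key name assumed here)."""
    root = pathlib.Path(artifact_dir)
    manifest = json.loads((root / "ultracompress.json").read_text())
    digest = hashlib.sha256((root / "model.safetensors").read_bytes()).hexdigest()
    return digest == manifest["sha256"]
```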
Why we exist¶
The published methods most teams use (bitsandbytes, GPTQ, AWQ, HQQ) all hit a wall at 4 bits per weight. Below 4 bits they collapse with catastrophic quality loss. We push past the wall:
| Method | Bits per weight | Quality retention (cohort median) | Catastrophic failures |
|---|---|---|---|
| bitsandbytes int8 | 8.000 | 99.75% | 0/6 |
| bitsandbytes NF4 | 4.000 | 98.31% | 0/6 |
| HQQ 4-bit g64 | 4.500 | 97.72% | 0/6 |
| UltraCompress 2.8 bpw | 2.798 | 95.63% | 0/6 |
| HQQ 3-bit g64 | 3.500 | 72.46% | 1/6 |
| HQQ 2-bit g64 | 2.500 | 3.46% | 6/6 |
Source: 6-model × 8-method × 500-sample head-to-head benchmark on WikiText-103 perplexity ratio, deterministic seed, full SHA-256 verification manifest available under NDA.
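For orientation, the bpw figures for the group-wise baselines are consistent with simple amortization arithmetic, and the retention column can be read as a base-to-compressed perplexity ratio. A minimal sketch; the per-group metadata layout and the exact retention formula are assumptions, not the benchmark's published definitions:

```python
def effective_bpw(weight_bits: int, group_size: int, meta_bits_per_group: int = 32) -> float:
    """Effective bits per weight for group-wise quantization: raw weight
    bits plus per-group metadata (assumed here to be a 16-bit scale and a
    16-bit zero-point) amortized over the group."""
    return weight_bits + meta_bits_per_group / group_size

def quality_retention(base_ppl: float, compressed_ppl: float) -> float:
    """Quality retention as the base-to-compressed perplexity ratio, in
    percent: 100% means the compressed model matches base perplexity."""
    return 100.0 * base_ppl / compressed_ppl

print(effective_bpw(4, 64))  # 4.5, matching the HQQ 4-bit g64 row
print(effective_bpw(3, 64))  # 3.5
print(effective_bpw(2, 64))  # 2.5
```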
Where to go next¶
- First time here? Quickstart
- Want to understand the methods? Compression methods overview
- Need to integrate with your inference stack? Integration guides
- Looking for a specific model? Run `uc list` for the live catalog.
- Deploying in a commercial product? Email legal@sipsalabs.com.
Status¶
UltraCompress is in public alpha as of v0.1.0 (April 2026). The CLI is stable for `list`, `pull`, `info`, and `bench`. Self-compression (`uc compress <model>`) is intentionally not yet shipped — it depends on the patent-pending compression methods being formally protected. Targeted v0.2 release: late Q3 2026.
Stay in touch¶
- Website: sipsalabs.com
- GitHub: github.com/sipsalabs/ultracompress
- Hugging Face: huggingface.co/sipsalabs
- PyPI: pypi.org/project/ultracompress
- Twitter: @sipsalabs
- Email: founder@sipsalabs.com for commercial / partnership inquiries