
UltraCompress

Extreme compression for large language models.


UltraCompress shrinks transformer language models below the 4-bits-per-weight floor where the widely used open-source methods collapse, with zero catastrophic failures across a 6-model head-to-head benchmark.

v0.1 alpha

Pre-compressed reference models are uploading to the Hugging Face Hub throughout April–May 2026. Run uc list for the live catalog at any time.

The compression methods are the subject of pending U.S. patent applications (USPTO 64/049,511 and 64/049,517, filed 2026-04-25). This CLI is the open-source distribution layer.

Install

# From PyPI
pip install ultracompress

# Or with uv
uv add ultracompress

# Or from source (editable, with dev extras)
git clone https://github.com/sipsalabs/ultracompress.git
cd ultracompress
pip install -e ".[dev]"

60-second quickstart

# Browse the official catalog
uc list

# Download a pre-compressed model
uc pull sipsalabs/<model-id>

# Inspect what's in it
uc info ./models/<model-id>

# Run downstream benchmarks
uc bench ./models/<model-id> --tasks hellaswag --limit 500

What's in a pre-compressed artifact

Each artifact is a directory with:

  • model.safetensors — quantized weights in our compressed format
  • ultracompress.json — provenance manifest (bpw, base model ID, SHA-256 of weights, license, method version)
  • tokenizer/ — pre-loaded tokenizer matching the base model
  • LICENSE — the per-model license (free for research use; commercial use requires a paid license, contact legal@sipsalabs.com)
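Since the manifest records a SHA-256 of the weights, an artifact can be integrity-checked locally. A minimal sketch, assuming the digest is stored under a "sha256" key; the real ultracompress.json schema is not documented here and may use a different field name:

```python
import hashlib
import json
from pathlib import Path


def verify_artifact(artifact_dir: str) -> bool:
    """Check that model.safetensors matches the SHA-256 recorded in the manifest.

    Assumes the manifest stores the digest under a "sha256" key (hypothetical;
    the actual ultracompress.json layout may differ).
    """
    root = Path(artifact_dir)
    manifest = json.loads((root / "ultracompress.json").read_text())
    digest = hashlib.sha256((root / "model.safetensors").read_bytes()).hexdigest()
    return digest == manifest["sha256"]
```

A mismatch means the weights were corrupted in transit or tampered with, so it is worth running before the first load of a pulled artifact.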

Why we exist

The published methods most teams use (bitsandbytes, GPTQ, AWQ, HQQ) all hit a wall at 4 bits per weight. Below 4 bits they collapse with catastrophic quality loss. We push past the wall:

Method                   Bits per weight   Quality retention (cohort median)   Catastrophic failures
bitsandbytes int8        8.000             99.75%                              0/6
bitsandbytes NF4         4.000             98.31%                              0/6
HQQ 4-bit g64            4.500             97.72%                              0/6
UltraCompress 2.8 bpw    2.798             95.63%                              0/6
HQQ 3-bit g64            3.500             72.46%                              1/6
HQQ 2-bit g64            2.500             3.46%                               6/6
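The fractional bits-per-weight figures in the g64 rows are consistent with per-group quantization metadata, for example an fp16 scale plus an fp16 zero-point (32 bits) shared by every group of 64 weights. A sketch of that arithmetic, assuming 32 metadata bits per group; this is an inference from the table, not a description of how HQQ or UltraCompress actually pack weights:

```python
def effective_bpw(weight_bits: int, group_size: int, meta_bits: int = 32) -> float:
    """Effective bits per weight for grouped quantization.

    Assumes each group carries meta_bits of shared metadata (e.g. an fp16
    scale and fp16 zero-point = 32 bits). Matches the g64 rows in the table,
    but the packing details are an assumption.
    """
    return weight_bits + meta_bits / group_size


# 4-bit g64 -> 4.5 bpw, 3-bit g64 -> 3.5 bpw, 2-bit g64 -> 2.5 bpw
```

Larger groups amortize the metadata better (g128 would add only 0.25 bpw) at the cost of coarser per-group scaling.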

Source: 6-model × 8-method × 500-sample head-to-head benchmark on WikiText-103 perplexity ratio, deterministic seed, full SHA-256 verification manifest available under NDA.
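The retention column is derived from a WikiText-103 perplexity ratio. One plausible reading, assuming retention is the base model's perplexity divided by the compressed model's (the exact formula behind the benchmark is not documented here):

```python
def quality_retention(ppl_base: float, ppl_quantized: float) -> float:
    """Quality retention as a perplexity ratio, in percent.

    Assumes retention = base perplexity / quantized perplexity, so an
    unchanged model scores 100% and higher (worse) quantized perplexity
    lowers the score. This definition is an assumption.
    """
    return 100.0 * ppl_base / ppl_quantized
```

Under this reading, the 3.46% row corresponds to a quantized perplexity roughly 29x the baseline, i.e. a fully collapsed model.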

Where to go next

  • First time here? Quickstart
  • Want to understand the methods? Compression methods overview
  • Need to integrate with your inference stack? Integration guides
  • Looking for a specific model? Run uc list for the live catalog.
  • Deploying in a commercial product? Email legal@sipsalabs.com.

Status

UltraCompress is in public alpha as of v0.1.0 (April 2026). The CLI is stable for list, pull, info, and bench. Self-compression (uc compress <model>) is intentionally not yet shipped; it depends on the patent-pending compression methods being formally protected. Targeted v0.2 release: late Q3 2026.

Stay in touch