UltraCompress design-partner pilot

For: chip vendors · OEMs · AI inference platforms · edge-cloud operators · robotics + automotive teams

From: Sipsa Labs, Inc. (Delaware C-Corp in formation; sipsalabs.com)

Filed IP: USPTO 64/049,511 + 64/049,517 (both filed 2026-04-25, patent pending)


The problem you have

Modern transformer language models have outgrown the hardware most of the world actually runs them on:

  • Phone-class and automotive deployments are memory-constrained, often forcing teams to ship smaller local models than the product would otherwise want
  • In-vehicle inference is held back by latency under tight memory budgets, not by model capability
  • At scale, inference platforms' margins are bound by GPU memory
  • Model registries absorb storage + egress costs that scale linearly with fleet size

Existing public methods (bitsandbytes, GPTQ, AWQ, HQQ) hit a wall at 4 bits per weight (bpw). Below 4 bpw, every public method we tested on our 6-model benchmark cohort falls off a quality cliff.
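To make the bits-per-weight figures concrete, here is a rough sizing sketch. The 7-billion-parameter model and the 16 bpw baseline are illustrative assumptions, not figures from our cohort, and the estimate covers weights only: it ignores quantization metadata (scales, zero points), activations, and KV cache.

```python
def weight_footprint_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate storage for model weights alone, in GiB."""
    total_bits = num_params * bits_per_weight
    return total_bits / 8 / 2**30  # bits -> bytes -> GiB


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7B-parameter model (assumption)
    for bpw in (16.0, 4.0, 3.0):
        print(f"{bpw:>4} bpw -> {weight_footprint_gib(params, bpw):.2f} GiB")
```

Under these assumptions the weights come to roughly 13.0 GiB at 16 bpw, 3.3 GiB at 4 bpw, and 2.4 GiB at 3 bpw. Real artifacts will be somewhat larger once per-block metadata is included.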

What we deliver

Track A — post-training row-overlay quantization (USPTO 64/049,511) — shipping now

Sub-3-bits-per-weight on a 6-model head-to-head cohort. 30% smaller than bitsandbytes NF4 at equivalent retention. Zero catastrophic failures across the cohort — the only public method at this compression frontier with that property in the cohort we tested.

Track B — Fractal Residual Recursion (USPTO 64/049,517) — v0.2 (Q3 2026)

Architectural compression beyond the published academic frontier. Combined with Track A, the strongest end-to-end ratio we've measured for transformer language models in our cohort. Public-safe Track B evidence at docs/evidence/matrix.md.

What ships under a pilot

  • Pre-compressed model artifacts (rolling release on Hugging Face Hub through April–May 2026 — let's discuss which architecture families fit your stack)
  • A reproducibility manifest (SHA-256 of every input + deterministic seed)
  • A reference loader you can drop into your runtime
  • A model card describing the per-task agreement / retention envelope
  • Direct technical support from the founder during the pilot window
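As an illustration of the reproducibility manifest listed above, the sketch below hashes every input file with SHA-256 and records the seed alongside the digests. The manifest layout, field names, and function names are hypothetical; the manifest shipped under a pilot may be structured differently.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(input_paths: list[Path], seed: int) -> dict:
    """Record the seed and a SHA-256 digest per input file (hypothetical layout)."""
    return {
        "seed": seed,
        "inputs": {str(p): sha256_of_file(p) for p in sorted(input_paths)},
    }


def verify_manifest(manifest: dict) -> bool:
    """Return True only if every listed file still hashes to its recorded digest."""
    return all(
        sha256_of_file(Path(name)) == digest
        for name, digest in manifest["inputs"].items()
    )
```

A receiving team could call `verify_manifest` before loading an artifact, so that any drift in the inputs is caught before benchmark numbers are compared.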

Pilot offers

We run two pilot shapes. Both are designed to convert to a recurring license if the technology lands.

Tier 1 — Compression Assessment ($5,000 · 2-week turnaround)

For teams that want to validate UltraCompress against a representative public/open-weight model from their stack before committing to a full deployment pilot.

What you get:

  • Sipsa runs the internal reference pipeline and delivers the assessment on a model + benchmark of your choice
  • Public-method comparison table: UltraCompress vs your current quantization stack
  • Per-task retention curves (T1, T10, T32, T64, T128, T256) on the metrics you care about
  • A 30-minute deep-dive call covering methodology, limits, and the v0.2 roadmap
  • A written assessment report (10-15 pages) you can take to internal stakeholders
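This brief does not define the T1 ... T256 labels on the retention curves. If they denote top-k agreement with the uncompressed baseline (an assumption, not a confirmed definition), one point on such a curve could be computed roughly as follows. The function name and the nested-list logit format are hypothetical.

```python
def top_k_agreement(baseline_logits, compressed_logits, k):
    """Fraction of positions where the baseline's top-1 token appears
    in the compressed model's top-k tokens (assumed metric definition)."""
    if len(baseline_logits) != len(compressed_logits):
        raise ValueError("logit sequences must have the same length")
    if not baseline_logits:
        raise ValueError("at least one position is required")
    hits = 0
    for base_row, comp_row in zip(baseline_logits, compressed_logits):
        # Index of the baseline's highest-scoring token at this position.
        base_top1 = max(range(len(base_row)), key=base_row.__getitem__)
        # Indices of the compressed model's k highest-scoring tokens.
        comp_top_k = sorted(
            range(len(comp_row)), key=comp_row.__getitem__, reverse=True
        )[:k]
        hits += base_top1 in comp_top_k
    return hits / len(baseline_logits)
```

Evaluating this at each k in the list above and plotting the results per task would give a curve of the kind described, under the stated assumption about what the labels mean.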

What we need from you:

  • The model and the benchmark we should run against
  • Two 30-minute calls (kickoff + readout)
  • A signed mutual NDA before kickoff

Tier 2 — Production Deployment Pilot ($15,000–$25,000 · 60-day pilot window)

For teams ready to put UltraCompress into a development or staging deployment surface and measure the production characteristics.

What you get:

  • Three pre-compressed model artifacts selected or prepared for your target hardware profile (architectures of your choice)
  • Integration support for your inference stack (vLLM, TensorRT-LLM, llama.cpp, custom) — within reason
  • Daily Slack / email channel during the 60-day window
  • Per-deployment performance dashboard: latency, memory, retention, customer-facing metrics
  • A pilot readout deck you can use internally to evaluate go-no-go on a recurring license
  • Right of first negotiation on a per-deployment SaaS license at the end of the pilot

What we need from you:

  • A scoped deployment surface (one product or one internal use case is plenty)
  • A technical lead on your side for daily cadence
  • A signed mutual NDA before kickoff
  • A signed pilot agreement (we provide a template)

What's in scope vs out

| In scope (pilot) | Out of scope (separate license required) |
|---|---|
| Public / open-weight model assessment + benchmark | Compression of your private/proprietary models (requires NDA + commercial pilot terms) |
| Methodology deep-dives under NDA | Per-device royalty / OEM licensing structure (separate term sheet, scoped per customer) |
| Bug fixes + integration help | Custom new compression methods (separate research engagement) |
| 60-day production pilot window | Permanent production deployment (recurring license required) |

Patent + commercial licensing path post-pilot

Both pilot tiers convert to one of three commercial license shapes (or you can walk away with the assessment report).

| License shape | Pricing posture | Best fit |
|---|---|---|
| Per-deployment SaaS | Starts at design-partner-friendly entry pricing; scales with deployment surface | Single product / single customer |
| Multi-deployment SaaS | Tiered annual; structured with the customer based on internal use-case count | Enterprise with multiple internal use cases |
| OEM / per-device royalty | Custom volume-tiered structure (annual license, per-device royalty, or hybrid); includes patent license | Chip vendors and device OEMs |

Patent license terms are bundled into the commercial license. Audit rights and standard commercial license terms apply, with redlines worked on a 2-3 week cycle. Specific bands are scoped per customer under NDA.

Get started

Email founder@sipsalabs.com with:

  • Which tier (assessment or pilot) is most useful right now
  • The architecture family / specific model you want benchmarked
  • Your timing window
  • Your preferred call structure (1-on-1 founder, technical team, exec sponsor)

We respond same-day during US business hours and target a kickoff call within 5 business days.


UltraCompress v0.1 alpha shipped 2026-04-25. Pre-compressed reference models release throughout April–May 2026. Track B and uc compress ship in v0.2 (Q3 2026), gated on patent prosecution timing.

The CLI is licensed under Apache 2.0. The pre-compressed model artifacts are licensed separately (free for research use, paid for commercial use). The compression methodology is patent pending.

sipsalabs.com · github.com/sipsalabs/ultracompress · huggingface.co/sipsalabs