UltraCompress design-partner pilot¶
For: chip vendors · OEMs · AI inference platforms · edge-cloud operators · robotics + automotive teams
From: Sipsa Labs, Inc. (Delaware C-Corp in formation; sipsalabs.com)
Filed IP: USPTO 64/049,511 + 64/049,517 (both filed 2026-04-25, patent pending)
The problem you have¶
Modern transformer language models have outgrown the hardware most of the world actually runs them on:
- Phone-class and automotive deployments are memory-constrained, often forcing teams to ship smaller local models than the product would otherwise want
- In-vehicle inference is latency-bound on memory budgets, not capability-bound on the model itself
- Inference platforms at scale are GPU-memory-bound on margins
- Model registries absorb storage + egress costs that scale linearly with fleet size
The methods that exist (bitsandbytes, GPTQ, AWQ, HQQ) hit a wall at 4 bits per weight. Below 4 bpw, in our 6-model benchmark cohort, every public method falls off a quality cliff.
What we deliver¶
Track A — post-training row-overlay quantization (USPTO 64/049,511) — shipping now¶
Sub-3-bits-per-weight on a 6-model head-to-head cohort. 30% smaller than bitsandbytes NF4 at equivalent retention. Zero catastrophic failures across the cohort — the only public method at this compression frontier with that property in the cohort we tested.
Track B — Fractal Residual Recursion (USPTO 64/049,517) — v0.2 (Q3 2026)¶
Architectural compression beyond the published academic frontier. Combined with Track A, the strongest end-to-end ratio we've measured for transformer language models in our cohort. Public-safe Track B evidence at docs/evidence/matrix.md.
What ships under a pilot¶
- Pre-compressed model artifacts (rolling release on Hugging Face Hub through April–May 2026 — let's discuss which architecture families fit your stack)
- A reproducibility manifest (SHA-256 of every input + deterministic seed)
- A reference loader you can drop into your runtime
- A model card describing the per-task agreement / retention envelope
- Direct technical support from the founder during the pilot window
Pilot offers¶
We run two pilot shapes. Both are designed to convert to a recurring license if the technology lands.
Tier 1 — Compression Assessment ($5,000 · 2-week turnaround)¶
For teams who want to validate UltraCompress against a public/open-weight representative model from your stack before committing to a full deployment pilot.
What you get:
- Sipsa runs the internal reference pipeline and delivers the assessment on a model + benchmark of your choice
- Public-method comparison table: UltraCompress vs your current quantization stack
- Per-task retention curves (T1, T10, T32, T64, T128, T256) on the metrics you care about
- A 30-minute deep-dive call covering methodology, limits, and the v0.2 roadmap
- A written assessment report (10-15 pages) you can take to internal stakeholders
What we need from you:
- The model and the benchmark we should run against
- Two 30-minute calls (kickoff + readout)
- A signed mutual NDA before kickoff
Tier 2 — Production Deployment Pilot ($15,000–$25,000 · 60-day pilot window)¶
For teams ready to put UltraCompress into a development or staging deployment surface and measure the production characteristics.
What you get:
- Three pre-compressed model artifacts selected or prepared for your target hardware profile (architectures of your choice)
- Integration support for your inference stack (vLLM, TensorRT-LLM, llama.cpp, custom) — within reason
- Daily Slack / email channel during the 60-day window
- Per-deployment performance dashboard: latency, memory, retention, customer-facing metrics
- A pilot readout deck you can use internally to evaluate go-no-go on a recurring license
- Right of first negotiation on a per-deployment SaaS license at the end of the pilot
What we need from you:
- A scoped deployment surface (one product or one internal use case is plenty)
- A technical lead on your side for daily cadence
- A signed mutual NDA before kickoff
- A signed pilot agreement (we provide a template)
What's in scope vs out¶
| In scope (pilot) | Out of scope (separate license required) |
|---|---|
| Public / open-weight model assessment + benchmark | Compression of your private/proprietary models (requires NDA + commercial pilot terms) |
| Methodology deep-dives under NDA | Per-device royalty / OEM licensing structure (separate term sheet, scoped per customer) |
| Bug fixes + integration help | Custom new compression methods (separate research engagement) |
| 60-day production pilot window | Permanent production deployment (recurring license required) |
Patent + commercial licensing path post-pilot¶
Both pilot tiers convert to one of three commercial license shapes (or you can walk away with the assessment report).
| License shape | Pricing posture | Best fit |
|---|---|---|
| Per-deployment SaaS | Starts at design-partner-friendly entry pricing; scales with deployment surface | Single product / single customer |
| Multi-deployment SaaS | Tiered annual; structured with the customer based on internal use-case count | Enterprise with multiple internal use cases |
| OEM / per-device royalty | Custom volume-tiered structure (annual license, per-device royalty, or hybrid); includes patent license | Chip vendors and device OEMs |
Patent license terms are bundled into the commercial license. Audit rights and standard commercial license terms apply, with redlines worked on a 2-3 week cycle. Specific bands are scoped per customer under NDA.
Get started¶
Email founder@sipsalabs.com with:
- Which tier (assessment or pilot) is most useful right now
- The architecture family / specific model you want benchmarked
- Your timing window
- Your preferred call structure (1-on-1 founder, technical team, exec sponsor)
We respond same-day during US business hours and target a kickoff call within 5 business days.
UltraCompress v0.1 alpha shipped 2026-04-25. Pre-compressed reference models release throughout April–May 2026. Track B and uc compress ship in v0.2 (Q3 2026), gated on patent prosecution timing.
The CLI is Apache 2.0. The pre-compressed model artifacts are licensed separately (research-free or commercial-paid). The compression methodology is patent pending.
sipsalabs.com · github.com/sipsalabs/ultracompress · huggingface.co/sipsalabs