UltraCompress design-partner pilot¶
For: chip vendors · OEMs · AI inference platforms · edge-cloud operators · robotics + automotive teams
From: Sipsa Labs, Inc. (Delaware C-Corp in formation; sipsalabs.com)
IP: U.S. provisional patent applications filed April 2026 (patent pending)
The problem you have¶
Modern transformer language models have outgrown the hardware most of the world actually runs them on:
- Phone-class and automotive deployments are memory-constrained, often forcing teams to ship smaller local models than the product would otherwise want
- In-vehicle inference hits latency limits set by memory budgets, not limits of the model's own capability
- At scale, inference-platform margins are bound by GPU memory
- Model registries absorb storage + egress costs that scale linearly with fleet size
Existing quantization methods (bitsandbytes, GPTQ, AWQ, HQQ) are all lossy: they drift from the original weights. That drift is unacceptable wherever bf16-equivalent quality is a hard requirement.
What we deliver¶
Near-lossless 5-bit compression (patent pending) — shipping now¶
A 5-bit pack that is near-lossless rather than lossless: perplexity stays within ~1% of the bf16 reference.

Reconstruction is reproducible and cryptographically verifiable. Decoding is deterministic and always yields the same validated artifact, pinned by its SHA-256 hash. That artifact is a ~1%-PPL reconstruction of the bf16 source, not a bit-identical copy of it.

We have PPL-verified 22 architectures end-to-end against their bf16 baselines, plus 1 ViT verified by cosine similarity. Their end-to-end perplexity ratios are within a fraction of a percent of the bf16 teacher.
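This sketch makes the two numbers in this paragraph concrete. It is not UltraCompress code. It shows two things:

- how a perplexity ratio against a bf16 reference is conventionally computed, as exp of the mean per-token negative log-likelihood;
- the raw weight-storage arithmetic for 5-bit versus 16-bit weights.

The helper names are hypothetical and the 7B parameter count is illustrative. The byte counts also ignore any scales, codebooks, or metadata a real pack carries, so actual artifact sizes will be somewhat larger.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def ppl_ratio(compressed_nlls: Sequence[float], reference_nlls: Sequence[float]) -> float:
    """Compressed PPL / bf16 reference PPL; 1.01 means a 1% perplexity increase."""
    return perplexity(compressed_nlls) / perplexity(reference_nlls)


def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Raw weight storage only; excludes scales, codebooks, and other metadata."""
    return num_params * bits_per_weight / 8


if __name__ == "__main__":
    n = 7_000_000_000  # illustrative 7B-parameter model (hypothetical)
    bf16, packed = weight_bytes(n, 16), weight_bytes(n, 5)
    print(f"bf16: {bf16 / 1e9:.3f} GB, 5-bit: {packed / 1e9:.3f} GB, {bf16 / packed:.1f}x smaller")
    # bf16: 14.000 GB, 5-bit: 4.375 GB, 3.2x smaller
```

In practice, the per-token NLLs for both models come from running them over the same held-out token stream. Otherwise the ratio compares different data, not different weights.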
Research preview (patent pending) — v0.2 (Q3 2026)¶
A separate compression method in research preview, with its patent strategy still in progress. Public research-preview evidence is at docs/evidence/matrix.md.
What ships under a pilot¶
- Pre-compressed model artifacts (rolling release on the Hugging Face Hub through April–May 2026; we can discuss which architecture families fit your stack)
- A reproducibility manifest (SHA-256 of every input + deterministic seed)
- A reference loader you can drop into your runtime
- A model card describing the per-task agreement / retention envelope
- Direct technical support from the founder during the pilot window
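The manifest format isn't specified here. The sketch below assumes a simple JSON layout, a seed plus one SHA-256 digest per input file, and uses Python's standard hashlib to build and re-check it. The names sha256_file, build_manifest and verify_manifest, and the "seed" and "inputs" fields, are hypothetical. They are not the shipped manifest schema.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so multi-GB shards never sit fully in memory


def sha256_file(path: Path) -> str:
    """Hex SHA-256 digest of a file, computed in streaming fashion."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(inputs: Iterable[Path], seed: int) -> Dict:
    """Hypothetical layout: the deterministic seed plus one pinned digest per input file."""
    return {"seed": seed, "inputs": {p.name: sha256_file(p) for p in sorted(inputs)}}


def verify_manifest(manifest: Dict, directory: Path) -> List[str]:
    """Names of inputs that are missing or whose digest changed (empty list = verified)."""
    return [
        name
        for name, pinned in manifest["inputs"].items()
        if not (directory / name).is_file() or sha256_file(directory / name) != pinned
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shard = root / "shard-0.bin"  # stand-in for a real weight shard
        shard.write_bytes(b"example weights")
        manifest = build_manifest([shard], seed=1234)
        print(json.dumps(manifest, indent=2))
        print(verify_manifest(manifest, root))  # []
        shard.write_bytes(b"tampered")
        print(verify_manifest(manifest, root))  # ['shard-0.bin']
```

Keying entries by file name alone keeps the sketch short. It is enough for a flat directory of shards but would collide if two inputs shared a name in different directories.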
Pilot offers¶
We run two pilot shapes. Both are designed to convert to a recurring license if the technology lands.
Tier 1 — Compression Assessment ($5,000 · 2-week turnaround)¶
For teams who want to validate UltraCompress on a representative public/open-weight model from your stack before committing to a full deployment pilot.
What you get:
- Sipsa runs the internal reference pipeline and delivers the assessment on a model + benchmark of your choice
- Public-method comparison table: UltraCompress vs your current quantization stack
- Per-task retention curves (T1, T10, T32, T64, T128, T256) on the metrics you care about
- A 30-minute deep-dive call covering methodology, limits, and the v0.2 roadmap
- A written assessment report (10–15 pages) you can take to internal stakeholders
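The T1 through T256 labels aren't defined in this document. One plausible reading is top-k agreement with the bf16 reference: the share of token positions where the reference model's top-1 token appears in the compressed model's top-k predictions. This is an assumption, not a confirmed definition. Under that assumption, a retention curve could be computed as in this sketch. The metric definition and the names top_k, top_k_agreement and retention_curve are illustrative only.

```python
import heapq
from typing import Dict, Sequence


def top_k(logits: Sequence[float], k: int) -> set:
    """Indices of the k largest logits; ties go to the lower index."""
    return set(heapq.nlargest(k, range(len(logits)), key=lambda i: (logits[i], -i)))


def top_k_agreement(ref_logits, comp_logits, k: int) -> float:
    """Assumed metric: share of positions where the reference top-1 token is in the compressed top-k."""
    if len(ref_logits) != len(comp_logits) or not ref_logits:
        raise ValueError("need the same non-zero number of positions from both models")
    hits = 0
    for ref, comp in zip(ref_logits, comp_logits):
        ref_top1 = next(iter(top_k(ref, 1)))
        hits += ref_top1 in top_k(comp, k)  # bool counts as 0 or 1
    return hits / len(ref_logits)


def retention_curve(ref_logits, comp_logits, ks=(1, 10, 32, 64, 128, 256)) -> Dict[str, float]:
    """Hypothetical T-k curve: agreement at each k, keyed 'T1', 'T10', ..."""
    return {f"T{k}": top_k_agreement(ref_logits, comp_logits, k) for k in ks}


if __name__ == "__main__":
    # Toy 3-token vocabulary, 2 positions; real runs would use model logits.
    ref = [[0.1, 0.9, 0.0], [0.5, 0.2, 0.3]]
    comp = [[0.9, 0.8, 0.0], [0.1, 0.2, 0.7]]
    print(retention_curve(ref, comp, ks=(1, 2, 3)))  # {'T1': 0.0, 'T2': 0.5, 'T3': 1.0}
```

heapq.nlargest keeps each step at O(V log k) for vocabulary size V, rather than sorting the full vocabulary at every position.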
What we need from you:
- The model and the benchmark we should run against
- Two 30-minute calls (kickoff + readout)
- A signed mutual NDA before kickoff
Tier 2 — Production Deployment Pilot ($15,000–$25,000 · 60-day pilot window)¶
For teams ready to put UltraCompress into a development or staging deployment surface and measure the production characteristics.
What you get:
- Three pre-compressed model artifacts selected or prepared for your target hardware profile (architectures of your choice)
- Integration support for your inference stack (vLLM, TensorRT-LLM, llama.cpp, custom) — within reason
- Daily Slack / email channel during the 60-day window
- Per-deployment performance dashboard: latency, memory, retention, customer-facing metrics
- A pilot readout deck you can use internally to evaluate go-no-go on a recurring license
- Right of first negotiation on a per-deployment SaaS license at the end of the pilot
What we need from you:
- A scoped deployment surface (one product or one internal use case is plenty)
- A technical lead on your side for daily cadence
- A signed mutual NDA before kickoff
- A signed pilot agreement (we provide a template)
What's in scope vs out¶
| In scope (pilot) | Out of scope (separate license required) |
|---|---|
| Public / open-weight model assessment + benchmark | Compression of your private/proprietary models (requires NDA + commercial pilot terms) |
| Methodology deep-dives under NDA | Per-device royalty / OEM licensing structure (separate term sheet, scoped per customer) |
| Bug fixes + integration help | Custom new compression methods (separate research engagement) |
| 60-day production pilot window | Permanent production deployment (recurring license required) |
Patent + commercial licensing path post-pilot¶
Either pilot tier can convert into one of three commercial license shapes, or you can walk away with the assessment report or pilot readout.
| License shape | Pricing posture | Best fit |
|---|---|---|
| Per-deployment SaaS | Starts at design-partner-friendly entry pricing; scales with deployment surface | Single product / single customer |
| Multi-deployment SaaS | Tiered annual; structured with the customer based on internal use-case count | Enterprise with multiple internal use cases |
| OEM / per-device royalty | Custom volume-tiered structure (annual license, per-device royalty, or hybrid); includes patent license | Chip vendors and device OEMs |
Patent license terms are bundled into the commercial license. Audit rights and standard commercial license terms apply, and redlines are worked on a 2–3 week cycle. Specific pricing bands are scoped per customer under NDA.
Get started¶
Email founder@sipsalabs.com with:
- Which tier (assessment or pilot) is most useful right now
- The architecture family / specific model you want benchmarked
- Your timing window
- Your preferred call structure (1-on-1 founder, technical team, exec sponsor)
We respond same-day during US business hours and target a kickoff call within 5 business days.
UltraCompress v0.1 alpha shipped in April 2026, and pre-compressed reference models are being released throughout April–May 2026. Architectural compression support and uc compress ship in v0.2 (Q3 2026), with timing gated on patent prosecution.
The CLI is licensed under Apache 2.0. The pre-compressed model artifacts are licensed separately (free for research use, paid for commercial use). The compression methodology is patent pending.
sipsalabs.com · github.com/sipsalabs/ultracompress · huggingface.co/sipsalabs
Implementation details are proprietary and patent-pending.