API + OpenCompress

Change your base URL and keep everything else. Same SDK, same code, same models, 40-60% cheaper.

1. Get an API Key

Sign up and grab your sk-occ-* key from the dashboard. Free $10 credit, no card required.

Go to Dashboard

2. Change Your Base URL

Point your OpenAI SDK at our endpoint: https://www.opencompress.ai/api/v1. That's the only change.

3. You're Done

Every request is compressed, routed to the upstream LLM, and billed at the reduced rate.

Quick Start

Two lines change. Everything else stays the same.

from openai import OpenAI

client = OpenAI(
    base_url="https://www.opencompress.ai/api/v1",
    api_key="sk-occ-your-key-here",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

Bring Your Own Key

Already have an API key from OpenAI, Anthropic, or another provider? Pass it via headers and we'll route requests to your account; you pay only for the compression savings.

from openai import OpenAI

client = OpenAI(
    base_url="https://www.opencompress.ai/api/v1",
    api_key="sk-occ-your-key-here",
    default_headers={
        "X-Upstream-Key": "sk-your-openai-key",
        "X-Upstream-Base-Url": "https://api.openai.com/v1",
    },
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

How it works: your upstream key calls the LLM directly. We compress the prompt first so you use fewer tokens, and X-Upstream-Base-Url tells us where to forward the request.
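For illustration, the two BYOK headers can be assembled with a small helper. The header names come from the example above; the helper function itself is hypothetical, not part of any SDK:

```python
def byok_headers(upstream_key: str, upstream_base_url: str) -> dict:
    """Build the default_headers dict for bring-your-own-key requests.

    Header names (X-Upstream-Key, X-Upstream-Base-Url) are taken from the
    example above; this helper is illustrative only.
    """
    return {
        "X-Upstream-Key": upstream_key,
        "X-Upstream-Base-Url": upstream_base_url,
    }

headers = byok_headers("sk-your-openai-key", "https://api.openai.com/v1")
```

The resulting dict is what you'd pass as `default_headers` when constructing the OpenAI client, as in the snippet above.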

Streaming

Set stream=True and receive SSE exactly as with the OpenAI API. Compression happens before forwarding, so streaming latency is unaffected.

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Write a haiku"}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Why Use the API

Drop-in replacement for any OpenAI-compatible workflow.

OpenAI-Compatible

Drop-in replacement. Same SDK, same endpoints, same response format.

40-60% Cheaper

Prompts are compressed before forwarding. You keep 80% of the savings.

All Models

ChatGPT, Claude, Gemini, DeepSeek, Grok, Kimi — all via one endpoint.
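Because routing is driven entirely by the model field, switching providers is just a string change. A minimal sketch using the two model identifiers shown elsewhere on this page (the request body is otherwise identical):

```python
def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body.

    Illustrative only: the body is the same across providers except for
    the "model" field. Model names are the ones used in the examples above.
    """
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

claude = chat_payload("anthropic/claude-sonnet-4.6", "Hello")
gpt = chat_payload("gpt-4o", "Hello")

# Only the model field differs between the two requests.
assert claude["messages"] == gpt["messages"]
assert claude["model"] != gpt["model"]
```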

Streaming

Full SSE streaming. stream=True works just like the OpenAI API.

Start Saving

Get your API key and start saving on every LLM call in under a minute.