An AI layer designed
for real-world usage.
Goff sits between your application and model providers, handling routing, usage tracking, and cost control so you can focus on shipping.
Infrastructure that
stays out of your way.
One SDK. Any model. Full visibility into cost and latency. Built for teams shipping AI to production.
Multi-model access
OpenAI, Anthropic, Llama, Mistral — one endpoint. Switch providers with a config change, not a refactor.
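A minimal sketch of the switch, using the standard OpenAI Python SDK; the Goff base URL and key names here are placeholders, not documented values:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.goff.dev/v1",  # placeholder gateway URL
    api_key="GOFF_API_KEY",
)

# One call shape for every provider; only the model string changes.
response = client.chat.completions.create(
    model="gpt-4o",  # or "claude-sonnet-4", "mistral-large-latest", ...
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)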
Global edge routing
Requests hit the nearest region. Typical time-to-first-token (TTFT) under 200ms. P99 latency tracked per model.
Per-user rate limits
Set token budgets per user, per key, per project. Hard limits, soft limits, alerts — your call.
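Purely illustrative sketch of what setting a budget could look like; the endpoint path and field names below are assumptions, not Goff's documented API:

import requests

requests.post(
    "https://api.goff.dev/v1/limits",  # hypothetical admin endpoint
    headers={"Authorization": "Bearer GOFF_ADMIN_KEY"},
    json={
        "scope": "user:alice",           # also works per-key or per-project
        "soft_limit_tokens": 800_000,    # crossing this fires an alert
        "hard_limit_tokens": 1_000_000,  # crossing this rejects requests
        "window": "monthly",
    },
)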
Native streaming
SSE out of the box. Consistent chunk format across providers. Graceful error propagation.
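Streaming with the same OpenAI Python SDK, pointed at a placeholder Goff base URL; chunks arrive in the standard OpenAI shape:

from openai import OpenAI

client = OpenAI(base_url="https://api.goff.dev/v1", api_key="GOFF_API_KEY")

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku"}],
    stream=True,  # SSE under the hood, same chunk format for every provider
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # final chunks can carry no content
        print(delta, end="", flush=True)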
Integrate once.
Route anywhere.
Drop-in SDK
OpenAI-compatible interface. Swap your base URL; no other code changes required.
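The before and after, assuming a placeholder Goff URL; everything past the constructor stays untouched:

from openai import OpenAI

# Before: client = OpenAI(api_key="OPENAI_API_KEY")
# After: point the same SDK at Goff instead.
client = OpenAI(
    base_url="https://api.goff.dev/v1",  # placeholder URL
    api_key="GOFF_API_KEY",
)
# Existing chat.completions, embeddings, etc. calls work unchanged.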
Model routing
Specify the model in the request. Goff routes it to the provider with the best available balance of latency and cost.
Automatic failover
Rate limits and outages handled automatically. Requests retry against a healthy provider instead of dropping.
Simple pricing.
No surprises.
Pay for what you use. Token-level tracking. Full cost visibility from day one.
Developer
For side projects and early-stage products.
Pro
For teams shipping AI features to production.
Enterprise
For organizations with compliance and scale requirements.
SOC 2-compliant infrastructure. 99.9% uptime SLA.
Ready to optimize your AI infrastructure?
Stop wrestling with provider-specific quirks. Get a unified, high-performance gateway for your entire AI stack today.