Developer Documentation

Integration Guide

Get started with ThriftyAI in minutes. Reduce costs by up to 80% and get cached responses up to 10x faster.

Security Best Practices

Critical security guidelines to protect your API keys

Never expose keys in client-side code

Always use server-side environments

Store in environment variables

Use .env files and never commit them

Add .env to .gitignore

Prevent accidental Git exposure

Your keys, your control

We never store provider keys
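A minimal sketch of reading keys from environment variables on the server instead of hard-coding them. The variable names THRIFTYAI_API_KEY and OPENAI_API_KEY and the function name load_gateway_keys are hypothetical; use whatever names your deployment defines. The function fails fast with a clear message instead of sending requests with empty keys.

```python
import os


def load_gateway_keys() -> dict:
    """Read ThriftyAI and provider keys from environment variables.

    THRIFTYAI_API_KEY and OPENAI_API_KEY are hypothetical variable names.
    """
    required = {
        "x-api-key": "THRIFTYAI_API_KEY",
        "x-provider-key": "OPENAI_API_KEY",
    }
    headers = {}
    missing = []
    for header, env_var in required.items():
        value = os.environ.get(env_var)
        if value:
            headers[header] = value
        else:
            missing.append(env_var)
    if missing:
        # Fail fast rather than sending requests with empty key headers
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return headers
```

During local development, a loader such as python-dotenv's load_dotenv() can populate os.environ from a .env file before this runs; the .env file itself stays listed in .gitignore.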

Quick Start

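The Python example below sends a chat request to OpenAI through the ThriftyAI gateway. To use a different provider, change x-provider, x-provider-key, and model accordingly.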

import requests

response = requests.post(
    "https://thriftyai.app/api/gateway",
    headers={
        "Content-Type": "application/json",
        "x-api-key": "YOUR_THRIFTYAI_API_KEY",
        "x-provider-key": "YOUR_OPENAI_API_KEY",
        "x-provider": "openai",
        # Optional: Track per-user for high-traffic apps
        # "x-end-user-id": "user_12345",
        # Optional: Add fallback support
        # "x-fallback-provider": "google",
        # "x-fallback-model": "gemini-1.5-flash",
        # "x-fallback-key": "YOUR_GEMINI_API_KEY",
        # Optional: Set cache TTL (seconds)
        # "x-cache-ttl": "300",
    },
    json={
        "model": "gpt-4.1-nano",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
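    },
    timeout=30,  # Avoid hanging indefinitely if the network stalls
)

response.raise_for_status()  # Raise an exception for HTTP error responses
result = response.json()
print(result)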

Automatic Fallback

Prevent downtime: if the primary provider fails, ThriftyAI automatically switches to your backup provider.

x-fallback-provider: Backup provider name (e.g., google)

x-fallback-model: Model to use on fallback (e.g., gemini-1.5-flash)

x-fallback-key: API key for the fallback provider
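The three fallback headers travel together, so it can help to build them in one place. This is a minimal sketch using a hypothetical helper, build_gateway_headers; only the header names come from this guide, and the values are placeholders. The all-or-nothing check on the fallback headers is a design choice of this sketch, not a documented gateway rule.

```python
from typing import Optional


def build_gateway_headers(
    api_key: str,
    provider: str,
    provider_key: str,
    fallback_provider: Optional[str] = None,
    fallback_model: Optional[str] = None,
    fallback_key: Optional[str] = None,
) -> dict:
    """Build gateway headers, adding fallback headers only when all three are given."""
    headers = {
        "Content-Type": "application/json",
        "x-api-key": api_key,
        "x-provider-key": provider_key,
        "x-provider": provider,
    }
    fallback = (fallback_provider, fallback_model, fallback_key)
    if any(fallback):
        # A partial fallback configuration is most likely a mistake, so reject it
        if not all(fallback):
            raise ValueError(
                "x-fallback-provider, x-fallback-model and x-fallback-key must be set together"
            )
        headers["x-fallback-provider"] = fallback_provider
        headers["x-fallback-model"] = fallback_model
        headers["x-fallback-key"] = fallback_key
    return headers


headers = build_gateway_headers(
    api_key="YOUR_THRIFTYAI_API_KEY",
    provider="openai",
    provider_key="YOUR_OPENAI_API_KEY",
    fallback_provider="google",
    fallback_model="gemini-1.5-flash",
    fallback_key="YOUR_GEMINI_API_KEY",
)
```

Pass the resulting dict as headers= to requests.post, as in the Quick Start example.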

Stale-While-Revalidate Caching

Control cache freshness with no added latency. When a cached entry has expired, users still get the stale response instantly while a fresh response is fetched in the background.

x-cache-ttl: Cache TTL in seconds (e.g., 300 = 5 minutes)

This header overrides the default TTL set in your dashboard.
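A small sketch for setting the TTL per request. HTTP header values are strings, so the integer is converted before sending. The helper name with_cache_ttl is hypothetical, and rejecting zero or negative values is an assumption of this sketch; the guide does not say which values the gateway accepts.

```python
def with_cache_ttl(headers: dict, ttl_seconds: int) -> dict:
    """Return a copy of headers with x-cache-ttl set, overriding the dashboard default."""
    if isinstance(ttl_seconds, bool) or not isinstance(ttl_seconds, int) or ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be a positive integer number of seconds")
    updated = dict(headers)  # Leave the caller's dict unchanged
    updated["x-cache-ttl"] = str(ttl_seconds)
    return updated


base = {"x-api-key": "YOUR_THRIFTYAI_API_KEY", "x-provider": "openai"}
short_lived = with_cache_ttl(base, 5 * 60)  # 300 seconds = 5 minutes
daily = with_cache_ttl(base, 24 * 60 * 60)  # 86400 seconds = 1 day
```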

How Stale Cache Works:

Fresh cache (within TTL): instant response, $0.00 cost
Stale cache (TTL expired): the stale response is returned instantly ($0.00) and a background refresh starts
Next request: served from the refreshed cache, instant response

The background refresh is a regular provider call; its cost is tracked separately in the logs with is_background_refresh: true.
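To make the three cases concrete, here is a conceptual, single-process sketch of the stale-while-revalidate technique. It illustrates the general idea only and is not ThriftyAI's implementation: entries older than the TTL are still returned immediately, and one refresh per prompt runs on a background thread. The class name SWRCache and the fetch callable are hypothetical.

```python
import threading
import time
from typing import Callable, Dict, Set, Tuple


class SWRCache:
    """Conceptual stale-while-revalidate cache (illustration only)."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float):
        self.fetch = fetch  # Slow, billable call, e.g. to an LLM provider
        self.ttl = ttl_seconds
        self.entries: Dict[str, Tuple[str, float]] = {}  # prompt -> (response, stored_at)
        self.refreshing: Set[str] = set()  # Prompts with a refresh already in flight
        self.lock = threading.Lock()

    def _refresh(self, prompt: str) -> None:
        try:
            value = self.fetch(prompt)
            with self.lock:
                self.entries[prompt] = (value, time.monotonic())
        finally:
            with self.lock:
                self.refreshing.discard(prompt)

    def get(self, prompt: str) -> Tuple[str, str]:
        """Return (response, status), where status is 'fresh', 'stale' or 'miss'."""
        with self.lock:
            entry = self.entries.get(prompt)
            if entry is not None:
                value, stored_at = entry
                if time.monotonic() - stored_at <= self.ttl:
                    return value, "fresh"  # Within TTL: serve from cache
                if prompt not in self.refreshing:
                    # TTL expired: start a single background refresh
                    self.refreshing.add(prompt)
                    threading.Thread(target=self._refresh, args=(prompt,), daemon=True).start()
                return value, "stale"  # Serve the stale value without waiting
        # Miss: the caller waits for the provider
        value = self.fetch(prompt)
        with self.lock:
            self.entries[prompt] = (value, time.monotonic())
        return value, "miss"
```

The background refresh still calls the provider, which is why its cost shows up separately in the logs even though the user-facing response was free.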

Cache Management

Purge cache entries when you need fresh responses

When to purge cache:

After updating your app's training data or knowledge base
When you want to force fresh responses for specific prompts
During development and testing phases
When debugging cache behavior

Remove a specific cache entry using its log ID from your dashboard.

import requests

response = requests.post(
    "https://thriftyai.app/api/cache/purge",
    headers={
        "Content-Type": "application/json",
        "x-api-key": "YOUR_THRIFTYAI_API_KEY"
    },
    json={"log_id": "LOG_ID_FROM_DASHBOARD"}
)

result = response.json()
print(result)

Webhooks Integration

Receive real-time notifications for requests, cache hits, errors, and more

Available Events:

request.completed
request.failed
cache.hit
cache.miss
cache.stale
error.quota
error.rate_limit
error.provider
fallback.triggered

Create a webhook to receive notifications. Save the returned secret for signature verification.

import requests

response = requests.post(
    "https://thriftyai.app/api/webhooks",
    headers={
        "Content-Type": "application/json",
        "x-api-key": "YOUR_THRIFTYAI_API_KEY"
    },
    json={
        "name": "My Analytics Webhook",
        "url": "https://your-app.com/webhooks/thriftyai",
        "events": ["request.completed", "cache.hit", "cache.miss"]
    }
)

result = response.json()
print(f"Webhook ID: {result['webhook']['id']}")
print(f"Secret: {result['webhook']['secret']}")

Webhook Features:

HMAC-SHA256 signature verification for security
Automatic retry with exponential backoff (max 5 attempts)
Delivery tracking and debugging in dashboard
Test webhook functionality before going live
9 different event types for comprehensive monitoring
Background refresh operations do not trigger webhooks; only user requests do
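This guide does not spell out the exact signing scheme, so the sketch below assumes a common convention: the signature is the hex-encoded HMAC-SHA256 of the raw request body, keyed with the secret returned when the webhook was created. The header carrying it (called X-ThriftyAI-Signature here) is a hypothetical name; confirm the actual header and format before relying on this.

```python
import hashlib
import hmac


def compute_signature(secret: str, raw_body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (assumed signing scheme)."""
    return hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, raw_body: bytes, received_signature: str) -> bool:
    """Return True if the received signature matches the body.

    Verify against the raw bytes as received, before any JSON parsing,
    and compare in constant time to avoid leaking timing information.
    """
    expected = compute_signature(secret, raw_body)
    return hmac.compare_digest(expected.encode(), received_signature.strip().encode())


# In your web framework, read the raw body and the (hypothetical)
# X-ThriftyAI-Signature header, then reject the delivery if verification fails.
secret = "whsec_example"  # Placeholder for the secret returned at webhook creation
body = b'{"event": "cache.hit"}'
signature = compute_signature(secret, body)
```

Because failed deliveries are retried with exponential backoff, the same event can arrive more than once; an idempotent handler avoids double-counting.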

Cache Hit Payload Example:

{
  "event": "cache.hit",
  "timestamp": "2024-12-02T10:30:00Z",
  "data": {
    "api_key_id": "key_abc123",
    "api_key_name": "Production API Key",
    "cache_type": "exact",
    "similarity": 1.0,
    "provider": "openai",
    "model": "gpt-4o-mini",
    "cached_at": "2024-12-02T10:25:00Z",
    "is_stale": false
  }
}

The is_stale field indicates whether the cached response had expired but was served anyway for speed. When it is true, the entry is being refreshed in the background; that refresh does not generate its own webhook event.
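A sketch of a receiver-side handler that parses the payload shown above and routes it by event name. Only the field names in the example payload come from this guide; the handler functions and the in-memory counter are hypothetical, and other events may carry different data fields.

```python
import json
from collections import Counter
from typing import Callable, Dict

stats = Counter()  # Hypothetical in-memory metrics store


def on_cache_hit(data: dict) -> None:
    stats["cache.hit"] += 1
    if data.get("is_stale"):
        # An expired entry was served; a background refresh is running
        stats["cache.hit.stale"] += 1


def on_cache_miss(data: dict) -> None:
    stats["cache.miss"] += 1


HANDLERS: Dict[str, Callable[[dict], None]] = {
    "cache.hit": on_cache_hit,
    "cache.miss": on_cache_miss,
}


def handle_webhook(raw_body: bytes) -> bool:
    """Parse a delivery and dispatch it; return False for unhandled events."""
    payload = json.loads(raw_body)
    handler = HANDLERS.get(payload.get("event"))
    if handler is None:
        return False  # Ignore events this receiver does not handle
    handler(payload.get("data", {}))
    return True
```

Verify the delivery's signature on the raw body before calling json.loads.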

Rate Limiting & Response Headers

Understand rate limits for each plan and how to handle them in your application

Rate Limits by Plan:

Hobby (Free): 10 requests / 10 seconds

Pro ($29/month): 100 requests / 10 seconds (10x the Hobby limit)

Enterprise (Custom): custom limits up to 1,000 requests / 10 seconds
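To stay under your plan's limit proactively rather than reacting to 429 errors, you can throttle on the client. A minimal sliding-window sketch, assuming a rolling 10-second window; the guide does not say whether the server window is fixed or rolling, and a rolling limit is the conservative choice because it also satisfies a fixed window. The class name is hypothetical, and it is not thread-safe: it coordinates calls from a single thread in one process.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_requests in any rolling window_seconds period."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # Injectable for testing
        self.sent: Deque[float] = deque()  # Timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window - (now - self.sent[0])

    def acquire(self) -> None:
        """Block until a slot is free, then record the request."""
        while (delay := self.wait_time()) > 0:
            time.sleep(delay)
        self.sent.append(self.clock())


# Hobby plan: 10 requests / 10 seconds (use SlidingWindowLimiter(100, 10) on Pro)
hobby = SlidingWindowLimiter(max_requests=10, window_seconds=10)
```

Call hobby.acquire() immediately before each gateway request.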

Response Headers:

Every gateway response includes rate limit information in its headers:

X-RateLimit-Limit: 100                    # Total requests allowed in the window
X-RateLimit-Remaining: 95                 # Requests remaining in the current window
X-RateLimit-Reset: 2024-12-03T10:30:00Z   # When the window resets
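A sketch for reading these headers after each call, assuming X-RateLimit-Reset is an ISO 8601 UTC timestamp with a trailing Z, as in the example above. The function names are hypothetical. They accept any mapping; response.headers from requests is case-insensitive, while a plain dict needs the exact header casing shown.

```python
from datetime import datetime, timezone
from typing import Mapping, NamedTuple, Optional


class RateLimitInfo(NamedTuple):
    limit: int
    remaining: int
    reset_at: datetime


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Parse X-RateLimit-* headers; return None if any are missing."""
    try:
        limit = int(headers["X-RateLimit-Limit"])
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_raw = headers["X-RateLimit-Reset"]
    except KeyError:
        return None
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on
    reset_at = datetime.fromisoformat(reset_raw.replace("Z", "+00:00"))
    return RateLimitInfo(limit, remaining, reset_at)


def seconds_until_reset(info: RateLimitInfo, now: Optional[datetime] = None) -> float:
    """Seconds until the window resets (never negative)."""
    now = now or datetime.now(timezone.utc)
    return max(0.0, (info.reset_at - now).total_seconds())
```

For example, a client might pause for seconds_until_reset(info) once info.remaining reaches zero instead of waiting for a 429.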

Handling Rate Limits (429 Errors):

When you exceed your rate limit, you'll receive a 429 status code with retry information:

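import time

import requests

URL = "https://thriftyai.app/api/gateway"
HEADERS = {
    "x-api-key": "YOUR_THRIFTYAI_API_KEY",
    "x-provider-key": "YOUR_PROVIDER_KEY",
    "x-provider": "openai",
}
PAYLOAD = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
MAX_RETRIES = 3

for attempt in range(MAX_RETRIES + 1):
    response = requests.post(URL, headers=HEADERS, json=PAYLOAD, timeout=30)
    if response.status_code != 429 or attempt == MAX_RETRIES:
        break
    # Honor the Retry-After header, defaulting to 10 seconds if it is absent
    retry_after = int(response.headers.get("Retry-After", 10))
    print(f"Rate limited! Retrying in {retry_after} seconds...")
    time.sleep(retry_after)

response.raise_for_status()  # Raises on a final 429 or any other HTTP error
result = response.json()
# Process result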

Need Higher Limits?

Upgrade to Pro for 10x higher rate limits (100 req/10s), or contact us about Enterprise custom limits of up to 1,000 req/10s.

Why ThriftyAI?

50-80% Cost Reduction

Cache hits are free; you pay only for cache misses

10x Faster Responses

~100 ms for cached responses vs. 1-2 s provider latency

Cross-Provider Cache

Similar prompts hit the same cache entry across all providers

Your Keys, Your Control

We never store provider keys; you keep full control
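The cost claim follows from simple arithmetic: if cache hits cost nothing, provider spend scales with the miss rate, so a 50-80% hit rate gives roughly a 50-80% reduction. A back-of-the-envelope sketch, assuming a hypothetical workload and per-request price, and ignoring background refresh calls (which are billed and tracked separately):

```python
def estimate_savings(requests_per_month: int, cost_per_request: float, hit_rate: float) -> dict:
    """Estimate monthly provider spend with and without caching.

    Assumes cache hits cost $0.00 and every miss is a full-price provider call.
    Background refreshes of stale entries are not included.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    baseline = requests_per_month * cost_per_request
    with_cache = requests_per_month * (1 - hit_rate) * cost_per_request
    return {
        "baseline": baseline,
        "with_cache": with_cache,
        "saved_fraction": 1 - with_cache / baseline if baseline else 0.0,
    }


# Hypothetical workload: 100,000 requests/month at $0.002 each with a 60% hit rate
estimate = estimate_savings(100_000, 0.002, 0.60)
```

Under these assumptions the saved fraction equals the hit rate, which is why the achievable reduction depends on how repetitive your traffic is.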

Demo Playground

Try ThriftyAI in action with our interactive chat application

Live Demo Application

See ThriftyAI integrated into a real chat application, with live webhook tracking and debug monitoring

Open Source & Ready to Deploy

Clone the repository and integrate it into your own projects. Full source code is available.

Features in the Demo:

Real-time chat with AI providers
Live webhook event tracking
Debug panel for requests
Cache performance monitoring
Cost savings visualization
Easy deployment setup