HIP-37: AI Cloud Platform Standard
Abstract
Hanzo Cloud (cloud.hanzo.ai) is the AI operations platform for the Hanzo ecosystem. It provides a unified control plane for managing LLM inference, agent orchestration, MCP tool invocation, API key lifecycle, usage metering, and credit-based billing -- all through a single dashboard and API.
Cloud does not run inference. It does not host models. It does not execute agent code. Instead, it orchestrates the services that do: LLM Gateway (HIP-0004) for inference routing, Agent SDK (HIP-0009) for multi-agent orchestration, MCP (HIP-0010) for tool invocation, and IAM (HIP-0026) for authentication and balance tracking. Cloud is the layer that turns a collection of AI infrastructure components into a product that teams can adopt in minutes.
Repository: github.com/hanzoai/cloud Production: https://cloud.hanzo.ai Port: 3000 (frontend), 8000 (API via gateway) Cluster: hanzo-k8s (24.199.76.156)
Motivation
The Problem
The Hanzo ecosystem contains powerful but disconnected infrastructure:
-
LLM Gateway (HIP-0004) proxies 100+ AI providers with intelligent routing, caching, and failover. But it exposes a raw API. There is no dashboard to see which models your team uses, how much they cost, or whether error rates are climbing.
-
Agent SDK (HIP-0009) enables multi-agent orchestration with shared state and tool use. But deploying agents, monitoring their runs, and debugging failures requires SSH access and log tailing. There is no managed runtime.
-
MCP tools (HIP-0010) provide 260+ capabilities for AI models. But discovering available tools, configuring them per-project, and auditing which tools an agent invoked requires reading source code.
-
IAM (HIP-0026) handles authentication and tracks credit balances. But there is no interface for users to see their spend breakdown, purchase credits, or set usage alerts.
-
API keys are the primary credential for programmatic access. But generating, scoping, rotating, and revoking keys requires direct database manipulation or API calls with admin tokens.
Each component works. None of them are usable by a product team without significant integration effort. The missing piece is an operations layer -- a control plane that stitches these components together behind a coherent UI and API.
What Teams Actually Need
When a team adopts AI infrastructure, they need answers to five questions on day one:
- "How do I get an API key?" -- Self-service key generation with clear scoping.
- "Which models can I use?" -- A catalog of available models with pricing and capability metadata.
- "How much am I spending?" -- Real-time usage dashboards with cost attribution per model, per user, per project.
- "What went wrong?" -- Error logs, latency histograms, and request traces for debugging production issues.
- "Who has access?" -- Team management with roles, so the intern cannot rotate the production API key.
Hanzo Cloud answers all five. It is the product layer that makes the infrastructure layer usable.
Design Philosophy
This section explains why each major architectural decision was made. The decisions are non-obvious, and understanding the rationale prevents future engineers from reversing them without understanding the consequences.
Why Build vs. Use Existing Platforms
Vercel AI SDK is a frontend toolkit. It helps you build chat UIs and streaming responses in Next.js. It does not manage API keys, track costs, route between providers, or run agents. It solves a different problem (building AI-powered frontends) at a different layer (the application).
AWS SageMaker is an ML training and deployment platform. It manages model training jobs, endpoints, and inference pipelines. It is designed for teams that train their own models and deploy them on AWS infrastructure. It does not proxy third-party providers, does not support MCP tools, and does not provide credit-based billing. Its pricing model (per-instance-hour) penalizes experimentation.
Replicate / Modal / Banana are inference platforms. They run models on GPUs and expose APIs. They do not aggregate multiple providers, do not support agents, and do not provide team management or billing dashboards.
OpenRouter is the closest analogue -- a multi-provider LLM proxy with usage tracking. But it is a pure proxy. It does not provide agent orchestration, MCP tool management, team workspaces, or self-hosted deployment. And it is a third-party service, which means sending all prompts through their infrastructure.
No existing platform unifies LLM proxy, agent orchestration, MCP tools, team management, and usage-based billing in a single experience. That is the gap Hanzo Cloud fills.
Why Separate Cloud and Console
This is the most frequently asked architectural question, and the answer is subtle.
Cloud (cloud.hanzo.ai) is the user-facing product. Teams sign up, create projects, generate API keys, view usage dashboards, manage team members, and purchase credits. The audience is developers and team leads who consume AI infrastructure. The permission model is project-scoped: a user sees only their projects, their keys, and their usage.
Console (console.hanzo.ai) is the operator dashboard. Hanzo staff use it to manage organizations, configure quotas, view cross-tenant metrics, and handle billing disputes. The audience is Hanzo operations engineers. The permission model is org-scoped or global: an operator sees all organizations, all users, and all usage.
The separation exists because the two audiences have fundamentally different threat models:
- A Cloud user who gains Console access can see other organizations' data, modify quotas, and disable accounts. This is a catastrophic breach.
- A Console operator who only has Cloud access cannot perform their job -- they need cross-tenant visibility.
Combining both into one application with feature flags is possible but fragile. A single misconfigured RBAC rule exposes admin functionality to end users. Two separate applications with separate deployments, separate domains, and separate authentication scopes make the boundary enforceable at the network level.
| Dimension | Cloud (cloud.hanzo.ai) | Console (console.hanzo.ai) |
|---|---|---|
| Audience | Developers, team leads | Hanzo operators |
| Scope | Per-project | Per-org / global |
| Auth | OAuth2 user tokens | OAuth2 admin tokens |
| Data access | Own projects only | All tenants |
| Key operations | Generate, rotate, revoke own keys | View all keys, set quotas |
| Billing | View own usage, purchase credits | View all usage, issue refunds |
Why Credit-Based Billing
AI usage is fundamentally variable. A developer testing a new prompt might make 5 LLM calls in a day. A production pipeline might make 50,000. Per-seat pricing (like most SaaS) charges both the same amount, which is unfair and unpredictable for the provider.
Per-request pricing (like AWS Lambda) is granular but unpredictable for the user. A bug that retries requests in a loop can generate a surprise bill in minutes.
Credits provide a middle ground:
- Prepaid: Users purchase credits in advance. There are no surprise bills because spend cannot exceed the balance.
- Granular: Each LLM call, agent run, or tool invocation deducts a precise amount from the balance based on actual token consumption and model pricing.
- Transparent: The dashboard shows real-time balance and per-request cost breakdown. Users always know where they stand.
- Unified: Credits work across all services -- LLM inference, agent runs, MCP tools, storage. One balance, one currency.
Credits are tracked in IAM (HIP-0026) as a balance field on the user entity. This means the same database query that validates a user's authentication token also returns their current balance. The LLM Gateway checks the balance before proxying a request and deducts the cost after completion. No separate billing service roundtrip is required.
The credit unit is USD cents. 1 credit = $0.01 USD. This keeps the arithmetic simple and the display familiar.
How Cloud Orchestrates the Stack
Cloud is a control plane, not a data plane. It never touches prompts, completions, or tool outputs. Here is the request flow for a typical LLM API call:
Developer App
|
| (1) POST /v1/chat/completions
| Header: Authorization: Bearer hk_live_abc123
|
v
Hanzo Cloud API
|
| (2) Validate API key (lookup in Cloud DB)
| (3) Resolve user from key -> check IAM balance
| (4) If balance sufficient, proxy to LLM Gateway
|
v
LLM Gateway (HIP-0004)
|
| (5) Route to optimal provider (OpenAI, Anthropic, etc.)
| (6) Stream response back
|
v
Hanzo Cloud API
|
| (7) Record usage: tokens, cost, latency, model
| (8) Deduct credits from IAM balance
| (9) Emit webhook if usage alert threshold crossed
|
v
Developer App
|
| (10) Receives streamed completion
Steps 2-3 and 7-9 are Cloud's responsibility. Steps 5-6 are Gateway's. The developer's application talks to Cloud's API endpoint, which looks identical to OpenAI's API. The only difference is the API key prefix (hk_ instead of sk-).
For agent orchestration, the flow is similar but Cloud delegates to Agent SDK (HIP-0009) instead of Gateway:
Cloud API -> Agent Runtime -> [LLM Gateway + MCP Tools] -> Cloud API (record usage)
For MCP tool invocations within agent runs, Cloud records each tool call as a separate usage event, enabling per-tool cost attribution.
Specification
API Key Management
API keys are the primary credential for programmatic access to Hanzo Cloud services.
Key Format
hk_live_<32 random bytes, base62 encoded> # Production key
hk_test_<32 random bytes, base62 encoded> # Test key (rate limited, no billing)
The prefix encodes the environment. Parsers can distinguish live keys from test keys without a database lookup.
Key Scoping
Each key has an associated scope that restricts which operations it can perform:
interface APIKeyScope {
// Which models the key can access
models: string[] | "*"; // e.g., ["gpt-4", "claude-3-opus"] or "*"
// Which services the key can invoke
services: ServiceScope[]; // e.g., ["llm", "agents", "mcp"]
// Rate limits (requests per minute)
rateLimit: number; // Default: 60 rpm
// Maximum spend per billing period (USD cents)
spendLimit: number | null; // null = no limit (uses account balance)
// IP allowlist (CIDR notation)
allowedIPs: string[] | null; // null = any IP
// Expiration
expiresAt: Date | null; // null = no expiration
}
type ServiceScope = "llm" | "agents" | "mcp" | "embeddings" | "images" | "audio";
Key Lifecycle API
POST /v1/api-keys # Create key
GET /v1/api-keys # List keys (masked)
GET /v1/api-keys/:id # Get key metadata
PATCH /v1/api-keys/:id # Update scope/limits
DELETE /v1/api-keys/:id # Revoke key
POST /v1/api-keys/:id/rotate # Rotate (new secret, same scope)
Key creation returns the full key exactly once. Subsequent reads return only the last 4 characters for identification. This follows the same security pattern as Stripe and OpenAI.
Usage Tracking
Every request through Cloud is recorded as a usage event:
interface UsageEvent {
id: string;
timestamp: Date;
// Identity
orgId: string;
projectId: string;
userId: string;
apiKeyId: string;
// Request
service: ServiceScope;
model: string;
provider: string; // Which provider actually served the request
// Tokens
promptTokens: number;
completionTokens: number;
totalTokens: number;
// Cost (USD cents)
cost: number;
// Performance
latencyMs: number;
ttftMs: number; // Time to first token (streaming)
status: "success" | "error" | "timeout" | "rate_limited";
// Agent-specific
agentId?: string;
agentRunId?: string;
toolCalls?: ToolCallEvent[];
// Request metadata (no PII, no prompt content)
metadata: Record<string, string>;
}
interface ToolCallEvent {
toolName: string;
toolProvider: string; // MCP server name
durationMs: number;
status: "success" | "error";
}
Usage events are written to PostgreSQL with TimescaleDB hypertables for efficient time-series aggregation. Events older than 90 days are compressed and archived to object storage (MinIO / S3).
Usage Dashboard API
GET /v1/usage/summary # Aggregate usage for current billing period
GET /v1/usage/timeseries # Time-bucketed usage (hourly/daily/monthly)
GET /v1/usage/by-model # Cost breakdown per model
GET /v1/usage/by-user # Cost breakdown per team member
GET /v1/usage/by-key # Cost breakdown per API key
GET /v1/usage/events # Paginated raw events
All endpoints accept start, end, granularity, and filter query parameters.
Model Catalog
Cloud maintains a catalog of available models with pricing, capability metadata, and routing preferences:
interface ModelEntry {
id: string; // e.g., "gpt-4-turbo"
provider: string; // e.g., "openai"
displayName: string; // e.g., "GPT-4 Turbo"
description: string;
// Capabilities
capabilities: {
chat: boolean;
completion: boolean;
embedding: boolean;
imageGeneration: boolean;
imageAnalysis: boolean;
audio: boolean;
functionCalling: boolean;
streaming: boolean;
jsonMode: boolean;
};
// Context
contextWindow: number; // Max tokens
maxOutputTokens: number;
// Pricing (per 1M tokens, USD cents)
pricing: {
promptPer1M: number;
completionPer1M: number;
embeddingPer1M?: number;
imagePer1K?: number; // Per 1K images
};
// Availability
status: "available" | "degraded" | "unavailable";
regions: string[];
}
GET /v1/models # List all available models
GET /v1/models/:id # Get model details and current status
Team Management
Cloud provides project-scoped team management with role-based access:
interface ProjectMember {
userId: string;
email: string;
role: ProjectRole;
joinedAt: Date;
invitedBy: string;
}
type ProjectRole = "owner" | "admin" | "developer" | "viewer";
| Permission | Owner | Admin | Developer | Viewer |
|---|---|---|---|---|
| View usage dashboard | Yes | Yes | Yes | Yes |
| Make API calls | Yes | Yes | Yes | No |
| Create/revoke API keys | Yes | Yes | Own only | No |
| Invite team members | Yes | Yes | No | No |
| Change member roles | Yes | Yes | No | No |
| Purchase credits | Yes | Yes | No | No |
| Delete project | Yes | No | No | No |
| Transfer ownership | Yes | No | No | No |
GET /v1/projects/:id/members # List members
POST /v1/projects/:id/members # Invite member
PATCH /v1/projects/:id/members/:uid # Change role
DELETE /v1/projects/:id/members/:uid # Remove member
Billing
Credit Purchase
POST /v1/billing/credits/purchase
{
"amount": 5000, // USD cents ($50.00)
"paymentMethod": "pm_..."
}
Payment processing is handled by Commerce (HIP-0018). Cloud sends the purchase request, Commerce processes the payment, and on success, Commerce calls IAM to increment the user's balance. Cloud never touches payment card data.
Usage Alerts
Users configure spend thresholds that trigger webhook notifications:
interface UsageAlert {
id: string;
projectId: string;
type: "spend" | "requests" | "errors";
threshold: number; // USD cents for spend, count for requests/errors
period: "daily" | "weekly" | "monthly";
channels: AlertChannel[];
enabled: boolean;
}
type AlertChannel =
| { type: "webhook"; url: string }
| { type: "email"; address: string }
| { type: "slack"; webhookUrl: string };
GET /v1/billing/alerts # List alerts
POST /v1/billing/alerts # Create alert
PATCH /v1/billing/alerts/:id # Update alert
DELETE /v1/billing/alerts/:id # Delete alert
MCP Tool Marketplace
Cloud provides a catalog of MCP tools (HIP-0010) that agents and applications can invoke:
GET /v1/mcp/tools # List available MCP tools
GET /v1/mcp/tools/:id # Tool details, schema, pricing
POST /v1/mcp/tools/:id/invoke # Invoke a tool (standalone)
Each tool invocation is recorded as a usage event. Tools that call external APIs (e.g., web search, code execution) may have additional per-invocation costs that are deducted from the user's credit balance.
Tools are organized by MCP server:
interface MCPToolEntry {
id: string;
server: string; // MCP server name (e.g., "hanzo-browser")
name: string; // Tool name (e.g., "navigate")
description: string;
inputSchema: JSONSchema;
outputSchema: JSONSchema;
costPerInvocation: number; // USD cents (0 for free tools)
category: string; // e.g., "browser", "filesystem", "search"
}
Agent Deployment
Cloud provides managed deployment for agents built with Agent SDK (HIP-0009):
POST /v1/agents # Deploy an agent
GET /v1/agents # List deployed agents
GET /v1/agents/:id # Agent details and status
PATCH /v1/agents/:id # Update agent configuration
DELETE /v1/agents/:id # Undeploy agent
POST /v1/agents/:id/runs # Start an agent run
GET /v1/agents/:id/runs # List runs
GET /v1/agents/:id/runs/:runId # Run details with step-by-step trace
POST /v1/agents/:id/runs/:runId/cancel # Cancel a running agent
Agent runs are traced at every step -- each LLM call, each tool invocation, each state transition is recorded. The run detail endpoint returns a full execution trace for debugging:
interface AgentRun {
id: string;
agentId: string;
status: "running" | "completed" | "failed" | "cancelled";
startedAt: Date;
completedAt: Date | null;
steps: AgentStep[];
totalCost: number; // USD cents
totalTokens: number;
error?: string;
}
interface AgentStep {
index: number;
type: "think" | "act" | "observe" | "tool_call" | "llm_call";
timestamp: Date;
durationMs: number;
input: string; // Summarized, no raw prompts
output: string; // Summarized, no raw completions
model?: string;
tokens?: number;
cost?: number;
toolName?: string;
}
Webhook Notifications
Cloud emits webhooks for key events. Consumers register endpoints and select event types:
POST /v1/webhooks # Register webhook endpoint
GET /v1/webhooks # List registered webhooks
PATCH /v1/webhooks/:id # Update webhook
DELETE /v1/webhooks/:id # Remove webhook
GET /v1/webhooks/:id/deliveries # Delivery log with payloads and responses
POST /v1/webhooks/:id/test # Send test event
Supported event types:
usage.threshold.reached # Spend/request/error alert triggered
api_key.created # New API key generated
api_key.rotated # API key rotated
api_key.revoked # API key revoked
agent.run.completed # Agent run finished
agent.run.failed # Agent run failed
credits.low # Balance below configured threshold
credits.depleted # Balance reached zero
member.invited # Team member invited
member.removed # Team member removed
Webhook payloads include an HMAC-SHA256 signature header (X-Webhook-Signature) computed with a per-webhook secret, enabling receivers to verify authenticity.
Implementation
Technology Stack
| Component | Technology | Rationale |
|---|---|---|
| Frontend | Next.js 14, React 18, Tailwind CSS | SSR for SEO, RSC for performance, Tailwind for consistency with Hanzo UI system |
| API | Node.js, Express | Same runtime as LLM Gateway; shared middleware and SDK code |
| Database | PostgreSQL + TimescaleDB | Relational for entities, time-series hypertables for usage events |
| Cache | Redis (HIP-0028 KV) | Session state, rate limit counters, real-time dashboard data |
| Auth | OAuth2 via IAM (HIP-0026) | Single sign-on across all Hanzo services |
| Payments | Commerce (HIP-0018) | PCI-compliant payment processing |
| Queue | Redis Streams | Async usage event processing, webhook delivery |
Database Schema (Core Tables)
-- Projects
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
org_id TEXT NOT NULL, -- IAM organization
name TEXT NOT NULL,
slug TEXT NOT NULL UNIQUE,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
-- API Keys
CREATE TABLE api_keys (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID NOT NULL REFERENCES projects(id),
user_id TEXT NOT NULL, -- IAM user who created it
name TEXT NOT NULL,
key_hash TEXT NOT NULL UNIQUE, -- SHA-256 of full key
key_prefix TEXT NOT NULL, -- "hk_live_" or "hk_test_"
key_suffix TEXT NOT NULL, -- Last 4 chars for display
scope JSONB NOT NULL DEFAULT '{}',
expires_at TIMESTAMPTZ,
revoked_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT now()
);
-- Usage Events (TimescaleDB hypertable)
CREATE TABLE usage_events (
id UUID DEFAULT gen_random_uuid(),
time TIMESTAMPTZ NOT NULL,
org_id TEXT NOT NULL,
project_id UUID NOT NULL,
user_id TEXT NOT NULL,
api_key_id UUID NOT NULL,
service TEXT NOT NULL,
model TEXT NOT NULL,
provider TEXT NOT NULL,
prompt_tokens INTEGER NOT NULL DEFAULT 0,
completion_tokens INTEGER NOT NULL DEFAULT 0,
cost_cents INTEGER NOT NULL DEFAULT 0,
latency_ms INTEGER NOT NULL DEFAULT 0,
ttft_ms INTEGER,
status TEXT NOT NULL,
agent_id UUID,
agent_run_id UUID,
metadata JSONB DEFAULT '{}',
PRIMARY KEY (id, time)
);
SELECT create_hypertable('usage_events', 'time');
-- Compression policy: compress chunks older than 7 days
SELECT add_compression_policy('usage_events', INTERVAL '7 days');
-- Retention policy: move to cold storage after 90 days
SELECT add_retention_policy('usage_events', INTERVAL '90 days');
-- Continuous aggregates for dashboard queries
CREATE MATERIALIZED VIEW usage_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS bucket,
org_id,
project_id,
model,
COUNT(*) AS request_count,
SUM(prompt_tokens) AS total_prompt_tokens,
SUM(completion_tokens) AS total_completion_tokens,
SUM(cost_cents) AS total_cost_cents,
AVG(latency_ms) AS avg_latency_ms,
COUNT(*) FILTER (WHERE status = 'error') AS error_count
FROM usage_events
GROUP BY bucket, org_id, project_id, model;
Deployment Architecture
Internet
|
[Traefik]
/ \
cloud.hanzo.ai api.hanzo.ai
| |
[Cloud Frontend] [Cloud API]
(Next.js SSR) (Express)
| |
+-------+-------+
|
[PostgreSQL + TimescaleDB]
[Redis]
|
+------------+------------+
| | |
[LLM Gateway] [Agent Runtime] [MCP Servers]
(HIP-0004) (HIP-0009) (HIP-0010)
|
[AI Providers]
OpenAI, Anthropic, Together, ...
All services run on the hanzo-k8s cluster. Cloud Frontend and Cloud API are separate Kubernetes deployments with independent scaling. The API is the only component that talks to downstream services -- the frontend communicates exclusively through the API.
Request Authentication Flow
1. Client sends request with Authorization: Bearer hk_live_abc123
2. Cloud API hashes the key: SHA-256("hk_live_abc123")
3. Lookup hash in api_keys table -> get project_id, user_id, scope
4. Check key not revoked, not expired
5. Check scope allows the requested service and model
6. Resolve user_id -> IAM balance check (cached in Redis, TTL 10s)
7. If balance > estimated cost, proceed
8. Proxy request to downstream service
9. On completion, record usage event and deduct actual cost from IAM
The balance check at step 6 uses an optimistic strategy: the cached balance is checked to avoid a round-trip to IAM on every request. The actual deduction at step 9 is authoritative. If the cache is stale and the user's balance has actually been exhausted, the deduction will fail and the request is still served (to avoid degrading user experience for a billing edge case) but subsequent requests will be blocked after the cache refreshes.
Security
API Key Security
- Keys are never stored in plaintext. Only the SHA-256 hash is persisted.
- The full key is returned exactly once at creation time. It cannot be retrieved afterward.
- Key rotation generates a new secret while preserving the key ID. The old key is revoked after a configurable grace period (default: 24 hours) to allow clients to transition.
- Revoked keys return HTTP 401 immediately. There is no grace period for revocation.
Scope Enforcement
API keys can be scoped to specific models, services, IP ranges, and spend limits. Scope is enforced at the Cloud API layer before any request reaches a downstream service. An over-scoped key (e.g., one that allows all models when the project only uses GPT-4) is a security risk. The dashboard displays warnings when keys have broader permissions than their actual usage pattern suggests.
Rate Limiting
Rate limiting is enforced per API key using Redis sliding window counters:
Key: ratelimit:{api_key_id}:{minute_bucket}
TTL: 120 seconds
Increment on each request
Reject with HTTP 429 when count > key.scope.rateLimit
The default rate limit is 60 requests per minute. Enterprise projects can request higher limits.
Audit Logging
All mutating operations on API keys, team membership, billing, and webhooks are logged to an append-only audit table:
CREATE TABLE audit_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
actor_id TEXT NOT NULL, -- IAM user ID
actor_email TEXT NOT NULL,
action TEXT NOT NULL, -- e.g., "api_key.create", "member.invite"
resource TEXT NOT NULL, -- e.g., "api_key:uuid", "project:uuid"
details JSONB NOT NULL, -- Action-specific metadata
ip_address INET NOT NULL,
user_agent TEXT
);
Audit logs are immutable. They cannot be modified or deleted through any API. Retention is indefinite for compliance purposes.
Data Isolation
Cloud never stores, logs, or inspects prompt content or completion content. Usage events record token counts and costs, not the text of requests or responses. This is a deliberate architectural constraint:
- Privacy: Prompts may contain PII, trade secrets, or sensitive instructions. Cloud has no business reason to retain them.
- Compliance: Storing prompt content would subject Cloud to data residency regulations in every jurisdiction where users operate. Storing only metadata avoids this.
- Performance: Prompt content can be megabytes per request. Storing it would require orders of magnitude more storage and make the usage database unwieldy.
If a team needs request logging for debugging, they enable it in the LLM Gateway (HIP-0004) with their own storage destination. Cloud is not in that path.
SOC 2 Compliance
For enterprise customers, Cloud maintains SOC 2 Type II compliance:
- Access controls: All internal access requires MFA and is logged.
- Change management: All infrastructure changes go through CI/CD with approval gates.
- Monitoring: Real-time alerting on anomalous access patterns.
- Encryption: All data encrypted at rest (AES-256) and in transit (TLS 1.3).
- Vendor management: Third-party AI providers are assessed for security posture.
OpenAI API Compatibility
Cloud exposes an OpenAI-compatible API so that existing applications can switch to Hanzo by changing only the base URL and API key:
# Before (OpenAI direct)
from openai import OpenAI
client = OpenAI(api_key="sk-...")
# After (Hanzo Cloud)
from openai import OpenAI
client = OpenAI(
api_key="hk_live_...",
base_url="https://cloud.hanzo.ai/v1"
)
# Same code works unchanged
response = client.chat.completions.create(
model="gpt-4-turbo",
messages=[{"role": "user", "content": "Hello"}]
)
This compatibility is achieved by implementing the OpenAI API spec at the Cloud API layer and translating requests to the LLM Gateway's internal format. The Gateway (HIP-0004) handles provider-specific translation.
Supported OpenAI-compatible endpoints:
POST /v1/chat/completions # Chat completions (streaming + non-streaming)
POST /v1/completions # Legacy completions
POST /v1/embeddings # Text embeddings
POST /v1/images/generations # Image generation
POST /v1/audio/transcriptions # Speech to text
POST /v1/audio/translations # Audio translation
GET /v1/models # Available models
SDK Integration
Python
from hanzo import Hanzo
client = Hanzo(api_key="hk_live_...")
# LLM inference
response = client.chat.completions.create(
model="claude-3-opus",
messages=[{"role": "user", "content": "Explain quantum computing"}]
)
# Usage check
usage = client.usage.summary()
print(f"This month: ${usage.total_cost / 100:.2f}")
# API key management
key = client.api_keys.create(name="production", scope={"models": ["gpt-4"]})
TypeScript
import { Hanzo } from '@hanzoai/sdk'
const client = new Hanzo({ apiKey: 'hk_live_...' })
// LLM inference
const response = await client.chat.completions.create({
model: 'claude-3-opus',
messages: [{ role: 'user', content: 'Explain quantum computing' }],
})
// Usage
const usage = await client.usage.summary()
console.log(`This month: $${(usage.totalCost / 100).toFixed(2)}`)
Go
import "github.com/hanzoai/go-sdk"
client := hanzo.NewClient("hk_live_...")
resp, err := client.Chat.Completions.Create(ctx, hanzo.ChatCompletionRequest{
Model: "claude-3-opus",
Messages: []hanzo.Message{{Role: "user", Content: "Explain quantum computing"}},
})
Migration Path
For teams currently using OpenAI, Anthropic, or other providers directly:
- Sign up at cloud.hanzo.ai and create a project.
- Generate an API key with appropriate scoping.
- Change two lines in your application: the base URL and API key.
- Monitor usage through the dashboard to understand cost and performance.
- Optimize by enabling model routing rules (e.g., route simple queries to cheaper models).
No code changes beyond the base URL and API key are required for basic functionality. Advanced features (team management, usage alerts, agent deployment) are additive.
Backwards Compatibility
Cloud maintains backwards compatibility with all previous API versions through URL versioning (/v1/, /v2/). Breaking changes are introduced only in new major versions. The previous version remains supported for a minimum of 12 months after a new version is released.
References
- HIP-0004: LLM Gateway
- HIP-0009: Agent SDK
- HIP-0010: MCP Integration
- HIP-0018: Payment Processing
- HIP-0026: Identity & Access Management
- HIP-0027: KMS Standard
- OpenAI API Reference
- TimescaleDB Documentation
Copyright
Copyright 2025 Hanzo AI, Inc. All rights reserved.