HIP-53: Visor Monitoring & Supervision Standard

Abstract

This proposal defines the monitoring, visualization, and supervision standard for the Hanzo ecosystem. Hanzo Visor provides real-time dashboards, AI-specific metrics, anomaly detection, alert management, SLA tracking, and cost attribution across all Hanzo services.

Visor is the presentation and intelligence layer that sits on top of the observability stack (HIP-0031, Zap) and the analytics platform (HIP-0017, Insights). Zap collects metrics, traces, and logs. Insights collects product analytics. Visor consumes both data streams and turns them into actionable dashboards, intelligent alerts, and cost reports. It is the single pane of glass through which operators, engineers, and business stakeholders understand the health, performance, and economics of the Hanzo platform.

Repository: github.com/hanzoai/visor Docker: ghcr.io/hanzoai/visor:latest API Port: 8053 Grafana Port: 3053 Production: visor.hanzo.ai on hanzo-k8s cluster (24.199.76.156)

Motivation

The Gap Between Collection and Understanding

HIP-0031 (Zap) solved the data collection problem. Prometheus scrapes metrics every 15 seconds. ClickHouse stores structured logs. OpenTelemetry traces connect requests across services. But raw data is not understanding.

An engineer staring at Prometheus's built-in expression browser cannot answer: "Is the LLM Gateway healthy right now?" They can write a PromQL query, interpret the result, compare it to yesterday, and decide. That takes minutes. A well-designed dashboard answers the question in two seconds.

The gap is wider for AI workloads. Traditional infrastructure monitoring measures CPU, memory, and request latency. AI services have additional dimensions that matter: tokens per second, cost per request, model error rates, cache hit ratios, time-to-first-token, and prompt/completion token ratios. No off-the-shelf Grafana dashboard covers these.

Why Monitoring AI Is Different

A typical web service processes a request in 50-200ms. The response size is predictable. Failure modes are well-understood: timeouts, 5xx errors, database connection exhaustion.

An LLM request takes 1-120 seconds. Response size varies by orders of magnitude (10 tokens to 4,096 tokens). Cost per request varies by 1000x depending on the model ($0.0001 for a small Zen model, $0.10 for a frontier model with long context). Failure modes include provider rate limits, content filtering, context length exceeded, model degradation (the model returns garbage but HTTP 200), and cost runaway (a bug sends expensive requests in a loop).

These failure modes require AI-specific metrics and AI-specific alerting logic. A traditional "error rate > 5%" alert misses model degradation entirely because the HTTP response is 200. A cost-aware alert that fires when spend exceeds $X/hour catches it immediately.

Why a Separate Service From O11y

The temptation is to add dashboards and alerting directly to the Zap sidecar (HIP-0031). We deliberately separate them for three reasons:

Separation of concerns: Zap is a data plane component. It runs as a sidecar in every pod, must be tiny (~15MB RSS), and must never become a bottleneck. Adding Grafana, alert evaluation, cost calculation, and anomaly detection to the sidecar would bloat it and create failure coupling.
Different scaling profiles: Zap scales with the number of pods (one sidecar per pod). Visor scales with the number of dashboards, alerts, and users viewing them. These are independent dimensions.
Different update cadences: Dashboards and alert rules change frequently as we add services and refine thresholds. The sidecar binary changes rarely. Coupling them means redeploying every sidecar to update a dashboard.

Visor = control plane. Zap = data plane. The same separation that exists between Kubernetes's API server (control plane) and kubelet (data plane).

Cost Attribution: The Business Case

At Hanzo's scale, LLM API costs are the largest operational expense. In January 2026, LLM provider spend exceeded compute infrastructure spend for the first time. Without per-org, per-user, per-model cost attribution, we cannot:

Bill customers accurately for API usage
Identify which models offer the best cost/quality ratio
Detect cost anomalies (a misconfigured agent calling GPT-4 in a loop)
Plan capacity and budget for the next quarter

Visor provides this attribution by joining LLM Gateway metrics (HIP-0031 hanzo_llm_* counters) with pricing data from provider rate cards, broken down by organization, user, model, and endpoint.

Design Philosophy

Why Grafana Over a Custom Dashboard

We evaluated three approaches for the visualization layer:

Approach	Build Time	Maintenance	Ecosystem	Extensibility
Custom React app	3-6 months	High (every panel custom)	None	Unlimited but costly
Grafana (open-source)	2-4 weeks	Low (community maintains)	100+ data source plugins	Custom plugins when needed
Datadog/New Relic SaaS	0	Zero (vendor manages)	Vendor-locked	Limited

Custom React app: Maximum control, maximum cost. Every chart, every filter, every time-range selector must be built and maintained. We estimated 3-6 engineer-months for a minimum viable dashboard, with ongoing maintenance consuming 20-30% of one engineer's time. This is a poor allocation when the same engineer could build AI features.

Grafana: Open-source (AGPL v3), battle-tested at scale (Netflix, Uber, GitLab all use it), supports Prometheus and ClickHouse natively, has a rich plugin ecosystem, and allows custom plugins for domain-specific visualizations. Setup time: days, not months. The trade-off is that Grafana's UI is opinionated -- but its opinions are good, refined over a decade of usage.

SaaS (Datadog/New Relic): Zero build time, but $12K-18K/month at our volume (see HIP-0031 cost analysis), vendor lock-in, and no custom AI-specific panels. The cost alone disqualifies this option.

Decision: Grafana with custom plugins for AI-specific visualizations. We get 90% of the dashboard functionality for free, and build the remaining 10% (AI metrics panels, cost attribution views) as Grafana plugins.

Why AI Monitoring AI (Anomaly Detection)

Static alert thresholds break down for AI workloads. LLM latency has high variance by nature: a 10-token completion takes 500ms, a 4096-token completion takes 30 seconds. Setting a static threshold of "P95 > 10s" either alerts constantly (too sensitive) or misses real degradation (too lenient).

Visor uses a lightweight anomaly detection model that learns normal behavior patterns per model and per endpoint. The model is deliberately simple -- a sliding-window z-score over the last 24 hours with seasonal adjustment for time-of-day patterns. No deep learning, no GPU required. The anomaly detector runs as a goroutine inside the Visor API server, consuming Prometheus query results.

This is "AI monitoring AI" in the most literal sense: a statistical model watching the behavior of language models. The meta-recursion is intentional. The monitoring model is small (megabytes of state), deterministic (reproducible anomaly scores), and explainable (z-score with a clear threshold). It does not suffer from the same failure modes as the models it monitors.

Integration Architecture

┌───────────────────────────────────────────────────────────────────┐
│                      Visor (visor.hanzo.ai)                       │
├──────────────────┬──────────────────┬─────────────────────────────┤
│  Grafana (:3053) │ Visor API (:8053)│  Anomaly Detector (internal)│
│  - AI Dashboards │ - Cost Engine    │  - Z-score per metric       │
│  - SLA Panels    │ - Alert Router   │  - Seasonal adjustment      │
│  - Custom Plugins│ - SLA Tracker    │  - 24h sliding window       │
└────────┬─────────┴────────┬─────────┴──────────┬──────────────────┘
         │                  │                     │
    ┌────▼────┐       ┌─────▼──────┐        ┌────▼─────┐
    │Prometheus│       │ClickHouse  │        │ Insights │
    │(HIP-0031)│       │(HIP-0031)  │        │(HIP-0017)│
    │ Metrics  │       │Logs/Traces │        │Analytics │
    └──────────┘       └────────────┘        └──────────┘

Visor reads from Prometheus (real-time metrics), ClickHouse (historical logs and traces), and Insights (business analytics). It does not duplicate data collection -- that remains Zap's responsibility (HIP-0031). Visor only consumes, transforms, visualizes, and alerts.

Specification

AI-Specific Metrics

Beyond the standard infrastructure metrics defined in HIP-0031, Visor tracks and visualizes AI-specific metrics that the Grafana dashboards present:

Token Throughput

# Tokens per second (input), 5-minute rate
sum(rate(hanzo_llm_tokens_total{direction="input"}[5m]))

# Tokens per second (output), per model
sum(rate(hanzo_llm_tokens_total{direction="output"}[5m])) by (model)

# Token ratio (output/input) -- measures model verbosity
sum(rate(hanzo_llm_tokens_total{direction="output"}[5m])) by (model)
  /
sum(rate(hanzo_llm_tokens_total{direction="input"}[5m])) by (model)

Latency Percentiles

# P50 latency by model
histogram_quantile(0.50, sum(rate(hanzo_llm_request_duration_seconds_bucket[5m])) by (le, model))

# P95 latency by model
histogram_quantile(0.95, sum(rate(hanzo_llm_request_duration_seconds_bucket[5m])) by (le, model))

# P99 latency by model
histogram_quantile(0.99, sum(rate(hanzo_llm_request_duration_seconds_bucket[5m])) by (le, model))

# Time-to-first-token (streaming requests only)
histogram_quantile(0.95, sum(rate(hanzo_llm_ttft_seconds_bucket[5m])) by (le, model))

Error Classification

# Error rate by provider and error type
sum(rate(hanzo_llm_provider_errors_total[5m])) by (provider, error)

# Model-level error rate (errors / total requests)
sum(rate(hanzo_llm_provider_errors_total[5m])) by (model)
  /
sum(rate(hanzo_llm_request_duration_seconds_count[5m])) by (model)

# Rate limit events (a leading indicator of provider saturation)
sum(rate(hanzo_llm_provider_errors_total{error="rate_limit"}[5m])) by (provider)

Cost Per Request

Cost is not a Prometheus metric -- it is computed by the Visor API server by joining token counts with pricing tables:

{
  "cost_per_request": {
    "model": "gpt-4-turbo",
    "input_tokens": 1500,
    "output_tokens": 500,
    "input_cost_per_1k": 0.01,
    "output_cost_per_1k": 0.03,
    "total_cost_usd": 0.030
  }
}

The Visor API exposes a /api/v1/costs endpoint that Grafana queries via the JSON API data source plugin. Pricing tables are maintained in a YAML configuration file:

# visor-pricing.yaml
providers:
  openai:
    gpt-4-turbo:
      input_per_1k_tokens: 0.01
      output_per_1k_tokens: 0.03
    gpt-4o:
      input_per_1k_tokens: 0.0025
      output_per_1k_tokens: 0.01
  anthropic:
    claude-3-opus:
      input_per_1k_tokens: 0.015
      output_per_1k_tokens: 0.075
    claude-3-sonnet:
      input_per_1k_tokens: 0.003
      output_per_1k_tokens: 0.015
  together:
    zen-72b:
      input_per_1k_tokens: 0.0009
      output_per_1k_tokens: 0.0009
    zen-8b:
      input_per_1k_tokens: 0.0002
      output_per_1k_tokens: 0.0002

Cost Attribution

Visor attributes costs along four dimensions:

Dimension	Source	Granularity
Organization	`hanzo.org_id` span attribute	Per-org totals, hourly/daily/monthly
User	`user_id` property from LLM Gateway logs	Per-user totals
Model	`model` label on `hanzo_llm_tokens_total`	Per-model cost breakdown
Endpoint	`http.path` span attribute	Per-API-endpoint cost

Cost Attribution Query

-- ClickHouse: daily cost by organization and model (last 30 days)
SELECT
    toDate(ts) AS day,
    JSONExtractString(attributes, 'org_id') AS org,
    JSONExtractString(attributes, 'model') AS model,
    sum(JSONExtractFloat(attributes, 'cost_usd')) AS total_cost_usd,
    sum(JSONExtractUInt(attributes, 'tokens.input')) AS input_tokens,
    sum(JSONExtractUInt(attributes, 'tokens.output')) AS output_tokens
FROM hanzo_logs
WHERE service = 'llm-gateway'
  AND msg = 'request completed'
  AND ts >= now() - INTERVAL 30 DAY
GROUP BY day, org, model
ORDER BY day DESC, total_cost_usd DESC

SLA Monitoring

Visor tracks uptime and performance SLAs per service. SLA definitions are stored in the Visor API configuration:

# visor-slas.yaml
slas:
  llm-gateway:
    availability: 99.9          # percentage uptime target
    p95_latency_ms: 10000       # max acceptable P95 latency
    error_rate: 0.01            # max 1% error rate
    measurement_window: 30d     # rolling 30-day window

  iam:
    availability: 99.95
    p95_latency_ms: 500
    error_rate: 0.001
    measurement_window: 30d

  insights-capture:
    availability: 99.99
    p95_latency_ms: 100
    error_rate: 0.001
    measurement_window: 30d

SLA status is computed from Prometheus metrics:

# Availability: fraction of time the service returned non-5xx responses
1 - (
  sum(rate(hanzo_llm_provider_errors_total{error=~"5.."}[30d]))
  /
  sum(rate(hanzo_llm_request_duration_seconds_count[30d]))
)

# SLA burn rate: how fast are we consuming our error budget?
# A burn rate of 1.0 means we will exactly exhaust the budget over the window.
# Above 1.0 means we are burning faster than sustainable.
(
  sum(rate(hanzo_llm_provider_errors_total[1h]))
  /
  sum(rate(hanzo_llm_request_duration_seconds_count[1h]))
)
/
(1 - 0.999)  # error budget = 1 - SLA target

Anomaly Detection

The Visor anomaly detector evaluates metrics every 60 seconds. For each monitored metric, it computes a z-score against the 24-hour baseline, adjusted for time-of-day seasonality.

Algorithm

For each metric M at time T:
  1. Fetch M values for the same hour-of-day over the last 7 days
  2. Compute mean(M) and stddev(M) from that window
  3. Fetch current M value
  4. z_score = (current - mean) / stddev
  5. If |z_score| > threshold (default 3.0), emit anomaly alert

Configuration

# visor-anomaly.yaml
anomaly_detection:
  enabled: true
  evaluation_interval: 60s
  default_threshold: 3.0        # standard deviations
  seasonal_window: 7d           # lookback for baseline
  min_data_points: 48           # minimum samples before alerting (avoid cold-start noise)

  metrics:
    - name: hanzo_llm_request_duration_seconds
      quantile: 0.95
      threshold: 3.0
      labels: [model, provider]

    - name: hanzo_llm_tokens_total
      rate_interval: 5m
      threshold: 2.5            # more sensitive for throughput drops
      labels: [direction, model]

    - name: hanzo_llm_provider_errors_total
      rate_interval: 5m
      threshold: 2.0            # very sensitive for error spikes
      labels: [provider, error]

Alert Management

Visor routes alerts to multiple channels. Prometheus Alertmanager evaluates rules; Visor enriches alerts with context (cost impact, affected users, historical frequency) and routes them.

Alert Routing

# visor-alerts.yaml
routes:
  - match:
      severity: critical
    receivers: [pagerduty, slack-critical]
    repeat_interval: 5m
    group_wait: 30s

  - match:
      severity: warning
    receivers: [slack-warnings]
    repeat_interval: 30m
    group_wait: 2m

  - match:
      severity: info
      type: anomaly
    receivers: [slack-ai-ops]
    repeat_interval: 1h

  - match:
      type: cost
    receivers: [slack-finance, webhook-billing]
    repeat_interval: 6h

receivers:
  pagerduty:
    type: pagerduty
    service_key: "${PAGERDUTY_SERVICE_KEY}"
    severity_map:
      critical: critical
      warning: warning

  slack-critical:
    type: slack
    webhook_url: "${SLACK_CRITICAL_WEBHOOK}"
    channel: "#incidents"
    mention: "@oncall"

  slack-warnings:
    type: slack
    webhook_url: "${SLACK_WARNINGS_WEBHOOK}"
    channel: "#monitoring"

  slack-ai-ops:
    type: slack
    webhook_url: "${SLACK_AI_OPS_WEBHOOK}"
    channel: "#ai-ops"

  webhook-billing:
    type: webhook
    url: "https://commerce.hanzo.ai/api/v1/alerts"
    method: POST
    headers:
      Authorization: "Bearer ${COMMERCE_API_TOKEN}"

AI-Specific Alert Rules

groups:
  - name: visor-ai
    rules:
      - alert: LLMCostSpike
        expr: |
          sum(rate(hanzo_llm_tokens_total[5m]) * on(model)
            group_left visor_model_cost_per_token) by (org_id)
          > 10
        for: 5m
        labels:
          severity: warning
          type: cost
        annotations:
          summary: "Org {{ $labels.org_id }} spending >$10/min on LLM"
          runbook: "https://visor.hanzo.ai/runbooks/cost-spike"

      - alert: ModelDegradation
        expr: |
          visor_anomaly_score{metric="hanzo_llm_request_duration_seconds"} > 3.0
            and
          rate(hanzo_llm_provider_errors_total[5m]) < 0.01
        for: 10m
        labels:
          severity: warning
          type: anomaly
        annotations:
          summary: "Model {{ $labels.model }} latency anomaly (z={{ $value }}) with low error rate -- possible degradation"

      - alert: ProviderRateLimitEscalation
        expr: |
          rate(hanzo_llm_provider_errors_total{error="rate_limit"}[5m])
          / rate(hanzo_llm_request_duration_seconds_count[5m])
          > 0.10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Provider {{ $labels.provider }} rate-limiting >10% of requests"

      - alert: SLABurnRateHigh
        expr: visor_sla_burn_rate > 5.0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.service }} burning SLA error budget 5x faster than sustainable"

      - alert: CacheEfficiencyDrop
        expr: |
          rate(hanzo_llm_cache_hit_total[1h])
          / (rate(hanzo_llm_cache_hit_total[1h]) + rate(hanzo_llm_cache_miss_total[1h]))
          < 0.20
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Semantic cache hit rate below 20% -- check cache invalidation"

Grafana Dashboards

Visor ships five standard dashboards, provisioned automatically via Grafana's dashboard-as-code:

1. AI Operations Overview

The primary dashboard for AI platform operators. Six rows, twelve panels:

Row	Panels	Purpose
Request Flow	Request rate (stacked by provider), Active requests gauge	Traffic volume
Latency	P50/P95/P99 time series by model, TTFT distribution	Performance
Tokens	Input/output token rate, Token ratio by model	Throughput
Errors	Error rate by provider, Error classification pie chart	Reliability
Cost	Hourly cost accumulation, Cost by org (top 10)	Economics
Cache	Cache hit ratio gauge, Cache savings (USD avoided)	Efficiency

2. Cost Attribution

Breakdown of LLM spend across every dimension. Designed for finance and billing teams.

Total spend: hourly, daily, monthly time series
By organization: stacked bar chart
By model: pie chart with cost-per-1K-token annotation
By user: table with top 50 spenders
By endpoint: treemap visualization
Budget tracking: actual vs. projected spend with alert threshold lines

3. SLA Status

Traffic-light dashboard for service level objectives.

Per-service availability gauge (green/yellow/red)
Error budget remaining (percentage bar)
Burn rate trend (7-day time series)
Incident timeline (annotations from PagerDuty)
Monthly SLA report summary table

4. Anomaly Detection

Real-time view of the anomaly detector output.

Anomaly score time series per metric (z-score with threshold line)
Active anomalies table (metric, labels, score, duration)
Historical anomaly frequency heatmap (hour-of-day vs day-of-week)
False positive rate tracking (operator feedback loop)

5. Infrastructure Health

Extended version of HIP-0031's infrastructure dashboard, with Visor-specific additions:

Cross-service dependency graph (which services call which)
Pod resource utilization with headroom indicators
Database connection pool saturation
Kafka consumer lag across all topics

Custom Grafana Plugins

Visor includes two custom Grafana plugins for AI workload visualization:

hanzo-ai-cost-panel

A panel plugin that renders cost attribution with drill-down. Click on an organization to see per-user breakdown. Click on a user to see per-model breakdown. Click on a model to see per-endpoint breakdown. This hierarchical drill-down is not possible with standard Grafana panels.

Plugin ID: hanzo-ai-cost-panel
Type: Panel
Data sources: JSON API (Visor /api/v1/costs)
Install: grafana-cli plugins install hanzo-ai-cost-panel

hanzo-sla-gauge

A panel plugin that renders SLA status as a gauge with error budget consumption. The gauge shows:

Current availability percentage (large number, color-coded)
Error budget remaining (progress bar)
Burn rate arrow (up/down/stable)
Time until budget exhaustion at current burn rate

Plugin ID: hanzo-sla-gauge
Type: Panel
Data sources: Prometheus (via Visor SLA recording rules)
Install: grafana-cli plugins install hanzo-sla-gauge

API Specification

The Visor API server (port 8053) exposes REST endpoints consumed by Grafana, external alerting systems, and the Hanzo CLI.

GET /api/v1/costs

Returns cost data for Grafana's JSON API data source.

GET /api/v1/costs?org=hanzo&from=2026-02-01&to=2026-02-23&group_by=model HTTP/1.1
Host: visor.hanzo.ai
Authorization: Bearer ${VISOR_API_TOKEN}

Response:

{
  "total_cost_usd": 4231.50,
  "period": {"from": "2026-02-01", "to": "2026-02-23"},
  "breakdown": [
    {"model": "gpt-4-turbo", "cost_usd": 1850.20, "requests": 62340, "tokens": 45000000},
    {"model": "claude-3-sonnet", "cost_usd": 1120.80, "requests": 89200, "tokens": 38000000},
    {"model": "zen-72b", "cost_usd": 620.30, "requests": 340000, "tokens": 120000000},
    {"model": "zen-8b", "cost_usd": 640.20, "requests": 1200000, "tokens": 280000000}
  ]
}

GET /api/v1/sla

Returns current SLA status for all monitored services.

GET /api/v1/sla HTTP/1.1
Host: visor.hanzo.ai
Authorization: Bearer ${VISOR_API_TOKEN}

Response:

{
  "services": [
    {
      "name": "llm-gateway",
      "target": 99.9,
      "current": 99.94,
      "budget_remaining_pct": 42.0,
      "burn_rate": 0.8,
      "window": "30d",
      "status": "healthy"
    },
    {
      "name": "iam",
      "target": 99.95,
      "current": 99.98,
      "budget_remaining_pct": 71.0,
      "burn_rate": 0.4,
      "window": "30d",
      "status": "healthy"
    }
  ]
}

GET /api/v1/anomalies

Returns active anomalies detected by the anomaly engine.

GET /api/v1/anomalies?active=true HTTP/1.1
Host: visor.hanzo.ai
Authorization: Bearer ${VISOR_API_TOKEN}

Response:

{
  "anomalies": [
    {
      "metric": "hanzo_llm_request_duration_seconds",
      "labels": {"model": "gpt-4-turbo", "quantile": "0.95"},
      "z_score": 3.4,
      "current_value": 18.2,
      "baseline_mean": 8.1,
      "baseline_stddev": 2.97,
      "detected_at": "2026-02-23T14:30:00Z",
      "duration_seconds": 600
    }
  ]
}

Implementation

Deployment Architecture

Visor runs three containers on the hanzo-k8s cluster:

Component	Image	Port	CPU	Memory	Purpose
`visor-api`	`ghcr.io/hanzoai/visor:latest`	8053	250m	256Mi	Cost engine, SLA tracker, anomaly detector, alert router
`visor-grafana`	`grafana/grafana-oss:11.x`	3053	250m	512Mi	Dashboards, custom plugins
`visor-alertmanager`	`prom/alertmanager:0.27`	9093	100m	128Mi	Alert deduplication, routing, silencing

Total resource footprint: 600m CPU, 896Mi memory. Minimal overhead for the value delivered.

Kubernetes Manifests

apiVersion: apps/v1
kind: Deployment
metadata:
  name: visor-api
  namespace: hanzo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: visor-api
  template:
    metadata:
      labels:
        app: visor-api
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8053"
    spec:
      containers:
      - name: visor-api
        image: ghcr.io/hanzoai/visor:latest
        ports:
        - containerPort: 8053
          name: api
        env:
        - name: PROMETHEUS_URL
          value: "http://prometheus.hanzo.svc:9090"
        - name: CLICKHOUSE_URL
          value: "tcp://clickhouse.hanzo.svc:9000"
        - name: ALERTMANAGER_URL
          value: "http://visor-alertmanager.hanzo.svc:9093"
        volumeMounts:
        - name: config
          mountPath: /etc/visor
        resources:
          requests: { cpu: "100m", memory: "128Mi" }
          limits:   { cpu: "250m", memory: "256Mi" }
        readinessProbe:
          httpGet:
            path: /health
            port: 8053
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /health
            port: 8053
          initialDelaySeconds: 10
          periodSeconds: 30
      volumes:
      - name: config
        configMap:
          name: visor-config
---
apiVersion: v1
kind: Service
metadata:
  name: visor-api
  namespace: hanzo
spec:
  selector:
    app: visor-api
  ports:
  - port: 8053
    targetPort: 8053
    name: api

Grafana Provisioning

Dashboards are provisioned as code via ConfigMap. Data sources connect to Prometheus, ClickHouse, and the Visor API:

# grafana-datasources.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus.hanzo.svc:9090
    isDefault: true

  - name: ClickHouse
    type: grafana-clickhouse-datasource
    access: proxy
    jsonData:
      host: clickhouse.hanzo.svc
      port: 9000
      defaultDatabase: default

  - name: Visor API
    type: marcusolsson-json-datasource
    access: proxy
    url: http://visor-api.hanzo.svc:8053/api/v1
    jsonData:
      httpHeaderName1: Authorization
    secureJsonData:
      httpHeaderValue1: "Bearer ${VISOR_API_TOKEN}"

  - name: Insights
    type: grafana-clickhouse-datasource
    access: proxy
    jsonData:
      host: insights-clickhouse.hanzo.svc
      port: 9000
      defaultDatabase: posthog

Visor API: Go Implementation

The Visor API server is written in Go for the same reasons as Zap (HIP-0031): small binary, fast startup, low memory, ecosystem alignment.

package main

import (
    "net/http"
    "time"

    "github.com/hanzoai/visor/anomaly"
    "github.com/hanzoai/visor/cost"
    "github.com/hanzoai/visor/sla"
    "github.com/prometheus/client_golang/api"
)

func main() {
    promClient, _ := api.NewClient(api.Config{
        Address: envOrDefault("PROMETHEUS_URL", "http://prometheus:9090"),
    })

    costEngine := cost.NewEngine("visor-pricing.yaml")
    slaTracker := sla.NewTracker("visor-slas.yaml", promClient)
    detector := anomaly.NewDetector("visor-anomaly.yaml", promClient)

    // Start background loops
    go slaTracker.Run(30 * time.Second)
    go detector.Run(60 * time.Second)

    mux := http.NewServeMux()
    mux.HandleFunc("/health", healthHandler)
    mux.HandleFunc("/api/v1/costs", costEngine.Handler)
    mux.HandleFunc("/api/v1/sla", slaTracker.Handler)
    mux.HandleFunc("/api/v1/anomalies", detector.Handler)
    mux.HandleFunc("/metrics", metricsHandler)

    http.ListenAndServe(":8053", mux)
}

Security Considerations

Access Control

Grafana integrates with Hanzo IAM (hanzo.id) via OAuth2 for single sign-on. Role mapping:

IAM Role	Grafana Role	Access
`org:admin`	Admin	Full dashboard CRUD, data source config, user management
`org:member`	Editor	View and edit dashboards, cannot modify data sources
`org:viewer`	Viewer	Read-only dashboard access

# grafana.ini
[auth.generic_oauth]
enabled = true
name = Hanzo IAM
client_id = ${VISOR_OAUTH_CLIENT_ID}
client_secret = ${VISOR_OAUTH_CLIENT_SECRET}
auth_url = https://hanzo.id/oauth/authorize
token_url = https://hanzo.id/oauth/token
api_url = https://hanzo.id/api/userinfo
scopes = openid profile email
role_attribute_path = contains(groups[*], 'admin') && 'Admin' || contains(groups[*], 'editor') && 'Editor' || 'Viewer'
allow_sign_up = true

API Authentication

The Visor API requires a bearer token on all /api/v1/* endpoints. Tokens are issued through Hanzo IAM and validated on each request. The /health and /metrics endpoints are unauthenticated (required for Kubernetes probes and Prometheus scraping).

Network Isolation

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: visor-api-policy
  namespace: hanzo
spec:
  podSelector:
    matchLabels:
      app: visor-api
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: visor-grafana
    - namespaceSelector:
        matchLabels:
          name: monitoring
    ports:
    - port: 8053
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: prometheus
    ports:
    - port: 9090
  - to:
    - podSelector:
        matchLabels:
          app: clickhouse
    ports:
    - port: 9000
  - to:
    - podSelector:
        matchLabels:
          app: visor-alertmanager
    ports:
    - port: 9093

Visor API can only read from Prometheus and ClickHouse, and write to Alertmanager. It has no access to user databases, IAM internals, or the internet.

Sensitive Data Handling

Cost data and SLA metrics are not PII, but they are commercially sensitive. The Visor API does not log request bodies or response payloads. Cost attribution data is only accessible to users with org:admin or org:member roles. Per-user cost breakdowns require org:admin.

References

HIP-0031: Observability & Metrics Standard -- Data collection layer (Zap)
HIP-0017: Analytics Event Standard -- Product analytics (Insights)
HIP-0004: LLM Gateway -- LLM metrics source
HIP-0030: Event Streaming Standard -- Kafka infrastructure
Grafana Documentation
Prometheus Alertmanager
Grafana Plugin Development
Google SRE: Alerting on SLOs
Visor Repository -- Reference implementation

Visor Monitoring & Supervision Standard

HIP-53: Visor Monitoring & Supervision Standard

Abstract

Motivation

The Gap Between Collection and Understanding

Why Monitoring AI Is Different

Why a Separate Service From O11y

Cost Attribution: The Business Case

Design Philosophy

Why Grafana Over a Custom Dashboard

Why AI Monitoring AI (Anomaly Detection)

Integration Architecture

Specification

AI-Specific Metrics

Token Throughput

Latency Percentiles

Error Classification

Cost Per Request

Cost Attribution

Cost Attribution Query

SLA Monitoring

Anomaly Detection

Algorithm

Configuration

Alert Management

Alert Routing

AI-Specific Alert Rules

Grafana Dashboards

1. AI Operations Overview

2. Cost Attribution

3. SLA Status

4. Anomaly Detection

5. Infrastructure Health

Custom Grafana Plugins

hanzo-ai-cost-panel

hanzo-sla-gauge

API Specification

GET /api/v1/costs

GET /api/v1/sla

GET /api/v1/anomalies

Implementation

Deployment Architecture

Kubernetes Manifests

Grafana Provisioning

Visor API: Go Implementation

Security Considerations

Access Control

API Authentication

Network Isolation

Sensitive Data Handling

References

Copyright