2026-05-15 · DataMesh Consulting
15 May — Intelligence dashboard gets weekly quotas, cached-vs-billable split, and live provider rate-limit panel
Three small but operationally important additions to the operator dashboard. Weekly budget cards stop us silently overspending across a week of normal-looking days. The 24-hour LLM call count now separates cached hits from real billable calls, so the headline number reflects actual spend instead of being padded by cache returns. And a new provider rate-limit panel captures Moonshot's `x-ratelimit-*` response headers in real time so we can see how close we are to throttling before it bites.
Where this picks up from yesterday
The V3 self-healing refactor that landed yesterday gave us the
data: every Kimi call writes an analytics.llm_call_logs row
with model, tokens in/out, duration, cached flag, and siteId.
The Intelligence dashboard tab gave us the first read on it
— daily token spend, per-site attribution, top keywords by
cost.
What it didn't give us was forward visibility. Three blind spots remained, and today's commit closes them.
1 — Weekly budget cards
The existing dashboard tracked daily budget usage against
AI_DAILY_TOKEN_BUDGET / AI_DAILY_USD_BUDGET. Useful, but
deceptive at a weekly level: seven days at 70% of daily quota
each look fine on every individual day card and add up to a
month-end surprise.
New cards alongside the daily ones:
- Weekly tokens used / budget: accumulated total since
- Weekly USD used / budget: same window, priced via the same
LLM_PRICING table that powers the daily cards.
Both default to 7× the daily budget when
AI_WEEKLY_TOKEN_BUDGET / AI_WEEKLY_USD_BUDGET aren't set
explicitly, so the cards render meaningfully even with no
operator config. Override them when the actual weekly target
isn't 7× daily, for example when weekend usage is expected to
be lower.
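A minimal sketch of that fallback, assuming the budgets arrive as plain string environment values. Only the four variable names come from the description above; `resolveWeeklyBudget`, `parsePositive`, and the `WeeklyBudget` shape are hypothetical names for illustration, not the shipped code.

```typescript
// Hypothetical helper types; only the env var names are taken from the post.
type Env = Record<string, string | undefined>;

interface WeeklyBudget {
  tokens: number;
  usd: number;
}

// Parse a positive number from an env value; undefined if unset or invalid.
function parsePositive(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : undefined;
}

// Use the explicit weekly budget when set, otherwise fall back to 7x daily.
function resolveWeeklyBudget(env: Env): WeeklyBudget {
  const dailyTokens = parsePositive(env["AI_DAILY_TOKEN_BUDGET"]) ?? 0;
  const dailyUsd = parsePositive(env["AI_DAILY_USD_BUDGET"]) ?? 0;
  return {
    tokens: parsePositive(env["AI_WEEKLY_TOKEN_BUDGET"]) ?? dailyTokens * 7,
    usd: parsePositive(env["AI_WEEKLY_USD_BUDGET"]) ?? dailyUsd * 7,
  };
}
```

With a daily USD budget of 5 and no weekly override, the weekly USD card would show a budget of 35. Setting only AI_WEEKLY_USD_BUDGET=30 changes the USD card and leaves the token card on the 7× default.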
2 — Cached vs billable, separated
The "LLM Calls (24h)" tile had a credibility problem. Every
cached embedding hit, every dedup-layer short-circuit, every
Layer-2 page-fingerprint hit produced an
analytics.llm_call_logs row with cachedResult=true. Those
are wins — they're the savings the V3 dedup layers deliver —
but they were padding the headline call count by ~30-50% on
busy days, making the tile look like we were doing far more
LLM work than we actually were.
Today's rename and split:
- Tile is now "Billable Calls (24h)" — strictly
cachedResult=false rows. This is the number that
corresponds to actual Moonshot/Kimi cost.
- Sub-line below: "Cached: N · Total events: M" so the
Combined with yesterday's V3 S2 + S4 (Layer-3 tender-content dedup, Layer-2 page-fingerprint dedup), the savings are now plain in the UI rather than implicit in the cost-delta math.
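The tile arithmetic can be sketched as follows, assuming the rows have already been filtered to the last 24 hours. The `cachedResult` flag and the sub-line wording come from the description above; the `LlmCallLogRow` and `CallTileCounts` types and both function names are hypothetical.

```typescript
// Minimal row shape: only the cachedResult flag matters for the split.
interface LlmCallLogRow {
  cachedResult: boolean;
}

interface CallTileCounts {
  billable: number; // cachedResult=false rows, the headline number
  cached: number;   // cachedResult=true rows
  total: number;    // every logged event
}

// Split a window of call-log rows into billable and cached counts.
function splitCalls(rows: readonly LlmCallLogRow[]): CallTileCounts {
  const cached = rows.filter((r) => r.cachedResult).length;
  return { billable: rows.length - cached, cached, total: rows.length };
}

// Render the sub-line shown under the "Billable Calls (24h)" tile.
function formatSubLine(c: CallTileCounts): string {
  return `Cached: ${c.cached} · Total events: ${c.total}`;
}
```

In a real deployment the counting would more likely happen in the database query than in application code. The sketch only shows which rows feed which number.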
3 — Provider rate-limit panel
The thing that scared us most about scaling Hermes calls was
hitting Moonshot's per-key rate limit without warning. Up to
now we'd see 429s in the application logs and infer "we got
close." A new ProviderRatelimitService lifts the
`x-ratelimit-*` response headers off every Moonshot HTTP
call (we already had the underlying axios interceptors —
this just captures the existing data) and writes a snapshot
to Redis keyed on (provider, model) with a 1h TTL.
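A rough sketch of the capture step, under stated assumptions: an in-memory `Map` stands in for Redis so the example runs on its own, and the `ratelimit:provider:model` key format, the `SnapshotStore` class, and the `RateLimitSnapshot` shape are all hypothetical. Only the header prefix, the (provider, model) keying, and the 1h TTL come from the description above.

```typescript
type HeaderBag = Record<string, string | undefined>;

interface RateLimitSnapshot {
  provider: string;
  model: string;
  capturedAtMs: number;
  headers: Record<string, string>; // x-ratelimit-* entries only, lowercased names
}

// Keep only x-ratelimit-* headers; header names are case-insensitive.
function extractRateLimitHeaders(headers: HeaderBag): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase();
    if (key.startsWith("x-ratelimit-") && value !== undefined) out[key] = value;
  }
  return out;
}

// In-memory stand-in for the Redis snapshot store (assumption, not the real service).
class SnapshotStore {
  private readonly entries = new Map<string, { snapshot: RateLimitSnapshot; expiresAtMs: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 60 * 60 * 1000) {
    this.ttlMs = ttlMs; // 1h TTL by default
  }

  // Hypothetical key format for the (provider, model) pair.
  static key(provider: string, model: string): string {
    return `ratelimit:${provider}:${model}`;
  }

  // Overwrite the snapshot for this pair; skip responses with no rate-limit headers.
  record(provider: string, model: string, headers: HeaderBag, nowMs: number): void {
    const rl = extractRateLimitHeaders(headers);
    if (Object.keys(rl).length === 0) return;
    this.entries.set(SnapshotStore.key(provider, model), {
      snapshot: { provider, model, capturedAtMs: nowMs, headers: rl },
      expiresAtMs: nowMs + this.ttlMs,
    });
  }

  // Return the latest snapshot, or undefined once the TTL has passed.
  get(provider: string, model: string, nowMs: number): RateLimitSnapshot | undefined {
    const k = SnapshotStore.key(provider, model);
    const entry = this.entries.get(k);
    if (entry === undefined) return undefined;
    if (nowMs >= entry.expiresAtMs) {
      this.entries.delete(k);
      return undefined;
    }
    return entry.snapshot;
  }
}
```

A response interceptor could call `record` with the response headers and the model name of the request. With Redis itself, the expiry would be set on the key at write time instead of being checked on read as it is here.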
Surfaced at GET /v1/dashboard/intelligence/ratelimits:
- Remaining requests in window (x-ratelimit-remaining-requests)
- Remaining tokens in window (x-ratelimit-remaining-tokens)
- Reset timer (x-ratelimit-reset-*, normalized to seconds-
- Usage bar showing consumed / limit, colour-coded — green
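The consumed / limit figure behind the usage bar could be derived roughly as below. This is a sketch under an assumption: that each snapshot carries a matching limit value (for example an `x-ratelimit-limit-requests` header) next to the remaining value, which the post does not state. `usageFromHeaders` and `UsageBar` are hypothetical names, and the colour thresholds are left out because they are not given here.

```typescript
interface UsageBar {
  consumed: number;
  limit: number;
  fraction: number; // consumed / limit, within [0, 1]
}

// Parse a non-negative number from a raw header value; undefined if missing or invalid.
function parseCount(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : undefined;
}

// Derive consumed / limit from the limit and remaining header values.
function usageFromHeaders(
  limitRaw: string | undefined,
  remainingRaw: string | undefined,
): UsageBar | undefined {
  const limit = parseCount(limitRaw);
  const remaining = parseCount(remainingRaw);
  if (limit === undefined || remaining === undefined || limit === 0) return undefined;
  // Clamp so an inconsistent header pair never yields a negative or >100% bar.
  const consumed = Math.min(limit, Math.max(0, limit - remaining));
  return { consumed, limit, fraction: consumed / limit };
}
```

For a limit of 100 and 25 remaining, this gives 75 consumed and a fraction of 0.75. Returning `undefined` for missing or malformed values lets the panel hide the bar instead of drawing a misleading one.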
Wired into three call paths:
- backend/.../kimi.service: direct Kimi calls (match,
- backend/.../embedding.service: embedding requests.
- Forwarded from Hermes HTTP fallback via /analyze-site → relearn.processor, so site-learner calls show up in the panel even though they originate in Hermes.
One honest gap to call out: CLI subprocess paths (Kimi Code CLI) don't surface response headers, so those calls produce no rate-limit snapshots. The panel only reflects HTTP-fallback usage. CLI quota is subscription-billed rather than per-token, so this is consistent: the panel is about per-token API limits, which only the HTTP path can hit.
Why these three together
Each of these is a piece of the same picture: before, we could only see what AI spend had happened. After today, we can see what's about to happen:
- Weekly cards catch trends across days that look fine
- Cached/billable split shows whether our caching layers are
- Rate-limit panel warns us before a 429 storm rather than
Combined with yesterday's site-health audit + auto-relearn, the operator dashboard is now closer to a system you can run without watching it constantly. That's the bar we've been trying to clear.
Status & what's next
Backend code shipped tonight. Dashboard UI shipped in the
same commit (a single Intelligence-tab update). No data
migration — analytics.llm_call_logs already had the
columns we needed; the rate-limit Redis keys self-populate
on first call after deploy.
What's queued next:
- V3 S7 — admin surfaces in web + iOS for the site-
- V3 S8 — docs sweep. AGENTS.md, SYSTEM-STATE.md, and