This project is Phase 2 of the Cloud Costs Monitor platform. Where Phase 1 focused on automated data collection, anomaly detection, and scheduled reporting, Phase 2 adds a real-time conversational interface: users open a browser, type a question like “What did team X spend on cloud Y last quarter?”, and the system autonomously writes and executes the SQL, verifies the numbers, and returns a grounded answer — all within seconds.
The chat is not a thin wrapper around an LLM. It is a multi-layered pipeline that enforces data integrity at every step: the AI decides which queries to run, a SQL guard validates every statement before execution, a grounding engine ensures data questions are backed by actual query results, and a post-response number verifier checks that every dollar figure and percentage in the answer traces back to a value returned by SQL. If the AI hallucinates a number, the system catches it.
## Key Contributions
- **Multi-turn LLM tool-calling loop** with up to 10 rounds per request: the model autonomously decides which SQL queries and configuration lookups to execute, iterating until it has enough data to answer. Tool definitions are loaded from a structured JSON schema, not hardcoded.
- **8 LLM-callable tools** exposing the full cost data model: `run_sql_query` (read-only SQL execution), `get_owner_cte`/`get_vendor_cte`/`get_cost_type_cte` (CTE generators from `config.yaml`), `get_category_config`, `get_owner_config`, `get_db_date_range`, and `get_db_schema`.
- **Multi-layer SQL defense-in-depth**: physical read-only connection (`?mode=ro`), ATTACH prevention (`setlimit`), sqlparse-based statement validation (rejects non-SELECT, DDL, DML, and 20 forbidden keywords), row cap (5,000), execution timeout (30 s via progress handler), query-cost preflight analysis (EXPLAIN QUERY PLAN warns about full table scans), and disabled `load_extension`.
- **Grounding enforcement**: a heuristic classifier detects data questions (cost, spend, budget, forecast, etc.) and requires at least one successful SQL query before the answer is accepted. Ungrounded answers receive a disclaimer instead of fabricated numbers.
- **Anti-hallucination number verifier**: after every response, a dedicated module extracts every dollar amount, percentage, and contextual count from the answer text and checks each against the SQL result sets from the same turn. Unmatched figures are flagged as warnings; an optional auto-retry mode re-prompts the LLM to self-correct.
- **SSE streaming endpoint** (`POST /api/v1/chat/stream`): real-time Server-Sent Events deliver `tool_start`, `tool_end`, `delta` (answer text), `done` (full response JSON), and `error` events — the frontend shows tool progress live while the model is still working.
- **Per-user authentication with RBAC**: password hashing, admin and user roles, signed cookie sessions, CSRF double-submit pattern, sliding idle expiry (configurable) with a hard absolute ceiling, remember-me persistent device tokens using a selector+validator pattern (encrypted and hashed, rotated on each use), and a configurable concurrent session limit per user.
- **Admin dashboard**: browser-based UI for user management (create, update, disable, reset password), active session listing and forced revocation, an audit log viewer with event-type and username filters, and API token management.
- **Personal Access Tokens (PATs)** for machine-to-machine access: tokens with 192 bits of entropy, bcrypt-hashed storage, scoped permissions, expiry dates, and revocation support. Enables automated triage workflows and CI/CD integration without browser sessions.
- **Bug report system**: users file reports directly from the chat interface (`Bug report: <description>`); reports are stored with a link to the full conversation transcript, triaged by admins via REST API or a dedicated CLI tool, and tracked through an `open` → `reviewed` → `closed` lifecycle.
- **Comprehensive audit trail**: every login, logout, chat message, SQL execution, admin action, and bug report event is logged to a dedicated audit table with request ID, username, IP address, endpoint, and JSON detail blob. A separate query log captures every SQL statement with execution time, row count, truncation status, and preflight warnings.
- **React SPA frontend**: React + TypeScript + Vite with client-side routing (login, chat, admin pages), a conversation sidebar with search/pin/delete, inline SQL blocks with syntax highlighting, methodology disclosure, chart rendering (bar/line/pie via recharts), and a change-password modal.
- **Two deployment modes**: a packaged `exe` (PyInstaller one-dir, no Python required on the server) or a standard Python/venv + uvicorn setup. Both support IIS reverse proxy with ARR + URL Rewrite for HTTPS termination, and DPAPI encryption of the `.env` file on production servers.
- **450+ automated tests** across the webapp test suite covering authentication flows, session lifecycle, CSRF enforcement, rate limiting, SQL guard edge cases, grounding logic, number verification, the chat pipeline, admin endpoints, the migration framework, and full integration tests via FastAPI TestClient.
## Highlights
The chat is not a general-purpose AI assistant — it is a purpose-built cost analyst with a domain-specific system prompt, structured tool definitions, and built-in knowledge of the cost data model (sentinel filters, double-counting rules, amortized vs unblended cost selection, owner/category/vendor CTE generation). The LLM never sees raw database credentials or write-capable connections.
Query-cost preflight runs `EXPLAIN QUERY PLAN` before every SQL execution. If the plan reveals a full scan of large tables without an index, the system emits an advisory warning — visible to both the LLM (so it can rewrite the query) and the user (in the response metadata).
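A minimal version of such a preflight might look like this (the function name is hypothetical; the real component also attaches the warning to the response metadata):

```python
import sqlite3

def preflight_warnings(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Flag query-plan steps that scan a table without using any index."""
    warnings = []
    for row in conn.execute(f"EXPLAIN QUERY PLAN {sql}"):
        detail = row[3]  # rows are (id, parent, notused, detail)
        if detail.startswith("SCAN") and "INDEX" not in detail:
            warnings.append(f"full table scan: {detail}")
    return warnings
```

Because the plan is computed without running the query, the check adds negligible latency even for expensive statements.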
Number verification catches a class of errors that grounding alone cannot: the LLM might execute correct queries but then misquote the results in its answer (e.g., swapping two owners’ totals, rounding incorrectly, or citing a number from a prior turn). The verifier extracts every `$X,XXX.XX`, `XX.X%`, and `N accounts` pattern from the answer and matches each against the union of all SQL result cells, column totals, and simple two-value arithmetic (sum, difference, ratio). Unmatched figures surface as warnings with the exact unverifiable values listed.
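The matching idea can be condensed into a short sketch. This is a deliberate simplification of what the source describes (it handles only dollar and percent figures, and only pairwise arithmetic over raw cells); `extract_figures` and `verify` are hypothetical names.

```python
import re
from itertools import combinations

def extract_figures(answer: str) -> list[float]:
    """Pull $X,XXX.XX and XX.X% figures out of the answer text."""
    dollars = re.findall(r"\$([\d,]+(?:\.\d+)?)", answer)
    percents = re.findall(r"([\d.]+)\s*%", answer)
    return ([float(d.replace(",", "")) for d in dollars]
            + [float(p) for p in percents])

def verify(answer: str, sql_cells: list[float], tol: float = 0.01) -> list[float]:
    """Return every figure that matches no cell, sum, difference, or ratio."""
    candidates = set(sql_cells)
    for a, b in combinations(sql_cells, 2):
        candidates.update({a + b, abs(a - b)})
        if b: candidates.add(a / b * 100)  # ratio expressed as a percentage
        if a: candidates.add(b / a * 100)
    return [f for f in extract_figures(answer)
            if not any(abs(f - c) <= tol for c in candidates)]
```

An empty return means every quoted number traces back to the turn's SQL results; anything else would surface as a warning (or trigger the auto-retry).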
The CSRF implementation uses a double-submit cookie pattern where the `cc_csrf` cookie is intentionally NOT HttpOnly (so the React app can read it) while `cc_session` is HttpOnly (invisible to JavaScript). Bearer token authentication is exempt from CSRF checks because tokens are inherently not vulnerable to cross-site request forgery.
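Stripped of any web framework, the pattern reduces to a few lines; a hedged sketch, with `issue_cookies` and `csrf_ok` as hypothetical names standing in for the project's actual middleware:

```python
import hmac
import secrets
from typing import Optional

def issue_cookies() -> dict:
    """Double-submit pair: session is HttpOnly, CSRF token is JS-readable."""
    return {
        "cc_session": {"value": secrets.token_urlsafe(32), "httponly": True},
        "cc_csrf":    {"value": secrets.token_urlsafe(32), "httponly": False},
    }

def csrf_ok(cookie_token: Optional[str], header_token: Optional[str],
            authorization: Optional[str]) -> bool:
    # Bearer requests are exempt: the client attaches the token explicitly,
    # so a cross-site page can never ride along with it.
    if authorization and authorization.startswith("Bearer "):
        return True
    if not cookie_token or not header_token:
        return False
    # Constant-time comparison of the cookie value vs the request header.
    return hmac.compare_digest(cookie_token, header_token)
```

The check works because a cross-site attacker can make the browser *send* `cc_csrf` but cannot *read* it, so it cannot copy the value into the request header.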
Background cleanup runs hourly: purges expired sessions, removes stale audit log entries older than the retention period, and cleans up old query log rows.
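The cleanup pass might be sketched like this, assuming an asyncio app and SQLite storage; the table and column names (`sessions`, `audit_log`, `query_log`, `expires_at`, `created_at`) and the retention constant are hypothetical stand-ins for the configurable values:

```python
import asyncio
import sqlite3
import time

RETENTION_DAYS = 90  # hypothetical; the real retention period is configurable

def cleanup_once(conn: sqlite3.Connection, now: float) -> None:
    cutoff = now - RETENTION_DAYS * 86_400
    conn.execute("DELETE FROM sessions WHERE expires_at < ?", (now,))
    conn.execute("DELETE FROM audit_log WHERE created_at < ?", (cutoff,))
    conn.execute("DELETE FROM query_log WHERE created_at < ?", (cutoff,))
    conn.commit()

async def cleanup_loop(conn: sqlite3.Connection) -> None:
    while True:
        cleanup_once(conn, time.time())
        await asyncio.sleep(3600)  # hourly
```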
## Chat pipeline (per request)
| Step | Component | What happens |
|---|---|---|
| 1 | Rate limiter | Reject if > 10 requests/min for this session |
| 2 | Bug report interceptor | Messages starting with `Bug report:` are filed directly; no LLM call |
| 3 | Conversation resolver | Get or create conversation; verify ownership |
| 4 | System prompt | Load from Markdown template; inject current date |
| 5 | History loader | Last 20 messages from app.db for context |
| 6 | Tool-calling loop | Up to 10 rounds: LLM → tool calls → results → LLM |
| 7 | Grounding check | Data questions without successful SQL → disclaimer |
| 8 | Number verification | Every $, %, and count figure checked against SQL results |
| 9 | Auto-retry (optional) | Re-prompt LLM to correct unmatched figures |
| 10 | Persistence | Save messages to app.db; update conversation timestamp |
| 11 | Audit | Log event with tool call count, token usage, duration |
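The heart of the table, step 6, can be condensed into a hypothetical sketch: `call_llm` and `run_tool` stand in for the real LLM client and tool dispatcher, and the message shapes are illustrative, not the project's actual wire format.

```python
MAX_ROUNDS = 10  # tool budget per request, as in step 6

def tool_loop(messages, call_llm, run_tool):
    sql_results = []  # retained for the grounding and verification steps
    for _ in range(MAX_ROUNDS):
        reply = call_llm(messages)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:  # the model produced its final answer
            return reply["content"], sql_results
        messages.append({"role": "assistant", "tool_calls": tool_calls})
        for call in tool_calls:
            result = run_tool(call["name"], call["arguments"])
            if call["name"] == "run_sql_query":
                sql_results.append(result)
            messages.append({"role": "tool", "name": call["name"],
                             "content": str(result)})
    return "(tool budget exhausted)", sql_results
```

Returning `sql_results` alongside the answer is what makes steps 7 and 8 possible: the grounding check asks whether the list is non-empty for data questions, and the number verifier matches answer figures against its cells.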