Technologies: FastAPI, React 18, TypeScript, Node.js 18, PostgreSQL, ChromaDB, Claude API, Docker, Nginx, SQLAlchemy, Vite
- AI-Powered Analysis Pipeline: Built production-grade async FastAPI service with multi-step deterministic pipeline combining regex-based clause extraction, RAG retrieval from versioned playbook chunks, heuristic deviation scoring, and optional LLM validation via Claude Sonnet 4 with streaming responses.
- Advanced RAG Implementation: Architected immutable playbook versioning system with ChromaDB vector store (all-MiniLM-L6-v2 embeddings), 800-word sliding window chunking, grounding enforcement requiring chunk citations for every finding, and hot reindexing without downtime.
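A minimal sketch of the sliding-window chunking described above. The 800-word window comes from the project description; the overlap size and function name are illustrative assumptions:

```python
def chunk_playbook(text: str, window: int = 800, overlap: int = 100) -> list[str]:
    """Split playbook text into word-based sliding-window chunks.

    The 800-word window matches the description; the 100-word overlap
    is an illustrative choice, not taken from the project.
    """
    words = text.split()
    if not words:
        return []
    step = max(window - overlap, 1)  # guard against overlap >= window
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break
    return chunks
```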
- Real-Time Streaming Architecture: Implemented Server-Sent Events (SSE) with custom EventBus using asyncio.Queue for live status updates, partial findings, and final results, with Nginx configured for unbuffered SSE passthrough and 100% async/await throughout backend.
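The EventBus pattern above, an asyncio.Queue drained into SSE-formatted frames, can be sketched roughly as follows; class and method names are assumptions, not the project's actual API:

```python
import asyncio
import json

class EventBus:
    """Minimal per-analysis event bus backed by asyncio.Queue."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def publish(self, event: dict) -> None:
        await self._queue.put(event)

    async def close(self) -> None:
        await self._queue.put(None)  # sentinel terminates the SSE stream

    async def sse_stream(self):
        """Yield queued events formatted as Server-Sent Events."""
        while True:
            event = await self._queue.get()
            if event is None:
                break
            yield f"data: {json.dumps(event)}\n\n"
```

In FastAPI, the generator would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.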
- Multi-Layer Security: Hardened service with prompt-injection defense (regex content filtering), per-IP rate limiting (SlowAPI), strict Pydantic schema validation, grounding checks dropping ungrounded findings, and CORS middleware for controlled access.
- Database Flexibility: Designed auto-detecting database layer: PostgreSQL 15 with async SQLAlchemy ORM in Docker for production, SQLite with aiosqlite and WAL mode for local development, handling the postgres+asyncpg → postgresql+asyncpg URL conversion and automatic data directory creation.
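The URL-scheme conversion mentioned above amounts to a small normalization step (SQLAlchemy requires the `postgresql+asyncpg` scheme); the function name is illustrative:

```python
def normalize_db_url(url: str) -> str:
    """Rewrite the scheme SQLAlchemy rejects into the one it accepts.

    Only the documented postgres+asyncpg case is handled; anything
    else (e.g. sqlite+aiosqlite URLs) passes through unchanged.
    """
    legacy = "postgres+asyncpg://"
    if url.startswith(legacy):
        return "postgresql+asyncpg://" + url[len(legacy):]
    return url
```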
- Cost & Usage Tracking: Implemented token counting for Claude API calls with estimated cost calculation ($0.000015 per input token, $0.000075 per output token), per-analysis usage metrics surfaced in UI and API responses, and a heuristic fallback enabling offline testing when no API key is configured.
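The cost estimate above is straightforward arithmetic on the stated per-token rates:

```python
INPUT_COST_PER_TOKEN = 0.000015   # USD, from the project description
OUTPUT_COST_PER_TOKEN = 0.000075  # USD, from the project description

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one Claude API call."""
    return round(
        input_tokens * INPUT_COST_PER_TOKEN
        + output_tokens * OUTPUT_COST_PER_TOKEN,
        6,
    )
```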
- Production Deployment: Deployed on AWS EC2 with Docker Compose orchestration (4 services: db, backend, frontend, nginx), multi-stage builds with Node 20-slim and Nginx Alpine, health checks with automatic playbook seeding, and graceful embedding rebuild on volume loss.
- Testing & Quality: Built comprehensive test suite with pytest-asyncio, FastAPI TestClient, integration tests for health checks, analysis flow, guardrail triggers, rate limiting, and offline mode (BYPASS_DB_FOR_TESTS), with Python 3.12 compatibility patches.
Project Scale: 13 Python modules (~1,200 LOC), 10 RESTful API endpoints, 6 React TypeScript components, 7 clause types analyzed (payment terms, retainage, notice periods, indemnification, termination, dispute resolution, liquidated damages)
Malayalam Voice Agent (Streaming API)
GitHub
Technologies: FastAPI, WebSockets, WebRTC VAD, Google Translate API, gTTS, asyncio, Pydantic, NumPy, Uvicorn
- Full-Duplex Voice Pipeline Architecture: Engineered production-grade streaming pipeline implementing VAD → STT → LLM → TTS flow with bi-directional WebSocket transport, handling simultaneous audio input processing and output streaming with sub-400ms barge-in response times.
- WebRTC VAD Integration with Fallback: Implemented robust voice activity detection using WebRTC VAD (aggressiveness=2) on 20ms PCM16 frames at 16kHz mono, with an amplitude-based fallback for malformed frames so every frame is classified despite network packet irregularities.
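The fallback path above might look roughly like this. The real service uses the webrtcvad package for the primary check; the amplitude threshold and helper names here are illustrative assumptions:

```python
import struct

FRAME_BYTES = 320 * 2  # 20 ms of 16 kHz mono PCM16 (320 samples)

def amplitude_fallback(frame: bytes, threshold: int = 500) -> bool:
    """Classify a frame as speech by mean absolute sample amplitude.

    The threshold of 500 is an illustrative value, not the project's.
    """
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return sum(abs(s) for s in samples) / len(samples) > threshold

def is_speech(frame: bytes, vad=None) -> bool:
    """Try WebRTC VAD first; fall back on malformed/odd-length frames."""
    if vad is not None and len(frame) == FRAME_BYTES:
        try:
            return vad.is_speech(frame, 16000)
        except Exception:
            pass  # fall through to the amplitude heuristic
    if len(frame) < 2:
        return False
    usable = frame[: (len(frame) // 2) * 2]  # drop a trailing odd byte
    return amplitude_fallback(usable)
```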
- Intelligent Barge-In Controller: Built event-driven cancellation system tracking TTS playback state with per-chunk timestamps, triggering instant cancellation (<400ms target) when user speech detected, measuring and reporting stop latency metrics for every interruption event.
- Latency-Masking Filler Injection: Designed adaptive filler manager that preemptively streams Malayalam interjections ("Mm...", "Aa...", "Sheri...") during LLM inference (>200ms threshold), with cancellable task queues and 300-600ms spacing to maintain conversational naturalness while hiding backend latency.
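The filler injection described above can be sketched as a cancellable asyncio task running alongside LLM generation. The filler strings and the 200ms/300ms timings mirror the description; the function names and signatures are assumptions:

```python
import asyncio

FILLERS = ["Mm...", "Aa...", "Sheri..."]

async def filler_loop(send, *, delay=0.2, spacing=0.3, max_fillers=2):
    """Emit up to max_fillers fillers while the LLM is still silent.

    Cancelled by the caller the moment the first LLM token arrives.
    """
    await asyncio.sleep(delay)           # >200 ms threshold before first filler
    for i in range(max_fillers):
        await send(FILLERS[i % len(FILLERS)])
        await asyncio.sleep(spacing)     # 300-600 ms spacing in the real system

async def respond(generate, send):
    """Run filler injection concurrently with LLM generation."""
    task = asyncio.create_task(filler_loop(send))
    try:
        reply = await generate()
    finally:
        task.cancel()                    # stop fillers on first token or error
    await send(reply)
```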
- Tone-Aware Response Generation: Architected dialogue orchestrator with tone profiles (reminder: 0.95x rate, -0.1 pitch; sales: 1.05x rate, +0.1 pitch) and context-aware prompt templates for script continuation, off-script Q&A recovery, and objection handling with Malayalam-English code-switching support.
- Strict WebSocket Protocol with Init Validation: Enforced connection-level validation requiring compliant init frame (mono, 16kHz, pcm16 codec) before audio streaming, with Pydantic schema validation, structured JSON events (partial_stt, filler_start, barge_in, utterance_start/end, metrics), and graceful error handling with specific close codes.
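The init-frame gate reduces to a strict equality check on the documented parameters. The real service uses Pydantic models and specific close codes; this dependency-free stand-in and its field names are assumptions:

```python
# Documented requirements: mono, 16 kHz, pcm16. Field names are illustrative.
EXPECTED_INIT = {"type": "init", "sample_rate": 16000, "channels": 1, "codec": "pcm16"}

def validate_init(payload: dict) -> bool:
    """Accept only a compliant init frame before any audio is streamed."""
    return all(payload.get(k) == v for k, v in EXPECTED_INIT.items())
```

On failure the connection would be closed with a protocol-specific close code rather than silently ignored.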
- Per-Session Metrics & Observability: Implemented comprehensive per-call tracking: TTFB (time-to-first-byte audio), barge-in stop latencies (histogram), invalid frame ratio for packet loss detection, and end-of-session summary events enabling SLA validation against <900ms p95 TTFB target.
- REST TTS Helper Service: Built authenticated POST /api/text-to-speech endpoint with Google Translate integration for English → Malayalam translation and gTTS synthesis, returning base64-encoded MP3 audio with lightweight browser demo (/demo) featuring embedded audio playback for rapid QA testing.
- Concurrent Load Harness: Developed automated test harness simulating 20 concurrent WebSocket connections across 5 scripted scenarios (barge-in, payment confirmation, sales pitch, unclear audio, filler timing), aggregating p50/p95 latency percentiles and median barge-in stop times for regression testing.
- Async-First Backend Design: Architected 100% async/await codebase using asyncio primitives (Event, Queue, create_task, to_thread) with ThreadPoolExecutor for blocking gTTS calls, maintaining event loop responsiveness under concurrent load with CORS middleware and optional auth bypass for local development.
Project Scale: 729 Python LOC across 6 modules, WebSocket streaming endpoint, REST TTS API, browser demo UI, automated load harness with 5 test scenarios, targeting <900ms TTFB (p95) and <400ms barge-in stop latency
Technical Metrics: 16kHz mono PCM16 audio, 20ms frame size (320 samples), WebRTC VAD with aggressiveness=2, filler injection at >200ms LLM latency, 2-filler max per wait cycle, session-level packet loss tracking via invalid frame ratio
Computer Use Backend
Technologies: FastAPI, SSE (Server-Sent Events), Anthropic API, Docker Compose, noVNC, asyncio
- Session-Centric Agent Orchestration: Wrapped Anthropic's computer-use agent loop in stateful FastAPI service with per-session locking (asyncio.Lock), durable in-memory chat history, and persistent screenshot storage enabling multi-turn agentic execution with full conversation context preservation.
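Per-session locking with asyncio.Lock can be sketched as follows; the class shape and agent signature are illustrative, not the project's actual API:

```python
import asyncio
from collections import defaultdict

class SessionStore:
    """Per-session lock plus in-memory chat history (minimal sketch)."""

    def __init__(self) -> None:
        self._locks = defaultdict(asyncio.Lock)
        self._history = defaultdict(list)

    async def run_turn(self, session_id: str, user_msg: dict, agent):
        # Serialize turns within one session; other sessions run concurrently.
        async with self._locks[session_id]:
            self._history[session_id].append(user_msg)
            reply = await agent(self._history[session_id])
            self._history[session_id].append(reply)
            return reply
```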
- Real-Time Event Streaming: Streamed assistant deltas, tool invocations (computer, bash, text_editor), and tool results over Server-Sent Events (SSE) for live UI updates, handling message chunking and JSON serialization with proper SSE formatting (data: prefix, double newline delimiters).
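The SSE framing described above (data: prefix, double-newline delimiter) reduces to a small serializer; the optional event-name field is an assumption:

```python
import json

def sse_event(data: dict, event: str = None) -> str:
    """Serialize one SSE frame: optional event name, data: line,
    blank-line terminator per the event-stream format."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"
```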
- Lightweight Demo with Embedded VNC: Packaged HTML/JS demo with embedded noVNC client for real-time desktop viewing, Docker Compose orchestration (FastAPI backend + X11 VNC container), enabling rapid local simulation of computer-use scenarios without external infrastructure dependencies.