Vysakh Ramakrishnan

Full-Stack AI Engineer

Full-stack AI engineer with experience across backend systems, ML deployment, and data pipelines. Strong background in TypeScript, Node.js, Next.js, and React, with experience building scalable microservices and RESTful APIs and productionizing AI systems. Skilled in PostgreSQL, AWS, Docker, and automation/testing with a focus on reliability and performance.

+44-7788291429

Work Experience

AI Engineer Intern

Sony Europe

  • Designed and implemented a temporally-aware semantic retrieval service using TypeScript and Node.js with scalable backend components, RESTful API boundaries, and automated evaluation workflows.
  • Developed custom differentiable IDF frameworks using neural reconstruction, contributing to production-grade pipelines, monitoring, and performance optimization across SLURM distributed systems.

Software Engineer (AI / Full Stack)

Kontex.dev

  • Architected and built a semantic retrieval system over TurboBuffer using TypeScript and Node.js, implementing vector indexing, metadata-augmented schemas, and query-time filtering, reducing end-to-end inference latency by 50%.
  • Designed and integrated Vault-based information stores for the SDK, supporting secure embedding-aware document ingestion, structured metadata management, and consistent retrieval flows across downstream AI services.
  • Developed a real-time avatar streaming web application using Next.js, TypeScript, and React with LiveKit WebRTC integration, server-side JWT token generation via RESTful APIs, and WebSocket-based media transport for interactive virtual presence demos.

Research Scholar

Massachusetts General Hospital, Harvard Medical School

Advised by Prof. Sandeep Mannava and Dr. Lana Schumacher

  • Developed real-time segmentation for tracking and localizing forceps in robotic surgery video, achieving 81–89% IoU on multi-class anatomy segmentation with our memory-based Transformer model.
  • Built a Visual Question Localization system with task-specific heads to detect and localize anatomic features in surgical scenes, integrating the multi-modal system into a query-driven interface.

Senior Full-Stack Engineer

Capgemini Technology Services India Limited

  • Implemented end-to-end features across SAP UI, ABAP business logic, RESTful API integrations, and SQL-backed tablespaces, delivering production workflows used daily by finance teams.
  • Triaged data issues end-to-end (screen → ABAP logic → database), fixed ledger mismatches via SAP Notes and targeted data corrections, and improved reliability for high-volume transactions.

Selected Projects

Contract Clause Analyzer (Live Demo, GitHub)

Technologies: FastAPI, React 18, TypeScript, Node.js 18, PostgreSQL, ChromaDB, Claude API, Docker, Nginx, SQLAlchemy, Vite

  • AI-Powered Analysis Pipeline: Built production-grade async FastAPI service with multi-step deterministic pipeline combining regex-based clause extraction, RAG retrieval from versioned playbook chunks, heuristic deviation scoring, and optional LLM validation via Claude Sonnet 4 with streaming responses.
  • Advanced RAG Implementation: Architected immutable playbook versioning system with ChromaDB vector store (all-MiniLM-L6-v2 embeddings), 800-word sliding window chunking, grounding enforcement requiring chunk citations for every finding, and hot reindexing without downtime.
  • Real-Time Streaming Architecture: Implemented Server-Sent Events (SSE) with custom EventBus using asyncio.Queue for live status updates, partial findings, and final results, with Nginx configured for unbuffered SSE passthrough and 100% async/await throughout backend.
  • Multi-Layer Security: Hardened service with prompt-injection defense (regex content filtering), per-IP rate limiting (SlowAPI), strict Pydantic schema validation, grounding checks dropping ungrounded findings, and CORS middleware for controlled access.
  • Database Flexibility: Designed auto-detection system that uses PostgreSQL 15 with async SQLAlchemy ORM in Docker for production and SQLite with aiosqlite and WAL mode for local development, handling postgres+asyncpg → postgresql+asyncpg conversion and automatic data directory creation.
  • Cost & Usage Tracking: Implemented token counting for Claude API calls with estimated cost calculation ($0.000015 per input token, $0.000075 per output token) and per-analysis usage metrics surfaced in UI and API responses; added heuristic fallback for offline testing when no API key is configured.
  • Production Deployment: Deployed on AWS EC2 with Docker Compose orchestration (4 services: db, backend, frontend, nginx), multi-stage builds with Node 20-slim and Nginx Alpine, health checks with automatic playbook seeding, and graceful embedding rebuild on volume loss.
  • Testing & Quality: Built comprehensive test suite with pytest-asyncio and FastAPI TestClient, including integration tests for health checks, analysis flow, guardrail triggers, rate limiting, and offline mode (BYPASS_DB_FOR_TESTS), with Python 3.12 compatibility patches.

Project Scale: 13 Python modules (~1,200 LOC), 10 RESTful API endpoints, 6 React TypeScript components, 7 clause types analyzed (payment terms, retainage, notice periods, indemnification, termination, dispute resolution, liquidated damages)

Malayalam Voice Agent, Streaming API (GitHub)

Technologies: FastAPI, WebSockets, WebRTC VAD, Google Translate API, gTTS, asyncio, Pydantic, NumPy, Uvicorn

  • Full-Duplex Voice Pipeline Architecture: Engineered production-grade streaming pipeline implementing VAD → STT → LLM → TTS flow with bi-directional WebSocket transport, handling simultaneous audio input processing and output streaming with sub-400ms barge-in response times.
  • WebRTC VAD Integration with Fallback: Implemented robust voice activity detection using WebRTC VAD (aggressiveness=2) on 20ms PCM16 frames at 16kHz mono, with amplitude-based fallback for malformed frames ensuring 100% frame processing reliability despite network packet irregularities.
  • Intelligent Barge-In Controller: Built event-driven cancellation system tracking TTS playback state with per-chunk timestamps, cancelling playback (<400ms target) when user speech is detected, and measuring and reporting stop latency for every interruption event.
  • Latency-Masking Filler Injection: Designed adaptive filler manager that preemptively streams Malayalam interjections ("Mm...", "Aa...", "Sheri...") during LLM inference (>200ms threshold), with cancellable task queues and 300-600ms spacing to maintain conversational naturalness while hiding backend latency.
  • Tone-Aware Response Generation: Architected dialogue orchestrator with tone profiles (reminder: 0.95x rate, -0.1 pitch; sales: 1.05x rate, +0.1 pitch) and context-aware prompt templates for script continuation, off-script Q&A recovery, and objection handling with Malayalam-English code-switching support.
  • Strict WebSocket Protocol with Init Validation: Enforced connection-level validation requiring compliant init frame (mono, 16kHz, pcm16 codec) before audio streaming, with Pydantic schema validation, structured JSON events (partial_stt, filler_start, barge_in, utterance_start/end, metrics), and graceful error handling with specific close codes.
  • Per-Session Metrics & Observability: Implemented comprehensive per-call tracking: TTFB (time-to-first-byte audio), barge-in stop latencies (histogram), invalid frame ratio for packet loss detection, and end-of-session summary events enabling SLA validation against <900ms p95 TTFB target.
  • REST TTS Helper Service: Built authenticated POST /api/text-to-speech endpoint with Google Translate integration for English → Malayalam translation and gTTS synthesis, returning base64-encoded MP3 audio, plus a lightweight browser demo (/demo) with embedded audio playback for rapid QA testing.
  • Concurrent Load Harness: Developed automated test harness simulating 20 concurrent WebSocket connections across 5 scripted scenarios (barge-in, payment confirmation, sales pitch, unclear audio, filler timing), aggregating p50/p95 latency percentiles and median barge-in stop times for regression testing.
  • Async-First Backend Design: Architected 100% async/await codebase using asyncio primitives (Event, Queue, create_task, to_thread) with ThreadPoolExecutor for blocking gTTS calls, maintaining event loop responsiveness under concurrent load; added CORS middleware and optional auth bypass for local development.

Project Scale: 729 Python LOC across 6 modules, WebSocket streaming endpoint, REST TTS API, browser demo UI, automated load harness with 5 test scenarios, targeting <900ms TTFB (p95) and <400ms barge-in stop latency

Technical Metrics: 16kHz mono PCM16 audio, 20ms frame size (320 samples), WebRTC VAD with aggressiveness=2, filler injection at >200ms LLM latency, 2-filler max per wait cycle, session-level packet loss tracking via invalid frame ratio

Computer Use Backend

Technologies: FastAPI, SSE (Server-Sent Events), Anthropic API, Docker Compose, noVNC, asyncio

  • Session-Centric Agent Orchestration: Wrapped Anthropic's computer-use agent loop in stateful FastAPI service with per-session locking (asyncio.Lock), per-session in-memory chat history, and persistent screenshot storage, enabling multi-turn agentic execution with full conversation context preserved.
  • Real-Time Event Streaming: Streamed assistant deltas, tool invocations (computer, bash, text_editor), and tool results over Server-Sent Events (SSE) for live UI updates, handling message chunking and JSON serialization with proper SSE formatting (data: prefix, double newline delimiters).
  • Lightweight Demo with Embedded VNC: Packaged HTML/JS demo with embedded noVNC client for real-time desktop viewing, Docker Compose orchestration (FastAPI backend + X11 VNC container), enabling rapid local simulation of computer-use scenarios without external infrastructure dependencies.

Education

Master (M2) in Artificial Intelligence and Advanced Visual Computing

Ecole Polytechnique, France

Relevant Courses: NLP, Deep Reinforcement Learning, Computer Graphics, Computer Vision

Post Graduate Diploma in AI, ML, and Leadership

Plaksha University

Relevant Courses: Machine Learning, NLP, Computer Vision

Technical Skills

Backend / Full Stack

TypeScript, Node.js, Next.js, React, Python, FastAPI, RESTful APIs, MongoDB, PostgreSQL, Redis, SQLAlchemy, Uvicorn, WebSockets, SSE

Machine Learning & AI

PyTorch, TensorFlow, Scikit-learn, NumPy, OpenCV, LangChain, ChromaDB, Claude API, RAG, Vector Embeddings, Voice Activity Detection (VAD)

Tools & DevOps

Docker, Docker Compose, AWS (EC2), Git, GitHub Actions, Nginx, Pytest, Vite, asyncio, Pydantic, CORS, uv (Python package manager)

Audio & Speech Technologies

WebRTC VAD, PCM16 Audio Processing, gTTS, Google Translate API, Streaming Audio Pipelines, Barge-in Systems, Real-time Voice Processing