Technologies: FastAPI, React 18, TypeScript, Node.js 18, PostgreSQL, ChromaDB, Claude API, Docker, Nginx, SQLAlchemy, Vite
- AI-Powered Analysis Pipeline: Built production-grade async FastAPI service with multi-step deterministic pipeline combining regex-based clause extraction, RAG retrieval from versioned playbook chunks, heuristic deviation scoring, and optional LLM validation via Claude Sonnet 4 with streaming responses.
- Advanced RAG Implementation: Architected immutable playbook versioning system with ChromaDB vector store (all-MiniLM-L6-v2 embeddings), 800-word sliding window chunking, grounding enforcement requiring chunk citations for every finding, and hot reindexing without downtime.
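A minimal sketch of the sliding-window chunking described above. The 800-word window comes from the project description; the overlap size and function name are illustrative assumptions:

```python
def chunk_playbook(text: str, window: int = 800, overlap: int = 100) -> list[str]:
    """Split playbook text into word-based sliding-window chunks.

    The 800-word window matches the description; the 100-word overlap
    is an illustrative choice, not taken from the project.
    """
    words = text.split()
    if not words:
        return []
    step = max(window - overlap, 1)  # guard against overlap >= window
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break
    return chunks
```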
- Real-Time Streaming Architecture: Implemented Server-Sent Events (SSE) with custom EventBus using asyncio.Queue for live status updates, partial findings, and final results, with Nginx configured for unbuffered SSE passthrough and 100% async/await throughout backend.
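The EventBus pattern above, an asyncio.Queue drained into SSE-formatted frames, can be sketched roughly as follows; class and method names are assumptions, not the project's actual API:

```python
import asyncio
import json

class EventBus:
    """Minimal per-analysis event bus backed by asyncio.Queue."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def publish(self, event: dict) -> None:
        await self._queue.put(event)

    async def close(self) -> None:
        await self._queue.put(None)  # sentinel terminates the SSE stream

    async def sse_stream(self):
        """Yield queued events formatted as Server-Sent Events."""
        while True:
            event = await self._queue.get()
            if event is None:
                break
            yield f"data: {json.dumps(event)}\n\n"
```

In FastAPI, the generator would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.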
- Multi-Layer Security: Hardened service with prompt-injection defense (regex content filtering), per-IP rate limiting (SlowAPI), strict Pydantic schema validation, grounding checks dropping ungrounded findings, and CORS middleware for controlled access.
- Database Flexibility: Designed auto-detecting database layer: PostgreSQL 15 with async SQLAlchemy ORM in Docker for production, SQLite with aiosqlite and WAL mode for local development, handling the postgres+asyncpg → postgresql+asyncpg URL conversion and automatic data directory creation.
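The URL-scheme conversion mentioned above amounts to a small normalization step (SQLAlchemy requires the `postgresql+asyncpg` scheme); the function name is illustrative:

```python
def normalize_db_url(url: str) -> str:
    """Rewrite the scheme SQLAlchemy rejects into the one it accepts.

    Only the documented postgres+asyncpg case is handled; anything
    else (e.g. sqlite+aiosqlite URLs) passes through unchanged.
    """
    legacy = "postgres+asyncpg://"
    if url.startswith(legacy):
        return "postgresql+asyncpg://" + url[len(legacy):]
    return url
```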
- Cost & Usage Tracking: Implemented token counting for Claude API calls with estimated cost calculation ($0.000015 per input token, $0.000075 per output token), per-analysis usage metrics surfaced in UI and API responses, and a heuristic fallback enabling offline testing when no API key is configured.
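The cost estimate above is straightforward arithmetic on the stated per-token rates:

```python
INPUT_COST_PER_TOKEN = 0.000015   # USD, from the project description
OUTPUT_COST_PER_TOKEN = 0.000075  # USD, from the project description

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one Claude API call."""
    return round(
        input_tokens * INPUT_COST_PER_TOKEN
        + output_tokens * OUTPUT_COST_PER_TOKEN,
        6,
    )
```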
- Production Deployment: Deployed on AWS EC2 with Docker Compose orchestration (4 services: db, backend, frontend, nginx), multi-stage builds with Node 20-slim and Nginx Alpine, health checks with automatic playbook seeding, and graceful embedding rebuild on volume loss.
- Testing & Quality: Built comprehensive test suite with pytest-asyncio, FastAPI TestClient, integration tests for health checks, analysis flow, guardrail triggers, rate limiting, and offline mode (BYPASS_DB_FOR_TESTS), with Python 3.12 compatibility patches.
Project Scale: 13 Python modules (~1,200 LOC), 10 RESTful API endpoints, 6 React TypeScript components, 7 clause types analyzed (payment terms, retainage, notice periods, indemnification, termination, dispute resolution, liquidated damages)
Malayalam Voice Agent (Streaming API)
GitHub
Technologies: FastAPI, WebSockets, WebRTC VAD, Google Translate API, gTTS, asyncio, Pydantic, NumPy, Uvicorn
- Full-Duplex Voice Pipeline Architecture: Engineered production-grade streaming pipeline implementing VAD → STT → LLM → TTS flow with bi-directional WebSocket transport, handling simultaneous audio input processing and output streaming with sub-400ms barge-in response times.
- WebRTC VAD Integration with Fallback: Implemented robust voice activity detection using WebRTC VAD (aggressiveness=2) on 20ms PCM16 frames at 16kHz mono, with an amplitude-based fallback for malformed frames so every frame is classified despite network packet irregularities.
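The fallback path above might look roughly like this. The real service uses the webrtcvad package for the primary check; the amplitude threshold and helper names here are illustrative assumptions:

```python
import struct

FRAME_BYTES = 320 * 2  # 20 ms of 16 kHz mono PCM16 (320 samples)

def amplitude_fallback(frame: bytes, threshold: int = 500) -> bool:
    """Classify a frame as speech by mean absolute sample amplitude.

    The threshold of 500 is an illustrative value, not the project's.
    """
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return sum(abs(s) for s in samples) / len(samples) > threshold

def is_speech(frame: bytes, vad=None) -> bool:
    """Try WebRTC VAD first; fall back on malformed/odd-length frames."""
    if vad is not None and len(frame) == FRAME_BYTES:
        try:
            return vad.is_speech(frame, 16000)
        except Exception:
            pass  # fall through to the amplitude heuristic
    if len(frame) < 2:
        return False
    usable = frame[: (len(frame) // 2) * 2]  # drop a trailing odd byte
    return amplitude_fallback(usable)
```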
- Intelligent Barge-In Controller: Built event-driven cancellation system tracking TTS playback state with per-chunk timestamps, triggering instant cancellation (<400ms target) when user speech detected, measuring and reporting stop latency metrics for every interruption event.
- Latency-Masking Filler Injection: Designed adaptive filler manager that preemptively streams Malayalam interjections ("Mm...", "Aa...", "Sheri...") during LLM inference (>200ms threshold), with cancellable task queues and 300-600ms spacing to maintain conversational naturalness while hiding backend latency.
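The filler injection described above can be sketched as a cancellable asyncio task running alongside LLM generation. The filler strings and the 200ms/300ms timings mirror the description; the function names and signatures are assumptions:

```python
import asyncio

FILLERS = ["Mm...", "Aa...", "Sheri..."]

async def filler_loop(send, *, delay=0.2, spacing=0.3, max_fillers=2):
    """Emit up to max_fillers fillers while the LLM is still silent.

    Cancelled by the caller the moment the first LLM token arrives.
    """
    await asyncio.sleep(delay)           # >200 ms threshold before first filler
    for i in range(max_fillers):
        await send(FILLERS[i % len(FILLERS)])
        await asyncio.sleep(spacing)     # 300-600 ms spacing in the real system

async def respond(generate, send):
    """Run filler injection concurrently with LLM generation."""
    task = asyncio.create_task(filler_loop(send))
    try:
        reply = await generate()
    finally:
        task.cancel()                    # stop fillers on first token or error
    await send(reply)
```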
- Tone-Aware Response Generation: Architected dialogue orchestrator with tone profiles (reminder: 0.95x rate, -0.1 pitch; sales: 1.05x rate, +0.1 pitch) and context-aware prompt templates for script continuation, off-script Q&A recovery, and objection handling with Malayalam-English code-switching support.
- Strict WebSocket Protocol with Init Validation: Enforced connection-level validation requiring compliant init frame (mono, 16kHz, pcm16 codec) before audio streaming, with Pydantic schema validation, structured JSON events (partial_stt, filler_start, barge_in, utterance_start/end, metrics), and graceful error handling with specific close codes.
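The init-frame gate reduces to a strict equality check on the documented parameters. The real service uses Pydantic models and specific close codes; this dependency-free stand-in and its field names are assumptions:

```python
# Documented requirements: mono, 16 kHz, pcm16. Field names are illustrative.
EXPECTED_INIT = {"type": "init", "sample_rate": 16000, "channels": 1, "codec": "pcm16"}

def validate_init(payload: dict) -> bool:
    """Accept only a compliant init frame before any audio is streamed."""
    return all(payload.get(k) == v for k, v in EXPECTED_INIT.items())
```

On failure the connection would be closed with a protocol-specific close code rather than silently ignored.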
- Per-Session Metrics & Observability: Implemented comprehensive per-call tracking: TTFB (time-to-first-byte audio), barge-in stop latencies (histogram), invalid frame ratio for packet loss detection, and end-of-session summary events enabling SLA validation against <900ms p95 TTFB target.
- REST TTS Helper Service: Built authenticated POST /api/text-to-speech endpoint with Google Translate integration for English → Malayalam translation and gTTS synthesis, returning base64-encoded MP3 audio with lightweight browser demo (/demo) featuring embedded audio playback for rapid QA testing.
- Concurrent Load Harness: Developed automated test harness simulating 20 concurrent WebSocket connections across 5 scripted scenarios (barge-in, payment confirmation, sales pitch, unclear audio, filler timing), aggregating p50/p95 latency percentiles and median barge-in stop times for regression testing.
- Async-First Backend Design: Architected 100% async/await codebase using asyncio primitives (Event, Queue, create_task, to_thread) with ThreadPoolExecutor for blocking gTTS calls, maintaining event loop responsiveness under concurrent load with CORS middleware and optional auth bypass for local development.
Project Scale: 729 Python LOC across 6 modules, WebSocket streaming endpoint, REST TTS API, browser demo UI, automated load harness with 5 test scenarios, targeting <900ms TTFB (p95) and <400ms barge-in stop latency
Technical Metrics: 16kHz mono PCM16 audio, 20ms frame size (320 samples), WebRTC VAD with aggressiveness=2, filler injection at >200ms LLM latency, 2-filler max per wait cycle, session-level packet loss tracking via invalid frame ratio
Computer Use Backend
Technologies: FastAPI, SSE (Server-Sent Events), Anthropic API, Docker Compose, noVNC, asyncio
- Session-Centric Agent Orchestration: Wrapped Anthropic's computer-use agent loop in stateful FastAPI service with per-session locking (asyncio.Lock), durable in-memory chat history, and persistent screenshot storage enabling multi-turn agentic execution with full conversation context preservation.
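Per-session locking with asyncio.Lock can be sketched as follows; the class shape and agent signature are illustrative, not the project's actual API:

```python
import asyncio
from collections import defaultdict

class SessionStore:
    """Per-session lock plus in-memory chat history (minimal sketch)."""

    def __init__(self) -> None:
        self._locks = defaultdict(asyncio.Lock)
        self._history = defaultdict(list)

    async def run_turn(self, session_id: str, user_msg: dict, agent):
        # Serialize turns within one session; other sessions run concurrently.
        async with self._locks[session_id]:
            self._history[session_id].append(user_msg)
            reply = await agent(self._history[session_id])
            self._history[session_id].append(reply)
            return reply
```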
- Real-Time Event Streaming: Streamed assistant deltas, tool invocations (computer, bash, text_editor), and tool results over Server-Sent Events (SSE) for live UI updates, handling message chunking and JSON serialization with proper SSE formatting (data: prefix, double newline delimiters).
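The SSE framing described above (data: prefix, double-newline delimiter) reduces to a small serializer; the optional event-name field is an assumption:

```python
import json

def sse_event(data: dict, event: str = None) -> str:
    """Serialize one SSE frame: optional event name, data: line,
    blank-line terminator per the event-stream format."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"
```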
- Lightweight Demo with Embedded VNC: Packaged HTML/JS demo with embedded noVNC client for real-time desktop viewing, Docker Compose orchestration (FastAPI backend + X11 VNC container), enabling rapid local simulation of computer-use scenarios without external infrastructure dependencies.