Deepanshu Sonwane

ML Engineer

https://talent.gravityer.com/deepanshu-sonwane

ML Engineer with less than a year in Generative AI & MLOps

Softvyom Consulting Services Pvt. Ltd.

Key Strengths

Extensive hands-on experience in Generative AI, LLM fine-tuning, and RAG architectures, directly aligning with the ML Engineer target role.
Proficient in MLOps practices, including Docker, CI/CD, model versioning (MLflow), and deployment on AWS EC2, demonstrating ability to take ML models to production.
Strong system design skills, evidenced by building scalable microservices, dual-service architectures, and VRAM-aware model schedulers.
Deep understanding of performance optimization, including INT8 quantization, latency reduction, and bottleneck identification in RAG systems.
Demonstrated ability to benchmark and evaluate LLMs and RAG systems using metrics like MRR, Hit@k, faithfulness, and context precision/recall.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate's portfolio showcases a strong passion for Machine Learning and AI, with several ambitious personal projects that go beyond typical academic exercises. The diversity of projects, from RAG systems to WebSocket simulators and model schedulers, indicates a broad technical curiosity and a proactive learning attitude. The detailed descriptions of challenges faced and solutions implemented suggest a transparent and collaborative approach to sharing knowledge. The focus on open-source tools (Ollama, FastAPI) and cloud platforms (AWS) aligns with modern industry practices.

Soft Skills & Operational Fit

The candidate demonstrates strong problem-solving skills through reverse-engineering protocols and identifying system bottlenecks. Their project descriptions highlight an ability to work independently on complex technical challenges and deliver end-to-end solutions. The focus on robust deployment (systemd, CI/CD) and performance monitoring indicates a strong operational mindset. The detailed evaluation frameworks used in projects suggest a methodical and data-driven approach to problem-solving.


Projects

Zerodha KiteConnect WebSocket Simulator

January 1, 2025 – June 1, 2025

Technologies: Python, WebSocket, FastAPI, AWS EC2, systemd, binary struct protocol

• Reverse-engineered and replicated Zerodha's binary QUOTE-mode tick protocol exactly, then built a drop-in WebSocket server that streams simulated live tick data for 10 NSE instruments at 1s intervals, removing the need for a paid KiteConnect subscription during development.
• Designed a dual-service architecture on AWS EC2: a WebSocket server (port 8765) for binary tick streaming and a FastAPI management API (port 8766) for key provisioning, token rotation, and revocation. The setup mirrors real Zerodha auth behaviour, including close code 4001 on invalid credentials.
• Implemented NSE market-phase logic with live holiday detection and fallback layers: binary tick frames during market hours (09:15-15:30 IST) and heartbeat bytes during off-hours, keeping connection persistence identical to production Zerodha behaviour.
• Deployed with systemd service management for auto-restart on crash; documented a full peer-sharing guide with MySQL and CSV/JSON/Excel save clients and a one-line FastAPI project integration.
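The market-phase switch described above could look roughly like the sketch below. It is not taken from the project: the function names, the `holidays` argument, the weekend check, the inclusive 15:30 boundary, and the single-byte heartbeat value are all assumptions made for illustration. Only the 09:15-15:30 IST window comes from the description.

```python
from datetime import date, datetime, time, timedelta, timezone

# IST has no daylight saving, so a fixed UTC+05:30 offset is enough.
IST = timezone(timedelta(hours=5, minutes=30))
MARKET_OPEN = time(9, 15)
MARKET_CLOSE = time(15, 30)
HEARTBEAT = b"\x00"  # assumed placeholder; the real heartbeat payload is not stated


def market_phase(now, holidays=frozenset()):
    """Return "open" during the 09:15-15:30 IST window, otherwise "closed".

    `now` must be timezone-aware. `holidays` is a hypothetical set of
    datetime.date values; weekends are treated as closed (an assumption).
    """
    local = now.astimezone(IST)
    if local.weekday() >= 5 or local.date() in holidays:
        return "closed"
    if MARKET_OPEN <= local.time() <= MARKET_CLOSE:
        return "open"
    return "closed"


def payload_for(now, tick_frame, holidays=frozenset()):
    """Pick what to send: the binary tick frame in market hours, else a heartbeat."""
    if market_phase(now, holidays) == "open":
        return tick_frame
    return HEARTBEAT
```

A server loop would call `payload_for(datetime.now(timezone.utc), frame)` once per interval, so clients stay connected outside market hours without receiving ticks.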


AgriGPT-RAG System with Multi-Round Evaluation Pipeline

January 1, 2025 – June 1, 2025

Technologies: RAG, Pinecone, BGE embeddings, Groq (llama-3.3-70b), Gemma 4 26B, Ollama, FastAPI, EC2, RAGAS

• Built a bilingual (Hindi/English) RAG system for Indian farmers to query agricultural schemes and crop guidance; designed and ran a 5-round evaluation framework covering retrieval quality (Hit@k, MRR) and generation quality (faithfulness, context precision/recall) using RAGAS-style LLM-as-judge methodology.
• Identified retrieval as the primary bottleneck: the real Pinecone pipeline achieved MRR 0.396 and Hit@3 0.521, while generation was near-perfect (faithfulness 1.0) when the correct context was supplied, showing that model swaps were the wrong optimization priority.
• Ran a head-to-head benchmark of Groq llama-3.3-70b (cloud) vs Gemma 4 26B self-hosted on EC2 via Ollama; both achieved a 97.5–100% pass rate on retrieval-success pairs with zero hallucinations, with Gemma's thinking-model reasoning extracting answers from loosely matched context.
• Built a fully automated eval runner that discovers installed Ollama models at startup and benchmarked 3 model sizes (1B → 32B); found that self-hosted qwen2.5:32b matched Groq cloud llama-3.3-70b exactly at a 67.2% pass rate with no per-call cost.
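For readers unfamiliar with the retrieval metrics quoted above, here is a minimal sketch of how MRR and Hit@k are commonly computed from ranked retrieval results. The function names and the toy data are hypothetical; the project's actual eval runner is not shown in this profile.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def hit_at_k(ranked_ids, relevant_ids, k):
    """True if any relevant document appears in the top k results."""
    return any(doc_id in relevant_ids for doc_id in ranked_ids[:k])


def evaluate(runs, k=3):
    """Average both metrics over a list of (ranked_ids, relevant_ids) pairs."""
    n = len(runs)
    mrr = sum(reciprocal_rank(r, rel) for r, rel in runs) / n
    hits = sum(hit_at_k(r, rel, k) for r, rel in runs) / n
    return {"mrr": mrr, f"hit@{k}": hits}


# Toy example: three queries with made-up chunk ids.
runs = [
    (["a", "b", "c"], {"a"}),  # relevant chunk at rank 1
    (["x", "y", "z"], {"z"}),  # relevant chunk at rank 3
    (["p", "q"], {"r"}),       # relevant chunk never retrieved
]
scores = evaluate(runs, k=3)
```

In this toy run the MRR is (1 + 1/3 + 0) / 3 and Hit@3 is 2/3. A low MRR alongside a high faithfulness score is the pattern that points at retrieval rather than generation as the bottleneck.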


Event Attendee Search Service

January 1, 2025 – June 1, 2025

Technologies: FastAPI, Pinecone, fastembed (BGE), Groq, Docker, GitHub Actions CI/CD, AWS EC2, systemd

• Built a standalone semantic search microservice for event networking platforms: attendees register in plain text and become discoverable via natural language queries like "ML engineers in healthcare with less than 5 years experience", using BAAI/bge-small-en-v1.5 (384-dim) local ONNX embeddings with zero per-query embedding cost.
• Implemented a two-stage query pipeline: Groq llama-3.1-8b parses free-text queries into a semantic component plus hard metadata filters (experience level, organisation) in ~200ms; Pinecone ANN search applies the pre-filters before cosine similarity ranking, and results below a 0.25 score threshold are discarded as irrelevant.
• Designed a provider-agnostic LLM layer: switching from Groq cloud to self-hosted Gemma on EC2 requires two env-var changes and zero code modifications. Packaged with Docker, an nginx reverse proxy, and a seed script generating 100 synthetic attendees across 12 test query patterns.
• Set up a GitHub Actions CI/CD pipeline for auto-deploy to EC2 on every push to main: SSH pull, dependency sync, and systemd service restart with no manual intervention.
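The second stage of the pipeline (metadata pre-filter, cosine ranking, score cutoff) can be illustrated with an in-memory stand-in. This sketch does not call Pinecone or Groq; the record layout, the function names, and the toy 2-dimensional vectors are assumptions for illustration. Only the 0.25 threshold comes from the description.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec, filters, records, min_score=0.25, top_k=5):
    """Apply hard metadata filters first, then rank survivors by cosine score."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in filters.items())
    ]
    scored = [(cosine(query_vec, r["vector"]), r["id"]) for r in candidates]
    kept = [pair for pair in scored if pair[0] >= min_score]  # drop weak matches
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [(rec_id, round(score, 3)) for score, rec_id in kept[:top_k]]


# Hypothetical attendees with toy 2-d embeddings (the real ones are 384-d).
records = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"organisation": "X"}},
    {"id": "b", "vector": [0.0, 1.0], "metadata": {"organisation": "X"}},
    {"id": "c", "vector": [1.0, 0.0], "metadata": {"organisation": "Y"}},
    {"id": "d", "vector": [0.6, 0.8], "metadata": {"organisation": "X"}},
]
results = search([1.0, 0.0], {"organisation": "X"}, records)
```

Here "c" is excluded by the metadata filter even though its vector matches perfectly, and "b" passes the filter but falls below the score cutoff, which is the behaviour the two-stage design is meant to guarantee.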


Ollama Intelligent Model Scheduler

January 1, 2025 – June 1, 2025

Technologies: Python, FastAPI, asyncio, Ollama, AWS g5.2xlarge (A10G GPU), nginx, systemd

• Designed and deployed a VRAM-aware batch scheduling layer on AWS g5.2xlarge (A10G, 24GB VRAM) for multi-model Ollama inference: model-affinity reordering drains all requests for the loaded model before switching, reducing VRAM swap overhead by 80-90% under mixed-traffic conditions.
• Implemented VRAM-budget-aware model switching: before each model load, the scheduler checks whether current + incoming model VRAM exceeds the 22GB budget, and forces Ollama eviction via keep_alive=0 and a CUDA allocator sleep only when necessary, saving latency on cheap swaps (e.g. 2B → 4B).
• Enforced a single asyncio worker guarantee for fully deterministic VRAM state, eliminating race conditions between concurrent model-load requests; designed for horizontal scaling via model-family queue sharding on multi-GPU instances.
• Exposed a per-model latency metrics API that separates execution latency from queue wait time, making it possible to distinguish scheduler contention from true model throughput degradation; deployed with an nginx reverse proxy and systemd with a graceful 30s drain on shutdown.
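The two scheduling ideas above, model-affinity reordering and the VRAM budget check, can be sketched in a few lines. This is an illustrative reconstruction, not the project's code: the request shape, the function names, and the example VRAM sizes are hypothetical. Only the 22GB budget comes from the description.

```python
def affinity_order(requests, loaded_model):
    """Stable reorder: the loaded model's requests first, then one batch per model.

    Within each model, arrival order is preserved; other models are visited in
    order of first arrival (dicts keep insertion order in Python 3.7+).
    """
    groups = {}
    for req in requests:
        groups.setdefault(req["model"], []).append(req)
    ordered = groups.pop(loaded_model, [])
    for batch in groups.values():
        ordered.extend(batch)
    return ordered


def count_switches(requests, loaded_model):
    """Number of model loads needed to serve the requests in the given order."""
    switches, current = 0, loaded_model
    for req in requests:
        if req["model"] != current:
            switches += 1
            current = req["model"]
    return switches


def needs_eviction(current_gb, incoming_gb, budget_gb=22.0):
    """True only when both models cannot fit inside the VRAM budget together."""
    return current_gb + incoming_gb > budget_gb


# Hypothetical mixed-traffic queue while model "a" is loaded.
queue = [
    {"id": 1, "model": "a"},
    {"id": 2, "model": "b"},
    {"id": 3, "model": "a"},
    {"id": 4, "model": "c"},
    {"id": 5, "model": "b"},
]
reordered = affinity_order(queue, loaded_model="a")
```

Served in arrival order this queue needs four model switches; after reordering it needs two. The budget check then decides, per switch, whether the outgoing model has to be evicted first or can stay resident.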

