ML Engineer with less than a year in Generative AI & MLOps
Machine Learning Engineer with hands-on experience building and deploying production ML systems. Specializing in Generative AI, LLM fine-tuning, RAG architectures, and MLOps. Proficient in TensorFlow, PyTorch, Transformers, and cloud-based LLM integration. Strong foundation in building scalable ML pipelines from research to production deployment.
DAVV, Indore
M.Tech · Computer Science & Engineering
August 1, 2023 – June 30, 2025
Oriental College of Technology, Bhopal
B.Tech · Computer Science & Engineering
August 1, 2018 – June 30, 2022
Softvyom Consulting Services Pvt. Ltd.
Machine Learning Developer (Internship Program)
August 1, 2024 – January 1, 2025
Indore, Madhya Pradesh, India
Zerodha KiteConnect WebSocket Simulator
January 1, 2025 – June 1, 2025
Technologies: Python, WebSocket, FastAPI, AWS EC2, systemd, binary struct protocol
• Reverse-engineered and replicated Zerodha's binary QUOTE-mode tick protocol exactly - built a drop-in WebSocket server streaming live-simulated tick data for 10 NSE instruments at 1s intervals, eliminating the need for a paid KiteConnect subscription during development
• Designed a dual-service architecture on AWS EC2: WebSocket server (port 8765) for binary tick streaming and a FastAPI management API (port 8766) for key provisioning, token rotation, and revocation, mirroring real Zerodha auth behaviour including close code 4001 on invalid credentials
• Implemented NSE market-phase logic with live holiday detection and fallback layers - binary tick frames during market hours (09:15-15:30 IST), heartbeat bytes during off-hours, ensuring connection persistence identical to production Zerodha behaviour
• Deployed with systemd service management, enabling auto-restart on crash; documented full peer-sharing guide with MySQL, CSV/JSON/Excel save clients and one-line FastAPI project integration
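The tick framing and market-phase switch described for this project can be illustrated roughly as follows. This is a minimal sketch, not the project's code: the 44-byte quote layout (eleven big-endian 32-bit integers, prices in paise), the field names, the length-prefixed frame, and the single-byte heartbeat are all assumptions made for illustration, and holiday detection is left out.

```python
import struct
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # fixed offset; IST has no DST
MARKET_OPEN, MARKET_CLOSE = time(9, 15), time(15, 30)

# Assumed quote-mode layout: 11 big-endian int32 fields = 44 bytes.
QUOTE = struct.Struct(">11i")
FIELD_ORDER = ("instrument_token", "last_price", "last_qty", "avg_price",
               "volume", "buy_qty", "sell_qty", "open", "high", "low", "close")
PRICE_FIELDS = {"last_price", "avg_price", "open", "high", "low", "close"}


def pack_quote(tick):
    """Pack one tick dict into 44 bytes; prices are sent as integer paise."""
    values = [round(tick[name] * 100) if name in PRICE_FIELDS else int(tick[name])
              for name in FIELD_ORDER]
    return QUOTE.pack(*values)


def build_frame(ticks):
    """Frame = 2-byte packet count, then (2-byte length + packet) per tick."""
    frame = struct.pack(">H", len(ticks))
    for tick in ticks:
        frame += struct.pack(">H", QUOTE.size) + pack_quote(tick)
    return frame


def is_market_hours(now):
    """True on weekdays between 09:15 and 15:30 IST (holidays not handled)."""
    local = now.astimezone(IST)
    return local.weekday() < 5 and MARKET_OPEN <= local.time() <= MARKET_CLOSE


def next_message(ticks, now):
    """Tick frame during market hours, a one-byte heartbeat otherwise."""
    return build_frame(ticks) if is_market_hours(now) else b"\x00"
```

A client written against the same assumed layout would read the packet count, then each length-prefixed packet, and divide the price fields by 100.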
AgriGPT-RAG System with Multi-Round Evaluation Pipeline
January 1, 2025 – June 1, 2025
Technologies: RAG, Pinecone, BGE embeddings, Groq (llama-3.3-70b), Gemma 4 26B, Ollama, FastAPI, EC2, RAGAS
• Built a bilingual (Hindi/English) RAG system for Indian farmers to query agricultural schemes and crop guidance; designed and ran a 5-round evaluation framework covering retrieval quality (Hit@k, MRR) and generation quality (faithfulness, context precision/recall) using RAGAS-style LLM-as-judge methodology
• Identified retrieval as the primary bottleneck: real Pinecone pipeline achieved MRR 0.396 and Hit@3 0.521 vs near-perfect generation (faithfulness 1.0) when correct context was supplied, proving model swaps were the wrong optimization priority
• Ran a head-to-head benchmark of Groq llama-3.3-70b (cloud) vs Gemma 4 26B self-hosted on EC2 via Ollama; both achieved 97.5–100% pass rate on retrieval-success pairs with zero hallucinations, with Gemma's thinking-model reasoning extracting answers from loosely-matched context
• Built a fully automated eval runner discovering installed Ollama models at startup, benchmarked 3 model sizes (1B → 32B); found qwen2.5:32b self-hosted matched Groq cloud llama-3.3-70b exactly at 67.2% pass rate with no per-call cost
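For reference, the two retrieval metrics named above (Hit@k and MRR) can be computed as in the sketch below. It assumes each evaluation query has exactly one relevant chunk ID and that the retriever returns a ranked list of IDs; the function names and the sample data are illustrative and are not taken from the project.

```python
def hit_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant chunk appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def reciprocal_rank(ranked_ids, relevant_id):
    """1/rank of the relevant chunk (1-indexed), or 0.0 if it was not retrieved."""
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id == relevant_id:
            return 1.0 / rank
    return 0.0


def evaluate_retrieval(results, k=3):
    """results: list of (ranked_ids, relevant_id) pairs, one per query."""
    n = len(results)
    return {
        f"hit@{k}": sum(hit_at_k(r, rel, k) for r, rel in results) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in results) / n,
    }


# Illustrative data only: relevant chunk at rank 1, at rank 3, and missing.
sample = [
    (["c1", "c7", "c9"], "c1"),
    (["c4", "c2", "c5"], "c5"),
    (["c8", "c3", "c6"], "c0"),
]
metrics = evaluate_retrieval(sample, k=3)
```

Both metrics depend only on the retriever's ranking, so they can be tracked separately from the generation-side scores.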
Event Attendee Search Service
January 1, 2025 – June 1, 2025
Technologies: FastAPI, Pinecone, fastembed (BGE), Groq, Docker, GitHub Actions CI/CD, AWS EC2, systemd
• Built a standalone semantic search microservice for event networking platforms: attendees register in plain text and become discoverable via natural language queries like "ML engineers in healthcare with less than 5 years experience" using BAAI/bge-small-en-v1.5 (384-dim) local ONNX embeddings with zero per-query embedding cost
• Implemented a two-stage query pipeline: Groq llama-3.1-8b parses free-text queries into a semantic component + hard metadata filters (experience level, organisation) in ~200ms; Pinecone ANN search applies pre-filters before cosine similarity ranking, eliminating irrelevant results below a 0.25 score threshold
• Designed a provider-agnostic LLM layer: switching from Groq cloud to self-hosted Gemma on EC2 requires two env-var changes and zero code modifications; packaged with Docker, nginx reverse proxy, and a seed script generating 100 synthetic attendees across 12 test query patterns
• Set up GitHub Actions CI/CD pipeline for auto-deploy to EC2 on every push to main - SSH pull, dependency sync, and systemd service restart with no manual intervention
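The second stage of the query pipeline (metadata pre-filter, cosine ranking, score cutoff) can be sketched with a plain in-memory list standing in for the vector index. This does not use the Pinecone client; the record shape, the `search` function, and its parameters are hypothetical, and only the 0.25 threshold comes from the description above.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, filters, records, min_score=0.25, top_k=5):
    """Apply hard metadata filters first, then rank survivors by cosine score."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in filters.items())
    ]
    scored = [(cosine(query_vec, r["vector"]), r["id"]) for r in candidates]
    kept = sorted((pair for pair in scored if pair[0] >= min_score), reverse=True)
    return [(rec_id, score) for score, rec_id in kept[:top_k]]


# Toy 2-dimensional vectors for illustration; real embeddings would be 384-dim.
records = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"level": "junior"}},
    {"id": "b", "vector": [0.0, 1.0], "metadata": {"level": "junior"}},
    {"id": "c", "vector": [1.0, 1.0], "metadata": {"level": "senior"}},
    {"id": "d", "vector": [1.0, 1.0], "metadata": {"level": "junior"}},
]
hits = search([1.0, 0.0], {"level": "junior"}, records)
```

Here record "c" is excluded by the metadata filter before any scoring happens, and "b" is scored but dropped by the threshold.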
Ollama Intelligent Model Scheduler
January 1, 2025 – June 1, 2025
Technologies: Python, FastAPI, asyncio, Ollama, AWS g5.2xlarge (A10G GPU), nginx, systemd
• Designed and deployed a VRAM-aware batch scheduling layer on AWS g5.2xlarge (A10G, 24GB VRAM) for multi-model Ollama inference - model-affinity reordering drains all requests for the loaded model before switching, reducing VRAM swap overhead by 80-90% under mixed-traffic conditions
• Implemented VRAM-budget-aware model switching: before each model load, checks if current + incoming model VRAM exceeds the 22GB budget; forces Ollama eviction via keep_alive=0 and CUDA allocator sleep only when necessary, saving latency on cheap swaps (e.g. 2B → 4B)
• Enforced a single asyncio worker guarantee for fully deterministic VRAM state, eliminating race conditions between concurrent model-load requests; designed for horizontal scaling via model-family queue sharding on multi-GPU instances
• Exposed per-model latency metrics API separating execution latency from queue wait time - enabling distinction between scheduler contention and true model throughput degradation; deployed with nginx reverse proxy and systemd with graceful 30s drain on shutdown
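The two scheduling ideas above, model-affinity reordering and the VRAM budget check, can be sketched in a few lines. The request shape, function names, and model names below are hypothetical, and the sketch makes no Ollama calls; it only shows how grouping requests by model reduces the number of model switches compared with first-in-first-out order.

```python
from collections import OrderedDict


def affinity_order(queue, loaded_model):
    """Serve every request for the loaded model first, then group the rest
    by model in order of first arrival."""
    groups = OrderedDict()
    for request in queue:
        groups.setdefault(request["model"], []).append(request)
    ordered = groups.pop(loaded_model, [])
    for requests in groups.values():
        ordered.extend(requests)
    return ordered


def count_switches(queue, loaded_model):
    """Number of model loads needed to serve the queue in the given order."""
    switches, current = 0, loaded_model
    for request in queue:
        if request["model"] != current:
            switches += 1
            current = request["model"]
    return switches


def needs_eviction(loaded_gb, incoming_gb, budget_gb=22.0):
    """True only when both models together would exceed the VRAM budget."""
    return loaded_gb + incoming_gb > budget_gb


# Illustrative mixed traffic with alternating (made-up) model names.
queue = [{"id": i, "model": m}
         for i, m in enumerate(["small", "large", "small", "large", "small"])]
fifo_switches = count_switches(queue, loaded_model="small")
affinity_switches = count_switches(affinity_order(queue, "small"), "small")
```

Reordering trades strict arrival order for fewer loads, so a real scheduler would likely also need a fairness or maximum-wait rule, which this sketch does not include.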
Deep Learning Specialization
DeepLearning.AI
June 1, 2026 – Present
Machine Learning Specialization
Stanford University
June 1, 2026 – Present
Cultural Fit Analysis
The candidate's portfolio showcases a strong passion for Machine Learning and AI, with several ambitious personal projects that go beyond typical academic exercises. The diversity of projects, from RAG systems to WebSocket simulators and model schedulers, indicates a broad technical curiosity and a proactive learning attitude. The detailed descriptions of challenges faced and solutions implemented suggest a transparent and collaborative approach to sharing knowledge. The focus on open-source tools (Ollama, FastAPI) and cloud platforms (AWS) aligns with modern industry practices.
Soft Skills & Operational Fit
The candidate demonstrates strong problem-solving skills through reverse-engineering protocols and identifying system bottlenecks. Their project descriptions highlight an ability to work independently on complex technical challenges and deliver end-to-end solutions. The focus on robust deployment (systemd, CI/CD) and performance monitoring indicates a strong operational mindset. The detailed evaluation frameworks used in projects suggest a methodical and data-driven approach to problem-solving.