
ML Infrastructure Engineer with 3+ years in GPU/CPU Performance & LLM Serving
GPU systems and ML inference infrastructure engineer. I make models run faster and cheaper on real hardware: CUDA kernels, GPU/CPU performance, NUMA-aware multi-node systems, quantization, and LLM serving runtimes. Production contributions to llama.cpp and OpenBLAS. First-authored IEEE HIPC paper. M.Tech (Systems & Compilers), IIT Bombay.
IIT Bombay
M.Tech · Computer Science & Engineering
August 1, 2021 – June 30, 2023
Fujitsu Research
Software Development Engineer
June 1, 2023 – Present
India
PageRank Acceleration
January 1, 2021 – January 1, 2023
Implemented GPU-accelerated PageRank (power iteration) in CUDA: CSR graph representation for coalesced warp memory access, shared-memory reduction for convergence checks, and pointer-swap double buffering to eliminate data races; achieved ~50× speedup over a CPU baseline on 1M-node graphs via full SM occupancy on a T4 GPU.
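As an illustration of the power-iteration loop described above, here is a minimal CPU sketch in Python, not the CUDA kernel itself. It assumes a pull-based CSR layout (each row lists the sources of a node's incoming edges) and uniform redistribution of dangling-node mass; the function name `pagerank_csr` and its parameters are hypothetical. The buffer swap at the end of each iteration mirrors the pointer-swap double buffering: reads come only from `curr`, writes go only to `nxt`.

```python
def pagerank_csr(row_ptr, col_idx, out_degree, damping=0.85, tol=1e-10, max_iter=100):
    """Pull-based PageRank power iteration over a CSR graph of incoming edges.

    row_ptr[v]..row_ptr[v+1] indexes the sources u of edges u -> v in col_idx.
    out_degree[u] is the number of outgoing edges of u (0 for dangling nodes).
    """
    n = len(row_ptr) - 1
    curr = [1.0 / n] * n   # ranks read during this iteration
    nxt = [0.0] * n        # ranks written during this iteration
    for _ in range(max_iter):
        # Rank held by dangling nodes is spread uniformly over all nodes.
        dangling = sum(curr[u] for u in range(n) if out_degree[u] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        diff = 0.0
        for v in range(n):
            acc = 0.0
            for e in range(row_ptr[v], row_ptr[v + 1]):
                u = col_idx[e]
                acc += curr[u] / out_degree[u]
            nxt[v] = base + damping * acc
            diff += abs(nxt[v] - curr[v])   # L1 change, used as the convergence check
        # Swap buffers instead of copying: no element is read and written in the same pass.
        curr, nxt = nxt, curr
        if diff < tol:
            break
    return curr
```

On a GPU the inner loops would map to threads or warps per row and the `diff` accumulation to a shared-memory reduction; this sketch only shows the data flow.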
Low-Level Static Analysis Engine for C++
January 1, 2021 – January 1, 2023
Built an LLVM-IR static analysis engine for C++: custom alias-analysis passes feeding SAT/SMT constraints into a bounded model checker, with full exception-handling (invoke/landingpad/resume) encoding; cut solve time by 13% and verified 6 of 10 cases, all 10 of which crashed CBMC.
Cassandra-Inspired Distributed KV Store
January 1, 2021 – January 1, 2023
Built a leaderless 6-node KV store: Chord ring with finger tables for O(log n) routing over gRPC, gossip protocol for decentralised membership and failure detection; LSM-tree write path with locked memtable, async SSTable flush, and background compaction; node addition/removal with automatic key rebalancing, and a cache with fine-grained locking.
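The finger-table routing mentioned above can be sketched as follows. This is a simplified, single-process Python illustration, not the project's code: the node IDs, the 6-bit identifier space, and the helper names (`successor`, `between`, `finger_table`, `lookup`) are all assumptions, and every node's table is computed from a global node list, whereas in a real deployment each node holds only its own table and each hop would be a remote call.

```python
from bisect import bisect_left

def successor(nodes, ident):
    """First node clockwise from ident (inclusive); nodes must be sorted."""
    return nodes[bisect_left(nodes, ident) % len(nodes)]

def between(x, a, b):
    """True if x lies in the ring interval (a, b], wrapping past zero if needed."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def finger_table(node, nodes, m):
    """Entry i points at the successor of node + 2^i on a ring of size 2^m."""
    size = 1 << m
    return [successor(nodes, (node + (1 << i)) % size) for i in range(m)]

def lookup(start, key, nodes, m):
    """Route from start to the node that owns key; returns (owner, hops)."""
    fingers = {n: finger_table(n, nodes, m) for n in nodes}  # global view, for the sketch only
    current, hops = start, 0
    while True:
        if key == current:
            return current, hops
        succ = fingers[current][0]
        if between(key, current, succ):
            return succ, hops
        # Jump to the farthest finger that still strictly precedes the key.
        nxt = succ
        for f in reversed(fingers[current]):
            if f != key and between(f, current, key):
                nxt = f
                break
        current, hops = nxt, hops + 1

# Hypothetical 6-node ring on a 64-slot identifier space.
NODES = [1, 12, 23, 34, 45, 56]
owner, hops = lookup(1, 50, NODES, 6)
```

Each forwarding step takes the largest jump that does not overshoot the key, which is what gives the logarithmic hop count.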
oneDAL Optimization for ARM SVE
IEEE HIPC 2024
January 1, 2024 – Present
Maximizing Multi-Core Efficiency in BLAS: A Scalable Architecture for Performance.
IEEE HIPC 2024 / arXiv
January 1, 2024 – Present
Cultural Fit Analysis
The candidate's profile shows a strong fit for a high-performance, research-oriented engineering culture. Their academic projects and professional experience involve tackling challenging, low-level optimization problems, often pushing the boundaries of performance. Contributions to open-source projects like llama.cpp and OpenBLAS, along with publications, indicate a proactive and collaborative approach to problem-solving and a desire to contribute to the broader technical community. The diversity of projects, from static analysis to distributed KV stores and GPU acceleration, demonstrates a broad technical curiosity and adaptability.
Soft Skills & Operational Fit
The candidate demonstrates strong problem-solving skills, evidenced by their ability to diagnose and resolve complex performance issues in large-scale systems. Their contributions to open-source projects and publications suggest a collaborative and knowledge-sharing mindset. The detailed descriptions of their work indicate a methodical approach to engineering and a focus on measurable impact. Their recognition as 'Employee of the Quarter' and 'Fujitsu Grand Award' further highlight their operational excellence and impact.