Shivam Gautam

ML Infrastructure Engineer

https://talent.gravityer.com/shivam-gautam-7165635

ML Infrastructure Engineer with 3+ years in GPU/CPU Performance & LLM Serving

Fujitsu Research

Key Strengths

Deep expertise in GPU programming (CUDA) and performance optimization for ML workloads, evidenced by significant speedups (3.14x, 50x) and contributions to llama.cpp and OpenBLAS.
Strong understanding of low-level system architecture, including NUMA, memory hierarchies (SRAM, HBM), and SIMD (SVE, NEON) for high-performance computing.
Proven ability to diagnose and resolve complex performance bottlenecks in multi-threaded and distributed systems.
Experience with quantization techniques (Q4_0, Q8_0, Q2_K, Q3_K, Q4_HQQ) directly relevant to ML inference efficiency.
Academic background in Systems Software & Compilers (IIT Bombay) and practical experience align closely with an ML Infrastructure Engineer role.
Demonstrated ability to contribute to open-source projects (llama.cpp, OpenBLAS) and publish research (IEEE HiPC).

Cultural & Operational Fit

Cultural Fit Analysis

The candidate's profile shows a strong fit for a high-performance, research-oriented engineering culture. Their academic projects and professional experience involve tackling challenging, low-level optimization problems, often pushing the boundaries of performance. Contributions to open-source projects like llama.cpp and OpenBLAS, along with publications, indicate a proactive and collaborative approach to problem-solving and a desire to contribute to the broader technical community. The diversity of projects, from static analysis to distributed KV stores and GPU acceleration, demonstrates a broad technical curiosity and adaptability.

Soft Skills & Operational Fit

The candidate demonstrates strong problem-solving skills, evidenced by their ability to diagnose and resolve complex performance issues in large-scale systems. Their contributions to open-source projects and publications suggest a collaborative and knowledge-sharing mindset. The detailed descriptions of their work indicate a methodical approach to engineering and a focus on measurable impact. Their 'Employee of the Quarter' recognition and 'Fujitsu Grand Award' further highlight their operational excellence and impact.

Projects

PageRank Acceleration

January 1, 2021 – January 1, 2023

Implemented GPU-accelerated PageRank (power iteration) in CUDA: CSR graph representation for coalesced warp memory access, shared-memory reduction for convergence checks, and pointer-swap double buffering to eliminate data races; achieved ~50× speedup over the CPU baseline on 1M-node graphs via full SM occupancy on a T4 GPU.

Low-Level Static Analysis Engine for C++

January 1, 2021 – January 1, 2023

Built an LLVM-IR static analysis engine for C++: custom alias-analysis passes feeding SAT/SMT constraints into a bounded model checker, with full exception-handling (invoke/landingpad/resume) encoding; cut solve time by 13% and verified 6 of 10 cases, all of which crashed CBMC.

Cassandra-Inspired Distributed KV Store

January 1, 2021 – January 1, 2023

Built a leaderless 6-node KV store: Chord ring with finger tables for O(log n) routing over gRPC; gossip protocol for decentralised membership and failure detection; LSM-tree write path with locked memtable, async SSTable flush, and background compaction; node addition/removal with automatic key rebalancing; and a cache with fine-grained locking.
