KHAJA TABARAK UDDIN

DevOps Engineer

https://talent.gravityer.com/khaja-tabarak-uddin

DevOps Engineer with less than a year of experience in Cloud & Infrastructure Automation

Key Strengths

Extensive experience in designing and implementing production-grade cloud-native microservices platforms on AWS using Kubernetes (EKS) and Terraform.
Strong proficiency in CI/CD pipeline automation with GitHub Actions, Docker, Helm, and ArgoCD (GitOps).
Demonstrated ability to build highly available, scalable, and fault-tolerant systems with defined RTO/RPO objectives.
Solid understanding and implementation of observability stacks (Prometheus, Grafana, CloudWatch) and SRE principles (SLI, SLO, RCA).
Experience with secure secret management (OIDC, IRSA) and automated security validation in pipelines.
Proven ability to optimize CI/CD pipelines and troubleshoot complex infrastructure and networking issues.
Relevant experience with AI infrastructure, including serverless pipelines, vector databases (pgvector), and embedding workflows.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate's project diversity, including a general cloud-native platform and an AI infrastructure platform, shows adaptability and a broad skill set. Their experience in an open-source, multi-contributor environment and emphasis on best practices (IaC, Git branching) suggest a collaborative and quality-focused mindset. The detailed descriptions of problem-solving and optimization efforts indicate a proactive and improvement-oriented approach, which aligns well with a dynamic technical culture.

Soft Skills & Operational Fit

The candidate demonstrates strong operational fit through their detailed descriptions of incident troubleshooting, root cause analysis, and adherence to SRE principles (SLA, SLO, SLI, RTO, RPO). Their collaboration in open-source, multi-contributor environments and standardization efforts with development teams indicate good teamwork and communication skills. The focus on automation, reliability, and performance optimization aligns well with a senior DevOps role.

About

Cloud & infrastructure intern with a strong background in reliability engineering, infrastructure automation, CI/CD, containerized platforms, and Linux-based systems. Experienced in designing, deploying, and operating distributed systems for AI applications, including vector databases, serverless pipelines, and microservices. Proven ability to improve system availability, reduce deployment failures, and optimize performance through automation, monitoring, and root cause analysis. Proficient in monitoring system health using Prometheus, Grafana, and CloudWatch, implementing GitOps-based continuous delivery with ArgoCD for Kubernetes, and designing systems aligned with SLA/SLO/SLI/RTO/RPO objectives.

Top Skills

High Availability, CI/CD

Projects

Production-Ready, 99.9%+ HA Kubernetes Platform with Canary Releases

June 29, 2026 – Present

Designed and delivered a production-grade cloud-native microservices platform structured across three independent repositories for infrastructure, applications, and GitOps-based deployments.
Implemented a Terraform-based AWS infrastructure stack provisioning VPC, EKS, RDS, IAM, IRSA, and multi-environment configurations.
Dev environment was deployed via the dev branch for iterative infrastructure testing before staging and production.
Developed and containerized polyglot microservices (Java, Python, Node.js) with automated CI pipelines, Docker builds, vulnerability scanning, and immutable image publishing through the dev branch.
Implemented Helm-based Kubernetes deployments with ArgoCD GitOps workflows for declarative environment configuration and automated state synchronization.
Engineered the platform for 99.9%+ availability, enabling zero-downtime deployments and low-latency performance under peak traffic.
Designed a horizontally scalable architecture capable of handling 5–10× traffic spikes without service degradation.
Built a fault-tolerant, self-healing Kubernetes system eliminating single points of failure across compute and database layers.
Defined recovery objectives of RTO < 17 minutes and near-zero RPO for rapid restoration and minimal data loss.
Implemented OIDC federation and IAM Roles for Service Accounts (IRSA) to eliminate hardcoded secrets and enforce least-privilege AWS access.
Designed Kubernetes init jobs to automate database schema ingestion and initialization during service startup.
Implemented and validated canary deployment strategies for controlled production rollouts with zero user impact.
Built a comprehensive observability stack enabling real-time monitoring, metrics collection, alerting, and proactive incident detection.
Eliminated credential exposure risks by enforcing secure secret management and removing hardcoded secrets from services and pipelines.
Introduced automated security validation gates in CI/CD pipelines to prevent vulnerable workloads from reaching production.
Established a resilient and scalable platform ensuring high availability and responsiveness under real-world load.
Collaborated in an open-source, multi-contributor environment, following production-grade workflows, Git branching strategies, and IaC best practices.

AI Infrastructure Platform

June 29, 2026 – Present

Owned the reliability and deployment of a production-grade AI application handling document ingestion, vector embedding generation, and semantic retrieval.
Implemented and automated AWS infrastructure using Terraform, ensuring isolation, scalability, and fault tolerance.
Built serverless ingestion and processing pipelines using AWS Lambda and S3, supporting AI deployments.
Implemented monitoring, alerting, and automated health checks, achieving 99.9% availability during embedding workflows with PostgreSQL (pgvector).
Led troubleshooting and root cause analysis for deployment and runtime issues, introducing retry mechanisms and rollback-safe CI/CD pipelines.
Optimized CI/CD pipelines by introducing caching and parallel execution, reducing build and deployment time by 30%.

