AI Engineer with less than a year in Computer Vision & On-Device ML
As an AI Engineering student with deep hands-on expertise in Computer Vision, On-Device ML, Natural Language Processing, and Multimodal Systems, I have a proven track record of architecting and deploying production-grade AI solutions on resource-constrained edge hardware. My work spans facial emotion recognition, speech emotion detection, and posture analysis, all compressed and optimized for sub-100 ms on-device inference via TFLite INT8 on Android, with no cloud dependency. With a passion for pushing the boundaries of deployable AI, I am adept at translating cutting-edge research into real-world applications that function reliably in low-resource environments. Let's discuss how my deep technical skills and engineering-first approach can drive your organization's AI and Computer Vision initiatives forward.
National Textile University, Faisalabad
Bachelor of Science · Artificial Intelligence
August 1, 2022 – June 30, 2026
National Textile University, Faisalabad
Teaching Assistant · Computer Organization & Assembly Language
February 1, 2025 – June 1, 2025
Faisalabad, Punjab, Pakistan
Sound Emotion Recognition Dual-Approach Audio Pipeline
January 1, 2026 – January 1, 2026
Built two independent SER pipelines and benchmarked them head-to-head: Approach 1 (40-dim MFCC + CNN-LSTM) reached 75% accuracy; Approach 2 (3-channel Mel spectrogram stacking Mel, Δ, and ΔΔ into a physics-aware 94×128×3 image + Bi-LSTM + custom Attention) reached 80% accuracy, a 5-percentage-point gain from encoding temporal dynamics as image channels rather than sequence features. Approach 2 uses rectangular CNN kernels (5×3 time-focus, 3×5 frequency-focus) and a custom Attention layer that suppresses silence and padding frames, focusing the model on voiced segments. Both pipelines use Mixed Float16 precision, light audio augmentation (Gaussian noise, ±1 semitone pitch shift, 0.95–1.05× time stretch), and WarmUp + Cosine Decay scheduling.
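A minimal sketch of a WarmUp + Cosine Decay schedule of the kind named above, assuming a linear warm-up phase. The function name, step counts, and peak rate are illustrative placeholders, not the values used in the project.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Learning rate at `step`: linear warm-up to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warm-up step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clipped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Illustrative values only: 100 steps, 10 of them warm-up, peak rate 1e-3.
schedule = [
    warmup_cosine_lr(s, total_steps=100, warmup_steps=10, peak_lr=1e-3)
    for s in range(100)
]
```

The rate climbs for the first 10 steps, then falls smoothly toward `min_lr`, which avoids large early updates while the attention layer is still untrained.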
Tell2Design Generative AI for Architectural Floor Plans
January 1, 2026 – January 1, 2026
Developed a generative pipeline that transforms natural language room descriptions into structured 2D floor plan layouts using Graph Attention Networks (GAT) trained on the 3D-FRONT indoor scene dataset (18,000+ professionally designed rooms). Modeled spatial dependencies between rooms as a graph — nodes represent rooms, edges encode adjacency constraints — enabling the model to generate layouts that respect real-world architectural relationships (e.g. kitchen adjacent to dining, bedroom separate from living areas). Evaluated layout coherence with IoU-based room overlap metrics; demonstrated that GAT-based spatial reasoning outperforms sequence-to-sequence baselines on constraint satisfaction for multi-room plans.
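The exact overlap metric is not spelled out above; the sketch below shows one plausible form, assuming rooms are axis-aligned rectangles given as (x_min, y_min, x_max, y_max). The function names and the sample layout are hypothetical.

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned rectangles."""
    # Width and height of the intersection, zero if the boxes do not overlap.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def max_pairwise_overlap(rooms):
    """Largest IoU between any two rooms; 0.0 means no rooms overlap."""
    names = list(rooms)
    worst = 0.0
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            worst = max(worst, box_iou(rooms[names[i]], rooms[names[j]]))
    return worst

# Hypothetical generated layout: kitchen and dining share a wall (no overlap),
# while the bedroom intrudes into the dining room.
layout = {
    "kitchen": (0, 0, 4, 3),
    "dining": (4, 0, 8, 3),
    "bedroom": (6, 2, 10, 6),
}
worst = max_pairwise_overlap(layout)
```

A coherent plan should score close to zero here, since rooms that share a wall touch but do not intersect.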
Facial Emotion Recognition Blendshape MLP Pipeline
January 1, 2026 – January 1, 2026
Explored three architectures progressively before arriving at the production model: (1) FaceNet backbone fine-tuning with 3-phase curriculum training; (2) CNN/ViT baselines; (3) a compact 165-dim temporal blendshape MLP, chosen for its combination of accuracy, pose invariance, and mobile deployability. Refactored from a 1404-dim raw landmark feature space to a 52-dim FACS-aligned blendshape representation, achieving a 3× model size reduction with higher pose invariance; expanded to 165-dim by appending temporal derivatives and 4 geometric ratios (EAR, MAR, brow-to-eye, mouth pull). Architecture: GLU blocks + Residual MLP + Multi-Head Self-Attention with Supervised Contrastive Loss (T=0.07) + Focal Loss (γ=2); trained on AffectNet + RAF-DB with AdamW + Cosine Decay Restarts. Achieved 78% validation accuracy on AffectNet and 68% zero-shot transfer on RAF-DB (held out entirely during training), demonstrating generalization beyond dataset-specific texture cues. Exported to TFLite INT8 with representative-dataset calibration; quantization metadata JSON enables direct drop-in to Android MediaPipe pipelines.
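For reference, a single-sample sketch of focal loss with a focusing parameter of 2, written in plain Python rather than the training framework. It omits class weighting and the contrastive term, and the function names and logits are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t)."""
    # p_t is the predicted probability of the true class, clamped to avoid log(0).
    p_t = max(softmax(logits)[target], 1e-12)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# Illustrative logits: the same prediction scored against an easy and a hard target.
easy = focal_loss([4.0, 0.0, 0.0], target=0)  # confident and correct
hard = focal_loss([4.0, 0.0, 0.0], target=1)  # confident and wrong
```

With `gamma=0` the expression reduces to ordinary cross-entropy; larger values shrink the contribution of well-classified samples, which helps on imbalanced emotion classes.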
Virtual HR Multimodal Mock Interview Analysis System
January 1, 2025 – January 1, 2026
Built a real-time Android application that conducts AI-powered mock interviews: generates job-specific questions via LLM, records candidate responses, and produces a structured multimodal feedback report covering verbal fluency, facial emotion, and posture – all running entirely on-device with no server dependency. Integrated Whisper (speech-to-text) + BERT (fluency and coherence scoring) for verbal analysis; MediaPipe Pose for posture detection; and the custom 165-dim blendshape MLP (see FER project) for real-time facial emotion recognition – three parallel inference streams in a single APK. Deployed full inference stack via TFLite INT8 quantization, achieving sub-100 ms on-device latency across all three modules on mid-range Android hardware. (AI/ML backend and model integration by author; Flutter frontend by collaborator.) Delivered a complete, installable APK – not a prototype – with per-module structured feedback output designed for repeat use by job seekers.
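INT8 quantization of this kind maps real values to 8-bit integers through a scale and a zero point, with real_value = (int8_value - zero_point) × scale. The sketch below illustrates that mapping in plain Python; it is not the TFLite converter itself, and the calibration values are made up.

```python
def quant_params(x_min, x_max, qmin=-128, qmax=127):
    """Scale and zero point for an affine int8 mapping of [x_min, x_max]."""
    # The representable range must include 0.0 so that zero maps exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = int(round(qmin - x_min / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to int8, clamping values outside the calibrated range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Recover the approximate float value from its int8 code."""
    return (q - zero_point) * scale

# Made-up activations standing in for a representative calibration set.
calibration = [-1.0, 0.0, 0.5, 2.0]
scale, zero_point = quant_params(min(calibration), max(calibration))
codes = [quantize(x, scale, zero_point) for x in calibration]
restored = [dequantize(q, scale, zero_point) for q in codes]
```

Each restored value differs from the original by at most half a quantization step, which is the accuracy cost traded for smaller models and faster integer arithmetic.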
Edge AI Livestock Tracking & Classification System
January 1, 2025 – January 1, 2025
Developed an edge-based animal tracking and classification system using ESP32-S3, ESP32-CAM, and TensorFlow Lite; dual IR-sensor direction detection triggers real-time image capture and on-device inference with a quantized MobileNet V1 model classifying livestock (Cow, Goat, Hen) with no cloud dependency. Logged classification events and running counts to InfluxDB via Wi-Fi; added OLED status display and a Streamlit interface for validating the TFLite model on uploaded images during development.
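The direction logic can be sketched as follows, assuming direction is inferred from which of the two IR sensors fires first. This is a Python illustration of the idea, not the ESP32 firmware; the function name, the 1-second window, and the sample events are hypothetical.

```python
from collections import Counter

def detect_direction(t_a, t_b, max_gap_s=1.0):
    """Infer crossing direction from trigger times (seconds) of sensors A and B.

    Returns "in" if A fired first, "out" if B fired first, and None when the
    two triggers are simultaneous, missing, or too far apart to be one crossing.
    """
    if t_a is None or t_b is None:
        return None
    gap = t_b - t_a
    if gap == 0 or abs(gap) > max_gap_s:
        return None
    return "in" if gap > 0 else "out"

# Hypothetical events: (sensor A time, sensor B time, label from the classifier).
events = [
    (0.00, 0.25, "Cow"),   # A then B: entering
    (3.10, 2.90, "Goat"),  # B then A: leaving
    (5.00, 9.00, "Hen"),   # triggers too far apart: ignored
]

counts = Counter()
for t_a, t_b, label in events:
    direction = detect_direction(t_a, t_b)
    if direction is not None:
        # In the real system this is where capture and inference would run.
        counts[(label, direction)] += 1
```

Requiring both sensors to fire within a short window filters out single-beam interruptions that would otherwise trigger a capture.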
YouTube Video Question Answering RAG Pipeline
January 1, 2024 – January 1, 2024
Built a Streamlit application that answers natural language questions over any YouTube video: it extracts the transcript, chunks and indexes it with FAISS + all-MiniLM-L6-v2 embeddings, then generates context-grounded answers via Meta Llama 3 (LangChain + Together AI). Supports multi-language transcripts and configurable retrieval size k; includes chunk-level source inspection so users can verify which segments informed each answer, addressing the hallucination transparency problem common in naive RAG deployments. Handles edge cases gracefully: missing captions, API rate limits, and the distinction between auto-generated and human-written transcripts are all detected and surfaced to the user with actionable messages.
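The chunk-and-retrieve step can be illustrated with a dependency-free sketch. It swaps in word-count vectors and brute-force cosine similarity as stand-ins for the all-MiniLM-L6-v2 embeddings and the FAISS index, so the function names, chunk sizes, and transcript below are assumptions for illustration only.

```python
import math
from collections import Counter

def chunk_words(text, size=8, overlap=2):
    """Split text into word windows of `size` that share `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(1, len(words) - overlap), step)]

def embed(text):
    """Stand-in embedding: lowercase word counts (not a neural embedding)."""
    return Counter(w.strip(".,?!").lower() for w in text.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question, with their scores."""
    q = embed(question)
    scored = sorted(((cosine(q, embed(c)), i) for i, c in enumerate(chunks)),
                    reverse=True)
    return [(chunks[i], score) for score, i in scored[:k]]

# Made-up transcript standing in for extracted video captions.
transcript = ("Gradient descent updates weights step by step. "
              "The learning rate controls how large each update is. "
              "Later the speaker compares Adam with plain SGD on image data.")
chunks = chunk_words(transcript)
top = retrieve("What does the learning rate control?", chunks, k=2)
```

Returning the retrieved chunks alongside their scores is what makes chunk-level source inspection possible: the user sees exactly which transcript segments were passed to the model.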
Cultural Fit Analysis
The candidate's academic projects showcase a strong drive for innovation and practical application of AI, aligning well with a culture that values hands-on development and real-world impact. Their diverse project portfolio, spanning computer vision, NLP, audio ML, and generative AI, indicates a broad interest and adaptability, which are beneficial for dynamic team environments. The emphasis on optimizing for edge devices suggests a resource-conscious and efficient approach to engineering.
Soft Skills & Operational Fit
The candidate demonstrates strong problem-solving skills, evidenced by their detailed approach to model optimization, architecture selection, and handling edge cases in projects. Their teaching assistant role suggests good communication and mentoring abilities, which are valuable for team collaboration and knowledge sharing. The focus on delivering complete, installable solutions indicates a product-oriented mindset and attention to detail.