AI Research Engineer with 2+ years in NLP, LLM & RAG Systems
Second-year AI & Data Science undergraduate who has independently built and shipped three full NLP/ML systems alongside coursework, including a self-built 40,000-pair Sinhala transliteration corpus, a published open-source PyPI package, and a hybrid retrieval legal assistant covering 100+ Sri Lankan legal documents. Comfortable across the full pipeline: data collection, model fine-tuning (T5, mT5, Gemma 3 with LoRA/PEFT), evaluation, and lightweight deployment. Seeking an internship to apply this foundation in a production ML/AI team.
Robert Gordon University
BSc (Hons) Artificial Intelligence & Data Science · Data Engineering, Artificial Intelligence, Machine Learning, Computational Mathematics
August 1, 2023 – June 30, 2028
Dual-Architecture Singlish-to-Sinhala Transliteration System & sin-transliterator PyPI Package
January 1, 2025 – June 1, 2026
Built a 40,000-pair Sinhala transliteration corpus from scratch (no pre-existing labelled data) and fine-tuned multiple transformer architectures (T5, mT5, Gemma 3); Gemma 3 with LoRA/PEFT achieved CER 0.16 and WER 0.31 on a held-out 2,549-row test set, outperforming widely used existing tools on ad-hoc and code-mixed Singlish, which they typically fail to handle. Designed a two-stage inference pipeline (lightweight seq2seq model for real-time use, LLM refinement layer for ambiguous/code-mixed input), quantised to INT8 via CTranslate2 for CPU-only hosting, and published it as the sin-transliterator PyPI package with automatic CPU/GPU detection and versioned Hugging Face weights. Built a custom stochastic data augmenter to generate realistic ad-hoc Singlish variations and scraped code-mixed training examples from YouTube Live Chat, expanding the corpus beyond formal text and directly improving the model’s robustness on real-world informal input.
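For readers unfamiliar with the CER and WER figures quoted above, the following is a minimal sketch of how such metrics are commonly computed from edit distance. It is not taken from the project: the function names `edit_distance` and `corpus_error_rate` are hypothetical, the corpus-level aggregation (total edits over total reference length) is an assumption, and characters are counted as Unicode code points rather than Sinhala grapheme clusters, which may differ from the project's own evaluation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_error_rate(refs, hyps, unit="char"):
    """Total edit distance divided by total reference length.

    unit="char" gives a character error rate (CER);
    unit="word" gives a word error rate (WER) on whitespace tokens.
    """
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        r = list(ref) if unit == "char" else ref.split()
        h = list(hyp) if unit == "char" else hyp.split()
        total_edits += edit_distance(r, h)
        total_len += len(r)
    return total_edits / total_len if total_len else 0.0
```

Under these assumptions, a CER of 0.16 would mean roughly 16 character edits per 100 reference characters across the test set.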
MyLawLLM: Sri Lankan Legal RAG Assistant
January 1, 2025 – June 1, 2026
Built a hybrid retrieval pipeline combining dense vector search with BM25 keyword matching over 100+ Sri Lankan legal documents, pairing a plain-English explanation with the underlying legal basis so non-experts can get oriented on routine legal questions without consulting a lawyer first. Implemented end-to-end with a FastAPI backend, Qdrant Cloud for vector storage, and a lightweight web interface; BM25 re-ranking on top of pre-indexed vectors keeps response latency low. Designed the retrieval and prompting layer to cite the specific source document for every answer, so users can trace any explanation back to the original legal text rather than relying on an unverifiable summary.
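The hybrid retrieval idea described above can be illustrated with a small, self-contained sketch: candidates returned by a dense vector search are re-scored with BM25 and the two normalised scores are blended. This is an assumption-laden illustration, not the project's code. The names `bm25_scores`, `min_max`, and `rerank`, the blending weight `alpha`, whitespace tokenisation, and the candidate tuple layout are all hypothetical, and the vector store call itself is left out.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenised document for the query."""
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs_tokens) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(query, candidates, alpha=0.5):
    """Blend dense and BM25 scores for (doc_id, text, dense_score) candidates.

    alpha weights the dense score; (1 - alpha) weights BM25.
    Returns (doc_id, fused_score) pairs, best first, so the source
    document of each passage stays attached for citation.
    """
    if not candidates:
        return []
    docs_tokens = [text.lower().split() for _, text, _ in candidates]
    sparse = min_max(bm25_scores(query.lower().split(), docs_tokens))
    dense = min_max([score for _, _, score in candidates])
    fused = [alpha * d + (1 - alpha) * s for d, s in zip(dense, sparse)]
    order = sorted(range(len(candidates)), key=lambda i: fused[i], reverse=True)
    return [(candidates[i][0], fused[i]) for i in order]
```

Because BM25 runs only over the short candidate list rather than the whole collection, the extra cost per query stays small, which is consistent with the low-latency claim above.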
Customer Churn Prediction: Neural Network vs. Decision Tree Study
January 1, 2025 – June 1, 2026
Built and benchmarked a custom ANN against a Decision Tree on structured customer data, with a full evaluation suite covering confusion matrix, precision, recall, F1-score, and ROC-AUC; addressed class imbalance using random oversampling and SMOTE so both models learned from minority churn cases. Extracted human-readable decision rules from the tree model to surface the strongest churn predictors, giving a non-technical stakeholder a clear basis for prioritising retention efforts alongside the neural network's performance metrics. Compared model performance across the full evaluation suite to recommend which model fits which use case, balancing the decision tree’s interpretability against the ANN’s stronger raw predictive accuracy.
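As an illustration of two steps named above, the sketch below shows random oversampling of the minority class and the precision, recall, and F1 calculation for a binary churn label. It is a simplified stand-in rather than the study's code: `random_oversample` and `binary_metrics` are hypothetical names, SMOTE (which synthesises new points instead of duplicating rows) is not shown, and the sketch assumes oversampling is applied to the training split only so that evaluation data stays untouched.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    if not by_class:
        return [], []
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Reporting recall on the churn class alongside accuracy matters here because, on imbalanced data, a model can score high accuracy while missing most churners.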
Fine-Tuning Large Language Models
Hugging Face
January 1, 2025 – Present
Professional Certificate in Machine Learning
IIT PDU
January 1, 2025 – Present
Supervised Machine Learning: Regression & Classification
ULSA
January 1, 2025 – Present
Cultural Fit Analysis
The candidate's projects demonstrate a strong alignment with the target role of AI Research Engineer, focusing heavily on NLP, LLMs, and RAG systems. The diversity of projects, from transliteration to legal RAG and churn prediction, showcases a broad interest and ability to apply AI/ML across different domains. The independent nature of these projects, coupled with open-source contributions, indicates a self-starter mentality and a passion for learning and building, which are positive indicators for cultural fit in an innovative environment.
Soft Skills & Operational Fit
The candidate's project descriptions indicate a proactive and independent approach to problem-solving, evidenced by building systems from scratch and addressing real-world challenges like code-mixed language. The focus on interpretability (Decision Tree rules, source citation in RAG) suggests an understanding of stakeholder needs and responsible AI practices. The publication of a PyPI package and open-source contributions imply a collaborative and sharing mindset, which aligns well with operational fit in a technical team.