Arjun's Portfolio

Experiences

Machine Learning Engineer

Responsibilities:

Improved key search metrics and eliminated null queries by productionizing an AI-based Hybrid Search System using fine-tuned bi-encoder/cross-encoder transformers and a MongoDB vector database.
Developed top revenue-generating Recommendation Systems for cross-sell and newly discovered products using Contrastive Learning, a Two-Tower model and Neural Collaborative Filtering.
Built a Multi-Agent RAG chatbot for different HR functions using LangGraph, Llama 3, Guardrails and efficient memory management.
Significantly accelerated model training pipelines using Distributed training, Model distillation and Quantization techniques.
Improved latency of ML APIs using NVIDIA Triton inference server, TensorRT and optimized MongoDB queries.
Deployed scalable inference pipelines for Search and Recommendation Systems using containers, Kubernetes & Seldon.
Led a team of two on the Model Portal, enhancing team visibility and driving increased collaboration with new teams.
Decreased time to ship features, models and APIs by developing multiple SDKs to (i) efficiently push and retrieve features to and from the offline and online Feature Store using Databricks and MongoDB, (ii) track ML experiments, register ML models and log metrics using MLflow, and (iii) provide templates for boilerplate API builds using OOP.
Experimented with Sequential and Session-based recommendation transformers on clickstream data to predict the next item to purchase based on the customer journey.
Developed an end-to-end chit-chat/noise classifier on Chatbot data using BERT and MLflow.
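
A minimal sketch of how a hybrid search can merge a keyword ranking with an embedding-based ranking. The fusion method used in the production system is not stated above; reciprocal rank fusion (RRF) is assumed here purely for illustration, and the function name, the constant k=60 and the example product lists are all hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for items it ranks near the top.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id to stay deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword_hits = ["milk", "oat milk", "almond milk"]  # hypothetical lexical results
vector_hits = ["oat milk", "soy milk", "milk"]      # hypothetical embedding results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

In a setup like this, a cross-encoder would typically re-score only the top few fused candidates, since it is far more expensive per query-document pair than a bi-encoder.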

Data Science Intern

Responsibilities:

Designed MLOps Platform using GCP Vertex AI from Data Extraction to Model Deployment to continuously improve predictive performance of prospective AI investments.
Created baseline models on time-series data using Neural Networks in TensorFlow.
Reduced memory usage and runtime by a factor of 3 by optimizing Glasswing’s data pipeline.
Eliminated manual effort by containerizing and deploying Glasswing’s data pipeline on Google Cloud Run.
Visualized, analyzed and aggregated data to provide insights to the investment team and enrich the Glasswing platform.

Machine Learning Research Intern

Responsibilities:

Researched WhatsApp’s network architecture and conducted experiments to collect WhatsApp network traffic-flow data.
Attained 97.3% accuracy using a 2-layer Ensemble ML model consisting of Naive Bayes, KNN, Decision Trees and Logistic Regression to identify whether media transfer occurred in a WhatsApp chat.
Obtained 95.6% accuracy using XGBoost to classify WhatsApp messages as delivered, received, or seen.

Education

Northeastern University
Sept 2019 - Apr 2022
MS in Computer Science
CGPA: 3.67 out of 4
Courses Taken: Deep Learning, Large Scale Data Processing, Data Mining Techniques, Information Retrieval, Database Management, Algorithms, Program Design Paradigm

SRM University
Jul 2015 - May 2019
B.Tech in Software Engineering
CGPA: 8.52 out of 10
Courses Taken: Machine Learning, Linear Algebra, Probability and Statistics, Advanced Calculus, Software Testing, Agile Software Process

Projects/Open Source

Fine-tuned Phi-3 VLM on CIFAR10 Image

Jul 2020

● Created a dataset with Question Answer pairs by generating image descriptions using SmolVLM2 on CIFAR10 images.
● Generated aligned image embeddings using SigLIP model.
● Fine-tuned Phi-3 as VLM for vision-text alignment using QLoRA and descriptions based on image embeddings from the SigLIP model.
● Optimized training with 4-bit quantization, Flash Attention 2, gradient checkpointing, and mixed-precision training, achieving memory-efficient and stable model convergence.

Shakespearean Text Generator

Jul 2020

● Designed and trained a GPT-2 model from scratch on Shakespeare’s complete works to generate stylistically authentic text.
● Implemented tokenization, dataset preprocessing, and transformer architecture customization with UI app using Gradio.

News Summarization Agent

Jul 2020

● Developed an AI-driven news aggregation and summarization agent leveraging LangGraph and MCP to automate the discovery, analysis, and synthesis of news articles.
● Designed a dynamic workflow that interprets user queries, collects and evaluates articles from multiple sources using NewsAPI and Beautiful Soup, and generates cohesive, multi-article summaries using advanced language models like gpt-4o-mini.
● Implemented adaptive search strategies, parallel content analysis, and context-aware summarization to deliver actionable insights and reduce information overload.

Sentiment Analysis Web App

May 2020

● Developed a Web App that predicts the sentiment of a user-submitted review.
● Performed text cleaning and preprocessing including stemming, stopword removal, tokenization and HTML parsing for over 50,000 reviews and uploaded the transformed data to AWS S3.
● Built an LSTM model with Word Embedding layer using skip-gram architecture to learn sentiments from the data.
● Deployed the model for testing on AWS SageMaker and achieved a test accuracy of 84%.
● Hosted the model on my Web App using AWS Lambda and AWS API Gateway.
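
A minimal sketch of the text-cleaning step described above, using only the Python standard library. It is not the project's code: the stopword list is a tiny hypothetical stand-in, the function name is made up, and stemming is left out because it would normally come from a library such as NLTK.

```python
import html
import re

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "was", "and", "of", "to", "it", "this"}

def clean_review(text):
    """Strip HTML tags, lowercase, tokenize, and drop stopwords."""
    no_tags = re.sub(r"<[^>]+>", " ", text)   # remove HTML tags
    unescaped = html.unescape(no_tags)        # turn entities like &amp; into characters
    tokens = re.findall(r"[a-z0-9']+", unescaped.lower())
    return [t for t in tokens if t not in STOPWORDS]

tokens = clean_review("<p>This movie was <b>great</b> &amp; the cast is fun!</p>")
```

The resulting token lists are what would then be mapped to integer ids and fed to the embedding layer of the LSTM.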

Machine Translation

Jun 2020

● Built a Machine Translation model that translates English sentences to French using Keras.
● Developed a comprehensive pipeline to preprocess over 1.8 million English and French words.
● Experimented with different architectures, including: Embedding layer + Bidirectional-GRU, Embedding layer with GRU, Bidirectional-GRU, Vanilla GRU, and Encoder-Decoder with LSTM.

Image Denoiser Using Convolutional Autoencoder

Jul 2020

● Created custom MNIST image dataset by adding Gaussian noise.
● Implemented a Denoiser by using an Encoder-Decoder model with Convolutional layers.
● Trained the Denoiser by supplying noisy images as input and original images as target.
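
A minimal sketch of how the noisy training inputs could be built, assuming pixel values scaled to [0, 1]. The noise level sigma=0.2, the function name and the tiny example image are assumptions for illustration, not values from the project.

```python
import random

def add_gaussian_noise(image, sigma=0.2, seed=None):
    """Return a noisy copy of a 2-D image with pixel values in [0, 1]."""
    rng = random.Random(seed)
    return [
        # Add zero-mean Gaussian noise, then clip back into the valid range.
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]

clean = [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
noisy = add_gaussian_noise(clean, sigma=0.2, seed=0)
# (noisy, clean) forms one (input, target) pair for the autoencoder.
```

Clipping keeps the corrupted image in the same range as the target, which matters if the decoder ends in a sigmoid.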

Dog Breed Classifier Using CNN

Apr 2020

● Created a CNN that predicts the dog breed if given a dog image or the closest dog breed resemblance when given a human image.
● Detected human faces in the images using OpenCV's Haar Cascades.
● Performed dog face detection and breed classification using Transfer Learning from VGG16 model and achieved 86% accuracy on unseen data.

Patient Experience Website

Oct 2019 - Dec 2020

● Designed and built a JavaScript-based website for patients to look up doctors based on medical conditions and location.
● Integrated RESTful services as a Middle Level Tier to handle CRUD operations using JPA controllers and DAOs.
● Built a robust database using MySQL and formulated advanced queries like joins, nested queries, triggers, views.
● Hosted the database on AWS RDS and the entire website on AWS Elastic Beanstalk as an EC2 instance.

Topical Web Crawler

Jan 2020

● Implemented an algorithm for a web crawler using link graphs and customized priority queues.
● Crawled over 140,000 web pages on Barack Obama and indexed the crawl data on Elastic Cloud.
● Created a Vertical Search using Flask to retrieve relevant pages based on keywords using BM25 text retrieval model.
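
A minimal sketch of BM25 scoring, the retrieval model named above. In the project the scoring would have been done by the search backend; this standalone version only illustrates the formula. The parameter values k1=1.2 and b=0.75 are commonly used defaults, and the function name and example documents are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant: rare terms weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "obama signed the health care bill".split(),
    "the senate passed the bill".split(),
    "weather report for chicago".split(),
]
scores = bm25_scores(["obama", "bill"], docs)
```

The first document matches both query terms and scores highest; the third matches neither and scores zero.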

Neural Style Transfer

May 2020

● Developed a CNN that applies the style of an image onto the content of another image.
● Extracted features and constructed the content and style loss functions, using Gram matrices for the style loss.
● Performed Transfer Learning from VGG19 model to build CNN.
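
A minimal sketch of the Gram-matrix style loss in plain Python. Feature maps are represented as a list of channels, each flattened to a list of activations; dividing by the number of positions is one common normalization and is an assumption here, as are the function names and the toy values.

```python
def gram_matrix(features):
    """features: list of C channels, each a flat list of H*W activations."""
    n = len(features[0])
    return [
        # Entry (i, j) is the normalized inner product of channels i and j.
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]

def style_loss(generated, style):
    """Mean squared difference between the two Gram matrices."""
    g, s = gram_matrix(generated), gram_matrix(style)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)

style_feats = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # toy 2-channel feature map
gram = gram_matrix(style_feats)
```

Because the Gram matrix sums over spatial positions, it captures which channels fire together while discarding where they fire, which is why it serves as a style signal rather than a content signal.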

Automatic Speech Recogniser

Jul 2020

● Implemented an End-to-End Automatic Speech recognition pipeline using Keras.
● Preprocessed raw audio to feature representations like MFCC and Spectrograms.
● Built Acoustic Models to map audio features to the transcribed text.
● Experimented with different Neural Network architectures, including: Deep RNN + TimeDistributed Dense, CNN + RNN + TimeDistributed Dense, and Bidirectional RNN + TimeDistributed Dense.

Email Spam/Ham Classifier

Apr 2020

● Performed text preprocessing by email parsing, stemming and stopword removal using NLTK.
● Indexed the data to Elasticsearch and transformed text data to sparse matrices using CountVectorizer.
● Devised feature extraction using NLP techniques like skip-grams and TF-IDF.
● Developed Decision Tree, Logistic Regression and SVM models, achieving an ROC AUC score of 96%.

Image Segmentation using K-Means Clustering on MapReduce

Nov 2020

● Attained Speedup of 1.35 and Scaleup of 1.05 while performing Image Segmentation.
● Designed the K-Means algorithm from scratch to work on distributed sources of data and parallelize compute using the Hadoop ecosystem.
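
A minimal single-process sketch of how one K-Means iteration splits into a map step and a reduce step. It is written in Python for illustration only and does not use Hadoop; the function names and the four example pixels are hypothetical.

```python
from collections import defaultdict

def map_assign(pixel, centroids):
    """Map step: emit (index of nearest centroid, pixel)."""
    dists = [sum((p - c) ** 2 for p, c in zip(pixel, centroid))
             for centroid in centroids]
    return dists.index(min(dists)), pixel

def reduce_mean(pixels):
    """Reduce step: the new centroid is the per-channel mean of its pixels."""
    n = len(pixels)
    return tuple(sum(channel) / n for channel in zip(*pixels))

def kmeans_iteration(pixels, centroids):
    groups = defaultdict(list)
    for pixel in pixels:                   # mappers would run this in parallel
        key, value = map_assign(pixel, centroids)
        groups[key].append(value)          # stands in for the shuffle phase
    # A centroid that attracted no pixels keeps its previous position.
    return [reduce_mean(groups[i]) if groups[i] else tuple(c)
            for i, c in enumerate(centroids)]

pixels = [(0, 0, 0), (10, 10, 10), (250, 250, 250), (240, 240, 240)]
centroids = kmeans_iteration(pixels, [(0, 0, 0), (255, 255, 255)])
```

Each iteration would be one MapReduce job, with the updated centroids passed to the mappers of the next job until they stop moving.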

Face Mask Detection

Dec 2020

● Achieved sensitivity of 96% on the largest real-world Face Mask Dataset.
● Built a custom ResNet model with Data Augmentation.
● Implemented a Face Detection pipeline using YOLOv5 and OpenCV to detect faces.
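
For reference, sensitivity is the recall on the positive class, TP / (TP + FN). A minimal sketch follows; which class counts as positive in this project is not stated above, so the labels and the function name are hypothetical.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Recall on the positive class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)

# Hypothetical labels: 4 positives, of which 3 are detected.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = sensitivity(y_true, y_pred)
```

The false positive in the last position does not affect the score, since sensitivity only looks at examples whose true label is positive.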

Human Protein Classification

Jul 2020

● Ranked in the top 20% of Human Protein Classification Kaggle Competition.
● Developed CNNs using DenseNet and ResNet capable of classifying mixed patterns of proteins in microscope images.

Image Processing Application

Nov 2019

● Built an Image Filter App that applies various filters on user input images.
● Implemented image processing operations such as blur, greyscale, sepia and mosaic from scratch.
● Devised an MVC architecture using Object Oriented Programming and the Command design pattern.
● Additional features include generating custom images such as rainbow, checkerboards and flags based on user input.
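
A minimal sketch of two of the per-pixel filters mentioned above, written in Python for illustration (the language of the original app is not stated here). The greyscale weights and the sepia matrix are widely used values and are assumptions, as are the function names.

```python
def clamp(value):
    """Round to the nearest integer and keep it inside the 0-255 range."""
    return max(0, min(255, round(value)))

def greyscale(pixel):
    r, g, b = pixel
    # Weighted sum approximating perceived brightness.
    luma = clamp(0.2126 * r + 0.7152 * g + 0.0722 * b)
    return (luma, luma, luma)

def sepia(pixel):
    r, g, b = pixel
    return (
        clamp(0.393 * r + 0.769 * g + 0.189 * b),
        clamp(0.349 * r + 0.686 * g + 0.168 * b),
        clamp(0.272 * r + 0.534 * g + 0.131 * b),
    )

def apply_filter(image, pixel_fn):
    """Apply a per-pixel function to every pixel of a row-major image."""
    return [[pixel_fn(p) for p in row] for row in image]

image = [[(255, 0, 0), (255, 255, 255)]]  # one red pixel, one white pixel
grey = apply_filter(image, greyscale)
warm = apply_filter(image, sepia)
```

Passing the filter as a function keeps `apply_filter` independent of any specific effect, which is close in spirit to wrapping each operation as a command object.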

Hi, I am Arjun

Arjun Prashanth

Skills

Python

Machine Learning/Deep Learning

Generative AI

MLOps

Big Data and Databases

Model Optimization

Java

C/C++

Linux

Git

Agile

Experiences

Machine Learning Engineer

Ahold Delhaize USA

Data Science Intern

Glasswing Ventures

Machine Learning Research Intern

DRDO
