About me

I’m a Data Scientist and ML Engineer on the Search and Recommendations team at Jio Platforms Ltd. I have worked on a wide range of recommendation systems for Jio Cinema, JioTV and JioTV+ (a set-top box application).
I have been working on products with 5 million+ daily active users (DAU), and have served recommendations on time scales ranging from a few hours to a week. My work starts from the business problem, followed by defining business KPIs and then mapping them to an ML problem. My workflow generally includes data fetching, data enrichment, data cleaning, preprocessing, modelling, evaluating results, deployment, retraining and maintenance.
I am comfortable designing ML systems, reviewing code, defining features for products, and collaborating with and assigning work to other stakeholders such as DevOps Engineers, Data Engineers, Quality Analysts and Product Managers.
In the past, I have solved business problems using Machine Learning, Deep Learning, Natural Language Processing (NLP) and Large Language Models. Solving impactful problems that affect user experience at scale gives me work satisfaction.
I’m interested in modeling how people communicate about their subjective experiences through text, especially when these communications occur in online communities centered on Media, Advertisements, Healthcare, Recruitment or building communities.
Previously, I worked on various projects as self-directed case studies during my undergrad.
In Feb 2023, I gave a talk on designing and building Search and Recommendations systems together. Please find the link to code and lecture slides here.
My work/publication related to Medical Report Generation using X-Ray Images can be found here.
In the past, I worked as a Data Scientist at HT Media Ltd, on the team for Shine.com, India’s second-largest recruiting platform.
Data Scientist : Nov 2021 - Present
Bengaluru, India
- Built and maintained systems that recommend similar shows to users. (Go to any of the platforms mentioned above, search for a show, watch it and then look at the similar shows recommended. If you think it's awesome, drop a heart. If you think it could be improved, reach out with your feedback and suggestions.)
- Trained models to create embeddings for show genres, categories, actors and directors.
- Used BERT and other LLMs for feature extraction and integrated the extracted features into various existing recommendation systems.
- Enhanced various existing products with new versions and boosted CTRs by xx percent.
Data Scientist : October 2020 - Nov 2021
Gurugram, India
- Worked on a SOTA CV parser for Shine.com.
- Created a custom named entity recognition and normalization system for extracting various details from resume text.
- Cleaned and explored large datasets, trained models on them, and built a SaaS product for in-house and market use.
- Used BERT for feature extraction, then stacked various LSTM and CNN architectures for fine-tuning models.
- Implemented the NLP sections of a TensorFlow learning pipeline for the core product.
- Saved $0.01 per job application for the recruiting firm, three percent of total cost-cutting during the pandemic.
- The company filed a patent for the product, with me as one of the inventors.
Data Science Intern : June 2020 - October 2020
Gurugram, India
- Worked on a SOTA CV parser for Shine.com.
- Created a custom named entity recognition and normalization system for extracting various details from resume text.
- Constructed a dataset that could be used as a benchmark for various NER-related applications.
- Implemented TF-IDF and Bag of Words for text vectorization.
- Created ensemble text embeddings using GloVe, Word2Vec and FastText.
- Implemented a baseline model for the SOTA CV parser to be built in-house.
Toffee Insurance
Data Science Intern : June 2019 - August 2019
Gurugram, India
- Worked on a fraud detection model for insurance claims.
- Created a model using Random Forests and XGBoost.
- Improved the AUC score for fraud prediction from 0.70 to 0.84.
- Cleaned the dataset after discussing various domain-specific details with Product Managers.
Guru Gobind Singh Indraprastha University, New Delhi
B.Tech in Electrical and Electronics Engineering
August 2016 - June 2020
University of Michigan
Applied Machine Learning in Python
University of Michigan
Python Data Structures
University of Washington
Machine Learning Foundations: A Case Study Approach
Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization
Databases: SQL, Mongo, Cassandra, Redis
Modeling: Machine Learning, Deep Learning, Natural Language Processing (NLP), Large Language Models (LLMs), etc.
Computer Languages
Python (primary), C++ (secondary language)
Python Stack
spaCy, Gensim, scikit-learn, pandas, SciPy, NumPy, TensorFlow, Keras
MLflow, Weights and Biases, Docker, Kubernetes, Airflow
GCP, Azure
Stanford CoreNLP, NLTK, OpenNLP, Seaborn, Matplotlib, Agile Methods
Human Languages
English, Hindi
- BeamAtt: Generating Medical Diagnosis from Chest X-Rays Using Sampling-Based Intelligence: Link
- Comparative Analysis of Bagging and Boosting Algorithms for Sentiment Analysis (Final Year Project): Link
Volunteer Work
- ICML Conference (Top ML Conference) - July 2020
- ICLR Conference (Top ML Conference) - May 2020
Co-Curricular Events
- Debating, Extempore, Elocution
- National Cadet Corps (NCC)
- Magazine/Newsletter Editor
You can download a PDF of my CV here.