arif9799/README.md

Hi there, I am Arif!

LinkedIn   Medium   Facebook   Discord   YouTube   Pinterest   LeetCode   Download Resume  




Snapshot... Me, Myself!


Description

I'm Arif, a Data Scientist with a Master's in Data Science from Northeastern University. My expertise lies in transforming complex datasets into actionable, meaningful insights. I craft end-to-end automated solutions, translate business inquiries into fully fledged ML systems, and excel at building statistical & predictive models while simplifying complex concepts and presenting results to both technical and non-technical stakeholders via creative mediums, including animations!

I've tackled impactful projects at PUMA North America, where I automated outlier detection in retail sales data and built a sales and demand forecasting system to optimize raw-supply inventory. I've also served as a Graduate TA for CS7150 Deep Learning and DS4400 Machine Learning and Data Mining, where I designed homeworks & rubrics, graded coursework and held office hours to simplify concepts.

My technical strengths include Time Series Analysis, Machine Learning, Statistics, SQL and software integration. I'm proficient in Python, R and C/C++, and in libraries like Pandas, Scikit-learn, NumPy, TensorFlow, PyTorch, Keras and bs4. I'm a highly motivated, detail-oriented individual with a strong work ethic, always keen to learn something new.





In the Pages of My Journey!


Description

Surfing the internet only to stumble upon this portfolio, a coincidence? Nah! You're in just the right place. I'm Arif, a Data Scientist, Data Engineer & Python Developer with a passion for automation & a zeal to craft solutions that require minimal or no human intervention. I love fiddling with datasets, exploring relations, extracting significant insights, retrieving never-heard-before stories, and scrutinizing data not just until every WHY has been answered but until I've uprooted causes of problems no one knew existed in the first place, then leveraging those findings to train apt ML algorithms that tackle challenges as well as, or even better than, humans.📊🔍

Fueled by passion for this domain, I'm preparing myself for a long, progressive career with ample opportunities to keep growing in the field. Ever since my first encounter with the stunning capabilities of AI, the one that set the course of my life and induced me to pursue Data Science, I've been awe-struck every single time I comprehend the working mechanisms behind this field's feats: creativity fused with math, where numbers tend to be more reliable in decision-making than human instincts.🤖📈

I hold a Master's degree in Data Science from Northeastern University - Khoury College of Computer Sciences and a Bachelor's degree in Computer Engineering from Gujarat Technological University - Pacific School of Engineering. I am an experienced Data Scientist who can build fully fledged end-to-end Data Science and Machine Learning applications, with strong practical and theoretical experience in developing Supervised and Unsupervised Machine Learning, Deep Learning and AI models.🎓💻

I have worked at US organizations such as PUMA North America and Khoury College of Computer Sciences as a Data Scientist and Graduate Teaching Assistant, generating valuable information, insights and conclusions from data and sharing field knowledge with peers and fellow students at Northeastern. This portfolio demonstrates the wide range of skills I bring to machine learning problems and stands as a record of my contributions to the field of Data Science.📚💡



Skills!


Description


The following tools and technologies, along with the essentials, define my capabilities and go-to stack for solving Data Science problems:

  • Code/ Cloud Mastery: Azure, Python, SQL, MySQL, R, YAML, C, C++, Java
  • Frameworks/ Packages/ Libraries: NumPy, Pandas, sklearn, TensorFlow, Keras, PyTorch, Darts, SciPy, Plotly, Matplotlib, OpenCV
  • Familiar Libraries/ Softwares: Docker, Kubernetes, Power BI, OpenGL, mlflow, Streamlit, Selenium, Microsoft SQL Server, Flask
  • IDEs/ Version Control: GitHub, Colab, Jupyter Notebook, Conda, VSCode, Notepad++, PyCharm, RStudio
  • I contribute to: LeetCode, HackerRank, Kaggle, Quora, Reddit, Stack Exchange, Stack Overflow
  • Proficient in Tools: Linux, macOS, Windows, Microsoft Excel, Microsoft Word, Microsoft PowerPoint, Adobe, Markdown, LibreOffice, Overleaf, LaTeX





Alma Mater


Description





Career Kaleidoscope!


PUMA North America, Data Scientist

July'22- December'22


Description


Successfully delivered and deployed two fully fledged, production-grade ML projects that are still in effect today!


SALES AND DEMAND FORECASTING

Predictive models used: AR, MA, ARIMA, SARIMA, Exponential Smoothing, Random Forest Regression, XGBoost, fbProphet, etc.
Cloud and libraries used: Azure ML Studio, Azure Data Factory, Azure Databricks, pandas, numpy, scikit-learn, PyTorch, TensorFlow, etc.

  • Situation: Operational hurdles in fulfilling Customer demands due to inadequate forecast of raw materials, impacting revenue streams
  • Task: Develop Time Series Forecasting System to forecast future Demands, enabling proactive raw material procurement strategies
  • Action: Performed Data Cleaning, standardization, feature engineering, imputations & deployed Univariate/Multivariate Models on Cloud
  • Results: Achieved 34% reduction in RMSE & enhanced accuracy to 80%, facilitating decision-making efficiency & increased revenue
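As a hedged illustration of the univariate models listed above, here is a minimal autoregressive fit and multi-step forecast in plain NumPy. The series, lag order and coefficients are synthetic stand-ins for illustration only, not the PUMA data or pipeline.

```python
import numpy as np

def fit_ar(series, p):
    """Fit an AR(p) model by least squares on lagged values
    (a toy stand-in for the ARIMA/SARIMA family)."""
    X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
    y = series[p:]
    X = np.column_stack([np.ones(len(y)), X])        # intercept term
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def forecast(series, coef, steps):
    """Roll the fitted AR model forward for multi-step forecasts."""
    p = len(coef) - 1
    history = list(series[-p:])
    preds = []
    for _ in range(steps):
        nxt = coef[0] + np.dot(coef[1:], history[-p:])
        preds.append(nxt)
        history.append(nxt)
    return np.array(preds)

# Synthetic "demand" series: linear trend plus noise
rng = np.random.default_rng(0)
t = np.arange(100)
demand = 50 + 0.5 * t + rng.normal(0, 2, 100)

coef = fit_ar(demand, p=3)
print(forecast(demand, coef, steps=4))
```

In the real system the lag order, seasonality and exogenous features would come from model selection on the actual sales data.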


ANOMALY DETECTION

Predictive models used: AR, MA, ARIMA, SARIMA, Exponential Smoothing, Random Forest Regression, XGBoost, etc.
Cloud and libraries used: pandas, numpy, scikit-learn, PyTorch, openpyxl, ADTK, Seaborn, Matplotlib, sktime, darts, statsmodels, etc.

  • Situation: Data transmission issues from local store registers led to significant gaps in analytical reports, requiring intervention
  • Task: Develop a Python application using Machine Learning and Time Series methods to automate anomaly detection in retail sales data
  • Action: Implemented automated data retrieval & trained Univariate time series models for various rolling stats, streamlining detection
  • Results: Achieved a 90% reduction in man-hours at 60% accuracy, deployed as a packaged Python application for outlier detection and rectification.
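One of the rolling statistics such a detector can monitor is a trailing-window z-score. The sketch below is illustrative only; the window size, threshold and synthetic data are assumptions, not the production configuration.

```python
import numpy as np

def rolling_zscore_anomalies(x, window=7, threshold=3.0):
    """Flag points deviating from the trailing-window mean by more than
    `threshold` standard deviations (window/threshold are illustrative)."""
    flags = np.zeros(len(x), dtype=bool)
    for i in range(window, len(x)):
        w = x[i - window:i]
        mu, sigma = w.mean(), w.std()
        if sigma > 0 and abs(x[i] - mu) > threshold * sigma:
            flags[i] = True
    return flags

rng = np.random.default_rng(1)
sales = rng.normal(100, 5, 60)
sales[45] = 0.0          # a register drop-out appears as a sudden collapse to zero
print(np.flatnonzero(rolling_zscore_anomalies(sales)))
```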


Khoury College of Computer Sciences, Graduate Teaching Assistant

Jan'22- December'22


Description



  • CS 7150: Deep Learning
    • The course curriculum needed enhancement to emphasize practical applications of Neural Networks including Diffusion and LLMs
    • Helped restructure the curriculum, create an interactive learning environment & build logistics & infrastructure
    • Designed and implemented rubrics & advanced homeworks on topics like Transformers and Diffusion Models from scratch
    • Set up discussion panels on latest AI research, invoking live discussions while fostering 90% increase in student participation
    • Restructured curriculum resulted in students not only grasping theoretical concepts but also understanding real-world relevance
  • DS 4400: ML & Data Mining
    • The course aimed to provide students with a comprehensive understanding of Data Science, ML and Advanced Predictive Analytics
    • My role was integral to bridging theoretical knowledge with hands-on experience in building ML Models from scratch in Python
    • Simplified complex academic theories into simple laymen terms, translating them into tangible real-world applications
    • Guided students through experiential learning projects to provide a practical bridge between theory and application
    • Contributed to comprehensive course coverage, ensuring a strong foundation in both theory and practice
    • Students gained a robust understanding of the subject, applying theoretical knowledge to real-world scenarios successfully




Project Portfolio: Crafting Solutions

The following section showcases my skills as applied in personal and curriculum projects. I invite you to explore the GitHub profile and the projects' README files for further information, as this is just a short summary of the vast array of possibilities where my skills can be applied, in collaboration with SMEs.



Description


You speak, We'll find it!

Used an API to access the flickr30k dataset and extract annotation entities (bounding boxes), phrases & images, assembling a pipeline to extract, transform & collate data from multiple sources. Pre-calculated high-level general-representation embeddings of images & textual content using pre-trained Vision Transformers & BERT respectively. Built & trained a baseline Transformer encoder-decoder model on the concatenated image and text embeddings to predict bounding boxes at 75% IoU with 58% accuracy, reliably spotting objects in images as described by the accompanying text. Also developed and experimented with lightweight architectures such as textual encoder & decoder, vision encoder & decoder and decoder-only networks, with performance matching or outperforming the baseline model.
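The 75% IoU figure quoted above refers to Intersection-over-Union between predicted and ground-truth boxes; a minimal implementation of that metric (with made-up example boxes) looks like this:

```python
def iou(box_a, box_b):
    """Intersection-over-Union for two (x1, y1, x2, y2) boxes:
    overlap area divided by the area of the union."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-overlapping boxes -> 1/3
```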



Description


The Canvas Conundrum: Imposing Style of an image onto Contents of another

An endeavor to impose an artistic style image onto the contents of another, employing transfer learning with the pre-trained CNN model VGG-19. Normalized the images, built a content-style loss function & convolved through the CNN (with frozen weights) while back-propagating the summed loss to a noise image. Also performed hyper-parameter tuning to find optimal values of the learning rate, 'ɑ' & 'β' (ɑ and β determine the proportions of content & style to be injected), reaching 21% MSE loss.
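The content-style loss mentioned above has the form L = ɑ·L_content + β·L_style, where style is compared via Gram matrices of feature maps. This is a toy NumPy sketch on made-up (channels, pixels) arrays, not the VGG-19 activations or weighting used in the project:

```python
import numpy as np

def gram(features):
    """Gram matrix of a (channels, height*width) feature map; style is
    captured by these channel-wise correlations."""
    return features @ features.T

def nst_loss(content_f, style_f, generated_f, alpha=1.0, beta=1e-3):
    """Weighted sum L = alpha * L_content + beta * L_style. The alpha/beta
    values here are arbitrary placeholders, not the tuned ones."""
    l_content = np.mean((generated_f - content_f) ** 2)
    l_style = np.mean((gram(generated_f) - gram(style_f)) ** 2)
    return alpha * l_content + beta * l_style

content = np.ones((2, 4))
style = np.zeros((2, 4))
generated = 0.5 * np.ones((2, 4))
print(nst_loss(content, style, generated))
```

In the full method this loss is back-propagated into the generated (noise) image while the CNN weights stay frozen.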



Description


OpinioCraft: Unleashing Sentimental Insights through Unsupervised ML

An unsupervised approach to mining opinions, thoughts and emotions from the mathematical notion of words, determining the sentiment of the reviews being processed to produce recommendations. The principal focus is to retrieve a user's search query (product & category) and recommend the top-n products from that category alone. The underlying mechanism, in simplest terms, is to classify review sentiment as positive or negative, then cluster unique items and pick the top-k products by higher average connotation score.



Description


RateVue: Decoding IMDb – A Feature Alchemy

Started by importing a primary dataset of 45k+ records, merging it with a secondary dataset to handle missing values in certain variables, then validating custom procedures for data wrangling, typecasting, pivoting erratic variables into a sparse matrix and much more. Conducted univariate exploratory data analysis to explore relations among the dependent and independent variables. Trained simple models, namely Logistic Regression & kNN, which outperformed complex ones such as Decision Tree & Random Forest.



Description


InsuLens: Focusing Clarity in Diabetic Classification

Cleaned and preprocessed anthropometric datasets with a whopping 1.8 million observations collected from 9 different states in India. Performed hyperparameter tuning with Grid Search Cross-Validation to derive optimal parameters for training a Multi-Layer Perceptron classifier to identify diabetics. Upsampled the minority class of the imbalanced dataset using the SMOTE technique, which drastically increased the accuracy of predicting the diabetic class from 13% to an impressive 71.4%.
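SMOTE's core idea is to synthesize minority-class points by interpolating between a real point and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea on made-up 2-D points, not the imbalanced-learn implementation used in practice:

```python
import numpy as np

def smote_sample(minority, n_new, k=3, rng=None):
    """Generate synthetic minority points: pick a point, pick one of its
    k nearest minority neighbours, and interpolate between them."""
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        dists = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(dists)[1:k + 1]   # skip the point itself
        nb = minority[rng.choice(neighbours)]
        lam = rng.random()
        synthetic.append(x + lam * (nb - x))      # point on the segment x -> nb
    return np.array(synthetic)

# Toy minority class: four points at the corners of the unit square
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(smote_sample(minority, n_new=3).shape)
```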



Description


DollaLlama: Wrangling Sales Data with Quirky Precision

Coalesced 180k+ records of electronic-appliance sales into one file, performed data wrangling & mining and feature-engineered variables. Envisioned strategic analyses by month, quantities, revenue generated & best-sellers to drive product decisions, analyzed consumer purchasing patterns and extrapolated items to recommend based on what is frequently bought together.
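The "frequently bought together" signal can be sketched by counting item pairs that co-occur within the same order. The order data below is invented for illustration and is not the project's dataset:

```python
from collections import Counter
from itertools import combinations

def frequently_bought_together(orders, top=3):
    """Count item pairs that co-occur in the same order and return
    the `top` most common pairs."""
    pairs = Counter()
    for order in orders:
        # sort + de-duplicate so ("A", "B") and ("B", "A") count together
        pairs.update(combinations(sorted(set(order)), 2))
    return pairs.most_common(top)

orders = [
    ["iPhone", "Lightning Cable"],
    ["iPhone", "Lightning Cable", "Wired Headphones"],
    ["Google Phone", "USB-C Cable"],
    ["iPhone", "Wired Headphones"],
]
print(frequently_bought_together(orders))
```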



Description


Tomorrow's Time, Today's Numbers: Life Expectancy in a Snap

Forecasted life expectancy by constructing a linear regression model on the independent attributes of the primary dataset, with subsequent feature engineering of 5 candidate predictors & selection of 3 independent variables as predictors for the regression model. After exhausting every combination of predictors against the response variable 'Life Expectancy', the final model achieved an RMSE of 0.0095.
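The fit-then-score loop behind a model like this can be sketched with ordinary least squares and the same RMSE metric; the three predictors and the response here are synthetic placeholders, not the life-expectancy data:

```python
import numpy as np

def fit_and_rmse(X, y):
    """Ordinary least squares (with intercept) and training RMSE."""
    Xb = np.column_stack([np.ones(len(X)), X])   # add intercept column
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    rmse = np.sqrt(np.mean((Xb @ beta - y) ** 2))
    return beta, rmse

# Toy stand-ins for the three selected predictors
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = 70 + X @ np.array([1.5, -0.8, 0.3]) + rng.normal(0, 0.1, 50)
beta, rmse = fit_and_rmse(X, y)
print(round(rmse, 3))
```

In the project, this fit was repeated over every candidate predictor combination to pick the best subset.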



Description


Graphonomics: Crafting a Visual representation of Economic Growth

Started by extracting demographic and economic data from the 'World Development Indicators' datasets via the WDI package in R, then analyzed the data by plotting time series of the GDP of selected countries over the last 6 decades and constructing a mini-poster for contrast. Finally, drew inferences from the various GDP peaks & variable correlations and formed presumptions about 'The Great Recession'.



Medium Blogs: Turning Geek-Speak into Shakespeare, One Article at a Time.



Description


In this article, I describe in detail how we built an opinion-based unsupervised recommendation engine on the Amazon reviews dataset. Using word embeddings, clustering techniques and a custom calculation of connotation scores, we determine the top-n products to recommend to a particular user from their search queries alone. The workflow involves data pre-processing, text tokenization, model training with Word2Vec, KMeans clustering, class determination, sentiment scoring and generating personalized recommendations based on user queries and product categories. The project is generic and applicable to various review datasets.
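The final recommendation step of that workflow, averaging per-product connotation scores within the queried category and returning the top-n, can be sketched as follows. The product names, categories and scores are hypothetical; in the article the scores come from the Word2Vec/KMeans sentiment stage:

```python
from collections import defaultdict

def top_n_products(reviews, category, n=2):
    """Average each product's connotation scores within `category` and
    return the n best products. `reviews` is (product, category, score)."""
    scores = defaultdict(list)
    for product, cat, score in reviews:
        if cat == category:
            scores[product].append(score)
    ranked = sorted(scores, key=lambda p: sum(scores[p]) / len(scores[p]),
                    reverse=True)
    return ranked[:n]

reviews = [
    ("EchoBuds", "headphones", 0.9), ("EchoBuds", "headphones", 0.7),
    ("BassKing", "headphones", 0.4), ("AirPro", "headphones", 0.8),
    ("ZoomCam", "cameras", 0.95),
]
print(top_n_products(reviews, "headphones"))
```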





Description


In this article, I delve into time series forecasting with a focus on calculating prediction intervals, emphasizing how their width varies for time series data compared to non-time-series data. I shed light on concepts such as confidence intervals, confidence levels, prediction intervals, normal distributions and z-values, exploring statistical basics including population vs. sample, the use of z-values to standardize normal distributions and the Central Limit Theorem. In the final section, I explain prediction intervals for both single-step and multi-step forecasting, considering how the standard error grows with the forecasting horizon. The article concludes with a practical example of predicting stock prices and calculating the corresponding prediction intervals.
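The widening of multi-step intervals can be sketched with the common random-walk-style assumption that the standard error grows like σ·√h, giving an interval ŷ ± z·σ·√h. The forecast value and σ below are made-up numbers, and this growth rule is one modeling assumption among those the article discusses:

```python
import numpy as np

def prediction_interval(forecast, sigma, horizon, z=1.96):
    """95% prediction interval that widens with the forecasting horizon:
    interval = forecast +/- z * sigma * sqrt(horizon)."""
    half_width = z * sigma * np.sqrt(horizon)
    return forecast - half_width, forecast + half_width

# Toy example: sigma estimated from one-step residuals, 1- to 4-step intervals
for h in range(1, 5):
    lo, hi = prediction_interval(forecast=150.0, sigma=2.0, horizon=h)
    print(h, round(lo, 2), round(hi, 2))
```

Note the 4-step interval is exactly twice as wide as the 1-step one, since √4 = 2.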





Description


In this article, I explore the evolution of deep learning architectures, focusing on the shortcomings of Recurrent Neural Networks (RNNs) that led to the rise of Transformer networks. I cover concepts such as positional encoding, the relational database's Key-Query-Value analogy that underlies transformer self-attention, and the transition from RNNs to Transformers. The article emphasizes the importance of attention mechanisms, detailing self-attention and its role in learning the structure of input sequences, and introduces multi-head attention in Transformers, demonstrating how it lets the model learn multiple aspects simultaneously. I conclude by hinting at the cross-attention mechanism in the decoder, leaving room for a detailed, easy explanation in upcoming articles of this Deep Learning series.
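The Key-Query-Value mechanism the article explains reduces to scaled dot-product attention, softmax(QKᵀ/√d)·V. This is a minimal single-head NumPy sketch with random toy weights (no masking, no multi-head split):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention:
    project X to queries/keys/values, then mix values by
    softmax(Q K^T / sqrt(d))."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 8))                   # 4 tokens, model dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
print(out.shape, np.allclose(w.sum(axis=1), 1.0))
```

Multi-head attention repeats this with several independent projection triples and concatenates the results.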







Contribution Statistics!


GitHub Repository Top Languages


LeetCode Statistics

Competencies: Because Juggling Wasn't on Resume


Description


  • Leadership Skills
  • Communication Skills
  • Team Work
  • Self Starter
  • Problem-solving Skills
  • Keen and Curious
  • Time Management




Pinned

  1. Deep-Learning-Computer-Vision---Visual-Grounding (Public)

    Jupyter Notebook

  2. Neural-Style-Transfer---Deep-Learning (Public)

    Imposition of Artistic Style Images onto Content Images

    Jupyter Notebook

  3. Sentimental-Recommendation-System (Public)

    Opinions-based Recommendation Engine.

    Jupyter Notebook

  4. Feature-Analysis (Public)

    Analyzing variables, modeling and data cleaning to determine factors that contribute to the success rate of a movie.

    Jupyter Notebook

  5. Diabetic-Classification (Public)

    Classifying a patient as Diabetic or Non-Diabetic based on Anthropometric Data.

    Jupyter Notebook

  6. Sales-Analysis (Public)

    Exploratory Data Analysis of the Sales of products from an Online Shop

    Jupyter Notebook