GitHub - dhirajmahato/Foundation_of_Data_Science_IIMB: "Learn the fundamental concepts in probability, statistics, optimization and linear algebra which form the foundations for data science"

Foundation of Data Science IIMB Swayam Course

According to Dr Jim Gray, Data Science is the fourth paradigm that drives innovative solutions to organizational problems.

Basic concepts in probability, such as joint and conditional probabilities
ML algorithms for Market Basket Analysis and Recommender Systems
Random variables, discrete and continuous probability distributions, sampling, estimation, and the central limit theorem
Feature selection to avoid overfitting and underfitting
How regression and logistic regression use hypothesis testing to select features
Optimization techniques and algorithms such as Gradient Descent

In analytics, we usually deal with problems under uncertainty. Probability is a measure of uncertainty, which makes it an essential part of data science. Data is usually represented in matrix form, so knowledge of linear algebra is important for understanding the intricacies of analytical model development. In AI and ML, finding the feature weights amounts to minimizing a loss function, so optimization is a core foundation of data science.

"Probability theory, optimization and linear algebra form the basis of data science on which the entire Artificial Intelligence is built"

Week1: Descriptive Statistics and Data Visualisations

Descriptive Analytics is used to describe data and derive insights using descriptive statistics, data visualization and queries. Descriptive analytics involves finding “what has happened” in a specific business context using the past data. Analyzing past data can provide insights that can assist organizations in taking appropriate decisions.

Data Types and Scale
Population and Sample
Measure of Central Tendency
Percentile, Decile, and Quartile
Measure of variation
Measure of Shape
Data visualisation charts
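
As a small illustration of the measures listed above, the following sketch computes central tendency, quartiles, and variation with Python's standard `statistics` module. The `orders` data and the `describe` helper are hypothetical, chosen only for illustration; they are not part of the course material.

```python
import statistics

# Hypothetical sample: daily order counts (illustrative values only).
orders = [12, 15, 11, 19, 15, 22, 15, 30, 18, 13]

def describe(data):
    """Return common descriptive statistics for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "sample_std": statistics.stdev(data),  # sample standard deviation (divides by n - 1)
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,                        # interquartile range
        "range": max(data) - min(data),
    }

summary = describe(orders)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that quartile values depend on the interpolation method; `statistics.quantiles` defaults to the "exclusive" method, and other tools may report slightly different cut points.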

Week2: Probability Theory and Applications

"Probability theory is the foundation on which descriptive and predictive models are built"

Introduction
Axioms of Probability
Applications of Simple Probability
Bayes' Theorem
Random Variable
PMF and CDF of Discrete Random Variable
Geometric Distribution
Parameters of Continuous Distributions
Uniform Distribution
Poisson Distribution
Binomial Distribution
Normal Distribution
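
To make Bayes' Theorem from the list above concrete, here is a minimal sketch. The function name `bayes_posterior` and the fraud-detection numbers are assumptions made up for this example.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) using Bayes' theorem.

    prior               = P(A)
    likelihood          = P(B | A)
    false_positive_rate = P(B | not A)
    """
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical scenario: 1% of transactions are fraudulent, the detector flags
# 95% of fraudulent transactions and 5% of legitimate ones.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(fraud | flagged) = {posterior:.3f}")
```

Under these assumed numbers the posterior is only about 0.16, because legitimate transactions vastly outnumber fraudulent ones.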

Week3: Sampling and Estimation

"Sampling is necessary when it is difficult or expensive to collect data on the entire population"

Introduction to Sampling
Population Parameters and Sample Statistics
Sampling
Non-Probabilistic Sampling
Sampling Distribution
Probabilistic Sampling
Central Limit Theorem (CLT)
Sample Size Estimation for Mean of the Population
Estimation of Population Parameters
Method of Moments
Estimation of Parameters using method of moments
Estimation of Parameters using MLE
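
The Central Limit Theorem listed above can be checked with a short simulation. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 (a deliberately skewed choice), and the helper name `sample_means`, the sample size, and the seed are all illustrative.

```python
import random
import statistics
from math import sqrt

def sample_means(draw, sample_size, n_samples, seed=0):
    """Draw n_samples samples of size sample_size and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.mean([draw(rng) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

# Assumed population: exponential with rate 1, so mean = 1 and std = 1.
means = sample_means(lambda rng: rng.expovariate(1.0), sample_size=50, n_samples=2000)

mean_of_means = statistics.mean(means)
std_of_means = statistics.stdev(means)
expected_std = 1 / sqrt(50)  # sigma / sqrt(n), as the CLT predicts

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"std of sample means:  {std_of_means:.3f} (CLT predicts about {expected_std:.3f})")
```

Even though the population is skewed, the sample means cluster around the population mean with a spread close to sigma / sqrt(n).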

Week4: Confidence Interval

A confidence interval is a range of values, computed from sample data, that is expected to contain the true value of a population parameter with a stated level of confidence (for example, 95%). "The objective of a confidence interval is to indicate both the location and precision of the population parameter"

Confidence Interval for Population Mean
CI for Population Mean when Standard Deviation is Unknown
CI for Population Variance
CI for Population Proportion
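
As an illustration of the first topic above, the sketch below computes a confidence interval for the population mean when the population standard deviation is assumed known. The function name `z_confidence_interval` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_confidence_interval(sample, sigma, confidence=0.95):
    """Two-sided CI for the population mean, assuming sigma is known."""
    n = len(sample)
    # Critical value z_{alpha/2} from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    centre = mean(sample)
    return centre - margin, centre + margin

# Hypothetical sample with an assumed known population standard deviation of 3.
sample = [48, 52, 50, 49, 51, 50, 47, 53, 50]
low, high = z_confidence_interval(sample, sigma=3)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

When the standard deviation is unknown, the same structure applies with the sample standard deviation and a t critical value with n - 1 degrees of freedom in place of z. The Python standard library has no t distribution, so that case would need a statistics package.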

Week5: Hypothesis Testing

Setting Up a Hypothesis Test
Type I and Type II Error
One-Tailed and Two-tailed Test
Z-test for Proportion
Paired Sample t-Test
Comparing two Populations: Two-Sample Z- and t-Test
Hypothesis Test for Difference in Population Proportion under Large Samples
Effect Size: Cohen's d
Hypothesis Test for Equality of Population Variances
Non-parametric Tests: Chi-Square Tests
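
The Z-test for a proportion listed above can be sketched as follows. The function name `z_test_proportion` and the numbers are assumptions for illustration, and the normal approximation is only reasonable for large samples.

```python
from math import sqrt
from statistics import NormalDist

def z_test_proportion(successes, n, p0):
    """Two-tailed one-sample Z-test for H0: p = p0. Returns (z, p_value)."""
    p_hat = successes / n
    # The standard error is computed under the null hypothesis.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5.
z, p_value = z_test_proportion(successes=60, n=100, p0=0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these assumed numbers, z is about 2.0 and the two-tailed p-value is about 0.046, so the null hypothesis would be rejected at the 5% significance level but not at the 1% level.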

Week6: Analysis of Variance

Introduction to ANOVA
Multiple t-Tests for comparing Several Means
One-Way Analysis of Variance
Two-Way Analysis of Variance
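
For one-way ANOVA, the F statistic is the ratio of between-group to within-group mean squares. The sketch below computes it directly; the helper name `one_way_anova_f` and the three groups are hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)  # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # n - k degrees of freedom
    return ms_between / ms_within

# Hypothetical measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f}")
```

Turning F into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom, which the Python standard library does not provide.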

Week7: Correlation

Correlation is a statistical measure of an associative relationship between two random variables. Correlation is not necessarily a causal relationship. Correlation is an important concept in analytics, as it helps to identify variables that may be used in model building and is also useful for identifying issues such as multi-collinearity that can destabilize regression-based models. Correlation is also useful for finding proxy variables in analytics model building.

Pearson Correlation Coefficient
Spearman Rank Correlation
Point Bi-Serial Correlation
Phi-Coefficient
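
The difference between the Pearson and Spearman coefficients listed above can be seen in a short sketch. The helper names (`pearson`, `ranks`, `spearman`) and the data are illustrative assumptions. Spearman is computed here as the Pearson correlation of average ranks.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson = {r_pearson:.3f}, Spearman = {r_spearman:.3f}")
```

For this monotonic but non-linear relationship, Spearman is 1 while Pearson is slightly below 1, since Pearson measures only linear association.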

Week8: Optimization and Applied Linear Algebra

Different types of matrices
Eigenvalues and Eigenvectors
Optimization: Gradient Descent
Maxima and Minima
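
Gradient Descent, listed above, repeatedly steps against the gradient of a loss function until it approaches a minimum. The following is a minimal one-dimensional sketch. The function name `gradient_descent`, the quadratic loss, the learning rate, and the step count are all assumptions chosen for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, n_steps=200):
    """Minimize a function of one variable given its gradient."""
    w = start
    for _ in range(n_steps):
        # Move opposite to the gradient, scaled by the learning rate.
        w = w - learning_rate * grad(w)
    return w

# Assumed loss: L(w) = (w - 3)^2, with gradient dL/dw = 2 (w - 3)
# and a single minimum at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(f"estimated minimizer: {w_min:.6f}")
```

For this loss each step shrinks the distance to the minimum by a factor of 0.8. A learning rate above 1.0 would make the iterates diverge here, which is why the step size usually needs tuning.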

