"Learn the fundamental concepts in probability, statistics, optimization and linear algebra which form the foundations for data science"

dhirajmahato/Foundation_of_Data_Science_IIMB

Foundation of Data Science IIMB Swayam Course

According to Dr. Jim Gray, data science is the fourth paradigm that drives innovative solutions to organizational problems. The course covers:

  • Basic concepts in probability, such as joint and conditional probabilities
  • ML algorithms for Market Basket Analysis and Recommender Systems
  • Random variables, discrete and continuous probability distributions, sampling, estimation, and the central limit theorem
  • Feature selection to avoid overfitting and underfitting
  • How regression and logistic regression use hypothesis testing to select features
  • Optimization techniques and algorithms such as Gradient Descent

In analytics, we usually deal with problems under uncertainty. Probability is a measure of uncertainty, which makes it an essential part of data science. Data is usually reported in matrix form, so knowledge of linear algebra is important for understanding the intricacies of analytical model development. In AI and ML, finding the feature weights means minimizing a loss function, so optimization is a foundation of data science as well.

"Probability theory, optimization and linear algebra form the basis of data science on which the entire Artificial Intelligence is built"

Week1: Descriptive Statistics and Data Visualisations

Descriptive Analytics is used to describe data and derive insights using descriptive statistics, data visualization and queries. Descriptive analytics involves finding “what has happened” in a specific business context using the past data. Analyzing past data can provide insights that can assist organizations in taking appropriate decisions.

  1. Data Types and Scale
  2. Population and Sample
  3. Measure of Central Tendency
  4. Percentile, Decile, and Quartile
  5. Measure of variation
  6. Measure of Shape
  7. Data visualisation charts
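
As a minimal sketch of the Week 1 measures, the snippet below computes central tendency, variation, and quartiles with Python's standard `statistics` module. The `orders` data is made up for illustration and is not part of the course material.

```python
import statistics

# Hypothetical sample: daily order counts over ten days (made-up data).
orders = [12, 15, 11, 19, 15, 22, 13, 15, 30, 18]

# Measures of central tendency
mean = statistics.mean(orders)
median = statistics.median(orders)
mode = statistics.mode(orders)

# Measure of variation: sample standard deviation (n - 1 denominator)
sample_std = statistics.stdev(orders)

# Quartile cut points (default "exclusive" method) and interquartile range
q1, q2, q3 = statistics.quantiles(orders, n=4)
iqr = q3 - q1

print(f"mean={mean}, median={median}, mode={mode}")
print(f"sample std={sample_std:.2f}, Q1={q1}, Q3={q3}, IQR={iqr}")
```

Here the mean exceeds the median because the single large value (30) pulls the mean upward, which is a simple sign of right skew.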

Week2: Probability Theory and Applications

"Probability theory is the foundation on which descriptive and predictive models are built"

  1. Introduction
  2. Axioms of Probability
  3. Applications of Simple Probability
  4. Bayes' Theorem
  5. Random Variable
  6. PMF and CDF of Discrete Random Variable
  7. Geometric Distribution
  8. Parameters of Continuous Distributions
  9. Uniform Distribution
  10. Poisson Distribution
  11. Binomial Distribution
  12. Normal Distribution
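
Bayes' Theorem can be illustrated with a short worked example. The sketch below is an assumption-laden illustration: the function name `posterior` and the screening-test numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical and not taken from the course.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem.

    prior               = P(condition)
    sensitivity         = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    # Total probability of a positive test (law of total probability)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical numbers, chosen only for illustration.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")
```

Even with a fairly accurate test, the posterior stays well below one half because the condition is rare, which is the usual lesson of this kind of example.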

Week3: Sampling and Estimation

"Sampling is necessary when it is difficult or expensive to collect data on the entire population"

  1. Introduction to Sampling
  2. Population Parameters and Sample Statistics
  3. Sampling
  4. Non-Probabilistic Sampling
  5. Sampling Distribution
  6. Probabilistic Sampling
  7. Central Limit Theorem (CLT)
  8. Sample Size Estimation for Mean of the Population
  9. Estimation of Population Parameters
  10. Method of Moments
  11. Estimation of Parameters using method of moments
  12. Estimation of Parameters using MLE
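
The Central Limit Theorem can be checked with a small simulation. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a deliberately skewed choice), and the helper name `sample_means` is hypothetical.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible


def sample_means(sample_size, n_samples=2000):
    """Draw n_samples samples of the given size and return their means."""
    means = []
    for _ in range(n_samples):
        # Exponential population with rate 1: mean 1, standard deviation 1
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


n = 30
means = sample_means(n)
observed_mean = statistics.fmean(means)
observed_std = statistics.stdev(means)
expected_std = 1.0 / math.sqrt(n)  # sigma / sqrt(n), the standard error

print(f"mean of sample means: {observed_mean:.3f} (population mean 1.0)")
print(f"std of sample means:  {observed_std:.3f} (sigma/sqrt(n) = {expected_std:.3f})")
```

Although the population is skewed, the sample means cluster around the population mean with a spread close to sigma/sqrt(n), as the theorem predicts for large enough samples.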

Week4: Confidence Interval

A confidence interval is a range, computed from sample data, that is expected to contain the true value of a population parameter at a stated confidence level (for example, 95%). "The objective of a confidence interval is to indicate both the location and precision of the population parameter"

  1. Confidence Interval for Population Mean
  2. CI for Population Mean when Standard Deviation is Unknown
  3. CI for Population Variance
  4. CI for Population Proportion
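
As a rough sketch of the first item above, the snippet below builds a z-based confidence interval for a population mean, assuming the population standard deviation `sigma` is known. The function name `mean_confidence_interval` and the sample values are hypothetical. When the standard deviation is unknown, the t-distribution would be used instead, which the standard library does not provide.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, sigma, confidence=0.95):
    """z-interval for the population mean with known population sigma."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical measurements; sigma = 3 is assumed known.
sample = [48, 52, 50, 49, 51, 50, 47, 53, 50]
low, high = mean_confidence_interval(sample, sigma=3.0)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The interval is centred on the sample mean (location) and its width shrinks with larger samples or smaller sigma (precision).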

Week5: Hypothesis Testing

  1. Setting Up a Hypothesis Test
  2. Type I and Type II Error
  3. One-Tailed and Two-Tailed Tests
  4. Z-test for Proportion
  5. Paired Sample t-Test
  6. Comparing two Populations: Two-Sample Z- and t-Test
  7. Hypothesis Test for Difference in Population Proportion under Large Samples
  8. Effect Size: Cohen's d
  9. Hypothesis Test for Equality of Population Variances
  10. Non-parametric Tests: Chi-Square Tests

Week6: Analysis of Variance

  1. Introduction to ANOVA
  2. Multiple t-Tests for comparing Several Means
  3. One-Way Analysis of Variance
  4. Two-Way Analysis of Variance

Week7: Correlation

Correlation is a statistical measure of an associative relationship between two random variables. Correlation is not necessarily a causal relationship. Correlation is an important concept in analytics, as it helps to identify variables that may be used in model building and is also useful for identifying issues such as multi-collinearity that can destabilize regression-based models. Correlation is also useful for finding proxy variables in analytics model building.

  1. Pearson Correlation Coefficient
  2. Spearman Rank Correlation
  3. Point Bi-Serial Correlation
  4. Phi-Coefficient
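
The first two coefficients above can be sketched in plain Python. This is an illustrative implementation, not the course's code: the helper names `pearson`, `ranks`, and `spearman` are hypothetical, and the data is made up. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values given their average rank.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson:  {pearson(x, y):.4f}")
print(f"Spearman: {spearman(x, y):.4f}")
```

Because the relationship is monotonic but curved, the rank-based coefficient reaches 1 while the Pearson coefficient, which measures linear association, stays slightly below it.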

Week8: Optimization and Applied Linear Algebra

  1. Different types of matrices
  2. Eigenvalues and Eigenvectors
  3. Optimization: Gradient Descent
  4. Maxima and Minima
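
To make the link between feature weights, loss functions, and Gradient Descent concrete, here is a minimal sketch that fits a line `y = w * x + b` by minimizing mean squared error. The function name `gradient_descent`, the learning rate, the step count, and the data are all assumptions chosen for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Prediction errors under the current parameters
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2)
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the loss
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

The learning rate matters: a value that is too large makes the updates diverge on this data, while a much smaller one needs many more steps to reach the minimum.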
