According to Dr Jim Gray, Data Science is the fourth paradigm that drives innovative solutions to organizational problems.
- basic concepts in probability, such as joint and conditional probabilities
- ML algorithms for Market Basket Analysis and Recommender Systems
- random variables, discrete and continuous probability distributions, sampling, estimation, and the central limit theorem
- feature selection to avoid overfitting and underfitting
- how regression and logistic regression use hypothesis testing to select features
- optimization techniques and algorithms such as Gradient Descent
In analytics, we usually deal with problems under uncertainty. Probability is a measure of uncertainty, which makes it an essential part of data science. Data is usually represented in matrix form, so a knowledge of linear algebra is important for understanding the intricacies of analytical model development. In AI and ML, finding the feature weights amounts to minimizing a loss function, so optimization is one of the foundations of data science.
Descriptive analytics describes data and derives insights using descriptive statistics, data visualization, and queries. It involves finding “what has happened” in a specific business context using past data. Analyzing past data can provide insights that help organizations make appropriate decisions.
- Data Types and Scale
- Population and Sample
- Measure of Central Tendency
- Percentile, Decile, and Quartile
- Measure of variation
- Measure of Shape
- Data visualisation charts
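
The measures listed above can be computed directly with Python's standard `statistics` module. The snippet below is a minimal sketch; the `sales` list is a hypothetical sample invented purely for illustration.

```python
import statistics

# Hypothetical sample (illustrative values only, not from any real dataset).
sales = [12, 15, 15, 18, 21, 24, 30, 45]

# Measures of central tendency
mean = statistics.mean(sales)
median = statistics.median(sales)
mode = statistics.mode(sales)

# Measures of variation
stdev = statistics.stdev(sales)                # sample standard deviation
q1, q2, q3 = statistics.quantiles(sales, n=4)  # quartile cut points
iqr = q3 - q1                                  # interquartile range

# A rough indication of shape: a mean above the median hints at right skew.
right_skew_hint = mean > median

print(mean, median, mode, round(stdev, 2), (q1, q2, q3), iqr, right_skew_hint)
```

Note that `statistics.quantiles` uses the "exclusive" method by default, so other tools may report slightly different quartiles for small samples.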
"Probability theory is the foundation on which descriptive and predictive models are built"
- Introduction
- Axioms of Probability
- Applications of Simple Probability
- Bayes' Theorem
- Random Variable
- PMF and CDF of Discrete Random Variable
- Geometric Distribution
- Parameters of Continuous Distributions
- Uniform Distribution
- Poisson Distribution
- Binomial Distribution
- Normal Distribution
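
As an illustration of Bayes' theorem from the list above, the sketch below computes a posterior probability for a binary hypothesis. The function name `posterior` and the numbers (a 1% prior, a 95% true-positive rate, and a 5% false-positive rate) are assumptions chosen only to show the arithmetic.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) for a binary hypothesis H given evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Total probability of the evidence, summed over H and not-H.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical screening test: rare condition, fairly accurate test.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 4))
```

Even with an accurate test, the posterior stays well below one half here because the prior is so small, which is the usual lesson of this kind of example.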
"Sampling is necessary when it is difficult or expensive to collect data on the entire population"
- Introduction to Sampling
- Population Parameters and Sample Statistics
- Sampling
- Non-Probabilistic Sampling
- Sampling Distribution
- Probabilistic Sampling
- Central Limit Theorem (CLT)
- Sample Size Estimation for Mean of the Population
- Estimation of Population Parameters
- Method of Moments
- Estimation of Parameters using Method of Moments
- Estimation of Parameters using MLE
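
The Central Limit Theorem in the list above can be checked with a small simulation. The sketch below assumes an Exponential(1) population (mean 1, standard deviation 1), chosen here only because it is clearly non-normal. The helper `sample_means` is a hypothetical name.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def sample_means(n, trials=2000):
    """Means of `trials` independent samples of size n from Exponential(rate=1)."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]


for n in (2, 30, 200):
    means = sample_means(n)
    # The CLT predicts the sample means centre on 1 with spread about 1/sqrt(n).
    print(n, round(statistics.fmean(means), 3),
          round(statistics.stdev(means), 3), round(1 / n ** 0.5, 3))
```

As `n` grows, the observed spread of the sample means should track `1 / sqrt(n)`, which is the standard error the theorem predicts.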
A Confidence Interval denotes the range within which the value of a population parameter is likely to fall with a certain probability. "The objective of a confidence interval is to indicate both the location and precision of the population parameter"
- Confidence Interval for Population Mean
- CI for Population Mean when Standard Deviation is Unknown
- CI for Population Variance
- CI for Population Proportion
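
A confidence interval for the population mean can be sketched as below, assuming the population standard deviation `sigma` is known, so the normal (z) interval applies. The function name and the sample values are hypothetical.

```python
from statistics import NormalDist, fmean


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Two-sided z-interval for the population mean, assuming sigma is known."""
    n = len(sample)
    # Critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / n ** 0.5
    centre = fmean(sample)
    return centre - margin, centre + margin


# Hypothetical measurements with an assumed known sigma of 1.5.
low, high = z_confidence_interval([10, 12, 9, 11, 13, 10, 12, 11], sigma=1.5)
print(round(low, 3), round(high, 3))
```

When the standard deviation is unknown and estimated from the sample, the critical value comes from the t-distribution instead. The standard library does not provide one, so that case would need a statistics package.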
- Setting Up a Hypothesis Test
- Type I and Type II Error
- One-Tailed and Two-tailed Test
- Z-test for Proportion
- Paired Sample t-Test
- Comparing two Populations: Two-Sample Z- and t-Test
- Hypothesis Test for Difference in Population Proportion under Large Samples
- Effect Size: Cohen's d
- Hypothesis Test for Equality of Population Variances
- Non-parametric Tests: Chi-Square Tests
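
As one worked instance of the tests listed above, here is a minimal sketch of a two-tailed one-sample Z-test for a proportion. It assumes a sample large enough for the normal approximation. The function name and the figures (60 successes in 100 trials against a null proportion of 0.5) are illustrative only.

```python
from statistics import NormalDist


def one_sample_proportion_z_test(successes, n, p0):
    """Two-tailed Z-test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    # The standard error is computed under the null hypothesis.
    se = (p0 * (1 - p0) / n) ** 0.5
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_sample_proportion_z_test(successes=60, n=100, p0=0.5)
print(round(z, 3), round(p_value, 4))
```

At a 5% significance level this hypothetical result would reject the null hypothesis, since the p-value falls just below 0.05.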
- Introduction to ANOVA
- Multiple t-Tests for comparing Several Means
- One-Way Analysis of Variance
- Two-Way Analysis of Variance
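
The F-statistic behind one-way ANOVA is the ratio of between-group to within-group mean squares. The sketch below computes it from scratch. The function name and the three small groups are hypothetical, and the p-value is left out because it requires the F-distribution, which the standard library does not include.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F-statistic for one-way ANOVA over a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 3))
```

A single F-test like this avoids the inflated Type I error that comes from running multiple pairwise t-tests across several means.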
Correlation is a statistical measure of an associative relationship between two random variables. Correlation is not necessarily a causal relationship. Correlation is an important concept in analytics, as it helps to identify variables that may be used in model building and is also useful for identifying issues such as multi-collinearity that can destabilize regression-based models. Correlation is also useful for finding proxy variables in analytics model building.
- Pearson Correlation Coefficient
- Spearman Rank Correlation
- Point Bi-Serial Correlation
- Phi-Coefficient
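
The first two coefficients above differ in what they measure: Pearson captures linear association, while Spearman is Pearson applied to ranks and so captures any monotonic relationship. The sketch below implements both from scratch. The helper names and the data are illustrative assumptions.

```python
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # monotonic but not linear
print(round(pearson(x, y), 4), round(spearman(x, y), 4))
```

For this monotonic but non-linear data, Spearman is exactly 1 while Pearson falls slightly short of it.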
- Different types of matrices
- Eigenvalues and Eigenvectors
- Optimization: Gradient Descent
- Maxima and Minima
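
Gradient descent finds feature weights by repeatedly stepping against the gradient of a loss function. The sketch below fits a one-feature linear model by minimizing mean squared error. The function name, learning rate, step count, and toy data (generated from y = 2x + 1) are all assumptions made for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by minimizing mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) w.r.t. w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Move against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (no noise).
w, b = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(round(w, 4), round(b, 4))
```

The learning rate matters: too large a value makes the updates diverge, while too small a value needs many more steps to reach the minimum.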