# DATA SCIENCE USING PYTHON

## ONLINE TRAINING COURSE

###### The ultimate purpose of this course is to cover both the basics and the advanced predictive modeling techniques that are a must for staying competitive today.

###### This course is known for its projects and is purely job-oriented training. You will work on exciting projects in domains such as high technology, retail, banking, marketing, clinical, and manufacturing.

## Course Content

- Welcome/General Discussion of Course Expectations
- Definition of Data
- Difference between data management and data analytics
- Data Science components

Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS

- Python Overview
- Python Data Types
- Python operations using Numbers, String, Logical, Arithmetic and so on
- Python Strings
- Python Lists
- Python Tuple
- Python Dictionary
- FOR and WHILE loops
- IF/THEN/ELSE in Python
- Data Manipulation Using Numpy And Pandas

Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS

- Levels of Measurement and Variable types
- Descriptive Statistics and Picturing Distributions
- Confidence Interval for the Mean

Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS

- One-Sample T-Test of Comparing Means
- Two-Sample T-Test of Comparing Means
- One Way ANOVA
- Assumptions of ANOVA Modeling
- N-Way ANOVA
- ANOVA Post Hoc Studies

Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS

- Data Exploration by using Scatter Plots
- Pearson and Spearman Correlations

Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS

- Fit Simple Linear Regression Model
- Assumptions of Linear Regression Model
- Analyze the output of the Linear Regression
- Producing Predicted Values
- Difference between Simple Linear Regression and Multiple Linear Regression Models
- Fit Multiple Linear Regression Model
- Stepwise Regression/Model Selection Techniques

Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS

- Residual Analysis
- Influential Observation
- Difference between Influential Observation and Outliers
- Collinearity Diagnostics
- Model Building Process using Python

Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS

- Examining Distributions
- Test of Associations by using the chi-square test
- Fisher's Exact p-values for Pearson Chi-square test

Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS

- Odds and Odds Ratio
- Simple Logistic Regression
- Multiple Logistic Regression with categorical predictors
- Analyze the output of Logistic Regression

Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS

- Apply the principles of honest assessment to model performance measurement
- Rare event adjustments
- Assess classifier performance using the confusion matrix
- Model selection and validation using training and validation data
- Create and interpret graphs (ROC, lift, and gains charts) for model comparison and selection
- Establish effective decision cut-off values for scoring

Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS

- Introduction to Decision Tree Modeling
- Model Essentials for Decision Tree Models
- Decision Tree Model Development by using CHAID, Entropy/Information Gain, and Gini
- Decision Tree Model Tuning

Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS

- Introduction to Boosting
- Example of Boosting
- Regression Decision Tree
- Gradient Boosted Trees Regression

Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS

- Introduction to Time Series Forecasting
- Component Factors affecting Time Series
- Moving Average (MA)
- Exponential Smoothing
- Trend Fitting Models (Linear trend, Quadratic trend, and Exponential trend)
- Autoregressive Integrated Moving Average (ARIMA) Model
- Vector Autoregression (VAR) Model
- Autoregressive Conditional Heteroskedasticity (ARCH) Model
- Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Model
- Long Short-Term Memory (LSTM) Model

## Project Content

Domain: Risk Management

Problem Statement: As an analyst, you need to advise your client on which mutual fund risk category to invest in.

Topic: Descriptive Analytics, Distributions, and Visualization

Domain: Manufacturing/Inventory Management

Problem Statement: As a manager/supervisor of a company, you need to measure the effectiveness of the production of cereal boxes. The aim is to analyze whether or not the weight of the cereal boxes meets company specifications.

Topic: Hypothesis Testing (One-Sample tests)
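As a rough illustration of the one-sample test in this project, the sketch below computes the t statistic by hand using only the Python standard library. The box weights and the 368 g target are hypothetical values made up for the example, not course data, and `one_sample_t` is an illustrative helper name.

```python
import math
import statistics

def one_sample_t(sample, target_mean):
    """Return (t statistic, degrees of freedom) for H0: population mean == target_mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t_stat = (mean - target_mean) / (sd / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical box weights in grams and a hypothetical specification of 368 g.
weights = [367.2, 368.5, 366.9, 369.1, 367.8, 366.4, 368.0, 367.5]
t_stat, df = one_sample_t(weights, target_mean=368.0)
print(f"t = {t_stat:.3f}, df = {df}")
```

Turning the statistic into a p-value requires the t distribution, which the standard library does not provide; in practice a library routine such as `scipy.stats.ttest_1samp` would typically be used instead.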

Domain: Marketing/Retail

Problem Statement: As a regional sales manager of a company, you need to compare the mean sales for two types of product display in a retail store. The aim is to decide whether or not the Promotional display of the product is more effective than the Normal display. This helps management choose the display location in a store that will maximize sales.

Topic: Hypothesis Testing (Two-Sample tests)
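One way to sketch the two-sample comparison is Welch's t-test, which does not assume equal variances in the two groups. The choice of Welch's variant is an assumption made for this example (the course may use the pooled-variance form), and the sales figures are hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean.
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, df

# Hypothetical weekly unit sales under each display type.
promotional = [72, 68, 75, 80, 71, 77]
normal = [60, 65, 58, 70, 62, 66]
t_stat, df = welch_t(promotional, normal)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A positive t statistic here points toward higher mean sales for the Promotional display; the p-value could be obtained with `scipy.stats.ttest_ind(promotional, normal, equal_var=False)`.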

Domain: Clinical

Problem Statement: Before you launch a new drug in the market, you need to analyze the effect of the drug and its different doses on blood pressure.

Topic: Analysis of Variance (ANOVA Models)
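A minimal sketch of the one-way ANOVA F statistic for a comparison across dose groups is shown below. The dose labels and blood pressure reductions are hypothetical, and `one_way_anova_f` is an illustrative helper rather than part of any library.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical reductions in systolic blood pressure (mmHg) for three dose levels.
low_dose = [4, 6, 5, 7, 5]
medium_dose = [8, 9, 7, 10, 9]
high_dose = [12, 11, 13, 12, 14]
f_stat, df_b, df_w = one_way_anova_f([low_dose, medium_dose, high_dose])
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

A large F value suggests that at least one dose group differs in mean response; `scipy.stats.f_oneway` returns the same statistic together with a p-value.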

Domain: Physiology

Problem Statement: In exercise physiology, an objective measure of aerobic fitness is how effectively the body can absorb and use oxygen during a 1.5-mile run. Factors affecting oxygen consumption include runtime, age, gender, run pulse, rest pulse, and so on. The aim is to identify the key factors affecting oxygen consumption during a run.

Topic: Analysis of Variance (EDA and Linear Regression Models)

Domain: Event Analysis

Problem Statement: On the night of 14 April 1912, the Titanic hit an iceberg, and it sank early the next morning. There were 1517 fatalities across different age groups, passenger classes (1, 2, and 3), and genders. The objective is to measure how these factors are associated with the survival status of passengers.

Topic: Odds, Odds Ratio, Chi-Square tests, Ordinal associations, and Logistic Regression Model
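To make the odds ratio and chi-square ideas concrete, here is a small sketch for a table of counts. The counts are hypothetical and are not the actual Titanic passenger data; the function names are illustrative.

```python
def odds_ratio(table):
    """Odds ratio for a 2x2 table [[a, b], [c, d]], computed as (a/b) / (c/d)."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

def chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are two passenger groups, columns are (survived, did not survive).
counts = [[30, 10], [20, 40]]
print(odds_ratio(counts))            # odds of survival in group 1 relative to group 2
print(round(chi_square(counts), 3))  # compare against a chi-square distribution with 1 df
```

Note that `scipy.stats.chi2_contingency` applies a continuity correction to 2x2 tables by default, so its statistic can differ from this uncorrected version.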

Domain: Marketing

Problem Statement: A bank undertook a target marketing campaign to identify a segment of customers who are likely to respond to an insurance product. The target variable is whether or not the customer bought the insurance product. It depends on factors such as product usage over three months, demographics, transaction patterns (such as deposit amount and checking account), the bank branch, residential information (such as urban or rural), and so on.

Topic: Classification, Categorical Data Analysis, Logistic Regression, Decision Tree, and Gradient Boosting (XGBoost)
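For a classification project like this one, model scores are usually turned into decisions with a cut-off and then summarized in a confusion matrix. The sketch below assumes hypothetical purchase labels and predicted probabilities; no real campaign data or fitted model is involved.

```python
def confusion_matrix(actual, scores, cutoff=0.5):
    """Count TP, FP, TN, FN when scores >= cutoff are classified as positive (1)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(actual, scores):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1:
            counts["tp" if label == 1 else "fp"] += 1
        else:
            counts["fn" if label == 1 else "tn"] += 1
    return counts

def sensitivity_specificity(cm):
    """Return (sensitivity, specificity) from a confusion-matrix dict."""
    sensitivity = cm["tp"] / (cm["tp"] + cm["fn"])
    specificity = cm["tn"] / (cm["tn"] + cm["fp"])
    return sensitivity, specificity

# Hypothetical outcomes (1 = bought insurance) and hypothetical model scores.
bought = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.7, 0.1]
for cutoff in (0.5, 0.3):
    cm = confusion_matrix(bought, scores, cutoff)
    print(cutoff, cm, sensitivity_specificity(cm))
```

Lowering the cut-off catches more true buyers at the cost of more false positives, which is the trade-off an ROC curve traces out across all cut-offs.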

Domain: Financial Analyst

Problem Statement: Forecast the revenues of three companies (Eastman Kodak, Cabot Corporation, and Wal-Mart) in order to better evaluate investment opportunities for your client.

Topic: Moving Average, Exponential Smoothing, Trend Fitting Models, and ARIMA
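The two simplest techniques in this project, the moving average and simple exponential smoothing, can be sketched in a few lines. The revenue series below is hypothetical and does not represent any of the named companies; the window and smoothing constant are arbitrary choices for illustration.

```python
def moving_average(series, window):
    """Trailing moving average; returns one value per full window."""
    return [sum(series[i - window:i]) / window for i in range(window, len(series) + 1)]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[0] = y[0], s[t] = alpha * y[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical annual revenues (in millions).
revenue = [120, 132, 128, 141, 150, 147, 158]
print(moving_average(revenue, window=3))
print(exponential_smoothing(revenue, alpha=0.3))
```

With simple exponential smoothing, the last smoothed value serves as the one-step-ahead forecast; a larger `alpha` makes the forecast react faster to recent changes.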

Domain: Economist

Problem Statement: Does the money supply “cause” interest rates, or do interest rates “cause” the money supply?

Topic: Vector Autoregression (VAR) Model

Domain: Economist

Problem Statement: Volatility Forecasting for U.S./U.K. exchange rates

Topic: Volatility Forecasting (ARCH and GARCH Models)

Domain: Financial Analyst

Problem Statement: Long-term forecasting of Bitcoin prices

Topic: Long Short-Term Memory (LSTM) Model