Predictive Modeling is a process through which a future outcome or behavior is predicted based on the past and current data at hand. It is a statistical analysis technique that enables the evaluation and calculation of the probability of certain results. Predictive modeling works by collecting data, creating a statistical model and applying probabilistic techniques to predict the likely outcome. Was the above useful? Please share with others on social media. If you want to look for more information, check some free online courses available at coursera.org, edx.org or udemy.com. Recommended reading list:

## What is Power Analysis?

Power Analysis is an important aspect of experimental design. It allows us to determine the sample size required to detect an effect of a given size with a given degree of confidence. There are four parameters involved in a power analysis. The research must ‘know’ 3 and solve for the 4th. 1. Alpha: Probability of finding significance where there is none False positive Probability of a Type I error Usually set to.05 2. Power Probability of finding true significance True positive 1 – beta, where beta is: Probability of not finding significance when […]

## What is Paired t-Test?

Paired t-Test has its purpose in the testing is to determine whether there is statistical evidence that the mean difference between paired observations on a particular outcome is significantly different from zero. The Paired-Samples t Test is a parametric test. This test is also known as Dependent t-Test. Was the above useful? Please share with others on social media. If you want to look for more information, check some free online courses available at coursera.org, edx.org or udemy.com. Recommended reading list:

## What is Overfitting?

Overfitting in mathematics and statistics is one of the most common tasks consisting in attempts to fit a “model” to a set of training data, so as to be able to make reliable predictions on generally untrained data. In overfitting, a statistical model describes random error or noise instead of the underlying relationship. Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been overfitting has poor predictive performance, as it overreacts to minor fluctuations in the training data. The potential for overfitting depends not only […]

## What is Out-Of-Sample Evaluation?

Out-Of-Sample Evaluation means to withhold some of the sample data from the model identification and estimation process, then use the model to make predictions for the hold-out data in order to see how accurate they are and to determine whether the statistics of their errors are similar to those that the model made within the sample of data that was fitted. Was the above useful? Please share with others on social media. If you want to look for more information, check some free online courses available at coursera.org, edx.org or udemy.com. Recommended reading list:

## What is Outlier?

Outlier is an observation point that is distant from other observations. An outlier may be due to variability in the measurement or it may indicate an experimental error, the latter are sometimes excluded from the data set. Outliers can occur by chance in any distribution, but they often indicate either measurement error or that the population has a heavy-tailed distribution. In the former case one wishes to discard them or use statistics that are robust to outliers, while in the latter case they indicate that the distribution has high skewness and that one should be very cautious in using tools […]

## What is Nearest Neighbor Algorithm?

Nearest Neighbor Algorithm was one of the first algorithms used to determine a solution to the traveling salesman problem. In it, the salesman starts in a random city and repeatedly visits the nearest city until all have been visited. It quickly yields a short tour, but usually not the optimal one. The nearest neighbor algorithm is easy to implement and executes quickly, but it can sometimes miss shorter routes which are easily noticed with human insight, due to its “greedy” nature. As a general guide, if the last few stages of the tour are comparable in length to the first […]

## What is Multiple Regression?

Multiple Regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the value of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes, the outcome, target or criterion variable). The independent variables can be continuous or categorical (dummy coded as appropriate). Was the above useful? Please share with others on social media. If you want to look for more information, check some free online courses available at coursera.org, edx.org or udemy.com. Recommended reading list:

## What is Multinomial Logistic Regression?

Multinomial Logistic Regression is the linear regression analysis to conduct when the dependent variable is nominal with more than two levels. Thus it is an extension of logistic regression, which analyzes dichotomous (binary) dependents. Since the output of the analysis is somewhat different to the logistic regression’s output, multinomial regression is sometimes used instead. Like all linear regressions, the multinomial regression is a predictive analysis. Multinomial regression is used to describe data and to explain the relationship between one dependent nominal variable and one or more continuous-level(interval or ratio scale) independent variables. Was the above useful? Please share with others […]

## What is Model Fitting ?

Model Fitting is running an algorithm to learn the relationship between predictors and outcome so that you can predict the future values of the outcome. It proceeds in three steps: First, you need a function that takes in a set of parameters and returns a predicted data set. Second you need an ‘error function’ that provides a number representing the difference between your data and the model’s prediction for any given set of model parameters. Third, you need to find the parameters that minimize this difference. Once you set things up properly, this third step is easy. Was the above […]