Resampling is any technique of generating a new sample from an existing dataset. There is a variety of methods for estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of available data (jackknifing) or drawing randomly with replacement from a set of data points (bootstrapping). Exchanging labels on data points when performing significance tests (permutation tests, also called exact tests, randomization tests, or re-randomization tests) Validating models by using random subsets (bootstrapping, cross-validation). Was the above useful? Please share with others on social media. If you want to look for more information, check some free online courses […]

## What is Regularization?

What is Regularization? In Machine Learning, very often the task is to fit a model to a set of training data and use the fitted model to make predictions or classify new (out of sample) data points. Sometimes model fits the training data very well but does not well in predicting out of sample data points. A model may be too complex and overfit or too simple and underfit, either way giving poor predictions. Regularization is a way to avoid overfitting by penalizing high regression coefficients, it can be seen as a way to control the trade-off betweenÂ bias and variance […]

## What is Regression?

Regression is a statistical measure used that attempts to determine the strength of the relationship between one dependent variable and a series of other changing (independent) variables. The two basic types of regression are linear regression and multiple linear regression, although there are non-linear regression methods for more complicated data and analysis. Linear regression uses one independent variable to explain or predict the outcome of the dependent variable, while multiple regression uses two or more independent variables to predict the outcome. Regression can help predict sales for a company based on weather, previous sales, GDP growth or other conditions. Regression […]

## What is Random Sampling?

Random sampling. In this technique, each member of the population has an equal chance of being selected as the subject. The entire process of sampling is done in a single step with each subject selected independently of the other members of the population. There are many methods to proceed with simple random sampling. A sample chosen randomly is meant to be an unbiased representation of the total population. If for some reasons, the sample does not represent the population, the variation is called a sampling error. Was the above useful? Please share with others on social media. If you want […]

## What is Random Forest?

Random Forest or Random Decision Forest are an ensemble learning method for classification, regression, and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees habit of overfitting to their training set. The first algorithm for random decision forests was created by Tin Kam Ho using the random subspace method, which, in Ho’s formulation, is a way to implement the “stochastic discrimination” approach to the classification proposed by Eugene […]

## What is Radial Basis Function(RBF) network?

Radial Basis Function(RBF) network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. Radial basis function networks have many uses, including function approximation, time series prediction, classification, and system control. They were first formulated in a 1988 paper by Broomhead and Lowe, both researchers at the Royal Signals and Radar Establishment. Was the above useful? Please share with others on social media. If you want to look for more information, check some free online courses available at […]

## What is QQ plot?

QQ plots – Quantile-Quantile plots are a graphical technique for determining if two data sets come from populations with a common distribution. A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the fraction (or percent) of points below the given value. The 0.3 (or 30%) quantile is the point at which 30% percent of the data fall below and 70% fall above that value. A 45-degree reference line is also plotted. If the two sets come from a population with the same […]

## What is Q-learning?

Q-learning is a model-free reinforcement learning technique. Specifically, Q-learning can be used to find an optimal action selection policy for any given (finite) Markov decision process (MDP). It works by learning an action-value function that ultimately gives the expected utility of taking a given action in a given state and following the optimal policy thereafter. A policy is a rule that the agent follows in selecting actions, given the state it is in. When such an action-value function is learned, the optimal policy can be constructed by simply selecting the action with the highest value in each state. One of […]

## What is Pruning?

Pruning is a technique in machine learning that reduces the size of decision trees by removing sections of the tree that provide a little power to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting. One of the questions that arise in a decision tree algorithm is the optimal size of the final tree. A tree that is too large risks overfitting the training data and poorly generalizing to new samples. A small tree might not capture important structural information about the sample space. However, it is hard to […]

## What is Probabilistic Neural Network (PNN)?

Probabilistic Neural Network (PNN) is kind of feedforward neural network. In the PNN algorithm, the parent probability distribution function (PDF) of each class is approximated by a Parzen window and a non-parametric function. Then, using PDF of each class, the class probability of a new input data is estimated and Bayesâ€™ rule is then employed to allocate the class with highest posterior probability to new input data. By this method, the probability of misclassification is minimized. This type of ANN was derived from the Bayesian network and a statistical algorithm called Kernel Fisher discriminant analysis. In a PNN, the operations […]