Bootstrap aggregation, also called bagging, is an ensemble method designed to increase the stability and accuracy of models. It involves building a series of models from the same training data set by repeatedly sampling the data at random with replacement, then combining their predictions.
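As a minimal sketch of the idea, the snippet below uses scikit-learn's `BaggingClassifier` to train 50 decision trees, each on its own bootstrap sample, on a synthetic classification data set (the data and parameter choices here are illustrative, not from the series):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a real training set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 50 trees is fitted on a bootstrap sample of the training
# data (drawn at random with replacement); predictions are then combined
# by majority vote.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        bootstrap=True, random_state=0)
bag.fit(X_train, y_train)
acc = accuracy_score(y_test, bag.predict(X_test))
```

Because each tree sees a slightly different sample, the averaged ensemble is typically more stable than any single tree fitted to the full data.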
6 secrets of building better models
In this series of videos we look at some often-overlooked approaches that can be applied in the same way to a wide variety of algorithms and that may lead to better predictive accuracy. In all of our examples we’ll focus on improving the accuracy of a predictive model applied to a classification problem.
Feature Engineering is really just a fancy term for creating new data. Very often we can help an algorithm build better models by preparing the input data in a way that allows it to detect a clearer signal in often noisy data. In machine learning, variables are often referred to as ‘features’, so feature engineering refers to transforming existing variables or creating new ones.
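A small sketch of what this looks like in practice, using pandas on a hypothetical customer table (the column names and transformations are illustrative assumptions, not taken from the series):

```python
import numpy as np
import pandas as pd

# Hypothetical customer data; column names are illustrative only.
df = pd.DataFrame({
    "income": [40000, 85000, 52000, 120000],
    "debt":   [12000, 30000,  5000,  60000],
    "signup_date": pd.to_datetime(["2020-01-15", "2018-06-03",
                                   "2021-11-20", "2019-03-08"]),
})

# Engineered features: a ratio of two existing fields, a skew-reducing
# log transform, and a tenure measure extracted from a date.
df["debt_to_income"] = df["debt"] / df["income"]
df["log_income"] = np.log1p(df["income"])
df["tenure_days"] = (pd.Timestamp("2022-01-01") - df["signup_date"]).dt.days
```

None of the three new columns contains information that wasn't already in the data, but each expresses it in a form a learning algorithm can pick up more directly.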
The idea of meta modelling is to build a predictive model using the predictions or scores generated by another model. By adding the predictive scores generated by an initial modelling algorithm to an existing pool of predictor fields, a second algorithm can then exploit these scores to build a final, more accurate model.
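This two-stage flow can be sketched as follows with scikit-learn. One detail worth noting: the stage-one scores here are generated out-of-fold (via `cross_val_predict`) so that the second model doesn't simply learn a leaked copy of the training labels. The model choices are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=400, n_features=8, random_state=1)

# Stage 1: generate scores from an initial model. Using out-of-fold
# predictions avoids leaking the training labels into stage 2.
stage1_scores = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                                  cv=5, method="predict_proba")[:, 1]

# Stage 2: append the scores to the pool of predictor fields and fit a
# second algorithm on the augmented data.
X_meta = np.column_stack([X, stage1_scores])
stage2 = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_meta, y)
```

The second model is free to use the stage-one score like any other predictor, typically relying on it where the first model was confident and on the raw fields where it wasn't.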
Split models, or split population modelling, is another technique that allows the user to build multiple models which are then combined to create a single prediction. The idea behind split modelling is that if the data represent different populations, or contain separate groups that behave in very different ways, it may be unreasonable to assume that a single model can explain all the inherent variability across these distinct populations.
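A minimal sketch of the split approach: partition the data on a split field, fit one model per sub-population, and route each new record to the model for its group. The segment variable and model choice here are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=6, random_state=2)
# Hypothetical split field, e.g. a customer segment taking values 0 or 1.
segment = np.random.RandomState(2).randint(0, 2, size=len(y))

# Fit one model per sub-population.
models = {s: LogisticRegression(max_iter=1000).fit(X[segment == s],
                                                   y[segment == s])
          for s in np.unique(segment)}

def predict_split(X_new, seg_new):
    """Route each record to the model trained on its segment."""
    preds = np.empty(len(seg_new), dtype=int)
    for s, model in models.items():
        mask = seg_new == s
        if mask.any():
            preds[mask] = model.predict(X_new[mask])
    return preds

preds = predict_split(X, segment)
```

Although several models are trained, the routing step means the user still receives a single prediction per record, which is what makes the split models behave as one combined model.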