6 secrets of building better models Archives

6 secrets of building better models part one: bootstrap aggregation

Bootstrap aggregation, also called bagging, is an ensemble method designed to increase the stability and accuracy of models. It involves creating a series of models from the same training data set by repeatedly sampling the data at random with replacement, then combining the predictions of those models.
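
As a rough illustration of the resampling idea, the sketch below draws bootstrap samples from a small synthetic data set, fits a straight line to each one, and averages the resulting predictions. The data, the choice of a straight-line base model, and the function names are all assumptions made for this example; they are not taken from the article.

```python
import random
from statistics import mean

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]

def fit_line(rows):
    """Least-squares fit of y = a + b*x on (x, y) pairs (the base model)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample in which every x is identical
        return y_bar, 0.0
    b = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sxx
    return y_bar - b * x_bar, b

def bagged_predict(models, x):
    """Aggregate by averaging the predictions of all fitted models."""
    return mean(a + b * x for a, b in models)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Synthetic data (an assumption): y = 2x + 1 plus Gaussian noise.
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 1)) for x in range(20)]

# One model per bootstrap sample, all drawn from the same training data.
models = [fit_line(bootstrap_sample(data, rng)) for _ in range(25)]
prediction = bagged_predict(models, 10)
```

Averaging is the usual way to aggregate numeric predictions; for classification the same structure would typically use a majority vote instead.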

6 secrets of building better models part two: boosting

By Jarlath Quinn

Boosting is another ensemble model-building method, designed to help develop strong classification models from weak classifiers. Boosting methods focus on the errors (or misclassifications) that occur in prediction.
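
One well-known way to focus on misclassifications is AdaBoost, which re-weights the training cases after each round so that the next weak classifier concentrates on the cases the previous ones got wrong. The sketch below is a minimal version using one-dimensional threshold rules ("stumps") as the weak classifiers. The article does not say which boosting algorithm it has in mind, so the choice of AdaBoost, the toy data, and every name here are assumptions.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: predict `polarity` at or below the threshold, else its opposite."""
    return polarity if x <= threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n  # start with every case equally important
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's say in the vote
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified cases, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps; labels are +1 and -1."""
    score = sum(alpha * stump_predict(x, t, pol) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data (an assumption): no single threshold separates the two classes.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
fitted = [predict(ensemble, x) for x in xs]
```

On this toy data no single stump classifies every case correctly, but the weighted vote of three stumps does.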

6 secrets of building better models part three: feature engineering

By Jarlath Quinn

Feature engineering is really just a fancy term for creating new data. Very often we can help an algorithm build better models by preparing the input data in a way that allows it to detect a clearer signal in the often noisy data. In machine learning, variables are often referred to as ‘features’, so feature engineering refers to the transformation of existing variables or the creation of new ones.
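
To make this concrete, here is a small sketch that derives three new variables from a raw customer record: a ratio, an elapsed time, and a date part. The field names (total_spend, n_orders, signup_date) and the record itself are hypothetical and chosen only for illustration.

```python
from datetime import date

def engineer_features(record, today):
    """Return a copy of the record with derived variables added."""
    features = dict(record)  # leave the raw record untouched
    # Ratio feature: average value per order (0.0 when there are no orders).
    orders = record["n_orders"]
    features["avg_order_value"] = record["total_spend"] / orders if orders else 0.0
    # Elapsed-time feature: how long the customer has been signed up.
    features["tenure_days"] = (today - record["signup_date"]).days
    # Date-part feature: may expose seasonal effects hidden in a raw date.
    features["signup_month"] = record["signup_date"].month
    return features

# Hypothetical raw record.
customer = {"total_spend": 480.0, "n_orders": 12, "signup_date": date(2023, 1, 15)}
features = engineer_features(customer, today=date(2024, 1, 15))
```

Here the derived record gains avg_order_value, tenure_days and signup_month alongside the original fields, which a modelling algorithm can then use like any other variable.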

6 secrets of building better models part four: ensemble modelling

By Jarlath Quinn

Ensemble modelling refers to the practice of combining the predictions of separate models on the old principle that “two heads are better than one”. Ensemble methods can be particularly effective when combining models that have been created using completely different algorithms.
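
Two simple ways of combining predictions are averaging the scores and taking a majority vote. The sketch below shows both, using made-up scores from three hypothetical models (labelled here as a tree, a regression and a network). The numbers are illustrative inputs, not results from any real model, and the 0.5 cut-off is also an assumption.

```python
from collections import Counter
from statistics import mean

def average_scores(score_lists):
    """Average the scores that each model assigns to the same case."""
    return [mean(scores) for scores in zip(*score_lists)]

def majority_vote(label_lists):
    """Pick, for each case, the label predicted by most models."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*label_lists)]

def to_labels(scores, cutoff=0.5):
    """Turn scores into 0/1 labels using an assumed cut-off."""
    return [int(s >= cutoff) for s in scores]

# Made-up scores for three cases from three different kinds of model.
tree_scores = [0.9, 0.4, 0.2]
regression_scores = [0.7, 0.6, 0.1]
network_scores = [0.8, 0.3, 0.6]
all_scores = [tree_scores, regression_scores, network_scores]

averaged = average_scores(all_scores)
labels_from_average = to_labels(averaged)
labels_from_vote = majority_vote([to_labels(s) for s in all_scores])
```

Using an odd number of models avoids tied votes for a two-class problem; with an even number, a tie-breaking rule would be needed.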

6 secrets of building better models part five: meta models

By Jarlath Quinn

The idea of meta modelling is to build a predictive model using the predictions or scores generated by another model. By adding the predictive scores generated by an initial modelling algorithm to an existing pool of predictor fields, a second algorithm can then exploit these scores to build a final, more accurate model.
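
The two-stage flow can be sketched as follows: a first model produces a score, the score is added to the pool of predictors, and a second model is fitted on the enlarged pool. Both stages here are ordinary least-squares fits on a tiny synthetic data set, which is an assumption made to keep the example self-contained; with two linear stages the result is no better than a single regression on both predictors, so the sketch shows only the plumbing. In practice the two stages would usually be different algorithms, and the first-stage scores are normally generated on data the first model was not trained on.

```python
from statistics import mean

def fit_one(xs, ys):
    """Least squares for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def fit_two(us, vs, ys):
    """Least squares for y = c0 + c1*u + c2*v via centred normal equations."""
    u_bar, v_bar, y_bar = mean(us), mean(vs), mean(ys)
    du = [u - u_bar for u in us]
    dv = [v - v_bar for v in vs]
    dy = [y - y_bar for y in ys]
    suu = sum(a * a for a in du)
    svv = sum(a * a for a in dv)
    suv = sum(a * b for a, b in zip(du, dv))
    suy = sum(a * b for a, b in zip(du, dy))
    svy = sum(a * b for a, b in zip(dv, dy))
    det = suu * svv - suv ** 2  # non-zero when u and v are not collinear
    c1 = (suy * svv - svy * suv) / det
    c2 = (svy * suu - suy * suv) / det
    return y_bar - c1 * u_bar - c2 * v_bar, c1, c2

def sse(predictions, ys):
    """Sum of squared errors."""
    return sum((p - y) ** 2 for p, y in zip(predictions, ys))

# Synthetic data (an assumption): y depends on both x1 and x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3 * a + 2 * b for a, b in zip(x1, x2)]

# Stage 1: an initial model that only sees x1 produces a score.
a1, b1 = fit_one(x1, y)
scores = [a1 + b1 * x for x in x1]

# Stage 2: the score joins the predictor pool alongside x2.
c0, c1, c2 = fit_two(scores, x2, y)
final = [c0 + c1 * s + c2 * v for s, v in zip(scores, x2)]

stage1_error = sse(scores, y)
stage2_error = sse(final, y)
```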

6 secrets of building better models part six: split models

By Jarlath Quinn

Split models, or split population modelling, is another technique that allows the user to build multiple models which can then be combined to create a single prediction. The idea with split modelling is that if the data represent different populations, or contain separate groups that behave in very different ways, it might be unreasonable to assume that a single model can explain all the inherent variability across these distinct populations.
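
A minimal sketch of the idea, assuming each record carries a segment label: fit one model per segment, then route each new case to the model for its own segment. The segment names ("retail" and "business"), the synthetic data, and the straight-line model are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def fit_line(pairs):
    """Least-squares fit of y = a + b*x on (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def fit_split_models(rows):
    """Fit a separate model for each segment found in (segment, x, y) rows."""
    by_segment = defaultdict(list)
    for segment, x, y in rows:
        by_segment[segment].append((x, y))
    return {segment: fit_line(pairs) for segment, pairs in by_segment.items()}

def predict(models, segment, x):
    """Use the model that belongs to the case's own segment."""
    a, b = models[segment]
    return a + b * x

# Synthetic data (an assumption): the two groups behave in opposite ways.
rows = ([("retail", x, 2 * x + 1) for x in range(1, 6)]
        + [("business", x, 10 - x) for x in range(1, 6)])

split_models = fit_split_models(rows)
retail_prediction = predict(split_models, "retail", 4)
business_prediction = predict(split_models, "business", 4)

# For comparison: a single model fitted to the pooled data.
pooled_model = fit_line([(x, y) for _, x, y in rows])
```

In this toy data the response rises with x in one group and falls in the other, so the pooled line fits neither group well, while each split model recovers its own group's relationship.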