
Recursive Partitioning and Regression Tree (‘RPART’)

\ rɪˈkɜrsɪv \ pɑrˈtɪʃənɪŋ \ ænd \ rəˈgrɛʃən \ tri \ (ɑr-pi-eɪ-ɑr-ti) \

A statistical hierarchical clustering algorithm used to group the values of a rating factor so that values within a group exhibit similar characteristics and the differences between groups are statistically significant.

RPART is a ‘top down’ approach, known as Divisive Hierarchical Clustering: the values of the variable are first split into two groups, then one of these groups is split into two, and so on until further splitting offers limited benefit in terms of goodness of fit relative to the complexity of the model.
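As a rough illustration of a single ‘top down’ step, the sketch below finds the best two-way split of an ordered rating factor. It assumes a numeric factor (age), a numeric response, and within-group sum of squared errors as the measure of fit; the data and the function names `sse` and `best_binary_split` are hypothetical and not taken from any particular RPART implementation.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_binary_split(points):
    """Return (cut, sse_after) for the best two-way split of (x, y) points.

    Points with x <= cut form the left group, the rest the right group.
    Returns None if no split is possible.
    """
    points = sorted(points)
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot cut between identical factor values
        left = [y for _, y in points[:i]]
        right = [y for _, y in points[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (points[i - 1][0], total)
    return best


# Hypothetical data: (age, observed rate)
data = [(60, 1.0), (65, 1.1), (70, 0.9), (75, 3.0), (80, 3.2), (85, 2.9)]
cut, remaining = best_binary_split(data)
print(cut, round(remaining, 3))
```

Repeating this step on one of the resulting groups, and then again on its sub-groups, gives the divisive hierarchy described above.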

At each stage, splits are identified based upon a minimum required improvement in how much of the variation within the data is explained by increasing the number of groups, subject to minimum acceptable group sizes. Having constructed a ‘tree’ using this approach, we then ‘prune’ the tree (reducing the number of groups) to avoid overfitting to the data. This uses a technique called k-fold cross-validation.
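The stopping rules and the use of k-fold cross-validation can be sketched as follows. This is a simplified illustration under stated assumptions, not a definitive implementation: the parameter names `min_improvement` and `min_group_size`, the helper functions and the data are all hypothetical. Instead of pruning a fully grown tree, the sketch regrows the tree at several candidate improvement thresholds and keeps the one with the lowest held-out error, which has a similar effect of limiting the number of groups.

```python
def sse(ys):
    """Sum of squared deviations from the mean: the unexplained variation."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def grow(points, total_sse, min_improvement, min_group_size):
    """Return sorted cut points for x-sorted (x, y) points.

    min_group_size must be at least 1.
    """
    ys = [y for _, y in points]
    parent = sse(ys)
    best = None  # (gain, index of the first point in the right-hand group)
    for i in range(min_group_size, len(points) - min_group_size + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot cut between identical factor values
        gain = parent - sse(ys[:i]) - sse(ys[i:])
        if best is None or gain > best[0]:
            best = (gain, i)
    # Stop if no admissible split explains enough extra variation.
    if best is None or total_sse == 0 or best[0] / total_sse < min_improvement:
        return []
    i = best[1]
    args = (total_sse, min_improvement, min_group_size)
    return grow(points[:i], *args) + [points[i - 1][0]] + grow(points[i:], *args)


def group_of(x, cuts):
    """Index of the group containing x (the number of cuts below x)."""
    return sum(1 for c in cuts if x > c)


def fit(points, min_improvement, min_group_size):
    """Grow a tree and return (cuts, mean response per group)."""
    points = sorted(points)
    cuts = grow(points, sse([y for _, y in points]), min_improvement, min_group_size)
    groups = [[] for _ in range(len(cuts) + 1)]
    for x, y in points:
        groups[group_of(x, cuts)].append(y)
    return cuts, [sum(g) / len(g) for g in groups]


def cv_error(points, min_improvement, min_group_size, k):
    """Total held-out squared error over k folds (fold = sorted index mod k)."""
    points = sorted(points)
    error = 0.0
    for fold in range(k):
        train = [p for j, p in enumerate(points) if j % k != fold]
        held_out = [p for j, p in enumerate(points) if j % k == fold]
        cuts, means = fit(train, min_improvement, min_group_size)
        error += sum((y - means[group_of(x, cuts)]) ** 2 for x, y in held_out)
    return error


# Hypothetical data: (age, observed rate) with three underlying levels.
data = [(60, 1.0), (62, 1.1), (64, 0.9), (66, 1.0),
        (68, 2.0), (70, 2.1), (72, 1.9), (74, 2.0),
        (76, 3.0), (78, 3.1), (80, 2.9), (82, 3.0)]
candidates = [0.5, 0.1, 0.01, 0.0001]  # largest (simplest tree) first
best = min(candidates, key=lambda t: cv_error(data, t, 2, 4))
cuts, means = fit(data, best, 2)
print(best, cuts, [round(m, 2) for m in means])
```

Listing the candidate thresholds from largest to smallest means that, when several give the same cross-validated error, the one producing the fewest groups is kept.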
