Summary
We are looking for strong candidates to work on the exciting project described below!
Bayesian classification and regression tree (BCART) models and their ensemble version, Bayesian additive regression tree (BART) models, are powerful semiparametric learning techniques for modelling nonlinear regression functions that often outperform many other machine learning methods. Classical BCART and BART models were proposed for continuous (Gaussian) and binary response variables (see [1-3]), and over the years they have been extended to analyse a large class of response variables, including count data (see [4]). Their excellent empirical performance has also motivated work on their theoretical foundations (see [5]).
One direction of research on this project is to understand the mechanisms of the BCART and BART methods from a theoretical point of view. Another direction is to explore extended BCART and BART models with applications to areas such as insurance pricing (see [6]) and/or spatio-temporal data analysis (e.g., environmental or climate data modelling). Key questions to be explored in these applications include feature selection, the choice of loss functions, the class-imbalance problem with zeros, model stability, and interpretability.
References:
[1] H. A. Chipman, E. I. George, and R. E. McCulloch, “Bayesian CART model search,” Journal of the American Statistical Association, vol. 93, no. 443, pp. 935–948, 1998.
[2] D. G. Denison, B. K. Mallick, and A. F. Smith, “A Bayesian CART algorithm,” Biometrika, vol. 85, no. 2, pp. 363–377, 1998.
[3] H. A. Chipman, E. I. George, R. E. McCulloch, et al., “BART: Bayesian additive regression trees,” The Annals of Applied Statistics, vol. 4, no. 1, pp. 266–298, 2010.
[4] J. S. Murray, “Log-linear Bayesian additive regression trees for multinomial logistic and count regression models,” Journal of the American Statistical Association, vol. 116, no. 534, pp. 756–769, 2021.
[5] V. Rockova, S. Van der Pas, et al., “Posterior concentration for Bayesian regression trees and forests,” Annals of Statistics, vol. 48, no. 4, pp. 2108–2131, 2020.
[6] Y. Zhang, L. Ji, Aivaliotis, and C. C. Taylor, “Bayesian CART models for insurance claims frequency,” 2023. Available at https://arxiv.org/pdf/2303.01923.pdf
