Abstract
Logit models are popular tools for analyzing discrete choice and ranking data. They assume that each judge assigns a measurable utility to every item and that the ordering of a judge's utilities determines the observed outcome. Although logit models have proven to be powerful tools, they become difficult to interpret when they contain nonlinear and interaction terms. To overcome this difficulty, we extended logit models by adding a decision tree structure. We introduced a new method for selecting the tree splitting variable that distinguishes nonlinear from linear effects: the variable with the strongest nonlinear effect is selected for splitting, on the view that linear effects are best modeled by the logit model itself. Decision trees built in this fashion were shown to be smaller than those built with loglikelihood-based splitting criteria. In addition, the proposed splitting methods can save computational time and avoid bias in choosing the optimal splitting variable. We also investigated variable selection within logit models and showed that a forward selection criterion works well with logit tree models. Simulations focusing on ranking data showed that the proposed splitting methods are unbiased. Finally, to demonstrate the feasibility of logit tree models, we applied them to two datasets, one with a binary outcome and the other with a ranking outcome.
Copyright © 2015 Springer-Verlag Berlin Heidelberg.
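The utility assumption described in the abstract (each judge has a utility for every item, and the ordering of those utilities determines the ranking) is commonly formalized as a rank-ordered, or Plackett-Luce, logit model when the utilities carry i.i.d. Gumbel errors. The Python sketch below computes one judge's log-likelihood under that standard model. It is an illustration under that assumption only, not the authors' logit tree implementation or their splitting criterion; the function name `rank_ordered_logit_loglik` is hypothetical.

```python
import math
from typing import Sequence


def rank_ordered_logit_loglik(utilities: Sequence[float], ranking: Sequence[int]) -> float:
    """Log-likelihood of one judge's complete ranking under a rank-ordered
    (Plackett-Luce) logit model. Illustrative sketch, not the paper's code.

    utilities: deterministic utility V_j of each item j (indices 0..J-1).
    ranking:   item indices ordered from most to least preferred.
    """
    if sorted(ranking) != list(range(len(utilities))):
        raise ValueError("ranking must be a permutation of all item indices")

    loglik = 0.0
    remaining = list(ranking)  # items not yet placed in the ranking
    # The ranking is "exploded" into successive choices: at each stage the
    # top remaining item is chosen from all remaining items with a
    # multinomial-logit probability. The last stage has probability 1.
    for item in ranking[:-1]:
        # log-sum-exp over the remaining items, shifted for numerical stability
        m = max(utilities[j] for j in remaining)
        log_denom = m + math.log(sum(math.exp(utilities[j] - m) for j in remaining))
        loglik += utilities[item] - log_denom
        remaining.remove(item)
    return loglik


if __name__ == "__main__":
    # Three items; the judge ranks item 1 first, then item 0, then item 2.
    ll = rank_ordered_logit_loglik([0.5, 1.2, -0.3], [1, 0, 2])
    print(f"log-likelihood: {ll:.4f}")
```

With only two items the expression reduces to the binary logit, P(a preferred to b) = 1 / (1 + exp(-(V_a - V_b))), which is why the same utility framework covers both the binary and the ranking outcomes mentioned above. In a regression setting each V_j would typically be a linear predictor in judge and item covariates.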
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 799-827 |
| Journal | Computational Statistics |
| Volume | 31 |
| Issue number | 2 |
| Early online date | 13 Jun 2015 |
| DOIs | 10.1007/s00180-015-0588-4 |
| Publication status | Published - Jun 2016 |
Citation
Yu, P. L. H., Lee, P. H., Cheung, S. F., Lau, E. Y. Y., Mok, D. S. Y., & Hui, H. C. (2016). Logit tree models for discrete choice data with application to advice-seeking preferences among Chinese Christians. Computational Statistics, 31(2), 799-827. doi: 10.1007/s00180-015-0588-4.
Keywords
- Binary data
- Decision tree
- Multinomial data
- Ranking data
- Variable selection