
2.7 Comparisons with other exploration methods

 

There exist several alternatives to decision trees for data exploration, such as neural networks, nearest neighbor methods and regression analysis. Several researchers have compared trees to these other methods on specific problems.

An early study comparing machine learning methods for learning from examples can be found in [77]. Comparisons of symbolic and connectionist methods can be found in [373,327]. Quinlan empirically compared decision trees to genetic classifiers [294] and to neural networks [298]. Thrun et al. [349] compared several learning algorithms on simulated Monk's problems. Palvia and Gordon [278] compared decision tables, decision trees and decision rules, to determine which formalism is best for decision analysis.

Multilayer perceptrons and CART (with and without linear combination splits) [29] are compared in [9], with the finding that there is little difference in accuracy. Similar conclusions were reached in [103], where ID3 [292] and backpropagation were compared. Talmon et al. [345] compared classification trees and neural networks for analyzing electrocardiograms (ECGs) and concluded that neither technique is superior to the other. In contrast, ID3 is judged to be slightly better than connectionist and Bayesian methods in [340]. Brown et al. [33] compared backpropagation neural networks with decision trees on three problems known to be multimodal. Their analysis indicated that there was little difference between the two methods, and that neither performed very well in its ``vanilla'' state. In [33], the performance of decision trees improved when multivariate splits were used, and backpropagation networks did better with feature selection.
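The kind of head-to-head comparison run in these studies can be sketched with scikit-learn, using a synthetic dataset whose classes form two interleaving clusters (multimodal); the library, dataset and parameter settings here are illustrative assumptions, not the setups used in the studies above.

```python
# Sketch of a tree-versus-network comparison on multimodal data.
# scikit-learn and make_moons are modern stand-ins (assumptions),
# not the original CART/ID3/backpropagation implementations.
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Two interleaving half-moons: each class occupies its own curved cluster.
X, y = make_moons(n_samples=400, noise=0.25, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)

# 5-fold cross-validated accuracy for each method on the same data.
tree_acc = cross_val_score(tree, X, y, cv=5).mean()
net_acc = cross_val_score(net, X, y, cv=5).mean()
print(f"decision tree: {tree_acc:.3f}")
print(f"neural net:    {net_acc:.3f}")
```

On data like this, both methods typically reach similar accuracy, echoing the "little difference" conclusion reported in several of the studies cited here.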

Gilpin et al. [123] compared stepwise linear discriminant analysis, stepwise logistic regression and CART [29] with three senior cardiologists on the problem of predicting whether a patient would die within a year of being discharged after an acute myocardial infarction. Their results showed no difference in prediction accuracy between the physicians and the computer methods. Kors and Van Bemmel [191] compared statistical multivariate methods with heuristic decision tree methods in the domain of electrocardiogram (ECG) analysis. Their comparisons show that decision tree classifiers are more comprehensible and more flexible when categories must be added or changed. Pizzi and Jackson [288] compared an expert system developed using traditional knowledge engineering methods to Quinlan's ID3 [292] in the domain of tonsillectomy. Comparisons of CART to multiple linear regression and discriminant analysis can be found in [43], where it is argued that CART is more suitable than the other methods for very noisy domains with many missing values.

Comparisons between decision trees and statistical methods such as linear discriminant function analysis and automatic interaction detection (AID) are given in [232], where it is argued that machine learning methods sometimes outperform the statistical methods and so should not be ignored. Feng et al. [99] present a comparison of several machine learning methods (including decision trees, neural networks and statistical classifiers) as part of the European Statlog project. Their main conclusions are that (1) no method seems uniformly superior to the others, (2) machine learning methods seem to be superior for multimodal distributions, and (3) statistical methods are computationally the most efficient.
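A Statlog-style study evaluates several method families on the same data, recording both accuracy and cost. A minimal sketch of that protocol, assuming scikit-learn and a synthetic dataset in place of the original Statlog tools and benchmarks:

```python
# Sketch of a Statlog-style protocol: several method families on one
# dataset, measuring cross-validated accuracy and wall-clock time.
# scikit-learn and make_classification are illustrative assumptions.
import time

from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=1)

methods = {
    "statistical (LDA)": LinearDiscriminantAnalysis(),
    "decision tree": DecisionTreeClassifier(random_state=1),
    "neural network": MLPClassifier(max_iter=2000, random_state=1),
}

results = {}
for name, clf in methods.items():
    start = time.perf_counter()
    acc = cross_val_score(clf, X, y, cv=5).mean()  # 5-fold accuracy
    elapsed = time.perf_counter() - start
    results[name] = acc
    print(f"{name:18s} accuracy={acc:.3f} time={elapsed:.2f}s")
```

Timing each method alongside its accuracy is what supports conclusions like (3) above: the statistical method typically fits far faster than the iteratively trained network.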

Long et al. [217] compared Quinlan's C4 [297] to logistic regression on the problem of diagnosing acute cardiac ischemia, and concluded that both methods came fairly close to the expertise of the physicians; in their experiments, logistic regression outperformed C4. Curram and Mingers [67] compared decision trees, neural networks and discriminant analysis on several real-world data sets. Their comparisons reveal that linear discriminant analysis is the fastest of the methods when its underlying assumptions are met, and that decision tree methods overfit in the presence of noise. Dietterich et al. [75] argue that the inadequacy of trees in certain domains may stem from trees being unable to take into account statistical information that is available to other methods such as neural networks. They show that decision trees perform significantly better on the text-to-speech conversion problem when this extra statistical knowledge is provided.
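The tree-versus-logistic-regression comparison in Long et al. can be sketched as follows; scikit-learn, a pruned `DecisionTreeClassifier` standing in for C4, and a bundled medical dataset are all assumptions here, not the original study's tools or its ischemia data.

```python
# Sketch of a tree vs. logistic regression comparison on a medical
# classification task. The pruned tree is a stand-in for C4, and the
# breast-cancer dataset is a stand-in for the ischemia data (assumptions).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

logreg = LogisticRegression(max_iter=5000)
# min_samples_leaf acts as a crude pruning control on the tree.
tree = DecisionTreeClassifier(min_samples_leaf=5, random_state=0)

logreg_acc = cross_val_score(logreg, X, y, cv=5).mean()
tree_acc = cross_val_score(tree, X, y, cv=5).mean()
print(f"logistic regression: {logreg_acc:.3f}")
print(f"decision tree:       {tree_acc:.3f}")
```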






Sreerama Murthy
Thu Oct 19 17:40:24 EDT 1995