There exist several alternatives to decision trees for data exploration, such as neural networks, nearest neighbor methods and regression analysis. Several researchers have compared trees to these other methods on specific problems.
An early study comparing machine learning methods for learning from examples can be found in . Comparisons of symbolic and connectionist methods can be found in [373,327]. Quinlan empirically compared decision trees to genetic classifiers  and to neural networks . Thrun et al.  compared several learning algorithms on the simulated Monk's problems. Palvia and Gordon  compared decision tables, decision trees and decision rules to determine which formalism is best suited for decision analysis.
Multilayer perceptrons and CART (with and without linear combinations)  are compared in , where it is found that the two differ little in accuracy. Similar conclusions were reached in  when ID3  and backpropagation were compared. Talmon et al.  compared classification trees and neural networks for analyzing electrocardiograms (ECGs) and concluded that neither technique is superior to the other. In contrast, ID3 is judged to be slightly better than connectionist and Bayesian methods in . Brown et al.  compared backpropagation neural networks with decision trees on three problems known to be multimodal. Their analysis indicated that there was little difference between the two methods, and that neither performed very well in its ``vanilla'' state. The performance of decision trees improved in  when multivariate splits were used, and backpropagation networks did better with feature selection.
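The benefit of multivariate splits can be seen on a target concept defined by a linear combination of attributes. A standard tree is restricted to axis-parallel splits and must approximate the oblique boundary with a staircase, whereas a single linear-combination test captures it exactly. A minimal sketch, using scikit-learn (the data set is synthetic, and logistic regression stands in for a single linear-combination split, since standard library trees are axis-parallel):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic concept: class 1 iff x1 + x2 > 0 (an oblique boundary).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(4000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]

# A depth-1 axis-parallel split can only be right about 75% of the time
# on this concept; a linear-combination test recovers it almost exactly.
stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
linear = LogisticRegression().fit(X_train, y_train)

print("axis-parallel stump:", stump.score(X_test, y_test))
print("linear combination:", linear.score(X_test, y_test))
```

Deeper axis-parallel trees can shrink the gap, but only by adding many splits; the oblique test expresses the concept in one node.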
Gilpin et al.  compared stepwise linear discriminant analysis, stepwise logistic regression and CART  with three senior cardiologists on the problem of predicting whether a patient would die within a year of being discharged after an acute myocardial infarction. Their results showed no difference in prediction accuracy between the physicians and the computer methods. Kors and Van Bemmel  compared statistical multivariate methods with heuristic decision tree methods in the domain of electrocardiogram (ECG) analysis. Their comparisons show that decision tree classifiers are more comprehensible and more flexible when categories must be added or changed. Pizzi and Jackson  compared an expert system developed using traditional knowledge engineering methods to Quinlan's ID3  in the domain of tonsillectomy. Comparisons of CART to multiple linear regression and discriminant analysis can be found in , where it is argued that CART is better suited than the other methods to very noisy domains with many missing values.
Comparisons between decision trees and statistical methods like linear discriminant function analysis and automatic interaction detection (AID) are given in , where it is argued that machine learning methods sometimes outperform the statistical methods and so should not be ignored. Feng et al.  present a comparison of several machine learning methods (including decision trees, neural networks and statistical classifiers) as a part of the European Statlog project. Their main conclusions are that (1) no method seems uniformly superior to others, (2) machine learning methods seem to be superior for multimodal distributions, and (3) statistical methods are computationally the most efficient.
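Comparisons of this kind typically cross-validate each method on the same data and compare mean accuracies. A minimal sketch of such a protocol, using scikit-learn; the data set and classifier settings here are illustrative assumptions, not those used in the Statlog project:

```python
# Cross-validate a decision tree, a neural network and a statistical
# classifier on one data set and report mean accuracy for each.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_breast_cancer(return_X_y=True)

classifiers = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    # Standardizing inputs before backpropagation avoids scale effects.
    "neural network": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
    "linear discriminant": LinearDiscriminantAnalysis(),
}

scores = {
    name: cross_val_score(clf, X, y, cv=5).mean()
    for name, clf in classifiers.items()
}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

On any single data set the ranking of methods can easily reverse, which is consistent with the Statlog conclusion that no method is uniformly superior.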
Long et al.  compared Quinlan's C4  to logistic regression on the problem of diagnosing acute cardiac ischemia, and concluded that both methods came fairly close to the expertise of the physicians; in their experiments, logistic regression outperformed C4. Curram and Mingers  compared decision trees, neural networks and discriminant analysis on several real-world data sets. Their comparisons reveal that linear discriminant analysis is the fastest of the methods when its underlying assumptions are met, and that decision tree methods overfit in the presence of noise. Dietterich et al.  argue that the inadequacy of trees in certain domains may stem from trees being unable to exploit statistical information that is available to other methods such as neural networks. They show that decision trees perform significantly better on the text-to-speech conversion problem when this extra statistical knowledge is provided.