13th ACIS International Conference on Software Engineering,
Artificial Intelligence, Networking and Parallel/Distributed
Computing (SNPD 2012), August 8-10, 2012, Kyoto, Japan
Studies in Computational Intelligence, Volume 443,
pp. 13-25, DOI: 10.1007/978-3-642-32172-6_2, © 2013 Springer



Bump hunting, proposed by Friedman and Fisher, has become important
in many fields. Suppose that we are interested in finding regions
where target points are denser than in other regions. Such dense regions
of target points are called bumps, and finding them is called
bump hunting. For a pre-specified pureness rate, a maximum
capture rate can be obtained, so a tradeoff curve between
the two can be constructed. Thus, finding the bump regions is equivalent
to constructing the tradeoff curve. When we adopt simpler boundary
shapes for the bumps, such as a union of boxes aligned with
some of the explanatory variable axes, it is convenient to adopt
a binary decision tree. Since a conventional binary decision
tree, e.g., CART (Classification and Regression Trees), does not
provide the maximum capture rates, we use a genetic algorithm
(GA) specialized to the tree structure, the treeGA. So far, we have
assessed the accuracy of the tradeoff curve in typical fundamental
cases that may be observed in real customer data, and found
that the proposed treeGA can construct an effective tradeoff
curve that is close to the optimal one. In this paper, we further
investigate the prediction accuracy of the treeGA by comparing
the tradeoff curve obtained with the treeGA against that obtained
with PRIM (Patient Rule Induction Method), proposed by Friedman
and Fisher. We have found that the treeGA outperforms
PRIM in some cases.
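
To make the two rates concrete, the following is a minimal sketch and not the treeGA or PRIM itself. It assumes the usual definitions, which the abstract does not spell out: the pureness rate of a region is the fraction of points inside it that are target points, and the capture rate is the fraction of all target points that fall inside it. The function names `box_rates` and `tradeoff_curve`, and the idea of evaluating a fixed list of candidate boxes, are illustrative assumptions.

```python
import numpy as np

def box_rates(X, y, lower, upper):
    """Return (pureness, capture) for one axis-parallel box.

    Assumed definitions (not stated in the abstract):
      pureness = target points inside the box / all points inside the box
      capture  = target points inside the box / all target points
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=bool)
    # A point is inside the box if every coordinate lies within its bounds.
    inside = np.all((X >= lower) & (X <= upper), axis=1)
    n_inside = inside.sum()
    n_target = y.sum()
    hits = (inside & y).sum()
    pureness = hits / n_inside if n_inside else 0.0
    capture = hits / n_target if n_target else 0.0
    return float(pureness), float(capture)

def tradeoff_curve(rates, thresholds):
    """For each pre-specified pureness threshold, return the largest
    capture rate among the candidate regions that reach that pureness."""
    return [
        max((c for p, c in rates if p >= t), default=0.0)
        for t in thresholds
    ]
```

Given a list of candidate boxes, calling `box_rates` on each and passing the resulting pairs to `tradeoff_curve` yields one point of the curve per pureness threshold. A search method such as the treeGA or PRIM would supply the candidate regions; here they are simply taken as given.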




Keywords: data mining, bump hunting, tradeoff curve, genetic algorithm, CART, PRIM


