The aim of bump hunting is to find the regions where the points of interest
are located more densely than elsewhere and are hardly separable from other
points. By specifying a pureness rate p for the points, a maximum capture rate
c of the points can be obtained, and a tradeoff curve between p and c can then
be constructed. Thus, finding the bump regions is equivalent to constructing
the tradeoff curve.
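The relationship between p, c and the tradeoff curve can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that p is the fraction of points inside a box that are of interest (label 1) and that c is the fraction of all points of interest captured by the box. The candidate boxes are supplied by hand here, standing in for whatever search (for example, the tree-GA) would generate them. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one (lower, upper) bound per variable axis,
# i.e. a box-shaped region with edges parallel to the axes.
Box = List[Tuple[float, float]]


def inside(x: Sequence[float], box: Box) -> bool:
    """True if point x lies in the axis-parallel box (bounds inclusive)."""
    return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, box))


def pureness_and_capture(X, y, box: Box) -> Tuple[float, float]:
    """Return (p, c) for one box, with labels y in {0, 1}.

    p: fraction of the points inside the box that are of interest (y == 1).
    c: fraction of all points of interest that fall inside the box.
    """
    labels_in_box = [yi for xi, yi in zip(X, y) if inside(xi, box)]
    n_target = sum(y)
    hits = sum(labels_in_box)
    p = hits / len(labels_in_box) if labels_in_box else 0.0
    c = hits / n_target if n_target else 0.0
    return p, c


def tradeoff_curve(X, y, boxes: List[Box], p_levels: Sequence[float]) -> List[float]:
    """For each pureness level, the largest capture rate among the candidate
    boxes whose pureness is at least that level (0.0 if none qualifies)."""
    pc = [pureness_and_capture(X, y, b) for b in boxes]
    return [max((c for p, c in pc if p >= level), default=0.0) for level in p_levels]
```

With a small, tight box and the whole unit square as candidates, the tight box wins at high pureness levels and the large box wins at low ones. This is the p-versus-c tradeoff that the curve records. The quality of the curve depends entirely on how good the candidate boxes are, which is why the search procedure matters.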
For convenience, we adopt simple boundary shapes for the bumps, namely
box-shaped regions whose edges are parallel to the variable axes. To obtain the
maximum capture rates, we use a genetic algorithm specialized to the tree
structure, called the tree-GA, because the conventional binary decision tree
does not provide the maximum capture rates. Exploiting the tendency of the
tree-GA to produce many local maxima of the capture rate, we can estimate the
return period for the tradeoff curve by using extreme-value statistics.
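The following sketch shows how many local maxima could be turned into a return period. It is hedged: the abstract does not say which extreme-value model is used. Here a Gumbel (type I) distribution is fitted by the method of moments to the maximum capture rates from repeated runs. The return period of a level x is then T = 1 / (1 - F(x)), with F(x) = exp(-exp(-(x - mu) / beta)). Because capture rates are bounded above by 1, a bounded (Weibull-type) extreme-value fit may be more appropriate in practice. The Gumbel choice here is made only for simplicity, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments fit of a Gumbel distribution to per-run maxima.

    Assumption: each value is the best capture rate found by one run of the
    search at a fixed pureness level. Needs at least two values.
    Returns (mu, beta): location and scale.
    """
    mean = statistics.fmean(maxima)
    sd = statistics.stdev(maxima)  # sample standard deviation
    beta = sd * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def return_period(level: float, mu: float, beta: float) -> float:
    """Expected number of runs until one reaches at least `level`:
    T = 1 / (1 - F(level)) with the fitted Gumbel CDF F."""
    F = math.exp(-math.exp(-(level - mu) / beta))
    return math.inf if F >= 1.0 else 1.0 / (1.0 - F)


def return_level(T: float, mu: float, beta: float) -> float:
    """Capture rate expected to be reached about once every T runs (T > 1);
    the inverse of return_period."""
    return mu - beta * math.log(-math.log(1.0 - 1.0 / T))
```

For example, `return_level(100, mu, beta)` estimates the capture rate that would be reached roughly once in 100 runs. Comparing it with the best value actually observed indicates how close the search is likely to be to the maximum at that pureness level.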
We have assessed the accuracy of the tradeoff curve in typical fundamental
cases that may be observed in real customer data, and found that the proposed
tree-GA can construct an effective tradeoff curve that is close to the optimal
one.





Keywords: data mining, decision tree, genetic algorithm,
bump hunting, extreme-value statistics, tradeoff curve,
accuracy, return period, evaluation.


