Bump Hunting using the Tree-GA

H. Hirose

Information, Vol.14, No.10, pp.3409-3424 (2011.10)

The bump hunting is to find the regions where points we are interested in are located more densely than elsewhere and are hardly separable from other points.
By specifying a pureness rate p for the points, a maximum capture rate c of the points could be obtained. Then, a trade-off curve between p and c can be constructed. Thus, to find the bump regions is equivalent to construct the trade-off curve.
We adopt simpler boundary shapes for the bumps such as the box-shaped regions located parallel to variable axes for convenience. We use the genetic algorithm, specified to the tree structure, called the tree-GA, to obtain the maximum capture rates, because the conventional binary decision tree will not provide the maximum capture rates. Using the tree-GA tendency providing many local maxima for the capture rates, we can estimate the return period for the trade-off curve by using the extreme-value statistics.
We have assessed the accuracy for the trade-off curve in typical fundamental cases that may be observed in real customer data cases, and found that the proposed tree-GA can construct the effective trade-off curve which is close to the optimal one.

Key Words
data mining, decision tree, genetic algorithm, bump hunting, extreme-value statistics, trade-off curve, accuracy, return period, evaluation.



Times Cited in Web of Science:

Cited in Books:

Mathematical Review: