Error Based Pruner

Module containing ErrorBasedPruner class.

class pruneabletree.pruner_ebp.ErrorBasedPruner(tree, ebp_confidence)[source]

Pruner for decision trees that uses the Error Based Pruning (EBP) technique [1].

Note that the given tree is modified in place. To keep a copy of the original, clone it first.

Parameters:
tree : Tree object

The underlying tree object of a DecisionTreeClassifier (e.g. clf.tree_).

ebp_confidence : float

The confidence value that determines the upper bound on the training error. It must be in the (0, 0.5] interval.

References

[1](1, 2) J Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.

Methods

is_leaf(node_id) Returns True if the given node is a leaf.
leaf_prediction(node_id) Returns the class index of that the node with the given ID would predict.
num_actual_nodes(tree) Returns the actual number of nodes in the given tree after pruning.
num_instances(node_id[, y_idx]) Returns the number of instances in a given node.
num_leaves(tree) Returns the number of leaves.
prune() Prunes the given tree.
to_leaf(node_id, depths) Convert the node with the given ID to a leaf, pruning away its children.
prune()[source]

Prunes the given tree.

pruneabletree.pruner_ebp.add_errors(num_instances, observed_error, confidence)[source]

Computes estimated extra error for given total number of instances and error using normal approximation to binomial distribution (and continuity correction).

pruneabletree.pruner_ebp.normal_inverse(y0)[source]

Returns the value, <tt>x</tt>, for which the area under the Normal (Gaussian) probability density function (integrated from minus infinity to <tt>x</tt>) is equal to the argument <tt>y</tt> (assumes mean is zero, variance is one). <p> For small arguments <tt>0 < y < exp(-2)</tt>, the program computes <tt>z = sqrt( -2.0 * log(y) )</tt>; then the approximation is <tt>x = z - log(z)/z - (1/z) P(1/z) / Q(1/z)</tt>. There are two rational functions P/Q, one for <tt>0 < y < exp(-32)</tt> and the other for <tt>y</tt> up to <tt>exp(-2)</tt>. For larger arguments, <tt>w = y - 0.5</tt>, and <tt>x/sqrt(2pi) = w + w**3 R(w**2)/S(w**2))</tt>.

@param y0 the area under the normal pdf @return the z-value

pruneabletree.pruner_ebp.p1evl(x, coef, N)[source]

Evaluates the given polynomial of degree <tt>N</tt> at <tt>x</tt>. Evaluates polynomial when coefficient of N is 1.0. Otherwise same as <tt>polevl()</tt>.

Coefficients are stored in reverse order.

The function <tt>p1evl()</tt> assumes that <tt>coef[N] = 1.0</tt> and is omitted from the array. Its calling arguments are otherwise the same as <tt>polevl()</tt>. <p>

pruneabletree.pruner_ebp.polevl(x, coef, N)[source]

Evaluates the given polynomial of degree <tt>N</tt> at <tt>x</tt>. Coefficients are stored in reverse order. In the interest of speed, there are no checks for out of bounds arithmetic.