StringTree

StringTree()

A class for binary classification of strings with regular expressions.

Each node is an instance of the PatternNode class. It contains a regular expression and metadata.

Attributes:

Initialize a StringTree object.

build

build(strings, labels, min_precision=0.5, min_token_length=1, max_patterns=None, min_matches_leaf=1, min_strings_leaf=1, verbose=False)

Build a StringTree.

Create the nodes and their corresponding patterns for this StringTree object from the provided strings and labels.

Parameters:

  • strings (list[str]) –

    List of strings.

  • labels (list[int]) –

    List of labels (0 or 1).

  • min_precision (float, default: 0.5 ) –

    The minimum precision required of a pattern in the tree.

  • min_token_length (int, default: 1 ) –

    The initial length of the pattern.

  • max_patterns (int, default: None ) –

    The maximum number of patterns. Once the method finds more, it stops.

  • min_matches_leaf (int, default: 1 ) –

    The minimum number of matches in one node.

  • min_strings_leaf (int, default: 1 ) –

    The minimum number of strings in one node.

  • verbose (bool, default: False ) –

    Whether to print additional text output.

filter

filter(strings, return_nodes=False)

Return the strings that match the tree and, optionally, the corresponding nodes.

A string matches a tree if it matches at least one node.

Parameters:

  • strings (list[str]) –

    List of strings.

  • return_nodes (bool, default: False ) –

    Whether to return the nodes corresponding to the matched strings. If False, only the matched strings are returned.

Returns:

  • matches ( list[str] ) –

    List containing matching strings.

  • matched_nodes ( list[PatternNode] ) –

    List consisting of PatternNodes of matching strings. Returned only if return_nodes is True.
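The "matches at least one node" rule can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the plain list of compiled regular expressions stands in for the tree's nodes, the helper name `filter_strings` is hypothetical, and the use of `re.search` (rather than anchored matching) is an assumption.

```python
import re

# Hypothetical stand-in for the tree's nodes: a plain list of compiled patterns.
patterns = [re.compile(r"\d{3}"), re.compile(r"^abc")]

def filter_strings(strings, patterns, return_nodes=False):
    """Keep the strings that match at least one pattern (illustrative only)."""
    matches, matched_nodes = [], []
    for s in strings:
        # First pattern that matches the string, or None if no pattern does.
        hit = next((p for p in patterns if p.search(s)), None)
        if hit is not None:
            matches.append(s)
            matched_nodes.append(hit)
    return (matches, matched_nodes) if return_nodes else matches

kept = filter_strings(["abcdef", "x123", "hello"], patterns)
print(kept)
```

In this sketch, "hello" is dropped because neither pattern matches it, while the other two strings are kept in their input order.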

get_leaves

get_leaves()

Get leaves attribute.

get_nodes_by_label

get_nodes_by_label(label)

Get nodes where the label is the most probable.

match

match(strings, return_nodes=False)

Return flags indicating whether each string matches the tree.

A string matches a tree if it matches at least one node.

Parameters:

  • strings (list[str]) –

    List of strings.

  • return_nodes (bool, default: False ) –

    Whether to return the nodes corresponding to the matched strings. If False, only the match flags are returned.

Returns:

  • matches ( list[int] ) –

    List containing 1 (match) and 0 (no match) for each string.

  • matched_nodes ( list[PatternNode] ) –

    List consisting of PatternNodes of matching strings. If no match is found for a string, None is returned in its place. Returned only if return_nodes is True.
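The shape of the return values described above (one flag per input string, with None standing in for a missing node) can be sketched as follows. This is an illustration under assumptions, not the library's code: compiled regular expressions stand in for PatternNode objects, and `match_flags` is a hypothetical helper name.

```python
import re

# Hypothetical stand-in for the tree's nodes: a plain list of compiled patterns.
patterns = [re.compile(r"\d{3}"), re.compile(r"^abc")]

def match_flags(strings, patterns, return_nodes=False):
    """Return 1/0 per string, plus the first matching pattern or None."""
    flags, nodes = [], []
    for s in strings:
        # First pattern that matches the string, or None if no pattern does.
        hit = next((p for p in patterns if p.search(s)), None)
        flags.append(1 if hit is not None else 0)
        nodes.append(hit)  # None keeps the list aligned with the input
    return (flags, nodes) if return_nodes else flags

flags, nodes = match_flags(["abcdef", "x123", "hello"], patterns, return_nodes=True)
print(flags)
```

Unlike a filtering call, both lists here have the same length as the input, so the i-th flag and the i-th node always refer to the i-th string.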

precision_score

precision_score(strings, labels)

Calculate a precision score for given strings and labels.

predict_label

predict_label(strings, return_nodes=False)

Predict labels for given strings.

recall_score

recall_score(strings, labels)

Calculate a recall score for given strings and labels.
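For reference, the standard definitions of precision and recall over binary match flags can be sketched as below. This assumes label 1 is the positive class and that an empty denominator yields 0.0; how the library itself handles these cases is not stated here, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(predicted, labels):
    """Standard precision and recall for binary flags (1 = positive)."""
    pairs = list(zip(predicted, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)  # true positives
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)  # false positives
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)  # false negatives
    # Assumption: return 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

predicted = [1, 1, 0, 0, 0]  # e.g. match flags for five strings
labels = [1, 0, 1, 1, 0]     # true labels for the same strings
precision, recall = precision_recall(predicted, labels)
print(precision, recall)
```

Here one of the two predicted matches is correct (precision 1/2), and one of the three positive strings is found (recall 1/3).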

set_leaves

set_leaves(leaves)

Set leaves attribute.