StringTree
StringTree()
A class for binary classification of strings with regular expressions.
Each node is an instance of the PatternNode class. It contains a regular expression and metadata.
Attributes:
-
root(PatternNode) –The root PatternNode.
-
leaves(list[PatternNode]) –List of all nodes.
Initialize a StringTree object.
build
build(strings, labels, min_precision=0.5, min_token_length=1, max_patterns=None, min_matches_leaf=1, min_strings_leaf=1, verbose=False)
Build a StringTree.
Create the nodes and their corresponding patterns for this StringTree object from the provided strings and labels.
Parameters:
-
strings(list[str]) –List of strings.
-
labels(list[int]) –List of labels (0 or 1).
-
min_precision(float, default:0.5) –The minimum precision of a pattern in the tree.
-
min_token_length(int, default:1) –The initial length of the pattern.
-
max_patterns(int, default:None) –The maximum number of patterns. Once the method finds more than this, it stops.
-
min_matches_leaf(int, default:1) –The minimum number of matches in one node.
-
min_strings_leaf(int, default:1) –The minimum number of strings in one node.
-
verbose(bool, default:False) –Whether to print additional text output.
filter
filter(strings, return_nodes=False)
Return the strings that match the tree and, optionally, the corresponding nodes.
A string matches a tree if it matches at least one node.
Parameters:
-
strings(list[str]) –List of strings.
-
return_nodes(bool, default:False) –Whether to also return the nodes corresponding to the matched strings. If False, only the matched strings are returned.
Returns:
-
matches(list[str]) –List containing the matching strings.
-
matched_nodes(list[PatternNode]) –List of the PatternNodes matched by the returned strings. Returned only if return_nodes is True.
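The rule "a string matches a tree if it matches at least one node" can be illustrated without the library itself. The following is a minimal sketch, not the StringTree implementation: `node_patterns` and `filter_sketch` are hypothetical names, plain compiled regular expressions stand in for PatternNode objects, and the use of `re.search` (rather than a full match) is an assumption.

```python
import re

# Hypothetical stand-ins for the regular expressions stored in the tree's nodes.
node_patterns = [re.compile(r"error \d+"), re.compile(r"^warn")]

def filter_sketch(strings, patterns):
    """Keep strings that match at least one pattern, plus the first pattern hit."""
    matches, matched_patterns = [], []
    for s in strings:
        # Assumption: a "match" means the pattern is found anywhere in the string.
        hit = next((p for p in patterns if p.search(s)), None)
        if hit is not None:
            matches.append(s)
            matched_patterns.append(hit)
    return matches, matched_patterns

kept, kept_patterns = filter_sketch(
    ["error 42", "all good", "warning: disk"], node_patterns
)
```

Here `kept` holds only the strings that matched, in their original order, and `kept_patterns` holds one entry per kept string, mirroring the matches / matched_nodes pair described above.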
get_leaves
get_leaves()
Get leaves attribute.
get_nodes_by_label
get_nodes_by_label(label)
Get the nodes for which the given label is the most probable.
match
match(strings, return_nodes=False)
Return flags indicating whether the strings match the tree.
A string matches a tree if it matches at least one node.
Parameters:
-
strings(list[str]) –List of strings.
-
return_nodes(bool, default:False) –Whether to also return the nodes corresponding to the matched strings. If False, only the match flags are returned.
Returns:
-
matches(list[int]) –List containing 1 (match) and 0 (no match) for each string.
-
matched_nodes(list[PatternNode]) –List of the PatternNodes matched by each string. If no match is found for a string, None is returned in its place. Returned only if return_nodes is True.
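Unlike filter, match returns one entry per input string. The sketch below shows that output shape under stated assumptions: `tree_patterns` and `match_sketch` are hypothetical names, compiled regular expressions stand in for PatternNode objects, and it is not the library's own code.

```python
import re

# Hypothetical stand-ins for the regular expressions stored in the tree's nodes.
tree_patterns = [re.compile(r"\d{4}-\d{2}-\d{2}"), re.compile(r"^id:")]

def match_sketch(strings, patterns):
    """Return a 1/0 flag and the first matching pattern (or None) per string."""
    flags, nodes = [], []
    for s in strings:
        hit = next((p for p in patterns if p.search(s)), None)
        flags.append(1 if hit is not None else 0)
        nodes.append(hit)  # None marks a string that no pattern matched
    return flags, nodes

flags, nodes = match_sketch(["id: 7", "hello", "due 2024-01-31"], tree_patterns)
```

Both lists have the same length as the input, so `flags[i]` and `nodes[i]` always refer to the i-th input string.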
precision_score
precision_score(strings, labels)
Calculate a precision score for given strings and labels.
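For reference, precision in binary classification is conventionally TP / (TP + FP). The sketch below computes it from 0/1 predictions and labels. `precision_sketch` is a hypothetical helper, and it is an assumption that the method compares the tree's match flags with the labels in this way. Returning 0.0 when nothing is predicted positive is also a choice made here, not documented behavior.

```python
def precision_sketch(predicted, labels):
    """Fraction of positive predictions that are truly positive."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    # Assumption: define precision as 0.0 when there are no positive predictions.
    return tp / (tp + fp) if (tp + fp) else 0.0

predicted = [1, 1, 0, 1]   # e.g. match flags for four strings
labels = [1, 0, 0, 1]      # true labels for the same strings
precision = precision_sketch(predicted, labels)  # 2 true positives, 1 false positive
```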
predict_label
predict_label(strings, return_nodes=False)
Predict labels for given strings.
recall_score
recall_score(strings, labels)
Calculate a recall score for given strings and labels.
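Recall is conventionally TP / (TP + FN), the share of truly positive strings that were predicted positive. The sketch below uses that standard definition. `recall_sketch` is a hypothetical helper, and whether the method derives its predictions from the tree's match flags is an assumption.

```python
def recall_sketch(predicted, labels):
    """Fraction of truly positive items that were predicted positive."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    # Assumption: define recall as 0.0 when there are no positive labels.
    return tp / (tp + fn) if (tp + fn) else 0.0

predicted = [1, 0, 0, 1, 0]   # e.g. match flags for five strings
labels = [1, 1, 0, 1, 1]      # true labels for the same strings
recall = recall_sketch(predicted, labels)  # 2 true positives, 2 false negatives
```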
set_leaves
set_leaves(leaves)
Set leaves attribute.