jpt.trees
© Copyright 2021-23, Mareike Picklum, Daniel Nyga.
Classes
- Node – Wrapper for the nodes of the jpt.learning.trees.Tree.
- DecisionNode – Represents an inner (decision) node of the jpt.learning.trees.Tree.
- Leaf – Represents a leaf node of the jpt.trees.Tree.
- JPT – Implementation of Joint Probability Trees (JPTs).
Module Contents
- class jpt.trees.Node(idx: int, parent: DecisionNode | None = None)
Wrapper for the nodes of the jpt.learning.trees.Tree.
Create a Node.
- Parameters:
idx – the identifier of a node
parent – the parent of this node
- idx
- parent: DecisionNode = None
- samples = 0
- _path = []
- property path: jpt.variables.VariableMap
- Returns:
the path of this Node as VariableMap
- consistent_with(evidence: jpt.variables.VariableMap) bool
Check if the node is consistent with the variable assignments in evidence.
- Parameters:
evidence – A VariableMap that maps to singular values (numeric or symbolic) or ranges (continuous set, set)
- Returns:
bool
- format_path(fmt: str = None, precision: int = None) str
- abstract number_of_parameters() int
- __str__() str
- __repr__() str
- depth() int
- Returns:
the depth of this node
- contains(samples: numpy.ndarray, variable_index_map: jpt.variables.VariableMap) numpy.array
Check if this node contains the given samples in parallel.
- Parameters:
samples – The samples to check
variable_index_map – A VariableMap mapping to the indices in ‘samples’
- Returns:
numpy array with 0s and 1s
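The following is a minimal plain-Python sketch of what contains() computes. It assumes, hypothetically, that a node's path is a dict from variable names to half-open intervals [low, high), and that variable_index_map maps those names to column indices. The real method works on numpy arrays and jpt variable types, so this is not its implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical path representation: variable name -> (low, high), read as [low, high).
Path = Dict[str, Tuple[float, float]]


def contains(samples: Sequence[Sequence[float]],
             path: Path,
             variable_index_map: Dict[str, int]) -> List[int]:
    """Return 1 for every sample (row) that satisfies all path constraints, else 0."""
    result = []
    for row in samples:
        inside = all(
            low <= row[variable_index_map[name]] < high
            for name, (low, high) in path.items()
        )
        result.append(1 if inside else 0)
    return result


samples = [[0.5, 3.0], [1.5, 3.0], [0.2, 9.0]]
path = {"x": (0.0, 1.0), "y": (2.0, 5.0)}
print(contains(samples, path, {"x": 0, "y": 1}))  # [1, 0, 0]
```

In numpy the same test becomes a conjunction of boolean column masks. That is what lets the check cover all samples at once.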
- class jpt.trees.DecisionNode(idx: int | None, variable: jpt.variables.Variable, parent: 'DecisionNode' or None = None)
Bases: Node
Represents an inner (decision) node of the jpt.learning.trees.Tree.
Create a DecisionNode.
- Parameters:
idx – The identifier of a node
variable – The split variable
parent – The parent of this node
- _splits = None
- variable
- children: None or List[Node] = None
- __hash__()
- __eq__(o) bool
- to_json() Dict[str, Any]
- Returns:
The DecisionNode as a json serializable dict.
- static from_json(tree: JPT, data: Dict[str, Any]) DecisionNode
Construct a DecisionNode from a JSON dict.
- Parameters:
tree – The tree to mount the node in
data – The data describing the members of the node
- Returns:
the constructed and mounted DecisionNode
- property splits: List
- set_child(idx: int, node: Node) None
Set the child at index idx of this Node. Also extend the path of the child node with this node’s path.
- Parameters:
idx – the index of the child (0 for left, 1 for right)
node – The child
- str_edge(idx_split: int) str
Convert the edge to the child at index idx_split to a string.
- Parameters:
idx_split – The index of the child
- Returns:
str
- property str_node: str
- recursive_children()
- Returns:
All children of this node
- __str__() str
- __repr__() str
- number_of_parameters() int
- Returns:
The number of relevant parameters in this decision node. Two parameters are necessary, since the split variable and its splitting value suffice to describe this computation unit.
- class jpt.trees.Leaf(idx: int, parent: DecisionNode or None = None, prior: float or None = None)
Bases: Node
Represents a leaf node of the jpt.trees.Tree.
Construct a Leaf.
- Parameters:
idx – the index of this leaf
parent – the parent of this leaf
prior – the prior of this leaf (relative number of samples in this leaf)
- distributions
- prior = None
- s_indices = []
- property str_node: str
- applies(query: jpt.variables.VariableAssignment) bool
Check whether this leaf is consistent with the given query.
- Parameters:
query – the query to check
- Returns:
bool
- property value
- recursive_children()
- Returns:
All children of this node
- __str__() str
- __repr__() str
- __hash__()
- to_json() Dict[str, Any]
- Returns:
The Leaf as a JSON-serializable dict.
- static from_json(tree: JPT, data: Dict[str, Any]) Leaf
Construct a Leaf from a JSON dict.
- Parameters:
tree – The tree to mount the leaf in
data – The data describing the members of the leaf
- Returns:
the constructed and mounted Leaf
- __eq__(o) bool
- consistent_with(evidence: jpt.variables.VariableMap) bool
Check if the node is consistent with the variable assignments in evidence.
- Parameters:
evidence – A preprocessed VariableMap that maps to singular values (numeric or symbolic) or ranges (continuous set, set)
- path_consistent_with(evidence: jpt.variables.VariableMap) bool
Check if the path of this node is consistent with the variable assignments in evidence.
- Parameters:
evidence – A preprocessed VariableMap that maps to singular values (numeric or symbolic) or ranges (continuous set, set)
- probability(query: jpt.variables.VariableAssignment, dirac_scaling: float = 2.0, min_distances: jpt.variables.VariableMap = None) float
Calculate the probability of a (partial) query. Exploits the independence assumption.
- Parameters:
query (VariableMap) – A preprocessed VariableMap that maps to singular values (numeric or symbolic) or ranges (continuous set, set)
dirac_scaling (float) – the minimal distance between the samples within a dimension is multiplied by this factor if a Dirac impulse is used to model the variable.
min_distances (A VariableMap from numeric variables to floats or None) – A dict mapping the variables to the minimal distances between the observations. This is useful for applying the same likelihood parameters to different test sets, for example in cross-validation.
- _numeric_probability(variable: jpt.variables.NumericVariable, value, dirac_scaling: float = 2.0, min_distances: jpt.variables.VariableMap = None)
Calculate the probability of an arbitrary value for a numeric variable.
- Parameters:
variable – A numeric variable
dirac_scaling – the minimal distance between the samples within a dimension is multiplied by this factor if a Dirac impulse is used to model the variable.
min_distances – A dict mapping the variables to the minimal distances between the observations. This is useful for applying the same likelihood parameters to different test sets, for example in cross-validation.
- likelihood(queries: pandas.DataFrame, dirac_scaling: float = 2.0, min_distances: jpt.variables.VariableMap = None, single_likelihoods: bool = False, variables: Iterable[jpt.variables.Variable | str] = None) numpy.ndarray
Calculate the probability of a (partial) query. Exploits the independence assumption.
- Parameters:
queries – An array-like object that represents variable assignments in value space.
dirac_scaling (float) – the minimal distance between the samples within a dimension is multiplied by this factor if a Dirac impulse is used to model the variable.
min_distances (A VariableMap from numeric variables to floats or None) – A dict mapping the variables to the minimal distances between the observations. This is useful for applying the same likelihood parameters to different test sets, for example in cross-validation.
single_likelihoods – whether the likelihood of each variable shall be reported as well
variables – the variables (or their names) to consider in the likelihood calculation
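Under the independence assumption, a leaf's likelihood for a fully assigned row is the product of one factor per variable. The sketch below shows this, together with the single_likelihoods and variables options. The Factors type, the dict-based rows and the example distributions are hypothetical stand-ins for the jpt distribution objects and DataFrame rows.

```python
import math
from typing import Callable, Dict, Iterable, List, Optional, Sequence

# Hypothetical: each variable of a leaf is modeled by a callable returning the
# likelihood of a single value (a probability or a density).
Factors = Dict[str, Callable[[object], float]]


def leaf_likelihood(rows: Sequence[Dict[str, object]],
                    factors: Factors,
                    single_likelihoods: bool = False,
                    variables: Optional[Iterable[str]] = None):
    """Likelihood of each fully assigned row as the product of per-variable factors."""
    names = list(factors) if variables is None else list(variables)
    totals: List[float] = []
    singles: List[Dict[str, float]] = []
    for row in rows:
        per_var = {name: factors[name](row[name]) for name in names}
        # Independence assumption: the joint likelihood factorizes over variables.
        totals.append(math.prod(per_var.values()))
        singles.append(per_var)
    return (totals, singles) if single_likelihoods else totals


factors = {
    "color": lambda c: {"red": 0.75, "blue": 0.25}[c],
    "size": lambda s: 0.5 if 0.0 <= s < 2.0 else 0.0,  # uniform density on [0, 2)
}
rows = [{"color": "red", "size": 1.0}, {"color": "blue", "size": 3.0}]
print(leaf_likelihood(rows, factors))  # [0.375, 0.0]
```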
- copy() Leaf
Create a copy of this leaf. The copy is unaware of the tree and vice versa. Hence, no path, parent, etc. is set. The copy only provides querying functionality.
- conditional_leaf(evidence: jpt.variables.VariableAssignment) Leaf
Create a leaf that is cropped to the values described in evidence.
- Parameters:
evidence – A VariableAssignment describing evidence.
- Returns:
The cropped leaf, which has no parent, path, etc. set.
- mpe(minimal_distances: jpt.variables.VariableMap) tuple[jpt.variables.VariableMap, float]
Calculate the most probable explanation of this leaf as a fully factorized distribution.
- Returns:
the configuration as a VariableMap and the likelihood of the maximum as a float
- k_mpe() Iterator[jpt.variables.LabelAssignment]
Compute the k most probable explanations of this leaf.
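A leaf is a fully factorized distribution. Its k most probable explanations can therefore be enumerated best-first: start from the per-variable argmax, then repeatedly demote one variable to its next most probable value. The sketch below shows this technique for symbolic variables given as plain dicts, which is a hypothetical representation. The text above does not say whether k_mpe() uses exactly this strategy.

```python
import heapq
import math
from itertools import islice
from typing import Dict, Iterator, Tuple


def k_mpe(dists: Dict[str, Dict[str, float]]) -> Iterator[Tuple[Dict[str, str], float]]:
    """Yield joint assignments of independent symbolic variables in decreasing probability."""
    names = list(dists)
    # Sort each variable's values by probability, most probable first.
    ranked = [sorted(dists[n].items(), key=lambda kv: kv[1], reverse=True) for n in names]

    def prob(idx: Tuple[int, ...]) -> float:
        return math.prod(ranked[i][j][1] for i, j in enumerate(idx))

    start = tuple(0 for _ in names)
    heap = [(-prob(start), start)]
    seen = {start}
    while heap:
        neg_p, idx = heapq.heappop(heap)
        yield {n: ranked[i][j][0] for i, (n, j) in enumerate(zip(names, idx))}, -neg_p
        # Successors: demote exactly one variable to its next most probable value.
        for i in range(len(idx)):
            if idx[i] + 1 < len(ranked[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-prob(nxt), nxt))


dists = {"a": {"x": 0.75, "y": 0.25}, "b": {"u": 0.625, "v": 0.375}}
for state, p in islice(k_mpe(dists), 2):
    print(state, p)
```

Demoting a value never increases the product. Every state is therefore popped only after all states that are more probable than it.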
- number_of_parameters() int
- Returns:
The number of relevant parameters in this leaf. Leaves require 1 + the sum of the parameters of all distributions. The one extra parameter represents the prior.
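The per-node counts stated for DecisionNode.number_of_parameters() (2 per node) and Leaf.number_of_parameters() (1 + the distribution parameters) add up over a tree as sketched below. The nested list/tuple tree encoding and the per-distribution counts are hypothetical. In jpt, the real counts come from the distribution objects.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a leaf is a list of per-distribution parameter counts;
# a decision node is a tuple of child subtrees.
Tree = Union[List[int], Tuple["Tree", ...]]


def number_of_parameters(tree: Tree) -> int:
    if isinstance(tree, list):
        # Leaf: 1 parameter for the prior plus the parameters of all distributions.
        return 1 + sum(tree)
    # Decision node: split variable and split value, plus all children.
    return 2 + sum(number_of_parameters(child) for child in tree)


tree = ([2, 3], ([1, 1], [4, 0]))
print(number_of_parameters(tree))  # 18
```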
- sample(amount) numpy.ndarray
Draw amount samples from the leaf.
- Returns:
A numpy array of shape (amount, #variables) containing the samples.
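The variables of a leaf are modeled independently. Drawing samples therefore means sampling each variable on its own and stacking the columns. Below is a minimal sketch for symbolic variables. It assumes each distribution is a hypothetical dict from values to probabilities, and it uses the standard library instead of numpy.

```python
import random
from typing import Dict, List


def sample_leaf(dists: Dict[str, Dict[str, float]], amount: int, seed: int = 0) -> List[List[str]]:
    """Return `amount` rows, one value per variable, drawn independently per variable."""
    rng = random.Random(seed)
    columns = [
        rng.choices(list(dist), weights=list(dist.values()), k=amount)
        for dist in dists.values()
    ]
    # Transpose the per-variable columns into rows of shape (amount, #variables).
    return [list(row) for row in zip(*columns)]


rows = sample_leaf({"a": {"x": 0.5, "y": 0.5}, "b": {"u": 1.0}}, amount=5)
print(rows)
```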
- class jpt.trees.JPT(variables: list[jpt.variables.Variable], targets: list[str | jpt.variables.Variable] = None, features: list[str | jpt.variables.Variable] = None, min_samples_leaf: float | int = 1, min_impurity_improvement: float | None = None, max_leaves: int | None = None, max_depth: int | None = None, dependencies=None, min_eval_samples: float | int = 0)
Implementation of Joint Probability Trees (JPTs).
Create a JPT.
- Parameters:
variables – The variables represented by this model.
targets – The variables where the information gain will be computed on.
features – The variables where splits are chosen from.
min_samples_leaf – If int, the minimum number of samples required to form a leaf. If float, the minimum fraction of samples.
min_eval_samples – Minimum number of EVALUATION samples required in each child partition when split validation is active in 'evaluation' mode. Only enforced when a split_validation_mask is passed to learn() and split_validation_mode='evaluation'. If int, the absolute minimum. If a float in (0, 1), the minimum fraction of the total training rows (same convention as min_samples_leaf). 0 disables the check (default).
min_impurity_improvement – The minimal information gain to justify a split.
max_leaves – The maximum number of leaves (deprecated).
max_depth – The maximum depth the tree may have.
dependencies –
Specifies which targets depend on which features. Accepts three forms:
None: every target depends on every feature (default, fully connected).
dict[Variable, list[Variable]]: explicit mapping from features to their dependent targets.
A DependencyDiscovery instance: a callable strategy that discovers dependencies from training data during learn(). The strategy is re-invoked on each call to learn(), and its configuration is preserved during serialization.
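The sketch below shows two things: the int/float convention shared by min_samples_leaf and min_eval_samples, and the meaning of the first two dependencies forms. Both helpers are hypothetical, and variables are represented by their names. Rounding fractional thresholds down and giving unmapped features no targets are assumptions. The DependencyDiscovery form is not covered.

```python
from typing import Dict, List, Optional, Sequence, Union


def resolve_min_samples(value: Union[int, float], n_rows: int) -> int:
    """Turn min_samples_leaf / min_eval_samples into an absolute count."""
    if isinstance(value, float) and 0.0 < value < 1.0:
        # Fraction of the training rows; rounding down is an assumption.
        return int(value * n_rows)
    # int: absolute minimum (0 disables the min_eval_samples check).
    return int(value)


def resolve_dependencies(dependencies: Optional[Dict[str, List[str]]],
                         features: Sequence[str],
                         targets: Sequence[str]) -> Dict[str, List[str]]:
    """Map every feature to the targets that depend on it."""
    if dependencies is None:
        # Default: fully connected, every target depends on every feature.
        return {f: list(targets) for f in features}
    # Explicit dict: features missing from the mapping get no dependent targets (assumption).
    return {f: list(dependencies.get(f, [])) for f in features}


print(resolve_min_samples(0.25, 40))                    # 10
print(resolve_dependencies(None, ["x", "y"], ["z"]))    # {'x': ['z'], 'y': ['z']}
```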
- logger
- _variables
- varnames: collections.OrderedDict[str, jpt.variables.Variable]
- _targets
- innernodes: dict[int, DecisionNode]
- priors: jpt.variables.VariableMap
- min_samples_leaf = 1
- min_eval_samples = 0
- _keep_samples = False
- min_impurity_improvement = 0
- minimal_distances: jpt.variables.VariableMap
- _numsamples = 0
- root = None
- max_leaves = None
- max_depth
- _reset() None
Delete all parameters of this model (not the hyperparameters)
- property variables: tuple[jpt.variables.Variable, Ellipsis]
- property targets: tuple[jpt.variables.Variable, Ellipsis]
- property features: tuple[jpt.variables.Variable, Ellipsis]
- property numeric_variables: tuple[jpt.variables.Variable, Ellipsis]
- property symbolic_variables: tuple[jpt.variables.Variable, Ellipsis]
- property integer_variables: tuple[jpt.variables.Variable, Ellipsis]
- property numeric_targets: tuple[jpt.variables.Variable, Ellipsis]
- property symbolic_targets: tuple[jpt.variables.Variable, Ellipsis]
- property integer_targets: tuple[jpt.variables.Variable, Ellipsis]
- property numeric_features: tuple[jpt.variables.Variable, Ellipsis]
- property symbolic_features: tuple[jpt.variables.Variable, Ellipsis]
- property integer_features: tuple[jpt.variables.Variable, Ellipsis]
- to_json() dict[str, Any]
Convert the tree to a JSON-serializable dictionary.
- static from_json(data: dict[str, Any], variables: Iterable[jpt.variables.Variable] | None = None) JPT
Construct a tree from a json dict.
- Parameters:
data – The JSON dictionary holding the serialized JPT data.
variables – (optional) An iterable holding the already de-serialized variables the JPT shall be constructed with.
- __getstate__()
- __setstate__(state)
- __eq__(o) bool
- encode(samples: numpy.ndarray) numpy.ndarray
Get the leaf index that describes the partition of each sample. Only works for fully initialized samples, i.e. a matrix with arbitrarily many rows but #variables many columns.
- Parameters:
samples – the samples to evaluate
- Returns:
A 1D numpy array of integers containing the leaf index of every sample.
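encode() maps each fully assigned sample to the leaf whose partition contains it. This amounts to walking from the root to a leaf for each row. The minimal sketch below assumes a hypothetical binary tree of numeric splits, in which values below the split value go to the left child. The actual split semantics of jpt are not specified here.

```python
from typing import List, Sequence, Union

# Hypothetical encoding: an int is a leaf index; a decision node is a tuple
# (column, split_value, left, right) that sends rows with x[column] < split_value left.
Node = Union[int, tuple]


def encode(samples: Sequence[Sequence[float]], root: Node) -> List[int]:
    indices = []
    for row in samples:
        node = root
        # Descend until a leaf index is reached.
        while not isinstance(node, int):
            column, split, left, right = node
            node = left if row[column] < split else right
        indices.append(node)
    return indices


tree = (0, 1.0, 3, (1, 5.0, 4, 5))
samples = [[0.5, 9.0], [2.0, 1.0], [2.0, 7.0]]
print(encode(samples, tree))  # [3, 4, 5]
```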
- pdf(values: jpt.variables.VariableAssignment) float
Get the likelihood of one world.
- Parameters:
values – A VariableMap mapping some variables to one value.
- Returns:
The likelihood as float
- infer(query: dict[jpt.variables.Variable | str, Any] | jpt.variables.VariableAssignment, evidence: dict[jpt.variables.Variable | str, Any] | jpt.variables.VariableAssignment = None, fail_on_unsatisfiability: bool = True) float | None
For each candidate leaf \(l\), calculate the number of samples in which query is true:
\[P(query \mid evidence) = \frac{p_q}{p_e} \tag{1}\]
\[p_q = \frac{c}{N} \tag{2}\]
\[c = \frac{\prod{F}}{x^{n-1}} \tag{3}\]
where \(Q\) is the set of variables in query, \(P_{l}\) is the set of variables that occur in \(l\), \(F = \{v \mid v \in Q \wedge v \notin P_{l}\}\) is the set of variables in the query that do not occur in \(l\)’s path, \(x = |S_{l}|\) is the number of samples in \(l\), \(n = |F|\) is the number of free variables, and \(N\) is the number of samples represented by the entire tree.
- Parameters:
query (dict of {jpt.variables.Variable : jpt.learning.distributions.Distribution.value}) – the event to query for, i.e. the query part of the conditional P(query|evidence) or the prior P(query)
evidence (dict of {jpt.variables.Variable : jpt.learning.distributions.Distribution.value}) – the event conditioned on, i.e. the evidence part of the conditional P(query|evidence)
fail_on_unsatisfiability – whether an error is raised in case of unsatisfiable evidence or not.
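The sketch below makes equations (2) and (3) concrete. It reads \(\prod F\) as the product, over the free variables, of the number \(c_v\) of samples in \(l\) that satisfy the query for variable \(v\). Under that reading, \(c = x \cdot \prod_{v \in F} \frac{c_v}{x}\) is the count expected under independence within the leaf. This reading, and summing \(c\) over all candidate leaves before dividing by \(N\), are assumptions made for illustration. They are not a description of the implementation.

```python
import math
from typing import Dict, List


def leaf_count(x: int, free_counts: List[int]) -> float:
    """Equation (3) for one candidate leaf: c = prod(F) / x**(n - 1).

    free_counts[i] is the (hypothetical) number of samples in the leaf that
    satisfy the query for the i-th free variable; n = len(free_counts).
    """
    n = len(free_counts)
    # prod(F) / x**(n - 1), rewritten as prod(F) * x / x**n to avoid negative powers.
    return math.prod(free_counts) * x / x ** n


def p_query(leaves: List[Dict], N: int) -> float:
    """Equation (2), with c summed over all candidate leaves (an assumption)."""
    return sum(leaf_count(leaf["x"], leaf["free_counts"]) for leaf in leaves) / N


leaves = [
    {"x": 10, "free_counts": [5, 4]},  # c = 5 * 4 / 10**1 = 2
    {"x": 20, "free_counts": []},      # n = 0: all 20 samples satisfy the query
]
print(p_query(leaves, N=100))  # 0.22
```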
- posterior(variables: list[jpt.variables.Variable | str] = None, evidence: dict[jpt.variables.Variable | str, Any] | jpt.variables.VariableAssignment = None, fail_on_unsatisfiability: bool = True, report_inconsistencies: bool = False) jpt.variables.VariableMap | None
Compute the posterior distribution of every variable in variables. The result contains independent distributions. Be aware that the variables might not actually be independent.
- Parameters:
variables – The query variables of the posterior to be computed
evidence – The evidence given for the posterior to be computed
fail_on_unsatisfiability – Whether or not an Unsatisfiability error is raised if the likelihood of the evidence is 0.
report_inconsistencies – In case of an Unsatisfiability error, the raised exception will contain information about the variable assignments that caused the inconsistency.
- Returns:
jpt.trees.PosteriorResult containing distributions, candidates and weights
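Conceptually, a posterior of this kind is a mixture over the leaves. Each leaf contributes its own distribution, weighted by its prior times the probability it assigns to the evidence, and the weights are normalized over all leaves. The minimal sketch below does this for one symbolic variable. The leaf dicts are hypothetical, and a ValueError stands in for the Unsatisfiability error.

```python
from typing import Dict, List


def posterior_mixture(leaves: List[Dict], variable: str) -> Dict[str, float]:
    """Posterior of a symbolic variable as a normalized mixture of leaf distributions."""
    # Unnormalized leaf weight: prior times the leaf's probability of the evidence.
    weights = [leaf["prior"] * leaf["evidence_likelihood"] for leaf in leaves]
    total = sum(weights)
    if total == 0:
        # Stand-in for the Unsatisfiability error raised when fail_on_unsatisfiability=True.
        raise ValueError("evidence has probability 0")
    result: Dict[str, float] = {}
    for weight, leaf in zip(weights, leaves):
        for value, p in leaf["dists"][variable].items():
            result[value] = result.get(value, 0.0) + weight / total * p
    return result


leaves = [
    {"prior": 0.5, "evidence_likelihood": 1.0, "dists": {"v": {"a": 1.0}}},
    {"prior": 0.5, "evidence_likelihood": 0.5, "dists": {"v": {"a": 0.5, "b": 0.5}}},
]
print(posterior_mixture(leaves, "v"))
```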
- expectation(variables: Iterable[jpt.variables.Variable] | None = None, evidence: jpt.variables.VariableAssignment | dict[str, numbers.Number | jpt.base.intervals.Interval | str] | None = None, fail_on_unsatisfiability: bool = True) jpt.variables.VariableMap | None
Compute the expected value of all variables. If no variables are passed, it defaults to all variables not passed as evidence.
- Parameters:
variables – The variables to compute the expectation distributions on
evidence – The raw evidence applied to the tree
fail_on_unsatisfiability – Whether or not an Unsatisfiability error is raised if the likelihood of the evidence is 0.
- Returns:
VariableMap
- mpe(evidence: Dict[jpt.variables.Variable | str, Any] | jpt.variables.VariableAssignment = None, fail_on_unsatisfiability: bool = True) Tuple[list[jpt.variables.LabelAssignment], float] | None
Calculate the most probable explanation of all variables of the tree given the evidence.
- Parameters:
evidence – The evidence that is applied to the tree
fail_on_unsatisfiability – Whether or not an Unsatisfiability error is raised if the likelihood of the evidence is 0.
- Returns:
List of LabelAssignments that describes all maxima of the tree given the evidence. Additionally, a float describing the likelihood of all solutions is returned.
- kmpe(evidence: dict[jpt.variables.Variable | str, Any] | jpt.variables.VariableAssignment = None, fail_on_unsatisfiability: bool = True, k: int = 0) Iterator[Tuple[jpt.variables.LabelAssignment, float]] | None
Perform a k-MPE inference on this JPT under the given evidence.
k-MPE yields the k most probable explanation states in decreasing order.
- Parameters:
evidence – The evidence to apply
fail_on_unsatisfiability – Whether to raise an Unsatisfiability error on impossible evidence or not.
k – the number of solutions to return
- Returns:
An iterator with states ordered by likelihood.
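Suppose each leaf provides a k-MPE stream that is already sorted by decreasing likelihood. One way to obtain a global ordering is to scale each stream by its leaf weight and then perform a lazy k-way merge. The sketch below uses heapq.merge for this. The stream format, the leaf weights, and treating k <= 0 as "unlimited" are assumptions, not the documented behavior of kmpe().

```python
import heapq
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

State = Dict[str, str]
Stream = Iterable[Tuple[State, float]]


def _scale(weight: float, stream: Stream) -> Iterator[Tuple[State, float]]:
    # Multiplying by a non-negative weight keeps each stream in decreasing order.
    for state, p in stream:
        yield state, weight * p


def merge_kmpe(leaf_streams: List[Tuple[float, Stream]], k: int = 0) -> Iterator[Tuple[State, float]]:
    """Merge per-leaf explanation streams, each sorted by decreasing likelihood."""
    merged = heapq.merge(*(_scale(w, s) for w, s in leaf_streams),
                         key=lambda item: item[1], reverse=True)
    # Assumption: k <= 0 means "no limit".
    return merged if k <= 0 else islice(merged, k)


streams = [
    (0.5, [({"a": "x"}, 0.75), ({"a": "y"}, 0.25)]),  # (leaf weight, sorted explanations)
    (0.5, [({"a": "z"}, 1.0)]),
]
print(list(merge_kmpe(streams, k=2)))
# [({'a': 'z'}, 0.5), ({'a': 'x'}, 0.375)]
```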
- _preprocess_query(query: dict | jpt.variables.VariableMap, remove_none: bool = True, skip_unknown_variables: bool = False, allow_singular_values: bool = False, space: Literal['labels', 'values'] = 'labels') jpt.variables.LabelAssignment
Transform a query entered by a user into an internal representation that can be further processed.
- Parameters:
query – the raw query
remove_none – Whether to remove None entries or not
skip_unknown_variables – skip preprocessing for variables that do not exist in the tree (may happen in multiple reverse tree inference). If False, an exception is raised; default: False
allow_singular_values – Allow singular values, such that they are transformed to the domain specification of numeric variables but not transformed to intervals via the PPF.
- Returns:
the preprocessed VariableMap
- _check_variable_assignment(assignment: jpt.variables.VariableAssignment | None)
Check the variable assignment for compatibility with the variables of this JPT.
- apply(query: jpt.variables.VariableAssignment | dict[str, int | jpt.base.intervals.Interval | float | str]) Iterator[Leaf]
Iterator that yields leaves that are consistent with a query.
A leaf is consistent with a query if, for every constraint expressed by its path to the root node, one of the following propositions holds:
the variable is not constrained by the query
the variable is constrained by the query and the query is consistent with the path
- Parameters:
query – the preprocessed query, either an instance of a subclass of VariableAssignment or a dict mapping variables to their respective labels.
- Returns:
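The consistency rule above can be sketched as follows. The sketch assumes a hypothetical representation in which path constraints and query constraints are either sets of symbolic values or half-open numeric intervals (low, high). It illustrates the rule only and is not the jpt implementation.

```python
from typing import Dict, Iterator, Set, Tuple, Union

Constraint = Union[Set[str], Tuple[float, float]]  # symbolic set or [low, high)


def overlaps(a: Constraint, b: Constraint) -> bool:
    if isinstance(a, set):
        return bool(a & b)
    # Half-open intervals [low, high) intersect iff each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def apply(leaves: Dict[int, Dict[str, Constraint]],
          query: Dict[str, Constraint]) -> Iterator[int]:
    """Yield the indices of leaves whose path constraints are consistent with query."""
    for idx, path in leaves.items():
        if all(var not in query or overlaps(c, query[var]) for var, c in path.items()):
            yield idx


leaves = {
    0: {"color": {"red"}, "x": (0.0, 1.0)},
    1: {"color": {"blue"}},
    2: {"x": (1.0, 2.0)},
}
query = {"color": {"red", "green"}, "x": (0.5, 1.5)}
print(list(apply(leaves, query)))  # [0, 2]
```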
- __str__() str
- __repr__() str
- to_string() str
- fancy_tree() str
- pfmt() str
- Returns:
a pretty-format string representation of this JPT.
- _pfmt(node: Node, indent: int) str
- Parameters:
node – The starting node
indent – the indentation of each new level
- Returns:
a pretty-format string representation of this JPT from node downward.
- learn(data: pandas.DataFrame | numpy.ndarray, keep_samples: bool = False, close_convex_gaps: bool = False, verbose: bool = False, prune_or_split: Callable[[JPT, Any, numpy.ndarray, numpy.ndarray], bool] | None = None, multicore: int | None = None, split_validation_mask: numpy.ndarray | None = None, split_validation_mode: str = 'both') JPT
Fit the JPT to data.
- Parameters:
data ([[str or float or bool]]; (according to self.variables)) – The training examples (assumed in row-shape)
keep_samples – If true, stores the indices of the original data samples in the leaf nodes. For debugging purposes only. Default is false.
close_convex_gaps –
prune_or_split – A callable (jpt, partition, indices, data) -> bool that is invoked before each split. Returns True to prune (make the node a leaf) or False to allow splitting. indices and data are numpy arrays.
multicore – The number of cores to use for learning. If None, all available cores are used.
verbose –
split_validation_mask – A boolean or uint8 array of length len(data). True/1 marks training samples whose feature values serve as candidate split points; False/0 marks evaluation samples whose feature values are excluded from the candidates. By default, the target values of all samples contribute to the impurity score; split_validation_mode can restrict this. None disables split validation (default).
split_validation_mode – Controls which targets contribute to the impurity score: 'both' (default) uses the targets of all samples, 'training' uses only the targets of training samples, 'evaluation' uses only the targets of evaluation samples.
- Returns:
the fitted model
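The effect of split_validation_mask and split_validation_mode on a single numeric feature can be sketched with two hypothetical helpers. One collects candidate split values from training rows only. The other selects the rows whose targets enter the impurity score. Using the raw feature values (rather than, say, midpoints) as candidates is an assumption.

```python
from typing import List, Optional, Sequence


def candidate_splits(feature: Sequence[float], mask: Optional[Sequence[int]]) -> List[float]:
    """Sorted distinct feature values of training rows (all rows if mask is None)."""
    if mask is None:
        return sorted(set(feature))
    # True/1 marks training rows; evaluation rows never propose split points.
    return sorted({v for v, m in zip(feature, mask) if m})


def impurity_rows(mask: Sequence[int], mode: str = "both") -> List[int]:
    """Indices of the rows whose target values enter the impurity score."""
    if mode == "both":
        return list(range(len(mask)))
    if mode == "training":
        return [i for i, m in enumerate(mask) if m]
    if mode == "evaluation":
        return [i for i, m in enumerate(mask) if not m]
    raise ValueError(f"unknown split_validation_mode: {mode!r}")


mask = [1, 0, 1, 1]
print(candidate_splits([3.0, 1.0, 2.0, 4.0], mask))  # [2.0, 3.0, 4.0]
print(impurity_rows(mask, "evaluation"))             # [1]
```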
- fit
- static sample(sample, ft)
- likelihood(data: pandas.DataFrame | numpy.ndarray, dirac_scaling: float = 2.0, min_distances: Dict = None, preprocess: bool = True, multicore: int | None = None, verbose: bool = False, single_likelihoods: bool = False, variables: Iterable[jpt.variables.Variable] = None) numpy.ndarray
Get the probabilities of a list of worlds. The worlds must be fully assigned with scalar values (no intervals or sets).
- Parameters:
variables – Which variables to consider for the likelihood computation
data – An array containing the worlds. The shape is (x, len(variables)).
dirac_scaling – the minimal distance between the samples within a dimension is multiplied by this factor if a Dirac impulse is used to model the variable.
min_distances – A dict mapping the variables to the minimal distances between the observations. This is useful for applying the same likelihood parameters to different test sets, for example in cross-validation.
verbose – print status information to the console
multicore – how many cores should be used (defaults to all)
preprocess – whether to apply the preprocessing to the data passed.
single_likelihoods – will not only return the overall likelihoods but also the likelihoods per variable
- Returns:
A np.ndarray with shape (x, ) containing the probabilities.
- parallel_likelihood(data: numpy.ndarray | pandas.DataFrame, dirac_scaling: float = 2.0, min_distances: Dict = None, single_likelihoods: bool = False) numpy.ndarray
Get the probabilities of a list of worlds. The worlds must be fully assigned with scalar values (no intervals or sets).
- Parameters:
data – An array containing the worlds. The shape is (x, len(variables)).
dirac_scaling – the minimal distance between the samples within a dimension is multiplied by this factor if a Dirac impulse is used to model the variable.
min_distances – A dict mapping the variables to the minimal distances between the observations. This is useful for applying the same likelihood parameters to different test sets, for example in cross-validation.
single_likelihoods – will not only return the overall likelihoods but also the likelihoods per variable
- Returns:
An np.array with shape (x, ) containing the probabilities.
- reverse(query: Dict, confidence: float = 0.05) List[tuple]
Determines the leaf nodes that match query best and returns them along with their respective confidence.
- Parameters:
query – a mapping from feature names to either numeric value intervals or an iterable of categorical values
confidence – the confidence level for this MPE inference
- Returns:
a list of tuples of probabilities and jpt.trees.Leaf objects that match the requirement (each representing a path to the root)
- plot(title: str = 'unnamed', filename: str | None = None, directory: str = None, plotvars: Iterable[jpt.variables.Variable] = None, view: bool = True, max_symb_values: int = 10, nodefill: str = None, leaffill: str = None, alphabet: bool = False, verbose: bool = False, engine=None) str
Generates an SVG representation of the generated regression tree.
- Parameters:
title – title of the plot
filename – the name of the JPT (will also be used as filename; extension will be added automatically)
directory – the location to save the SVG file to
plotvars – the variables to be plotted in the graph
view – whether the generated SVG file will be opened automatically
max_symb_values – limit the maximum number of symbolic values that are plotted to this number
nodefill – the color of the inner nodes in the plot; accepted formats: RGB, RGBA, HSV, HSVA or color name
leaffill – the color of the leaf nodes in the plot; accepted formats: RGB, RGBA, HSV, HSVA or color name
alphabet – whether to plot symbolic variables in alphabetic order, if False, they are sorted by probability (descending); default is False
verbose –
engine – the rendering engine for the distribution plots in the leaves; either ‘matplotlib’ or ‘plotly’.
- Returns:
(str) the path under which the rendered image has been saved.
- pickle(fpath: str) None
Pickle the fitted regression tree to a file at the given location fpath.
- Parameters:
fpath – the location for the pickled file
- static calcnorm(sigma: float, mu: float, intervals)
Computes the CDF for a multivariate normal distribution.
- Parameters:
sigma – the standard deviation
mu – the expected value
intervals (list of matcalo.utils.utils.Interval) – the boundaries of the integral
- Returns:
- conditional_jpt(evidence: jpt.variables.VariableAssignment | None = None, fail_on_unsatisfiability: bool = True) JPT | None
Apply evidence to a JPT and get a new JPT that represents P(x|evidence).
- Parameters:
evidence – A VariableAssignment mapping the observed variables to their observed values
fail_on_unsatisfiability – whether an error is raised in case of unsatisfiable evidence or not
- multiply_by_leaf_prior(prior: dict[int, float]) JPT
Multiply the prior of every leaf by the given priors. This serves to handle the factor message from factor nodes. Be wary, since this method modifies the JPT in place.
- Parameters:
prior – The priors, a Dict mapping from leaf indices to float
- Returns:
self
- save(file: str | IO, protocol: Literal['pickle', 'json'] = 'pickle') None
Write this JPT persistently to disk.
- Parameters:
file – either a string or file-like object.
protocol –
- dump
- dumps(protocol: Literal['pickle', 'json'] = 'pickle') bytes
- static load(file: str | IO, protocol: Literal['pickle', 'json'] = 'pickle') JPT
Load a JPT from disk.
- Parameters:
file – either a string or file-like object.
protocol –
- Returns:
the JPT described in file
- depth() int
- Returns:
the maximal depth of a leaf in the tree.
- total_samples() int
- Returns:
the total number of samples represented by this tree.
- number_of_parameters() int
- Returns:
The number of relevant parameters in the entire tree
- bind(*arg, **kwargs) jpt.variables.LabelAssignment
Returns a LabelAssignment object with the assignments passed.
This method accepts one optional positional argument, which – if passed – must be a dictionary of the desired variable assignments.
Keyword arguments may specify additional variable, value pairs.
If a positional argument is passed, the following options may be passed in addition as keyword arguments:
- Parameters:
allow_singular_values – Allow singular values, such that they are transformed to the domain specification of numeric variables but not transformed to intervals via the PPF.
space – Literal[‘values’, ‘labels’] Whether the variables shall be assigned to terms in value or label space of the JPT.
- moment(order: int = 1, center: jpt.variables.VariableAssignment | None = None, evidence: jpt.variables.VariableAssignment | None = None, fail_on_unsatisfiability: bool = True) jpt.variables.VariableMap | None
Calculate the moment of the given order of each numeric/integer random variable given the evidence.
- Parameters:
order – The order of the moment
center – A VariableAssignment mapping each numeric/integer variable to some constant. If a variable has a constant, it will be interpreted as ‘c’ for the central moment. If it is not set, 0 will be used by default.
evidence – The evidence given for the posterior to be computed
fail_on_unsatisfiability – Whether or not an Unsatisfiability error is raised if the likelihood of the evidence is 0.
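For a single discrete distribution, the moment of order \(k\) about a center \(c\) is \(E[(X - c)^{k}]\). By linearity, the moment of a mixture of leaves is the weighted sum of the leaves' moments. The following sketch illustrates both with hypothetical dict-based distributions. The center defaults to 0, as described above. The sketch does not reproduce how jpt represents numeric distributions.

```python
from typing import Dict, List, Sequence


def moment(dist: Dict[float, float], order: int = 1, center: float = 0.0) -> float:
    """E[(X - center) ** order] for a discrete distribution {value: probability}."""
    return sum(p * (x - center) ** order for x, p in dist.items())


def mixture_moment(components: List[Dict[float, float]], weights: Sequence[float],
                   order: int = 1, center: float = 0.0) -> float:
    """Moment of a weighted mixture, using linearity of expectation."""
    return sum(w * moment(d, order, center) for d, w in zip(components, weights))


print(moment({1.0: 0.5, 3.0: 0.5}))                         # 2.0 (mean)
print(moment({1.0: 0.5, 3.0: 0.5}, order=2, center=2.0))    # 1.0 (variance)
```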
- get_hyperparameters_dict() dict[str, Any]
Get all hyperparameters as a dict that can be used for MLflow model tracking.
- prune(similarity_threshold: float, approximate: float | dict[jpt.variables.Variable | str, float] | jpt.variables.VariableMap | None = None) JPT
Prune this tree by repeatedly merging leaves with very similar distributions.
- Parameters:
similarity_threshold – the average similarity of distributions in [0, 1] that two leaves must exhibit in order to be considered for a merge.
approximate –
- Returns: