jpt.distributions

Submodules

Classes

`Multinomial`	Abstract supertype of all symbolic domains and distributions.
`Numeric`	Wrapper class for numeric domains and distributions.
`ScaledNumeric`	Scaled numeric distribution represented by mean and variance.
`Integer`	Abstract supertype of all domains and distributions
`Gaussian`	Extension of `dnutils.stats.Gaussian`
`Bool`	Wrapper class for Boolean domains and distributions.
`Distribution`	Abstract supertype of all domains and distributions

Functions

`SymbolicType`(→ Type[Multinomial])
`NumericType`(→ Type[Numeric])
`IntegerType`(→ Type[Integer])

Package Contents

class jpt.distributions.Multinomial(**settings)

Bases: jpt.distributions.univariate.Distribution

Abstract supertype of all symbolic domains and distributions.

values: MultinomialValueMap = None

labels: MultinomialValueMap = None

_params: numpy.ndarray | None = None

to_json: types.MethodType

classmethod hash()

classmethod value2label(value: int | Iterable[int]) → jpt.base.utils.Symbol | Collection[jpt.base.utils.Symbol]

classmethod label2value(label: jpt.base.utils.Symbol | Collection[jpt.base.utils.Symbol]) → int | Collection[int]

classmethod pfmt(max_values=10, labels_or_values='labels') → str

Returns a pretty-formatted string representation of this class.

By default, a set notation with value labels is used. By setting labels_or_values to "values", the internal value representation is used. If the domain comprises more than max_values values, the middle part of the list of values is abbreviated by “…”.
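The abbreviation rule described above can be sketched roughly as follows. The helper name `abbreviate_labels`, the way the kept values are split between head and tail, and the ASCII `...` placeholder are assumptions for illustration only; the exact output format of `pfmt` is not specified here.

```python
from typing import Any, Iterable


def abbreviate_labels(labels: Iterable[Any], max_values: int = 10) -> str:
    """Format labels in set notation, eliding the middle part if too long."""
    items = [str(label) for label in labels]
    if len(items) > max_values:
        head = max_values // 2          # values kept at the front
        tail = max_values - head        # values kept at the back
        back = items[-tail:] if tail else []
        items = items[:head] + ["..."] + back
    return "{%s}" % ", ".join(items)


short = abbreviate_labels(["Head", "Tail"])
long = abbreviate_labels(range(6), max_values=4)
```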

property probabilities

n_values() → int

__contains__(item)

classmethod equiv(other)

static jaccard_similarity(*d: Multinomial) → float

Calculate the similarity of two or more Multinomial distributions.

\[\text{sim}(D_1, \ldots, D_n) = \frac{\sum_{x \in \text{dom}(D)} \min_{i} p_i(x)} {\sum_{x \in \text{dom}(D)} \max_{i} p_i(x)}\]

Adapted from the Jaccard coefficient:

\[\text{sim}(S_1, \ldots, S_n) = \frac{|\bigcap_{i=1}^{n} S_i|}{|\bigcup_{i=1}^{n} S_i|}\]
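As a rough illustration of the similarity formula, the following sketch evaluates it for distributions given as plain dictionaries that map labels to probabilities. The function name `jaccard_similarity_sketch` and the dictionary representation are assumptions made for this example; it is not the library's implementation.

```python
from typing import Dict, Hashable


def jaccard_similarity_sketch(*dists: Dict[Hashable, float]) -> float:
    """Sum of pointwise minima divided by the sum of pointwise maxima."""
    if len(dists) < 2:
        raise ValueError("need at least two distributions")
    # Union of all domains; a label missing from a distribution counts as 0.
    domain = set().union(*(d.keys() for d in dists))
    numerator = sum(min(d.get(x, 0.0) for d in dists) for x in domain)
    denominator = sum(max(d.get(x, 0.0) for d in dists) for x in domain)
    return numerator / denominator if denominator > 0 else 0.0


fair = {"Head": 0.5, "Tail": 0.5}
biased = {"Head": 1.0, "Tail": 0.0}
sim = jaccard_similarity_sketch(fair, biased)
```

Identical distributions yield a similarity of 1, and distributions with disjoint support yield 0.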

mover_dist(other: Multinomial) → float

similarity(other: Multinomial) → float

distance(other: Multinomial) → float

__getitem__(value)

__setitem__(label, p)

__eq__(other)

__str__()

__repr__()

sorted() → Iterable[Tuple[float, jpt.base.utils.Symbol]]: Generate a sequence of (label, prob) pairs representing this distribution, ordered by descending probability.

_items() → Iterable[Tuple[float, int]]: Generate a sequence of (probability, value) pairs representing this distribution.

items() → Iterable[Tuple[float, jpt.base.utils.Symbol]]: Generate a sequence of (probability, label) pairs representing this distribution.

copy()

_pdf(value: int) → float

pdf(label: jpt.base.utils.Symbol) → float

p(event: jpt.base.utils.Symbol | Set[jpt.base.utils.Symbol] | List[jpt.base.utils.Symbol] | Tuple[jpt.base.utils.Symbol] | numpy.ndarray) → float

Compute the probability of a certain event given this multinomial distribution.

An event can be an atomic random event or a disjunction thereof. For example, given the domain values {'Head', 'Tail'}, event may be any of the following:

dist.p('Head')
dist.p({'Tail'})
dist.p({'Head', 'Tail'})

Parameters:: event – the event in label space, the probability of which is to be computed.
Returns:: the probability of the event
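Because the atomic events of a multinomial distribution are mutually exclusive, the probability of a disjunction is the sum of the probabilities of its members. The sketch below shows this on a plain dictionary; the function name `event_probability` and the dictionary representation are hypothetical and do not mirror the library's internals.

```python
from typing import Dict, Hashable, Iterable, Union


def event_probability(probs: Dict[Hashable, float],
                      event: Union[Hashable, Iterable[Hashable]]) -> float:
    """Probability of an atomic event or of a disjunction of atomic events."""
    # Wrap an atomic event into a one-element set.
    if isinstance(event, (set, frozenset, list, tuple)):
        labels = set(event)
    else:
        labels = {event}
    unknown = labels - set(probs)
    if unknown:
        raise ValueError("labels not in domain: %r" % sorted(map(str, unknown)))
    # Atomic events are mutually exclusive, so their probabilities add up.
    return sum(probs[label] for label in labels)


coin = {"Head": 0.3, "Tail": 0.7}
p_head = event_probability(coin, "Head")
p_any = event_probability(coin, {"Head", "Tail"})
```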

_p(event: int | Set[int] | List[int] | Tuple[int] | numpy.ndarray) → float

Compute the probability of a certain event given this multinomial distribution.

See also Multinomial.p()

Parameters:: event – the event in value space, the probability of which is to be computed.
Returns:: the probability of the event

create_dirac_impulse(value: int) → Multinomial

Create a singular modification of this distribution object, in which value has probability 1 and all other values have probability 0.

Parameters:: value – the singular value to get assigned prob 1.
Returns:: the created distribution object
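In terms of the parameter vector, a Dirac impulse is simply a one-hot vector. A minimal sketch, assuming the parameters are a list indexed by the internal integer value (the helper name `dirac_impulse` is hypothetical):

```python
from typing import List


def dirac_impulse(n_values: int, value: int) -> List[float]:
    """Parameter vector that puts all probability mass on one value."""
    if not 0 <= value < n_values:
        raise IndexError("value %d is outside the domain" % value)
    return [1.0 if i == value else 0.0 for i in range(n_values)]


impulse = dirac_impulse(3, 1)
```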

_sample(n: int) → Iterable[int]: Returns n sample values according to their respective probability

_sample_one() → jpt.base.utils.Symbol: Returns one sample value according to its probability

_expectation() → Set[int]: Returns the value with the highest probability for this variable

expectation() → Set[jpt.base.utils.Symbol]: For symbolic variables, the expectation is equal to the mpe. Returns the set of all most likely values.

mpe() → Tuple[Set[jpt.base.utils.Symbol], float]

_mpe() → Tuple[Set[int], float]

Calculate the most probable configuration of this distribution in value space.

Returns:: The mpe itself as Set and the likelihood of the mpe as float
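The idea of a most probable configuration in value space can be sketched as follows: collect every value whose probability equals the maximum and return that set together with the maximum. The function name `mpe_sketch` and the list representation are assumptions for illustration.

```python
from typing import List, Set, Tuple


def mpe_sketch(params: List[float]) -> Tuple[Set[int], float]:
    """Return all maximizing values and their shared probability."""
    p_max = max(params)
    # Ties are kept: every value that attains the maximum is part of the result.
    return {i for i, p in enumerate(params) if p == p_max}, p_max


states, likelihood = mpe_sketch([0.2, 0.4, 0.4])
```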

_k_mpe(k: int = None) → List[Tuple[Set[jpt.base.utils.Symbol], float]]

k_mpe(k: int | None = None) → List[Tuple[Set[jpt.base.utils.Symbol], float]]

mode() → Set

_mode() → Set

kl_divergence(other: Multinomial) → float: Compute the KL-divergence of this distribution to the other distribution.
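For reference, the textbook KL-divergence of two discrete distributions over the same domain can be sketched as below. This uses the natural logarithm and the usual conventions for zero probabilities; whether the library applies the same base, smoothing, or normalization is not stated here, so treat this as an assumption-laden illustration with a hypothetical function name.

```python
import math
from typing import Sequence


def kl_divergence_sketch(p: Sequence[float], q: Sequence[float]) -> float:
    """Textbook D_KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same domain")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # the term 0 * log(0 / q_i) is taken to be 0
        if q_i == 0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


kl = kl_divergence_sketch([0.5, 0.5], [0.25, 0.75])
```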

_crop(restriction: int | Collection[int]) → Multinomial

crop(restriction: jpt.base.utils.Symbol | Collection[jpt.base.utils.Symbol]) → Multinomial

Apply a restriction to this distribution such that all values are in the given set.

Parameters:: restriction – The values to remain
Returns:: Copy of self that is consistent with the restriction
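A restriction of this kind can be sketched by dropping all labels outside the given set and renormalizing the remaining mass. The renormalization step, the function name `crop_sketch`, and the dictionary representation are assumptions; the description above only states that the result is consistent with the restriction.

```python
from typing import Dict, Hashable, Iterable


def crop_sketch(probs: Dict[Hashable, float],
                restriction: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Keep only the labels in `restriction` and renormalize (assumed)."""
    allowed = set(restriction)
    kept = {label: p for label, p in probs.items() if label in allowed}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("restriction leaves no probability mass")
    # Returns a new dict, so the input distribution is left untouched.
    return {label: p / total for label, p in kept.items()}


original = {"a": 0.2, "b": 0.3, "c": 0.5}
cropped = crop_sketch(original, {"a", "b"})
```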

_fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: int = None) → Multinomial

set(params: Iterable[numbers.Real]) → Multinomial

update(dist: Multinomial, weight: float) → Multinomial

Update this multinomial distribution with dist and weight.

The resulting distribution will be a weighted mean of self and dist, where self will have a weight of (1-weight), and dist will have a weight of weight.

Parameters:

dist – the update distribution
weight – the weight

Returns:
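The weighted mean described for update can be sketched on plain probability lists. The function name `update_sketch` and the list representation are hypothetical; only the weighting scheme, (1 - weight) for self and weight for dist, is taken from the description.

```python
from typing import List, Sequence


def update_sketch(self_params: Sequence[float],
                  dist_params: Sequence[float],
                  weight: float) -> List[float]:
    """Weighted mean: (1 - weight) * self + weight * dist, per value."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(self_params) != len(dist_params):
        raise ValueError("distributions must share the same domain")
    return [(1.0 - weight) * p + weight * q
            for p, q in zip(self_params, dist_params)]


updated = update_sketch([1.0, 0.0], [0.0, 1.0], 0.25)
```

Since both inputs sum to 1 and the two weights sum to 1, the result is again a valid probability vector.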

static merge(distributions: Iterable[Multinomial], weights: Iterable[float]) → Multinomial

Merge the distributions under consideration of weights.

Parameters:

distributions –
weights –

Returns:

classmethod type_to_json()

inst_to_json()

static type_from_json(data)

classmethod from_json(data)

is_dirac_impulse()

number_of_parameters() → int

Returns:: The number of relevant parameters in this decision node. 1 if this is a dirac impulse, number of parameters else

plot(engine=None, **kwargs) → Any

Plots the distribution using the given engine.

Parameters:

engine – Can be either one of ["plotly", "matplotlib"], or an instance of a rendering engine subclassing DistributionRendering.
kwargs – The keyword arguments to pass to the engine as defined in the .plot_multinomial() function of DistributionRendering or its respective subclass defined by engine.

Returns:

the figure object of the plotting engine

class jpt.distributions.Numeric(**settings)

Bases: jpt.distributions.univariate.Distribution

Wrapper class for numeric domains and distributions.

PRECISION = 'precision'

values

labels

SETTINGS

_quantile: jpt.distributions.qpd.QuantileDistribution = None

to_json

classmethod hash()

__str__()

__getitem__(value)

__eq__(o: Numeric)

classmethod value2label(value: float | jpt.base.intervals.NumberSet) → float | jpt.base.intervals.NumberSet

classmethod label2value(label: numbers.Real | jpt.base.intervals.NumberSet) → numbers.Real | jpt.base.intervals.NumberSet

classmethod equiv(other)

property cdf

property pdf

property ppf

approximate_fast(eps: float)

_sample(n)

_sample_one()

number_of_parameters() → int

Returns:: The number of relevant parameters in this decision node. 1 if this is a dirac impulse, number of intervals times two else

_expectation() → float

_variance() → float

expectation() → float

variance() → float

quantile(gamma: numbers.Real) → numbers.Real

create_dirac_impulse(value): Create a dirac impulse at the given value aus quantile distribution.

is_dirac_impulse() → bool: Checks if this distribution is a dirac impulse.

mpe()

_mpe(value_transform: Callable | None = None)

Calculate the most probable configuration of this quantile distribution.

Returns:: The mpe itself as UnionSet and the likelihood of the mpe as float

_k_mpe(k: int | None = None) → List[Tuple[jpt.base.intervals.NumberSet, float]]

Calculate the k most probable explanation states.

Parameters:: k – The number of solutions to generate, defaults to the maximum possible number.
Returns:: A list of tuples, each pairing a state with its likelihood, in descending order of likelihood.

k_mpe(k: int | None = None) → List[Tuple[jpt.base.intervals.NumberSet, float]]

Calculate the k most probable explanation states.

Parameters:: k – The number of solutions to generate, defaults to the maximum possible number.
Returns:: A list of tuples, each pairing a state with its likelihood, in descending order of likelihood.

_fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: numbers.Integral = None) → Numeric

fit

set(params: jpt.distributions.qpd.QuantileDistribution) → Numeric

_p(value: numbers.Number | jpt.base.intervals.NumberSet) → numbers.Real

p(labels: numbers.Number | jpt.base.intervals.NumberSet | List[float]) → numbers.Real

kl_divergence(other: Numeric) → numbers.Real

copy()

static merge(distributions: Iterable[Numeric], weights: Iterable[numbers.Real]) → Numeric

update(dist: Numeric, weight: float) → Numeric

crop(restriction: jpt.base.intervals.NumberSet | numbers.Number) → Numeric

_crop(restriction: jpt.base.intervals.NumberSet | numbers.Number) → Numeric

Apply a restriction to this distribution. The restricted distribution will only assign mass to the given range and will preserve the relative proportions of the pdf.

Parameters:: restriction (float or int or ContinuousSet) – The range to limit this distribution (or singular value)
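To illustrate what restricting mass to a range while preserving the relative proportions of the pdf means, the sketch below clips a piecewise-constant density to an interval and rescales it so that it integrates to 1 again. The `(lower, upper, density)` segment representation and the function name `crop_density` are assumptions made for this example and are not the library's data structures.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (lower, upper, density), hypothetical


def crop_density(segments: List[Segment],
                 lower: float, upper: float) -> List[Segment]:
    """Clip a piecewise-constant pdf to [lower, upper] and renormalize."""
    clipped = []
    for a, b, h in segments:
        lo, hi = max(a, lower), min(b, upper)
        if lo < hi:  # keep only the overlapping part of each segment
            clipped.append((lo, hi, h))
    mass = sum((b - a) * h for a, b, h in clipped)
    if mass == 0:
        raise ValueError("restriction leaves no probability mass")
    # Dividing every density by the same constant keeps their ratios intact.
    return [(a, b, h / mass) for a, b, h in clipped]


pdf = [(0.0, 1.0, 0.25), (1.0, 2.0, 0.75)]
restricted = crop_density(pdf, 0.5, 1.5)
```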

classmethod type_to_json()

inst_to_json()

static from_json(data)

classmethod type_from_json(data: Dict[str, Any])

insert_convex_fragments(left: jpt.base.intervals.ContinuousSet | None, right: jpt.base.intervals.ContinuousSet | None, number_of_samples: int)

Insert fragments of distributions on the right and left part of this distribution. This should only be used to create a convex hull around the JPT's domain whose density is never 0.

Parameters:

left – The left (lower) interval to add on if needed, None otherwise
right – The right (upper) interval to add on if needed, None otherwise
number_of_samples – The number of samples to use as basis for the weight

classmethod cumsum(distributions: Iterable[Numeric], error_max: float = np.inf, n_segments: int = None) → Iterable[Numeric]

Generator yielding the distributions that correspond to the cumulative sums of the passed distributions.

Parameters:

distributions –
error_max –
n_segments –

Returns:

moment(order: int, center: float) → float

_moment(order: int, center: float, value_transform: Callable | None = None) → float

Calculate the central moment of the r-th order almost everywhere.

\[\int (x - c)^{r} p(x) \, dx\]

cf. https://en.wikipedia.org/wiki/Central_moment and https://gregorygundersen.com/blog/2020/04/11/moments/

Parameters:

order – The order of the moment to calculate
center – The constant to subtract in the basis of the exponent. If center is 0, the result corresponds to the order-th raw moment. If center is set to the distribution's mean (i.e., its expectation, or self._moment(1, 0)), the result is the central moment of the distribution.
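As a worked example of the moment integral, assume the density is piecewise constant, with each segment given as a hypothetical `(lower, upper, density)` triple. On a segment [a, b] with height h, the integral of (x - c)^r * h has the closed form h * ((b - c)^(r+1) - (a - c)^(r+1)) / (r + 1), so the moment is a sum over segments. The function name `moment_sketch` and this representation are assumptions, not the library's implementation.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (lower, upper, density), hypothetical


def moment_sketch(segments: List[Segment], order: int, center: float) -> float:
    """Moment of a piecewise-constant pdf via the per-segment closed form."""
    total = 0.0
    for a, b, h in segments:
        upper = (b - center) ** (order + 1)
        lower = (a - center) ** (order + 1)
        total += h * (upper - lower) / (order + 1)
    return total


uniform = [(0.0, 2.0, 0.5)]                   # uniform density on [0, 2]
mean = moment_sketch(uniform, 1, 0.0)         # first raw moment
variance = moment_sketch(uniform, 2, mean)    # second central moment
```

With center set to 0 this yields raw moments; with center set to the mean it yields central moments, matching the parameter description.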

__add__(other: Numeric) → Numeric

__sub__(other: Numeric) → Numeric

approximate(error_max: float = None, n_segments: int = None) → Numeric

static wasserstein_distance(d1: Numeric, d2: Numeric) → float

distance(other: Numeric) → float

static jaccard_similarity(d1: Numeric, d2: Numeric) → float

similarity(other: Numeric) → float

entropy() → float

plot(engine=None, **kwargs) → Any

Plots the distribution using the given engine.

Parameters:

engine – Can be either one of ["plotly", "matplotlib"], or an instance of a rendering engine subclassing DistributionRendering.
kwargs – The keyword arguments to pass to the engine as defined in the .plot_numeric() function of DistributionRendering or its respective subclass defined by engine.

Returns:

the figure object of the plotting engine

class jpt.distributions.ScaledNumeric(**settings)

Bases: Numeric

Scaled numeric distribution represented by mean and variance.

classmethod type_to_json()

to_json

static type_from_json(data)

classmethod from_json(data)

jpt.distributions.SymbolicType(name: str, labels: Iterable[Any]) → Type[Multinomial]

jpt.distributions.NumericType(name: str, values: Iterable[float] = None) → Type[Numeric]

class jpt.distributions.Integer(**settings)

Bases: jpt.distributions.univariate.Distribution

Abstract supertype of all domains and distributions

values: IntegerLabelToValueMap | None

labels

OPEN_DOMAIN = 'open_domain'

AUTO_DOMAIN = 'auto_domain'

SETTINGS

min() → int | None

max() → int | None

_min() → int | None

_max() → int | None

_params: Dict[int, float] | None = None

to_json: types.FunctionType

classmethod hash()

__add__(other: Integer) → Integer

__neg__() → Integer

property cdf: jpt.base.functions.PiecewiseFunction

add(other: Integer, name: str | None = None) → Integer

classmethod equiv(other: Type[jpt.distributions.univariate.Distribution]) → bool

classmethod type_to_json() → Dict[str, Any]

inst_to_json() → Dict[str, Any]

static type_from_json(data)

classmethod from_json(data: Dict[str, Any]) → Integer

copy()

property probabilities: Dict[int, float]

n_values() → int | None

classmethod value2label(value: int | Iterable[int] | jpt.base.intervals.IntSet | jpt.base.intervals.UnionSet) → int | Iterable[int] | jpt.base.intervals.IntSet | jpt.base.intervals.UnionSet

classmethod label2value(label: int | Iterable[int] | jpt.base.intervals.IntSet | jpt.base.intervals.UnionSet) → int | Iterable[int] | jpt.base.intervals.IntSet | jpt.base.intervals.UnionSet

_sample(n: int) → Iterable[int]

_sample_one() → int

sample(n: int) → Iterable[int]

sample_one() → int

property _pdf: types.FunctionType

property pdf: types.FunctionType

p(labels: int | Iterable[int]) → float

_p(values: int | Iterable[int]) → float

expectation() → float

_expectation() → float

variance() → float

_variance() → float

_k_mpe(k: int | None = None) → Iterable[Tuple[jpt.base.intervals.NumberSet, float]]

Calculate the k most probable explanation states.

Parameters:: k – The number of solutions to generate
Returns:: A list of tuples, each pairing a state with its likelihood, in descending order of likelihood.

k_mpe(k: int = None) → Iterable[Tuple[jpt.base.intervals.NumberSet, float]]

mpe()

_mpe()

mode()

_mode()

crop(restriction: jpt.base.intervals.NumberSet | int) → Integer

_crop(restriction: jpt.base.intervals.NumberSet | int) → Integer

static merge(distributions: Iterable[Integer], weights: Iterable[numbers.Real]) → Integer

update(dist: Integer, weight: int) → Integer

fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: int = None) → Integer

_fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: int = None) → Integer

_set(params: Dict[int, float] or Iterable[float]) → Integer

set(params: Dict[int, float] or Iterable[float]) → Integer

__eq__(other) → bool

__str__()

__repr__()

infinite() → bool

finite() → bool

_sorted(exhaustive: bool = True, reverse: bool = False, max_items: int = None) → Iterable[Tuple[int, float]]

sorted(exhaustive: bool = True, reverse: bool = False, max_items: int = None) → Iterable[Tuple[int, float]]

_items(exhaustive: bool = False, max_items: int = None) → Iterable[Tuple[int, float]]: Return a list of (probability, value) pairs representing this distribution.

items(exhaustive: bool = True, max_items: int = None) → Iterable[Tuple[int, float]]: Return a list of (probability, label) pairs representing this distribution.

kl_divergence(other: Integer) → float

number_of_parameters() → int

moment(order: int = 1, center: float = 0) → float

Calculate the central moment of the r-th order almost everywhere.

\[\int (x-c)^{r} p(x) \, dx\]

Parameters:

order – The order of the moment to calculate
center – The constant to subtract in the basis of the exponent
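Over an integer domain, the moment amounts to a probability-weighted sum over the values. A minimal sketch, assuming the distribution is given as a dictionary from integer values to probabilities (the function name `discrete_moment` is hypothetical):

```python
from typing import Dict


def discrete_moment(probs: Dict[int, float],
                    order: int = 1, center: float = 0.0) -> float:
    """Sum of (x - center) ** order * p(x) over all integer values x."""
    return sum((x - center) ** order * p for x, p in probs.items())


die = {face: 1.0 / 6.0 for face in range(1, 7)}   # fair six-sided die
mean = discrete_moment(die)                        # order=1, center=0
variance = discrete_moment(die, order=2, center=mean)
```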

static wasserstein_distance(d1: Integer, d2: Integer) → float

distance(other: Integer) → float

static jaccard_similarity(d1: Integer, d2: Integer) → float

similarity(other: Integer) → float

plot(engine=None, **kwargs) → Any

Plots the distribution using the given engine.

Parameters:

engine – Can be either one of ["plotly", "matplotlib"], or an instance of a rendering engine subclassing DistributionRendering.
kwargs – The keyword arguments to pass to the engine as defined in the .plot_integer() function of DistributionRendering or its respective subclass defined by engine.

Returns:

the figure object of the plotting engine

jpt.distributions.IntegerType(name: str, lmin: int | None = None, lmax: int | None = None) → Type[Integer]

class jpt.distributions.Gaussian(mean=None, cov=None, data=None, weights=None)

Bases: dnutils.stats.Gaussian

Extension of dnutils.stats.Gaussian

Creates a new Gaussian distribution.

Parameters:

mean (float if univariate else [float] if multivariate) – the mean of the Gaussian
cov (float if univariate else [[float]] if multivariate) – the covariance of the Gaussian
data ([[float]]) – if mean and cov are not provided, data may be a data set (matrix) from which the parameters of the distribution are estimated.
weights ([float]) – [optional] weights for the data points. The weights do not need to be normalized.

PRECISION = 1e-15

_cl = 'jpt.distributions.univariate.gaussian.Gaussian'

_sum_w = 0

_sum_w_sq = 0

_mean

_cov

data = []

mean()

cov()

var()

property std

deviation(x)

Computes the deviation of x in multiples of the standard deviation.

Parameters:: x –
Returns:

__add__(alpha)

__radd__(other)

__iadd__(other)

__mul__(alpha)

__rmul__(other)

__imul__(other)

static wasserstein_distance(d1: Gaussian, d2: Gaussian) → float

dim()

sample(n)

Return n samples from this Gaussian distribution.

Parameters:: n – number of samples
Returns:: array of shape (n,) for 1-D or (n, d) for d-dimensional Gaussians

property pdf

cdf(*x)

eval(lower, upper)

copy()

__eq__(other)

linreg()

Compute a 4-tuple <m, b, rss, noise> of a linear regression represented by this Gaussian.

Returns:

m – the slope of the line
b – the intercept of the line
rss – the residual sum-of-squares error
noise – the square of the sample correlation coefficient r^2

References:
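One standard way to read a regression line off a two-dimensional Gaussian over (x, y) is the least-squares relationship m = cov(x, y) / var(x) and b = mean(y) - m * mean(x), with r^2 = cov(x, y)^2 / (var(x) * var(y)). The sketch below applies these formulas to a plain mean vector and covariance matrix. Whether `linreg()` computes its values exactly this way is an assumption, the function name `line_from_gaussian` is hypothetical, and the rss term is omitted because it depends on the underlying data.

```python
from typing import Sequence, Tuple


def line_from_gaussian(mean: Sequence[float],
                       cov: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (m, b, r_squared) of the line y = m * x + b."""
    var_x, cov_xy, var_y = cov[0][0], cov[0][1], cov[1][1]
    if var_x == 0 or var_y == 0:
        raise ValueError("degenerate covariance matrix")
    m = cov_xy / var_x                 # slope
    b = mean[1] - m * mean[0]          # intercept: line passes through the mean
    r_squared = cov_xy ** 2 / (var_x * var_y)
    return m, b, r_squared


m, b, r2 = line_from_gaussian([1.0, 3.0], [[2.0, 4.0], [4.0, 8.0]])
```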

update_all(data, weights=None): Update the distribution with new data points given in data.

estimate(data, weights=None): Estimate the distribution parameters subject to the given data points.

update(x, w=1): Update the Gaussian distribution with a new data point x and weight w.

retract(x, w=1)

Retract the data point x with weight w from the Gaussian distribution.

If the data points are being kept in the distribution, x must actually exist and have the right weight associated with it. Otherwise, a ValueError will be raised.
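Incremental updating and retracting can be sketched for the univariate mean alone. The class below keeps only a running weight sum and mean, so unlike the behaviour described above it cannot verify that a retracted point was ever added. The class name `RunningMean` and its attributes are hypothetical and do not reflect the actual implementation, which also maintains the covariance.

```python
class RunningMean:
    """Weighted running mean supporting both update and retract."""

    def __init__(self) -> None:
        self.sum_w = 0.0   # total weight seen so far
        self.mean = 0.0

    def update(self, x: float, w: float = 1.0) -> None:
        self.sum_w += w
        # Move the mean towards x in proportion to the new point's weight.
        self.mean += (w / self.sum_w) * (x - self.mean)

    def retract(self, x: float, w: float = 1.0) -> None:
        remaining = self.sum_w - w
        if remaining < 0:
            raise ValueError("cannot retract more weight than was added")
        if remaining == 0:
            self.mean = 0.0
        else:
            # Undo the weighted contribution of x.
            self.mean = (self.mean * self.sum_w - w * x) / remaining
        self.sum_w = remaining


rm = RunningMean()
for x in (1.0, 2.0, 3.0):
    rm.update(x)
mean_all = rm.mean
rm.retract(3.0)
mean_after = rm.mean
```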

sym()

plot(engine=None, **kwargs) → Any

Plots the distribution using the given engine.

Parameters:

engine – Can be either one of ["plotly", "matplotlib"], or an instance of a rendering engine subclassing DistributionRendering.
kwargs – The keyword arguments to pass to the engine as defined in the .plot_gaussian() function of DistributionRendering or its respective subclass defined by engine.

Returns:

the figure object of the plotting engine

class jpt.distributions.Bool(**settings)

Bases: Multinomial

Wrapper class for Boolean domains and distributions.

values

labels

set(params: numpy.ndarray | float) → Bool

__setitem__(v, p)

class jpt.distributions.Distribution(**settings)

Abstract supertype of all domains and distributions

values: ValueMap = None

labels: ValueMap = None

SETTINGS

_cl = 'jpt.distributions.univariate.distribution.Distribution'

settings

__getattr__(name)

classmethod hash()

Abstractmethod:

__hash__()

__getitem__(value)

classmethod value2label(value)

Abstractmethod:

classmethod label2value(label)

Abstractmethod:

abstract _sample(n: int) → Iterable

abstract _sample_one()

sample(n: int) → Iterable

sample_one() → Any

abstract p(value) → float

abstract _p(value) → float

abstract mpe()

abstract crop(restriction: Set) → Distribution

abstract _crop(restriction: Set) → Distribution

abstract entropy() → float

static merge(distributions: Iterable[Distribution], weights: Iterable[numbers.Real]) → Distribution

Abstractmethod:

abstract update(dist: Distribution, weight: float) → Distribution

abstract fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: numbers.Integral = None) → Distribution

abstract _fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: numbers.Integral = None) → Distribution

abstract set(params: Any) → Distribution

abstract kl_divergence(other: Distribution)

abstract number_of_parameters() → int

static jaccard_similarity(d1: Distribution, d2: Distribution) → float

Abstractmethod:

abstract plot(engine: str, title: str = None, fname: str = None, directory: str = '/tmp', view: bool = False, **kwargs) → Any

Generates a plot of the distribution.

Parameters:

title – the name of the variable this distribution represents
fname – the name of the file to be stored. Available file formats: png, svg, jpeg, webp, html
directory – the directory to store the generated plot files
view – whether to display generated plots, default False (only stores files)

Returns:

the figure object of the plotting engine

abstract to_json()

__reduce__()

static type_from_json(data: Dict[str, Any]) → Type[Distribution]

static from_json(dtype: Dict[str, Any], dinst: Dict[str, Any] = None) → Distribution | Type[Distribution]