jpt.distributions

Submodules

Classes

Multinomial

Abstract supertype of all symbolic domains and distributions.

Numeric

Wrapper class for numeric domains and distributions.

ScaledNumeric

Scaled numeric distribution represented by mean and variance.

Integer

Abstract supertype of all domains and distributions

Gaussian

Extension of dnutils.stats.Gaussian

Bool

Wrapper class for Boolean domains and distributions.

Distribution

Abstract supertype of all domains and distributions

Functions

SymbolicType(name, labels) → Type[Multinomial]

NumericType(name, values) → Type[Numeric]

IntegerType(name, lmin, lmax) → Type[Integer]

Package Contents

class jpt.distributions.Multinomial(**settings)

Bases: jpt.distributions.univariate.Distribution

Abstract supertype of all symbolic domains and distributions.

values: MultinomialValueMap = None
labels: MultinomialValueMap = None
_params: numpy.ndarray | None = None
to_json: types.MethodType
classmethod hash()
classmethod value2label(value: int | Iterable[int]) jpt.base.utils.Symbol | Collection[jpt.base.utils.Symbol]
classmethod label2value(label: jpt.base.utils.Symbol | Collection[jpt.base.utils.Symbol]) int | Collection[int]
classmethod pfmt(max_values=10, labels_or_values='labels') str

Returns a pretty-formatted string representation of this class.

By default, a set notation with value labels is used. By setting labels_or_values to "values", the internal value representation is used. If the domain comprises more than max_values values, the middle part of the list of values is abbreviated by “…”.

property probabilities
n_values() int
__contains__(item)
classmethod equiv(other)
static jaccard_similarity(*d: Multinomial) float

Calculate the similarity of two or more Multinomial distributions.

\[\text{sim}(D_1, \ldots, D_n) = \frac{\sum_{x \in \text{dom}(D)} \min_{i} p_i(x)}{\sum_{x \in \text{dom}(D)} \max_{i} p_i(x)}\]

Adapted from the Jaccard coefficient:

\[\text{sim}(S_1, \ldots, S_n) = \frac{|\bigcap_{i=1}^{n} S_i|}{|\bigcup_{i=1}^{n} S_i|}\]

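The generalized Jaccard formula can be illustrated with a standalone sketch. It does not use the jpt API: each distribution is assumed to be a plain dict mapping labels to probabilities, and `jaccard_similarity` here is a hypothetical helper.

```python
from typing import Dict, Hashable


def jaccard_similarity(*dists: Dict[Hashable, float]) -> float:
    """Generalized Jaccard similarity of two or more discrete distributions.

    Hypothetical standalone helper, not the jpt implementation. Labels
    missing from a dict are treated as having probability 0.
    """
    if len(dists) < 2:
        raise ValueError("need at least two distributions")
    domain = set().union(*dists)  # union of all labels (dict keys)
    # Numerator: sum over labels of the smallest probability among all dists.
    num = sum(min(d.get(x, 0.0) for d in dists) for x in domain)
    # Denominator: sum over labels of the largest probability among all dists.
    den = sum(max(d.get(x, 0.0) for d in dists) for x in domain)
    return num / den


p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
sim = jaccard_similarity(p, q)  # (0.25 + 0.5) / (0.5 + 0.75)
```

Identical distributions have similarity 1, and distributions with disjoint support have similarity 0.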
mover_dist(other: Multinomial) float
similarity(other: Multinomial) float
distance(other: Multinomial) float
__getitem__(value)
__setitem__(label, p)
__eq__(other)
__str__()
__repr__()
sorted() Iterable[Tuple[float, jpt.base.utils.Symbol]]

Generate a sequence of (probability, label) pairs representing this distribution, ordered by descending probability.

_items() Iterable[Tuple[float, int]]

Generate a sequence of (probability, value) pairs representing this distribution.

items() Iterable[Tuple[float, jpt.base.utils.Symbol]]

Generate a sequence of (probability, label) pairs representing this distribution.

copy()
_pdf(value: int) float
pdf(label: jpt.base.utils.Symbol) float
p(event: jpt.base.utils.Symbol | Set[jpt.base.utils.Symbol] | List[jpt.base.utils.Symbol] | Tuple[jpt.base.utils.Symbol] | numpy.ndarray) float

Compute the probability of a certain event given this multinomial distribution.

An event can be an atomic random event or a disjunction thereof. For example, given the domain values {'Head', 'Tail'}, each of the following is valid:

dist.p('Head')
dist.p({'Tail'})
dist.p({'Head', 'Tail'})

Parameters:

event – the event in label space whose probability is to be computed.

Returns:

the probability of the event

_p(event: int | Set[int] | List[int] | Tuple[int] | numpy.ndarray) float

Compute the probability of a certain event given this multinomial distribution.

See also Multinomial.p()

Parameters:

event – the event in value space whose probability is to be computed.

Returns:

the probability of the event
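A minimal sketch of how atomic and disjunctive events can be evaluated, assuming string labels and a plain dict of label probabilities in place of a Multinomial. The helper `p` below is hypothetical; the value-space variant `_p` works the same way on integer indices instead of labels.

```python
from typing import Dict, Iterable, Union


def p(probs: Dict[str, float], event: Union[str, Iterable[str]]) -> float:
    """Probability of an atomic event (one label) or a disjunction (set) of labels.

    Hypothetical helper; labels are assumed to be strings.
    """
    if isinstance(event, str):
        return probs[event]  # atomic event
    # Disjunction: outcomes are mutually exclusive, so probabilities add up.
    # Converting to a set ensures each label is counted only once.
    return sum(probs[label] for label in set(event))


coin = {"Head": 0.3, "Tail": 0.7}
p_head = p(coin, "Head")
p_any = p(coin, {"Head", "Tail"})
```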

create_dirac_impulse(value: int) Multinomial

Create a modified version of this distribution in which value has probability 1 and all other values have probability 0.

Parameters:

value – the singular value to get assigned prob 1.

Returns:

the created distribution object

_sample(n: int) Iterable[int]

Return n sample values, each drawn according to its respective probability.

_sample_one() jpt.base.utils.Symbol

Returns one sample value according to its probability
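Drawing samples according to the value probabilities can be sketched with NumPy's `Generator.choice`. The helper `sample` and the probability vector are illustrative assumptions, not the jpt implementation of `_sample`.

```python
import numpy as np


def sample(probs, n, seed=None):
    """Draw n value indices, each index i with probability probs[i].

    Hypothetical helper mirroring the idea of _sample, not its code.
    """
    rng = np.random.default_rng(seed)
    # choice() picks from range(len(probs)) using probs as weights.
    return rng.choice(len(probs), size=n, p=probs)


draws = sample([0.2, 0.5, 0.3], n=10_000, seed=0)
```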

_expectation() Set[int]

Return the set of values with the highest probability for this variable.

expectation() Set[jpt.base.utils.Symbol]

For symbolic variables, the expectation is equal to the MPE.

Returns:

the set of all most likely values

mpe() Tuple[Set[jpt.base.utils.Symbol], float]
_mpe() Tuple[Set[int], float]

Calculate the most probable configuration of this distribution in value space.

Returns:

The MPE itself as a set and its likelihood as a float
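A sketch of the MPE computation for a discrete distribution, returning every value that attains the maximum probability as a set, in line with the documented return type. The dict-based helper `mpe` is hypothetical, and detecting ties by exact float equality is an assumption.

```python
from typing import Dict, Hashable, Set, Tuple


def mpe(probs: Dict[Hashable, float]) -> Tuple[Set[Hashable], float]:
    """Return all most probable labels (ties included) and their probability."""
    best = max(probs.values())
    # Exact equality is used to detect ties.
    return {label for label, p in probs.items() if p == best}, best


labels, likelihood = mpe({"a": 0.4, "b": 0.4, "c": 0.2})
```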

_k_mpe(k: int = None) List[Tuple[Set[jpt.base.utils.Symbol], float]]
k_mpe(k: int | None = None) List[Tuple[Set[jpt.base.utils.Symbol], float]]
mode() Set
_mode() Set
kl_divergence(other: Multinomial) float

Compute the KL divergence of this distribution to the other distribution.
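The discrete KL divergence can be sketched as follows. It assumes that self plays the role of p and other the role of q in KL(p || q) and uses the natural logarithm; both the direction and the log base used by jpt are assumptions, and the helper is hypothetical.

```python
import math
from typing import Dict, Hashable


def kl_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """KL(p || q) = sum_x p(x) * ln(p(x) / q(x)) for discrete distributions."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # by convention, 0 * ln(0 / q(x)) = 0
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # p has mass where q has none
        total += px * math.log(px / qx)
    return total
```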

_crop(restriction: int | Collection[int]) Multinomial
crop(restriction: jpt.base.utils.Symbol | Collection[jpt.base.utils.Symbol]) Multinomial

Apply a restriction to this distribution such that all values are in the given set.

Parameters:

restriction – the values to retain

Returns:

Copy of self that is consistent with the restriction
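A sketch of cropping a discrete distribution, assuming that the retained probabilities are renormalized to sum to 1 (as the Numeric.crop documentation describes for its pdf). The helper `crop` and its error on zero mass are hypothetical.

```python
from typing import Collection, Dict, Hashable


def crop(probs: Dict[Hashable, float], restriction: Collection[Hashable]) -> Dict[Hashable, float]:
    """Keep only labels in restriction and renormalize (hypothetical helper)."""
    mass = sum(p for x, p in probs.items() if x in restriction)
    if mass == 0.0:
        raise ValueError("the restriction has probability 0")
    # Labels outside the restriction get probability 0; the others keep
    # their relative proportions.
    return {x: (p / mass if x in restriction else 0.0) for x, p in probs.items()}


cropped = crop({"a": 0.2, "b": 0.3, "c": 0.5}, {"a", "b"})
```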

_fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: int = None) Multinomial
set(params: Iterable[numbers.Real]) Multinomial
update(dist: Multinomial, weight: float) Multinomial

Update this multinomial distribution with dist and weight.

The resulting distribution will be a weighted mean of self and dist, where self will have a weight of (1-weight), and dist will have a weight of weight.

Parameters:
  • dist – the update distribution

  • weight – the weight

Returns:

the updated distribution
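The documented update rule is a convex combination of the two parameter vectors. A minimal NumPy sketch with a hypothetical `update` helper that operates on probability arrays rather than Multinomial objects; the range check on weight is an added assumption.

```python
import numpy as np


def update(params: np.ndarray, other: np.ndarray, weight: float) -> np.ndarray:
    """Weighted mean of two probability vectors: (1 - weight) * params + weight * other."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")  # assumption, not documented
    return (1.0 - weight) * params + weight * other


result = update(np.array([0.5, 0.5]), np.array([1.0, 0.0]), weight=0.2)
```

Because both inputs sum to 1 and the weights sum to 1, the result is again a valid distribution.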

static merge(distributions: Iterable[Multinomial], weights: Iterable[float]) Multinomial

Merge the distributions under consideration of weights.

Parameters:
  • distributions – the distributions to merge

  • weights – the weight of each distribution

Returns:

the merged distribution
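Merging can be read as forming a weighted mixture. The sketch assumes all distributions share the same domain and normalizes the weights so the result sums to 1; whether jpt normalizes the weights itself is an assumption, and `merge` here is a hypothetical helper.

```python
from typing import Sequence

import numpy as np


def merge(distributions: Sequence[Sequence[float]], weights: Sequence[float]) -> np.ndarray:
    """Weighted mixture of probability vectors over the same domain."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so that the mixture sums to 1 again
    stacked = np.asarray(distributions, dtype=float)  # one row per distribution
    return w @ stacked  # sum_i w[i] * stacked[i]


mixture = merge([[1.0, 0.0], [0.0, 1.0]], weights=[3, 1])
```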

classmethod type_to_json()
inst_to_json()
static type_from_json(data)
classmethod from_json(data)
is_dirac_impulse()
number_of_parameters() int
Returns:

The number of relevant parameters of this distribution: 1 if it is a Dirac impulse, otherwise the full number of parameters.

plot(engine=None, **kwargs) Any

Plots the distribution using the given engine.

Parameters:
  • engine – Can be either one of ["plotly", "matplotlib"], or an instance of a rendering engine subclassing DistributionRendering.

  • kwargs – The keyword arguments to pass to the engine as defined in the .plot_multinomial() function of DistributionRendering or its respective subclass defined by engine.

Returns:

the figure object of the plotting engine

class jpt.distributions.Numeric(**settings)

Bases: jpt.distributions.univariate.Distribution

Wrapper class for numeric domains and distributions.

PRECISION = 'precision'
values
labels
SETTINGS
_quantile: jpt.distributions.qpd.QuantileDistribution = None
to_json
classmethod hash()
__str__()
__getitem__(value)
__eq__(o: Numeric)
classmethod value2label(value: float | jpt.base.intervals.NumberSet) float | jpt.base.intervals.NumberSet
classmethod label2value(label: numbers.Real | jpt.base.intervals.NumberSet) numbers.Real | jpt.base.intervals.NumberSet
classmethod equiv(other)
property cdf
property pdf
property ppf
approximate_fast(eps: float)
_sample(n)
_sample_one()
number_of_parameters() int
Returns:

The number of relevant parameters of this distribution: 1 if it is a Dirac impulse, otherwise twice the number of intervals.

_expectation() float
_variance() float
expectation() float
variance() float
quantile(gamma: numbers.Real) numbers.Real
create_dirac_impulse(value)

Create a Dirac impulse at the given value, represented as a quantile distribution.

is_dirac_impulse() bool

Checks if this distribution is a dirac impulse.

mpe()
_mpe(value_transform: Callable | None = None)

Calculate the most probable configuration of this quantile distribution.

Returns:

The MPE itself as a UnionSet and its likelihood as a float

_k_mpe(k: int | None = None) List[Tuple[jpt.base.intervals.NumberSet, float]]

Calculate the k most probable explanation states.

Parameters:

k – The number of solutions to generate, defaults to the maximum possible number.

Returns:

A list of (state, likelihood) tuples in descending order of likelihood.

k_mpe(k: int | None = None) List[Tuple[jpt.base.intervals.NumberSet, float]]

Calculate the k most probable explanation states.

Parameters:

k – The number of solutions to generate, defaults to the maximum possible number.

Returns:

A list of (state, likelihood) tuples in descending order of likelihood.

_fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: numbers.Integral = None) Numeric
fit
set(params: jpt.distributions.qpd.QuantileDistribution) Numeric
_p(value: numbers.Number | jpt.base.intervals.NumberSet) numbers.Real
p(labels: numbers.Number | jpt.base.intervals.NumberSet | List[float]) numbers.Real
kl_divergence(other: Numeric) numbers.Real
copy()
static merge(distributions: Iterable[Numeric], weights: Iterable[numbers.Real]) Numeric
update(dist: Numeric, weight: float) Numeric
crop(restriction: jpt.base.intervals.NumberSet | numbers.Number) Numeric
_crop(restriction: jpt.base.intervals.NumberSet | numbers.Number) Numeric

Apply a restriction to this distribution. The restricted distribution assigns mass only to the given range and preserves the relative shape of the pdf.

Parameters:

restriction (float or int or ContinuousSet) – the range to which this distribution is limited, or a singular value
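To illustrate what preserving the relative shape of the pdf means, here is a sketch for a piecewise-constant density given by bin edges and per-bin densities. That representation and the helper `crop_piecewise` are assumptions for illustration; only interval restrictions are handled, not singular values.

```python
import numpy as np


def crop_piecewise(edges, densities, lo, hi):
    """Restrict a piecewise-constant pdf to [lo, hi] and renormalize it.

    edges: sorted bin boundaries (k + 1 values); densities: k per-bin values.
    Returns the new edges and densities. Hypothetical helper.
    """
    edges = np.asarray(edges, dtype=float)
    dens = np.asarray(densities, dtype=float)
    # Clip every bin to the restriction interval.
    left = np.clip(edges[:-1], lo, hi)
    right = np.clip(edges[1:], lo, hi)
    widths = right - left
    mass = float(np.sum(dens * widths))
    if mass == 0.0:
        raise ValueError("the restriction has probability mass 0")
    keep = widths > 0  # bins that overlap the restriction
    new_edges = np.append(left[keep], right[keep][-1])
    # Dividing every density by the same constant preserves their ratios.
    return new_edges, dens[keep] / mass


new_edges, new_dens = crop_piecewise([0.0, 1.0, 2.0], [0.25, 0.75], lo=0.5, hi=2.0)
```

The density ratio between the two remaining bins stays 3, while the total mass on [0.5, 2] is rescaled to 1.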

classmethod type_to_json()
inst_to_json()
static from_json(data)
classmethod type_from_json(data: Dict[str, Any])
insert_convex_fragments(left: jpt.base.intervals.ContinuousSet | None, right: jpt.base.intervals.ContinuousSet | None, number_of_samples: int)

Insert fragments of distributions to the left and right of this distribution. This should only be used to create a convex hull around the JPT's domain in which the density is never 0.

Parameters:
  • left – the left (lower) interval to add if needed, None otherwise

  • right – the right (upper) interval to add if needed, None otherwise

  • number_of_samples – The number of samples to use as basis for the weight

classmethod cumsum(distributions: Iterable[Numeric], error_max: float = np.inf, n_segments: int = None) Iterable[Numeric]

Generator yielding the distributions that correspond to the cumulative sums of the passed distributions.

Parameters:
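  • distributions – the distributions whose cumulative sums are computed

  • error_max

  • n_segments

Returns:

a generator over the distributions of the cumulative sums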

moment(order: int, center: float) float
_moment(order: int, center: float, value_transform: Callable | None = None) float

Calculate the moment of order r (the order parameter) about the center c:

\[\int (x - c)^{r} \, p(x) \, dx\]
cf. https://en.wikipedia.org/wiki/Central_moment

https://gregorygundersen.com/blog/2020/04/11/moments/

Parameters:
  • order – The order of the moment to calculate

  • center – the constant c subtracted from x before raising to the power r. If center is 0, the result is the raw moment of that order. If center is set to the distribution's mean (i.e., its expectation, self._moment(1, 0)), the result is the central moment of the distribution.
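For a piecewise-constant density the moment integral has a closed form on each bin. The sketch assumes that representation (bin edges plus per-bin densities) and a hypothetical helper `moment_piecewise`; it is not jpt's implementation.

```python
import numpy as np


def moment_piecewise(edges, densities, order: int, center: float = 0.0) -> float:
    """Integral of (x - center)**order * p(x) dx for a piecewise-constant pdf p.

    On a bin [a, b] with constant density d the integral equals
    d * ((b - c)**(r + 1) - (a - c)**(r + 1)) / (r + 1).
    """
    edges = np.asarray(edges, dtype=float)
    a = edges[:-1] - center  # left bin boundaries, shifted by the center
    b = edges[1:] - center   # right bin boundaries, shifted by the center
    d = np.asarray(densities, dtype=float)
    r = order
    return float(np.sum(d * (b ** (r + 1) - a ** (r + 1)) / (r + 1)))


# Uniform distribution on [0, 1]: a single bin with density 1.
mean = moment_piecewise([0.0, 1.0], [1.0], order=1)
variance = moment_piecewise([0.0, 1.0], [1.0], order=2, center=mean)
```

With center 0 and order 1 this yields the mean; centering the second moment on that mean yields the variance, 1/12 for the uniform distribution.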

__add__(other: Numeric) Numeric
__sub__(other: Numeric) Numeric
approximate(error_max: float = None, n_segments: int = None) Numeric
static wasserstein_distance(d1: Numeric, d2: Numeric) float
distance(other: Numeric) float
static jaccard_similarity(d1: Numeric, d2: Numeric) float
similarity(other: Numeric) float
entropy() float
plot(engine=None, **kwargs) Any

Plots the distribution using the given engine.

Parameters:
  • engine – Can be either one of ["plotly", "matplotlib"], or an instance of a rendering engine subclassing DistributionRendering.

  • kwargs – The keyword arguments to pass to the engine as defined in the .plot_numeric() function of DistributionRendering or its respective subclass defined by engine.

Returns:

the figure object of the plotting engine

class jpt.distributions.ScaledNumeric(**settings)

Bases: Numeric

Scaled numeric distribution represented by mean and variance.

classmethod type_to_json()
to_json
static type_from_json(data)
classmethod from_json(data)
jpt.distributions.SymbolicType(name: str, labels: Iterable[Any]) Type[Multinomial]
jpt.distributions.NumericType(name: str, values: Iterable[float] = None) Type[Numeric]
class jpt.distributions.Integer(**settings)

Bases: jpt.distributions.univariate.Distribution

Abstract supertype of all domains and distributions

values: IntegerLabelToValueMap | None
labels
OPEN_DOMAIN = 'open_domain'
AUTO_DOMAIN = 'auto_domain'
SETTINGS
min() int | None
max() int | None
_min() int | None
_max() int | None
_params: Dict[int, float] | None = None
to_json: types.FunctionType
classmethod hash()
__add__(other: Integer) Integer
__neg__() Integer
property cdf: jpt.base.functions.PiecewiseFunction
add(other: Integer, name: str | None = None) Integer
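Adding two integer-valued random variables corresponds to convolving their probability mass functions, provided the variables are independent. A NumPy sketch assuming independence and dense PMF arrays that start at a known minimum value; `add_pmfs` is hypothetical, and whether `Integer.add` assumes independence is not stated in this reference.

```python
import numpy as np


def add_pmfs(p, p_min: int, q, q_min: int):
    """PMF of X + Y for independent integer-valued X and Y (hypothetical helper).

    p[i] = P(X = p_min + i) and q[j] = P(Y = q_min + j). Returns the PMF of
    X + Y together with the smallest value of its support.
    """
    # P(X + Y = k) = sum_i P(X = i) * P(Y = k - i): a discrete convolution.
    return np.convolve(p, q), p_min + q_min


die = np.full(6, 1 / 6)                  # fair die on 1..6
pmf, lowest = add_pmfs(die, 1, die, 1)   # sum of two dice on 2..12
```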
classmethod equiv(other: Type[jpt.distributions.univariate.Distribution]) bool
classmethod type_to_json() Dict[str, Any]
inst_to_json() Dict[str, Any]
static type_from_json(data)
classmethod from_json(data: Dict[str, Any]) Integer
copy()
property probabilities: Dict[int, float]
n_values() int | None
classmethod value2label(value: int | Iterable[int] | jpt.base.intervals.IntSet | jpt.base.intervals.UnionSet) int | Iterable[int] | jpt.base.intervals.IntSet | jpt.base.intervals.UnionSet
classmethod label2value(label: int | Iterable[int] | jpt.base.intervals.IntSet | jpt.base.intervals.UnionSet) int | Iterable[int] | jpt.base.intervals.IntSet | jpt.base.intervals.UnionSet
_sample(n: int) Iterable[int]
_sample_one() int
sample(n: int) Iterable[int]
sample_one() int
property _pdf: types.FunctionType
property pdf: types.FunctionType
p(labels: int | Iterable[int]) float
_p(values: int | Iterable[int]) float
expectation() float
_expectation() float
variance() float
_variance() float
_k_mpe(k: int | None = None) Iterable[Tuple[jpt.base.intervals.NumberSet, float]]

Calculate the k most probable explanation states.

Parameters:

k – The number of solutions to generate

Returns:

A list of (state, likelihood) tuples in descending order of likelihood.
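A sketch of k-MPE for a discrete distribution, assuming that values with identical probability are grouped into one state (consistent with the set-valued states in the return types). The helper `k_mpe` and the dict-based PMF are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def k_mpe(pmf: Dict[int, float], k: Optional[int] = None) -> List[Tuple[Set[int], float]]:
    """Return up to k (state, likelihood) pairs in descending order of likelihood.

    Values sharing the same probability form one state (hypothetical helper).
    """
    groups: Dict[float, Set[int]] = defaultdict(set)
    for value, prob in pmf.items():
        groups[prob].add(value)
    ranked = sorted(((vals, prob) for prob, vals in groups.items()),
                    key=lambda pair: pair[1], reverse=True)
    return ranked if k is None else ranked[:k]
```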

k_mpe(k: int = None) Iterable[Tuple[jpt.base.intervals.NumberSet, float]]
mpe()
_mpe()
mode()
_mode()
crop(restriction: jpt.base.intervals.NumberSet | int) Integer
_crop(restriction: jpt.base.intervals.NumberSet | int) Integer
static merge(distributions: Iterable[Integer], weights: Iterable[numbers.Real]) Integer
update(dist: Integer, weight: int) Integer
fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: int = None) Integer
_fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: int = None) Integer
_set(params: Dict[int, float] or Iterable[float]) Integer
set(params: Dict[int, float] or Iterable[float]) Integer
__eq__(other) bool
__str__()
__repr__()
infinite() bool
finite() bool
_sorted(exhaustive: bool = True, reverse: bool = False, max_items: int = None) Iterable[Tuple[int, float]]
sorted(exhaustive: bool = True, reverse: bool = False, max_items: int = None) Iterable[Tuple[int, float]]
_items(exhaustive: bool = False, max_items: int = None) Iterable[Tuple[int, float]]

Return a list of (value, probability) pairs representing this distribution.

items(exhaustive: bool = True, max_items: int = None) Iterable[Tuple[int, float]]

Return a list of (label, probability) pairs representing this distribution.

kl_divergence(other: Integer) float
number_of_parameters() int
moment(order: int = 1, center: float = 0) float

Calculate the moment of order r (the order parameter) about the center c:

\[\sum_{x} (x - c)^{r} \, p(x)\]
Parameters:
  • order – The order of the moment to calculate

  • center – the constant c subtracted from x before raising to the power r
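For an integer-valued distribution with finite support, the moment is a finite sum. A minimal sketch with a hypothetical dict-based `moment` helper, computing the mean and variance of a fair die:

```python
def moment(pmf: dict, order: int = 1, center: float = 0.0) -> float:
    """Sum over x of (x - center)**order * P(X = x) for a finite PMF."""
    return sum((x - center) ** order * p for x, p in pmf.items())


die = {x: 1 / 6 for x in range(1, 7)}          # fair die
mean = moment(die, order=1)                    # raw first moment
variance = moment(die, order=2, center=mean)   # second central moment
```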

static wasserstein_distance(d1: Integer, d2: Integer) float
distance(other: Integer) float
static jaccard_similarity(d1: Integer, d2: Integer) float
similarity(other: Integer) float
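The distance methods above are undocumented. As an illustration of one common choice, the 1-Wasserstein distance between two distributions on the integers equals the sum of absolute CDF differences. Whether jpt uses W1 is an assumption, and `wasserstein_1` is a hypothetical helper.

```python
import numpy as np


def wasserstein_1(p: dict, q: dict) -> float:
    """W1 distance between two finite PMFs on the integers (hypothetical helper).

    With unit spacing between integers, W1 = sum_k |F_p(k) - F_q(k)|.
    """
    support = range(min(min(p), min(q)), max(max(p), max(q)) + 1)
    # CDFs of both distributions evaluated on the joint integer support.
    cdf_p = np.cumsum([p.get(k, 0.0) for k in support])
    cdf_q = np.cumsum([q.get(k, 0.0) for k in support])
    return float(np.sum(np.abs(cdf_p - cdf_q)))
```

For two point masses the result is simply the distance between their locations.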
plot(engine=None, **kwargs) Any

Plots the distribution using the given engine.

Parameters:
  • engine – Can be either one of ["plotly", "matplotlib"], or an instance of a rendering engine subclassing DistributionRendering.

  • kwargs – The keyword arguments to pass to the engine as defined in the .plot_integer() function of DistributionRendering or its respective subclass defined by engine.

Returns:

the figure object of the plotting engine

jpt.distributions.IntegerType(name: str, lmin: int | None = None, lmax: int | None = None) Type[Integer]
class jpt.distributions.Gaussian(mean=None, cov=None, data=None, weights=None)

Bases: dnutils.stats.Gaussian

Extension of dnutils.stats.Gaussian

Creates a new Gaussian distribution.

Parameters:
  • mean (float if univariate else [float] if multivariate) – the mean of the Gaussian

  • cov (float if univariate else [[float]] if multivariate) – the covariance of the Gaussian

  • data ([[float]]) – if mean and cov are not provided, data may be a data set (matrix) from which the parameters of the distribution are estimated.

  • weights ([float]) – [optional] weights for the data points. The weights do not need to be normalized.

PRECISION = 1e-15
_cl = 'jpt.distributions.univariate.gaussian.Gaussian'
_sum_w = 0
_sum_w_sq = 0
_mean
_cov
data = []
mean()
cov()
var()
property std
deviation(x)

Computes the deviation of x in multiples of the standard deviation.

Parameters:

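x – the point whose deviation from the mean is computed

Returns:

the deviation of x from the mean, in multiples of the standard deviation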
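In the univariate case, the deviation in multiples of the standard deviation is the z-score. A sketch assuming a signed result and scalar mean and variance; the helper is hypothetical, and the multivariate behavior of `deviation` is not covered.

```python
import math


def deviation(x: float, mean: float, variance: float) -> float:
    """Signed distance of x from the mean in units of the standard deviation."""
    return (x - mean) / math.sqrt(variance)


z = deviation(7.0, mean=5.0, variance=4.0)  # (7 - 5) / 2
```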

__add__(alpha)
__radd__(other)
__iadd__(other)
__mul__(alpha)
__rmul__(other)
__imul__(other)
static wasserstein_distance(d1: Gaussian, d2: Gaussian) float
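For univariate Gaussians the 2-Wasserstein distance has a closed form, W2² = (μ1 - μ2)² + (σ1 - σ2)². Whether `Gaussian.wasserstein_distance` computes W2 or another order is an assumption; the helper below is hypothetical.

```python
import math


def wasserstein_2_gaussian(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """W2 distance between N(mu1, sigma1**2) and N(mu2, sigma2**2) in one dimension."""
    return math.sqrt((mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2)
```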
dim()
sample(n)

Return n samples from this Gaussian distribution.

Parameters:

n – number of samples

Returns:

array of shape (n,) for 1-D or (n, d) for d-dimensional Gaussians

property pdf
cdf(*x)
eval(lower, upper)
copy()
__eq__(other)
linreg()

Compute a 4-tuple <m, b, rss, noise> of a linear regression represented by this Gaussian.

Returns:

  • m – the slope of the line

  • b – the intercept of the line

  • rss – the residual sum-of-squares error

  • noise – the square of the sample correlation coefficient, r^2
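The four quantities can be sketched from the mean and covariance of two-dimensional data points (x, y). This assumes population (biased) covariance, computes rss directly from the data points, and follows the documented definition of noise as r²; the helper `linreg` and the returned tuple layout are illustrative assumptions.

```python
import numpy as np


def linreg(data: np.ndarray):
    """Return (m, b, rss, r2) for y = m * x + b fitted to 2-D points (x, y)."""
    x, y = data[:, 0], data[:, 1]
    cov = np.cov(x, y, bias=True)        # population covariance matrix
    var_x, var_y, cov_xy = cov[0, 0], cov[1, 1], cov[0, 1]
    m = cov_xy / var_x                   # slope
    b = y.mean() - m * x.mean()          # intercept
    rss = float(np.sum((y - (m * x + b)) ** 2))  # residual sum of squares
    r2 = cov_xy ** 2 / (var_x * var_y)   # squared correlation coefficient
    return m, b, rss, r2


points = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])  # y = 2x + 1
m, b, rss, r2 = linreg(points)
```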

update_all(data, weights=None)

Update the distribution with new data points given in data.

estimate(data, weights=None)

Estimate the distribution parameters from the given data points.

update(x, w=1)

Update the Gaussian distribution with a new data point x and weight w.

retract(x, w=1)

Retract the data point x with weight w from the Gaussian distribution.

In case the data points are being kept in the distribution, it must actually exist and have the right weight associated. Otherwise, a ValueError will be raised.
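Incremental update and retraction can be sketched with West's weighted variant of Welford's running-variance algorithm, where retraction inverts a single update step. This univariate class is hypothetical: it keeps no data points, so unlike retract above it cannot check that a retracted point actually exists.

```python
class RunningGaussian1D:
    """Univariate weighted running mean and variance with retraction.

    Hypothetical class for illustration; it stores no data points.
    """

    def __init__(self) -> None:
        self._reset()

    def _reset(self) -> None:
        self.sum_w = 0.0
        self.mean = 0.0
        self.m2 = 0.0  # weighted sum of squared deviations from the mean

    def update(self, x: float, w: float = 1.0) -> None:
        """Add data point x with weight w."""
        self.sum_w += w
        delta = x - self.mean
        self.mean += (w / self.sum_w) * delta
        self.m2 += w * delta * (x - self.mean)

    def retract(self, x: float, w: float = 1.0) -> None:
        """Remove data point x with weight w by inverting update()."""
        remaining = self.sum_w - w
        if remaining <= 0.0:
            self._reset()  # the last point was removed
            return
        old_mean = (self.sum_w * self.mean - w * x) / remaining
        self.m2 -= w * (x - old_mean) * (x - self.mean)
        self.sum_w, self.mean = remaining, old_mean

    @property
    def var(self) -> float:
        """Weighted population variance."""
        return self.m2 / self.sum_w


g = RunningGaussian1D()
for value in [1.0, 2.0, 3.0, 4.0]:
    g.update(value)
g.retract(4.0)  # now equivalent to having seen only 1, 2, 3
```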

sym()
plot(engine=None, **kwargs) Any

Plots the distribution using the given engine.

Parameters:
  • engine – Can be either one of ["plotly", "matplotlib"], or an instance of a rendering engine subclassing DistributionRendering.

  • kwargs – The keyword arguments to pass to the engine as defined in the .plot_gaussian() function of DistributionRendering or its respective subclass defined by engine.

Returns:

the figure object of the plotting engine

class jpt.distributions.Bool(**settings)

Bases: Multinomial

Wrapper class for Boolean domains and distributions.

values
labels
set(params: numpy.ndarray | float) Bool
__setitem__(v, p)
class jpt.distributions.Distribution(**settings)

Abstract supertype of all domains and distributions

values: ValueMap = None
labels: ValueMap = None
SETTINGS
_cl = 'jpt.distributions.univariate.distribution.Distribution'
settings
__getattr__(name)
abstract classmethod hash()

__hash__()
__getitem__(value)
abstract classmethod value2label(value)

abstract classmethod label2value(label)

abstract _sample(n: int) Iterable
abstract _sample_one()
sample(n: int) Iterable
sample_one() Any
abstract p(value) float
abstract _p(value) float
abstract mpe()
abstract crop(restriction: Set) Distribution
abstract _crop(restriction: Set) Distribution
abstract entropy() float
abstract static merge(distributions: Iterable[Distribution], weights: Iterable[numbers.Real]) Distribution

abstract update(dist: Distribution, weight: float) Distribution
abstract fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: numbers.Integral = None) Distribution
abstract _fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: numbers.Integral = None) Distribution
abstract set(params: Any) Distribution
abstract kl_divergence(other: Distribution)
abstract number_of_parameters() int
abstract static jaccard_similarity(d1: Distribution, d2: Distribution) float

abstract plot(engine: str, title: str = None, fname: str = None, directory: str = '/tmp', view: bool = False, **kwargs) Any

Generates a plot of the distribution.

Parameters:
  • title – the name of the variable this distribution represents

  • fname – the name of the file to be stored. Available file formats: png, svg, jpeg, webp, html

  • directory – the directory to store the generated plot files

  • view – whether to display generated plots, default False (only stores files)

Returns:

the figure object of the plotting engine

abstract to_json()
__reduce__()
static type_from_json(data: Dict[str, Any]) Type[Distribution]
static from_json(dtype: Dict[str, Any], dinst: Dict[str, Any] = None) Distribution | Type[Distribution]