jpt.distributions.univariate

Classes

`Distribution`	Abstract supertype of all domains and distributions
`Numeric`	Wrapper class for numeric domains and distributions.
`ScaledNumeric`	Scaled numeric distribution represented by mean and variance.
`Integer`	Abstract supertype of all integer domains and distributions.
`Multinomial`	Abstract supertype of all symbolic domains and distributions.
`Bool`	Wrapper class for Boolean domains and distributions.
`Gaussian`	Extension of `dnutils.stats.Gaussian`

Functions

`NumericType`(→ Type[Numeric])
`IntegerType`(→ Type[Integer])
`SymbolicType`(→ Type[Multinomial])

Package Contents

class jpt.distributions.univariate.Distribution(**settings)

Abstract supertype of all domains and distributions

values: ValueMap = None

labels: ValueMap = None

SETTINGS

_cl = 'jpt.distributions.univariate.distribution.Distribution'

settings

__getattr__(name)

abstract classmethod hash()

__hash__()

__getitem__(value)

abstract classmethod value2label(value)

abstract classmethod label2value(label)

abstract _sample(n: int) → Iterable

abstract _sample_one()

sample(n: int) → Iterable

sample_one() → Any

abstract p(value) → float

abstract _p(value) → float

abstract mpe()

abstract crop(restriction: Set) → Distribution

abstract _crop(restriction: Set) → Distribution

abstract entropy() → float

abstract static merge(distributions: Iterable[Distribution], weights: Iterable[numbers.Real]) → Distribution

abstract update(dist: Distribution, weight: float) → Distribution

abstract fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: numbers.Integral = None) → Distribution

abstract _fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: numbers.Integral = None) → Distribution

abstract set(params: Any) → Distribution

abstract kl_divergence(other: Distribution)

abstract number_of_parameters() → int

abstract static jaccard_similarity(d1: Distribution, d2: Distribution) → float

abstract plot(engine: str, title: str = None, fname: str = None, directory: str = '/tmp', view: bool = False, **kwargs) → Any

Generates a plot of the distribution.

Parameters:

title – the name of the variable this distribution represents
fname – the name of the file to be stored. Available file formats: png, svg, jpeg, webp, html
directory – the directory to store the generated plot files
view – whether to display generated plots, default False (only stores files)

Returns:

the figure object of the plotting engine

abstract to_json()

__reduce__()

static type_from_json(data: Dict[str, Any]) → Type[Distribution]

static from_json(dtype: Dict[str, Any], dinst: Dict[str, Any] = None) → Distribution | Type[Distribution]

class jpt.distributions.univariate.Numeric(**settings)

Bases: jpt.distributions.univariate.Distribution

Wrapper class for numeric domains and distributions.

PRECISION = 'precision'

values

labels

SETTINGS

_quantile: jpt.distributions.qpd.QuantileDistribution = None

to_json

classmethod hash()

__str__()

__getitem__(value)

__eq__(o: Numeric)

classmethod value2label(value: float | jpt.base.intervals.NumberSet) → float | jpt.base.intervals.NumberSet

classmethod label2value(label: numbers.Real | jpt.base.intervals.NumberSet) → numbers.Real | jpt.base.intervals.NumberSet

classmethod equiv(other)

property cdf

property pdf

property ppf

approximate_fast(eps: float)

_sample(n)

_sample_one()

number_of_parameters() → int

Returns: The number of relevant parameters of this distribution: 1 if it is a Dirac impulse, otherwise twice the number of intervals.

_expectation() → float

_variance() → float

expectation() → float

variance() → float

quantile(gamma: numbers.Real) → numbers.Real

create_dirac_impulse(value): Create a dirac impulse at the given value aus quantile distribution.

is_dirac_impulse() → bool: Checks if this distribution is a dirac impulse.

mpe()

_mpe(value_transform: Callable | None = None)

Calculate the most probable configuration of this quantile distribution.

Returns: The MPE itself as a UnionSet and the likelihood of the MPE as a float.

_k_mpe(k: int | None = None) → List[Tuple[jpt.base.intervals.NumberSet, float]]

Calculate the k most probable explanation states.

Parameters: k – The number of solutions to generate; defaults to the maximum possible number.
Returns: A list of tuples pairing each state with its likelihood, in descending order of likelihood.

k_mpe(k: int | None = None) → List[Tuple[jpt.base.intervals.NumberSet, float]]

Calculate the k most probable explanation states.

Parameters: k – The number of solutions to generate; defaults to the maximum possible number.
Returns: A list of tuples pairing each state with its likelihood, in descending order of likelihood.

_fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: numbers.Integral = None) → Numeric

fit

set(params: jpt.distributions.qpd.QuantileDistribution) → Numeric

_p(value: numbers.Number | jpt.base.intervals.NumberSet) → numbers.Real

p(labels: numbers.Number | jpt.base.intervals.NumberSet | List[float]) → numbers.Real

kl_divergence(other: Numeric) → numbers.Real

copy()

static merge(distributions: Iterable[Numeric], weights: Iterable[numbers.Real]) → Numeric

update(dist: Numeric, weight: float) → Numeric

crop(restriction: jpt.base.intervals.NumberSet | numbers.Number) → Numeric

_crop(restriction: jpt.base.intervals.NumberSet | numbers.Number) → Numeric

Apply a restriction to this distribution. The restricted distribution assigns mass only to the given range and preserves the relative shape of the pdf within that range, i.e., the remaining density is renormalized.

Parameters: restriction (float or int or ContinuousSet) – The range (or single value) to which this distribution is restricted
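To illustrate the renormalization that `_crop` describes, here is a minimal sketch on a hypothetical piecewise-uniform representation, a list of `(lower, upper, mass)` segments. This representation and the function name `crop_piecewise_uniform` are assumptions for illustration only; jpt's internal `QuantileDistribution` format is not shown here, and single-value (Dirac) restrictions are not handled.

```python
from typing import List, Tuple

# Hypothetical representation: each segment (lower, upper, mass) carries a
# uniform density of mass / (upper - lower) on [lower, upper).
Segment = Tuple[float, float, float]


def crop_piecewise_uniform(segments: List[Segment], lower: float, upper: float) -> List[Segment]:
    """Restrict a piecewise-uniform distribution to [lower, upper] and renormalize."""
    cropped = []
    for a, b, mass in segments:
        lo, hi = max(a, lower), min(b, upper)
        if hi > lo:
            # Keep only the fraction of this segment's mass inside the restriction
            cropped.append((lo, hi, mass * (hi - lo) / (b - a)))
    total = sum(mass for _, _, mass in cropped)
    if total == 0:
        raise ValueError("the restriction has zero probability under this distribution")
    # Dividing every segment by the same constant keeps the density ratios intact
    return [(lo, hi, mass / total) for lo, hi, mass in cropped]


segments = [(0.0, 1.0, 0.5), (1.0, 3.0, 0.5)]
print(crop_piecewise_uniform(segments, 0.5, 2.0))
```

Before cropping, the densities of the two segments are 0.5 and 0.25; after cropping to [0.5, 2.0] they are 1.0 and 0.5, so their 2:1 ratio is preserved while the total mass returns to 1.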

classmethod type_to_json()

inst_to_json()

static from_json(data)

classmethod type_from_json(data: Dict[str, Any])

insert_convex_fragments(left: jpt.base.intervals.ContinuousSet | None, right: jpt.base.intervals.ContinuousSet | None, number_of_samples: int)

Insert fragments of distributions to the left and right of this distribution. This should only be used to create a convex hull around the JPT's domain in which the density is never 0.

Parameters:

left – The left (lower) interval to add if needed, None otherwise
right – The right (upper) interval to add if needed, None otherwise
number_of_samples – The number of samples to use as the basis for the weight

classmethod cumsum(distributions: Iterable[Numeric], error_max: float = np.inf, n_segments: int = None) → Iterable[Numeric]

Generator yielding the distributions that correspond to the cumulative sums of the passed distributions.

Parameters:

distributions – the distributions to be summed cumulatively
error_max –
n_segments –

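Returns:

the cumulative-sum distributions, the i-th of which corresponds to the sum of the first i passed distributions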
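The idea behind `cumsum` can be sketched with discrete distributions, assuming (not confirmed by this reference) that "summing" two distributions means taking the distribution of the sum of two independent random variables, i.e. a convolution. jpt's `Numeric.cumsum` works on continuous quantile distributions and uses `error_max`/`n_segments` for approximation, which this sketch does not model; the names `Pmf`, `convolve`, and `cumsum` below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional

Pmf = Dict[int, float]  # hypothetical: value -> probability


def convolve(d1: Pmf, d2: Pmf) -> Pmf:
    """Distribution of X1 + X2 for independent X1 ~ d1 and X2 ~ d2."""
    result: Dict[int, float] = defaultdict(float)
    for x1, p1 in d1.items():
        for x2, p2 in d2.items():
            result[x1 + x2] += p1 * p2
    return dict(result)


def cumsum(distributions: Iterable[Pmf]) -> Iterator[Pmf]:
    """Yield the distributions of X1, X1 + X2, X1 + X2 + X3, ..."""
    total: Optional[Pmf] = None
    for dist in distributions:
        total = dict(dist) if total is None else convolve(total, dist)
        yield total


coin = {0: 0.5, 1: 0.5}
for partial in cumsum([coin, coin, coin]):
    print(partial)
```

Being a generator, the sketch produces each partial sum lazily, which mirrors the "generator yielding" wording of the docstring.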

moment(order: int, center: float) → float

_moment(order: int, center: float, value_transform: Callable | None = None) → float

Calculate the moment of order r (the order parameter) about the point c (the center parameter):

\[\int (x - c)^{r} \, p(x) \, dx\]

cf. https://en.wikipedia.org/wiki/Central_moment, https://gregorygundersen.com/blog/2020/04/11/moments/

Parameters:

order – The order of the moment to calculate
center – The constant subtracted from x in the base of the power. If center is 0, the result is the order-th raw moment. If center is the distribution's mean (i.e., its expectation, or self._moment(1, 0)), the result is the central moment of the distribution.
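For a piecewise-constant density, the moment integral has a closed form on each segment, which makes the role of `order` and `center` easy to check. The sketch below assumes a hypothetical list of `(lower, upper, mass)` segments with `lower < upper`; it is not jpt's `_moment` implementation, and the name `piecewise_uniform_moment` is illustrative.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # hypothetical (lower, upper, mass), lower < upper


def piecewise_uniform_moment(segments: List[Segment], order: int, center: float = 0.0) -> float:
    """Integral of (x - center)**order * p(x) dx for a piecewise-constant density p."""
    total = 0.0
    for lower, upper, mass in segments:
        density = mass / (upper - lower)
        # Antiderivative of (x - c)^r, i.e. (x - c)^(r+1) / (r+1), evaluated on [lower, upper]
        integral = ((upper - center) ** (order + 1) - (lower - center) ** (order + 1)) / (order + 1)
        total += density * integral
    return total


segments = [(0.0, 1.0, 0.5), (1.0, 3.0, 0.5)]
mean = piecewise_uniform_moment(segments, order=1)                  # raw moment, center 0
variance = piecewise_uniform_moment(segments, order=2, center=mean)  # central moment
print(mean, variance)
```

Using `center=0` gives the raw moment (here the mean, 1.25), and feeding that mean back in as `center` with `order=2` gives the variance, matching the parameter description above.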

__add__(other: Numeric) → Numeric

__sub__(other: Numeric) → Numeric

approximate(error_max: float = None, n_segments: int = None) → Numeric

static wasserstein_distance(d1: Numeric, d2: Numeric) → float

distance(other: Numeric) → float

static jaccard_similarity(d1: Numeric, d2: Numeric) → float

similarity(other: Numeric) → float

entropy() → float

plot(engine=None, **kwargs) → Any

Plots the distribution using the given engine.

Parameters:

engine – Can be either one of ["plotly", "matplotlib"], or an instance of a rendering engine subclassing DistributionRendering.
kwargs – The keyword arguments to pass to the engine as defined in the .plot_numeric() function of DistributionRendering or its respective subclass defined by engine.

Returns:

the figure object of the plotting engine

class jpt.distributions.univariate.ScaledNumeric(**settings)

Bases: jpt.distributions.univariate.Numeric

Scaled numeric distribution represented by mean and variance.

classmethod type_to_json()

to_json

static type_from_json(data)

classmethod from_json(data)

jpt.distributions.univariate.NumericType(name: str, values: Iterable[float] = None) → Type[Numeric]

class jpt.distributions.univariate.Integer(**settings)

Bases: jpt.distributions.univariate.Distribution

Abstract supertype of all integer domains and distributions.

values: IntegerLabelToValueMap | None

labels

OPEN_DOMAIN = 'open_domain'

AUTO_DOMAIN = 'auto_domain'

SETTINGS

min() → int | None

max() → int | None

_min() → int | None

_max() → int | None

_params: Dict[int, float] | None = None

to_json: types.FunctionType

classmethod hash()

__add__(other: Integer) → Integer

__neg__() → Integer

property cdf: jpt.base.functions.PiecewiseFunction

add(other: Integer, name: str | None = None) → Integer

classmethod equiv(other: Type[jpt.distributions.univariate.Distribution]) → bool

classmethod type_to_json() → Dict[str, Any]

inst_to_json() → Dict[str, Any]

static type_from_json(data)

classmethod from_json(data: Dict[str, Any]) → Integer

copy()

property probabilities: Dict[int, float]

n_values() → int | None

classmethod value2label(value: int | Iterable[int] | jpt.base.intervals.IntSet | jpt.base.intervals.UnionSet) → int | Iterable[int] | jpt.base.intervals.IntSet | jpt.base.intervals.UnionSet

classmethod label2value(label: int | Iterable[int] | jpt.base.intervals.IntSet | jpt.base.intervals.UnionSet) → int | Iterable[int] | jpt.base.intervals.IntSet | jpt.base.intervals.UnionSet

_sample(n: int) → Iterable[int]

_sample_one() → int

sample(n: int) → Iterable[int]

sample_one() → int

property _pdf: types.FunctionType

property pdf: types.FunctionType

p(labels: int | Iterable[int]) → float

_p(values: int | Iterable[int]) → float

expectation() → float

_expectation() → float

variance() → float

_variance() → float

_k_mpe(k: int | None = None) → Iterable[Tuple[jpt.base.intervals.NumberSet, float]]

Calculate the k most probable explanation states.

Parameters: k – The number of solutions to generate
Returns: A list of tuples pairing each state with its likelihood, in descending order of likelihood.
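The return type `Tuple[NumberSet, float]` suggests that each state is a set of values paired with one likelihood. Assuming (this is not confirmed by the reference) that a state groups all values sharing the same probability, a k-MPE over a plain `{value: probability}` dict could look like the sketch below; the function name `k_mpe` on dicts is a hypothetical stand-in for the method, and it returns `(state, likelihood)` in the order of the type hint.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def k_mpe(probabilities: Dict[int, float], k: Optional[int] = None) -> List[Tuple[FrozenSet[int], float]]:
    """Group values by probability and return the k most probable groups."""
    groups: Dict[float, Set[int]] = defaultdict(set)
    for value, p in probabilities.items():
        # Exact float equality decides which values share a state
        groups[p].add(value)
    ranked = sorted(groups.items(), key=lambda item: item[0], reverse=True)
    if k is not None:
        ranked = ranked[:k]
    return [(frozenset(values), p) for p, values in ranked]


dist = {1: 0.4, 2: 0.4, 3: 0.2}
print(k_mpe(dist, k=1))
```

Grouping by exact float equality is a simplification; a real implementation might need a tolerance when probabilities come from arithmetic.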

k_mpe(k: int = None) → Iterable[Tuple[jpt.base.intervals.NumberSet, float]]

mpe()

_mpe()

mode()

_mode()

crop(restriction: jpt.base.intervals.NumberSet | int) → Integer

_crop(restriction: jpt.base.intervals.NumberSet | int) → Integer

static merge(distributions: Iterable[Integer], weights: Iterable[numbers.Real]) → Integer
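A common reading of merging distributions with weights is a weighted mixture, p(x) = Σᵢ wᵢ pᵢ(x) with normalized weights. The sketch below assumes exactly that, on hypothetical `{value: probability}` dicts; whether `Integer.merge` normalizes the weights itself is an assumption, and the function name `merge` here is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable


def merge(distributions: Iterable[Dict[int, float]], weights: Iterable[float]) -> Dict[int, float]:
    """Weighted mixture of discrete distributions (weights are normalized here, by assumption)."""
    distributions, weights = list(distributions), list(weights)
    total_weight = sum(weights)
    merged: Dict[int, float] = defaultdict(float)
    for dist, weight in zip(distributions, weights):
        for value, p in dist.items():
            merged[value] += weight / total_weight * p
    return dict(merged)


print(merge([{0: 1.0}, {1: 1.0}], [1, 3]))
```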

update(dist: Integer, weight: int) → Integer

fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: int = None) → Integer

_fit(data: numpy.ndarray, rows: numpy.ndarray = None, col: int = None) → Integer

_set(params: Dict[int, float] or Iterable[float]) → Integer

set(params: Dict[int, float] or Iterable[float]) → Integer

__eq__(other) → bool

__str__()

__repr__()

infinite() → bool

finite() → bool

_sorted(exhaustive: bool = True, reverse: bool = False, max_items: int = None) → Iterable[Tuple[int, float]]

sorted(exhaustive: bool = True, reverse: bool = False, max_items: int = None) → Iterable[Tuple[int, float]]

_items(exhaustive: bool = False, max_items: int = None) → Iterable[Tuple[int, float]]

Return the values of this distribution paired with their probabilities.

items(exhaustive: bool = True, max_items: int = None) → Iterable[Tuple[int, float]]

Return the labels of this distribution paired with their probabilities.

kl_divergence(other: Integer) → float
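For reference, the textbook Kullback–Leibler divergence between two discrete distributions is D(p ‖ q) = Σₓ p(x) ln(p(x) / q(x)). The sketch below implements that definition on hypothetical `{value: probability}` dicts; jpt's `kl_divergence` may differ in log base, argument direction, or how it treats values that have zero probability under `other`.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Textbook KL divergence D(p || q) with the natural logarithm."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue  # terms with p(x) = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf  # p puts mass where q has none
        total += px * math.log(px / qx)
    return total


print(kl_divergence({0: 1.0}, {0: 0.5, 1: 0.5}))
```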

number_of_parameters() → int

moment(order: int = 1, center: float = 0) → float

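Calculate the moment of order r (the order parameter) about the point c (the center parameter):

\[\sum_{x} (x - c)^{r} \, p(x)\]

Parameters:

order – The order of the moment to calculate
center – The constant subtracted from x in the base of the power (0 yields the raw moment, the distribution's mean yields the central moment)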
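For an integer-valued distribution the moment is a finite sum over the support. A minimal sketch on a hypothetical `{value: probability}` dict (mirroring the `Dict[int, float]` type of the `probabilities` property, though the function `discrete_moment` is not part of jpt):

```python
from typing import Dict


def discrete_moment(probabilities: Dict[int, float], order: int = 1, center: float = 0.0) -> float:
    """Sum of (x - center)**order * p(x) over the support of a discrete distribution."""
    return sum(p * (x - center) ** order for x, p in probabilities.items())


die = {face: 1 / 6 for face in range(1, 7)}
mean = discrete_moment(die)                              # raw first moment
variance = discrete_moment(die, order=2, center=mean)    # second central moment
print(mean, variance)
```

For a fair die this gives a mean of 3.5 and a variance of 35/12, up to floating-point rounding.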

static wasserstein_distance(d1: Integer, d2: Integer) → float
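Assuming `wasserstein_distance` refers to the first Wasserstein (earth mover's) distance, on the integers it reduces to the sum of absolute differences between the two cumulative distribution functions. A sketch on hypothetical `{value: probability}` dicts; the function name `wasserstein_1d_integer` is illustrative and jpt's implementation may differ.

```python
from typing import Dict


def wasserstein_1d_integer(p: Dict[int, float], q: Dict[int, float]) -> float:
    """First Wasserstein distance between two distributions on the integers.

    On a grid with unit spacing, W1 equals the sum of absolute CDF differences.
    """
    lo = min(min(p), min(q))
    hi = max(max(p), max(q))
    cdf_p = cdf_q = 0.0
    distance = 0.0
    for x in range(lo, hi):  # at hi both CDFs reach 1, so that term is 0
        cdf_p += p.get(x, 0.0)
        cdf_q += q.get(x, 0.0)
        distance += abs(cdf_p - cdf_q)
    return distance


print(wasserstein_1d_integer({0: 1.0}, {3: 1.0}))
```

Moving all mass from 0 to 3 costs 3 units, which is what the CDF formulation yields.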

distance(other: Integer) → float

static jaccard_similarity(d1: Integer, d2: Integer) → float

similarity(other: Integer) → float

plot(engine=None, **kwargs) → Any

Plots the distribution using the given engine.

Parameters:

engine – Can be either one of ["plotly", "matplotlib"], or an instance of a rendering engine subclassing DistributionRendering.
kwargs – The keyword arguments to pass to the engine as defined in the .plot_integer() function of DistributionRendering or its respective subclass defined by engine.

Returns:

the figure object of the plotting engine

jpt.distributions.univariate.IntegerType(name: str, lmin: int | None = None, lmax: int | None = None) → Type[Integer]

class jpt.distributions.univariate.Multinomial(**settings)

Bases: jpt.distributions.univariate.Distribution

Abstract supertype of all symbolic domains and distributions.

values: MultinomialValueMap = None

labels: MultinomialValueMap = None

_params: numpy.ndarray | None = None

to_json: types.MethodType

classmethod hash()

classmethod value2label(value: int | Iterable[int]) → jpt.base.utils.Symbol | Collection[jpt.base.utils.Symbol]

classmethod label2value(label: jpt.base.utils.Symbol | Collection[jpt.base.utils.Symbol]) → int | Collection[int]

classmethod pfmt(max_values=10, labels_or_values='labels') → str

Returns a pretty-formatted string representation of this class.

By default, a set notation with value labels is used. By setting labels_or_values to "values", the internal value representation is used. If the domain comprises more than max_values values, the middle part of the list of values is abbreviated by “…”.
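The abbreviation rule of `pfmt` can be sketched as follows. How many leading versus trailing labels jpt keeps is not specified here, so the even split below is an assumption, and the `labels_or_values` switch is not modeled; the standalone function `pfmt` is hypothetical.

```python
from typing import Iterable


def pfmt(labels: Iterable[object], max_values: int = 10) -> str:
    """Format labels in set notation, eliding the middle part of long domains."""
    items = [str(label) for label in labels]
    if len(items) > max_values:
        head = max_values // 2
        tail = max_values - head
        # Keep the first `head` and last `tail` labels and replace the rest with "…"
        items = items[:head] + ["…"] + items[len(items) - tail:]
    return "{" + ", ".join(items) + "}"


print(pfmt(range(20), max_values=4))
```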

property probabilities

n_values() → int

__contains__(item)

classmethod equiv(other)

static jaccard_similarity(*d: Multinomial) → float

Calculate the similarity of two or more Multinomial distributions.

\[\text{sim}(D_1, \ldots, D_n) = \frac{\sum_{x \in \text{dom}(D)} \min_{i} p_i(x)}{\sum_{x \in \text{dom}(D)} \max_{i} p_i(x)}\]

Adapted from the Jaccard coefficient:

\[\text{sim}(S_1, \ldots, S_n) = \frac{|\bigcap_{i=1}^{n} S_i|}{|\bigcup_{i=1}^{n} S_i|}\]
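The generalized Jaccard formula for distributions translates directly into code: sum the pointwise minima over the shared domain and divide by the sum of pointwise maxima. A minimal sketch on hypothetical `{label: probability}` dicts, where a label missing from one dict counts as probability 0 (an assumption about how differing domains are treated):

```python
from typing import Dict


def jaccard_similarity(*dists: Dict[str, float]) -> float:
    """Generalized Jaccard similarity: sum of pointwise minima over sum of pointwise maxima."""
    domain = set().union(*dists)  # union of all labels across the distributions
    numerator = sum(min(d.get(x, 0.0) for d in dists) for x in domain)
    denominator = sum(max(d.get(x, 0.0) for d in dists) for x in domain)
    return numerator / denominator


p = {"a": 0.5, "b": 0.5}
q = {"a": 1.0}
print(jaccard_similarity(p, q))
```

Identical distributions score 1 and distributions with disjoint support score 0, mirroring the set-based Jaccard coefficient.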

mover_dist(other: Multinomial) → float

similarity(other: Multinomial) → float

distance(other: Multinomial) → float

__getitem__(value)

__setitem__(label, p)

__eq__(other)

__str__()

__repr__()

sorted() → Iterable[Tuple[float, jpt.base.utils.Symbol]]: Generate a sequence of (label, prob) pairs representing this distribution, ordered by descending probability. :return:

_items() → Iterable[Tuple[float, int]]: Generate a sequence of (probability, value) pairs representing this distribution.

items() → Iterable[Tuple[float, jpt.base.utils.Symbol]]: Generate a sequence of (probability, label) pairs representing this distribution.

copy()

_pdf(value: int) → float

pdf(label: jpt.base.utils.Symbol) → float

p(event: jpt.base.utils.Symbol | Set[jpt.base.utils.Symbol] | List[jpt.base.utils.Symbol] | Tuple[jpt.base.utils.Symbol] | numpy.ndarray) → float

Compute the probability of a certain event given this multinomial distribution.

An event can be an atomic random event or a disjunction thereof. For example, given the domain values {'Head', 'Tail'}, event may be:

dist.p('Head')
dist.p({'Tail'})
dist.p({'Head', 'Tail'})

Parameters: event – the event in label space whose probability is to be computed.
Returns: the probability of the event
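Because atomic events of a multinomial distribution are mutually exclusive, the probability of a disjunction is the sum of the probabilities of its members. The sketch below illustrates this on a hypothetical `{label: probability}` dict; `event_probability` is not part of jpt, and unknown labels simply raise `KeyError` here.

```python
from typing import Dict, Iterable, Union


def event_probability(dist: Dict[str, float], event: Union[str, Iterable[str]]) -> float:
    """Probability of an atomic label or of a disjunction (collection) of labels."""
    if isinstance(event, str):
        event = {event}
    # Deduplicate, then sum the masses of the mutually exclusive atomic events
    return sum(dist[label] for label in set(event))


coin = {"Head": 0.5, "Tail": 0.5}
print(event_probability(coin, {"Head", "Tail"}))
```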

_p(event: int | Set[int] | List[int] | Tuple[int] | numpy.ndarray) → float

Compute the probability of a certain event given this multinomial distribution.