Changelog

1.3.4

Patch release — impurity aggregator fix and example cleanup.

Bug Fixes

  • Fixed incorrect impurity averaging in compute_var_improvements (learning/impurity/impurity.pyx): the function now accepts a sentinel-terminated dependent_columns buffer and iterates only over active dependent targets, skipping targets with non-positive parent variance. Previously it used a skip_idx scalar that caused incorrect averaging when some modalities were inactive.

  • Removed the deprecated columns= keyword argument from examples/banana.py and examples/federal.py.

1.3.3

Patch release — bug fix in QuantileDistribution deserialisation.

Bug Fixes

  • QuantileDistribution.from_json no longer raises AssertionError when deserialising a CDF with a single function segment. _assert_consistency now raises ValueError with a descriptive message instead of relying on a bare assert statement, so callers can catch and handle the error as a ValueError.

1.3.2

Patch release — bug fix in evidence formatting for integer variables.

Bug Fixes

  • JPT.posterior() now correctly raises Unsatisfiability when unsatisfiable evidence contains a UnionSet for an integer variable. Previously, IntegerVariable.str() only handled IntSet and numbers.Number and raised TypeError on a UnionSet; that TypeError escaped from posterior() while it was building the Unsatisfiability message via format_path(evidence), masking the intended exception so callers could not catch and recover. IntegerVariable.str() now mirrors the UnionSet handling already present in NumericVariable.str(): it normalises IntSet inputs into a one-element UnionSet, joins intervals with for fmt='set' and with for fmt='logic', and raises ValueError (instead of TypeError) for an unknown fmt.

1.3.1

Patch release — CI fix only, no library changes.

Bug Fixes

  • Fixed the manylinux wheel build step on GitHub Actions: the RalfG/python-wheels-manylinux-build step now runs git config --global --add safe.directory /github/workspace before pip wheel so that version derivation through setuptools-scm / vcs-versioning is not blocked by Git’s “dubious ownership” safety check inside the bind-mounted container.

1.3.0

This release rewrites the numeric-distribution fit backend on top of a new greedy, L∞-optimal simplifier, deprecates the previous CDFRegressor, and adds two learning-time features for controlling generalisation: split validation and min_eval_samples.

New Features

Numeric distribution fit backend

  • Added jpt.distributions.qpd.vwcdfreg.VWCDFRegressor — a Cython implementation of a greedy bottom-up L∞-optimal piecewise-linear regressor for empirical CDFs, based on the Visvalingam-Whyatt line simplification algorithm adapted from 2D cartographic generalisation to the 1D CDF setting by replacing the triangular-area cost with the max absolute residual. QuantileDistribution.fit() now uses this backend internally. Every original data point is now guaranteed to lie within eps of the fitted piecewise-linear CDF (sup-norm bound), a strictly stronger guarantee than the previous approximate MSE-driven fit which only bounded error against subsampled breakpoints.

  • Added readable fit_xs, fit_ys, and support_points properties to VWCDFRegressor for inspecting the simplified knot set after fit().

Learning — split validation

  • Added split_validation_mask and split_validation_mode parameters to JPT.fit() / JPT.learn(). Passing a boolean or uint8 mask marks each training row as either a training sample (used as a split-candidate feature value) or an evaluation sample (used to score impurity but not to propose splits). Modes 'both' (default), 'training' and 'evaluation' select which subset of targets contributes to the impurity score at each split.

  • Added min_eval_samples hyperparameter to JPT. When split_validation_mode='evaluation', rejects any candidate split where either child partition contains fewer than min_eval_samples evaluation rows. Accepts int (absolute) or float-in-(0, 1) (fraction of total rows), same convention as min_samples_leaf. Serialised via to_json() / from_json().

Deprecations

  • jpt.distributions.qpd.cdfreg.CDFRegressor is deprecated and emits a DeprecationWarning on instantiation. The class remains functional and callable for backward compatibility, but is no longer used by QuantileDistribution; new code should use VWCDFRegressor directly.

Bug Fixes

Distributions

  • Fixed LinearFunction.from_points on subnormal dx: rather than crashing on assertion failures, returns a ConstantFunction that preserves the jump-segment convention of the CDF.

  • Fixed Integer distribution merging: probabilities are now normalised after weighted accumulation, preventing mass loss on merged mixtures.

  • Fixed CDF monotonicity enforcement in QuantileDistribution and PPF monotonicity in the from_json path.

  • Ensured np.ascontiguousarray on data buffers fed into the Cython fit routines to avoid silent precision loss from stride mismatches.

Learning

  • Fixed a regression in the symbolic-impurity normalisation that caused invert_impurity=True to prefer pure leaves (the opposite of its semantics).

  • tqdm progress bars now write to stderr, avoiding collisions with stdout-configured logging.

Infrastructure

  • Switched symbolic impurity normalisation from a global symbol-count denominator to a local count (number of symbols actually present in the current partition). This gives adaptive regularisation that behaves consistently across leaves with different symbolic support sizes.

Test Suite

  • Added ~50 new test cases covering VWCDFRegressor (class API, numerical invariants, Cython-vs-Python-reference cross-check, performance canaries), split validation end-to-end and at the impurity level, and min_eval_samples resolution / enforcement.

  • test_k3_mpe hardcoded values and test_moment delta tolerances updated to match the new fitter’s output precision.

Plotting / Engine Tests

  • Added 500+ lines of tests for the matplotlib and plotly rendering engines; completed the cdfreg test stubs.

Known Issues

  • QuantileDistribution.merge() does not propagate embedded probability-mass jumps from multi-sample clusters when both input distributions have independently fitted clusters at different x-values. In the single-contributor case (one weight set to 1), results are identical to the input. Fix scheduled for 1.4.0 together with jump-aware likelihood evaluation.

1.1.0

This release adds dependency discovery and xi-based pruning to the JPT learning pipeline.

New Features

Dependency discovery

  • Added jpt.base.correlation package with a standalone implementation of Chatterjee’s xi correlation coefficient (xi_correlation, xi_correlation_matrix).

  • Added jpt.learning.dependency package with the DependencyDiscovery abstract base class and XiDependencyDiscovery, which computes xi for all feature-target pairs and retains only statistically significant dependencies.

  • The dependencies parameter of JPT.__init__ now accepts DependencyDiscovery instances in addition to None and explicit dictionaries. Discovery strategies are re-invoked on each learn() call and preserved during JSON serialization.

Pruning

  • Added jpt.learning.pruning package with XiPruningCriterion, a prune_or_split callback that stops splitting when no feature-target pair shows significant functional dependence in the current partition.

  • The prune_or_split callback signature is extended from (jpt, partition, indices) to (jpt, partition, indices, data), eliminating the need to access process-local state.

Documentation

  • Added how-to guide for dependency discovery and xi pruning with mathematical background, worked examples, and extensibility guide.

  • Added Chatterjee (2021), Dalitz et al. (2024), and Shi et al. (2022) to the bibliography.

  • Added xi_pruning.py example demonstrating both features.

Bug Fixes

  • Fixed outdated important_datastructures.ipynb tutorial notebook: replaced removed list2interval and RealSet with current ContinuousSet and UnionSet API; corrected infer_from_dataframe import path.

Test Suite

  • Added 24 test cases covering xi correlation properties, dependency discovery (structure recovery, serialization, JPT integration), and pruning behavior (noise sensitivity, alpha monotonicity).

1.0.0

This release contains substantial new features, bug fixes, and infrastructure improvements relative to the last 0.1.x series (0.1.41).

New Features

Inference

  • Corrected k-MPE implementation: max-heap extraction, proper leaf.prior scaling, and quadratic pruned-node requeue for multi-leaf correctness.

  • Added support for specifying numeric query intervals as Python lists in jpt.distributions.univariate.Numeric.p().

Learning

  • Added optional progress bar for monitoring the learning progress.

  • Added support for custom pruning criteria during JPT construction.

  • Runaway tree growth is now prevented when no max_std constraints are set.

Parallel processing

  • Added multicore module with a customised process-pool class that inherits thread-local state in child processes.

  • Added support for parallel likelihood computation over multiple cores.

  • Added support for parallel data preprocessing.

  • Added support for parallel learning of prior distributions per leaf.

  • Added support for parallel rendering of JPT leaves.

Serialization

  • Added JPT.dump() / JPT.load() and JPT.dumps() / JPT.loads() for JSON-based model persistence.

  • Added __getstate__() and __setstate__() to IntSet and NumberSet for pickle support.

Data structures

  • Added IntSet: a new Cython interval type for integer domains, replacing ad-hoc set arithmetic in integer distributions.

  • Added RealSet.min property.

  • Added QuadraticFunction support in PiecewiseFunction, including vertex-form construction and maximisation.

  • Added PiecewiseFunction.__neg__() and corresponding Function.__neg__() interface method.

  • Added PLFApproximator test coverage.

Visualisation

  • Added plotting support for Gaussian distributions in the Matplotlib and Plotly rendering engines.

  • Added support for fancy tree printing with Unicode box-drawing characters via anytree.

Bug Fixes

Inference and distributions

  • Fixed numeric imprecision in jpt.trees.JPT.posterior().

  • Fixed jpt.trees.JPT.expectation() method signature.

  • Fixed jpt.trees.JPT.encode() function.

  • Fixed error tolerance in Numeric.pdf_to_cdf() and corrected handling of Dirac impulse contributions.

  • Fixed sampling bug in UnionSet._sample (memoryview assignment).

  • Fixed ContinuousSet._sample memoryview assignment causing TypeError.

  • Fixed plotting of unbounded integer distributions.

Learning

  • Fixed memory leak and logging errors in the C4.5 learning algorithm.

  • Fixed infer_from_dataframe() variable type checks and suppressed erroneous deduplication of domain values.

  • Fixed IntegerVariable.assignment2set() value conversion and compatibility with the new IntSet class.

  • Fixed JPT._preprocess_data() DataFrame value transformation.

  • Fixed df.copy() before in-place manipulation to prevent side-effects on the caller’s DataFrame.

Parallelism and concurrency

  • Fixed multicore likelihood computation and multicore=None handling.

  • Fixed signal handling to main thread only.

  • Fixed repetitive pickling of the JPT instance in parallel leaf plotting.

  • Fixed ImportError when importing the custom Pool class.

  • Fixed thread-local JPT storage for worker processes.

Build and packaging

  • Fixed Cython version detection in pyximporter.py.

  • Fixed import issue with Cython >= 3.0.11.

  • Fixed concurrent Cython compilation.

  • Fixed relative data paths in tests and examples.

  • Pinned kaleido < 1.0 to prevent incompatible API changes.

Infrastructure

  • Migrated version management to setuptools-scm; version is now derived automatically from git tags.

  • Consolidated all build and dependency configuration into pyproject.toml.

  • Made graphviz, fglib, factorgraph, and mlflow optional dependencies with lazy imports.

  • Added typing-extensions as an explicit dependency.

  • Set Python 3.11 as the default build target.

  • Modernised type-hint syntax throughout trees.py and variables.py.

  • Migrated all print-based logging to Python’s standard logging module.

  • Lazy-import plotting engines in distribution modules to avoid importing heavy optional dependencies at module load time.

  • Updated GitHub Actions CI workflows.

Test Suite

  • Restructured the test suite into per-module files under test/distributions/, test/base/, test/variables/, and test/learning/.

  • Added docstrings to all test methods.

  • Added placeholder test cases for the plotting engine.

Previous Releases

For changes in the 0.1.x series please refer to the git history:

git log 0.1.41 --oneline