Changelog
1.3.4
Patch release — impurity aggregator fix and example cleanup.
Bug Fixes
Fixed incorrect impurity averaging in compute_var_improvements (learning/impurity/impurity.pyx): the function now accepts a sentinel-terminated dependent_columns buffer and iterates only over active dependent targets, skipping targets with non-positive parent variance. Previously it used a skip_idx scalar that caused incorrect averaging when some modalities were inactive.
Removed the deprecated columns= keyword argument from examples/banana.py and examples/federal.py.
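The corrected averaging can be pictured with a small pure-Python sketch. It is not the Cython code from impurity.pyx: the function name average_var_improvement, the sentinel value -1, and the choice of relative variance reduction as the per-target improvement are assumptions made only for illustration.

```python
SENTINEL = -1  # assumed end-of-buffer marker; the real sentinel may differ


def average_var_improvement(parent_var, child_var, dependent_columns):
    """Average the relative variance reduction over active dependent targets.

    parent_var / child_var: per-target variances before and after a split.
    dependent_columns: target indices, terminated by SENTINEL.
    """
    total = 0.0
    active = 0
    for idx in dependent_columns:
        if idx == SENTINEL:  # stop at the terminator, ignore anything after it
            break
        if parent_var[idx] <= 0.0:  # no variance to reduce, skip this target
            continue
        total += (parent_var[idx] - child_var[idx]) / parent_var[idx]
        active += 1
    # Divide by the number of targets that actually contributed.
    return total / active if active else 0.0


parent = [4.0, 0.0, 2.0, 9.0]
child = [1.0, 0.0, 1.0, 9.0]
# Targets 0 and 2 contribute; target 1 has zero parent variance;
# target 3 sits behind the sentinel and is therefore inactive.
print(average_var_improvement(parent, child, [0, 1, 2, SENTINEL, 3]))  # 0.625
```

The point of the sketch is the denominator: it counts only the targets that contributed, so inactive or zero-variance targets do not dilute the average.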
1.3.3
Patch release — bug fix in QuantileDistribution deserialisation.
Bug Fixes
QuantileDistribution.from_json no longer raises AssertionError when deserialising a CDF with a single function segment. _assert_consistency now raises ValueError with a descriptive message instead of relying on a bare assert statement, so callers can catch and handle the error as a ValueError.
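The difference between a bare assert and an explicit ValueError can be shown with a generic sketch. The function check_monotone below is a hypothetical stand-in, and the monotonicity condition it tests is an assumption; the actual checks performed by _assert_consistency are not described in this changelog.

```python
def check_monotone(cdf_values):
    """Raise ValueError if the values are not non-decreasing.

    Unlike a bare `assert`, this check is not stripped when Python runs
    with -O, and the error carries a descriptive message.
    """
    for i in range(1, len(cdf_values)):
        if cdf_values[i] < cdf_values[i - 1]:
            raise ValueError(
                f"CDF decreases at index {i}: "
                f"{cdf_values[i - 1]} -> {cdf_values[i]}"
            )


try:
    check_monotone([0.0, 0.6, 0.4, 1.0])
except ValueError as err:
    message = str(err)
    print("recovered:", message)
```

Callers can then handle malformed input with an ordinary except ValueError clause instead of catching AssertionError.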
1.3.2
Patch release — bug fix in evidence formatting for integer variables.
Bug Fixes
JPT.posterior() now correctly raises Unsatisfiability when unsatisfiable evidence contains a UnionSet for an integer variable. Previously, IntegerVariable.str() only handled IntSet and numbers.Number and raised TypeError on a UnionSet; that TypeError escaped from posterior() while it was building the Unsatisfiability message via format_path(evidence), masking the intended exception so callers could not catch and recover. IntegerVariable.str() now mirrors the UnionSet handling already present in NumericVariable.str(): it normalises IntSet inputs into a one-element UnionSet, joins intervals with ∪ for fmt='set' and with ∨ for fmt='logic', and raises ValueError (instead of TypeError) for an unknown fmt.
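The formatting behaviour described above can be sketched as follows. This is an illustration only: format_int_union is a hypothetical helper, intervals are modelled as plain (lo, hi) tuples rather than IntSet or UnionSet objects, and the exact text that the library produces for each interval is an assumption.

```python
def format_int_union(name, intervals, fmt="set"):
    """Render a union of closed integer intervals as text.

    intervals: a single (lo, hi) tuple or a list of such tuples.
    """
    if isinstance(intervals, tuple):
        # Normalise a single interval into a one-element union.
        intervals = [intervals]
    if fmt == "set":
        parts = [f"{{{lo}..{hi}}}" for lo, hi in intervals]
        return f"{name} ∈ " + " ∪ ".join(parts)
    if fmt == "logic":
        parts = [f"{lo} ≤ {name} ≤ {hi}" for lo, hi in intervals]
        return " ∨ ".join(parts)
    # An unknown format is a bad argument value, hence ValueError.
    raise ValueError(f"Unknown format: {fmt!r}")


print(format_int_union("x", [(1, 3), (7, 9)], fmt="set"))
print(format_int_union("x", [(1, 3), (7, 9)], fmt="logic"))
```

Because every supported input is first normalised to a list of intervals, a union never falls through to an unexpected exception while an error message is being built.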
1.3.1
Patch release — CI fix only, no library changes.
Bug Fixes
Fixed the manylinux wheel build step on GitHub Actions: the RalfG/python-wheels-manylinux-build step now runs git config --global --add safe.directory /github/workspace before pip wheel so that version derivation through setuptools-scm/vcs-versioning is not blocked by Git’s “dubious ownership” safety check inside the bind-mounted container.
1.3.0
This release rewrites the numeric-distribution fit backend on top
of a new greedy, L∞-optimal simplifier, deprecates the previous
CDFRegressor, and adds two learning-time features for
controlling generalisation: split validation and
min_eval_samples.
New Features
Numeric distribution fit backend
Added jpt.distributions.qpd.vwcdfreg.VWCDFRegressor, a Cython implementation of a greedy bottom-up L∞-optimal piecewise-linear regressor for empirical CDFs, based on the Visvalingam-Whyatt line simplification algorithm adapted from 2D cartographic generalisation to the 1D CDF setting by replacing the triangular-area cost with the max absolute residual. QuantileDistribution.fit() now uses this backend internally. Every original data point is now guaranteed to lie within eps of the fitted piecewise-linear CDF (sup-norm bound), a strictly stronger guarantee than the previous approximate MSE-driven fit, which only bounded error against subsampled breakpoints.
Added readable fit_xs, fit_ys, and support_points properties to VWCDFRegressor for inspecting the simplified knot set after fit().
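The greedy bottom-up idea can be sketched in a few lines of pure Python. This is not the Cython implementation of VWCDFRegressor: the function name simplify_cdf is hypothetical, the sketch rescans all knots on every step instead of using a priority queue, and it assumes strictly increasing x-values (no jumps).

```python
def simplify_cdf(xs, ys, eps):
    """Greedily drop knots from a piecewise-linear CDF.

    A knot is removed only if the segment replacing it stays within eps
    (max absolute residual) of every original point it spans.
    Returns the indices of the retained knots.
    """
    keep = list(range(len(xs)))  # start with every point as a knot

    def removal_cost(pos):
        # Segment that would connect the neighbours of keep[pos].
        a, b = keep[pos - 1], keep[pos + 1]
        slope = (ys[b] - ys[a]) / (xs[b] - xs[a])
        return max(
            abs(ys[a] + slope * (xs[i] - xs[a]) - ys[i])
            for i in range(a + 1, b)
        )

    while len(keep) > 2:
        # Cheapest interior knot first (naive rescan for clarity).
        cost, pos = min(
            (removal_cost(p), p) for p in range(1, len(keep) - 1)
        )
        if cost > eps:
            break  # every remaining removal would violate the bound
        del keep[pos]
    return keep


xs = [0, 1, 2, 3, 4]
print(simplify_cdf(xs, [0.0, 0.25, 0.5, 0.75, 1.0], eps=0.01))  # [0, 4]
print(simplify_cdf(xs, [0.0, 0.1, 0.2, 0.6, 1.0], eps=0.05))    # [0, 2, 4]
```

Since each candidate segment is checked against all original points it would cover, the sup-norm bound holds for the final fit and not only for the retained knots.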
Learning — split validation
Added split_validation_mask and split_validation_mode parameters to JPT.fit() / JPT.learn(). Passing a boolean or uint8 mask marks each training row as either a training sample (used as a split-candidate feature value) or an evaluation sample (used to score impurity but not to propose splits). Modes 'both' (default), 'training' and 'evaluation' select which subset of targets contributes to the impurity score at each split.
Added min_eval_samples hyperparameter to JPT. When split_validation_mode='evaluation', it rejects any candidate split where either child partition contains fewer than min_eval_samples evaluation rows. Accepts int (absolute) or float-in-(0, 1) (fraction of total rows), same convention as min_samples_leaf. Serialised via to_json() / from_json().
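How the min_eval_samples rule might work can be sketched in plain Python. Both helper names are hypothetical, rounding fractions up is an assumption, and so is the mask polarity used here (True marks an evaluation row); the library may encode the mask the other way round.

```python
import math


def resolve_min_eval_samples(min_eval_samples, n_rows):
    """Turn an int (absolute) or a float in (0, 1) (fraction) into a row count."""
    if isinstance(min_eval_samples, float):
        if not 0.0 < min_eval_samples < 1.0:
            raise ValueError("float min_eval_samples must lie in (0, 1)")
        # Rounding up is an assumption of this sketch.
        return math.ceil(min_eval_samples * n_rows)
    return int(min_eval_samples)


def split_allowed(eval_mask, left_rows, right_rows, min_eval):
    """Accept a split only if both children hold enough evaluation rows."""
    n_left = sum(1 for r in left_rows if eval_mask[r])
    n_right = sum(1 for r in right_rows if eval_mask[r])
    return n_left >= min_eval and n_right >= min_eval


# True = evaluation row (assumed polarity).
eval_mask = [True, False, True, True, False, True, False, False]
min_eval = resolve_min_eval_samples(0.25, len(eval_mask))  # 2 rows

print(split_allowed(eval_mask, [0, 1, 2], [3, 4, 5, 6, 7], min_eval))  # True
print(split_allowed(eval_mask, [0, 1], [2, 3, 4, 5, 6, 7], min_eval))  # False
```

The second split is rejected because its left child contains only one evaluation row, which would make the impurity score on that side unreliable.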
Deprecations
jpt.distributions.qpd.cdfreg.CDFRegressor is deprecated and emits a DeprecationWarning on instantiation. The class remains functional and callable for backward compatibility, but is no longer used by QuantileDistribution; new code should use VWCDFRegressor directly.
Bug Fixes
Distributions
Fixed LinearFunction.from_points on subnormal dx: rather than crashing on assertion failures, it now returns a ConstantFunction that preserves the jump-segment convention of the CDF.
Fixed Integer distribution merging: probabilities are now normalised after weighted accumulation, preventing mass loss on merged mixtures.
Fixed CDF monotonicity enforcement in QuantileDistribution and PPF monotonicity in the from_json path.
Data buffers fed into the Cython fit routines are now passed through np.ascontiguousarray to avoid silent precision loss from stride mismatches.
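The normalisation step in the Integer merging fix above can be illustrated with a generic sketch. The helper merge_pmfs and the dictionary representation are hypothetical; the library's Integer distribution stores its probabilities differently. The example assumes the mass loss arises when the weights do not sum to one, which is one plausible cause rather than a confirmed one.

```python
def merge_pmfs(pmfs, weights):
    """Weighted merge of probability mass functions (dicts value -> prob)."""
    merged = {}
    for pmf, w in zip(pmfs, weights):
        for value, p in pmf.items():
            merged[value] = merged.get(value, 0.0) + w * p
    total = sum(merged.values())
    if total <= 0.0:
        raise ValueError("weights and probabilities carry no mass")
    # Renormalise so the result sums to 1 even if the weights do not.
    return {value: p / total for value, p in merged.items()}


result = merge_pmfs([{1: 0.5, 2: 0.5}, {2: 1.0}], weights=[1.0, 3.0])
print(result)  # {1: 0.125, 2: 0.875}
```

Without the final division the merged values would sum to 4.0 in this example, which is no longer a probability distribution.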
Learning
Fixed a regression in the symbolic-impurity normalisation that caused invert_impurity=True to prefer pure leaves (the opposite of its semantics).
tqdm progress bars now write to stderr, avoiding collisions with stdout-configured logging.
Infrastructure
Switched symbolic impurity normalisation from a global symbol-count denominator to a local count (number of symbols actually present in the current partition). This gives adaptive regularisation that behaves consistently across leaves with different symbolic support sizes.
Test Suite
Added ~50 new test cases covering VWCDFRegressor (class API, numerical invariants, Cython-vs-Python-reference cross-check, performance canaries), split validation end-to-end and at the impurity level, and min_eval_samples resolution / enforcement.
test_k3_mpe hardcoded values and test_moment delta tolerances were updated to match the new fitter’s output precision.
Plotting / Engine Tests
Added 500+ lines of tests for the matplotlib and plotly rendering engines; completed the cdfreg test stubs.
Known Issues
QuantileDistribution.merge() does not propagate embedded probability-mass jumps from multi-sample clusters when both input distributions have independently fitted clusters at different x-values. In the single-contributor case (one weight set to 1), results are identical to the input. Fix scheduled for 1.4.0 together with jump-aware likelihood evaluation.
1.1.0
This release adds dependency discovery and xi-based pruning to the JPT learning pipeline.
New Features
Dependency discovery
Added jpt.base.correlation package with a standalone implementation of Chatterjee’s xi correlation coefficient (xi_correlation, xi_correlation_matrix).
Added jpt.learning.dependency package with the DependencyDiscovery abstract base class and XiDependencyDiscovery, which computes xi for all feature-target pairs and retains only statistically significant dependencies.
The dependencies parameter of JPT.__init__ now accepts DependencyDiscovery instances in addition to None and explicit dictionaries. Discovery strategies are re-invoked on each learn() call and preserved during JSON serialization.
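For readers unfamiliar with Chatterjee’s xi, the coefficient itself can be computed with a short pure-Python reference. This sketch is independent of jpt.base.correlation: the function name xi_coefficient is hypothetical, it assumes no ties among the x-values, it runs in quadratic time, and it omits the significance test used for dependency discovery.

```python
def xi_coefficient(xs, ys):
    """Chatterjee's xi for paired samples, assuming no ties among xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    # Order the y-values by increasing x.
    y_sorted = [y for _, y in sorted(zip(xs, ys), key=lambda pair: pair[0])]
    # r[i]: number of y-values <= y_sorted[i]
    r = [sum(1 for v in y_sorted if v <= y) for y in y_sorted]
    # l_counts[i]: number of y-values >= y_sorted[i]
    l_counts = [sum(1 for v in y_sorted if v >= y) for y in y_sorted]
    numerator = n * sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    denominator = 2 * sum(c * (n - c) for c in l_counts)
    if denominator == 0:
        raise ValueError("xi is undefined for constant y")
    return 1.0 - numerator / denominator


xs = list(range(9))
xi_linear = xi_coefficient(xs, xs)                            # y = x
xi_parabola = xi_coefficient(xs, [(x - 4) ** 2 for x in xs])  # y = (x - 4)^2
print(xi_linear, xi_parabola)
```

On these nine points the sketch yields roughly 0.7 for the linear relation and 0.4 for the parabola. The Pearson correlation of the parabola is zero, so the example shows why xi is useful for detecting functional dependence that is not monotone.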
Pruning
Added jpt.learning.pruning package with XiPruningCriterion, a prune_or_split callback that stops splitting when no feature-target pair shows significant functional dependence in the current partition.
The prune_or_split callback signature is extended from (jpt, partition, indices) to (jpt, partition, indices, data), eliminating the need to access process-local state.
Documentation
Added how-to guide for dependency discovery and xi pruning with mathematical background, worked examples, and extensibility guide.
Added Chatterjee (2021), Dalitz et al. (2024), and Shi et al. (2022) to the bibliography.
Added xi_pruning.py example demonstrating both features.
Bug Fixes
Fixed outdated important_datastructures.ipynb tutorial notebook: replaced removed list2interval and RealSet with current ContinuousSet and UnionSet API; corrected infer_from_dataframe import path.
Test Suite
Added 24 test cases covering xi correlation properties, dependency discovery (structure recovery, serialization, JPT integration), and pruning behavior (noise sensitivity, alpha monotonicity).
1.0.0
This release contains substantial new features, bug fixes, and infrastructure
improvements relative to the last 0.1.x series (0.1.41).
New Features
Inference
Corrected k-MPE implementation: max-heap extraction, proper leaf.prior scaling, and quadratic pruned-node requeue for multi-leaf correctness.
Added support for specifying numeric query intervals as Python lists in jpt.distributions.univariate.Numeric.p().
Learning
Added optional progress bar for monitoring the learning progress.
Added support for custom pruning criteria during JPT construction.
Runaway tree growth is now prevented when no max_std constraints are set.
Parallel processing
Added multicore module with a customised process-pool class that inherits thread-local state in child processes.
Added support for parallel likelihood computation over multiple cores.
Added support for parallel data preprocessing.
Added support for parallel learning of prior distributions per leaf.
Added support for parallel rendering of JPT leaves.
Serialization
Added JPT.dump() / JPT.load() and JPT.dumps() / JPT.loads() for JSON-based model persistence.
Added __getstate__() and __setstate__() to IntSet and NumberSet for pickle support.
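The pickle hooks mentioned above follow the standard Python protocol, which can be shown with a generic sketch. IntInterval is a hypothetical pure-Python class, not the library's IntSet; extension types often need such hooks because their state is not always picklable automatically.

```python
import pickle


class IntInterval:
    """Hypothetical closed integer interval used to illustrate the hooks."""

    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper

    def __getstate__(self):
        # Export the state as a plain tuple of built-in values.
        return (self.lower, self.upper)

    def __setstate__(self, state):
        # Rebuild the attributes from the tuple produced above.
        self.lower, self.upper = state

    def __eq__(self, other):
        return (
            isinstance(other, IntInterval)
            and (self.lower, self.upper) == (other.lower, other.upper)
        )


original = IntInterval(3, 7)
restored = pickle.loads(pickle.dumps(original))
print(restored == original, restored is original)  # True False
```

Pickle support matters for the parallel features in this release, since objects passed to worker processes are typically serialised with pickle.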
Data structures
Added IntSet: a new Cython interval type for integer domains, replacing ad-hoc set arithmetic in integer distributions.
Added RealSet.min property.
Added QuadraticFunction support in PiecewiseFunction, including vertex-form construction and maximisation.
Added PiecewiseFunction.__neg__() and corresponding Function.__neg__() interface method.
Added PLFApproximator test coverage.
Visualisation
Added plotting support for Gaussian distributions in the Matplotlib and Plotly rendering engines.
Added support for fancy tree printing with Unicode box-drawing characters via anytree.
Bug Fixes
Inference and distributions
Fixed numeric imprecision in jpt.trees.JPT.posterior().
Fixed jpt.trees.JPT.expectation() method signature.
Fixed jpt.trees.JPT.encode() function.
Fixed error tolerance in Numeric.pdf_to_cdf() and corrected handling of Dirac impulse contributions.
Fixed sampling bug in UnionSet._sample (memoryview assignment).
Fixed ContinuousSet._sample memoryview assignment causing TypeError.
Fixed plotting of unbounded integer distributions.
Learning
Fixed memory leak and logging errors in the C4.5 learning algorithm.
Fixed infer_from_dataframe() variable type checks and suppressed erroneous deduplication of domain values.
Fixed IntegerVariable.assignment2set() value conversion and compatibility with the new IntSet class.
Fixed JPT._preprocess_data() DataFrame value transformation.
The DataFrame is now copied via df.copy() before in-place manipulation to prevent side-effects on the caller’s DataFrame.
Parallelism and concurrency
Fixed multicore likelihood computation and multicore=None handling.
Restricted signal handling to the main thread only.
Fixed repetitive pickling of the JPT instance in parallel leaf plotting.
Fixed ImportError when importing the custom Pool class.
Fixed thread-local JPT storage for worker processes.
Build and packaging
Fixed Cython version detection in pyximporter.py.
Fixed import issue with Cython >= 3.0.11.
Fixed concurrent Cython compilation.
Fixed relative data paths in tests and examples.
Pinned kaleido < 1.0 to prevent incompatible API changes.
Infrastructure
Migrated version management to setuptools-scm; version is now derived automatically from git tags.
Consolidated all build and dependency configuration into pyproject.toml.
Made graphviz, fglib, factorgraph, and mlflow optional dependencies with lazy imports.
Added typing-extensions as an explicit dependency.
Set Python 3.11 as the default build target.
Modernised type-hint syntax throughout trees.py and variables.py.
Migrated all print-based logging to Python’s standard logging module.
Plotting engines are now imported lazily in distribution modules to avoid importing heavy optional dependencies at module load time.
Updated GitHub Actions CI workflows.
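The lazy-import pattern behind the optional dependencies listed above can be sketched generically. The helpers optional_import and render_tree are hypothetical and do not reflect how the library structures its imports; only importlib.import_module and graphviz.Source are real APIs.

```python
import importlib


def optional_import(module_name, feature):
    """Import module_name on first use; raise a helpful error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'"
        ) from err


def render_tree(dot_source):
    # The import happens only when rendering is actually requested,
    # so importing this module never requires graphviz to be installed.
    graphviz = optional_import("graphviz", "Tree rendering")
    return graphviz.Source(dot_source)
```

Users who never call render_tree pay no import cost and do not need the package, while those who do get an error message that names the missing dependency.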
Test Suite
Restructured the test suite into per-module files under test/distributions/, test/base/, test/variables/, and test/learning/.
Added docstrings to all test methods.
Added placeholder test cases for the plotting engine.
Previous Releases
For changes in the 0.1.x series please refer to the git history:
git log 0.1.41 --oneline