Changelog
=========

1.3.4
-----

Patch release — impurity aggregator fix and example cleanup.

Bug Fixes
~~~~~~~~~

- Fixed incorrect impurity averaging in ``compute_var_improvements``
  (``learning/impurity/impurity.pyx``): the function now accepts a
  sentinel-terminated ``dependent_columns`` buffer and iterates only
  over active dependent targets, skipping targets with non-positive
  parent variance. Previously it used a ``skip_idx`` scalar that caused
  incorrect averaging when some modalities were inactive.

- Removed the deprecated ``columns=`` keyword argument from
  ``examples/banana.py`` and ``examples/federal.py``.
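
The averaging rule from the first fix above can be sketched in plain
Python. This is an illustration only, not the Cython code: the function
name ``mean_variance_improvement``, the sentinel value ``-1``, and the
relative-improvement formula are assumptions made for this example.

.. code-block:: python

    SENTINEL = -1  # assumed end-of-buffer marker; the real value may differ


    def mean_variance_improvement(parent_var, child_var, dependent_columns):
        """Average the relative variance reduction over active targets.

        ``dependent_columns`` lists target indices and ends with SENTINEL.
        Targets whose parent variance is not positive are skipped, so they
        neither add to the sum nor inflate the divisor.
        """
        total = 0.0
        active = 0
        for col in dependent_columns:
            if col == SENTINEL:
                break  # everything after the sentinel is unused buffer space
            if parent_var[col] <= 0.0:
                continue  # constant target: no variance left to reduce
            total += (parent_var[col] - child_var[col]) / parent_var[col]
            active += 1
        return total / active if active else 0.0


    parent = [4.0, 0.0, 2.0, 9.0]
    child = [1.0, 0.0, 1.0, 9.0]
    # Targets 0 and 2 contribute; target 1 has zero parent variance and is
    # skipped; target 3 sits after the sentinel and is never read.
    result = mean_variance_improvement(parent, child, [0, 1, 2, SENTINEL, 3])

Dividing by the number of targets that actually contributed, rather than
by a fixed target count, is what keeps inactive targets from skewing the
average.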


1.3.3
-----

Patch release — bug fix in ``QuantileDistribution`` deserialisation.

Bug Fixes
~~~~~~~~~

- ``QuantileDistribution.from_json`` no longer raises ``AssertionError``
  when deserialising a CDF with a single function segment.
  ``_assert_consistency`` now raises ``ValueError`` with a descriptive
  message instead of relying on a bare ``assert`` statement, so callers
  can catch and handle the error as a ``ValueError``.
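
The difference between a bare ``assert`` and an explicit ``ValueError``
can be sketched as follows. The helper names ``check_monotone`` and
``load_breakpoints`` and the specific check are hypothetical and only
stand in for the kind of validation described above. A bare ``assert``
is removed entirely when Python runs with ``-O``, whereas an explicit
``raise`` always executes and gives callers a specific exception type to
catch.

.. code-block:: python

    def check_monotone(xs):
        """Raise ValueError if ``xs`` is not strictly increasing."""
        for left, right in zip(xs, xs[1:]):
            if not left < right:
                raise ValueError(
                    f"breakpoints must be strictly increasing, "
                    f"got {left} before {right}"
                )


    def load_breakpoints(xs):
        """Return a validated copy of ``xs``, or an empty list on bad input."""
        try:
            check_monotone(xs)
        except ValueError:
            return []  # the caller decides how to recover
        return list(xs)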


1.3.2
-----

Patch release — bug fix in evidence formatting for integer variables.

Bug Fixes
~~~~~~~~~

- ``JPT.posterior()`` now correctly raises ``Unsatisfiability`` when
  unsatisfiable evidence contains a ``UnionSet`` for an integer
  variable. Previously, ``IntegerVariable.str()`` only handled
  ``IntSet`` and ``numbers.Number`` and raised ``TypeError`` on a
  ``UnionSet``; that ``TypeError`` escaped from ``posterior()`` while
  it was building the ``Unsatisfiability`` message via
  ``format_path(evidence)``, masking the intended exception so callers
  could not catch and recover. ``IntegerVariable.str()`` now mirrors
  the ``UnionSet`` handling already present in
  ``NumericVariable.str()``: it normalises ``IntSet`` inputs into a
  one-element ``UnionSet``, joins intervals with ``∪`` for
  ``fmt='set'`` and with ``∨`` for ``fmt='logic'``, and raises
  ``ValueError`` (instead of ``TypeError``) for an unknown ``fmt``.
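
The formatting behaviour described above can be illustrated with a small
stand-alone sketch. The function ``format_int_union``, the tuple
representation of intervals, and the exact output strings are
assumptions for this example and do not reproduce the library's output.

.. code-block:: python

    def format_int_union(name, intervals, fmt="set"):
        """Format a union of closed integer intervals for variable ``name``.

        ``intervals`` is a list of (lower, upper) tuples. A single tuple is
        wrapped into a one-element list first, so both input shapes share
        one code path.
        """
        if isinstance(intervals, tuple):
            intervals = [intervals]
        if fmt == "set":
            parts = [f"{{{lo}..{hi}}}" for lo, hi in intervals]
            return f"{name} ∈ " + " ∪ ".join(parts)
        if fmt == "logic":
            parts = [f"{lo} ≤ {name} ≤ {hi}" for lo, hi in intervals]
            return " ∨ ".join(parts)
        # An unknown format is a bad argument value, not a bad type.
        raise ValueError(f"unknown format: {fmt!r}")


    as_set = format_int_union("x", [(1, 3), (5, 7)])
    as_logic = format_int_union("x", [(1, 3), (5, 7)], fmt="logic")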


1.3.1
-----

Patch release — CI fix only, no library changes.

Bug Fixes
~~~~~~~~~

- Fixed the manylinux wheel build step on GitHub Actions: the
  ``RalfG/python-wheels-manylinux-build`` step now runs
  ``git config --global --add safe.directory /github/workspace``
  before ``pip wheel`` so that version derivation through
  ``setuptools-scm`` / ``vcs-versioning`` is not blocked by Git's
  "dubious ownership" safety check inside the bind-mounted
  container.


1.3.0
-----

This release rewrites the numeric-distribution fit backend on top
of a new greedy, L∞-optimal simplifier, deprecates the previous
``CDFRegressor``, and adds two learning-time features for
controlling generalisation: *split validation* and
``min_eval_samples``.

New Features
~~~~~~~~~~~~

*Numeric distribution fit backend*

- Added ``jpt.distributions.qpd.vwcdfreg.VWCDFRegressor``, a
  Cython implementation of a greedy, bottom-up, L∞-optimal
  piecewise-linear regressor for empirical CDFs. It adapts the
  Visvalingam-Whyatt line simplification algorithm from 2D
  cartographic generalisation to the 1D CDF setting by replacing
  the triangular-area cost with the maximum absolute residual.
  ``QuantileDistribution.fit()`` now uses this backend
  internally. Every original data point is guaranteed to lie
  within ``eps`` of the fitted piecewise-linear CDF (sup-norm
  bound). This is a strictly stronger guarantee than that of the
  previous approximate, MSE-driven fit, which only bounded the
  error against subsampled breakpoints.
- Added readable ``fit_xs``, ``fit_ys``, and ``support_points``
  properties to ``VWCDFRegressor`` for inspecting the simplified
  knot set after ``fit()``.
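
The greedy simplification idea behind the new backend can be illustrated
in pure Python. The sketch below is a quadratic-time reference under
stated assumptions, not the Cython implementation: the names
``max_residual`` and ``simplify_cdf`` are hypothetical, and the stopping
rule (stop once the cheapest removal would exceed ``eps``) is an
assumption about how the sup-norm bound is enforced.

.. code-block:: python

    def max_residual(xs, ys, i, j):
        """Largest |y - line(x)| over points strictly between knots i and j."""
        x0, y0, x1, y1 = xs[i], ys[i], xs[j], ys[j]
        slope = (y1 - y0) / (x1 - x0)
        return max(
            (abs(ys[k] - (y0 + slope * (xs[k] - x0))) for k in range(i + 1, j)),
            default=0.0,
        )


    def simplify_cdf(xs, ys, eps):
        """Greedy bottom-up simplification; returns indices of kept knots.

        ``xs`` must be strictly increasing. A production version would keep
        the removal costs in a heap instead of rescanning every round.
        """
        knots = list(range(len(xs)))
        while len(knots) > 2:
            # Cost of dropping interior knot p: the worst residual of all
            # original points between its two neighbouring knots.
            costs = [
                (max_residual(xs, ys, knots[p - 1], knots[p + 1]), p)
                for p in range(1, len(knots) - 1)
            ]
            cost, p = min(costs)
            if cost > eps:
                break  # every remaining removal would violate the bound
            del knots[p]
        return knots


    xs = [float(i) for i in range(11)]
    ys = [(x / 10.0) ** 2 for x in xs]  # toy monotone curve on [0, 10]
    knots = simplify_cdf(xs, ys, eps=0.05)

Because each removal cost is measured against all original points
spanned by the merged segment, every segment that survives keeps its
points within ``eps`` of the line, which is the sup-norm property the
entry above describes.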

*Learning — split validation*

- Added ``split_validation_mask`` and ``split_validation_mode``
  parameters to ``JPT.fit()`` / ``JPT.learn()``. Passing a boolean
  or ``uint8`` mask marks each training row as either a training
  sample (used as a split-candidate feature value) or an
  *evaluation* sample (used to score impurity but not to propose
  splits). Modes ``'both'`` (default), ``'training'`` and
  ``'evaluation'`` select which subset of targets contributes to
  the impurity score at each split.
- Added the ``min_eval_samples`` hyperparameter to ``JPT``. When
  ``split_validation_mode='evaluation'``, any candidate split in
  which either child partition contains fewer than
  ``min_eval_samples`` evaluation rows is rejected. The value is
  either an int (absolute count) or a float in (0, 1) (fraction of
  total rows), following the same convention as
  ``min_samples_leaf``. It is serialised via ``to_json()`` /
  ``from_json()``.
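
The ``min_eval_samples`` rule described above can be sketched as
follows. The helper names ``resolve_min_samples`` and ``split_allowed``
are hypothetical, and both the rounding of fractional values (``ceil``)
and the convention that ``True`` in the mask marks an evaluation row are
assumptions made for this example.

.. code-block:: python

    import math


    def resolve_min_samples(value, n_rows):
        """Turn an int (absolute) or a float in (0, 1) into a row count."""
        if isinstance(value, float):
            if not 0.0 < value < 1.0:
                raise ValueError("fractional value must lie in (0, 1)")
            return math.ceil(value * n_rows)  # assumed rounding rule
        return int(value)


    def split_allowed(is_eval, left_rows, right_rows, min_eval):
        """Accept a split only if both children keep enough evaluation rows."""
        left_eval = sum(1 for r in left_rows if is_eval[r])
        right_eval = sum(1 for r in right_rows if is_eval[r])
        return left_eval >= min_eval and right_eval >= min_eval


    # Assumed convention: True marks an evaluation row.
    is_eval = [True, False, True, True, False, True, False, False]
    min_eval = resolve_min_samples(0.25, len(is_eval))

    # Right child keeps only one evaluation row, so the split is rejected.
    unbalanced = split_allowed(is_eval, [0, 1, 2, 3], [4, 5, 6, 7], min_eval)
    # Both children keep two evaluation rows, so the split is accepted.
    balanced = split_allowed(is_eval, [0, 1, 2], [3, 4, 5, 6, 7], min_eval)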

Deprecations
~~~~~~~~~~~~

- ``jpt.distributions.qpd.cdfreg.CDFRegressor`` is deprecated and
  emits a ``DeprecationWarning`` on instantiation. The class
  remains functional and callable for backward compatibility, but
  is no longer used by ``QuantileDistribution``; new code should
  use ``VWCDFRegressor`` directly.
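
The deprecation pattern described above, a class that keeps working but
warns on instantiation, typically looks like the following sketch.
``OldRegressor`` and ``NewRegressor`` are hypothetical names used only
for illustration.

.. code-block:: python

    import warnings


    class OldRegressor:
        """Hypothetical stand-in for a deprecated class that still works."""

        def __init__(self, eps=0.01):
            warnings.warn(
                "OldRegressor is deprecated; use NewRegressor instead.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            self.eps = eps


    # Capture the warning explicitly, as a test would do.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        reg = OldRegressor(eps=0.05)

Python hides ``DeprecationWarning`` by default unless it is triggered
by code running in ``__main__``, so downstream projects may want to
enable it in their test runs to spot remaining uses.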

Bug Fixes
~~~~~~~~~

*Distributions*

- Fixed ``LinearFunction.from_points`` for subnormal ``dx``: instead
  of failing an assertion, it now returns a ``ConstantFunction``
  that preserves the jump-segment convention of the CDF.
- Fixed ``Integer`` distribution merging: probabilities are now
  normalised after weighted accumulation, preventing mass loss on
  merged mixtures.
- Fixed CDF monotonicity enforcement in ``QuantileDistribution``
  and PPF monotonicity in the ``from_json`` path.
- Data buffers passed to the Cython fit routines are now converted
  with ``np.ascontiguousarray`` to avoid silent precision loss from
  stride mismatches.
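
The normalisation step mentioned in the ``Integer`` merging fix above
can be illustrated with plain dictionaries. This is a minimal sketch:
the function ``merge_integer_distributions`` and the ``{value: prob}``
representation are assumptions and do not mirror the library's classes.

.. code-block:: python

    def merge_integer_distributions(dists, weights):
        """Weighted mixture of discrete distributions given as dicts."""
        merged = {}
        for dist, weight in zip(dists, weights):
            for value, prob in dist.items():
                merged[value] = merged.get(value, 0.0) + weight * prob
        total = sum(merged.values())
        if total <= 0.0:
            raise ValueError("weights and probabilities must carry mass")
        # If each input sums to 1, the accumulated mass equals sum(weights);
        # dividing by the total restores a proper distribution.
        return {value: prob / total for value, prob in merged.items()}


    mixture = merge_integer_distributions(
        [{1: 0.5, 2: 0.5}, {2: 1.0}],
        weights=[2.0, 1.0],
    )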

*Learning*

- Fixed a regression in the symbolic-impurity normalisation that
  caused ``invert_impurity=True`` to prefer pure leaves (the
  opposite of its semantics).
- ``tqdm`` progress bars now write to stderr, avoiding collisions
  with stdout-configured logging.

Infrastructure
~~~~~~~~~~~~~~

- Switched symbolic impurity normalisation from a global
  symbol-count denominator to a *local* count (number of symbols
  actually present in the current partition). This gives
  adaptive regularisation that behaves consistently across leaves
  with different symbolic support sizes.

Test Suite
~~~~~~~~~~

- Added ~50 new test cases covering ``VWCDFRegressor`` (class API,
  numerical invariants, Cython-vs-Python-reference cross-check,
  performance canaries), split validation end-to-end and at the
  impurity level, and ``min_eval_samples`` resolution / enforcement.
- Updated the hardcoded values in ``test_k3_mpe`` and the delta
  tolerances in ``test_moment`` to match the new fitter's output
  precision.

Plotting / Engine Tests
~~~~~~~~~~~~~~~~~~~~~~~

- Added 500+ lines of tests for the matplotlib and plotly
  rendering engines; completed the ``cdfreg`` test stubs.

Known Issues
~~~~~~~~~~~~

- ``QuantileDistribution.merge()`` does not propagate embedded
  probability-mass jumps from multi-sample clusters when both
  input distributions have independently fitted clusters at
  different x-values. In the single-contributor case (one weight
  set to 1), results are identical to the input. A fix is scheduled
  for 1.4.0, together with jump-aware likelihood evaluation.


1.1.0
-----

This release adds dependency discovery and xi-based pruning to the JPT
learning pipeline.

New Features
~~~~~~~~~~~~

*Dependency discovery*

- Added ``jpt.base.correlation`` package with a standalone
  implementation of Chatterjee's xi correlation coefficient
  (``xi_correlation``, ``xi_correlation_matrix``).
- Added ``jpt.learning.dependency`` package with the
  ``DependencyDiscovery`` abstract base class and
  ``XiDependencyDiscovery``, which computes xi for all feature-target
  pairs and retains only statistically significant dependencies.
- The ``dependencies`` parameter of ``JPT.__init__`` now accepts
  ``DependencyDiscovery`` instances in addition to ``None`` and
  explicit dictionaries. Discovery strategies are re-invoked on each
  ``learn()`` call and preserved during JSON serialization.
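
For orientation, Chatterjee's xi coefficient for samples without ties
can be computed in a few lines of pure Python. The sketch below is
independent of ``jpt.base.correlation``: it does not reproduce that
package's signatures, and it deliberately omits tie handling and the
significance test.

.. code-block:: python

    def xi_correlation(xs, ys):
        """Chatterjee's xi for paired samples without ties."""
        n = len(xs)
        if n != len(ys) or n < 2:
            raise ValueError("need two samples of equal length >= 2")
        # Order the y-values by increasing x, then rank them (1 = smallest).
        ys_by_x = [y for _, y in sorted(zip(xs, ys))]
        order = sorted(range(n), key=lambda i: ys_by_x[i])
        ranks = [0] * n
        for rank, i in enumerate(order, start=1):
            ranks[i] = rank
        # Small rank jumps between x-neighbours indicate that y is close
        # to a function of x.
        jumps = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
        return 1.0 - 3.0 * jumps / (n * n - 1)


    xs = list(range(10))
    ys = [x * x for x in xs]  # strictly increasing on this range
    xi = xi_correlation(xs, ys)

For a strictly monotone relationship on ``n`` points the formula reduces
to ``1 - 3 / (n + 1)``, so the value approaches 1 only as the sample
grows.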

*Pruning*

- Added ``jpt.learning.pruning`` package with
  ``XiPruningCriterion``, a ``prune_or_split`` callback that stops
  splitting when no feature-target pair shows significant functional
  dependence in the current partition.
- The ``prune_or_split`` callback signature is extended from
  ``(jpt, partition, indices)`` to ``(jpt, partition, indices, data)``,
  eliminating the need to access process-local state.

*Documentation*

- Added a how-to guide for dependency discovery and xi pruning with
  mathematical background, worked examples, and an extensibility guide.
- Added Chatterjee (2021), Dalitz et al. (2024), and Shi et al. (2022)
  to the bibliography.
- Added ``xi_pruning.py`` example demonstrating both features.

Bug Fixes
~~~~~~~~~

- Fixed outdated ``important_datastructures.ipynb`` tutorial notebook:
  replaced removed ``list2interval`` and ``RealSet`` with current
  ``ContinuousSet`` and ``UnionSet`` API; corrected
  ``infer_from_dataframe`` import path.

Test Suite
~~~~~~~~~~

- Added 24 test cases covering xi correlation properties, dependency
  discovery (structure recovery, serialization, JPT integration), and
  pruning behavior (noise sensitivity, alpha monotonicity).


1.0.0
-----

This release contains substantial new features, bug fixes, and infrastructure
improvements relative to the last release of the ``0.1.x`` series (``0.1.41``).

New Features
~~~~~~~~~~~~

*Inference*

- Corrected k-MPE implementation: max-heap extraction, proper
  ``leaf.prior`` scaling, and quadratic pruned-node requeue for
  multi-leaf correctness.
- Added support for specifying numeric query intervals as Python lists
  in :py:meth:`jpt.distributions.univariate.Numeric.p`.

*Learning*

- Added optional progress bar for monitoring the learning progress.
- Added support for custom pruning criteria during JPT construction.
- Runaway tree growth is now prevented when no ``max_std`` constraints
  are set.

*Parallel processing*

- Added multicore module with a customised process-pool class that
  inherits thread-local state in child processes.
- Added support for parallel likelihood computation over multiple cores.
- Added support for parallel data preprocessing.
- Added support for parallel learning of prior distributions per leaf.
- Added support for parallel rendering of JPT leaves.

*Serialization*

- Added ``JPT.dump()`` / ``JPT.load()`` and ``JPT.dumps()`` /
  ``JPT.loads()`` for JSON-based model persistence.
- Added ``__getstate__()`` and ``__setstate__()`` to ``IntSet`` and
  ``NumberSet`` for pickle support.

*Data structures*

- Added ``IntSet``: a new Cython interval type for integer domains,
  replacing ad-hoc set arithmetic in integer distributions.
- Added ``RealSet.min`` property.
- Added ``QuadraticFunction`` support in ``PiecewiseFunction``,
  including vertex-form construction and maximisation.
- Added ``PiecewiseFunction.__neg__()`` and corresponding
  ``Function.__neg__()`` interface method.
- Added ``PLFApproximator`` test coverage.

*Visualisation*

- Added plotting support for Gaussian distributions in the Matplotlib
  and Plotly rendering engines.
- Added support for fancy tree printing with Unicode box-drawing
  characters via ``anytree``.

Bug Fixes
~~~~~~~~~

*Inference and distributions*

- Fixed numeric imprecision in :py:meth:`jpt.trees.JPT.posterior`.
- Fixed :py:meth:`jpt.trees.JPT.expectation` method signature.
- Fixed :py:meth:`jpt.trees.JPT.encode` function.
- Fixed error tolerance in ``Numeric.pdf_to_cdf()`` and corrected
  handling of Dirac impulse contributions.
- Fixed sampling bug in ``UnionSet._sample`` (memoryview assignment).
- Fixed ``ContinuousSet._sample`` memoryview assignment causing
  ``TypeError``.
- Fixed plotting of unbounded integer distributions.

*Learning*

- Fixed memory leak and logging errors in the C4.5 learning algorithm.
- Fixed ``infer_from_dataframe()`` variable type checks and suppressed
  erroneous deduplication of domain values.
- Fixed ``IntegerVariable.assignment2set()`` value conversion and
  compatibility with the new ``IntSet`` class.
- Fixed ``JPT._preprocess_data()`` DataFrame value transformation.
- The input DataFrame is now copied with ``df.copy()`` before in-place
  manipulation to prevent side effects on the caller's DataFrame.

*Parallelism and concurrency*

- Fixed multicore likelihood computation and ``multicore=None``
  handling.
- Restricted signal handling to the main thread.
- Fixed repetitive pickling of the JPT instance in parallel leaf
  plotting.
- Fixed ``ImportError`` when importing the custom Pool class.
- Fixed thread-local JPT storage for worker processes.

*Build and packaging*

- Fixed Cython version detection in ``pyximporter.py``.
- Fixed import issue with Cython >= 3.0.11.
- Fixed concurrent Cython compilation.
- Fixed relative data paths in tests and examples.
- Pinned ``kaleido < 1.0`` to prevent incompatible API changes.

Infrastructure
~~~~~~~~~~~~~~

- Migrated version management to ``setuptools-scm``; version is now
  derived automatically from git tags.
- Consolidated all build and dependency configuration into
  ``pyproject.toml``.
- Made ``graphviz``, ``fglib``, ``factorgraph``, and ``mlflow``
  optional dependencies with lazy imports.
- Added ``typing-extensions`` as an explicit dependency.
- Set Python 3.11 as the default build target.
- Modernised type-hint syntax throughout ``trees.py`` and
  ``variables.py``.
- Migrated all ``print``-based logging to Python's standard
  ``logging`` module.
- Lazy-import plotting engines in distribution modules to avoid
  importing heavy optional dependencies at module load time.
- Updated GitHub Actions CI workflows.

Test Suite
~~~~~~~~~~

- Restructured the test suite into per-module files under
  ``test/distributions/``, ``test/base/``, ``test/variables/``, and
  ``test/learning/``.
- Added docstrings to all test methods.
- Added placeholder test cases for the plotting engine.

Previous Releases
-----------------

For changes in the ``0.1.x`` series please refer to the git history::

    git log 0.1.41 --oneline