{ "cells": [ { "cell_type": "markdown", "id": "d57e1959", "metadata": {}, "source": [ "# Working with Intervals\n", "\n", "`pyjpt` ships a first-class interval arithmetic library used internally for\n", "representing variable domains, evidence ranges, and query results. The same\n", "classes are fully public and useful on their own.\n", "\n", "This tutorial covers:\n", "\n", "1. `ContinuousSet` — a single real-valued interval with inclusive/exclusive bounds\n", "2. `IntSet` — a contiguous range of integers\n", "3. `UnionSet` — a disjoint union of any of the above\n", "4. Set operations — intersection, union, difference, complement\n", "5. Sampling, chopping, and transforming intervals\n", "6. Using intervals in JPT queries" ] }, { "cell_type": "markdown", "id": "f11412e3", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "id": "04b3d049", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.767545Z", "iopub.status.busy": "2026-03-17T09:17:41.767026Z", "iopub.status.idle": "2026-03-17T09:17:41.889930Z", "shell.execute_reply": "2026-03-17T09:17:41.889076Z" } }, "outputs": [], "source": [ "from jpt.base.intervals import (\n", " ContinuousSet, IntSet, UnionSet,\n", " INC, EXC, # boundary type constants\n", " R, Z, EMPTY, # predefined sets: all reals, all integers, empty set\n", ")\n", "import numpy as np" ] }, { "cell_type": "markdown", "id": "d562d58e", "metadata": {}, "source": [ "## ContinuousSet\n", "\n", "Represents a single real-valued interval. 
Bounds can be\n", "**inclusive** (`INC`, written `[` / `]`) or **exclusive** (`EXC`, written `(` / `)`).\n", "\n", "### Creating intervals" ] }, { "cell_type": "code", "execution_count": 2, "id": "06cac94b", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.891450Z", "iopub.status.busy": "2026-03-17T09:17:41.891307Z", "iopub.status.idle": "2026-03-17T09:17:41.895041Z", "shell.execute_reply": "2026-03-17T09:17:41.894469Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.0,1.0)\n", "[2.0,5.0]\n", "(0.0,1.0)\n" ] } ], "source": [ "# Constructor: ContinuousSet(lower, upper, left_bound, right_bound)\n", "half_open = ContinuousSet(0, 1, INC, EXC) # [0, 1)\n", "closed = ContinuousSet(2, 5, INC, INC) # [2, 5]\n", "open_ = ContinuousSet(0, 1, EXC, EXC) # (0, 1)\n", "\n", "print(half_open) # [0.0, 1.0)\n", "print(closed) # [2.0, 5.0]\n", "print(open_) # (0.0, 1.0)" ] }, { "cell_type": "markdown", "id": "75639cfc", "metadata": {}, "source": [ "### Parsing from strings\n", "\n", "The most convenient way to build intervals is to parse a string:" ] }, { "cell_type": "code", "execution_count": 3, "id": "e8dddb53", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.896355Z", "iopub.status.busy": "2026-03-17T09:17:41.896236Z", "iopub.status.idle": "2026-03-17T09:17:41.899541Z", "shell.execute_reply": "2026-03-17T09:17:41.899041Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.0,2.0]\n", "(1.5,4.0)\n", "[-∞,0.0)\n", "[3.0,∞]\n" ] } ], "source": [ "i1 = ContinuousSet.parse('[0, 2]')\n", "i2 = ContinuousSet.parse('(1.5, 4)')\n", "i3 = ContinuousSet.parse('[-inf, 0)')\n", "i4 = ContinuousSet.parse('[3, inf]')\n", "\n", "for i in [i1, i2, i3, i4]:\n", " print(i)" ] }, { "cell_type": "markdown", "id": "55dd4b87", "metadata": {}, "source": [ "### Basic properties" ] }, { "cell_type": "code", "execution_count": 4, "id": "eddeb027", "metadata": { "execution": { "iopub.execute_input": 
"2026-03-17T09:17:41.901133Z", "iopub.status.busy": "2026-03-17T09:17:41.901038Z", "iopub.status.idle": "2026-03-17T09:17:41.903855Z", "shell.execute_reply": "2026-03-17T09:17:41.903353Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "min: 1.0\n", "max: 5.0\n", "width: 4.0\n", "empty: 0\n", "contains 3: 1\n", "contains 6: 0\n" ] } ], "source": [ "i = ContinuousSet.parse('[1, 5]')\n", "\n", "print(\"min: \", i.min)\n", "print(\"max: \", i.max)\n", "print(\"width: \", i.width)\n", "print(\"empty: \", i.isempty())\n", "print(\"contains 3:\", i.contains_value(3))\n", "print(\"contains 6:\", i.contains_value(6))" ] }, { "cell_type": "markdown", "id": "b8b80e0a", "metadata": {}, "source": [ "### Predefined constants" ] }, { "cell_type": "code", "execution_count": 5, "id": "9e40c86a", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.905078Z", "iopub.status.busy": "2026-03-17T09:17:41.904985Z", "iopub.status.idle": "2026-03-17T09:17:41.907827Z", "shell.execute_reply": "2026-03-17T09:17:41.907294Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All reals R: (-∞,∞)\n", "Empty set: ∅\n", "R is empty? 0\n", "EMPTY is empty? 1\n" ] } ], "source": [ "print(\"All reals R:\", R)\n", "print(\"Empty set: \", EMPTY)\n", "print(\"R is empty?\", R.isempty())\n", "print(\"EMPTY is empty?\", EMPTY.isempty())" ] }, { "cell_type": "markdown", "id": "5f9da90d", "metadata": {}, "source": [ "## Set Operations" ] }, { "cell_type": "code", "execution_count": 6, "id": "3ff64ad3", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.909189Z", "iopub.status.busy": "2026-03-17T09:17:41.909100Z", "iopub.status.idle": "2026-03-17T09:17:41.912289Z", "shell.execute_reply": "2026-03-17T09:17:41.911756Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a: [0.0,3.0]\n", "b: [2.0,5.0]\n", "a ∩ b: [2.0,3.0]\n", "a ∪ b: [0.0,5.0]\n", "a \\ b: [0.0,2.0)\n", "intersects? 1\n", "a ⊆ b? 
0\n" ] } ], "source": [ "a = ContinuousSet.parse('[0, 3]')\n", "b = ContinuousSet.parse('[2, 5]')\n", "\n", "print(\"a: \", a)\n", "print(\"b: \", b)\n", "print(\"a ∩ b: \", a.intersection(b))\n", "print(\"a ∪ b: \", a.union(b))\n", "print(\"a \\\\ b: \", a.difference(b))\n", "print(\"intersects? \", a.intersects(b))\n", "# a ⊆ b holds exactly when b ⊇ a, so test b as the superset\n", "print(\"a ⊆ b? \", b.issuperseteq(a))" ] }, { "cell_type": "markdown", "id": "d4e841f1", "metadata": {}, "source": [ "Operator shortcuts `&`, `|`, `-` work too:" ] }, { "cell_type": "code", "execution_count": 7, "id": "2a384c43", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.913518Z", "iopub.status.busy": "2026-03-17T09:17:41.913433Z", "iopub.status.idle": "2026-03-17T09:17:41.915881Z", "shell.execute_reply": "2026-03-17T09:17:41.915460Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.0,3.0]\n", "[0.0,4.0]\n", "[0.0,1.0) ∪ (3.0,4.0]\n" ] } ], "source": [ "a = ContinuousSet.parse('[0, 4]')\n", "b = ContinuousSet.parse('[1, 3]')\n", "\n", "print(a & b)  # intersection\n", "print(a | b)  # union\n", "print(a - b)  # difference" ] }, { "cell_type": "markdown", "id": "0e10e162", "metadata": {}, "source": [ "### Complement\n", "\n", "The complement of an interval with respect to ℝ:" ] }, { "cell_type": "code", "execution_count": 8, "id": "4589cdef", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.917486Z", "iopub.status.busy": "2026-03-17T09:17:41.917385Z", "iopub.status.idle": "2026-03-17T09:17:41.919869Z", "shell.execute_reply": "2026-03-17T09:17:41.919306Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "complement of [1.0,3.0] = (-∞,1.0) ∪ (3.0,∞)\n" ] } ], "source": [ "i = ContinuousSet.parse('[1, 3]')\n", "comp = i.complement()\n", "print(f\"complement of {i} = {comp}\")" ] }, { "cell_type": "markdown", "id": "c26055ac", "metadata": {}, "source": [ "## Sampling\n", "\n", "Draw uniform random samples from a bounded interval:" ] }, { "cell_type": "code", 
"execution_count": 9, "id": "6b70dc76", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.921134Z", "iopub.status.busy": "2026-03-17T09:17:41.921038Z", "iopub.status.idle": "2026-03-17T09:17:41.928119Z", "shell.execute_reply": "2026-03-17T09:17:41.927595Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10 samples from [2, 5]: [2.22827182 3.55108613 2.26035635 2.79801308 3.46849101 4.29623478\n", " 2.50564093 2.48938792 4.46170836 3.60877701]\n", "all in range: True\n" ] } ], "source": [ "i = ContinuousSet.parse('[2, 5]')\n", "samples = i.sample(10)\n", "print(\"10 samples from [2, 5]:\", samples)\n", "print(\"all in range:\", all(2 <= x <= 5 for x in samples))" ] }, { "cell_type": "markdown", "id": "37a7e71d", "metadata": {}, "source": [ "## IntSet — Integer Intervals\n", "\n", "`IntSet` represents a contiguous range of integers. Boundaries are always\n", "inclusive." ] }, { "cell_type": "code", "execution_count": 10, "id": "117f1275", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.929357Z", "iopub.status.busy": "2026-03-17T09:17:41.929233Z", "iopub.status.idle": "2026-03-17T09:17:41.932478Z", "shell.execute_reply": "2026-03-17T09:17:41.931887Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{1..10}\n", "{5..15}\n", "intersection: {5..10}\n", "union: {1..15}\n", "size: 10.0\n", "All integers: ℤ\n" ] } ], "source": [ "from jpt.base.intervals import Z\n", "\n", "z1 = IntSet(1, 10) # {1, 2, ..., 10}\n", "z2 = IntSet.parse('{5..15}')\n", "\n", "print(z1)\n", "print(z2)\n", "print(\"intersection:\", z1.intersection(z2))\n", "print(\"union: \", z1.union(z2))\n", "print(\"size: \", z1.size())\n", "print(\"All integers:\", Z)" ] }, { "cell_type": "markdown", "id": "a1d46d33", "metadata": {}, "source": [ "Iterate directly over an `IntSet`:" ] }, { "cell_type": "code", "execution_count": 11, "id": "3e85921b", "metadata": { "execution": { "iopub.execute_input": 
"2026-03-17T09:17:41.933724Z", "iopub.status.busy": "2026-03-17T09:17:41.933631Z", "iopub.status.idle": "2026-03-17T09:17:41.936135Z", "shell.execute_reply": "2026-03-17T09:17:41.935542Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 2 3 4 5 \n" ] } ], "source": [ "for n in IntSet(1, 5):\n", " print(n, end=\" \")\n", "print()" ] }, { "cell_type": "markdown", "id": "dc2d0c08", "metadata": {}, "source": [ "Sample integers:" ] }, { "cell_type": "code", "execution_count": 12, "id": "d8292f49", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.937302Z", "iopub.status.busy": "2026-03-17T09:17:41.937213Z", "iopub.status.idle": "2026-03-17T09:17:41.939723Z", "shell.execute_reply": "2026-03-17T09:17:41.939306Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5 random integers from {0..100}: [59. 65. 57. 6. 22.]\n" ] } ], "source": [ "z = IntSet(0, 100)\n", "print(\"5 random integers from {0..100}:\", z.sample(5))" ] }, { "cell_type": "markdown", "id": "34f4eb43", "metadata": {}, "source": [ "## UnionSet — Disjoint Unions\n", "\n", "When two intervals do not overlap, their union is a `UnionSet`:" ] }, { "cell_type": "code", "execution_count": 13, "id": "c365bbc6", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.941238Z", "iopub.status.busy": "2026-03-17T09:17:41.941146Z", "iopub.status.idle": "2026-03-17T09:17:41.943885Z", "shell.execute_reply": "2026-03-17T09:17:41.943449Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "UnionSet [0.0,1.0] ∪ [3.0,5.0]\n", "contains 0.5: 1\n", "contains 2.0: 0\n", "contains 4.0: 1\n" ] } ], "source": [ "a = ContinuousSet.parse('[0, 1]')\n", "b = ContinuousSet.parse('[3, 5]')\n", "u = a.union(b)\n", "print(type(u).__name__, u)\n", "print(\"contains 0.5:\", u.contains_value(0.5))\n", "print(\"contains 2.0:\", u.contains_value(2.0))\n", "print(\"contains 4.0:\", u.contains_value(4.0))" ] }, { "cell_type": "markdown", 
"id": "cf1a14d7", "metadata": {}, "source": [ "Build a `UnionSet` directly from a list of intervals:" ] }, { "cell_type": "code", "execution_count": 14, "id": "b810b828", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.945007Z", "iopub.status.busy": "2026-03-17T09:17:41.944922Z", "iopub.status.idle": "2026-03-17T09:17:41.947654Z", "shell.execute_reply": "2026-03-17T09:17:41.947256Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-2.0,-1.0] ∪ [0.0,1.0] ∪ [2.0,3.0]\n", "min: -2.0 max: 3.0\n", "sample 6 points: [0.23038297 0.46269014 2.43488776 2.30432159 0.00416605 0.46396241]\n" ] } ], "source": [ "pieces = [\n", " ContinuousSet.parse('[-2, -1]'),\n", " ContinuousSet.parse('[0, 1]'),\n", " ContinuousSet.parse('[2, 3]'),\n", "]\n", "u = UnionSet(pieces)\n", "print(u)\n", "print(\"min:\", u.min, \" max:\", u.max)\n", "print(\"sample 6 points:\", u.sample(6))" ] }, { "cell_type": "markdown", "id": "54cfe9fe", "metadata": {}, "source": [ "### Simplifying a UnionSet\n", "\n", "If contiguous intervals were added separately, `simplify()` merges them:" ] }, { "cell_type": "code", "execution_count": 15, "id": "9922b091", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.948935Z", "iopub.status.busy": "2026-03-17T09:17:41.948816Z", "iopub.status.idle": "2026-03-17T09:17:41.951628Z", "shell.execute_reply": "2026-03-17T09:17:41.951195Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "before simplify: [0.0,1.0] ∪ [1.0,2.0] ∪ [5.0,6.0]\n", "after simplify: [0.0,2.0] ∪ [5.0,6.0]\n" ] } ], "source": [ "u = UnionSet([\n", " ContinuousSet.parse('[0, 1]'),\n", " ContinuousSet.parse('[1, 2]'), # contiguous — will be merged\n", " ContinuousSet.parse('[5, 6]'),\n", "])\n", "print(\"before simplify:\", u)\n", "print(\"after simplify:\", u.simplify())" ] }, { "cell_type": "markdown", "id": "8947abd4", "metadata": {}, "source": [ "## Chopping Intervals\n", "\n", "Split an interval at a list 
of points — useful for discretisation:" ] }, { "cell_type": "code", "execution_count": 16, "id": "10158f5d", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.952867Z", "iopub.status.busy": "2026-03-17T09:17:41.952733Z", "iopub.status.idle": "2026-03-17T09:17:41.955205Z", "shell.execute_reply": "2026-03-17T09:17:41.954627Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.0,2.0)\n", "[2.0,4.0)\n", "[4.0,7.0)\n", "[7.0,10.0]\n" ] } ], "source": [ "i = ContinuousSet.parse('[0, 10]')\n", "chops = list(i.chop([2, 4, 7]))\n", "for c in chops:\n", "    print(c)" ] }, { "cell_type": "markdown", "id": "98525e13", "metadata": {}, "source": [ "## Transforming Boundaries\n", "\n", "Apply an arbitrary function to both boundaries while keeping the bound types:" ] }, { "cell_type": "code", "execution_count": 17, "id": "3ab07146", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.956462Z", "iopub.status.busy": "2026-03-17T09:17:41.956376Z", "iopub.status.idle": "2026-03-17T09:17:41.958610Z", "shell.execute_reply": "2026-03-17T09:17:41.958202Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "transform [1.0,3.0] by x² → [1.0,9.0]\n" ] } ], "source": [ "i = ContinuousSet.parse('[1, 3]')\n", "squared = i.transform(lambda x: x**2)\n", "print(f\"transform {i} by x² → {squared}\")" ] }, { "cell_type": "markdown", "id": "9a975ffc", "metadata": {}, "source": [ "## Using Intervals in JPT Queries\n", "\n", "Intervals are the native language for specifying evidence ranges and\n", "reading back posterior domains in `pyjpt`.\n", "\n", "### Evidence as intervals" ] }, { "cell_type": "code", "execution_count": 18, "id": "475ee42e", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:41.959940Z", "iopub.status.busy": "2026-03-17T09:17:41.959856Z", "iopub.status.idle": "2026-03-17T09:17:43.168338Z", "shell.execute_reply": "2026-03-17T09:17:43.167761Z" } }, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "JPT#innernodes = 6, #leaves = 7 (13 total)\n" ] } ], "source": [ "import pandas as pd\n", "import sklearn.datasets\n", "from jpt.variables import infer_from_dataframe\n", "from jpt.trees import JPT\n", "\n", "iris = sklearn.datasets.load_iris()\n", "df = pd.DataFrame(iris.data, columns=iris.feature_names)\n", "df['species'] = [iris.target_names[t] for t in iris.target]\n", "\n", "variables = infer_from_dataframe(df)\n", "vnames = {v.name: v for v in variables}\n", "model = JPT(variables, min_samples_leaf=0.1)\n", "model.fit(df)\n", "print(model)" ] }, { "cell_type": "markdown", "id": "556fb71f", "metadata": {}, "source": [ "Pass a `[lo, hi]` list or a `ContinuousSet` directly as evidence:" ] }, { "cell_type": "code", "execution_count": 19, "id": "db2807e2", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:43.170027Z", "iopub.status.busy": "2026-03-17T09:17:43.169847Z", "iopub.status.idle": "2026-03-17T09:17:43.175292Z", "shell.execute_reply": "2026-03-17T09:17:43.174760Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P(virginica | petal length ∈ [5,7]) = 0.9756\n", "Same via list shorthand: 0.9756\n" ] } ], "source": [ "# P(species=virginica | petal length ∈ [5, 7])\n", "p = model.infer(\n", "    query={'species': 'virginica'},\n", "    evidence={'petal length (cm)': ContinuousSet.parse('[5, 7]')}\n", ")\n", "print(f\"P(virginica | petal length ∈ [5,7]) = {p:.4f}\")\n", "\n", "# Equivalent shorthand — list is auto-converted\n", "p2 = model.infer(\n", "    query={'species': 'virginica'},\n", "    evidence={'petal length (cm)': [5, 7]}\n", ")\n", "print(f\"Same via list shorthand: {p2:.4f}\")" ] }, { "cell_type": "markdown", "id": "6ac222d9", "metadata": {}, "source": [ "### Posterior domains are intervals\n", "\n", "The posterior distribution over a numeric variable carries its support as\n", "a `ContinuousSet` (or `UnionSet` for multi-modal posteriors):" ] }, { "cell_type": "code", 
"execution_count": 20, "id": "57001359", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:43.176666Z", "iopub.status.busy": "2026-03-17T09:17:43.176548Z", "iopub.status.idle": "2026-03-17T09:17:43.180881Z", "shell.execute_reply": "2026-03-17T09:17:43.180287Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "petal length (cm): 98% range ≈ [0.46, 1.79], mean ≈ 5.17\n", "petal width (cm): 98% range ≈ [0.34, 1.71], mean ≈ 1.95\n" ] } ], "source": [ "post = model.posterior(\n", "    variables=[vnames['petal length (cm)'], vnames['petal width (cm)']],\n", "    evidence={'species': 'virginica'},\n", ")\n", "\n", "# The 1st and 99th percentiles bracket the central 98% of the posterior mass\n", "for var, dist in post.items():\n", "    lo, hi = dist.ppf(.01), dist.ppf(.99)\n", "    print(f\"{var.name}: 98% range ≈ [{lo:.2f}, {hi:.2f}], mean ≈ {dist.expectation():.2f}\")" ] }, { "cell_type": "markdown", "id": "c3b6b428", "metadata": {}, "source": [ "### Building multi-range evidence with UnionSet\n", "\n", "Exclude a band from the evidence by passing a `UnionSet`:" ] }, { "cell_type": "code", "execution_count": 21, "id": "f0c64c64", "metadata": { "execution": { "iopub.execute_input": "2026-03-17T09:17:43.182130Z", "iopub.status.busy": "2026-03-17T09:17:43.182041Z", "iopub.status.idle": "2026-03-17T09:17:43.186587Z", "shell.execute_reply": "2026-03-17T09:17:43.186027Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "evidence range: [4.0,5.0) ∪ (6.0,8.0]\n", "P(setosa | sepal length ∉ [5,6]) = 0.3041\n" ] } ], "source": [ "# Sepal length outside [5, 6] — i.e., either < 5 or > 6\n", "exclude_mid = ContinuousSet.parse('[4, 5)').union(ContinuousSet.parse('(6, 8]'))\n", "print(\"evidence range:\", exclude_mid)\n", "\n", "p = model.infer(\n", "    query={'species': 'setosa'},\n", "    evidence={'sepal length (cm)': exclude_mid}\n", ")\n", "print(f\"P(setosa | sepal length ∉ [5,6]) = {p:.4f}\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, 
"language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.15" } }, "nbformat": 4, "nbformat_minor": 5 }