Working with Intervals

pyjpt ships a first-class interval arithmetic library used internally for representing variable domains, evidence ranges, and query results. The same classes are fully public and useful on their own.

This tutorial covers:

ContinuousSet — a single real-valued interval with inclusive/exclusive bounds
IntSet — a contiguous range of integers
UnionSet — a disjoint union of any of the above
Set operations — intersection, union, difference, complement
Sampling, chopping, and transforming intervals
Using intervals in JPT queries

Imports

[1]:

from jpt.base.intervals import (
    ContinuousSet, IntSet, UnionSet,
    INC, EXC,           # boundary type constants
    R, Z, EMPTY,        # predefined sets: all reals, all integers, empty set
)
import numpy as np

ContinuousSet

Represents a single real-valued interval. Bounds can be inclusive (INC, written [ / ]) or exclusive (EXC, written ( / )).

Creating intervals

[2]:

# Constructor: ContinuousSet(lower, upper, left_bound, right_bound)
half_open = ContinuousSet(0, 1, INC, EXC)   # [0, 1)
closed    = ContinuousSet(2, 5, INC, INC)   # [2, 5]
open_     = ContinuousSet(0, 1, EXC, EXC)   # (0, 1)

print(half_open)   # [0.0,1.0)
print(closed)      # [2.0,5.0]
print(open_)       # (0.0,1.0)

[0.0,1.0)
[2.0,5.0]
(0.0,1.0)
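To make the effect of the bound types concrete, the following plain-Python sketch checks membership by hand. It does not use pyjpt: the helper contains and its boolean flags are hypothetical names chosen for illustration, with True standing in for INC and False for EXC.

```python
def contains(lower, upper, left_closed, right_closed, x):
    """Return True if x lies in the interval described by the arguments."""
    above = x >= lower if left_closed else x > lower
    below = x <= upper if right_closed else x < upper
    return above and below

# [0, 1) includes 0 but not 1
print(contains(0, 1, True, False, 0))   # True
print(contains(0, 1, True, False, 1))   # False
# (0, 1) excludes both endpoints
print(contains(0, 1, False, False, 0))  # False
```

Only the endpoints are affected by the bound type; every interior point is a member either way.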

Parsing from strings

The most convenient way to build intervals is to parse a string:

[3]:

i1 = ContinuousSet.parse('[0, 2]')
i2 = ContinuousSet.parse('(1.5, 4)')
i3 = ContinuousSet.parse('[-inf, 0)')
i4 = ContinuousSet.parse('[3, inf]')

for i in [i1, i2, i3, i4]:
    print(i)

[0.0,2.0]
(1.5,4.0)
[-∞,0.0)
[3.0,∞]

Basic properties

[4]:

i = ContinuousSet.parse('[1, 5]')

print("min:   ", i.min)
print("max:   ", i.max)
print("width: ", i.width)
print("empty: ", i.isempty())
print("contains 3:", i.contains_value(3))
print("contains 6:", i.contains_value(6))

min:    1.0
max:    5.0
width:  4.0
empty:  0
contains 3: 1
contains 6: 0

Predefined constants

[5]:

print("All reals R:", R)
print("Empty set:  ", EMPTY)
print("R is empty?", R.isempty())
print("EMPTY is empty?", EMPTY.isempty())

All reals R: (-∞,∞)
Empty set:   ∅
R is empty? 0
EMPTY is empty? 1

Set Operations

[6]:

a = ContinuousSet.parse('[0, 3]')
b = ContinuousSet.parse('[2, 5]')

print("a:            ", a)
print("b:            ", b)
print("a ∩ b:        ", a.intersection(b))
print("a ∪ b:        ", a.union(b))
print("a \\ b:        ", a.difference(b))
print("intersects?   ", a.intersects(b))
print("a ⊇ b?        ", a.issuperseteq(b))

a:             [0.0,3.0]
b:             [2.0,5.0]
a ∩ b:         [2.0,3.0]
a ∪ b:         [0.0,5.0]
a \ b:         [0.0,2.0)
intersects?    1
a ⊇ b?         0
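The bookkeeping behind an intersection is mostly about which bound type survives. The sketch below is a plain-Python illustration, not pyjpt's implementation: intervals are hypothetical (lower, upper, left_closed, right_closed) tuples, and None stands for the empty set.

```python
def intersect(a, b):
    """Intersect two intervals given as (lower, upper, left_closed, right_closed).

    Returns None when the intersection is empty.
    """
    # Lower bound: the larger one wins; on a tie it is closed only if both are.
    if a[0] > b[0]:
        lo, lc = a[0], a[2]
    elif a[0] < b[0]:
        lo, lc = b[0], b[2]
    else:
        lo, lc = a[0], a[2] and b[2]
    # Upper bound: the smaller one wins, with the same tie rule.
    if a[1] < b[1]:
        hi, rc = a[1], a[3]
    elif a[1] > b[1]:
        hi, rc = b[1], b[3]
    else:
        hi, rc = a[1], a[3] and b[3]
    # Empty if the bounds cross, or meet at a point that is excluded.
    if lo > hi or (lo == hi and not (lc and rc)):
        return None
    return (lo, hi, lc, rc)

print(intersect((0, 3, True, True), (2, 5, True, True)))   # (2, 3, True, True), i.e. [2, 3]
print(intersect((0, 1, True, False), (1, 2, True, True)))  # None: [0, 1) and [1, 2] share no point
```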

Operator shortcuts &, |, - work too:

[7]:

a = ContinuousSet.parse('[0, 4]')
b = ContinuousSet.parse('[1, 3]')

print(a & b)   # intersection
print(a | b)   # union
print(a - b)   # difference

[1.0,3.0]
[0.0,4.0]
[0.0,1.0) ∪ (3.0,4.0]

Complement

The complement of an interval with respect to ℝ:

[8]:

i = ContinuousSet.parse('[1, 3]')
comp = i.complement()
print(f"complement of {i} = {comp}")

complement of [1.0,3.0] = (-∞,1.0) ∪ (3.0,∞)
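Taking a complement flips the bound type at each finite endpoint: a point that was included in the interval must be excluded from its complement, and vice versa. A minimal plain-Python sketch of that rule follows, assuming intervals are written as (lower, upper, left_closed, right_closed) tuples; the function name and tuple layout are illustrative and not part of pyjpt.

```python
import math

def complement(lower, upper, left_closed, right_closed):
    """Complement of one interval with respect to the reals.

    Returns a list of (lower, upper, left_closed, right_closed) pieces.
    """
    pieces = []
    if lower > -math.inf:
        # Everything below `lower`; the shared endpoint flips its bound type.
        pieces.append((-math.inf, lower, False, not left_closed))
    if upper < math.inf:
        # Everything above `upper`, again with the bound type flipped.
        pieces.append((upper, math.inf, not right_closed, False))
    return pieces

print(complement(1, 3, True, True))
# [(-inf, 1, False, False), (3, inf, False, False)], i.e. (-∞,1) ∪ (3,∞)
print(complement(0, 1, True, False))
# [(-inf, 0, False, False), (1, inf, True, False)], i.e. (-∞,0) ∪ [1,∞)
```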

Sampling

Draw uniform random samples from an interval with finite bounds:

[9]:

i = ContinuousSet.parse('[2, 5]')
samples = i.sample(10)
print("10 samples from [2, 5]:", samples)
print("all in range:", all(2 <= x <= 5 for x in samples))

10 samples from [2, 5]: [2.22827182 3.55108613 2.26035635 2.79801308 3.46849101 4.29623478
 2.50564093 2.48938792 4.46170836 3.60877701]
all in range: True

IntSet — Integer Intervals

IntSet represents a contiguous range of integers. Boundaries are always inclusive.

[10]:

from jpt.base.intervals import Z

z1 = IntSet(1, 10)          # {1, 2, ..., 10}
z2 = IntSet.parse('{5..15}')

print(z1)
print(z2)
print("intersection:", z1.intersection(z2))
print("union:       ", z1.union(z2))
print("size:        ", z1.size())
print("All integers:", Z)

{1..10}
{5..15}
intersection: {5..10}
union:        {1..15}
size:         10.0
All integers: ℤ

Iterate directly over an IntSet:

[11]:

for n in IntSet(1, 5):
    print(n, end="  ")
print()

1  2  3  4  5

Sample integers:

[12]:

z = IntSet(0, 100)
print("5 random integers from {0..100}:", z.sample(5))

5 random integers from {0..100}: [59. 65. 57.  6. 22.]

UnionSet — Disjoint Unions

When two intervals neither overlap nor touch, their union cannot be written as a single interval, so the result is a UnionSet:

[13]:

a = ContinuousSet.parse('[0, 1]')
b = ContinuousSet.parse('[3, 5]')
u = a.union(b)
print(type(u).__name__, u)
print("contains 0.5:", u.contains_value(0.5))
print("contains 2.0:", u.contains_value(2.0))
print("contains 4.0:", u.contains_value(4.0))

UnionSet [0.0,1.0] ∪ [3.0,5.0]
contains 0.5: 1
contains 2.0: 0
contains 4.0: 1

Build a UnionSet directly from a list of intervals:

[14]:

pieces = [
    ContinuousSet.parse('[-2, -1]'),
    ContinuousSet.parse('[0, 1]'),
    ContinuousSet.parse('[2, 3]'),
]
u = UnionSet(pieces)
print(u)
print("min:", u.min, "  max:", u.max)
print("sample 6 points:", u.sample(6))

[-2.0,-1.0] ∪ [0.0,1.0] ∪ [2.0,3.0]
min: -2.0   max: 3.0
sample 6 points: [0.23038297 0.46269014 2.43488776 2.30432159 0.00416605 0.46396241]
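One common way to sample uniformly from a union of disjoint intervals is to pick a piece with probability proportional to its width and then draw a point inside it. The sketch below shows that idea with the standard library only. Whether pyjpt's sample() works this way is an assumption, and sample_union with its (lower, upper) tuples is a hypothetical helper.

```python
import random

def sample_union(pieces, k, seed=None):
    """Draw k points uniformly from a union of disjoint (lower, upper) intervals."""
    rng = random.Random(seed)
    widths = [hi - lo for lo, hi in pieces]
    # Pick a piece with probability proportional to its width ...
    chosen = rng.choices(pieces, weights=widths, k=k)
    # ... then a uniform point inside the chosen piece.
    return [rng.uniform(lo, hi) for lo, hi in chosen]

pieces = [(-2, -1), (0, 1), (2, 3)]
points = sample_union(pieces, 6, seed=0)
print(all(any(lo <= x <= hi for lo, hi in pieces) for x in points))  # True
```

Weighting by width keeps the density constant across the whole union; choosing pieces with equal probability would oversample the narrow ones.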

Simplifying a UnionSet

The UnionSet constructor stores the pieces as given. If overlapping or adjacent intervals were added separately, simplify() merges them:

[15]:

u = UnionSet([
    ContinuousSet.parse('[0, 1]'),
    ContinuousSet.parse('[1, 2]'),   # contiguous — will be merged
    ContinuousSet.parse('[5, 6]'),
])
print("before simplify:", u)
print("after  simplify:", u.simplify())

before simplify: [0.0,1.0] ∪ [1.0,2.0] ∪ [5.0,6.0]
after  simplify: [0.0,2.0] ∪ [5.0,6.0]
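The merge step can be illustrated with the classic sort-and-sweep algorithm. This is a sketch for closed intervals only, written with plain (lower, upper) tuples; it is not taken from pyjpt, and merge_closed is a hypothetical name.

```python
def merge_closed(intervals):
    """Merge overlapping or touching closed intervals given as (lower, upper)."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            # Overlaps or touches the previous piece: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

print(merge_closed([(0, 1), (1, 2), (5, 6)]))  # [(0, 2), (5, 6)]
```

Sorting first means each interval only has to be compared with the most recently merged piece. Handling open bounds would additionally require checking that a shared endpoint is included on at least one side.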

Chopping Intervals

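Split an interval at a list of points, which is useful for discretisation. Each cut point goes to the piece on its right, so the pieces partition the original interval without overlap: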

[16]:

i = ContinuousSet.parse('[0, 10]')
chops = list(i.chop([2, 4, 7]))
for c in chops:
    print(c)

[0.0,2.0)
[2.0,4.0)
[4.0,7.0)
[7.0,10.0]
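The half-open pattern in this output can be reproduced with a short generator. The sketch assumes a closed input interval and silently skips cut points that fall outside it or repeat; how pyjpt treats such points is not covered here, and the chop function and tuple layout below are illustrative only.

```python
def chop(lower, upper, points):
    """Split the closed interval [lower, upper] at the given points.

    Yields (lower, upper, left_closed, right_closed) tuples. Every piece
    except the last is closed on the left and open on the right.
    """
    start = lower
    for p in sorted(points):
        if start < p < upper:  # skip duplicates and points outside the interval
            yield (start, p, True, False)
            start = p
    yield (start, upper, True, True)

for piece in chop(0, 10, [2, 4, 7]):
    print(piece)
# (0, 2, True, False)
# (2, 4, True, False)
# (4, 7, True, False)
# (7, 10, True, True)
```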

Transforming Boundaries

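Apply a function to both boundaries while keeping the bound types. Because only the two endpoints are mapped, the result equals the image of the interval only when the function is monotonically increasing on it: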

[17]:

i = ContinuousSet.parse('[1, 3]')
squared = i.transform(lambda x: x**2)
print(f"transform {i} by x² → {squared}")

transform [1.0,3.0] by x² → [1.0,9.0]
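Mapping only the endpoints is safe for increasing functions but can mislead otherwise. The plain-Python sketch below mimics an endpoint-only transform to show the difference; transform_bounds is a hypothetical stand-in and does not call pyjpt.

```python
def transform_bounds(lower, upper, func):
    """Apply func to both endpoints only, as a boundary transform does."""
    return func(lower), func(upper)

def square(x):
    return x ** 2

print(transform_bounds(1, 3, square))    # (1, 9): x² is increasing on [1, 3], so this is the image
print(transform_bounds(-1, 2, square))   # (1, 4): but the image of [-1, 2] under x² is [0, 4]
print(transform_bounds(-3, -1, square))  # (9, 1): a decreasing function reverses the endpoints
```

For a function that is not monotonic on the interval, one option is to chop the interval at the turning points and transform each piece separately.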

Using Intervals in JPT Queries

Intervals are the native language for specifying evidence ranges and reading back posterior domains in pyjpt.

Evidence as intervals

[18]:

import pandas as pd
import sklearn.datasets
from jpt.variables import infer_from_dataframe
from jpt.trees import JPT

iris = sklearn.datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = [iris.target_names[t] for t in iris.target]

variables = infer_from_dataframe(df)
vnames = {v.name: v for v in variables}
model = JPT(variables, min_samples_leaf=0.1)
model.fit(df)
print(model)

JPT#innernodes = 6, #leaves = 7 (13 total)

Pass a [lo, hi] list or a ContinuousSet directly as evidence:

[19]:

# P(species=virginica | petal length ∈ [5, 7])
p = model.infer(
    query={'species': 'virginica'},
    evidence={'petal length (cm)': ContinuousSet.parse('[5, 7]')}
)
print(f"P(virginica | petal length ∈ [5,7]) = {p:.4f}")

# Equivalent shorthand — list is auto-converted
p2 = model.infer(
    query={'species': 'virginica'},
    evidence={'petal length (cm)': [5, 7]}
)
print(f"Same via list shorthand:              {p2:.4f}")

P(virginica | petal length ∈ [5,7]) = 0.9756
Same via list shorthand:              0.9756

Posterior domains are intervals

The posterior distribution over a numeric variable carries its support as a ContinuousSet (or UnionSet for multi-modal posteriors). The cell below summarises each posterior by two quantiles and its mean:

[20]:

post = model.posterior(
    variables=[vnames['petal length (cm)'], vnames['petal width (cm)']],
    evidence={'species': 'virginica'},
)

for var, dist in post.items():
    # ppf(.01) and ppf(.99) bound the central 98% of the probability mass
    lo, hi = dist.ppf(.01), dist.ppf(.99)
    print(f"{var.name}: 98% range ≈ [{lo:.2f}, {hi:.2f}], mean ≈ {dist.expectation():.2f}")

petal length (cm): 98% range ≈ [0.46, 1.79], mean ≈ 5.17
petal width (cm): 98% range ≈ [0.34, 1.71], mean ≈ 1.95
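A quantile function such as ppf(q) returns the value below which a fraction q of the probability mass lies, so the interval between two quantiles holds the difference of the two levels. The sketch below checks this with a standard normal from Python's statistics module as a stand-in; it is not a JPT posterior, and inv_cdf plays the role of ppf here.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)   # stand-in distribution, not a JPT posterior
lo, hi = dist.inv_cdf(0.01), dist.inv_cdf(0.99)

# Mass between the two quantiles: 0.99 - 0.01 = 0.98
mass = dist.cdf(hi) - dist.cdf(lo)
print(f"[{lo:.2f}, {hi:.2f}] holds {mass:.2f} of the mass")
# [-2.33, 2.33] holds 0.98 of the mass
```

For a distribution like this one, with a single symmetric peak, the mean falls inside the quantile range, which is a quick sanity check on any summary of this kind.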

Building multi-range evidence with UnionSet

Exclude a band from the evidence by passing a UnionSet:

[21]:

# Sepal length outside [5, 6] — i.e., either < 5 or > 6
exclude_mid = ContinuousSet.parse('[4, 5)').union(ContinuousSet.parse('(6, 8]'))
print("evidence range:", exclude_mid)

p = model.infer(
    query={'species': 'setosa'},
    evidence={'sepal length (cm)': exclude_mid}
)
print(f"P(setosa | sepal length ∉ [5,6]) = {p:.4f}")

evidence range: [4.0,5.0) ∪ (6.0,8.0]
P(setosa | sepal length ∉ [5,6]) = 0.3041
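To see what conditioning on a union of intervals means, the same kind of query can be answered by counting rows in raw data. This is only an illustration of the semantics: a JPT answers the query from its learned model rather than by counting, the rows below are made up, and both helper functions are hypothetical.

```python
def in_union(x, pieces):
    """True if x lies in any (lower, upper, left_closed, right_closed) piece."""
    for lo, hi, lc, rc in pieces:
        if (x >= lo if lc else x > lo) and (x <= hi if rc else x < hi):
            return True
    return False

def conditional_frequency(rows, label, pieces):
    """Fraction of rows with `label` among rows whose value lies in the union."""
    matching = [lab for value, lab in rows if in_union(value, pieces)]
    if not matching:
        raise ValueError("no row satisfies the evidence")
    return matching.count(label) / len(matching)

# Made-up (value, label) rows, for illustration only
rows = [(4.6, "a"), (4.9, "a"), (5.0, "a"), (5.5, "b"),
        (6.0, "b"), (6.3, "b"), (7.1, "c")]
evidence = [(4, 5, True, False), (6, 8, False, True)]   # [4, 5) ∪ (6, 8]

print(conditional_frequency(rows, "a", evidence))  # 0.5
```

The rows at 5.0 and 6.0 are excluded because both endpoints of the removed band are open in the evidence, which leaves four matching rows, two of them labelled "a".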