Important Data Structures

In this tutorial, we walk through the important data structures that users encounter when working with pyjpt.

Sets

Because sets are ubiquitous in almost every mathematical theory, pyjpt provides fast and flexible implementations of both discrete and continuous sets.

Discrete Sets

The domains of jpt.variables.SymbolicVariable and jpt.variables.IntegerVariable are ordinary Python sets. They can be created with a set literal or the built-in set constructor.

[76]:
symbolic_set = {"Dog", "Cat", "Mouse"}
integer_set = {1, 2, 3}

For a jpt.variables.SymbolicVariable, a set of strings can be used as the domain; for a jpt.variables.IntegerVariable, a set of integers is required.
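Because these domains are plain Python sets, the built-in set operations apply to them directly. The following plain-Python sketch illustrates this; the helper check_domain is hypothetical and not part of pyjpt, it merely encodes the string/integer requirement stated above.

```python
def check_domain(values, kind):
    """Hypothetical helper (not part of pyjpt): check that `values` is a set of
    strings (kind="symbolic") or a set of integers (kind="integer")."""
    if kind not in ("symbolic", "integer"):
        raise ValueError(f"unknown kind: {kind!r}")
    if not isinstance(values, set):
        raise TypeError("a domain must be a Python set")
    expected = str if kind == "symbolic" else int
    # bool is a subclass of int, so exclude it explicitly
    if not all(isinstance(v, expected) and not isinstance(v, bool) for v in values):
        raise TypeError(f"every value of a {kind} domain must be of type {expected.__name__}")
    return values


symbolic_set = check_domain({"Dog", "Cat", "Mouse"}, "symbolic")
integer_set = check_domain(set(range(1, 4)), "integer")  # the same set as {1, 2, 3}

# Domains are plain sets, so the built-in set operations apply directly
query = {"Dog", "Cat"}
print(query <= symbolic_set)          # True: the query lies inside the domain
print(sorted(symbolic_set - query))   # ['Mouse']
```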

Continuous Sets

Since real-world applications often involve variables with continuous domains, pyjpt implements jpt.base.intervals.ContinuousSet and jpt.base.intervals.UnionSet as domains for numeric random variables. Continuous sets represent intervals on \(\mathbb{R}\) and behave much like Python sets. After importing the class, a continuous set can be created by

  • calling the constructor

  • parsing it from a string

[ ]:
from jpt.base.intervals import ContinuousSet

a = ContinuousSet(0, 1)
b = ContinuousSet.fromstring("[1, 2)")
c = ContinuousSet(-1, 1)

a, b, c
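To illustrate what parsing an interval string involves, here is a minimal plain-Python sketch. The function parse_interval is hypothetical and is not pyjpt's parser; it only splits a string into its bounds and closedness flags. It accepts both the "[1, 2)" notation used above and the reversed-bracket notation that pyjpt uses when printing sets (e.g. "[0.000,2.000[", where the trailing "[" marks an open end).

```python
import re

# "(" or "]" on the left and ")" or "[" on the right denote an open end.
_INTERVAL = re.compile(r"^\s*([\[\(\]])\s*([^,]+?)\s*,\s*([^,]+?)\s*([\]\)\[])\s*$")


def parse_interval(text):
    """Hypothetical sketch: return (lower, upper, left_closed, right_closed)."""
    match = _INTERVAL.match(text)
    if match is None:
        raise ValueError(f"not an interval: {text!r}")
    left, lower, upper, right = match.groups()
    return float(lower), float(upper), left == "[", right == "]"


print(parse_interval("[1, 2)"))         # (1.0, 2.0, True, False)
print(parse_interval("[0.000,2.000["))  # (0.0, 2.0, True, False)
```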

The usual set operations (union, difference and intersection) are available as methods on continuous sets.

[78]:
a_union_b = a.union(b)
a_difference_b = a.difference(b)
a_intersection_c = a.intersection(c)

a_union_b, a_difference_b, a_intersection_c
[78]:
(<ContinuousSet=[0.000,2.000[>,
 <ContinuousSet=[0.000,1.000[>,
 <ContinuousSet=[0.000,1.000]>)
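The closedness of the endpoints in these results follows a simple rule. The sketch below shows it for intersection, using a plain NamedTuple as a stand-in for an interval; this is an illustration, not pyjpt's ContinuousSet implementation. The result may be empty (lower bound above upper bound); emptiness is not checked here.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Plain-Python stand-in for an interval (not pyjpt's ContinuousSet)."""
    lower: float
    upper: float
    left_closed: bool = True
    right_closed: bool = True


def intersect(x, y):
    # The new lower bound is the larger lower bound; if both coincide,
    # the endpoint is included only if both intervals include it.
    if x.lower > y.lower:
        lower, left_closed = x.lower, x.left_closed
    elif y.lower > x.lower:
        lower, left_closed = y.lower, y.left_closed
    else:
        lower, left_closed = x.lower, x.left_closed and y.left_closed
    # Symmetrically, the new upper bound is the smaller upper bound.
    if x.upper < y.upper:
        upper, right_closed = x.upper, x.right_closed
    elif y.upper < x.upper:
        upper, right_closed = y.upper, y.right_closed
    else:
        upper, right_closed = x.upper, x.right_closed and y.right_closed
    return Interval(lower, upper, left_closed, right_closed)


a = Interval(0, 1)                      # [0, 1]
b = Interval(1, 2, right_closed=False)  # [1, 2)
c = Interval(-1, 1)                     # [-1, 1]
print(intersect(a, c))  # [0, 1], matching a.intersection(c) above
print(intersect(b, c))  # [1, 1], a single point
```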

Note that sets can also be empty or contain just a single element.

[ ]:
from jpt.base.intervals import EMPTY
d = EMPTY
print("Empty set through construction (%s) and intersection (%s)" % (d, b.intersection(ContinuousSet(3, 100))))

single_element_set = b.intersection(c)
print("Set with only one element %s" % single_element_set)
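Whether bounds describe an empty set, a single point or a proper interval depends only on their order and on the closedness of the endpoints. The helper classify_interval below is hypothetical and not part of pyjpt; it is a plain-Python sketch of that rule.

```python
def classify_interval(lower, upper, left_closed=True, right_closed=True):
    """Hypothetical helper (not part of pyjpt): classify an interval as
    'empty', 'singleton' or 'interval'."""
    if lower < upper:
        return "interval"
    # lower == upper describes one point only if both endpoints are included
    if lower == upper and left_closed and right_closed:
        return "singleton"
    # e.g. [3, 2) or [1, 1)
    return "empty"


print(classify_interval(3, 2, True, False))  # empty, like b.intersection(ContinuousSet(3, 100))
print(classify_interval(1, 1))               # singleton, like b.intersection(c)
print(classify_interval(0, 1))               # interval
```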

Some operations on continuous sets produce union sets, which represent unions of several continuous sets; the results of set operations are disjoint unions. Union sets can also be created with their constructor or parsed from strings.

[ ]:
from jpt.base.intervals import UnionSet

c_union_b_difference_a = c.union(b).difference(a)

print("UnionSet from set operations %s" % c_union_b_difference_a)

e = UnionSet([c, ContinuousSet(100, 200)])
print("UnionSet from construction %s" % e)
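The difference in the cell above shows why union sets are needed: removing an interval from the middle of another leaves two disjoint pieces. The following plain-Python sketch of interval difference is an illustration only (the Interval NamedTuple is a stand-in, not pyjpt's class) and assumes the subtracted interval is non-empty.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Plain-Python stand-in for an interval (not pyjpt's ContinuousSet)."""
    lower: float
    upper: float
    left_closed: bool = True
    right_closed: bool = True

    def is_empty(self) -> bool:
        if self.lower < self.upper:
            return False
        return not (self.lower == self.upper and self.left_closed and self.right_closed)


def difference(x: Interval, y: Interval) -> List[Interval]:
    """Return x minus y (y assumed non-empty) as a list of disjoint intervals."""
    # Piece of x left of y: it ends where y starts, and y's lower endpoint
    # belongs to the piece only if y itself excludes it.
    if y.lower < x.upper:
        upper, right_closed = y.lower, not y.left_closed
    elif y.lower > x.upper:
        upper, right_closed = x.upper, x.right_closed
    else:
        upper, right_closed = x.upper, x.right_closed and not y.left_closed
    left_piece = Interval(x.lower, upper, x.left_closed, right_closed)

    # Piece of x right of y, built symmetrically.
    if y.upper > x.lower:
        lower, left_closed = y.upper, not y.right_closed
    elif y.upper < x.lower:
        lower, left_closed = x.lower, x.left_closed
    else:
        lower, left_closed = x.lower, x.left_closed and not y.right_closed
    right_piece = Interval(lower, x.upper, left_closed, x.right_closed)

    return [piece for piece in (left_piece, right_piece) if not piece.is_empty()]


# c.union(b) = [-1, 2) minus a = [0, 1], as in the cell above
pieces = difference(Interval(-1, 2, right_closed=False), Interval(0, 1))
print(pieces)  # [-1, 0) and (1, 2)
```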

Union sets can also be simplified. Simplification merges overlapping intervals so that all contained sets are pairwise disjoint.

[ ]:
joint_union_set = UnionSet([a, b])
print("Not simplified UnionSet %s; Simplified UnionSet %s" % (joint_union_set, joint_union_set.simplify()))
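Simplification can be understood as the classic "sort and merge" procedure for intervals. The sketch below uses plain tuples (lower, upper, left_closed, right_closed) and is an illustration of that technique, not pyjpt's UnionSet.simplify implementation; it assumes all input intervals are non-empty.

```python
from typing import List, Tuple

# (lower, upper, left_closed, right_closed): a plain stand-in, not pyjpt's classes
Interval = Tuple[float, float, bool, bool]


def simplify(intervals: List[Interval]) -> List[Interval]:
    """Sort by lower bound (closed left ends first) and merge intervals that
    overlap or touch at an endpoint contained in at least one of them."""
    ordered = sorted(intervals, key=lambda iv: (iv[0], not iv[2]))
    merged: List[Interval] = []
    for lower, upper, left_closed, right_closed in ordered:
        if merged:
            m_lower, m_upper, m_left, m_right = merged[-1]
            overlaps = lower < m_upper or (lower == m_upper and (m_right or left_closed))
            if overlaps:
                # Extend the last merged interval if the current one reaches further
                if upper > m_upper:
                    merged[-1] = (m_lower, upper, m_left, right_closed)
                elif upper == m_upper:
                    merged[-1] = (m_lower, m_upper, m_left, m_right or right_closed)
                continue
        merged.append((lower, upper, left_closed, right_closed))
    return merged


# [0, 1] and [1, 2) share the point 1, so they merge into [0, 2)
print(simplify([(0, 1, True, True), (1, 2, True, False)]))  # [(0, 2, True, False)]
```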

Variable Assignments

All information that is passed to a JPT is stored in VariableAssignments, which are either LabelAssignments or ValueAssignments. For users, LabelAssignments are the more relevant data structure. LabelAssignments extend Python dictionaries and map variables to values. Semantically, they describe the (partial) information that an agent provides to the probability distribution. The easiest way to create them is to bind a Python dictionary through the jpt.trees.JPT.bind method. Alternatively, they can be created

  • through their constructor

  • by converting a ValueAssignment

  • through the jpt.trees.JPT._preprocess_query method. This method is intended for developers only, as indicated by the leading underscore in its name.

LabelAssignments also support dictionary-like updating, such as assigning a value to a variable key.
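Conceptually, binding turns variable names into variable objects and converts plain Python values, such as a [lower, upper] list, into set objects. The following plain-Python sketch mimics that effect for the query used in the next example. Variable, LabelAssignmentSketch and bind_sketch are hypothetical stand-ins, not pyjpt classes, and a tuple stands in for ContinuousSet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical stand-in for a pyjpt variable (name and kind only)."""
    name: str
    numeric: bool


class LabelAssignmentSketch(dict):
    """Hypothetical sketch: a dict keyed by Variable objects."""


def bind_sketch(varnames, query):
    """Resolve names to Variable objects and turn [lower, upper] lists for
    numeric variables into interval tuples. This only mimics the effect of
    JPT.bind shown in this tutorial; it is not pyjpt's code."""
    assignment = LabelAssignmentSketch()
    for name, value in query.items():
        variable = varnames[name]
        if variable.numeric and isinstance(value, (list, tuple)):
            lower, upper = value
            value = (float(lower), float(upper))  # stands in for ContinuousSet(lower, upper)
        assignment[variable] = value
    return assignment


varnames = {"leaf": Variable("leaf", numeric=False),
            "sepal length (cm)": Variable("sepal length (cm)", numeric=True)}
bound = bind_sketch(varnames, {"leaf": {"setosa", "versicolor"},
                               "sepal length (cm)": [5, 6]})

# Dictionary-like updating works because the sketch is a dict subclass
bound[varnames["leaf"]] = {"setosa"}
print(bound)
```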

To create LabelAssignments through a JPT, we first have to fit one. For this, we use the iris toy dataset.

[ ]:
import jpt.trees
import jpt.variables
from jpt.variables import infer_from_dataframe
from jpt.base.intervals import ContinuousSet
import sklearn.datasets

dataset, y = sklearn.datasets.load_iris(as_frame=True, return_X_y=True)

# replace the integer class labels by their names
# (mapping avoids writing strings into an integer-typed Series in place)
dataset["leaf"] = y.map({0: "setosa", 1: "versicolor", 2: "virginica"})

model = jpt.trees.JPT(infer_from_dataframe(dataset), min_samples_leaf=0.1)
model.fit(dataset)

# create the LabelAssignment through binding
query = {"leaf" : {"setosa", "versicolor"},
         "sepal length (cm)" : [5,6]}

bounded = model.bind(query)
print("Query bound from a Python dictionary %s" % bounded)

# create it by calling the constructor directly
query_ = jpt.variables.LabelAssignment({model.varnames["leaf"]: {"setosa", "versicolor"}}.items())
query_[model.varnames["sepal length (cm)"]] = ContinuousSet(5, 6)
print("Direct construction of a LabelAssignment %s" % query_)

ValueAssignments are very similar to LabelAssignments. However, they use the internal representation of variables inside the trees: every discrete value is replaced by its index in the distribution, and continuous sets are scaled according to the preprocessing of the respective variable. ValueAssignments can be created in the same ways as LabelAssignments, and either kind can be converted into the other by calling the respective method (value_assignment() or label_assignment()).

[83]:
print("Internal representation of the query from the previous example: %s" % bounded.value_assignment())
print("External representation of the query from the previous example: %s" % bounded.value_assignment().label_assignment())
Internal representation of the query from the previous example: <ValueAssignment {leaf: {0, 1}, sepal length (cm): <ContinuousSet=[-0.983,-0.011]>}>
External representation of the query from the previous example: <LabelAssignment {leaf: {'setosa', 'versicolor'}, sepal length (cm): <ContinuousSet=[5.000,6.000]>}>
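The translation between the two representations can be sketched as an index lookup for discrete values and an invertible affine scaling for continuous bounds. In this plain-Python sketch, the label order LABELS and the parameters MEAN and SCALE are made-up assumptions; pyjpt's actual preprocessing may scale differently, so the numbers will not match the output above.

```python
# Hypothetical sketch of the label <-> value translation described above.
LABELS = ["setosa", "versicolor", "virginica"]  # assumed order: index = internal value
MEAN, SCALE = 5.8, 0.8                           # made-up scaling parameters


def to_values(leaf_labels, interval):
    """Label representation -> internal (value) representation."""
    indices = {LABELS.index(label) for label in leaf_labels}
    lower, upper = interval
    return indices, ((lower - MEAN) / SCALE, (upper - MEAN) / SCALE)


def to_labels(leaf_indices, interval):
    """Internal (value) representation -> label representation."""
    labels = {LABELS[i] for i in leaf_indices}
    lower, upper = interval
    return labels, (lower * SCALE + MEAN, upper * SCALE + MEAN)


values = to_values({"setosa", "versicolor"}, (5.0, 6.0))
labels = to_labels(*values)  # round trip back to the label representation
print(values)
print(labels)
```

Because the scaling is affine, converting back and forth recovers the original bounds up to floating-point rounding.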