{ "cells": [ { "cell_type": "markdown", "source": [ "# Important Data Structures\n", "\n", "In this tutorial we will walk through important data structures that users will encounter while using ``pyjpt``.\n", "\n", "## Sets\n", "\n", "Because sets are ubiquitous in almost every mathematical theory, ``pyjpt`` provides fast and flexible implementations of different kinds of sets.\n", "\n", "### Discrete Sets\n", "\n", "Domains of [jpt.variables.SymbolicVariable](../autoapi/jpt/variables/index.html#jpt.variables.SymbolicVariable) and [jpt.variables.IntegerVariable](../autoapi/jpt/variables/index.html#jpt.variables.IntegerVariable) are ordinary Python sets. These can be constructed with set literals or by calling the Python set constructor." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 76, "outputs": [], "source": [ "symbolic_set = {\"Dog\", \"Cat\", \"Mouse\"}\n", "integer_set = {1, 2, 3}" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": "For [jpt.variables.SymbolicVariable](../autoapi/jpt/variables/index.html#jpt.variables.SymbolicVariable), a set of strings can be used; for [jpt.variables.IntegerVariable](../autoapi/jpt/variables/index.html#jpt.variables.IntegerVariable), a set of integers is required.\n\n### Continuous Sets\n\nBecause real-world applications often contain variables with a continuous domain, ``pyjpt`` implements [jpt.base.intervals.ContinuousSet](../autoapi/jpt/base/intervals.html#jpt.base.intervals.ContinuousSet) and [jpt.base.intervals.UnionSet](../autoapi/jpt/base/intervals.html#jpt.base.intervals.UnionSet) as domains for\n[numeric random variables](../autoapi/jpt/variables/index.html#jpt.variables.NumericVariable).\nContinuous sets represent intervals on $\\mathbb{R}$ and work very similarly to Python sets. 
After importing the class, a continuous set can be created by\n * calling the constructor\n * parsing it from a string", "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": "from jpt.base.intervals import ContinuousSet\n\na = ContinuousSet(0, 1)\nb = ContinuousSet.fromstring(\"[1, 2)\")\nc = ContinuousSet(-1, 1)\n\na, b, c", "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "The usual set operations can also be applied to continuous sets." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 78, "outputs": [], "source": [ "a_union_b = a.union(b)\n", "a_difference_b = a.difference(b)\n", "a_intersection_c = a.intersection(c)\n", "\n", "a_union_b, a_difference_b, a_intersection_c" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "Note that sets can also be empty or contain only a single element." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": "from jpt.base.intervals import EMPTY\nd = EMPTY\nprint(\"Empty set through construction (%s) and intersection (%s)\" % (d, b.intersection(ContinuousSet(3, 100))))\n\nsingle_element_set = b.intersection(c)\nprint(\"Set with only one element %s\" % single_element_set)", "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": "Applying set operations to continuous sets can produce [union sets](../autoapi/jpt/base/intervals/index.html#jpt.base.intervals.UnionSet). 
These are disjoint unions of continuous sets.\nAdditionally, union sets can be constructed directly through their constructor or parsed from strings.", "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": "from jpt.base.intervals import UnionSet\n\nc_union_b_difference_a = c.union(b).difference(a)\n\nprint(\"UnionSet from set operations %s\" % c_union_b_difference_a)\n\ne = UnionSet([c, ContinuousSet(100, 200)])\nprint(\"UnionSet from construction %s\" % e)", "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": "A union set that is constructed directly may contain overlapping intervals. Such a union set can be simplified; the simplification ensures that all of its sets are disjoint.", "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": "joint_union_set = UnionSet([a, b])\nprint(\"Not simplified UnionSet %s; Simplified UnionSet %s\" % (joint_union_set, joint_union_set.simplify()))", "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": "## Variable Assignments\n\nAll information that is passed to JPTs is stored in VariableAssignments. VariableAssignments are either LabelAssignments or ValueAssignments. For users, LabelAssignments are the more interesting data structure. LabelAssignments are extensions of Python dictionaries that map variables to values. Semantically, they describe the (partial) information that an agent provides to the probability distributions. The easiest way to create them is to bind Python dictionaries through the jpt.trees.JPT.bind method. Additionally, they can be created\n * through their constructor\n * from ValueAssignments\n * through the jpt.trees.JPT._preprocess_query method.\n\nThe latter should only be used by developers, as indicated by the leading underscore in the method name.\nDictionary-like updating is also supported.\n\nTo create LabelAssignments through a JPT, we first have to fit one. 
For that, we will use the Iris toy dataset.", "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": "import pandas as pd\nimport jpt.trees\nimport jpt.variables\nfrom jpt.variables import infer_from_dataframe\nimport sklearn.datasets\n\ndataset, y = sklearn.datasets.load_iris(as_frame=True, return_X_y=True)\n\n# replace the integer class indices with the species names\ny = y.map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})\n\ndataset[\"leaf\"] = y\n\nmodel = jpt.trees.JPT(infer_from_dataframe(dataset), min_samples_leaf=0.1)\nmodel.fit(dataset)\n\n# create the LabelAssignment through binding\nquery = {\"leaf\" : {\"setosa\", \"versicolor\"},\n \"sepal length (cm)\" : [5,6]}\n\nbounded = model.bind(query)\nprint(\"Bound query from Python dictionary %s\" % bounded)\n\n# create it by calling the constructor directly\nquery_ = jpt.variables.LabelAssignment({model.varnames[\"leaf\"]: {\"setosa\", \"versicolor\"}}.items())\n# dictionary-like updating with a further variable\nquery_[model.varnames[\"sepal length (cm)\"]] = ContinuousSet(5, 6)\nprint(\"Direct construction of a LabelAssignment %s\" % query_)", "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "ValueAssignments are very similar to LabelAssignments. However, they use the internal representation of variables inside the trees, i.e., every discrete value is replaced by its index in the distribution and continuous sets are scaled according to the preprocessing of the variables. ValueAssignments can be created like LabelAssignments, and each can be converted into the other by calling the respective method."
], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "print(\"Internal representation of the query from the previous example %s\" % bounded.value_assignment())\n", "print(\"External representation of the query from the previous example %s\" % bounded.value_assignment().label_assignment())" ], "metadata": { "collapsed": false } } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }