{ "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 0, "cells": [ { "cell_type": "markdown", "source": "# Learning of Joint Probability Distributions\n\nThis tutorial introduces the basics of learning joint probability distributions\nwith `jpt.trees.JPT`.\nA JPT is trained on tabular data and learns a compact tree-structured\nrepresentation of the joint distribution $P(\\mathcal{X})$ over all variables\nin the dataset.\n\nWe use the [Iris dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-plants-dataset)\nthroughout this tutorial as a small, well-understood example with both numeric\nand symbolic variables.", "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "## Preparing the Data\n", "\n", "JPTs consume plain `pandas.DataFrame` objects.\n", "We load the Iris dataset and add the class label as a string column so that\n", "``pyjpt`` can recognise it as a symbolic variable." 
], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "import sklearn.datasets\n", "import pandas as pd\n", "\n", "# Load Iris and put the four numeric features into a DataFrame\n", "dataset = sklearn.datasets.load_iris()\n", "df = pd.DataFrame(columns=dataset.feature_names, data=dataset.data)\n", "\n", "# Replace the integer class codes (0, 1, 2) with their species names\n", "target = dataset.target.astype(object)\n", "for idx, name in enumerate(dataset.target_names):\n", "    target[target == idx] = name\n", "\n", "df['plant'] = target\n", "df.head()" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2026-03-16T20:28:33.003554968Z", "start_time": "2026-03-16T20:28:32.154203436Z" } }, "outputs": [ { "data": { "text/plain": [ " sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n", "0 5.1 3.5 1.4 0.2 \n", "1 4.9 3.0 1.4 0.2 \n", "2 4.7 3.2 1.3 0.2 \n", "3 4.6 3.1 1.5 0.2 \n", "4 5.0 3.6 1.4 0.2 \n", "\n", " plant \n", "0 setosa \n", "1 setosa \n", "2 setosa \n", "3 setosa \n", "4 setosa " ], "text/html": [ "
|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | plant |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |