{ "cells": [ { "cell_type": "markdown", "source": [ "# Integration with MLflow\n", "\n", "In this tutorial we will investigate the integration of JPTs in an MLOps framework such as mlflow.\n", "For this tutorial you need an mlflow server instance that is reachable via your system variable MLFLOW_TRACKING_URI.\n", "\n", "First we will load the necessary modules. As a toy dataset we will use the sklearn wine dataset." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 1, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:root:Version file not found at /home/tom_sch/jpt-dev/src/jpt/.version\n" ] } ], "source": [ "import mlflow\n", "\n", "import jpt\n", "from jpt.mlflow_wrapper import JPTWrapper, Schema\n", "import sklearn.datasets\n", "import pandas as pd\n", "import numpy as np\n", "import os\n", "\n", "dataset = sklearn.datasets.load_wine()\n", "df = pd.DataFrame(columns=dataset.feature_names, data=dataset.data)\n", "\n", "target = dataset.target.astype(object)\n", "for idx, target_name in enumerate(dataset.target_names):\n", " target[target == idx] = target_name\n", "\n", "df[\"wine\"] = target" ], "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2023-05-12T16:52:24.991342Z", "end_time": "2023-05-12T16:52:25.827146Z" } } }, { "cell_type": "markdown", "source": [ "Next we will fit a small tree to the dataset using the mlflfow run workflow." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 2, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:/jpt:('Preprocessing data...',)\n", "INFO:/jpt:('Data transformation... 178 x 14',)\n", "INFO:/jpt:('Learning prior distributions...',)\n", "INFO:/jpt:('14 prior distributions learnt in 0:00:00.002458.',)\n", "INFO:/jpt:('Started learning of 178 x 14 at 2023-05-12 16:52:26.005500 requiring at least 17 samples per leaf',)\n", "INFO:/jpt:('Learning is generative. ',)\n", "INFO:/jpt:('Learning took 0:00:00.011257',)\n" ] }, { "data": { "text/plain": "" }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# from dotenv import load_dotenv\n", "# load_dotenv(dotenv_path=os.path.join(os.path.expanduser(\"~\"), \".bashrc\"))\n", "\n", "run = mlflow.start_run(run_name=\"Wine\")\n", "model = jpt.JPT(jpt.infer_from_dataframe(df, scale_numeric_types=False), min_samples_leaf=0.1)\n", "model.fit(df)\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2023-05-12T16:52:25.829008Z", "end_time": "2023-05-12T16:52:26.037914Z" } } }, { "cell_type": "markdown", "source": [ "After the training we will record the hyperparameters of the tree and some metrics." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 3, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:/jpt:('Preprocessing data...',)\n" ] } ], "source": [ "mlflow.log_params(model.get_hyperparameters_dict())\n", "average_log_likelihood = np.average(np.log(model.likelihood(df)))\n", "mlflow.log_metric(\"average_log_likelihood\", average_log_likelihood)\n", "mlflow.log_metric(\"number_of_parameters\", model.number_of_parameters())" ], "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2023-05-12T16:52:26.025551Z", "end_time": "2023-05-12T16:52:26.130258Z" } } }, { "cell_type": "markdown", "source": [ "Next, we will log the model to mlflow, such that it can be used for model registry, experiment tracking, etc." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 4, "outputs": [], "source": [ "model_path = os.path.join(os.path.expanduser(\"~\"), \"Documents\", \"wine.jpt\")\n", "model.save(model_path)\n", "mlflow.pyfunc.log_model(\n", " artifact_path=\"wine\",\n", " python_model=JPTWrapper(),\n", " code_path=[os.path.join(os.getcwd(), \"tutorial_mlflow.ipynb\")],\n", " artifacts={\"jpt_model_path\": model_path},\n", " signature=mlflow.models.ModelSignature(Schema(model.variables))\n", ")\n", "mlflow.end_run()" ], "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2023-05-12T16:52:26.132645Z", "end_time": "2023-05-12T16:52:27.863456Z" } } }, { "cell_type": "markdown", "source": [ "Finally, we will load the model from the mlflow model storage to verify that it worked." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 5, "outputs": [], "source": [ "loaded_model = mlflow.pyfunc.load_model(model_uri=run.info.artifact_uri+\"/wine\")\n", "loaded_model = loaded_model.unwrap_python_model().model" ], "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2023-05-12T16:52:27.866064Z", "end_time": "2023-05-12T16:52:27.945472Z" } } } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }