Integration with MLflow
In this tutorial we will investigate how JPTs can be integrated into an MLOps framework such as MLflow. To follow along, you need an MLflow server instance that is reachable at the address stored in the environment variable MLFLOW_TRACKING_URI.
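Before starting a run it can help to confirm that the variable is actually set in the process that runs the notebook. The following is a minimal sketch using only the standard library; the helper name `resolve_tracking_uri` is hypothetical and not part of MLflow or jpt.

```python
import os


def resolve_tracking_uri(env=None):
    """Return the tracking URI found in the environment, or None if it is unset.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    return uri or None


uri = resolve_tracking_uri()
if uri is None:
    print("MLFLOW_TRACKING_URI is not set in this process.")
else:
    print(f"Using tracking server at {uri}")
```

If the variable cannot be exported in the shell that launches the notebook, `mlflow.set_tracking_uri(...)` can be called with the server address instead.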
First we will load the necessary modules. As a toy dataset we will use the sklearn wine dataset.
[1]:
import mlflow
import jpt
from jpt.mlflow_wrapper import JPTWrapper, Schema
import sklearn.datasets
import pandas as pd
import numpy as np
import os
dataset = sklearn.datasets.load_wine()
df = pd.DataFrame(columns=dataset.feature_names, data=dataset.data)
target = dataset.target.astype(object)
for idx, target_name in enumerate(dataset.target_names):
    target[target == idx] = target_name
df["wine"] = target
WARNING:root:Version file not found at /home/tom_sch/jpt-dev/src/jpt/.version
Next we will fit a small tree to the dataset inside an MLflow run.
[2]:
# from dotenv import load_dotenv
# load_dotenv(dotenv_path=os.path.join(os.path.expanduser("~"), ".bashrc"))
run = mlflow.start_run(run_name="Wine")
model = jpt.JPT(jpt.infer_from_dataframe(df, scale_numeric_types=False), min_samples_leaf=0.1)
model.fit(df)
INFO:/jpt:('Preprocessing data...',)
INFO:/jpt:('Data transformation... 178 x 14',)
INFO:/jpt:('Learning prior distributions...',)
INFO:/jpt:('14 prior distributions learnt in 0:00:00.002458.',)
INFO:/jpt:('Started learning of 178 x 14 at 2023-05-12 16:52:26.005500 requiring at least 17 samples per leaf',)
INFO:/jpt:('Learning is generative. ',)
INFO:/jpt:('Learning took 0:00:00.011257',)
[2]:
<JPT #innernodes = 7, #leaves = 8 (15 total)>
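The log above reports "at least 17 samples per leaf" for 178 rows and min_samples_leaf=0.1. The sketch below reproduces that number under the assumption that a value below 1 is treated as a fraction of the dataset and rounded down, and that larger values are taken as absolute counts. The function `min_leaf_samples` is hypothetical and only mirrors the arithmetic; it is not the jpt implementation.

```python
import math


def min_leaf_samples(n_rows, min_samples_leaf):
    """Translate a leaf-size setting into an absolute number of samples.

    Assumption: values in (0, 1) are fractions of the dataset, rounded down;
    anything else is interpreted as an absolute count.
    """
    if 0 < min_samples_leaf < 1:
        return math.floor(n_rows * min_samples_leaf)
    return int(min_samples_leaf)


# 178 rows with a 10% minimum leaf size, as in the run above
print(min_leaf_samples(178, 0.1))
```

Under this reading, raising the fraction gives a smaller tree with fewer leaves, and lowering it gives a larger one.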
After training, we will record the hyperparameters of the tree and some metrics in the active run.
[3]:
mlflow.log_params(model.get_hyperparameters_dict())
average_log_likelihood = np.average(np.log(model.likelihood(df)))
mlflow.log_metric("average_log_likelihood", average_log_likelihood)
mlflow.log_metric("number_of_parameters", model.number_of_parameters())
INFO:/jpt:('Preprocessing data...',)
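The average log-likelihood is the mean of the logarithms of the per-sample likelihoods. As a small worked example, the sketch below computes it with the standard library only and returns negative infinity as soon as one sample has zero likelihood, which is the case to watch for when evaluating on data the model has not seen. The function name and the input values are illustrative and are not taken from the wine model.

```python
import math


def average_log_likelihood(likelihoods):
    """Mean of log(likelihood) over all samples.

    Returns -inf if any sample has a likelihood of zero (or below),
    because log(0) is undefined and dominates the average.
    """
    total = 0.0
    for value in likelihoods:
        if value <= 0.0:
            return -math.inf
        total += math.log(value)
    return total / len(likelihoods)


# Illustrative likelihood values, not results from the tutorial
print(average_log_likelihood([0.5, 0.25]))
print(average_log_likelihood([0.5, 0.0]))
```

Because the dataset contains continuous variables, the likelihoods are density values and may exceed 1, so a positive average log-likelihood is not an error in itself.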
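Next, we will log the model to MLflow so that it becomes available to the model registry, experiment tracking, and so on. The model is first saved to a local file, which is then attached to the run as an artifact.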
[4]:
model_path = os.path.join(os.path.expanduser("~"), "Documents", "wine.jpt")
model.save(model_path)
mlflow.pyfunc.log_model(
    artifact_path="wine",
    python_model=JPTWrapper(),
    code_path=[os.path.join(os.getcwd(), "tutorial_mlflow.ipynb")],
    artifacts={"jpt_model_path": model_path},
    signature=mlflow.models.ModelSignature(Schema(model.variables))
)
mlflow.end_run()
Finally, we will load the model back from the MLflow model storage and unwrap it to verify that logging worked.
[5]:
loaded_model = mlflow.pyfunc.load_model(model_uri=run.info.artifact_uri+"/wine")
loaded_model = loaded_model.unwrap_python_model().model
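Instead of concatenating the artifact URI by hand, a logged model can also be addressed through MLflow's `runs:/<run_id>/<artifact_path>` URI scheme. The sketch below only builds such a string; the helper `runs_uri` and the run ID used in the example are hypothetical.

```python
def runs_uri(run_id, artifact_path):
    """Build a 'runs:/<run_id>/<artifact_path>' model URI.

    Hypothetical helper; it only formats the string and does not contact MLflow.
    """
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


# "abc123" is a placeholder run ID, not one produced by this tutorial
print(runs_uri("abc123", "wine"))
```

With the run from this tutorial, the call would presumably look like `mlflow.pyfunc.load_model(model_uri=runs_uri(run.info.run_id, "wine"))`.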