Regression

This example will guide through a very noisy regression problem. First we have to create the regression data. The data will originate from the function \(x \cdot sin(x)\).

[5]:
import pandas as pd
import numpy as np
import logging

logging.getLogger("/jpt").setLevel(0)
def f(x):
    return x * np.sin(x)

def generate_data(func, x_lower, x_upper, n):
    '''
    Generate a ``DataFrame`` of ``n`` data samples with additive noise from the function ``func``.
    '''
    X = np.atleast_2d(np.random.uniform(-20, 0.0, size=int(n / 2))).T
    X = np.vstack((np.atleast_2d(np.random.uniform(0, 10.0, size=int(n / 2))).T, X))
    X = X.astype(np.float32)
    X = np.array(list(sorted(X)))

    # Observations
    y = func(X).ravel()

    # Add some noise
    dy = 1.5 + .5 * np.random.random(y.shape)
    y += np.random.normal(0, dy)
    y = y.astype(np.float32)

    return pd.DataFrame(data={'x': X.ravel(), 'y': y})

df = generate_data(f, -20, 10, 1000)
df
[5]:
x y
0 -19.944950 18.102259
1 -19.905794 16.782999
2 -19.874876 16.939438
3 -19.783342 12.793908
4 -19.721155 18.745640
... ... ...
995 9.941216 -4.190035
996 9.944221 -7.070674
997 9.953140 -1.843960
998 9.957382 -3.120459
999 9.965003 -2.542398

1000 rows × 2 columns

Lets have a look at the data visually:

[6]:
import matplotlib
import matplotlib.pyplot as plt

plt.plot(df['x'].values, df['y'].values, '.', markersize=5, label='Training data')
plt.show()
../_images/notebooks_tutorial_regression_3_0.png

We can clearly see the original function, and we can observe that there is more variance towards higher x. In general, we don’t want a model that assumes independent noise, and we don’t want to specify much prior knowledge since it is only available rarely. Let’s specify a JPT that focuses on predicting y accurately.

[7]:
from jpt.variables import NumericVariable, VariableMap
from jpt.trees import JPT

# Construct the variables, soften up x via blur
varx = NumericVariable('x', blur=.05)
vary = NumericVariable('y')

# enable a focus on the parameters for y with the targets keyword
jpt = JPT(variables=[varx, vary], targets=[vary], min_samples_leaf=.01, min_impurity_improvement=0.01)

jpt.learn(data=df)
[7]:
JPT
<DecisionNode #0 x = [x < -15.944; -15.944 ≤ x]; parent-#: None; #children: 2>
    <DecisionNode #1 x = [x < -18.815; -18.815 ≤ x]; parent-#: 0; #children: 2>
        <DecisionNode #3 x = [x < -19.231; -19.231 ≤ x]; parent-#: 1; #children: 2>
            <Leaf #7; parent: <DecisionNode #3>>
            <Leaf #8; parent: <DecisionNode #3>>
        <DecisionNode #4 x = [x < -18.336; -18.336 ≤ x]; parent-#: 1; #children: 2>
            <Leaf #9; parent: <DecisionNode #4>>
            <DecisionNode #10 x = [x < -16.596; -16.596 ≤ x]; parent-#: 4; #children: 2>
                <DecisionNode #15 x = [x < -17.973; -17.973 ≤ x]; parent-#: 10; #children: 2>
                    <Leaf #23; parent: <DecisionNode #15>>
                    <DecisionNode #24 x = [x < -17.113; -17.113 ≤ x]; parent-#: 15; #children: 2>
                        <DecisionNode #33 x = [x < -17.637; -17.637 ≤ x]; parent-#: 24; #children: 2>
                            <Leaf #47; parent: <DecisionNode #33>>
                            <Leaf #48; parent: <DecisionNode #33>>
                        <Leaf #34; parent: <DecisionNode #24>>
                <Leaf #16; parent: <DecisionNode #10>>
    <DecisionNode #2 x = [x < -12.830; -12.830 ≤ x]; parent-#: 0; #children: 2>
        <DecisionNode #5 x = [x < -15.458; -15.458 ≤ x]; parent-#: 2; #children: 2>
            <Leaf #11; parent: <DecisionNode #5>>
            <DecisionNode #12 x = [x < -13.342; -13.342 ≤ x]; parent-#: 5; #children: 2>
                <DecisionNode #17 x = [x < -15.038; -15.038 ≤ x]; parent-#: 12; #children: 2>
                    <Leaf #25; parent: <DecisionNode #17>>
                    <DecisionNode #26 x = [x < -14.795; -14.795 ≤ x]; parent-#: 17; #children: 2>
                        <Leaf #35; parent: <DecisionNode #26>>
                        <DecisionNode #36 x = [x < -13.710; -13.710 ≤ x]; parent-#: 26; #children: 2>
                            <DecisionNode #49 x = [x < -14.202; -14.202 ≤ x]; parent-#: 36; #children: 2>
                                <Leaf #65; parent: <DecisionNode #49>>
                                <Leaf #66; parent: <DecisionNode #49>>
                            <Leaf #50; parent: <DecisionNode #36>>
                <Leaf #18; parent: <DecisionNode #12>>
        <DecisionNode #6 x = [x < -9.694; -9.694 ≤ x]; parent-#: 2; #children: 2>
            <DecisionNode #13 x = [x < -12.219; -12.219 ≤ x]; parent-#: 6; #children: 2>
                <Leaf #19; parent: <DecisionNode #13>>
                <DecisionNode #20 x = [x < -11.989; -11.989 ≤ x]; parent-#: 13; #children: 2>
                    <Leaf #27; parent: <DecisionNode #20>>
                    <DecisionNode #28 x = [x < -10.479; -10.479 ≤ x]; parent-#: 20; #children: 2>
                        <DecisionNode #37 x = [x < -11.625; -11.625 ≤ x]; parent-#: 28; #children: 2>
                            <Leaf #51; parent: <DecisionNode #37>>
                            <Leaf #52; parent: <DecisionNode #37>>
                        <Leaf #38; parent: <DecisionNode #28>>
            <DecisionNode #14 x = [x < 6.464; 6.464 ≤ x]; parent-#: 6; #children: 2>
                <DecisionNode #21 x = [x < -6.466; -6.466 ≤ x]; parent-#: 14; #children: 2>
                    <DecisionNode #29 x = [x < -9.112; -9.112 ≤ x]; parent-#: 21; #children: 2>
                        <Leaf #39; parent: <DecisionNode #29>>
                        <DecisionNode #40 x = [x < -7.079; -7.079 ≤ x]; parent-#: 29; #children: 2>
                            <DecisionNode #53 x = [x < -8.435; -8.435 ≤ x]; parent-#: 40; #children: 2>
                                <Leaf #67; parent: <DecisionNode #53>>
                                <DecisionNode #68 x = [x < -7.907; -7.907 ≤ x]; parent-#: 53; #children: 2>
                                    <Leaf #81; parent: <DecisionNode #68>>
                                    <Leaf #82; parent: <DecisionNode #68>>
                            <Leaf #54; parent: <DecisionNode #40>>
                    <DecisionNode #30 x = [x < 3.471; 3.471 ≤ x]; parent-#: 21; #children: 2>
                        <DecisionNode #41 x = [x < -3.720; -3.720 ≤ x]; parent-#: 30; #children: 2>
                            <DecisionNode #55 x = [x < -5.854; -5.854 ≤ x]; parent-#: 41; #children: 2>
                                <DecisionNode #69 x = [x < -6.143; -6.143 ≤ x]; parent-#: 55; #children: 2>
                                    <Leaf #83; parent: <DecisionNode #69>>
                                    <Leaf #84; parent: <DecisionNode #69>>
                                <DecisionNode #70 x = [x < -5.480; -5.480 ≤ x]; parent-#: 55; #children: 2>
                                    <Leaf #85; parent: <DecisionNode #70>>
                                    <DecisionNode #86 x = [x < -4.448; -4.448 ≤ x]; parent-#: 70; #children: 2>
                                        <DecisionNode #95 x = [x < -5.187; -5.187 ≤ x]; parent-#: 86; #children: 2>
                                            <Leaf #103; parent: <DecisionNode #95>>
                                            <Leaf #104; parent: <DecisionNode #95>>
                                        <Leaf #96; parent: <DecisionNode #86>>
                            <DecisionNode #56 x = [x < 2.985; 2.985 ≤ x]; parent-#: 41; #children: 2>
                                <DecisionNode #71 x = [x < 0.987; 0.987 ≤ x]; parent-#: 56; #children: 2>
                                    <DecisionNode #87 x = [x < -0.231; -0.231 ≤ x]; parent-#: 71; #children: 2>
                                        <DecisionNode #97 x = [x < -2.979; -2.979 ≤ x]; parent-#: 87; #children: 2>
                                            <Leaf #105; parent: <DecisionNode #97>>
                                            <DecisionNode #106 x = [x < -0.756; -0.756 ≤ x]; parent-#: 97; #children: 2>
                                                <DecisionNode #113 x = [x < -2.410; -2.410 ≤ x]; parent-#: 106; #children: 2>
                                                    <Leaf #119; parent: <DecisionNode #113>>
                                                    <DecisionNode #120 x = [x < -1.058; -1.058 ≤ x]; parent-#: 113; #children: 2>
                                                        <Leaf #123; parent: <DecisionNode #120>>
                                                        <Leaf #124; parent: <DecisionNode #120>>
                                                <Leaf #114; parent: <DecisionNode #106>>
                                        <DecisionNode #98 x = [x < 0.549; 0.549 ≤ x]; parent-#: 87; #children: 2>
                                            <DecisionNode #107 x = [x < 0.300; 0.300 ≤ x]; parent-#: 98; #children: 2>
                                                <DecisionNode #115 x = [x < 0.106; 0.106 ≤ x]; parent-#: 107; #children: 2>
                                                    <Leaf #121; parent: <DecisionNode #115>>
                                                    <Leaf #122; parent: <DecisionNode #115>>
                                                <Leaf #116; parent: <DecisionNode #107>>
                                            <DecisionNode #108 x = [x < 0.851; 0.851 ≤ x]; parent-#: 98; #children: 2>
                                                <Leaf #117; parent: <DecisionNode #108>>
                                                <Leaf #118; parent: <DecisionNode #108>>
                                    <DecisionNode #88 x = [x < 2.461; 2.461 ≤ x]; parent-#: 71; #children: 2>
                                        <DecisionNode #99 x = [x < 2.127; 2.127 ≤ x]; parent-#: 88; #children: 2>
                                            <Leaf #109; parent: <DecisionNode #99>>
                                            <Leaf #110; parent: <DecisionNode #99>>
                                        <DecisionNode #100 x = [x < 2.797; 2.797 ≤ x]; parent-#: 88; #children: 2>
                                            <Leaf #111; parent: <DecisionNode #100>>
                                            <Leaf #112; parent: <DecisionNode #100>>
                                <DecisionNode #72 x = [x < 3.257; 3.257 ≤ x]; parent-#: 56; #children: 2>
                                    <Leaf #89; parent: <DecisionNode #72>>
                                    <Leaf #90; parent: <DecisionNode #72>>
                        <DecisionNode #42 x = [x < 5.913; 5.913 ≤ x]; parent-#: 30; #children: 2>
                            <DecisionNode #57 x = [x < 3.791; 3.791 ≤ x]; parent-#: 42; #children: 2>
                                <Leaf #73; parent: <DecisionNode #57>>
                                <DecisionNode #74 x = [x < 5.714; 5.714 ≤ x]; parent-#: 57; #children: 2>
                                    <DecisionNode #91 x = [x < 4.162; 4.162 ≤ x]; parent-#: 74; #children: 2>
                                        <Leaf #101; parent: <DecisionNode #91>>
                                        <Leaf #102; parent: <DecisionNode #91>>
                                    <Leaf #92; parent: <DecisionNode #74>>
                            <DecisionNode #58 x = [x < 6.304; 6.304 ≤ x]; parent-#: 42; #children: 2>
                                <Leaf #75; parent: <DecisionNode #58>>
                                <Leaf #76; parent: <DecisionNode #58>>
                <DecisionNode #22 x = [x < 9.243; 9.243 ≤ x]; parent-#: 14; #children: 2>
                    <DecisionNode #31 x = [x < 7.011; 7.011 ≤ x]; parent-#: 22; #children: 2>
                        <DecisionNode #43 x = [x < 6.837; 6.837 ≤ x]; parent-#: 31; #children: 2>
                            <Leaf #59; parent: <DecisionNode #43>>
                            <Leaf #60; parent: <DecisionNode #43>>
                        <DecisionNode #44 x = [x < 8.834; 8.834 ≤ x]; parent-#: 31; #children: 2>
                            <DecisionNode #61 x = [x < 8.552; 8.552 ≤ x]; parent-#: 44; #children: 2>
                                <DecisionNode #77 x = [x < 7.556; 7.556 ≤ x]; parent-#: 61; #children: 2>
                                    <Leaf #93; parent: <DecisionNode #77>>
                                    <Leaf #94; parent: <DecisionNode #77>>
                                <Leaf #78; parent: <DecisionNode #61>>
                            <DecisionNode #62 x = [x < 8.957; 8.957 ≤ x]; parent-#: 44; #children: 2>
                                <Leaf #79; parent: <DecisionNode #62>>
                                <Leaf #80; parent: <DecisionNode #62>>
                    <DecisionNode #32 x = [x < 9.496; 9.496 ≤ x]; parent-#: 22; #children: 2>
                        <Leaf #45; parent: <DecisionNode #32>>
                        <DecisionNode #46 x = [x < 9.763; 9.763 ≤ x]; parent-#: 32; #children: 2>
                            <Leaf #63; parent: <DecisionNode #46>>
                            <Leaf #64; parent: <DecisionNode #46>>

JPT stats: #innernodes = 62, #leaves = 63 (125 total)

Since the data is 2D we can get a visual representation of the predictions of the JPT. Let’s plot the predictions and get an idea of the distribution. We will plot the expectation of y for every datapoint on the x-axis and confidence intervals that explain 95% of the variance.

[8]:
# Mesh the input space for evaluations of the real function,
# the prediction and its MSE
xx = np.atleast_2d(np.linspace(-20, 15, 500)).astype(np.float32).T

# Apply the JPT model
confidence = .95
conf_level = 0.95
my_predictions = [jpt.posterior([vary], evidence={varx: x_}, fail_on_unsatisfiability=False) for x_ in xx.ravel()]
y_pred_ = [(p[vary].expectation() if p is not None else None) for p in my_predictions]
y_lower_ = [(p[vary].ppf.eval((1 - conf_level) / 2) if p is not None else None) for p in my_predictions]
y_upper_ = [(p[vary].ppf.eval(1 - (1 - conf_level) / 2) if p is not None else None) for p in my_predictions]

# posterior = jpt.posterior([varx], {vary: 0})

# Plot the function, the prediction and the 90% confidence interval based on the MSE
plt.plot(xx, f(xx), color='black', linestyle=':', linewidth='2', label=r'$f(x) = x\,\sin(x)$')
plt.plot(df['x'].values, df['y'].values, '.', color='gray', markersize=5, label='Training data')
plt.plot(xx, y_pred_, 'm-', label='JPT Prediction', linewidth=2)
plt.plot(xx, y_lower_, 'y--', label='%.1f%% Confidence bands' % (confidence * 100))
plt.plot(xx, y_upper_, 'y--')
# plt.plot(xx, np.asarray(posterior.distributions[varx].pdf.multi_eval(xx.ravel().astype(np.float64))),
#          label='Posterior')
plt.plot(xx, np.array([jpt.pdf(VariableMap([(varx, x_), (vary, 0)])) for x_ in xx.ravel().astype(np.float64)]),
         label='Posterior')

plt.xlabel('$x$')
plt.ylabel('$f(x)$')
plt.ylim(-10, 20)
plt.xlim(-25, 15)
plt.legend(loc='upper left')
plt.title(r'2D Regression Example ($\vartheta=%.2f\%%$)' % (confidence * 100))
plt.grid()

plt.show()
../_images/notebooks_tutorial_regression_7_0.png

As we can see the expectations and variances predicted by the JPT characterize this dataset well and the variance even adjusts to the different x values. Impressive!