Medium Article on Tree-Boosting for Spatial Data is hanging on Mac #84

Open
IanQS opened this issue Jan 25, 2023 · 7 comments
Labels
bug Something isn't working

Comments


IanQS commented Jan 25, 2023

While trying to run the code for Tree-Boosting for Spatial Data, the code seems to hang. The code below is more or less lifted from the article.

My machine

  • Python 3.11
  • conda 4.12.0
  • macOS Ventura 13.1

Steps to replicate

Top of the script

import numpy as np
np.random.seed(1)
# Simulate Gaussian process: training and test data (the latter on a grid for visualization)
sigma2_1 = 0.35  # marginal variance of GP
rho = 0.1  # range parameter
sigma2 = 0.1  # error variance
n = 200  # number of training samples
nx = 50 # test data: number of grid points on each axis
# training locations (exclude upper right rectangle)
coords = np.column_stack((np.random.uniform(size=1)/2, np.random.uniform(size=1)/2))
while coords.shape[0] < n:
    coord_i = np.random.uniform(size=2)
    if not (coord_i[0] >= 0.6 and coord_i[1] >= 0.6):
        coords = np.vstack((coords,coord_i))
# test locations (rectangular grid)
s_1 = np.ones(nx * nx)
s_2 = np.ones(nx * nx)
for i in range(nx):
    for j in range(nx):
        s_1[j * nx + i] = (i + 1) / nx
        s_2[i * nx + j] = (i + 1) / nx
coords_test = np.column_stack((s_1, s_2))
n_all = nx**2 + n # total number of data points 
coords_all = np.vstack((coords_test,coords))
D = np.zeros((n_all, n_all))  # distance matrix
for i in range(0, n_all):
    for j in range(i + 1, n_all):
        D[i, j] = np.linalg.norm(coords_all[i, :] - coords_all[j, :])
        D[j, i] = D[i, j]
Sigma = sigma2_1 * np.exp(-D / rho) + np.diag(np.zeros(n_all) + 1e-10)
C = np.linalg.cholesky(Sigma)
b_all = C.dot(np.random.normal(size=n_all))
b_train = b_all[(nx*nx):n_all] # training data GP
# Mean function. Use two predictor variables of which only one has an effect for easy visualization
def f1d(x):
    return np.sin(3*np.pi*x) + (1 + 3 * np.maximum(np.zeros(len(x)),x-0.5)/(x-0.5)) - 3
X = np.random.rand(n, 2)
F_X_train = f1d(X[:, 0]) # mean
xi_train = np.sqrt(sigma2) * np.random.normal(size=n)  # simulate error term
y = F_X_train + b_train + xi_train  # observed data
# test data
x = np.linspace(0,1,nx**2)
x[x==0.5] = 0.5 + 1e-10
X_test = np.column_stack((x,np.zeros(nx**2)))
F_X_test = f1d(X_test[:, 0])
b_test = b_all[0:(nx**2)]
xi_test = np.sqrt(sigma2) * np.random.normal(size=(nx**2))
y_test = F_X_test + b_test + xi_test
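(An aside for anyone re-running this: the nested-loop distance matrix above is quadratic-time pure Python and dominates the setup cost for larger n. A hedged sketch of an equivalent vectorized NumPy construction, with the loop version reproduced only for comparison; the variable names here mirror the script but the coordinates are freshly simulated:)

```python
import numpy as np

rng = np.random.default_rng(1)
coords_all = rng.uniform(size=(300, 2))  # stand-in for the script's coords_all

# Vectorized pairwise Euclidean distances via broadcasting
diff = coords_all[:, None, :] - coords_all[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

# Nested-loop construction from the script above, for comparison
n_all = coords_all.shape[0]
D_loop = np.zeros((n_all, n_all))
for i in range(n_all):
    for j in range(i + 1, n_all):
        D_loop[i, j] = np.linalg.norm(coords_all[i] - coords_all[j])
        D_loop[j, i] = D_loop[i, j]

assert np.allclose(D, D_loop)  # identical matrices
```
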

Bottom of the script

Modeling

import gpboost as gpb
gp_model = gpb.GPModel(gp_coords=coords, cov_function="exponential")
data_train = gpb.Dataset(X, y)
params = { 'objective': 'regression_l2', 'learning_rate': 0.01,
            'max_depth': 3, 'min_data_in_leaf': 10, 
            'num_leaves': 2**10, 'verbose': 1}
# Training
bst = gpb.train(params=params, train_set=data_train,
                gp_model=gp_model, num_boost_round=247)
gp_model.summary() # Estimated covariance parameters
# Make predictions: latent variables and response variable
pred = bst.predict(data=X_test, gp_coords_pred=coords_test,  
                   predict_var=True, pred_latent=True)
# pred['fixed_effect']: predictions from the tree-ensemble.
# pred['random_effect_mean']: predicted means of the gp_model.
# pred['random_effect_cov']: predicted (co-)variances  of the gp_model
pred_resp = bst.predict(data=X_test, gp_coords_pred=coords_test, 
                        predict_var=False, pred_latent=False)
y_pred = pred_resp['response_mean'] # predicted response mean
# Calculate mean square error
np.mean((y_pred-y_test)**2)

It has been running for about 5 minutes now and is still going...
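(For future readers hitting the same hang: instead of waiting indefinitely, the suspect call can be wrapped in a watchdog that terminates it after a deadline, which makes "hangs" vs. "is just slow" easy to distinguish. A minimal stdlib sketch; `run_with_timeout` and `placeholder_step` are hypothetical names, and the placeholder stands in for the real `gpb.GPModel(...)` / `gpb.train(...)` call:)

```python
import multiprocessing as mp

def _worker(q, fn, args):
    # Runs in the child process and ships the result back
    q.put(fn(*args))

def run_with_timeout(fn, args=(), timeout=60):
    """Run fn(*args) in a child process; raise TimeoutError if it does not finish."""
    q = mp.Queue()
    proc = mp.Process(target=_worker, args=(q, fn, args))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()
        proc.join()
        raise TimeoutError(f"call did not finish within {timeout}s")
    return q.get()

def placeholder_step():
    # Stands in for e.g. gpb.GPModel(gp_coords=coords, cov_function="exponential")
    return "done"

if __name__ == "__main__":
    print(run_with_timeout(placeholder_step, timeout=30))
```
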


IanQS commented Jan 25, 2023

Works just fine on my Linux machine


fabsig commented Jan 25, 2023

Glad to hear that it runs on Linux. I hope that it also runs on the Mac (computational performance obviously depends on the machine you are using). Otherwise, it is hard to tell from a distance what might have gone wrong. Do these examples run on your Mac?

@IanQS IanQS changed the title Medium Article on Tree-Boosting for Spatial Data is hanging on Medium Article on Tree-Boosting for Spatial Data is hanging on Mac Jan 25, 2023

IanQS commented Jan 25, 2023

Nope, it hangs :( It stalls on the instantiation line, gp_model = gpb.GPModel(group_data=group, likelihood=likelihood)


fabsig commented Jan 26, 2023

That's not good. Unfortunately, I cannot reproduce this on my Apple silicon machine, where it works without any problems. I might investigate this issue sometime in the future. For the time being, the only thing I can recommend is trying to install from source: https://github.com/fabsig/GPBoost/tree/master/python-package#installation-from-source


fabsig commented Jan 26, 2023

FWIW, several Python packages seem to have problems on M1 macs; see, e.g., microsoft/LightGBM#4843


IanQS commented Jan 27, 2023

Ahh, gotcha! I'm running into

Exception: Please install CMake and all required dependencies first
The full version of error log was saved into /Users/ianquah/GPBoost_compilation.log

when doing an installation from source from GitHub, at the python setup.py install step.


When doing an installation from source from PyPI, it installs just fine (pip install --no-binary :all: gpboost), but it hangs again.

@fabsig fabsig added the bug Something isn't working label Feb 23, 2023
@StephenRogers1

I found this problematic as well and reproduced the stalling on the instantiation line gp_model = gpb.GPModel(group_data=group, likelihood=likelihood) on an Apple M2 machine (Python 3.9, conda 23.7.2, macOS 13.0).

The error seems to be caused by a conflict between conda-forge package installations and the pip-installed gpboost. That is, packages that (I think) share dependencies with gpboost should be installed using pip.

Steps to reproduce error

brew install miniforge
conda create -n env_conda -c conda-forge python=3.9
conda activate env_conda
pip install gpboost -U
conda install lightgbm

(Note: conda install scikit-learn also produces the error.)

Then running gp_model = gpb.GPModel(group_data=group, likelihood=likelihood) will hang.

Steps to fix error

Since the shared dependencies apparently need to be installed by pip, either making sure not to conda install any of them or using a Python virtualenv fixes this:

  1. Using miniforge
... (as above)
pip install lightgbm
  2. Using virtualenv
python3 -m pip install --user virtualenv
python3 -m venv env
source env/bin/activate
python3 -m pip install gpboost

Doing either and then running gp_model = gpb.GPModel(group_data=group, likelihood=likelihood) will work

If you followed brew install miniforge, use conda list to make sure all shared packages (i.e. scikit-learn, lightgbm, ...) are installed through pip; this should fix the hanging issue.
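(A hedged sketch for auditing this from Python rather than eyeballing conda list: pip writes an INSTALLER file into each package's dist-info metadata, so reading it shows which packages pip itself installed; conda-installed packages typically record something else or nothing. installer_of is a hypothetical helper name:)

```python
from importlib import metadata

def installer_of(pkg):
    """Return the package's recorded installer (e.g. 'pip'), or None if not installed."""
    try:
        text = metadata.distribution(pkg).read_text("INSTALLER")
    except metadata.PackageNotFoundError:
        return None
    # Some distributions ship no INSTALLER file at all
    return text.strip() if text else "unknown"

for pkg in ("scikit-learn", "lightgbm", "gpboost"):
    print(pkg, "->", installer_of(pkg))
```
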
