Inconsistent coefficient values #785

matwroblewski · 2024-04-18T06:49:15Z

I'm using glum 2.5.1 and the latest version of tabmat (3.1.14). Following the example from your Git repository, I trained the model twice. Surprisingly, each fit gave slightly different coefficient values, with the differences starting at the 14th decimal place.

To rule out randomness, I ran a similar test on your example with the 'random_state' parameter set. I expected stable results, but the discrepancies persist.

from sklearn.datasets import fetch_openml
from glum import GeneralizedLinearRegressor

house_data = fetch_openml(name="house_sales", version=3, as_frame=True)

X = house_data.data[
    [
        "bedrooms",
        "bathrooms",
        "sqft_living",
        "floors",
        "waterfront",
        "view",
        "condition",
        "grade",
        "yr_built",
        "yr_renovated",
    ]
].copy()

price = house_data.target
y = (price < price.median()).values.astype(int)
model = GeneralizedLinearRegressor(
    family='binomial',
    l1_ratio=1.0,
    alpha=0.001,
    random_state=1,
)

model.fit(X=X, y=y)

model.coef_[1]
# 1st fit: -0.49335439989864244
# 2nd fit: -0.49335439989865093

Differences occur for both the irls-cd and irls-ls solvers.

I would greatly appreciate it if you could investigate this issue further.

jtilly · 2024-04-18T07:01:27Z

This is due to OpenMP:

(glum) ➜  ~/glum git:(main) ✗ python issue_785.py 
-0.493354399898778
-0.4933543998987746
(glum) ➜  ~/glum git:(main) ✗ OMP_NUM_THREADS=1 python issue_785.py
-0.49335439989877533
-0.49335439989877533

When you set OMP_NUM_THREADS=1, you'll get consistent results.
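
If exporting the variable in the shell is inconvenient, here is a minimal sketch of setting it from Python instead. It assumes the assignment runs before glum, tabmat, or any other OpenMP-backed library is imported in the process. Whether a later assignment is honored depends on the OpenMP runtime, so treat that ordering as an assumption rather than documented glum behavior.

```python
import os

# Assumption: nothing that loads an OpenMP runtime has been imported yet.
os.environ["OMP_NUM_THREADS"] = "1"

# Import glum only after the variable is set, e.g.:
# from glum import GeneralizedLinearRegressor
```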

Quantco/tabmat#348 addressed this for products involving a CategoricalMatrix, but apparently that wasn't the only place in our code base where we run into this issue.
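
For context on why the thread count can matter at all: floating-point addition is not associative, so a parallel reduction that groups partial sums differently from run to run can round differently in the last digits. The snippet below is a generic illustration in plain Python, not glum or tabmat code, and it does not use OpenMP.

```python
import math

# The same three numbers, added with two different groupings.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)                             # False
print(math.isclose(left, right, rel_tol=1e-12))  # True
```

The two results differ only in the last bits, which is the same order of discrepancy as in the coefficients above.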

matwroblewski · 2024-04-25T19:33:34Z

Thanks for your answer. Adding an environment variable solved the problem. However, I noticed that I get different results on Windows and Ubuntu. Is this normal behavior?

model.coef_[1]
# -0.4933543998986485 Ubuntu
# -0.4933543998986456 Windows
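
The thread leaves this question unanswered. As a general floating-point fact, and not a statement about glum specifically, different compilers, math libraries, and instruction sets can round intermediate results differently, so last-digit differences across platforms are common. A minimal sketch for quantifying the gap between the two values reported above and comparing them with a tolerance instead of exact equality (the tolerance of 1e-12 is an arbitrary choice for illustration):

```python
import math

coef_ubuntu = -0.4933543998986485   # value reported above for Ubuntu
coef_windows = -0.4933543998986456  # value reported above for Windows

abs_diff = abs(coef_ubuntu - coef_windows)
# Size of the gap in units in the last place (ULPs) at this magnitude
ulps_apart = abs_diff / math.ulp(abs(coef_ubuntu))

print(f"absolute difference: {abs_diff:.3e}")
print(f"approx. ULPs apart:  {ulps_apart:.0f}")
print(math.isclose(coef_ubuntu, coef_windows, rel_tol=1e-12))  # True
```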
