Numeric Instability During Linear Regression With Large x-values #3881

Open
JackFielding opened this issue Feb 27, 2024 · 0 comments
Labels
bug For bugs or other software errors

I'm using the altair package (version 5.2.0), and the linear regression transform appears to be numerically unstable when the x-values are large: the fitted regression line matches the data poorly.
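
One plausible mechanism, sketched below, is catastrophic cancellation in a one-pass least-squares formula. This is an assumption for illustration only: I have not confirmed how the regression transform is implemented, and the helper names `naive_slope` and `centered_slope` are hypothetical. The sketch compares the textbook sums-of-products slope with a mean-centered slope on data shifted by 1e9.

```python
import numpy as np

def naive_slope(x, y):
    """Textbook one-pass slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2)."""
    n = len(x)
    sx, sy = np.sum(x), np.sum(y)
    sxy, sxx = np.sum(x * y), np.sum(x * x)
    # With x near 1e9, both denominator terms are about 1e21, while their
    # true difference is about 1e4: far below float64 resolution at that scale.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (n * sxy - sx * sy) / (n * sxx - sx * sx)

def centered_slope(x, y):
    """Slope computed after subtracting the means of x and y."""
    dx = x - np.mean(x)
    dy = y - np.mean(y)
    return np.sum(dx * dy) / np.sum(dx * dx)

np.random.seed(42)
x = np.linspace(0, 10)
y = x - 5 + np.random.randn(len(x))
x_big = x + 1_000_000_000

print("centered, small x:", centered_slope(x, y))
print("naive, large x:   ", naive_slope(x_big, y))
print("centered, large x:", centered_slope(x_big, y))
```

A shift in x should not change the slope, so the first and third printed values should agree closely, while the naive value on the shifted data is not expected to be reliable.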

I ran into this with a datetime x-axis. The current Unix time is about 1,709,023,537 seconds since 1970 (and 1000x larger in milliseconds), so datetime x-values can be very large.
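
To make the magnitudes concrete, here is a small sketch using hypothetical daily timestamps (the dates are made up for illustration). It shows the raw epoch seconds next to a rebased version measured from the first observation, which keeps the values small.

```python
import pandas as pd

# Hypothetical daily timestamps, starting on the date of this report
ts = pd.Series(pd.date_range("2024-02-27", periods=5, freq="D"))

# Raw epoch seconds: roughly 1.7e9 (epoch milliseconds are 1000x larger)
epoch_seconds = (ts - pd.Timestamp("1970-01-01")).dt.total_seconds()

# Rebased: days elapsed since the first observation
elapsed_days = (ts - ts.iloc[0]).dt.total_seconds() / 86400.0

print(epoch_seconds.tolist())
print(elapsed_days.tolist())
```

Regressing on something like `elapsed_days` avoids the large offsets, at the cost of no longer having a datetime axis for that field.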

Minimal Example

In altair I used the following code (based on the example in https://altair-viz.github.io/user_guide/transform/regression.html):

import altair as alt
import pandas as pd
import numpy as np

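np.random.seed(42)
x = np.linspace(0, 10)
# Same x-values shifted by 1e9, comparable in magnitude to epoch seconds
x_big = x + 1_000_000_000
# True relationship is y = x - 5, plus standard normal noise
y = x - 5 + np.random.randn(len(x))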

df = pd.DataFrame({'x': x, 'y': y, "x_big": x_big})

def scatter_with_regression(x):
    chart = alt.Chart(df).mark_point().encode(
        x=alt.X(x, scale=alt.Scale(zero=False)),
        y='y'
    )

    return chart + chart.transform_regression(x, 'y').mark_line()

(scatter_with_regression(x="x") | scatter_with_regression(x="x_big")).properties(title="Linear Regression Transformation Could Benefit from Mean Centering")

This code generated the following chart:

[Image: linear_regression]
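
As a possible workaround until the transform handles this, the line can be fitted outside the chart with mean centering. The sketch below is only that: `regression_line` is a hypothetical helper, not part of altair, and it returns just the two endpoints of the fitted line.

```python
import numpy as np
import pandas as pd

def regression_line(df, x_col, y_col):
    """Return the two endpoints of a mean-centered least-squares line."""
    x = df[x_col].to_numpy(dtype=float)
    y = df[y_col].to_numpy(dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    dx = x - x_mean
    slope = np.sum(dx * (y - y_mean)) / np.sum(dx * dx)
    # Evaluate in centered form so a huge intercept is never formed
    ends = np.array([x.min(), x.max()])
    return pd.DataFrame({x_col: ends, y_col: y_mean + slope * (ends - x_mean)})

np.random.seed(42)
x = np.linspace(0, 10)
df = pd.DataFrame({
    "x": x,
    "x_big": x + 1_000_000_000,
    "y": x - 5 + np.random.randn(len(x)),
})

line_small = regression_line(df, "x", "y")
line_big = regression_line(df, "x_big", "y")
print(line_small)
print(line_big)
```

The resulting frame should be usable as its own layer, for example `alt.Chart(line_big).mark_line().encode(x="x_big", y="y")`, in place of `transform_regression`.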

The chart's Vega-Lite JSON representation is:

{
  "config": {"view": {"continuousWidth": 300, "continuousHeight": 300}},
  "hconcat": [
    {
      "layer": [
        {
          "mark": {"type": "point"},
          "encoding": {
            "x": {
              "field": "x",
              "scale": {"zero": false},
              "type": "quantitative"
            },
            "y": {"field": "y", "type": "quantitative"}
          }
        },
        {
          "mark": {"type": "line"},
          "encoding": {
            "x": {
              "field": "x",
              "scale": {"zero": false},
              "type": "quantitative"
            },
            "y": {"field": "y", "type": "quantitative"}
          },
          "transform": [{"on": "x", "regression": "y"}]
        }
      ]
    },
    {
      "layer": [
        {
          "mark": {"type": "point"},
          "encoding": {
            "x": {
              "field": "x_big",
              "scale": {"zero": false},
              "type": "quantitative"
            },
            "y": {"field": "y", "type": "quantitative"}
          }
        },
        {
          "mark": {"type": "line"},
          "encoding": {
            "x": {
              "field": "x_big",
              "scale": {"zero": false},
              "type": "quantitative"
            },
            "y": {"field": "y", "type": "quantitative"}
          },
          "transform": [{"on": "x_big", "regression": "y"}]
        }
      ]
    }
  ],
  "data": {"name": "data-44ec9525e1d4ab566db24882f3a645ef"},
  "title": "Linear Regression Transformation Could Benefit from Mean Centering",
  "$schema": "https://vega.github.io/schema/vega-lite/v5.16.3.json",
  "datasets": {
    "data-44ec9525e1d4ab566db24882f3a645ef": [
      {"x": 0, "y": -4.503285846988767, "x_big": 1000000000},
      {
        "x": 0.20408163265306123,
        "y": -4.934182668518123,
        "x_big": 1000000000.2040817
      },
      {
        "x": 0.40816326530612246,
        "y": -3.9441481965931855,
        "x_big": 1000000000.4081633
      },
      {
        "x": 0.6122448979591837,
        "y": -2.8647252456327905,
        "x_big": 1000000000.6122448
      },
      {
        "x": 0.8163265306122449,
        "y": -4.417826844111091,
        "x_big": 1000000000.8163265
      },
      {
        "x": 1.0204081632653061,
        "y": -4.213728793683875,
        "x_big": 1000000001.0204082
      },
      {
        "x": 1.2244897959183674,
        "y": -2.196297388574241,
        "x_big": 1000000001.2244898
      },
      {
        "x": 1.4285714285714286,
        "y": -2.8039938422756623,
        "x_big": 1000000001.4285715
      },
      {
        "x": 1.6326530612244898,
        "y": -3.836821324710462,
        "x_big": 1000000001.6326531
      },
      {
        "x": 1.836734693877551,
        "y": -2.6207052625364846,
        "x_big": 1000000001.8367347
      },
      {
        "x": 2.0408163265306123,
        "y": -3.42260136628185,
        "x_big": 1000000002.0408163
      },
      {
        "x": 2.2448979591836737,
        "y": -3.220831794386583,
        "x_big": 1000000002.244898
      },
      {
        "x": 2.4489795918367347,
        "y": -2.309058136597231,
        "x_big": 1000000002.4489796
      },
      {
        "x": 2.6530612244897958,
        "y": -4.260219020168002,
        "x_big": 1000000002.6530613
      },
      {
        "x": 2.857142857142857,
        "y": -3.8677749753701756,
        "x_big": 1000000002.8571428
      },
      {
        "x": 3.0612244897959187,
        "y": -2.501063039445054,
        "x_big": 1000000003.0612245
      },
      {
        "x": 3.2653061224489797,
        "y": -2.747524997885444,
        "x_big": 1000000003.2653061
      },
      {
        "x": 3.4693877551020407,
        "y": -1.2163649123026854,
        "x_big": 1000000003.4693878
      },
      {
        "x": 3.673469387755102,
        "y": -2.2345546877661087,
        "x_big": 1000000003.6734694
      },
      {
        "x": 3.8775510204081636,
        "y": -2.534752680927128,
        "x_big": 1000000003.8775511
      },
      {
        "x": 4.081632653061225,
        "y": 0.5472814219827786,
        "x_big": 1000000004.0816326
      },
      {
        "x": 4.285714285714286,
        "y": -0.94006201477225,
        "x_big": 1000000004.2857143
      },
      {
        "x": 4.4897959183673475,
        "y": -0.44267587694472865,
        "x_big": 1000000004.4897959
      },
      {
        "x": 4.6938775510204085,
        "y": -1.7308706351930483,
        "x_big": 1000000004.6938776
      },
      {
        "x": 4.8979591836734695,
        "y": -0.6464235408517132,
        "x_big": 1000000004.8979592
      },
      {
        "x": 5.1020408163265305,
        "y": 0.21296340603639657,
        "x_big": 1000000005.1020408
      },
      {
        "x": 5.3061224489795915,
        "y": -0.8448711284427113,
        "x_big": 1000000005.3061224
      },
      {
        "x": 5.510204081632653,
        "y": 0.8859020999783254,
        "x_big": 1000000005.5102041
      },
      {
        "x": 5.714285714285714,
        "y": 0.11364702436690943,
        "x_big": 1000000005.7142857
      },
      {
        "x": 5.918367346938775,
        "y": 0.6266735971454986,
        "x_big": 1000000005.9183674
      },
      {
        "x": 6.122448979591837,
        "y": 0.5207423673624404,
        "x_big": 1000000006.1224489
      },
      {
        "x": 6.326530612244898,
        "y": 3.1788087967538363,
        "x_big": 1000000006.3265306
      },
      {
        "x": 6.530612244897959,
        "y": 1.5171150201600254,
        "x_big": 1000000006.5306122
      },
      {
        "x": 6.73469387755102,
        "y": 0.67698294859512,
        "x_big": 1000000006.7346939
      },
      {
        "x": 6.938775510204081,
        "y": 2.7613204223072705,
        "x_big": 1000000006.9387755
      },
      {
        "x": 7.142857142857143,
        "y": 0.922013492886121,
        "x_big": 1000000007.1428572
      },
      {
        "x": 7.346938775510204,
        "y": 2.5558023705149595,
        "x_big": 1000000007.3469387
      },
      {
        "x": 7.551020408163265,
        "y": 0.5913502842834897,
        "x_big": 1000000007.5510204
      },
      {
        "x": 7.755102040816327,
        "y": 1.4269159919178966,
        "x_big": 1000000007.755102
      },
      {
        "x": 7.959183673469388,
        "y": 3.1560449093385117,
        "x_big": 1000000007.9591837
      },
      {
        "x": 8.16326530612245,
        "y": 3.90173188611786,
        "x_big": 1000000008.1632653
      },
      {
        "x": 8.36734693877551,
        "y": 3.5387152199654808,
        "x_big": 1000000008.3673469
      },
      {
        "x": 8.571428571428571,
        "y": 3.4557802890403306,
        "x_big": 1000000008.5714285
      },
      {
        "x": 8.775510204081632,
        "y": 3.4744065084923434,
        "x_big": 1000000008.7755102
      },
      {
        "x": 8.979591836734695,
        "y": 2.5010698463672676,
        "x_big": 1000000008.9795918
      },
      {
        "x": 9.183673469387756,
        "y": 3.463829260993047,
        "x_big": 1000000009.1836735
      },
      {
        "x": 9.387755102040817,
        "y": 3.9271163310810295,
        "x_big": 1000000009.3877552
      },
      {
        "x": 9.591836734693878,
        "y": 5.648958960912793,
        "x_big": 1000000009.5918367
      },
      {
        "x": 9.795918367346939,
        "y": 5.1395366569154,
        "x_big": 1000000009.7959183
      },
      {"x": 10, "y": 3.236959844637266, "x_big": 1000000010}
    ]
  }
}
@JackFielding JackFielding added the bug For bugs or other software errors label Feb 27, 2024