
[v1.1] Add support for errors-in-variables #19

Open · wants to merge 4 commits into base: main

Conversation

m93a commented Sep 8, 2018

This pull request resolves #17 by adding support for regression with known standard deviations of the data. This can be done with the optional parameters xError and yError on the data object passed to the algorithm. Furthermore, I implemented adaptive fitting by adjusting the lambda (damping) parameter during the fit. I tried as hard as I could to maintain the good taste of the code, but I had to disable some overly restrictive eslint rules.

I've done these changes:

  1. I've changed minValue and maxValue to minValues and maxValues, respectively. This way they're more consistent with the initialValues option. I've exposed these options in jsDoc, which fixes #15 (Constraint parameter values).
  2. I've changed the property parameterError on the returned object to residuals, which is a much better name, as I describe in #18 (Option errorTolerance is misnamed and theoretically wrong). However, that's a breaking change – it needs to be taken care of.
  3. I've changed the behavior of errorTolerance, which fixes #18.
  4. I've implemented adaptive changes of the damping parameter. The new options are:
  • dampingDrop – if the fit got better, multiply damping by this factor, default value: 0.1
  • dampingBoost – if the fit got worse, multiply damping by this factor, default value: 1.5
  • minDamping and maxDamping – limits of the damping parameter
  5. I've added support for EIV (errors in variables). The errors can be passed as data.xError and data.yError to the algorithm (see the usage sketch after this list). The new option is:
  • errorPropagation – how many iterations are used to approximate the error propagation through the parametrized function, default value: 50
    • Previously there were rough and fine options to distinguish between rough guesses far from the minimum and more precise estimations near it. But because I wasn't able to avoid getting stuck in local minima due to a suboptimal "switching strategy", I decided to drop this distinction.
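For illustration, this is roughly how the new options would be used – a sketch only, with made-up data values, assuming the existing levenbergMarquardt(data, parameterizedFunction, options) signature and the residuals rename from item 2:

// Hypothetical usage of the options proposed in this PR (example data, not real measurements)
import levenbergMarquardt from 'ml-levenberg-marquardt';

// f(t; [a, b]) = a * sin(b * t)
const sinFunction = ([a, b]) => (t) => a * Math.sin(b * t);

const data = {
  x: [0, 0.5, 1, 1.5, 2, 2.5],
  y: [0.1, 1.9, 0.5, -1.7, -1.4, 1.2],
  xError: [0.05, 0.05, 0.05, 0.05, 0.05, 0.05], // known standard deviation of each x value
  yError: [0.2, 0.2, 0.2, 0.2, 0.2, 0.2]        // known standard deviation of each y value
};

const options = {
  initialValues: [1.5, 2.8],
  minValues: [0, 0],        // renamed from minValue
  maxValues: [5, 10],       // renamed from maxValue
  damping: 0.1,
  dampingDrop: 0.1,         // fit got better -> multiply damping by this factor
  dampingBoost: 1.5,        // fit got worse -> multiply damping by this factor
  errorPropagation: 50      // iterations used to approximate the error propagation
};

const { parameterValues, residuals } = levenbergMarquardt(data, sinFunction, options);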

Both the old version of this algorithm and my improved one are quite prone to local minima. My thought was that doing a rough check of the parameter space (computing the residuals for various parameter sets) before the actual fitting could improve the results – roughly as sketched below. But I'd better find some paper on this beforehand.
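Something like this hypothetical helper (illustrative only, not part of this PR) is what I mean by a rough check:

// Hypothetical coarse pre-scan: evaluate the sum of squared residuals on a
// small set of candidate parameter vectors and start the fit from the best one.
function coarsePreScan(data, paramFunction, candidateParameterSets) {
  let best = candidateParameterSets[0];
  let bestSsr = Infinity;
  for (const params of candidateParameterSets) {
    const f = paramFunction(params);
    let ssr = 0;
    for (let i = 0; i < data.x.length; i++) {
      const r = data.y[i] - f(data.x[i]);
      ssr += r * r;
    }
    if (ssr < bestSsr) {
      bestSsr = ssr;
      best = params;
    }
  }
  return best; // use this as initialValues for the actual fit
}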

[image]

A few things that need to be done before merging:

  • Add more tests for the new code
  • Add a "legacy layer" which would avert the breaking changes
  • Fine-tune the damping parameter
  • Find a new name for errorTolerance
  • Compare the weighted and non-weighted versions of this algorithm
  • Find a way to smoothly transition between errorPropagation.rough and errorPropagation.fine (I decided to drop this API)
  • Find a way to avoid local minima (Probably not in the scope of this PR)

codecov bot commented Sep 8, 2018

Codecov Report

Merging #19 into master will decrease coverage by 6.83%.
The diff coverage is 86.48%.


@@            Coverage Diff             @@
##           master      #19      +/-   ##
==========================================
- Coverage   98.63%   91.79%   -6.84%     
==========================================
  Files           3        3              
  Lines          73      134      +61     
  Branches       20       29       +9     
==========================================
+ Hits           72      123      +51     
- Misses          1       10       +9     
- Partials        0        1       +1
Impacted Files            Coverage   Δ
src/index.js              98%        <100%>    (+0.63%)   ⬆️
src/step.js               100%       <100%>    (ø)        ⬆️
src/errorCalculation.js   81.48%     <79.59%>  (-18.52%)  ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jobo322 (Member) commented Sep 8, 2018

Hello @m93a
Many thanks for your interest in this package. Before anything else, do you have some literature where I could check the theoretical background of your suggestions?

m93a (Author) commented Sep 9, 2018

@jobo322 Hey,
My primary sources were these:
  • Numerical Recipes in FORTRAN 77, Chapter 15
  • Wikipedia: Levenberg-Marquardt
  • Damping–undamping strategies for the Levenberg–Marquardt nonlinear least-squares method
  • Wikipedia: Nonlinear regression
Plus some basic knowledge of statistics from university and a lot of googling.

If you have doubts about any specific claim, I'll try to provide more specific sources 🙂

jobo322 (Member) commented Sep 9, 2018

Hello @m93a, I will be checking the code as soon as possible, and I'll also review the test cases. Again, many thanks for your collaboration.

m93a (Author) commented Sep 24, 2018

Now the only thing left is to add new tests and evaluate why the old ones fail.

jobo322 (Member) commented Oct 3, 2018

Hello @m93a, I'm looking for an explanation of why the parameters in the sinFunction test case always go down. I have been trying different dampingBoost, dampingDrop, and damping values, but the result is always the minimum bounds defined by minValues unless you set initial values close to the real parameters. Do you have an idea how to control it?

m93a (Author) commented Oct 4, 2018

Hey @jobo322,
Sadly I don't know yet. The first thing to do would be to make a 3D plot of the function sumOfSquaredResiduals(a, b) and explore the parameter space of this particular dataset. It is quite possible that the local minima are very shallow and get lower with higher frequency. In that case, the algorithm would be performing as expected and an additional technique would be needed to find the fundamental frequency. (Our desired solution would be the local minimum which, of all the periodically recurring local minima, has the lowest frequency.)
If you don't manage to inspect it and reach a conclusion, I'll take a look at it as soon as I have some free time for that (which should be in a month at most).

EDIT: It would be really helpful to make a plot with the algorithm's “trajectory” in the parameter space, rendered on top of a heatmap of that space.

targos (Member) commented Oct 18, 2018

Hello @m93a and thanks a lot for your work!

I hope you don't mind that I pushed a change to your branch to fix the conflicts with master and the outstanding eslint issues. In case you haven't seen it, we have an eslint-fix npm script to automatically fix styling issues, so there shouldn't be a need to disable rules.

targos (Member) commented Oct 18, 2018

I don't think we need to support legacy options. Let's design a good API and get rid of the inconsistencies.

@@ -12,7 +12,7 @@
"scripts": {
"eslint": "eslint src",
"eslint-fix": "npm run eslint -- --fix",
"prepare": "rollup -c",
"preparea": "rollup -c",
Author:

Was this intended? I have no idea what it's supposed to do, but it looks a lot like a typo to me 😀

Member:

Haha, it was a temporary rename because npm install was failing 🙃

Member:

So yes, typo. You can push up a fix.

@targos mentioned this pull request Oct 18, 2018
@@ -1,6 +1,6 @@
 {
   "name": "ml-levenberg-marquardt",
-  "version": "1.0.3",
+  "version": "1.1.0",
Contributor:

I think it's customary to separate release bump commits from other work (some package maintainers get really peeved about this). @targos Are there strong opinions about this for this project?

Member:

Yes, we use a tool that bumps the version number automatically (based on the commit messages) when we publish the release

Math.max(minValues[k], parameters[k]),
maxValues[k]
);
params2[k] = Math.min(Math.max(minValues[k], params2[k]), maxValues[k]);
Contributor:

params2 seems like a name that isn't highly expressive. Am I understanding correctly that what you're doing here is avoiding overwriting the previous parameters in case they produce better results? If so, perhaps it'd be better to use names like prevParams and updatedParams or lastParams and nextParams, etc.


var converged = error <= errorTolerance;
var residuals = sumOfSquaredResiduals(data, params, paramFunction);
Contributor:

We seem to be mixing var and let/const. I don't know if the switch belongs in this PR, but I think we should migrate away from using var unless we're trying to be compatible with really old engines. (And with the move to TS, that wouldn't be a problem anyway since it could just be translated down at the end.)

Member:

We can migrate to let and const with TypeScript.

m93a (Author) – Oct 20, 2018:

I'm used to let and const from TS, so it's possible that I accidentally used them instead of var in some places. I was trying to use var because the old code was written that way and I didn't want to decrease compatibility with older engines. Moving to TS would be really cool.

* @prop {number} [residualEpsilon = 1e-6] - Minimum change of the sum of residuals per step – if the sum of residuals changes less than this number, the algorithm will stop
* @prop {Array<number>} [maxValues] - Maximum values for the parameters
* @prop {Array<number>} [minValues] - Minimum values for the parameters
* @prop {{rough: number, fine: number}} [errorPropagation] - How many evaluations (per point per step) of the fitted function to use to approximate the error propagation through it
jacobq (Contributor) – Oct 19, 2018:

Where are these rough and fine values used? (Are these just leftovers from fc79bc6?)

Author:

Yes, I apparently forgot to remove them. In the best-case scenario, the algorithm would interpolate between those two as it approached the minimum, but I didn't manage to do it reliably, so it's best to remove those options.

jacobq (Contributor) commented Oct 19, 2018

FYI I am taking a stab at this also. https://github.com/jacobq/levenberg-marquardt/tree/support-error

initializeErrorPropagation(errorPropagation);

/** @type Array<number> */
var residualDifferences = Array(10).fill(NaN);
Contributor:

What is the significance of 10 here? It seems like a "magic number"

jacobq (Contributor) – Oct 19, 2018:

Am I correct in interpreting the intention here as: (1) require at least 10 iterations and (2) stop once the largest change among the last 10 residuals is less than residualEpsilon?

Author:

That's correct. The algorithm just “takes a break” sometimes and other times it starts slow and gets up to speed after some iterations. The number 10 worked best for me, but you're right that it should be configurable.

Contributor:

Well, I'm not saying that it should be configurable (maybe it should, maybe not; I don't know yet). I'm saying that the code should be more expressive (Note: this is just a free article I found from searching the web; there are much better resources on this topic, such as the book "Clean Code" by Robert C. Martin). In other words, when a developer looks at this code, he or she should not have to ask "What does this number mean?" -- the answer should be already provided by giving it a name.

Contributor:

In fact, I would even suggest taking it a step further to cover the question of "Why is this array being filled with NaN"? However, I don't think the solution to that problem lies in improving the name but rather in refactoring the code. It seems like we are only concerned about (1) whether or not we have taken at least 10 steps and (2) whether or not the biggest change in the last 10 steps was smaller than some given threshold. So, for example, we could do something like this instead of having an array, filling it with NaN, shifting it, scanning it, etc.

// Outside of loop
const minIterations = 10;   // We won't use the `residualEpsilon` stopping condition until it's been stable for this many steps
let maxEpsilon = -Infinity; // This will get replaced during the first step because the new values must be > -Infinity (alternatively we could set it to null or undefined and handle that as a special case)
let iterationsRemaining;    // This count-down will get initialized during the first step because we initialize maxEpsilon to its minimum
// Inside of loop
  // ... algorithm takes a step ...
  let currEpsilon = prevResiduals - currResiduals; // (may need to ensure that this value is >= 0)
  if (currEpsilon > maxEpsilon) {
    // We just found a new maximum, so we reset the count-down before allowing this to be used as a stopping condition
    // This eliminates the need to maintain an array of previous epsilon values
    maxEpsilon = currEpsilon;
    iterationsRemaining = minIterations;
  } else {
    // maxEpsilon did not change, so we can decrement our counter (limiting it to zero)
    iterationsRemaining--;
    if (iterationsRemaining < 0) {
      iterationsRemaining = 0;
    }
  }
  const isChangeTooSmall = maxEpsilon <= residualEpsilon;
  if (isChangeTooSmall && iterationsRemaining === 0) {
    // ... stop ...
  }

});
});

describe('error propagation estimator', () => {
Contributor:

@m93a Could you please help me understand these test values? When I look at them it is not at all obvious to me what the correct values should be. (Partly because I am not sure how errorPropagation works other than that it seems to take pseudo-random steps around a point and estimate the slope there)

Author:

The errorPropagation method tries to approximate the propagation of uncertainty through the function, given a variable with known error. Since we know the mean, as well as the standard deviation of the variable, we can assume it's normally distributed and choose equiprobable intervals where the variable's measured value might actually be.

This means that if we slice the interval of all the variable's possible values into e.g. 10 equiprobable intervals, there's the same probability that the measured value lies in the first interval as in the second, third, etc. For practical purposes, I don't actually use all of the possible values, only those between x̅ − 2.25σ and x̅ + 2.25σ. (There's a 97.6 % probability that the measured value falls between those bounds.) This process is not pseudo-random but completely deterministic – the intervals are computed using the CDF of the normal distribution.

Then, the functional values on the boundaries of the intervals are computed. This way the function can be approximated as a linear function on each interval. When a variable “passes through” a linear function, its standard error gets “stretched” (ie. multiplied) by the slope of the function. If a variable has the same probability of passing through multiple linear functions, its standard error (on average) gets stretched by the average of the slopes of those linear functions.

Of course the error distribution of the functional value most probably won't be a Gauss curve, even if the input variable was normally distributed in the first place. But as our fitting algorithm can't deal with those either (ie. it's not robust), it's not a big deal. We can simply pretend that all of the resulting distributions are close enough to the normal distribution (which is mostly true as long as the errors are small).

(An uncanny fact as a bonus: if the error is large enough and the function is very unsymmetrical, the mean of the functional value won't be equal to the functional value of the mean. For now let's just pretend those cases never occur.)
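To make this concrete, here is a rough sketch of the construction described above (my reconstruction for illustration only – the helper names are made up, and this is not the actual errorPropagation code from the PR):

// Standard normal CDF via the Abramowitz–Stegun erf approximation (7.1.26).
function stdNormalCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const erf = 1 - ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t * Math.exp(-x * x);
  return z < 0 ? 0.5 * (1 - erf) : 0.5 * (1 + erf);
}

// Quantile by bisection on the CDF – crude, but enough for a sketch.
function stdNormalQuantile(p) {
  let lo = -2.5;
  let hi = 2.5;
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    if (stdNormalCdf(mid) < p) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

// Propagate the standard deviation `sigma` of a variable with mean `mean` through `fn`:
// slice mean ± 2.25σ into equiprobable intervals, treat fn as linear on each interval,
// and stretch sigma by the average absolute slope.
function propagateError(fn, mean, sigma, intervals = 10) {
  const pLo = stdNormalCdf(-2.25); // ≈ 0.0122
  const pHi = stdNormalCdf(+2.25); // ≈ 0.9878
  const bounds = [];
  for (let i = 0; i <= intervals; i++) {
    const p = pLo + ((pHi - pLo) * i) / intervals;
    bounds.push(mean + sigma * stdNormalQuantile(p));
  }
  let slopeSum = 0;
  for (let i = 0; i < intervals; i++) {
    const slope = (fn(bounds[i + 1]) - fn(bounds[i])) / (bounds[i + 1] - bounds[i]);
    slopeSum += Math.abs(slope);
  }
  return sigma * (slopeSum / intervals);
}

// e.g. propagateError((x) => 3 * x, 42, 0.1) is ≈ 0.3 for this linear function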

Now about the test values. You're right that the values are not exactly obvious. The results should match the standard deviation of normally distributed data that passed through the function. One could use Monte Carlo to get the right value.

// Pseudo-code to validate the results of errorPropagation
const mean = 42;
const sigma = 0.1;
let data = Array(BILLIONS_AND_BILLIONS).fill(0); // fill() so that map() visits every element
data = data.map(() => normal.random(mean, sigma));
data = data.map( x => fn(x) );
const propagatedSigma = stdev(data);

expect(errCalc.errorPropagation(fn, mean, sigma))
  .toBeCloseTo(propagatedSigma, DEPENDS_ON_NUMBER_OF_INTERVALS);

Wow, this was really time-consuming, I'm taking a break for today. I've got other things to do 😉

Contributor:

Thanks for the detailed explanation! I plan to read through it a few times and attempt to make the test cases more expressive once I understand it.


function sinFunction([a, b]) {
return (t) => a * Math.sin(b * t);
}

describe('errorCalculation test', () => {
describe('sum of residuals', () => {
it('Simple case', () => {
jacobq (Contributor) – Oct 19, 2018:

Could we simplify this test by making the sampleFunction always evaluate to a constant value (perhaps 0)? I don't see why it should matter whether we use sin or any other particular function since sumOfResiduals should be purely a function of the difference between the function's output and the corresponding value in the set of data points. e.g. something like this:

const sampleFunction = ([k]) => (x => k); // resulting function returns constant
const n = 3;
const xs = new Array(n).fill(0).map((zero, i) => i); // [0, 1, ..., n-1] but doesn't matter because f(x; [k]) -> k
const data = {
  x: xs,
  y: xs.map(sampleFunction([0])) // all zeros
};

expect(errCalc.sumOfResiduals(data, [0], sampleFunction)).toBeCloseTo(0, 1);
expect(errCalc.sumOfResiduals(data, [1], sampleFunction)).toBeCloseTo(n, 1);
expect(errCalc.sumOfResiduals(data, [2], sampleFunction)).toBeCloseTo(2*n, 1);

expect(errCalc.sumOfSquaredResiduals(data, [0], sampleFunction)).toBeCloseTo(0, 1);
expect(errCalc.sumOfSquaredResiduals(data, [1], sampleFunction)).toBeCloseTo(n, 1);
expect(errCalc.sumOfSquaredResiduals(data, [2], sampleFunction)).toBeCloseTo(n * Math.pow(2, 2), 1); // n points, each with squared residual 2²

Author:

Yes, a simpler function could be used.

Contributor:

I realize now that all your change really does here is update the name. The simplification of the test function is outside the scope of this PR. I've attempted to address this in #26, so after that merges this PR can be rebased to resolve this.

* @param {number} x
* @return {number}
*/
const sq = (x) => x * x;
Contributor:

FYI, this isn't really any faster than Math.pow(x, 2), which has been part of the ES standard since the early days. (You can also use x ** 2 if you don't care about IE.)

Author:

I doubt that the code in the jsPerf test actually does anything. Since you're effectively throwing away the result right after the computation, the interpreter probably optimizes that part out and just increments i (probably by more than 1).

You're probably right about this function not adding very much to performance. If you don't like it, just remove it, but it seems quite readable to me.

jacobq (Contributor) – Oct 23, 2018:

I guess what I'm trying to say is that if you're doing it for performance reasons then it's probably better to remove it. If you think it is more readable I can see why that might be a good reason. However, I think it's hard to argue that a custom function that is identical to a built-in one is easier to read (because one has to look it up to see what it does rather than just recognize it), though in this case it matters little either way since the function is so simple.

You are probably right about jsPerf...sometimes it's too easy to fool oneself. Here's an updated example adapted to reduce the likelihood of this kind of mistake. It still shows less than 1% difference between the two methods of interest.
[image: updated jsPerf results]

Author:

You're right, the performance differences are negligible. I would choose x ** 2 then, as it seems to have the best readability. And when we move to TypeScript, it will be compiled to Math.pow so we won't lose compatibility with old browsers.

jacobq (Contributor) commented Oct 22, 2018

@jobo322 @m93a Regarding the question about divergence in the Math.sin test, this family of functions seems to have a deep canyon near the solution but a broad plateau for low amplitude parameters. (I am using these terms as in the paper "Why are Nonlinear Fits to Data so Challenging?" – not sure if they are standard.) In other words, when the amplitude parameter is small, it is very hard to find the optimal frequency parameter because going the "wrong way" may make the solution "better".

Here's a 2D plot showing how SSR varies with the frequency parameter when the amplitude is held correct:
[image: 2D plot of SSR vs. the frequency parameter with the amplitude held correct]

Here's a surface plot you can explore on jsbin.com.

(In these examples I used amplitude = 2 and frequency = 3)
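For reference, here is a rough sketch of how such a slice can be computed. The sample points and the noise-free data generation are my own assumptions; only amplitude = 2 and frequency = 3 come from the example above:

// Sweep the frequency parameter b with the amplitude held at its true value,
// recording the sum of squared residuals (SSR) at each point.
const model = ([a, b]) => (t) => a * Math.sin(b * t);
const ts = Array.from({ length: 100 }, (_, i) => i * 0.1); // sample points (assumed)
const ys = ts.map(model([2, 3]));                          // noise-free data, a = 2, b = 3

const ssrVsFrequency = [];
for (let b = 0; b <= 6; b += 0.01) {
  const f = model([2, b]); // amplitude held correct
  const ssr = ts.reduce((sum, t, i) => sum + (ys[i] - f(t)) ** 2, 0);
  ssrVsFrequency.push([b, ssr]); // plot these pairs to see the canyon/plateau shape
}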

jacobq (Contributor) commented Oct 22, 2018

You can also see how holding the frequency constant could lead toward the sub-optimal solution of zero amplitude:
[image: SSR vs. the amplitude parameter with the frequency held constant]

jacobq (Contributor) commented Oct 22, 2018

Since there is a lot of work in this PR I'd like to break it down into smaller bites (each as their own commit), and possibly even merge them separately. Thoughts? (I don't mind doing the rebasing myself, if it's helpful.)

  • Refactor / update test suite (there is a lot of duplication, and it's often hard to tell what is desired behavior vs. existing behavior) – see Clean up tests #26
  • Replace parameterError with sumOfSquaredResiduals (breaking change)
  • Support dynamic damping (breaking change)
  • Support weighting/error (breaking change)

m93a (Author) commented Oct 23, 2018

I'd like to break it down into smaller bites

I don't have an opinion on this, we'd better ask @targos 🙂


Should I push all the minor fixes we talked about before you start rebasing? Or will you take care of them yourself? Specifically:

  • correct preparea to prepare in package.json
  • roll package version back
  • rename params2 and residuals2 to nextParams and nextResiduals
  • refactor the code with residualDifferences = Array(10) to something reasonable
  • change sq(x) to x ** 2 or Math.pow

jacobq (Contributor) commented Oct 24, 2018

Should I push all the minor fixes we talked about before you start rebasing? Or will you take care of it yourself? I'm talking specifically: ...

Yes, please. I plan to work on other branches and cherry-pick your commits, so feel free to keep working here. (I'm also not sure how much time I'll have available to work on this over the next 2 weeks.)

jacobq (Contributor) commented Dec 17, 2018

@m93a just checking to see if you have done any more work related to this. I am planning to get back to this in the next week or two.

m93a (Author) commented Dec 19, 2018

Sorry, I didn't have much time because of school and the other project that I'm working on (which will need this repo as a dependency but we're not quite there yet). I'm not sure if I'll have spare time sooner than a month from now, so feel free to work on this until then 🙂

trulstrengerikkealias commented:

I'm very curious about any progress on this proposed upgrade of the algorithm.
I've used this one in C/C++, which has all the features one needs:
https://pages.physics.wisc.edu/~craigm/idl/cmpfit.html

Maybe you can get some help figuring out your problem by looking into this implementation.

jacobq (Contributor) commented Jan 16, 2020

I'm very curious about any progress on this proposed upgrade of the algorithm.

You know, I meant to get around to this but never did. Thanks for the reminder and the link. @m93a did you / do you plan to work more on this any time soon?

m93a (Author) commented Jan 17, 2020

Hey,
I kinda forgot about this PR too. I might get into it after the end of our exam period at some time in February. If I forget again, please do bump the issue :)
