
KernelReg performance in a for loop #4187

Open
Landau1908 opened this issue Jan 2, 2018 · 7 comments · May be fixed by #9199
Comments

@Landau1908

Hi, everyone.

So sorry to post this under Issues; I can't access a Google account or the mailing list.

I'm new to Python. My task is to fit many 1D profiles (about 10,000) obtained from an electron beam to a Gaussian.

Since the raw data are basically very noisy, I have to denoise before fitting, and KernelReg is a good way to do this as far as I know (in fact, I know little about denoising).

For each profile, in a for loop, I first call KernelReg and then lmfit to extract the center, sigma, amplitude, and offset of the raw data.

However, in a test on 100 profiles, if I use only lmfit the runtime is 2.4 seconds (cProfile), while combining KernelReg and lmfit takes 272 seconds.

cProfile shows a bottleneck in the KernelReg call.

So, my questions are: how can I improve the performance of the KernelReg call, and is KernelReg + lmfit a good choice for denoising and fitting?

Best Regards!

@josef-pkt (Member)

KernelReg is slow, and with bandwidth selection it is even slower. My guess is that you don't need the extra features of KernelReg, and a faster substitute would work better in this case.
KernelReg does the estimation loop in Python and is not optimized for speed.
Also, if you do bandwidth selection with one of the cross-validation methods, then doing the bandwidth selection only once and reusing the result for the other series would be faster, provided all profiles require approximately the same amount of smoothing.
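A minimal sketch of that reuse pattern (the synthetic profiles and parameter choices here are illustrative, not from the reporter's data): select the bandwidth by cross-validation once on one profile, then pass the resulting `bw` array to `KernelReg` for every other profile so the slow selection step runs only once.

```python
import numpy as np
from statsmodels.nonparametric.kernel_regression import KernelReg

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 50)
profiles = [np.exp(-x**2 / 2) + 0.1 * rng.standard_normal(x.size)
            for _ in range(3)]

# Select the bandwidth once, on the first profile (the slow step).
kr0 = KernelReg(profiles[0], x, var_type='c', reg_type='lc', bw='cv_ls')
bw = kr0.bw

# Reuse the fixed bandwidth for every other profile (much faster,
# since no cross-validation loop runs).
smoothed = []
for y in profiles:
    kr = KernelReg(y, x, var_type='c', reg_type='lc', bw=bw)
    yhat, _ = kr.fit(x)
    smoothed.append(yhat)
```

This assumes, as noted above, that all profiles need roughly the same amount of smoothing; if noise levels vary a lot between profiles, the shared bandwidth will over- or under-smooth some of them.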

First, can you attach or post somewhere one of your datasets/profiles? This is one of a family of related use cases that I want to look into (eventually).

For related smoothing:

  • lowess is faster and should work pretty well if only the smoothed series is required.
  • if the x points are evenly spaced on a grid, then scipy's Savitzky-Golay filter (savgol_filter) should work and be very fast.
    (I have an open PR for local smoothing that works similarly to Savitzky-Golay but does binning if the x values are not evenly spaced.)
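Both suggested smoothers can be tried in a few lines; this is a sketch on a synthetic noisy Gaussian profile, with the smoothing parameters (`frac`, `window_length`, `polyorder`) chosen arbitrarily for illustration:

```python
import numpy as np
from scipy.signal import savgol_filter
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 200)
y = np.exp(-x**2 / 2) + 0.1 * rng.standard_normal(x.size)

# lowess: handles arbitrary x spacing; frac is the smoothing span
# (fraction of the data used for each local fit).
smoothed_lowess = lowess(y, x, frac=0.1, return_sorted=False)

# Savitzky-Golay: requires evenly spaced x; very fast.
smoothed_sg = savgol_filter(y, window_length=21, polyorder=3)
```

Either smoothed series could then be passed to lmfit in place of the KernelReg output.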

Without smoothing:
Even without smoothing, a nonlinear fit should work quite well; if not with a standard least-squares fit, then a robust fit might work. scipy now also allows Huber robust regression as an extension of its least-squares fitting; maybe lmfit includes this. I haven't tried those, so I have no idea how well they work for outlier-robust nonlinear fitting.
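The Huber option mentioned here is the `loss='huber'` argument of `scipy.optimize.least_squares`; a hedged sketch of fitting the Gaussian directly to noisy data with a few outliers, skipping the smoothing step entirely (all data and starting values below are made up for illustration):

```python
import numpy as np
from scipy.optimize import least_squares

def gauss(p, x):
    """Gaussian with offset: p = (amp, center, sigma, offset)."""
    amp, cen, sig, off = p
    return amp * np.exp(-(x - cen)**2 / (2 * sig**2)) + off

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 200)
y = gauss((1.0, 0.3, 1.2, 0.1), x) + 0.05 * rng.standard_normal(x.size)
y[::40] += 0.5  # inject a few outliers

# Robust nonlinear fit: Huber loss downweights the outliers.
res = least_squares(lambda p: gauss(p, x) - y,
                    x0=[0.5, 0.0, 1.0, 0.0],
                    loss='huber', f_scale=0.1)
amp, cen, sig, off = res.x
```

This avoids the per-profile KernelReg call altogether, so for 10,000 profiles it may be the largest win if the robust fit proves stable on the real data.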

(There might be also other ways to estimate this in a relatively robust way, e.g. assuming Poisson noise, but I don't have any experience in what works well in cases like these.)

@josef-pkt (Member)

I just saw https://stackoverflow.com/questions/48069998/kernelreg-performance-in-a-for-loop
which includes one example profile

@Landau1908 (Author)

You can also post your answer on Stack Overflow.

@shilinng commented Apr 6, 2024

I have a solution to speed up bandwidth estimation for Nadaraya–Watson kernel regression (local constant reg_type). The idea is to vectorize the cv_loo method. In general I see a 2x–2.5x speedup for hundreds to a thousand rows of 3-dimensional data (I can't test more rows/dimensions because my current machine is basically a tablet). For a Gaussian kernel and continuous var_type, I see a ~50x speedup for a few hundred rows and a ~7x speedup for 1000 rows. This obviously does not solve all of the performance issues with kernel regression, since there is no change to the fit function, and it only works for the local constant reg_type. Can I still open a pull request for this partial solution?
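The vectorization idea can be illustrated for the simplest case, a 1-D Gaussian kernel (this is a simplified sketch of the general approach, not the code in the linked PR): build the full n-by-n kernel weight matrix once, zero its diagonal to get the leave-one-out estimates, and score every observation without a Python loop.

```python
import numpy as np

def loo_cv_score(x, y, bw):
    """Leave-one-out CV score for 1-D Nadaraya-Watson regression,
    computed with matrix operations instead of a per-row loop."""
    # n x n matrix of Gaussian kernel weights K((x_i - x_j) / bw)
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bw) ** 2)
    np.fill_diagonal(w, 0.0)           # leave one out: drop self-weight
    yhat = (w @ y) / w.sum(axis=1)     # NW estimate at each left-out point
    return np.mean((y - yhat) ** 2)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 300))
y = np.exp(-x**2 / 2) + 0.1 * rng.standard_normal(x.size)

# Pick the bandwidth minimizing the LOO score over a small grid.
grid = np.linspace(0.05, 1.0, 20)
best_bw = grid[np.argmin([loo_cv_score(x, y, b) for b in grid])]
```

The trade-off is O(n^2) memory for the weight matrix, which is why a speedup of this kind is easiest to realize for hundreds to a few thousand rows.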

@bashtage (Member) commented Apr 6, 2024

Please open a PR.

@shilinng shilinng linked a pull request Apr 7, 2024 that will close this issue
@shilinng
Should I tag someone to review this? Any comment would be appreciated.
