Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace with regex cause stucks #500

Open
inkydragon opened this issue Jan 24, 2024 · 12 comments
Open

Replace with regex cause stucks #500

inkydragon opened this issue Jan 24, 2024 · 12 comments

Comments

@inkydragon
Copy link
Contributor

  • Notepad Next 0.6.4 (GitHub Rel ver), Win11

txt file

1
2
3

Replace ^ => //

  • No cycle search (p)
  • With Regex (g) open
  • Click "Replace ALL"
  • Stucks

Notepad Next CPU load 5%~8%
Memory usage: rise slowly

When debug with WinDbg
I see those lines repeat forever:
const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)

Full log
[     0.094] I: =============================
[     0.094] I: Notepad Next v0.6.4.0
[     0.094] I: Build Date/Time: Oct 18 2023 16:53:08
[     0.095] I: Qt: 6.5.3
[     0.095] I: OS: Windows 11 Version 23H2
[     0.095] I: Locale: zh_CN
[     0.095] I: CPU: x86_64
[     0.095] I: File Path: D:/env/Notepad Next/NotepadNext.exe
[     0.095] I: Arguments: D:\env\Notepad Next\NotepadNext.exe
[     0.095] I: =============================
[     0.095] I: bool __cdecl NotepadNextApplication::init(void)
ModLoad: 00007ffc`d42b0000 00007ffc`d42be000   D:\env\Notepad Next\imageformats\qgif.dll
ModLoad: 00007ffc`d1440000 00007ffc`d144e000   D:\env\Notepad Next\imageformats\qico.dll
ModLoad: 00007ffc`87ee0000 00007ffc`87f7a000   D:\env\Notepad Next\imageformats\qjpeg.dll
ModLoad: 00007ffc`d1230000 00007ffc`d123c000   D:\env\Notepad Next\imageformats\qsvg.dll
ModLoad: 00007ffc`d11d0000 00007ffc`d122a000   D:\env\Notepad Next\Qt6Svg.dll
[     0.231] I: void __cdecl NotepadNextApplication::loadTranslation(class QLocale)
[     0.231] I: Loaded zh_CN translation :/i18n/NotepadNext_zh_CN.qm for Notepad Next
[     0.232] I: zh_CN translation not found for Qt components
[     0.232] I: __cdecl LuaState::LuaState(void)
[     0.245] I: __cdecl MacroManager::MacroManager(class QObject *)
[     0.245] I: void __cdecl MacroManager::loadSettings(void)
[     0.245] I: __cdecl MainWindow::MainWindow(class NotepadNextApplication *)
[     0.380] I: setupUi Completed
[     0.464] I: __cdecl FileListDock::FileListDock(class MainWindow *)
[     0.465] I: void __cdecl MainWindow::setupLanguageMenu(void)
[     0.496] I: void __cdecl MainWindow::restoreSettings(void)
[     0.496] I: AutoUpdates: 1
[     0.496] I: void __cdecl NotepadNextApplication::openFiles(const class QList<class QString> &)
[     0.496] I: void __cdecl MainWindow::newFile(void)
[     0.497] I: void __cdecl EditorManager::setupEditor(class ScintillaNext *)
[     0.498] I: void __cdecl MainWindow::addEditor(class ScintillaNext *)
[     0.498] I: void __cdecl MainWindow::detectLanguage(class ScintillaNext *)
[     0.498] I: void __cdecl MainWindow::setLanguage(class ScintillaNext *,const class QString &)
[     0.498] I: Language Name: Text
[     0.498] I: void __cdecl DockedEditor::addEditor(class ScintillaNext *)
ModLoad: 00007ffc`49010000 00007ffc`490c0000   C:\WINDOWS\SYSTEM32\d3d9on12.dll
ModLoad: 00007ffc`48fe0000 00007ffc`49009000   C:\WINDOWS\SYSTEM32\d3d12.dll
ModLoad: 00007ffc`48dd0000 00007ffc`48fd3000   C:\WINDOWS\SYSTEM32\D3D12Core.dll
ModLoad: 00007ffc`48da0000 00007ffc`48dc7000   C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igd12umd64.dll
ModLoad: 00007ffc`47ea0000 00007ffc`48d9f000   C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igd12um64xel.dll
ModLoad: 00007ffc`dd640000 00007ffc`dd7a6000   C:\WINDOWS\System32\CRYPT32.dll
ModLoad: 00007ffc`97480000 00007ffc`974c5000   C:\WINDOWS\SYSTEM32\ControlLib.dll
ModLoad: 00007ffc`b8580000 00007ffc`b860c000   C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\IntelControlLib.dll
ModLoad: 00007ffc`da9b0000 00007ffc`da9c5000   C:\WINDOWS\SYSTEM32\resourcepolicyclient.dll
ModLoad: 00007ffc`cf5a0000 00007ffc`cf5ea000   C:\WINDOWS\SYSTEM32\directxdatabasehelper.dll
ModLoad: 00007ffc`ae810000 00007ffc`ae849000   C:\WINDOWS\System32\DriverStore\FileRepository\nvlt.inf_amd64_76b6a0e822aad26f\nvdlistx.dll
ModLoad: 00007ffc`ae810000 00007ffc`ae849000   C:\WINDOWS\System32\DriverStore\FileRepository\nvlt.inf_amd64_76b6a0e822aad26f\nvdlistx.dll
ModLoad: 00007ffc`b0ec0000 00007ffc`b12fb000   C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igdgmm64.dll
ModLoad: 00007ffc`a2c80000 00007ffc`a6b4b000   C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igc64.dll
ModLoad: 00007ffc`47ab0000 00007ffc`47adf000   C:\WINDOWS\SYSTEM32\D3DSCache.dll
ModLoad: 00007ffc`477b0000 00007ffc`47aa9000   C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igd12dxva64.dll
ModLoad: 00007ffc`465d0000 00007ffc`477aa000   C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igddxvacommon64.dll
ModLoad: 00007ffc`448a0000 00007ffc`465c5000   C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\media_bin_64.dll
ModLoad: 00007ffc`44870000 00007ffc`44897000   C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_1c096b3397af68b6\igdinfo64.dll
ModLoad: 00007ffc`dcb30000 00007ffc`dcb4b000   C:\WINDOWS\SYSTEM32\CRYPTSP.dll
ModLoad: 00007ffc`dc2f0000 00007ffc`dc325000   C:\WINDOWS\system32\rsaenh.dll
ModLoad: 00007ffc`cc800000 00007ffc`cca85000   C:\WINDOWS\SYSTEM32\twinapi.appcore.dll
ModLoad: 00007ffc`7bfa0000 00007ffc`7c11e000   C:\WINDOWS\SYSTEM32\dxilconv.dll
ModLoad: 00007ffc`da9b0000 00007ffc`da9c5000   C:\WINDOWS\SYSTEM32\resourcepolicyclient.dll
ModLoad: 00007ffc`95e60000 00007ffc`95f1d000   C:\WINDOWS\System32\DriverStore\FileRepository\nvlt.inf_amd64_76b6a0e822aad26f\nvldumdx.dll
ModLoad: 00007ffc`dcf30000 00007ffc`dcf42000   C:\WINDOWS\SYSTEM32\msasn1.dll
ModLoad: 00007ffc`cd760000 00007ffc`cd792000   C:\WINDOWS\SYSTEM32\cryptnet.dll
ModLoad: 00007ffc`cd580000 00007ffc`cd6d8000   C:\WINDOWS\SYSTEM32\drvstore.dll
ModLoad: 00007ffc`dca70000 00007ffc`dcabb000   C:\WINDOWS\SYSTEM32\wldp.dll
ModLoad: 00007ffc`ddc30000 00007ffc`ddc9b000   C:\WINDOWS\System32\WINTRUST.DLL
ModLoad: 00007ffc`df720000 00007ffc`df73f000   C:\WINDOWS\System32\imagehlp.dll
ModLoad: 00007ffb`e8580000 00007ffb`eb474000   C:\WINDOWS\System32\DriverStore\FileRepository\nvlt.inf_amd64_76b6a0e822aad26f\nvd3dumx.dll
ModLoad: 00007ffc`88b60000 00007ffc`88bbe000   C:\WINDOWS\system32\dataexchange.dll
ModLoad: 00007ffc`cc800000 00007ffc`cca85000   C:\WINDOWS\system32\twinapi.appcore.dll
[     0.829] I: void __cdecl MainWindow::activateEditor(class ScintillaNext *)
[     0.829] I: bool __cdecl MainWindow::checkFileForModification(class ScintillaNext *)
[     0.829] I: void __cdecl MainWindow::updateGui(class ScintillaNext *)
[     0.829] I: void __cdecl MainWindow::updateFileStatusBasedUi(class ScintillaNext *)
[     0.829] I: void __cdecl MainWindow::updateSaveStatusBasedUi(class ScintillaNext *)
[     0.830] I: void __cdecl MainWindow::updateEOLBasedUi(class ScintillaNext *)
[     0.830] I: void __cdecl MainWindow::updateLanguageBasedUi(class ScintillaNext *)
ModLoad: 00007ffc`c8460000 00007ffc`c85aa000   C:\WINDOWS\SYSTEM32\textinputframework.dll
ModLoad: 00007ffc`aca50000 00007ffc`acaf5000   C:\Windows\System32\Windows.FileExplorer.Common.dll
ModLoad: 00007ffc`d8bc0000 00007ffc`d8cc1000   C:\WINDOWS\system32\propsys.dll
ModLoad: 00007ffc`8d1f0000 00007ffc`8d20a000   C:\WINDOWS\system32\NetworkExplorer.dll
ModLoad: 00007ffc`88170000 00007ffc`88419000   C:\WINDOWS\system32\explorerframe.dll
[     1.035] I: class Scintilla::Internal::RegexSearchBase *__cdecl Scintilla::Internal::CreateRegexSearch(class Scintilla::Internal::CharClassify *)
ModLoad: 00007ffc`d59d0000 00007ffc`d5b04000   C:\WINDOWS\SYSTEM32\CoreMessaging.dll
ModLoad: 00007ffc`ce430000 00007ffc`ce79c000   C:\WINDOWS\SYSTEM32\CoreUIComponents.dll
ModLoad: 00007ffc`b2920000 00007ffc`b2948000   C:\WINDOWS\SYSTEM32\edputil.dll
[     4.643] I: void __cdecl MainWindow::updateSaveStatusBasedUi(class ScintillaNext *)
[     6.007] I: __cdecl FindReplaceDialog::FindReplaceDialog(class ISearchResultsHandler *,class MainWindow *)
[     6.028] I: void __cdecl FindReplaceDialog::loadSettings(void)
[     6.041] I: void __cdecl FindReplaceDialog::showEvent(class QShowEvent *)
[    15.497] I: Last checked for updates at: Wed Jan 24 09:19:29 2024
[    15.774] I: void __cdecl FindReplaceDialog::replaceAll(void)
[    15.775] I: void __cdecl FindReplaceDialog::prepareToPerformSearch(bool)
[    15.775] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.776] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.777] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.778] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.779] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.780] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.780] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.781] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.781] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.782] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.782] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.783] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.784] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
[    15.784] I: const char *__cdecl QRegexSearch::SubstituteByPosition(class Scintilla::Internal::Document *,const char *,__int64 *)
.....
@dail8859
Copy link
Owner

I would guess this is related to #192

It is probably restarting the search after the replace in the incorrect spot, thus an infinite loop.

The regex implementation is...quite rough to be honest, and I figured there were quite a few corner cases. My gut feeling is that the QRegularExpression class is not a long term solution.

Currently there are 3 options:

  • Stick with QRegularExpression and try to deal with all the intricacies and hope there's no "show stoppers".
  • Use Scintilla's built in regular expression engine (this has quite a few limitations though, e.g. no multiline matches)
  • Or...probably the best option...is try to implement the boost regex engine that Notepad++ uses. It uses boost but it also has to provide an implementation to scintilla so that it can work with that regex engine. I've glanced through Notepad++'s code that does that...and it's not immediately easy to figure out.

@mintsoft
Copy link

mintsoft commented Feb 12, 2024

I think the problem here (and with #192) is actually not QRegularExpression itself; it handles this case fine. I think the issue is with the way it is used with NotepadNext as it seems to be relying on the .captured() returning a non-zero length string. If you use the QRegularExpression in a more direct manner to do the replacements it seems to work, Example:

#include <QtGlobal>
#include <QtCore>
#include <QDebug>
#include <QTextStream>
#include <QRegularExpression>
#include <iostream>

int main(int argc, char *argv[])
{
    QTextStream qout(stdout);
   
    auto options = QRegularExpression::MultilineOption | QRegularExpression::UseUnicodePropertiesOption;
    QRegularExpression re("^", options);
   
    QString replacement = "//";
    QString inputString = "1\n2\n3\n";
    QRegularExpressionMatch match = re.match(inputString, 0, QRegularExpression::NormalMatch, QRegularExpression::NoMatchOption);
   
    qDebug() << "input string: ";
    qDebug() << inputString;

    QString newString = inputString.replace(match.regularExpression(), replacement);

    qDebug() << " --------------------- ";
    qDebug() << newString;
    qDebug() << " --------------------- ";

}

Outputs the following:

input string: 
"1\n2\n3\n"
 --------------------- 
"//1\n//2\n//3\n"
 --------------------- 

Which (unless I'm missing something entirely) suggests that the library is fine for this case (tested under QT 5.15 on Debian 12)

@dail8859
Copy link
Owner

@mintsoft Thanks for the info. You are correct that purely using QRegularExpression does work. However to integrate with the editing component you can't simply just call .replace() on a string.

Each search/replace needs a starting location and ending location. Since the substring is pulled out of the editor and turned into a QString, then ^ always matches the beginning of the string, no matter where the search is "started" in the document. Even if you do replace the first valid ^ match with // then when the search starts again after the previously replaced // it then finds another match...and gets stuck.

@mintsoft
Copy link

Each search/replace needs a starting location and ending location. Since the substring is pulled out of the editor and turned into a QString, then ^ always matches the beginning of the string, no matter where the search is "started" in the document. Even if you do replace the first valid ^ match with // then when the search starts again after the previously replaced // it then finds another match...and gets stuck.

Ahh I see. It seems like we'd need to change that behaviour no matter which Regex implementation we used to fix the problem.

I assume it's implemented like that so that we are able to dynamically update the document as it goes-along and avoid converting the entire document into QString and duplicating everything?

@dail8859
Copy link
Owner

It seems like we'd need to change that behaviour no matter which Regex implementation we used to fix the problem.

That is partially correct. If I recall if you use something like re2 (which I believe QRegularExpression uses) then there are more flags you can pass it to tell it that the beginning of the string you are providing isn't actually the start. But that was a while ago...so maybe I'm misremembering. QRegularExpression doesn't provide that fine of control.

That's the whole reason why in the long run I suspect Notepad++'s implementation might be worth while since it uses boostregex and they've already worked out all the corner cases :)

I assume it's implemented like that so that we are able to dynamically update the document as it goes-along and avoid converting the entire document into QString and duplicating everything?

That is a big part of it. The other part is to let Scintilla properly know about each individual change so that it can properly manage the undo stack along with other internals.

@dail8859
Copy link
Owner

On a side note, Scintilla by default supports the standard library regex std::regex but I recall looking into that at one time and found some limitations...not sure if things have improved since then.

@mintsoft
Copy link

On a side note, Scintilla by default supports the standard library regex std::regex but I recall looking into that at one time and found some limitations...not sure if things have improved since then.

Yeah, the docs point towards that being quite limited (no ?, or lookaheads/behinds etc); I think that's actually the regex engine that Notepad++ used to have about 10 years ago, I remember it was very very limited for a while

@dail8859
Copy link
Owner

Yeah, the docs point towards that being quite limited (no ?, or lookaheads/behinds etc)

That is it's own very basic implementation of a regex engine. But it can also use std::regex which I'm not sure how well it matches up to something like boost.

@mintsoft
Copy link

Yeah, the docs point towards that being quite limited (no ?, or lookaheads/behinds etc)

That is it's own very basic implementation of a regex engine. But it can also use std::regex which I'm not sure how well it matches up to something like boost.

It's probably on-par than Scintilla's one, however if you want PCRE (i.e. "proper regex") then the Boost library is the best bet certainly

@mintsoft
Copy link

Another reproduction of this bug is any find/replace that does not change the actual content or that the replaced string still matches the "find" i.e.:

Find (.*)
Replace $1

This will also loop forever

@SimLV
Copy link

SimLV commented May 11, 2024

the Boost library is the best bet
I am going to add original PCRE2 after finishing current two PRs

@dail8859
Copy link
Owner

the Boost library is the best bet

Notepad++ has used an implementation of the boost regex library with the Scintilla editor and it has been proven to be quite robust over the years (not saying it doesn't have issues).

I've looked at the code a little bit before but it is not just a drop-in replacement since it was made for Windows and Win32 https://github.com/notepad-plus-plus/notepad-plus-plus/tree/master/boostregex

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants