--output - trims resulting index #262

Open
ArsenArsen opened this issue Mar 15, 2022 · 6 comments
Labels
bug-report Unexpected behavior found, including behavior that diverges from documentation. help-wanted Indicator that the maintainers want advice, help, ideas, etc.

Comments

@ArsenArsen
Contributor

while :; do 2>/dev/null stork build --input debug.json --output - | wc --bytes; done

Related to #261. This makes it impossible to use stork as a filter. Notably, /dev/stdout does not have the same issue, implying this is an issue with how Rust opens stdout.
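Since the loop above prints one byte count per run and never stops, a bounded variant that tallies the counts may make the failure rate easier to read. This is only a sketch: `tally_sizes` is a hypothetical helper name, not part of Stork, and it assumes a POSIX shell with `wc`, `sort`, and `uniq` available.

```shell
# tally_sizes RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD RUNS times and prints how many times
# each stdout byte count occurred (one "count size" line per distinct size).
tally_sizes() (
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        # tr strips the padding that some wc implementations add
        "$@" 2>/dev/null | wc -c | tr -d ' '
        i=$((i + 1))
    done | sort -n | uniq -c
)
```

For example, `tally_sizes 200 stork build --input debug.json --output -` (assuming `stork` is on `PATH` and `debug.json` is the config from the loop above). If every run wrote the full index, the tally would contain a single line; more than one line means at least one run came out a different size.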

ArsenArsen added a commit to ArsenArsen/wiki that referenced this issue Mar 16, 2022
@jameslittle230
Owner

I'm unable to reproduce this, unfortunately:

while TRUE; cargo run -- build --input local-dev/test-configs/federalist.toml --output - 2> /dev/null | wc; end
    1512   46242 1125456
    1512   46242 1125456
    1512   46242 1125456
    1512   46242 1125456

However, I suspect that merging #272 will fix this issue. If you're able, could you pull that branch, build Stork locally, and retry your test?

@ArsenArsen
Contributor Author

ArsenArsen commented Mar 22, 2022

[i] ~/stork 130 $ while :; do stork build --input - --output - 2>/dev/null <local-dev/test-configs/federalist.toml | wc --bytes; done | uniq
1124840
1125456
1125420
1125456
1124926
1125456
1124919
1125183
1125456
1125220
1125456

The above is unpatched. For some reason, the Federalist Papers example reproduces this issue a lot less (I had to use uniq to reduce the non-wrong result spam).

Patched:

[c] ~/stork$ while :; do ./target/debug/stork build --input - --output - 2>/dev/null <local-dev/test-configs/federalist.toml | wc --bytes; done
1125456
1125456
1125451
1125456
1125456
1124734
1125456

It'd appear flushing does not help (IIRC, I tried this myself after opening the issue anyways).
For some reason, though, STORK-262/fix-write-to-stdout is insanely slow.

Please try this on a glibc system (such as the Debian Docker container) too.

It's probably worth noting that I'm using rustc 1.58.1 and cargo 1.58.0

PS: Is there some realtime communication channel? It'd likely be more ergonomic to test these kinds of weird issues that way

@jameslittle230
Owner

Weird - thanks for checking. I'll be sure not to merge #272 if it makes things too slow.

When you were reproducing it with your own config, was it producing index files that were bigger or smaller than the Federalist Papers example?

I'll keep working on a repro and check back in.

There's no chat set up for the project - I haven't had a need to spin something like that up yet, and I don't yet have a good sense for how useful it would be over GitHub issues and discussions. Happy to consider it, though - any suggestions?

@ArsenArsen
Contributor Author

> Weird - thanks for checking. I'll be sure not to merge #272 if it makes things too slow.

Flush alone shouldn't; I think this was just system load. I can't reproduce it now. Even when reverting the BTreeMap changes, I only get a 13% increase in speed (builds per second).

> When you were reproducing it with your own config, was it producing index files that were bigger or smaller than the Federalist Papers example?

I was under the impression I included my results - my bad!
Considerably smaller.

381545
378037
381545
381523

> I'll keep working on a repro and check back in.

This just gets weirder: I am now unable to reproduce it with the flush.
This issue would seem to be fixed now?
Well, at any rate, stdout not being flushed on exit seems like a Rust runtime bug too.

> There's no chat set up for the project - I haven't had a need to spin something like that up yet, and I don't yet have a good sense for how useful it would be over GitHub issues and discussions. Happy to consider it, though - any suggestions?

I don't really have any special suggestions here, just the usual (Matrix or Libera.Chat; Zulip is also a thing some swear by but I haven't used it much).
Whatever works for you works for me

@applejag

Hello, I'm experiencing the same problem while trying to integrate Stork into Emanote (srid/emanote#327).

When using --output - the index becomes "corrupted"/unusable.

To repro:

  1. Clone https://github.com/jilleJr/notes
  2. Run this snippet to generate an ad-hoc config file:
    echo -e "[input]\nfiles = [" > stork.toml
    while read -r file
    do
    	echo "  {path=\"$file\", url=\"$file\", title=\"$(basename "$file")\"}," >> stork.toml
    done < <(find content -name '*.md')
    echo "]" >> stork.toml
  3. Build the index:
    stork build -i stork.toml -o index-from-flag.st
    stork build -i stork.toml -o /dev/stdout > index-from-stdout.st
    stork build -i stork.toml -o - > index-from-dash.st
  4. Attempt a search:
    $ stork search -q foo -i index-from-flag.st
    (large json output)
    
    $ stork search -q foo -i index-from-stdout.st
    (large json output)
    
    $ stork search -q foo -i index-from-dash.st
    thread 'main' panicked at 'split_to out of bounds: 679254 <= 679213', /home/kalle/.cargo/registry/src/github.com-1ecc6299db9ec823/bytes-1.1.0/src/bytes.rs:402:9
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

(Using Stork v1.5.0 btw)
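The panic above reports a split at offset 679254 against a length of 679213, which would be consistent with an index that came out a few dozen bytes short, though nothing in the thread confirms that. One way to check is to compare the files from step 3 directly. The sketch below assumes a POSIX shell with `cmp`, `wc`, and `head -c`; `compare_indexes` is a hypothetical helper name, not a Stork command.

```shell
# compare_indexes REF OTHER
# Hypothetical helper: prints the size of each file, then one of:
#   identical  - the files match byte for byte
#   truncated  - OTHER is shorter than REF and matches the start of REF
#   differs    - anything else
compare_indexes() (
    ref=$1
    other=$2
    ref_size=$(wc -c < "$ref" | tr -d ' ')
    other_size=$(wc -c < "$other" | tr -d ' ')
    echo "$ref: $ref_size bytes"
    echo "$other: $other_size bytes"
    if cmp -s "$ref" "$other"; then
        echo identical
    elif [ "$other_size" -lt "$ref_size" ] &&
        head -c "$other_size" "$ref" | cmp -s - "$other"; then
        echo truncated
    else
        echo differs
    fi
)
```

Running `compare_indexes index-from-flag.st index-from-dash.st` would then show whether the `--output -` index is a cut-off copy of the one written with `-o index-from-flag.st` or differs in some other way.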

@jameslittle230
Copy link
Owner

@jilleJr - thanks for the repro steps. I'll take a look later today and report back what I find.

@jameslittle230 jameslittle230 added bug-report Unexpected behavior found, including behavior that diverges from documentation. help-wanted Indicator that the maintainers want advice, help, ideas, etc. and removed type:bug labels Mar 17, 2023
@jameslittle230 jameslittle230 removed their assignment Mar 17, 2023