[Feature Request] Hashing prioritisation #1058

Open
Reinachan opened this issue Apr 14, 2023 · 11 comments
Comments

@Reinachan

Current behaviour seems random. However, people usually watch anime in sequential order, so it would make sense to prioritise hashing files sequentially by episode number when possible.

I suggest that for files with similar names where the only difference is a number, ShokoAnime should prioritise files with a lower number over those with higher numbers. If it's unable to determine the episode number, it should fall back to the current behaviour.
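
A minimal sketch of that ordering rule, written in Python purely for illustration. It assumes the discovered filenames are available as a list. The names `split_numbers` and `prioritise` are hypothetical and not part of Shoko. Files whose names differ in exactly one number are reordered among themselves; everything else keeps its discovery position.

```python
import re
from collections import defaultdict


def split_numbers(name):
    # "Show - 02 [1080p].mkv" -> ["Show - ", "02", " [", "1080", "p].mkv"]
    # Even indices hold text, odd indices hold runs of digits.
    return re.split(r"(\d+)", name)


def prioritise(names):
    """Reorder names so similar files run from the lowest number upwards.

    Hypothetical helper: files are "similar" when their non-digit text is
    identical. A group is only reordered when exactly one number varies
    across it; otherwise the discovery order is left untouched.
    """
    groups = defaultdict(list)  # non-digit template -> indices into names
    for index, name in enumerate(names):
        template = tuple(split_numbers(name)[0::2])
        groups[template].append(index)

    result = list(names)
    for indices in groups.values():
        number_rows = {
            i: [int(n) for n in split_numbers(names[i])[1::2]] for i in indices
        }
        columns = list(zip(*number_rows.values()))
        varying = [c for c, col in enumerate(columns) if len(set(col)) > 1]
        if len(indices) < 2 or len(varying) != 1:
            continue  # cannot determine an episode number: keep current order
        column = varying[0]
        ordered = sorted(indices, key=lambda i: number_rows[i][column])
        # Fill the slots this group already occupied, lowest number first.
        for slot, source in zip(indices, ordered):
            result[slot] = names[source]
    return result
```

Because only one number may vary within a group, a fixed number such as a resolution tag does not confuse the ordering, and groups where several numbers change are treated as undeterminable.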

@revam
Member

revam commented Apr 14, 2023

Context:

  1. Shoko doesn't care about filenames at all.
  2. The files are processed in the order they are discovered in (with some exceptions).

@revam
Member

revam commented Apr 14, 2023

I'm not against adding a bit more "predictability" to the process, but I also don't see the benefit of adding this behaviour. Others on the team might see it differently though.

@Reinachan
Author

Context:

  1. Shoko doesn't care about filenames at all.
  2. The files are processed in the order they are discovered in (with some exceptions).

That's what I assumed. I had the same issue with my fileserver when reconstructing chunked uploads, and ended up fetching the filenames first and only then initialising the reconstruction of the file.

I'd suggest doing something similar for Shoko. First grab the filenames, check for prioritisation, then run the hasher.
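
Those three steps could look roughly like the following Python sketch. It is only an illustration under stated assumptions: `natural_key` and `hash_in_order` are hypothetical names, and SHA-1 is a stand-in for whatever hasher the server actually runs.

```python
import hashlib
import os
import re


def natural_key(name):
    # Split into alternating text and digit chunks so "ep10" sorts after "ep2".
    # Odd positions are always digit runs, so they can be compared as integers.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def hash_in_order(directory):
    """Hypothetical pipeline: list names, decide the order, then hash."""
    # 1. Grab the filenames first (os.scandir yields them in arbitrary order).
    with os.scandir(directory) as entries:
        names = [entry.name for entry in entries if entry.is_file()]

    # 2. Check for prioritisation: lowest number first.
    names.sort(key=natural_key)

    # 3. Run the hasher in that order, reading each file in 1 MiB chunks.
    results = []
    for name in names:
        digest = hashlib.sha1()  # stand-in hash, not necessarily what Shoko uses
        with open(os.path.join(directory, name), "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        results.append((name, digest.hexdigest()))
    return results
```

Only the names are held in memory before hashing starts, so the extra cost is one list of strings per directory.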

@bigretromike
Contributor

  1. The files are processed in the order they are discovered in (with some exceptions).

If that were true, wouldn't files from the same series be hashed in order, since they should be in the same directory (assuming they are)?

@revam
Member

revam commented Apr 14, 2023

  1. The files are processed in the order they are discovered in (with some exceptions).

If that were true, wouldn't files from the same series be hashed in order, since they should be in the same directory (assuming they are)?

Only if they are discovered in sequential order.

@Reinachan
Author

If that were true, wouldn't files from the same series be hashed in order, since they should be in the same directory (assuming they are)?
@bigretromike

Assuming C# (or whatever library) is using the same APIs under the hood as Rust does, that's not the case, no.

This function currently corresponds to the opendir function on Unix and the FindFirstFile function on Windows. Advancing the iterator currently corresponds to readdir on Unix and FindNextFile on Windows. [...]

The order in which this iterator returns entries is platform and filesystem dependent.
(source: Rust documentation for std::fs::read_dir)

That said, this is only an issue when the server is receiving a directory, not when it receives individual files (like if you're downloading the episodes separately). Idk if those are distinguishable events for the server or not.

@Cazzar
Member

Cazzar commented Apr 14, 2023

Ultimately, I don't feel this is viable without making file discovery stupidly slow. There is also a difference between the full file tree scan and the filesystem watcher. Once the commands are in the queue, they are typically processed in order of priority and then last updated, but that could change.

We don't have any sorting currently because, to do that, we would need to load the entire import folder tree into memory before sorting. That would quickly lead to poor performance in larger collections, and we already have a large memory footprint.

@maxpiva
Member

maxpiva commented Apr 14, 2023 via email

@Reinachan
Author

I primarily mean on filewatcher events when you put a directory into the import folder. The way I have things set up is that once an anime is fully downloaded, it'll hardlink the containing folder into the Shoko import folder.

I don't think this should be done on an initial import, nor on individual files placed into the import folder. Only when a directory with multiple anime in it is placed into the import folder. You could also make it optional.

Basically: on a filesystem event for a directory, read the entries in that directory, determine the sort order, and process them in that order.
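
As a rough Python sketch of that flow, under stated assumptions: the watcher calls a hypothetical `handle_event(path, queue)` for each new path, and `queue` is any object with an `append` method that holds paths waiting to be hashed. None of these names come from Shoko.

```python
import os
import re


def natural_key(name):
    # Alternating text and digit chunks; digit runs compare as integers.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def handle_event(path, queue):
    """Hypothetical watcher callback that enqueues paths for hashing."""
    if not os.path.isdir(path):
        # Individual file placed in the import folder: keep current behaviour.
        queue.append(path)
        return

    # Directory event: read entries, determine sorting, enqueue in that order.
    for root, dirnames, filenames in os.walk(path):
        # Sorting dirnames in place makes os.walk visit subfolders in order.
        dirnames.sort(key=natural_key)
        for name in sorted(filenames, key=natural_key):
            queue.append(os.path.join(root, name))
```

Sorting happens one directory at a time, so only the names in the directory currently being walked need to be sorted, rather than the whole import folder tree. A `collections.deque` would work as the queue in a quick experiment.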

As for memory footprint, I personally don't mind short spikes of increased memory. You can mark the setting as "potentially memory intensive during imports" if it turns out to be a problem.

@da3dsoul
Member

That could be done, since a directory detection is distinct from a file detection.

@maxpiva
Member

maxpiva commented Apr 16, 2023 via email

7 participants