Package with offline links never "finishes" #4392

Open
mihawk90 opened this issue Nov 12, 2023 · 7 comments
Labels: bug (Something isn't working), pyLoad Next

Comments

@mihawk90 (Contributor)

Description

I have a package in my Queue where 3 out of 24 links are offline. The rest have been downloaded, but the package never enters the "finished" state and therefore never fires an event for the ExtractArchive plugin to pick up and do the extraction.

I'm not sure whether this is a bug or intentional, since offline links are obviously not downloaded. That being the case, though, they also never will be downloaded and finished, so pyload should treat them as finished and at least attempt extraction. If extraction fails, then that's just that; at that point it doesn't matter whether the links are treated as finished or not, since it doesn't change the situation. All it does is save time by automating an extraction step that would otherwise have to be done manually.

Debug log

My log is 17k lines and I don't know where exactly the last link finished... I can however say that the package name only appears in the log where the package was created, whereas normally finished packages are logged with Package finished: <packagename>.
Here is however a screenshot for illustration:
[screenshot of the queue: the progress bar shows 3 of 24 links unfinished]
As you can see on the progress bar, 3 links are "unfinished", and those are exactly the 3 offline links.
It should be noted that in this case those were links that would have been skipped due to the file already existing (or rather having been downloaded from another hoster), so the package is finished regardless (but of course pyload can't know that when the offline link doesn't provide the filename).

Additional references

I dug through the code a little and the issue seems to lie here:

def check_package_finished(self, pyfile):
    """
    checks if package is finished and calls addon_manager.
    """
    ids = self.pyload.db.get_unfinished(pyfile.packageid)
    if not ids or (pyfile.id in ids and len(ids) == 1):
        if not pyfile.package().set_finished:
            self.pyload.log.info(
                self._("Package finished: {}").format(pyfile.package().name)
            )
            self.pyload.addon_manager.package_finished(pyfile.package())
            pyfile.package().set_finished = True

Now, L600 calls get_unfinished:
def get_unfinished(self, pid):
    """
    return list of max length 3 ids with pyfiles in package not finished or
    processed.
    """
    self.c.execute(
        "SELECT id FROM links WHERE package=? AND status NOT IN (0, 4, 13) LIMIT 3",
        (str(pid),),
    )
    return [r[0] for r in self.c]

Unfortunately I didn't find any documentation on what those statuses (0, 4, 13) are, but I'm assuming something like downloading, finished, and skipped. So I guess this would need a fourth filter for offline, but I don't know what that would be, and I couldn't test it even if I knew (#4391).

@mihawk90 mihawk90 added bug Something isn't working pyLoad Next labels Nov 12, 2023
@GammaC0de (Member)

Those are finished, skipped and processing
See:

status_map = {
    "finished": 0,
    "offline": 1,
    "online": 2,
    "queued": 3,
    "skipped": 4,
    "waiting": 5,
    "temp. offline": 6,
    "starting": 7,
    "failed": 8,
    "aborted": 9,
    "decrypting": 10,
    "custom": 11,
    "downloading": 12,
    "processing": 13,
    "unknown": 14,
}
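Given that map, the "fourth filter" discussed above would be status 1 (offline). The following is a minimal runnable sketch of that adjustment, not the actual PR diff: the in-memory table, its reduced layout, and the `treat_offline_as_done` flag are all made up for illustration.

```python
import sqlite3

# Illustrative stand-in for pyload's links table, reduced to the columns
# the query uses; this is not pyload's real schema.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE links (id INTEGER PRIMARY KEY, package INTEGER, status INTEGER)"
)

# Package 1: two finished (0), one skipped (4), one offline (1).
con.executemany(
    "INSERT INTO links (id, package, status) VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 1, 0), (3, 1, 4), (4, 1, 1)],
)

def get_unfinished(con, pid, treat_offline_as_done=False):
    # Statuses 0 (finished), 4 (skipped) and 13 (processing) are always
    # treated as done; the hypothetical flag additionally excludes 1 (offline).
    done = "(0, 4, 13, 1)" if treat_offline_as_done else "(0, 4, 13)"
    cur = con.execute(
        f"SELECT id FROM links WHERE package=? AND status NOT IN {done} LIMIT 3",
        (pid,),
    )
    return [r[0] for r in cur]

print(get_unfinished(con, 1))                              # [4] -> package never "finishes"
print(get_unfinished(con, 1, treat_offline_as_done=True))  # []  -> package can finish
```

With the flag set, `get_unfinished` returns an empty list once only offline links remain, so `check_package_finished` above would dispatch `package_finished`.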

mihawk90 added a commit to mihawk90/pyload that referenced this issue Nov 15, 2023
This change considers (permanently) offline links as finished. Although
they are not technically finished in being downloaded, they also never
will be so there is no reason not to dispatch `package_finished`. This
allows plugins and scripts to further process the files.

Note that this does not adjust the progress display in queue, allowing
users to check why their progress never finishes and decide what to do
with the package or links in question.

Fixes pyload#4392
mihawk90 added a commit to mihawk90/pyload that referenced this issue Nov 18, 2023 (same commit message as above)
@michnixweiss

From my point of view the behaviour of pyload is correct.
If not all links in a package are downloaded, it's not finished.
Example: if the links are parts of one archive, you will get damaged files when extracting.
If the links are all separate files which you can use without the missing ones, I use a workaround:
delete the offline links from the package and restart the whole package. The downloaded ones will be skipped, the package finishes and will be extracted. The result of extracting depends on what is downloaded/missing.

@mihawk90 (Contributor, Author)

Example: if the links are parts of one archive you will get damaged files when extracting.

You won't, because the extraction will simply fail on a missing part.

Also remember that pyload handles mirrored files across different sources. So say you have hosterA.com/file1.zip and hosterB.net/file1.zip and we assume the second link is offline. Ordinarily it would simply be skipped as a duplicate (if skip existing is enabled), however various hosters obfuscate the link in a way that pyload can only get the filename once the download starts. Now if the second link is offline that obviously won't happen and pyload can't even get the filename, thus is unable to skip as a duplicate. Even though the file is functionally present, the package will never finish.

The user will then have to periodically check manually whether the package finished, only to discover that it didn't because a functionally irrelevant link was offline.

If the links are all separate files which you can use without the missing ones, I use a workaround

Which is effectively doing the same thing, just manually. You're treating the remaining offline links as if they hadn't been present in the first place; treating them as finished yields the same result.


On that note, please take a look at my proposed PR #4394. If you have any thoughts on where treating them as not finished has advantages, I'd very much like to know so I can take that into account.

@michnixweiss

@mihawk90 I get your point, and in the case of mirrored links it could be a possible way. AND I really like the idea of using the error output from extraction.
Sometimes I have a finished package where all files passed, but when extracting, one of them has a CRC mismatch. So I have to check manually which one, delete the file and restart the package to get the corrupted file downloaded again and successfully extracted. It would be awesome if pyload could do that by itself.

Back to treating an offline link as finished.
If I have a package with several archives which are independent but whose content is connected, treating an offline one as finished would be the wrong way (from my point of view).
I would rather delete one offline link from a package to get it finished, instead of thinking it is finished and realising that something is missing when I want to use it.
When I notice a package will not complete, I can look for a reup or something. If the package is finished (but incomplete) and the finished ones are cleaned up, it's much more complex to figure out what's missing.

The only safe way would be if pyload could check which file the link is for. But if I understand correctly, that is impossible when the link is offline!?

@mihawk90 (Contributor, Author) commented Dec 17, 2023

I would rather delete one offline link from an package to get it finished, instead of thinking it is finished and realising that something is missing when I want to use it.

Then you either didn't read the PR description properly or I'm not understanding your workflow.

As noted there:

Note that this does not adjust the progress display in queue, allowing users to check why their progress never finishes and decide what to do with the package or links in question.

This covers exactly what you are saying. When you open the queue it will still say e.g. 80/81.

So this makes me wonder how else you check whether a package is finished, if it's not by the progress display.

The PR covers treating the package as finished programmatically to trigger events; visually, to the user, there is (intentionally) no difference.
It's not intended to replace the user having to go in and manually check what's wrong with their links (because mind reading is difficult).

On the other end of the spectrum, it helps users who never check their queue and instead rely on notifications being triggered. As it currently stands, packages with offline links will never trigger notifications because the offline links are... well, not "finished", and so the event that triggers notifications is never emitted. The PR allows those users to receive notifications and then potentially check the progress (they'll have to check the queue for cleanup anyway).
Admittedly, that currently has the downside that notifications wouldn't include whether links were offline or how many. Personally I'd argue that's better than never receiving a notification at all, though.

When I notice a package will not complete, I can look for a reup or something. If the package is finished (but incomplete) and the finished are cleaned up, its much more complex to get whats missing.

I don't really see how, to be honest. The package and all its links remain in the queue since they aren't deleted.
That means you can still look for the offline links the same way you do now, and replace them.

It changes nothing in the workflow for the user; it does, however, allow users to receive the event via notifications, or to have already (actually) finished files extracted.

As noted on the PR though, the best approach is probably giving users an option in the settings, which I haven't done yet (because I was waiting for feedback).

@michnixweiss

@mihawk90 ...my fault, I REALLY read over this part of the PR description...

Note that this does not adjust the progress display in queue, allowing users to check why their progress never finishes and decide what to do with the package or links in question.

With this I think it could be a usable solution. But

the downside that notifications wouldn't include whether links were offline

contradicts

it helps users who never check their queue

This could be confusing. The trigger is "package finished" and the notification will say something similar.
What is passed within the "finished" trigger?
Are there other possible triggers?

@mihawk90 (Contributor, Author) commented Dec 19, 2023

@mihawk90 ...my fault, I REALLY read over this part of the PR description...

the downside that notifications wouldn't include whether links were offline

contradicts

it helps users who never check their queue

This could be confusing. The trigger is "package finished" and the notification will tell something similar. What is passed within the "finished" trigger? Are there other possible triggers?

It might contradict in the way it was worded, yes; what I meant was never checking the queue while the downloads are running, i.e. until they get a finished notification.

Either way, yes, the way it currently is, it would simply trigger the finished notification the same way it is triggered now when everything is done (and personally I would argue offline links are also done, because there's nothing else to do with them).

That being said, this could be adjusted in the future. The package_finished function already passes the package along both to the plugin hooks and to the event dispatcher:

@lock
def package_finished(self, package):
    for plugin in self.plugins:
        if plugin.is_activated():
            plugin.package_finished(package)
    self.dispatch_event("package_finished", package)

So it's possible to print whether there were offline links and how many. Not sure that's in scope for that PR though.
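To make that concrete, here is a hypothetical sketch of what a notification handler receiving `package_finished` could do. `FakeFile`, `FakePackage` and `notify_package_finished` are stand-ins invented for illustration, not pyload's actual classes or API; only the "offline" status code (1, from the status map above) is taken from the source.

```python
# Illustrative stand-ins for pyload objects; not pyload's real API.
class FakeFile:
    def __init__(self, name, status):
        self.name = name
        self.status = status  # codes from the status_map above

class FakePackage:
    def __init__(self, name, files):
        self.name = name
        self.files = files

OFFLINE = 1  # "offline" in the status map

def notify_package_finished(package):
    # Build a notification line that also reports any offline links,
    # so users relying on notifications see what is missing.
    offline = [f.name for f in package.files if f.status == OFFLINE]
    msg = f"Package finished: {package.name}"
    if offline:
        msg += f" ({len(offline)} offline link(s): {', '.join(offline)})"
    return msg

pkg = FakePackage("example", [FakeFile("part1.rar", 0), FakeFile("part2.rar", 1)])
print(notify_package_finished(pkg))
# -> Package finished: example (1 offline link(s): part2.rar)
```

Whether reporting the offline count belongs in the PR itself or in the individual notification plugins is exactly the open scope question raised above.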
