Bug: Tasks generally need to be able to handle snapshots being deleted by the UI mid-run #1124

Closed
pirate opened this issue Mar 18, 2023 · 1 comment
Labels: expected: maybe someday, size: medium, status: backlog, touches: data/schema/architecture, type: bug report, why: performance

Comments

pirate (Member) commented Mar 18, 2023

Many archivebox tasks (e.g. update, add) fail mid-run if Snapshots are deleted from the UI while archivebox is iterating over them.

We should do a pass over the codebase, find all the for ... in Snapshot.objects... loops, and add try:/except: within them to handle the case where a snapshot disappears because it was deleted by another process.

root@kiwi /o/archivebox.un# docker-compose run archivebox update --index-only
Creating archiveboxun_archivebox_run ... done
find: '/.config/chromium/Crash Reports/pending/': No such file or directory
[i] [2023-03-18 06:07:27] ArchiveBox v0.6.3: archivebox update --index-only
    > /data

find: '/.config/chromium/Crash Reports/pending/': No such file or directory
Traceback (most recent call last):
  File "/usr/local/bin/archivebox", line 33, in <module>
    sys.exit(load_entry_point('archivebox', 'console_scripts', 'archivebox')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/cli/archivebox_update.py", line 119, in main
    update(
  File "/app/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/main.py", line 797, in update
    write_link_details(link, out_dir=out_dir, skip_sql_index=True)
  File "/app/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/__init__.py", line 335, in write_link_details
    write_json_link_details(link, out_dir=out_dir)
  File "/app/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/json.py", line 99, in write_json_link_details
    atomic_write(str(path), link._asdict(extended=True))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/schema.py", line 193, in _asdict
    'snapshot_id': self.snapshot_id,
                   ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/functional.py", line 48, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
                                         ^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/schema.py", line 265, in snapshot_id
    return str(Snapshot.objects.only('id').get(url=self.url).id)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 429, in get
    raise self.model.DoesNotExist(
core.models.Snapshot.DoesNotExist: Snapshot matching query does not exist.
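
For illustration, here is a rough sketch of the kind of try:/except: handling proposed above. It is not taken from the ArchiveBox codebase: `SnapshotDoesNotExist`, `process_all`, `write_details`, and the in-memory `db` dict are hypothetical stand-ins so the example runs without Django. In the real code the exception to catch would presumably be the `Snapshot.DoesNotExist` shown in the traceback.

```python
# Hypothetical stand-in for core.models.Snapshot.DoesNotExist (assumption:
# the real loops would catch the Django model exception instead).
class SnapshotDoesNotExist(Exception):
    pass


def process_all(urls, process_one):
    """Call process_one(url) for each url, skipping snapshots deleted mid-run.

    Returns a (processed, skipped) pair of url lists.
    """
    processed, skipped = [], []
    for url in urls:
        try:
            process_one(url)
        except SnapshotDoesNotExist:
            # The row was removed by another process (e.g. the admin UI)
            # after the loop started: record it and keep going.
            skipped.append(url)
            continue
        processed.append(url)
    return processed, skipped


# Simulated database: url -> snapshot id (illustrative data only).
db = {
    "https://example.com/a": 1,
    "https://example.com/b": 2,
    "https://example.com/c": 3,
}


def write_details(url):
    # Stand-in for a per-snapshot step that looks the row up again.
    if url not in db:
        raise SnapshotDoesNotExist(url)


urls = list(db)                   # the task captures its work list up front
del db["https://example.com/b"]   # then the UI deletes one snapshot mid-run
processed, skipped = process_all(urls, write_details)
print("processed:", processed)
print("skipped:", skipped)
```

Catching only the specific "does not exist" exception, rather than a bare except:, keeps unrelated failures visible while letting the task finish the remaining snapshots.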
pirate added the type: bug report, size: medium, touches: data/schema/architecture, why: performance, status: backlog, and expected: maybe someday labels on Jun 13, 2023
pirate changed the title from "Bug: Taks generally need to be able to handle snapshots being deleted by the UI mid-run" to "Bug: Tasks generally need to be able to handle snapshots being deleted by the UI mid-run" on Jan 19, 2024
pirate (Member, Author) commented Jan 19, 2024

Closing as duplicate of #1309

pirate closed this as not planned on Jan 19, 2024