Best way to adjust or lighten metadata in the 'media_sync' table #470

Open
woozu-shin opened this issue Feb 21, 2024 · 1 comment

@woozu-shin

Version

  • 0.13.3

Abstract

  • The more videos there are, the larger the media_sync table becomes.
  • The reason for the bloat is that each metadata json is hundreds of KB.
  • A way to solve this is needed.

Detail

  • I am syncing a source with about 1500 videos.
  • Each sync_media record has a data size of approximately 500KB.
  • The cause is metadata bloat, so a way to reduce the metadata is needed.
  • The sqlite file reached 700MB, and CPU and memory usage were observed to be high because of the overhead of reading and dumping the huge metadata json.
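
To see where the size comes from, the top-level keys of one metadata document can be ranked by their encoded size. The following is a minimal sketch, assuming the metadata is available as a JSON string; `size_by_key` and the `sample` document are hypothetical and only stand in for a real record.

```python
import json


def size_by_key(metadata_json: str) -> list[tuple[str, int]]:
    """Return (key, encoded size in bytes) per top-level key, largest first."""
    metadata = json.loads(metadata_json)
    sizes = {
        key: len(json.dumps(value).encode("utf-8"))
        for key, value in metadata.items()
    }
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, tiny stand-in for a real metadata document.
sample = json.dumps({
    "id": "abc123",
    "title": "Example video",
    "formats": [
        {"format_id": str(i), "url": "https://example.invalid/" + "x" * 200}
        for i in range(50)
    ],
    "thumbnails": [{"url": "https://example.invalid/t.jpg"}],
})

for key, size in size_by_key(sample):
    print(f"{key:12} {size:>8} bytes")
```

Running this against a few real records would show which keys are worth targeting before any of them are removed.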

AS-IS

tubesync_1_huge_db

TO-BE

  • Many parts are still unverified, non-working mock-ups, so this is not yet ready to be submitted as a PR.
  • I am attaching a sample of my work locally.
    tubesync_2_redueced

Sample Source

  • models.py
class Source(models.Model):
    ...
    LIGHTWEIGHT_METADATA_TYPE_RAW = 'RAW'
    LIGHTWEIGHT_METADATA_TYPE_FEATHER = 'FEATHER'
    LIGHTWEIGHT_METADATA_TYPES = (LIGHTWEIGHT_METADATA_TYPE_RAW, LIGHTWEIGHT_METADATA_TYPE_FEATHER)
    LIGHTWEIGHT_METADATA_TYPE_CHOICES = (
        (LIGHTWEIGHT_METADATA_TYPE_RAW, _("(LARGE) Save raw metadata")),
        (LIGHTWEIGHT_METADATA_TYPE_FEATHER, _("(TINY) If the capacity is large, tree-shake it even if it is in use")),
    )

    lightweight_metadata = models.CharField(
        _('lightweight metadata'),
        max_length=20,
        default=LIGHTWEIGHT_METADATA_TYPE_RAW,
        choices=LIGHTWEIGHT_METADATA_TYPE_CHOICES,
        help_text=_('Lightweight metadata')
    )
  • tasks.py
        if source.lightweight_metadata == Source.LIGHTWEIGHT_METADATA_TYPE_FEATHER:
            # Drop the bulkiest keys. pop() with a default avoids a KeyError
            # when a key (for example "heatmap") is absent for a video.
            for key in ("formats", "thumbnails", "automatic_captions",
                        "requested_formats", "heatmap"):
                media.metadata.pop(key, None)

Sample View

  • Add/Edit source
    image

  • Media item view (one of the media details)
    image

@meeb
Owner

meeb commented Feb 21, 2024

If you delete formats and thumbnails from the metadata then thumbnails can't be downloaded and downloading media won't work as the media format can't be evaluated. This occurs on model save at the moment to determine if an item can be downloaded when there's a match for the requested format. While you may want to ignore the thumbnails, currently the formats (which get refreshed as and when the metadata is updated) are required.
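
A more conservative trim that respects the constraint above could look like the sketch below. It keeps `formats` and `thumbnails` and drops only keys listed in `OPTIONAL_KEYS`. Both `trim_metadata` and the contents of `OPTIONAL_KEYS` are assumptions made for illustration; whether those keys are really unused would need to be checked against the application.

```python
# Assumption: these keys are not needed for format matching or thumbnail
# downloads. Verify against the application before relying on this list.
OPTIONAL_KEYS = ("automatic_captions", "heatmap")


def trim_metadata(metadata: dict, drop_keys=OPTIONAL_KEYS) -> dict:
    """Return a shallow copy of metadata without drop_keys.

    Keys that are absent are simply ignored, and the input is not modified.
    """
    return {key: value for key, value in metadata.items() if key not in drop_keys}


# Hypothetical metadata with one required and one optional key.
metadata = {
    "id": "abc123",
    "formats": [{"format_id": "18"}],
    "thumbnails": [],
    "heatmap": [0.1, 0.2],
}
trimmed = trim_metadata(metadata)
```

Returning a copy rather than deleting in place leaves the full metadata available to any code that still expects it during the same save.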

Over the years I've had a good look at the large metadata myself. The most sensible option is probably to move it out of the database and store it as msgpack'd blobs on disk, in the config dir or similar. There isn't much you can truncate from the metadata without losing functionality. You can save 5-10%, but that never really seemed worth it.
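
The idea of keeping the metadata on disk rather than in the database can be sketched roughly as follows. This is not how the project does it; the directory layout, the function names and the use of the media key as a file name are all assumptions. The sketch uses `msgpack.packb` and `msgpack.unpackb` from the third-party `msgpack` package when it is installed, and falls back to plain JSON bytes otherwise so that it still runs.

```python
import json
import tempfile
from pathlib import Path

try:
    import msgpack  # third-party: pip install msgpack

    def _pack(obj) -> bytes:
        return msgpack.packb(obj, use_bin_type=True)

    def _unpack(data: bytes):
        return msgpack.unpackb(data, raw=False)

    SUFFIX = ".msgpack"
except ImportError:
    # Fallback (not msgpack): plain JSON bytes, so the sketch runs anywhere.
    def _pack(obj) -> bytes:
        return json.dumps(obj).encode("utf-8")

    def _unpack(data: bytes):
        return json.loads(data.decode("utf-8"))

    SUFFIX = ".json"


def metadata_path(base_dir: Path, media_key: str) -> Path:
    """Hypothetical layout: one blob per media item, named after its key.

    Assumes media_key is safe to use as a file name.
    """
    return base_dir / f"{media_key}{SUFFIX}"


def save_metadata(base_dir: Path, media_key: str, metadata: dict) -> Path:
    """Serialize metadata and write it to its blob file."""
    base_dir.mkdir(parents=True, exist_ok=True)
    path = metadata_path(base_dir, media_key)
    path.write_bytes(_pack(metadata))
    return path


def load_metadata(base_dir: Path, media_key: str) -> dict:
    """Read a blob file back into a dict."""
    return _unpack(metadata_path(base_dir, media_key).read_bytes())


# Round trip in a temporary directory standing in for the config dir.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "metadata"
    original = {"id": "abc123", "formats": [{"format_id": "18", "height": 360}]}
    save_metadata(base, "abc123", original)
    restored = load_metadata(base, "abc123")
```

With this kind of layout the database row would only need to hold the key, and the full metadata would be read from disk when a format actually has to be evaluated.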
