Extract booru notes and put them in text file #5556

Open
ocrhell opened this issue May 5, 2024 · 2 comments

ocrhell commented May 5, 2024

Is there a way to extract notes (translations) and write them to a text file named after the downloaded image? Specifically for gelbooru.
When running an instance with this in the config file:

        "booru":
        {
            "tags": false,
            "notes": true
        }

Only the image is downloaded; the notes aren't extracted at all, and they don't show up in cmd either. Should I be adding anything to gelbooru's block?

Thanks.

Hrxn (Contributor) commented May 7, 2024

Have you tried it with a "metadata" post-processor?

https://gdl-org.github.io/docs/configuration.html#postprocessor-configuration
https://gdl-org.github.io/docs/configuration.html#postprocessor-options

For example:

{
    "extractor":
    {
        "booru":
        {
            "..": "..",

            "postprocessors":
            [
                {
                    "name": "metadata",
                    "event": "post",
                    "mode": "custom",
                    "skip": true,
                    "content-format": "{content|description}\n",
                    "filename": "{id}.txt"
                }
            ]
        }
    }
}

Of course, you need to check the output of -K to see whether it's actually {content} you want, or {description}, or whatever the field holding the translations is called, assuming the site provides such translations at all.
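
As I read the format string above, {content|description} means "use content, and fall back to description if content is missing or empty". The plain-Python sketch below only illustrates that fallback idea; it is not the tool's own formatter, and first_available and the sample dictionaries are hypothetical names made up for the illustration.

```python
def first_available(metadata, names):
    """Return the first non-empty value among the given field names.

    Hypothetical helper: it mimics the "a|b" fallback idea only,
    not the real formatter implementation.
    """
    for name in names:
        value = metadata.get(name)
        if value:  # skips missing keys, None, and empty strings
            return value
    return None


# Made-up metadata dictionaries standing in for what -K would list.
post_with_content = {"id": 1, "content": "translated text", "description": "fallback"}
post_without_content = {"id": 2, "description": "fallback"}

print(first_available(post_with_content, ("content", "description")))
print(first_available(post_without_content, ("content", "description")))
```

If neither field shows up in the -K output for a post, a fallback like this has nothing to pick from, so the field name has to be corrected first.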

ocrhell closed this as completed May 7, 2024

ocrhell (Author) commented May 7, 2024

Closed it prematurely, sorry.
Looking at the notes block in gelbooru.py and gelbooru_v02.py, is it possible to filter out width, height, x, and y?

notes.append({
    "width" : int(extr(note, 'data-width="', '"')[0]),
    "height": int(extr(note, 'data-height="', '"')[0]),
    "x"     : int(extr(note, 'data-x="', '"')[0]),
    "y"     : int(extr(note, 'data-y="', '"')[0]),
    "body"  : extr(note, 'data-body="', '"')[0],
})

I've tried # but that doesn't work.
I also tried multiple variations of notes.x / notes.width etc. in an additional postprocessors entry with delete, placed both before and after. That didn't work either.
-K lists the fields as notes[N]['width'] / notes[N]['height'] etc., and I've tried those too.
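
For reference, the filtering being asked for can be sketched in plain Python, outside the downloader. This assumes the metadata has the shape implied by the notes.append(...) excerpt above: a notes list of dictionaries with width, height, x, y, and body. The function names and the sample post are hypothetical; this is not a built-in option.

```python
from pathlib import Path


def note_bodies(metadata):
    """Return only the text of each note, dropping width/height/x/y."""
    return [note["body"] for note in metadata.get("notes", []) if note.get("body")]


def write_notes(metadata, directory="."):
    """Write the note texts to '<id>.txt', one per line.

    Returns the written path, or None when the post has no notes.
    """
    bodies = note_bodies(metadata)
    if not bodies:
        return None
    path = Path(directory) / f"{metadata['id']}.txt"
    path.write_text("\n".join(bodies) + "\n", encoding="utf-8")
    return path


# Made-up post in the shape suggested by the excerpt above.
sample_post = {
    "id": 12345,
    "notes": [
        {"width": 80, "height": 40, "x": 10, "y": 20, "body": "Hello"},
        {"width": 60, "height": 30, "x": 200, "y": 50, "body": "Good morning"},
    ],
}

print(note_bodies(sample_post))  # ['Hello', 'Good morning']
```

Calling write_notes(sample_post, "some/folder") would then produce 12345.txt containing just the two note texts.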
