Write tool to scrape Airtable metadata for tag categories, etc and open-source separately #40

Open
avanavana opened this issue Jan 15, 2022 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@avanavana
Owner

avanavana commented Jan 15, 2022

Related to #39.

To make the categories sync as well, either gain access to the Airtable Metadata API (not going to happen; they've stopped onboarding new teams), or write code that scrapes the ESOVDB Airtable API docs and syncs the tag categories with Zotero. The sync would create collections under the Tags parent collection and return their Zotero keys and versions to the ESOVDB API, stored either in a new table in Airtable (not preferable) or in a JSON data file. This should all happen on command, so set it up as a command-line tool.
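If the JSON data file route is taken, the stored keys and versions could be a simple map keyed by category name. The sketch below is only an illustration: `mergeCollectionKeys`, the `{ name, key, version }` record shape, and the sample keys are all hypothetical and not taken from the ESOVDB code.

```javascript
// Hypothetical helper: merge newly returned Zotero collection records into a
// plain object that can be serialized to a JSON data file.
// The { name, key, version } shape is an assumption for illustration.
function mergeCollectionKeys(store, collections) {
  const next = { ...store };
  for (const { name, key, version } of collections) {
    const previous = next[name];
    // Keep the record with the highest version so stale results never win.
    if (!previous || version >= previous.version) {
      next[name] = { key, version };
    }
  }
  return next;
}

// Example with made-up category names, keys, and versions.
const syncedStore = mergeCollectionKeys(
  { Geology: { key: 'AAAA1111', version: 3 } },
  [
    { name: 'Geology', key: 'AAAA1111', version: 5 },
    { name: 'Oceanography', key: 'BBBB2222', version: 1 },
  ]
);
console.log(JSON.stringify(syncedStore, null, 2));
```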

Since Airtable is no longer onboarding new users to the Metadata API, the only option is to scrape the API documentation page, which contains the same data, annoyingly scattered throughout rendered HTML. The idea is to create a command-line tool that takes an Airtable Base ID, email, and password, and writes out a JSON file containing all the metadata for that Airtable base.
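A minimal sketch of the command-line interface, using Node's built-in `util.parseArgs`. The option names (`--base`, `--email`, `--password`, `--out`) and the default output filename are assumptions; the issue only specifies the three inputs and the JSON output.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical option names; the issue only says the tool takes a base ID,
// an email, and a password, and writes a JSON file.
function parseCliOptions(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      base: { type: 'string', short: 'b' },
      email: { type: 'string', short: 'e' },
      password: { type: 'string', short: 'p' },
      out: { type: 'string', short: 'o' },
    },
  });
  for (const name of ['base', 'email', 'password']) {
    if (!values[name]) throw new Error(`Missing required option --${name}`);
  }
  return {
    base: values.base,
    email: values.email,
    password: values.password,
    // Assumed default output path when --out is not given.
    out: values.out ?? 'metadata.json',
  };
}

// In a real script this would be parseCliOptions(process.argv.slice(2)).
console.log(
  parseCliOptions(['--base', 'appXXXXXXXXXXXXXX', '-e', 'me@example.com', '-p', 'secret'])
);
```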

Schema:

{
    "name": "baseName",
    "id": "baseId",
    "apiBaseURL": "https://api.airtable.com/v0/baseId",
    "tables": [
        {
            "name": "tableName",
            "fields": [
                {
                    "name": "fieldName",
                    "fieldType": "airtableFieldType",
                    "type": "dataType",
                    "description": "airtableDescription",
                    "examples": [
                        {
                            "type": "text",
                            "value": "exampleText"
                        },
                        {
                            "type": "array",
                            "value": [
                                "arrayItem",
                                ...
                            ]
                        }
                    ]
                },
                ...
            ]
        },
        ...
    ]
}
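Before the scraped output is written to disk, it may help to check that it matches the schema above. The following is a minimal structural check, not a full validator; the function name and error messages are illustrative, and treating `description` as optional is an assumption.

```javascript
// Minimal structural check for the schema above. Name and error-message
// format are illustrative assumptions.
function validateBaseMetadata(meta) {
  if (meta === null || typeof meta !== 'object') return ['metadata must be an object'];
  const errors = [];
  const isString = (v) => typeof v === 'string' && v.length > 0;

  for (const key of ['name', 'id', 'apiBaseURL']) {
    if (!isString(meta[key])) errors.push(`${key} must be a non-empty string`);
  }
  if (!Array.isArray(meta.tables)) {
    errors.push('tables must be an array');
    return errors;
  }
  meta.tables.forEach((table, t) => {
    if (!isString(table.name)) errors.push(`tables[${t}].name must be a non-empty string`);
    if (!Array.isArray(table.fields)) {
      errors.push(`tables[${t}].fields must be an array`);
      return;
    }
    table.fields.forEach((field, f) => {
      const path = `tables[${t}].fields[${f}]`;
      // description is treated as optional here (an assumption).
      for (const key of ['name', 'fieldType', 'type']) {
        if (!isString(field[key])) errors.push(`${path}.${key} must be a non-empty string`);
      }
      if (!Array.isArray(field.examples)) errors.push(`${path}.examples must be an array`);
    });
  });
  return errors;
}

// An empty array means the object has the expected shape.
console.log(validateBaseMetadata({ name: 'baseName', id: 'baseId', tables: [] }));
```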

This can be open-sourced and distributed separately from the ESOVDB to others who might find it useful, given the moratorium on the Airtable Metadata API.

The data will be scraped using puppeteer.js and cheerio.js, with puppeteer in stealth mode.
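A rough sketch of how the two libraries could fit together: Puppeteer renders the page and cheerio parses the resulting HTML. It assumes stealth mode means the `puppeteer-extra` package with `puppeteer-extra-plugin-stealth`, and that a base's API docs live at `https://airtable.com/<baseId>/api/docs`. The function names are hypothetical, and the login step is left out because the issue gives no details about the login form.

```javascript
// Assumption: each base's API docs are served at this URL pattern.
function docsUrl(baseId) {
  return `https://airtable.com/${encodeURIComponent(baseId)}/api/docs`;
}

// Hypothetical function: render the docs page and return a cheerio handle.
async function loadDocs(baseId) {
  // Required lazily so docsUrl() works even without these packages installed.
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  const cheerio = require('cheerio');

  puppeteer.use(StealthPlugin());
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Signing in with the email and password is not shown here; the login
    // form's selectors are not given in the issue.
    await page.goto(docsUrl(baseId), { waitUntil: 'networkidle2' });
    const html = await page.content();
    // cheerio parses the rendered HTML for jQuery-style querying.
    return cheerio.load(html);
  } finally {
    await browser.close();
  }
}

console.log(docsUrl('appXXXXXXXXXXXXXX'));
```

The returned handle would then be queried for table and field details to fill in the schema; which selectors to use depends on the docs page markup and is not covered here.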

The JSON file that this script writes out can be parsed and used, for instance, to dynamically list tag categories in #40. The script can run on a schedule with crontab, alongside the various ESOVDB sync functions, or manually from the command line.
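As one possible way to consume the file, the sketch below collects the distinct example values recorded for a single field. It assumes tag categories show up as example values of some field; the table name `Tags`, the field name `Category`, and the sample values are hypothetical.

```javascript
// Hypothetical helper: collect the distinct example values recorded for one
// field of one table in the metadata JSON.
function listExampleValues(meta, tableName, fieldName) {
  const table = meta.tables.find((t) => t.name === tableName);
  const field = table?.fields.find((f) => f.name === fieldName);
  if (!field) return [];
  // An example holds either a single value or an array of values;
  // flatMap flattens the array case by one level.
  const values = field.examples.flatMap((example) => example.value);
  return [...new Set(values)];
}

// Made-up metadata following the schema in this issue.
const sampleMeta = {
  name: 'baseName',
  id: 'baseId',
  apiBaseURL: 'https://api.airtable.com/v0/baseId',
  tables: [
    {
      name: 'Tags',
      fields: [
        {
          name: 'Category',
          fieldType: 'airtableFieldType',
          type: 'dataType',
          description: '',
          examples: [
            { type: 'text', value: 'Geology' },
            { type: 'array', value: ['Geology', 'Oceanography'] },
          ],
        },
      ],
    },
  ],
};

console.log(listExampleValues(sampleMeta, 'Tags', 'Category'));
// prints [ 'Geology', 'Oceanography' ]
```

With a real file, `sampleMeta` would come from `JSON.parse(fs.readFileSync(path, 'utf8'))` instead.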

@avanavana avanavana self-assigned this Jan 15, 2022
@avanavana avanavana added the enhancement New feature or request label Jan 15, 2022
Projects
Status: In Progress