
Update Table Understanding conversion to accept updated schema #190

Open
frreiss opened this issue Apr 26, 2021 · 0 comments


frreiss commented Apr 26, 2021

Recent versions of Watson Discovery have made undocumented changes to the output format of the Table Understanding enrichment. The old column names are documented at https://cloud.ibm.com/docs/discovery-data?topic=discovery-data-understanding_tables#table-output-schema

Rough translation of field names into the new naming convention:

new_name_to_old = {
    "row_min": "row_index_begin",
    "row_max": "row_index_end",
    "column_min": "column_index_begin",
    "column_max": "column_index_end",
    "cell_text": "text",
    "id": "cell_id"
}

Also, the "location" field at the top of the table record now appears to be optional.

Our conversion to Pandas needs to be updated to cover both the old schema and the new schema.

I recommend that we first determine which schema is the canonical one and convert non-canonical schemas to the canonical one as a preprocessing step.
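
A minimal sketch of what that preprocessing step could look like for a single cell record. It assumes, purely for illustration, that the old (documented) names are treated as canonical; that decision has not been made yet. The helper names normalize_cell and get_location are hypothetical and are not part of the existing conversion code.

```python
from typing import Any, Dict, Optional

# Mapping taken from the issue text: new field name -> old (documented) name.
NEW_NAME_TO_OLD = {
    "row_min": "row_index_begin",
    "row_max": "row_index_end",
    "column_min": "column_index_begin",
    "column_max": "column_index_end",
    "cell_text": "text",
    "id": "cell_id",
}


def normalize_cell(cell: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one cell record with new-style keys renamed.

    Hypothetical helper. Assumes the old names are canonical. Keys that
    are not in the mapping (including old-style keys) pass through
    unchanged, so old-schema records are left as they are.
    """
    return {NEW_NAME_TO_OLD.get(key, key): value for key, value in cell.items()}


def get_location(table: Dict[str, Any]) -> Optional[Any]:
    """Read the "location" field of a table record, which may be absent.

    Hypothetical helper. Returns None when the field is missing.
    """
    return table.get("location")
```

Renaming is applied to one cell record at a time rather than recursively over the whole response, because short names such as "id" may mean something different at other levels of the JSON.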
