Failed to load other language data. #25

Open
leon2milan opened this issue Apr 2, 2021 · 2 comments

Comments

@leon2milan

I am working with TableQA, a Chinese NL2SQL dataset: https://github.com/ZhuiyiTechnology/TableQA
It has the same format as WikiSQL, with one small difference: it wraps values in a list, e.g. "sql": [2].
When I load this dataset, I get an error.
[image]
I changed the tokenizer to bert_base_chinese, but it still does not work.
So, what can I do to fine-tune your model on a Chinese NL2SQL dataset?
Thank you very much!

@todpole3
Collaborator

todpole3 commented Apr 2, 2021

Try replacing the BERT model we used with a multilingual LM such as mBERT or XLM-R. Both can be loaded the same way through the Hugging Face transformers library.
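As a rough illustration of that suggestion, the sketch below loads either multilingual checkpoint through the generic `Auto*` classes in transformers. The helper name `load_encoder` and the dictionary are hypothetical, and the checkpoint names are the public Hugging Face Hub identifiers; how the encoder is wired into this repository's own configuration is not shown here and would need to be checked against the code.

```python
# Hugging Face Hub checkpoint names for the two multilingual models mentioned above.
MULTILINGUAL_CHECKPOINTS = {
    "mBERT": "bert-base-multilingual-cased",
    "XLM-R": "xlm-roberta-base",
}


def load_encoder(name):
    """Load the tokenizer and encoder for one of the checkpoints above.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    # Imported lazily so the mapping can be inspected without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    checkpoint = MULTILINGUAL_CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    return tokenizer, model
```

Calling `load_encoder("XLM-R")` downloads the weights on first use. Note that XLM-R uses a SentencePiece vocabulary rather than WordPiece, so any code that assumes BERT-style `##` subword markers may need adjusting.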

@leon2milan
Author

leon2milan commented Apr 2, 2021

@todpole3 Thanks.
In my dataset, some tables have duplicated header names. I have already fixed this.
There are two further differences.
First, the sel and agg items are lists.

{
    "table_id": "a1b2c3d4",  # ID of the related table
    "question": "世茂茂悦府的套均面积是多少?",  # question ("What is the average area per unit at 世茂茂悦府?")
    "sql": {  # SQL
        "sel": [7, 8],  # selected columns
        "agg": [0, 1],  # aggregate functions
        "cond_conn_op": 0,  # connector between conditions
        "conds": [
            [1, 2, "世茂茂悦府"]  # condition column, condition operator, condition value: col_1 == "世茂茂悦府"
        ]
    }
}

Second, the agg and op codes are represented differently:

op_sql_dict = {0:">", 1:"<", 2:"==", 3:"!="}
agg_sql_dict = {0:"", 1:"AVG", 2:"MAX", 3:"MIN", 4:"COUNT", 5:"SUM"}
conn_sql_dict = {0:"", 1:"and", 2:"or"}
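One possible way to reconcile the two encodings is to remap the codes by symbol before loading, as sketched below. The TableQA tables are copied from this comment; the WikiSQL orderings (`WIKISQL_OPS`, `WIKISQL_AGGS`) are an assumption based on the original WikiSQL release and should be verified against the loader in this repository. The function names are hypothetical. The sketch leaves sel and agg as lists and does not handle cond_conn_op, which WikiSQL has no field for.

```python
# TableQA code tables, copied from the comment above.
TABLEQA_OPS = {0: ">", 1: "<", 2: "==", 3: "!="}
TABLEQA_AGGS = {0: "", 1: "AVG", 2: "MAX", 3: "MIN", 4: "COUNT", 5: "SUM"}

# Assumption: WikiSQL's orderings; verify against the loader you are using.
WIKISQL_OPS = ["=", ">", "<", "OP"]
WIKISQL_AGGS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]


def remap_agg(code):
    """Translate a TableQA aggregate code into the assumed WikiSQL index."""
    return WIKISQL_AGGS.index(TABLEQA_AGGS[code])


def remap_op(code):
    """Translate a TableQA condition-operator code into the assumed WikiSQL index."""
    symbol = TABLEQA_OPS[code]
    if symbol == "==":
        symbol = "="  # WikiSQL spells equality with a single '='
    if symbol not in WIKISQL_OPS:
        # "!=" has no counterpart in the assumed WikiSQL operator list.
        raise ValueError(f"operator {symbol!r} has no WikiSQL equivalent")
    return WIKISQL_OPS.index(symbol)


def remap_sql(sql):
    """Return a copy of a TableQA "sql" dict with agg and op codes remapped."""
    return {
        "sel": list(sql["sel"]),
        "agg": [remap_agg(a) for a in sql["agg"]],
        "cond_conn_op": sql["cond_conn_op"],
        "conds": [[col, remap_op(op), val] for col, op, val in sql["conds"]],
    }
```

Remapping by symbol rather than by hard-coded index keeps the conversion correct even if one of the two tables turns out to be ordered differently; only the lists at the top would need to change.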

After I run the code, example.matched_values is an empty OrderedDict().
How should I deal with this?
