Failed to load other language data. #25

leon2milan · 2021-04-02T04:01:41Z

This is a Chinese NL2SQL dataset: https://github.com/ZhuiyiTechnology/TableQA. It has the same format as WikiSQL,
with a few small differences.
For example, "sql": [2]: it wraps the value in a list.
When I load this dataset, I get an error.

I changed the tokenizer to bert_base_chinese, but it still does not work.
So, what can I do to fine-tune your model on a Chinese NL2SQL dataset?
Thank you very much!

todpole3 · 2021-04-02T05:38:21Z

Try replacing the BERT model we used with a multilingual LM such as mBERT or XLM-R. Both can be loaded the same way through the Hugging Face transformers library.

leon2milan · 2021-04-02T08:10:04Z

@todpole3 Thanks.
In my dataset, some tables have duplicated header names. I have already fixed this.
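For readers who hit the same duplicate-header problem, one possible fix is sketched below. It is only an illustration: the helper name `dedupe_headers` and the `_2`, `_3` suffix scheme are assumptions, not something taken from the dataset or from the model's code.

```python
def dedupe_headers(headers):
    """Return a copy of `headers` in which every name is unique.

    Hypothetical helper: repeated names get a numeric suffix (_2, _3, ...),
    and the suffix is bumped until the resulting name is unused.
    """
    used = set()
    result = []
    for name in headers:
        candidate, n = name, 1
        while candidate in used:
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate)
        result.append(candidate)
    return result


# Made-up header list, for illustration only.
print(dedupe_headers(["面积", "单价", "面积"]))
```

The first occurrence keeps its original name, so column indices in "sel" and "conds" still point at the same columns.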
There are also two differences from WikiSQL.
First, the sel and agg items are lists.

{
     "table_id": "a1b2c3d4", # related table id
     "question": "世茂茂悦府的套均面积是多少？", # QUESTION
     "sql":{ # SQL
        "sel": [7, 8], # SQL selected columns
        "agg": [0, 1], #  aggregate function
        "cond_conn_op": 0, # the relation of condition
        "conds": [
            [1,2,"世茂茂悦府"] # condition column, condition type, condition value; here col_1 == "世茂茂悦府"
        ]
    }
}

Second, the encodings of agg and op are different.

op_sql_dict = {0:">", 1:"<", 2:"==", 3:"!="}
agg_sql_dict = {0:"", 1:"AVG", 2:"MAX", 3:"MIN", 4:"COUNT", 5:"SUM"}
conn_sql_dict = {0:"", 1:"and", 2:"or"}
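To make the two differences concrete, here is a minimal sketch that renders a record in this format as a readable SQL-like string using the three dictionaries above. The function name `render_sql`, the placeholder table name `t`, and the `col_0 ... col_8` headers are assumptions made for illustration; none of them come from the dataset or from the parser's code.

```python
# Mappings copied from the comment above.
op_sql_dict = {0: ">", 1: "<", 2: "==", 3: "!="}
agg_sql_dict = {0: "", 1: "AVG", 2: "MAX", 3: "MIN", 4: "COUNT", 5: "SUM"}
conn_sql_dict = {0: "", 1: "and", 2: "or"}


def render_sql(sql, headers, table_name="t"):
    """Render a `sql` dict (list-valued sel/agg) as a SQL-like string.

    Hypothetical helper; `table_name` is a placeholder.
    """
    select_parts = []
    # sel and agg are parallel lists: one aggregate code per selected column.
    for col, agg in zip(sql["sel"], sql["agg"]):
        func = agg_sql_dict[agg]
        name = headers[col]
        select_parts.append(f"{func}({name})" if func else name)
    query = f"SELECT {', '.join(select_parts)} FROM {table_name}"

    conds = [f'{headers[c]} {op_sql_dict[o]} "{v}"' for c, o, v in sql["conds"]]
    if conds:
        # cond_conn_op 0 means no connective; it is only needed for 2+ conditions.
        conn = conn_sql_dict[sql["cond_conn_op"]] or "and"
        query += " WHERE " + f" {conn} ".join(conds)
    return query


# The example record from above, with made-up header names.
headers = [f"col_{i}" for i in range(9)]
sql = {"sel": [7, 8], "agg": [0, 1], "cond_conn_op": 0,
       "conds": [[1, 2, "世茂茂悦府"]]}
print(render_sql(sql, headers))
# prints: SELECT col_7, AVG(col_8) FROM t WHERE col_1 == "世茂茂悦府"
```

A preprocessing step that maps these codes onto the WikiSQL ones would follow the same pattern, iterating over the sel/agg pairs instead of reading single values.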

After I run the code, example.matched_values is an empty OrderedDict().
How should I deal with this?
