is chinese id supported ? #172

shatealaboxiaowang · 2023-06-09T10:25:51Z

Hi, I want to extract key field information from Chinese text. Does the `id` of kor.Text support Chinese?

eyurtsev · 2023-06-09T20:32:11Z

Not at the moment. It's likely not that difficult to add; it would probably require creating another identifier that is used instead of the ID when writing the prompt to the LLM.

shatealaboxiaowang · 2023-06-12T02:03:10Z

Thank you very much for your reply, but it seems that I did not understand what you wrote. The language of ‘another identifier’ should still be English, right? Do I need to do a Chinese to English mapping?

shatealaboxiaowang · 2023-06-12T02:23:52Z

Like the code below? Just give each 'id' a name, and clearly describe the field to be extracted in the 'description'.

"messages": {
    "id": "test",
    "description": "test",
    "field_01": {
        "id": "id_01",
        "description": "歌手名字？",
        "example": []
    },
    "field_02": {
        "id": "id_02",
        "description": "专辑有哪些？",
        "example": []
    },
    "field_03": {
        "id": "id_03",
        "description": "蔡依林的歌曲叫什么？",
        "example": []
    }
},

eyurtsev · 2023-06-12T15:57:38Z

Kor cannot support an ID field in Chinese right now. This could be a feature that will be added at some point.

In the meantime, you could rely on examples to improve the quality of extraction. It's unclear to what extent providing the ID in Chinese would affect the quality of the result, since the language models already understand multiple languages.
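One possible interim workaround, sketched here under assumptions, is to keep ASCII ids in the schema and rename the keys of the extracted result afterwards. The mapping `ID_TO_LABEL`, the helper `relabel`, and the shape of `extracted` are all hypothetical and only illustrate the idea; they are not part of Kor.

```python
# Hypothetical mapping from ASCII ids (accepted by the validator) to Chinese labels.
ID_TO_LABEL = {"id_01": "歌手名字", "id_02": "专辑", "id_03": "歌曲"}


def relabel(record, id_to_label):
    """Return a copy of `record` with known ASCII keys replaced by their labels.

    Recurses into nested dicts and lists; other values are returned unchanged.
    """
    if isinstance(record, dict):
        return {
            id_to_label.get(key, key): relabel(value, id_to_label)
            for key, value in record.items()
        }
    if isinstance(record, list):
        return [relabel(item, id_to_label) for item in record]
    return record


# Hypothetical extraction result keyed by ASCII ids (illustrative data only).
extracted = {"test": [{"id_01": "蔡依林", "id_02": ["Muse"], "id_03": ["日不落"]}]}
relabeled = relabel(extracted, ID_TO_LABEL)
```

Keys that are not in the mapping (such as `"test"` here) are left as they are, so the helper can be applied to a whole result without listing every id.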

shatealaboxiaowang · 2023-06-19T02:21:52Z

Thank you very much for your reply. I have modified it on the basis of your source code, and now it supports Chinese. The modified code is as follows:

Add:
VALID_IDENTIFIER_PATTERN_CH = re.compile(r"[\u4e00-\u9fff]+")
Modify:
if not (VALID_IDENTIFIER_PATTERN.match(uid) or VALID_IDENTIFIER_PATTERN_CH.match(uid)):

in kor.nodes.
Could you please check whether this is correct?
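One thing worth checking in a patch like this: `re.Pattern.match` only anchors at the start of the string, so an unanchored pattern such as `[\u4e00-\u9fff]+` accepts any id that merely begins with a Chinese character. The sketch below compares `match` with `fullmatch`. `ASCII_ID` is an assumed stand-in for the library's original pattern, which may differ, and the helper names are hypothetical.

```python
import re

# Assumed stand-in for the library's ASCII identifier rule; the real pattern may differ.
ASCII_ID = re.compile(r"^[a-z_][0-9a-z_]*$")
# Same character range as in the patch above (CJK Unified Ideographs).
CJK_ID = re.compile(r"[\u4e00-\u9fff]+")


def is_valid_prefix(uid: str) -> bool:
    """Mirror the patch: match() only checks that the id STARTS with CJK characters."""
    return bool(ASCII_ID.match(uid) or CJK_ID.match(uid))


def is_valid_strict(uid: str) -> bool:
    """Stricter variant: fullmatch() requires the WHOLE id to be CJK characters."""
    return bool(ASCII_ID.match(uid) or CJK_ID.fullmatch(uid))
```

With these definitions, an id such as `"歌手 名字!"` passes `is_valid_prefix` but not `is_valid_strict`, while `"歌手名字"` and `"id_01"` pass both. Whether the stricter check is needed depends on how the id ends up being used in the prompt.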

eyurtsev · 2023-07-06T21:25:04Z

If you're working with your own clone of the library, you could probably remove the VALID_IDENTIFIER check completely -- as long as the code runs without the identifier check and generates the correct prompt, you should be OK.

aixiamomo · 2023-08-08T02:49:23Z

import re

from kor import nodes

# Monkey patch so that Chinese identifiers are accepted
nodes.VALID_IDENTIFIER_PATTERN = re.compile(r".")
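For reference, a small standalone check of what the permissive pattern `r"."` accepts when used with `match`. This does not import Kor; the `accepts` helper is hypothetical and only mirrors how a `pattern.match(uid)` check would behave.

```python
import re

PERMISSIVE = re.compile(r".")


def accepts(uid: str) -> bool:
    """True if `uid` starts with any character other than a newline."""
    return PERMISSIVE.match(uid) is not None
```

So Chinese and ASCII ids are both accepted, while an empty id or one starting with a newline is still rejected, because `.` needs one character and does not match `\n` by default.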

Make kor pydantic v1 and v2 compatible.

Additional changes:
* No more validation on node ids -- makes it easier to use node ids in other languages (e.g., [chinese](#172))
* Add testing to CI to test with both v1 and v2

Not fully implemented:
* Support for serialization via parse_obj

eyurtsev added the enhancement label (New feature or request) Jun 9, 2023

eyurtsev mentioned this issue Sep 6, 2023

Pydantic v1 and v2 support #213

Merged
