[Feature]: Split CJK names #2624

Open
ZnqbuZ opened this issue Aug 18, 2023 · 184 comments

@ZnqbuZ

ZnqbuZ commented Aug 18, 2023

Debug log ID

NA

What happened?

I believe many people would like to keep a Chinese name as a whole:

image

instead of splitting it:

image

Still, we want the first & last names to be correctly capitalized, e.g. "杨莲亭" has pinyin "yang lian ting", and should be capitalized like "YangLianting".

Usually, hacks like auth.substring(1,1).clean.capitalize + auth.substring(2).clean.capitalize do the trick, since most Chinese surnames are a single character.

However, there are some surnames consisting of 2 characters. For example, “东方不败” has pinyin "dong fang bu bai", and should be capitalized like "DongfangBubai" rather than "DongFangbubai", which the formula I mentioned will give.

I tried using jieba, but it seems to think of a name as one word. Please correct me if that's not the case.

So, to achieve this, I wrote a simple JavaScript snippet to split Chinese names. It extracts the first 2 characters of a name and looks them up in a dictionary to see whether they constitute a compound surname. Currently, it supports Simplified & Traditional Chinese. Korean / Japanese could also be supported once someone gives me a list of Korean / Japanese compound surnames.

I hope you could consider adding this function.

function splitName(name, lang) {
    var compoundSurnames = {
        'zh-hans': ['阿单', '阿跌', '阿贺', '阿会', '阿里', '阿仑', '阿罗', '阿热', '哀骀', '艾岁', '安迟', '安都', '安国', '安金', '安陵', '安平', '安期', '安丘', '安是', '安阳', '奥敦', '奥鲁', '奥屯', '阿史那', '巴公', '拔拔', '拔列', '拔略', '拔也', '把利', '罢敌', '白马', '白狄', '白公', '白侯', '白鹿', '白鸾', '白冥', '白男', '白象', '白亚', '白乙', '白石', '百里', '柏常', '柏侯', '柏高', '班丘', '阪泉', '阪上', '鲍丘', '鲍俎', '苞丘', '卑梁', '卑徐', '北方', '北宫', '北郭', '北海', '北旄', '北门', '北比', '北丘', '北人', '北唐', '北堂', '北乡', '北殷', '北野', '北城', '北关', '北辰', '北山', '倍俟', '奔水', '逼阳', '比丘', '比人', '闭珊', '辟闾', '宾牟', '并官', '波斯', '拨略', '薄奚', '薄野', '伯比', '伯夫', '伯常', '伯成', '伯德', '伯封', '伯丰', '伯高', '伯昏', '伯暋', '伯夏', '伯有', '伯州', '伯宗', '不第', '不戴', '驳马', '薄姑', '薄奚', '薄野', '卜成', '卜梁', '卜马', '步叔', '步扬', '步温', '池张', '陈一', '曹牟', '曹丘', '常涛', '长鱼', '车非', '成功', '成公', '成阳', '乘马', '叱卢', '丑门', '樗里', '穿封', '淳于', '单于', '答禄', '达勃', '达步', '达奚', '登徒', '邓陵', '第一', '第二', '第三', '第四', '第五', '第六', '第七', '第八', '地连', '地伦', '东方', '东里', '东南', '东宫', '东门', '东乡', '东丹', '东郭', '东陵', '东关', '东闾', '东阳', '东野', '东莱', '豆卢', '斗于', '都尉', '独孤', '端木', '段干', '多子', '尔朱', '方雷', '丰将', '封人', '封父', '夫蒙', '夫余', '浮丘', '富察', '傅其', '傅余', '棼冒', '蚡冒', '范姜', '干已', '高车', '高陵', '高堂', '高阳', '高辛', '皋落', '哥舒', '盖楼', '庚桑', '梗阳', '宫孙', '公羊', '公良', '公孙', '公罔', '公西', '公冶', '公敛', '公梁', '公输', '公上', '公山', '公户', '公玉', '公仪', '公仲', '公甲', '公坚', '公宾', '公伯', '公祖', '公乘', '公晰', '公族', '姑布', '古口', '古龙', '古赖', '古孙', '穀梁', '谷浑', '瓜田', '关龙', '鲑阳', '归海', '虢射', '函治', '韩余', '罕井', '浩生', '浩星', '纥骨', '纥奚', '纥于', '贺陈', '贺拨', '贺兰', '贺楼', '赫连', '赫王', '黑齿', '黑肱', '侯冈', '呼延', '壶丘', '呼衍', '斛律', '胡非', '胡母', '胡毋', '胡林', '忽仑', '皇甫', '皇父', '花裳', '火拔', '胡桃', '兀官', '吉白', '即墨', '季夙', '季瓜', '季连', '季融', '季孙', '季尹', '茄众', '蒋丘', '姜匀', '金齿', '晋楚', '京城', '经孙', '泾阳', '九百', '九方', '九吾', '睢鸠', '沮渠', '巨母', '勘阻', '渴侯', '渴单', '可汗', '空桐', '崆峒', '空桑', '空相', '昆吾', '老阳', '郎佳', '乐羊', '荔菲', '栎阳', '梁丘', '梁由', '梁余', '梁垣', '陵阳', '伶舟', '冷沦', '令狐', '柳下', '龙丘', '龙藤', '卢妃', '卢蒲', '鲁步', '甪里', '陆费', '闾丘', '禄阁', '马矢', '马师', '麦丘', '麦卢', '茅夷', '蒙山', '孟孙', '弥牟', '密革', '密茅', '墨夷', '墨台', '万俊', '昌顿', '慕容', '木门', 
'木易', '万俟', '孟玄', '纳喇', '那拉', '纳兰', '南宫', '南郭', '南门', '南荣', '南离', '宁李', '欧侯', '欧阳', '逄门', '盆成', '彭祖', '平陵', '平宁', '破丑', '仆固', '濮阳', '浦思', '漆雕', '奇介', '綦母', '綦毋', '綦连', '祁连', '乞伏', '绮里', '千代', '千乘', '勤宿', '青阳', '丘丽', '丘陵', '曲沃', '屈侯', '屈突', '屈男', '屈卢', '屈同', '屈门', '屈引', '七七', '壤驷', '扰龙', '容成', '汝嫣', '萨孤', '三饭', '三闾', '三州', '桑丘', '商瞿', '上官', '尚方', '少师', '少施', '少室', '少叔', '少正', '社南', '社北', '申屠', '申徒', '沈犹', '神农', '胜屠', '石作', '石雨', '石牛', '侍其', '士季', '士弱', '士孙', '士贞', '叔敖', '叔梁', '叔孙', '叔先', '叔促', '水丘', '司城', '司空', '司寇', '司鸿', '司马', '司徒', '司士', '似和', '素和', '素黎', '夙沙', '孙阳', '索阳', '索卢', '沈江', '沓卢', '太史', '太叔', '太阳', '淡台', '唐山', '堂溪', '陶丘', '同蹄', '统奚', '秃发', '涂钦', '屠岸', '吐火', '吐贺', '吐万', '吐罗', '吐缶', '吐难', '吐缶', '吐浑', '吐奚', '吐和', '屯浑', '脱脱', '秃发', '拓拨', '拓跋', '澹台', '谭刘', '太宰', '完颜', '王孙', '王官', '王人', '王刘', '王子', '微生', '尾勺', '温孤', '温稽', '闻人', '屋户', '巫马', '巫许', '吾丘', '无庸', '无钩', '无终', '五鹿', '五鸠', '武安', '吴刘', '王黄', '息夫', '西陵', '西乞', '西钥', '西乡', '西门', '西周', '西郭', '西方', '西野', '西宫', '戏阳', '瑕吕', '霞露', '夏侯', '鲜虞', '鲜于', '鲜阳', '咸丘', '相里', '解枇', '谢丘', '新垣', '辛垣', '信都', '信平', '修鱼', '徐辜', '徐吾', '徐藤', '徐离', '宣于', '轩辕', '轩丘', '阏氏', '延陵', '罔法', '铅陵', '羊角', '耶律', '叶阳', '伊祁', '伊耆', '猗卢', '义渠', '邑由', '意如', '因孙', '银齿', '尹文', '雍门', '游水', '由吾', '右师', '有莘', '宥连', '于陵', '虞丘', '盂丘', '宇文', '尉迟', '乐羊', '乐正', '运龙', '运期', '宰父', '辗迟', '湛卢', '臧孙', '章仇', '仉督', '长孙', '长儿', '张廖', '张简', '真鄂', '正令', '执头', '中央', '中长', '中行', '中野', '中英', '中梁', '中垒', '钟离', '钟吾', '终黎', '终葵', '仲孙', '仲长', '周阳', '周氏', '周生', '朱阳', '诸葛', '主父', '颛孙', '颛顼', '訾辱', '淄丘', '子言', '子人', '子服', '子家', '子桑', '子南', '子叔', '子车', '子阳', '宗伯', '宗正', '宗政', '尊卢', '昨和', '左人', '左丘', '左师', '左行', '佐南'],
        'zh-hant': ['阿單', '阿跌', '阿賀', '阿會', '阿裏', '阿侖', '阿羅', '阿熱', '哀駘', '艾歲', '安遲', '安都', '安國', '安金', '安陵', '安平', '安期', '安丘', '安是', '安陽', '奧敦', '奧魯', '奧屯', '阿史那', '巴公', '拔拔', '拔列', '拔略', '拔也', '把利', '罷敵', '白馬', '白狄', '白公', '白侯', '白鹿', '白鸞', '白冥', '白男', '白象', '白亞', '白乙', '白石', '百裏', '柏常', '柏侯', '柏高', '班丘', '阪泉', '阪上', '鮑丘', '鮑俎', '苞丘', '卑梁', '卑徐', '北方', '北宮', '北郭', '北海', '北旄', '北門', '北比', '北丘', '北人', '北唐', '北堂', '北鄉', '北殷', '北野', '北城', '北關', '北辰', '北山', '倍俟', '奔水', '逼陽', '比丘', '比人', '閉珊', '辟閭', '賓牟', '並官', '波斯', '撥略', '薄奚', '薄野', '伯比', '伯夫', '伯常', '伯成', '伯德', '伯封', '伯豐', '伯高', '伯昏', '伯暋', '伯夏', '伯有', '伯州', '伯宗', '不第', '不戴', '駁馬', '薄姑', '薄奚', '薄野', '蔔成', '蔔梁', '蔔馬', '步叔', '步揚', '步溫', '池張', '陳一', '曹牟', '曹丘', '常濤', '長魚', '車非', '成功', '成公', '成陽', '乘馬', '叱盧', '醜門', '樗裏', '穿封', '淳於', '單於', '答祿', '達勃', '達步', '達奚', '登徒', '鄧陵', '第一', '第二', '第三', '第四', '第五', '第六', '第七', '第八', '地連', '地倫', '東方', '東裏', '東南', '東宮', '東門', '東鄉', '東丹', '東郭', '東陵', '東關', '東閭', '東陽', '東野', '東萊', '豆盧', '鬥於', '都尉', '獨孤', '端木', '段幹', '多子', '爾朱', '方雷', '豐將', '封人', '封父', '夫蒙', '夫余', '浮丘', '富察', '傅其', '傅余', '棼冒', '蚡冒', '範姜', '幹已', '高車', '高陵', '高堂', '高陽', '高辛', '臯落', '哥舒', '蓋樓', '庚桑', '梗陽', '宮孫', '公羊', '公良', '公孫', '公罔', '公西', '公冶', '公斂', '公梁', '公輸', '公上', '公山', '公戶', '公玉', '公儀', '公仲', '公甲', '公堅', '公賓', '公伯', '公祖', '公乘', '公晰', '公族', '姑布', '古口', '古龍', '古賴', '古孫', '穀梁', '谷渾', '瓜田', '關龍', '鮭陽', '歸海', '虢射', '函治', '韓余', '罕井', '浩生', '浩星', '紇骨', '紇奚', '紇於', '賀陳', '賀撥', '賀蘭', '賀樓', '赫連', '赫王', '黑齒', '黑肱', '侯岡', '呼延', '壺丘', '呼衍', '斛律', '胡非', '胡母', '胡毋', '胡林', '忽侖', '皇甫', '皇父', '花裳', '火拔', '胡桃', '兀官', '吉白', '即墨', '季夙', '季瓜', '季連', '季融', '季孫', '季尹', '茄眾', '蔣丘', '姜勻', '金齒', '晉楚', '京城', '經孫', '涇陽', '九百', '九方', '九吾', '睢鳩', '沮渠', '巨母', '勘阻', '渴侯', '渴單', '可汗', '空桐', '崆峒', '空桑', '空相', '昆吾', '老陽', '郎佳', '樂羊', '荔菲', '櫟陽', '梁丘', '梁由', '梁余', '梁垣', '陵陽', '伶舟', '冷淪', '令狐', '柳下', '龍丘', '龍藤', '盧妃', '盧蒲', '魯步', '甪裏', '陸費', '閭丘', '祿閣', '馬矢', '馬師', '麥丘', '麥盧', '茅夷', '蒙山', '孟孫', '彌牟', '密革', '密茅', '墨夷', '墨臺', '萬俊', '昌頓', '慕容', '木門', 
'木易', '萬俟', '孟玄', '納喇', '那拉', '納蘭', '南宮', '南郭', '南門', '南榮', '南離', '寧李', '歐侯', '歐陽', '逄門', '盆成', '彭祖', '平陵', '平寧', '破醜', '仆固', '濮陽', '浦思', '漆雕', '奇介', '綦母', '綦毋', '綦連', '祁連', '乞伏', '綺裏', '千代', '千乘', '勤宿', '青陽', '丘麗', '丘陵', '曲沃', '屈侯', '屈突', '屈男', '屈盧', '屈同', '屈門', '屈引', '七七', '壤駟', '擾龍', '容成', '汝嫣', '薩孤', '三飯', '三閭', '三州', '桑丘', '商瞿', '上官', '尚方', '少師', '少施', '少室', '少叔', '少正', '社南', '社北', '申屠', '申徒', '沈猶', '神農', '勝屠', '石作', '石雨', '石牛', '侍其', '士季', '士弱', '士孫', '士貞', '叔敖', '叔梁', '叔孫', '叔先', '叔促', '水丘', '司城', '司空', '司寇', '司鴻', '司馬', '司徒', '司士', '似和', '素和', '素黎', '夙沙', '孫陽', '索陽', '索盧', '沈江', '沓盧', '太史', '太叔', '太陽', '淡臺', '唐山', '堂溪', '陶丘', '同蹄', '統奚', '禿發', '塗欽', '屠岸', '吐火', '吐賀', '吐萬', '吐羅', '吐缶', '吐難', '吐缶', '吐渾', '吐奚', '吐和', '屯渾', '脫脫', '禿發', '拓撥', '拓跋', '淡臺', '譚劉', '太宰', '完顏', '王孫', '王官', '王人', '王劉', '王子', '微生', '尾勺', '溫孤', '溫稽', '聞人', '屋戶', '巫馬', '巫許', '吾丘', '無庸', '無鉤', '無終', '五鹿', '五鳩', '武安', '吳劉', '王黃', '息夫', '西陵', '西乞', '西鑰', '西鄉', '西門', '西周', '西郭', '西方', '西野', '西宮', '戲陽', '瑕呂', '霞露', '夏侯', '鮮虞', '鮮於', '鮮陽', '鹹丘', '相裏', '解枇', '謝丘', '新垣', '辛垣', '信都', '信平', '修魚', '徐辜', '徐吾', '徐藤', '徐離', '宣於', '軒轅', '軒丘', '閼氏', '延陵', '罔法', '鉛陵', '羊角', '耶律', '葉陽', '伊祁', '伊耆', '猗盧', '義渠', '邑由', '意如', '因孫', '銀齒', '尹文', '雍門', '遊水', '由吾', '右師', '有莘', '宥連', '於陵', '虞丘', '盂丘', '宇文', '尉遲', '樂羊', '樂正', '運龍', '運期', '宰父', '輾遲', '湛盧', '臧孫', '章仇', '仉督', '長孫', '長兒', '張廖', '張簡', '真鄂', '正令', '執頭', '中央', '中長', '中行', '中野', '中英', '中梁', '中壘', '鐘離', '鐘吾', '終黎', '終葵', '仲孫', '仲長', '周陽', '周氏', '周生', '朱陽', '諸葛', '主父', '顓孫', '顓頊', '訾辱', '淄丘', '子言', '子人', '子服', '子家', '子桑', '子南', '子叔', '子車', '子陽', '宗伯', '宗正', '宗政', '尊盧', '昨和', '左人', '左丘', '左師', '左行', '佐南']
    }
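    // Look up the first two characters; an unknown lang falls back to an empty list.
    var surnames = compoundSurnames[lang] || []
    var splitIndex = surnames.includes(name.slice(0, 2)) ? 2 : 1
    return [name.slice(0, splitIndex), name.slice(splitIndex)]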
}

splitName('东方不败', 'zh-hans') gives ['东方', '不败'], which should eventually be capitalized as ['Dongfang', 'Bubai'].
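
For illustration, here is a minimal sketch of the capitalization step that would follow the split. It is not how Better BibTeX does it: the PINYIN table and the capitalizePart / capitalizeName helpers are hypothetical, and a real setup would take the romanization from a pinyin library rather than a hand-written map.

```javascript
// Hypothetical per-character pinyin table, just large enough for the examples.
const PINYIN = {
  '东': 'dong', '方': 'fang', '不': 'bu', '败': 'bai',
  '杨': 'yang', '莲': 'lian', '亭': 'ting',
};

// Romanize one name part and capitalize only its first letter.
function capitalizePart(part) {
  const latin = Array.from(part).map(ch => PINYIN[ch] || ch).join('');
  return latin.charAt(0).toUpperCase() + latin.slice(1);
}

// parts is a [surname, givenName] pair such as the one splitName returns.
function capitalizeName(parts) {
  return parts.filter(Boolean).map(capitalizePart).join('');
}

console.log(capitalizeName(['东方', '不败'])); // DongfangBubai
console.log(capitalizeName(['东', '方不败'])); // DongFangbubai (what the 1-character hack yields)
console.log(capitalizeName(['杨', '莲亭']));   // YangLianting
```

The second call shows why the split has to happen before capitalization: once the surname boundary is wrong, no amount of post-processing recovers "DongfangBubai".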

@github-actions

Hello there @ZnqbuZ,

Hope you're doing well! @retorquere is here to help you get the most out of your experience with Better BibTeX. To make sure he can assist you effectively, he kindly asks for your cooperation in providing a debug log – it's like giving him the key to understanding and solving the puzzle!

Getting your debug log is a breeze and will save us both time. Trust me, it's way quicker than discussing why it's important. 😃

How to Share Your Debug Log:

  1. If the issue involves specific references or exports, just right-click on the relevant item(s) and choose "Better BibTeX -> Submit Better BibTeX debug log" from the menu.

  2. For other issues, follow these simple steps:

    • Restart Zotero with debugging enabled (Help -> Debug Output Logging -> Restart with logging enabled).
    • Reproduce the problem.
    • Select "Send Better BibTeX debug report..." from the help menu.

Once you hit that submit button, you'll get a special red debug ID. Just share that with @retorquere in this issue thread. If the question is regarding an export, don't forget to include what you see exported and what you expected.

By sharing your debug log, you're giving @retorquere a clearer picture of your setup and the items causing the issue. It's like a superhero cape for him – he can swoop in and tackle the problem much faster.

We totally get that your time is valuable, and we appreciate your effort in helping @retorquere help you. You might be surprised at how much this simple step speeds up the whole process.

Thanks a bunch!

@retorquere
Owner

A debug log is not "not applicable" here. A debug log per point 1 gives me the entry we're discussing here -- I cannot enter Chinese names myself.

@ZnqbuZ
Author

ZnqbuZ commented Aug 18, 2023

Sorry. I've sent a log with ID YeGr1kqXOgnV-6U3RYALN

@ZnqbuZ
Author

ZnqbuZ commented Aug 18, 2023

A log with more examples ZAVVH2PE-apse/6.7.112-6 was sent.

@retorquere
Owner

Thank you.

@retorquere
Owner

Does that mean that there is a definitive list of compound Chinese family names, and that they all consist of two characters?

@duncdrum, can I ask you to jump in? I mean no offence @ZnqbuZ but I don't know anything about Chinese so if there's anything to discuss I need to have others involved.

@ZnqbuZ
Author

ZnqbuZ commented Aug 18, 2023

Does that mean that there is a definitive list of compound Chinese family names, and that they all consist of two characters?

Yes, for all modern names and almost all ancient names. I got the list from Wikipedia and I'm pretty sure it contains all surnames used in the last 150 years. Actually, only 81 of them are still in use nowadays. However, Chinese history is so long (~5000 years) that I doubt a complete list exists.

I guess you could add a filter and let users choose if they want to use it. And store the name list in configuration so users can modify it.

After some investigation, I found that there seem to have been 3-character surnames 2000-3000 years ago. I don't think it's likely that anyone bearing one is the author of any document...
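
If longer surnames ever need to be handled (the list in the original post already contains the 3-character '阿史那', which a first-two-characters lookup can never match), a longest-prefix match would cover them. The sketch below is only an assumption about how that could look: splitNameLongest and its surnames parameter are hypothetical names, and the list is passed in so that it could come from user configuration.

```javascript
// Split a name by trying the longest configured surname first.
// `surnames` is any iterable of surnames, e.g. loaded from user configuration.
function splitNameLongest(name, surnames) {
  const known = new Set(surnames);
  const chars = Array.from(name); // iterate by code point, not by UTF-16 unit
  const longest = Math.max(1, ...Array.from(known, s => Array.from(s).length));

  // Never let the surname swallow the whole name: keep at least one character
  // for the given name, hence chars.length - 1.
  for (let len = Math.min(longest, chars.length - 1); len >= 2; len--) {
    const prefix = chars.slice(0, len).join('');
    if (known.has(prefix)) return [prefix, chars.slice(len).join('')];
  }

  // Fall back to the common case: a single-character surname.
  return [chars.slice(0, 1).join(''), chars.slice(1).join('')];
}

const surnames = ['阿史那', '东方', '欧阳'];
console.log(splitNameLongest('阿史那社尔', surnames)); // [ '阿史那', '社尔' ]
console.log(splitNameLongest('东方不败', surnames));   // [ '东方', '不败' ]
console.log(splitNameLongest('杨莲亭', surnames));     // [ '杨', '莲亭' ]
```

One design choice worth flagging: a two-character name that is itself in the list (such as '欧阳') is split as 1 + 1 here, because the sketch always reserves a character for the given name. Whether that is the right call is a judgment for someone who knows the naming conventions.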

@retorquere
Owner

I tried using jieba, but it seems to think of a name as one word. Please correct me if that's not the case.

Jieba puts spaces between the characters which makes each character a "word" for the citekey formatter.

@ZnqbuZ
Author

ZnqbuZ commented Aug 19, 2023

I tried using jieba, but it seems to think of a name as one word. Please correct me if that's not the case.

Jieba puts spaces between the characters which makes each character a "word" for the citekey formatter.

I think jieba's handling of names is expected.

By the way, I observed some strange behaviours in capitalization of Chinese titles, which should be jieba's problem. Are you still using js-jieba? It seems to be outdated. I wonder what prevents you from using a C library? Have you considered using WASM?

@retorquere
Owner

By the way, I observed some strange behaviours in capitalization of Chinese titles, which should be jieba's problem. Are you still using js-jieba? It seems to be outdated.

Still using js-jieba, indeed

I wonder what prevents you from using a C library?

Using C code in Zotero extensions is not trivial. It's not work I'm keen to pick up.

Have you considered using WASM?

I've looked into it briefly but I'd only consider it if there was a clean javascript wrapper for an already-compiled wasm binary. I don't want to get into a whole new programming language for this.

@retorquere
Owner

The wrappers that do exist assume either Node, where they use Node-specific libraries like fs or stream to load wasm from disk, or the web, where they assume the wasm will be served from an http(s) server. Zotero is a weird mix of both environments that no library understands. Pure-JS libraries usually run just fine. Anything else requires working around the library -- I have monkey-patches in place to reroute the jieba dictionary loading, for example. I'll take a look at jieba-wasm.
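
As a sketch of what an environment-neutral wrapper could look like: the standard WebAssembly.instantiate call accepts raw bytes, so a loader does not have to depend on fs or on fetching from a server. How the bytes are read inside Zotero is left out on purpose, since that part is environment-specific. The instantiateFromBytes helper is a hypothetical name, and ADD_MODULE is a tiny hand-assembled module that exports add(a, b), used here only so the example runs on its own.

```javascript
// A minimal wasm binary exporting add(i32, i32) -> i32, written out by hand.
const ADD_MODULE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,                   // magic + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,             // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                                           // one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,             // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: local.get 0, local.get 1, i32.add
]);

// Instantiate from bytes already in memory; no fs, no fetch.
async function instantiateFromBytes(bytes, imports = {}) {
  const { instance } = await WebAssembly.instantiate(bytes, imports);
  return instance.exports;
}

instantiateFromBytes(ADD_MODULE).then(exports => {
  console.log(exports.add(2, 3)); // 5
});
```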

@retorquere
Owner

Can you see whether https://www.npmjs.com/package/jieba-wasm offers different cutting modes for cn and tw (jieba-js offers use of jieba-zh-tw and jieba-zh-cn as cutting modes)?

@retorquere
Owner

Also, what do the different cut functions and their parameters mean?

@retorquere
Owner

Is there also a full list of single-character Chinese family names?

@retorquere
Owner

jieba puts spaces between the characters which makes each character a "word" for the citekey formatter.

This is incorrect. auth.jieba cuts up auth using whatever rules jieba applies -- I know absolutely nothing about Chinese, so I don't know what jieba does either. It is auth.ideographs that puts spaces between the characters which makes each character a "word" for the citekey formatter.
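
To make the distinction concrete, this is roughly what "putting spaces between the characters" amounts to. It is a sketch only, not the actual auth.ideographs implementation: spaceIdeographs is a hypothetical helper, and the character ranges are an assumption that covers the common Han blocks in the Basic Multilingual Plane.

```javascript
// Common Han ranges in the BMP: Extension A, Unified Ideographs, Compatibility Ideographs.
const HAN = /[\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF]/g;

// Surround every ideograph with spaces, then collapse runs of whitespace,
// so that each ideograph becomes its own whitespace-delimited "word".
function spaceIdeographs(text) {
  return text.replace(HAN, ' $& ').replace(/\s+/g, ' ').trim();
}

console.log(spaceIdeographs('东方不败'));      // 东 方 不 败
console.log(spaceIdeographs('Li 东方'));       // Li 东 方
console.log(spaceIdeographs('Wallace Goodman')); // Wallace Goodman
```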

@retorquere
Owner

ZAVVH2PE-apse does not contain samples, logs with samples have -refs- in the debug log ID. See point 1. above.

@retorquere
Owner

Can you export the items from YeGr1kqXOgnV-6U3RYALN to RDF and attach them to this issue? YeGr1kqXOgnV-6U3RYALN does contain items but I cannot import them.

@ZnqbuZ
Author

ZnqbuZ commented Aug 19, 2023

Can you see whether https://www.npmjs.com/package/jieba-wasm offers different cutting modes for cn and tw (jieba-js offers use of jieba-zh-tw and jieba-zh-cn as cutting modes)?

I'm creating a testing environment.

Is there also a full list of single-character Chinese family names?

Yes, there is, but do we really need it? I mean modern Chinese family names are either 2 chars or 1 char - so we just need a function to ensure that content in author field is truly a Chinese name, basically a utf8 range checker is ok. I can write it if needed.
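
A minimal sketch of such a range checker follows. Strictly speaking it tests Unicode code point ranges rather than UTF-8 bytes. The isHanName name is hypothetical, and the explicit ranges (instead of \p{Script=Han}) are chosen on the assumption that an older JavaScript engine may not support Unicode property escapes.

```javascript
// Han ranges: Extension A, Unified Ideographs, Compatibility Ideographs, Extension B.
// The u flag is needed for the \u{...} escapes beyond the BMP.
const HAN_ONLY = /^[\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\u{20000}-\u{2A6DF}]+$/u;

// True only if the field consists entirely of Han characters.
function isHanName(text) {
  return HAN_ONLY.test(text.trim());
}

console.log(isHanName('东方不败'));        // true
console.log(isHanName('東方不敗'));        // true
console.log(isHanName('Wallace Goodman')); // false
console.log(isHanName('东方 Bubai'));      // false
```

Note that this only says the text is written in Han characters. It cannot tell a Chinese name from a Japanese name written in kanji, so the language of the item would still have to come from elsewhere.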

jieba puts spaces between the characters which makes each character a "word" for the citekey formatter.

This is incorrect. auth.jieba cuts up auth using whatever rules jieba applies -- I know absolutely nothing about Chinese, so I don't know what jieba does either. It is auth.ideographs that puts spaces between the characters which makes each character a "word" for the citekey formatter.

It's hard for a segmentation library to deal with names. By analogy, it might cut "WallaceGoodman" into "Wall Ace Good Man".

ZAVVH2PE-apse does not contain samples, logs with samples have -refs- in the debug log ID. See point 1. above.

I'm sorry. I'm creating a testing library. Soon it will be uploaded.

@retorquere
Owner

Yes, there is, but do we really need it? I mean modern Chinese family names are either 2 chars or 1 char - so we just need a function to ensure that content in author field is truly a Chinese name, basically a utf8 range checker is ok. I can write it if needed.

No need, that's already in my current tests.

I'm creating a testing library. Soon it will be uploaded.

Thanks.

@retorquere
Owner

Just heard back from zotero-dev that it might be because Zotero 6 only supports the WebAssembly MVP specification.

Is it possible to compile jieba-rs to stay within that spec (until Zotero 7 goes GA)?

@ZnqbuZ
Author

ZnqbuZ commented Nov 22, 2023

Not a single name is correctly parsed... How did you use the lib?

Sorry -- I was just using auth. But auth.jieba returns all-lowercase now. What should I get instead of changsunwuji?

It should be ZhangsunWuji - I wrote the correct version in a field of the items, maybe the title; I forget which.

@ZnqbuZ
Author

ZnqbuZ commented Nov 22, 2023

Just heard back from zotero-dev that it might be because Zotero 6 only supports the WebAssembly MVP specification.

Is it possible to compile jieba-rs to stay within that spec (until Zotero 7 goes GA)?

Honestly, I'm not quite sure about how to do it now... Maybe I can manage it after several days of research.

@retorquere
Owner

It should be ZhangsunWuji - I wrote the correct version in a field of the items, maybe the title; I forget which.

@book{changsunwuji::PossiblywrongpinyinHansAuthor:ZhangsunWuji,
@book{changsunwuji::PossiblywrongpinyinHantAuthor:ZhangsunWuji,
@book{dongfangbubai::HansAuthor:DongfangBubai,
@book{dongfangbubai::HantAuthor:DongfangBubai,
@book{linghuchong::HansAuthor:LinghuChong,
@book{linghuchong::HantAuthor:LinghuChong,
@book{murongfu::HansAuthor:MurongFu,
@book{murongfu::HantAuthor:MurongFu,
@book{ouyangfeng::HansAuthor:OuyangFeng,
@book{ouyangfeng::HantAuthor:OuyangFeng,
@book{renwohang::HansAuthor:RenWoxing,
@book{renwohang::HantAuthor:RenWoxing,
@book{shangguanyun::HansAuthor:ShangguanYun,
@book{shangguanyun::HantAuthor:ShangguanYun,
@book{simayi::HansAuthor:SimaYi,
@book{simayi::HantAuthor:SimaYi,
@book{simazhongda::HansAuthor:SimaZhongda,
@book{simazhongda::HantAuthor:SimaZhongda,
@book{weichirong::PossiblywrongpinyinHansAuthor:YuchiRong,
@book{weichirong::PossiblywrongpinyinHantAuthor:YuchiRong,
@book{xiahoudun::HansAuthor:XiahouDun,
@book{xiahoudun::HantAuthor:XiahouDun,
@book{yanglianting::HansAuthor:YangLianting,
@book{yanglianting::HantAuthor:YangLianting,
@book{zhugekongming::HansAuthor:ZhugeKongming,
@book{zhugekongming::HantAuthor:ZhugeKongming,
@book{zhugeliang::HansAuthor:ZhugeLiang,
@book{zhugeliang::HantAuthor:ZhugeLiang,
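
The two entries tagged Possiblywrongpinyin (changsunwuji vs. ZhangsunWuji, weichirong vs. YuchiRong) look like cases where a surname is read differently from the default reading of its characters. One way to handle that, sketched here purely as an assumption, is a small override table consulted before the per-character fallback. DEFAULT_READING, SURNAME_READING and romanizeSurname are hypothetical names, and the default table stands in for whatever a generic pinyin converter returns.

```javascript
// Hypothetical per-character default readings, as a generic converter might produce.
const DEFAULT_READING = {
  '长': 'chang', '長': 'chang', '孙': 'sun', '孫': 'sun',
  '尉': 'wei', '迟': 'chi', '遲': 'chi',
  '司': 'si', '马': 'ma', '馬': 'ma',
};

// Surname-specific readings that differ from the per-character defaults.
const SURNAME_READING = {
  '长孙': 'zhangsun', '長孫': 'zhangsun',
  '尉迟': 'yuchi', '尉遲': 'yuchi',
};

// Prefer the surname override; otherwise concatenate the per-character readings.
function romanizeSurname(surname) {
  const reading = SURNAME_READING[surname]
    || Array.from(surname).map(ch => DEFAULT_READING[ch] || ch).join('');
  return reading.charAt(0).toUpperCase() + reading.slice(1);
}

console.log(romanizeSurname('长孙')); // Zhangsun
console.log(romanizeSurname('尉遲')); // Yuchi
console.log(romanizeSurname('司马')); // Sima
```

This only covers the surname part. A given name with an unusual reading, like the renwohang vs. RenWoxing entry, would need a different mechanism.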

@ZnqbuZ
Author

ZnqbuZ commented Nov 22, 2023

Just heard back from zotero-dev that it might be because Zotero 6 only supports the WebAssembly MVP specification.

Is it possible to compile jieba-rs to stay within that spec (until Zotero 7 goes GA)?

Honestly, I'm not quite sure about how to do it now... Maybe I can manage it after several days of research

It should be ZhangsunWuji - I wrote the correct version in a field of the items, maybe the title; I forget which.

@book{changsunwuji::PossiblywrongpinyinHansAuthor:ZhangsunWuji,
@book{changsunwuji::PossiblywrongpinyinHantAuthor:ZhangsunWuji,
@book{dongfangbubai::HansAuthor:DongfangBubai,
@book{dongfangbubai::HantAuthor:DongfangBubai,
@book{linghuchong::HansAuthor:LinghuChong,
@book{linghuchong::HantAuthor:LinghuChong,
@book{murongfu::HansAuthor:MurongFu,
@book{murongfu::HantAuthor:MurongFu,
@book{ouyangfeng::HansAuthor:OuyangFeng,
@book{ouyangfeng::HantAuthor:OuyangFeng,
@book{renwohang::HansAuthor:RenWoxing,
@book{renwohang::HantAuthor:RenWoxing,
@book{shangguanyun::HansAuthor:ShangguanYun,
@book{shangguanyun::HantAuthor:ShangguanYun,
@book{simayi::HansAuthor:SimaYi,
@book{simayi::HantAuthor:SimaYi,
@book{simazhongda::HansAuthor:SimaZhongda,
@book{simazhongda::HantAuthor:SimaZhongda,
@book{weichirong::PossiblywrongpinyinHansAuthor:YuchiRong,
@book{weichirong::PossiblywrongpinyinHantAuthor:YuchiRong,
@book{xiahoudun::HansAuthor:XiahouDun,
@book{xiahoudun::HantAuthor:XiahouDun,
@book{yanglianting::HansAuthor:YangLianting,
@book{yanglianting::HantAuthor:YangLianting,
@book{zhugekongming::HansAuthor:ZhugeKongming,
@book{zhugekongming::HantAuthor:ZhugeKongming,
@book{zhugeliang::HansAuthor:ZhugeLiang,
@book{zhugeliang::HantAuthor:ZhugeLiang,

Yes, and those words after the last colons are correctly capitalized.

@retorquere
Owner

Honestly, I'm not quite sure about how to do it now... Maybe I can manage it after several days of research

How do I build the package?

@ZnqbuZ
Author

ZnqbuZ commented Nov 22, 2023

Honestly, I'm not quite sure about how to do it now... Maybe I can manage it after several days of research

How do I build the package?

Sorry, what do you mean?

@retorquere
Owner

Yes, and those words after the last colons are correctly capitalized.

But that's how they come out of auth.jieba. I'm not applying lower.

@ZnqbuZ
Author

ZnqbuZ commented Nov 22, 2023

Yes, and those words after the last colons are correctly capitalized.

But that's how they come out of auth.jieba. I'm not applying lower.

Have you used splitName in spellnames? Does it give the right answers for these names?

@retorquere
Owner

Honestly, I'm not quite sure about how to do it now... Maybe I can manage it after several days of research

How do I build the package?

Sorry, what do you mean?

I've cloned WasmJieba, I just wanted to see if I could help with the compilation.

@ZnqbuZ
Author

ZnqbuZ commented Nov 22, 2023

Honestly, I'm not quite sure about how to do it now... Maybe I can manage it after several days of research

How do I build the package?

Sorry, what do you mean?

I've cloned WasmJieba, I just wanted to see if I could help with the compilation.

My build command is wasm-pack build --target web --out-dir pkg/web/pkg --out-name wasmjieba-web, where wasm-pack can be installed with cargo.

@ZnqbuZ
Author

ZnqbuZ commented Nov 22, 2023

Besides, in .cargo/config.toml the target should be wasm32-unknown-unknown.

@ZnqbuZ
Author

ZnqbuZ commented Nov 22, 2023

I believe it must have to do with wasm-opt. Could you try this unoptimized debug version and see if it works?

@retorquere
Owner

I believe it must have to do with wasm-opt. Could you try this unoptimized debug version and see if it works?

CompileError: at offset 679499: bad type

@retorquere
Owner

Does work on Zotero 7.

@ZnqbuZ
Author

ZnqbuZ commented Nov 22, 2023

I believe it must have to do with wasm-opt. Could you try this unoptimized debug version and see if it works?

CompileError: at offset 679499: bad type

OK... no idea what this means, so maybe it's not related to wasm-opt. I'm going to dig into the Rust/wasm side and try that old Firefox tomorrow. Tell me if you want more explanation of the compilation.
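
One cheap way to narrow this down, sketched under the assumption that the same bytes can be loaded in both engines: WebAssembly.validate reports whether the current engine accepts a binary at all, and WebAssembly.compile surfaces the engine's own error message when it does not. The explainWasm helper and the two sample byte arrays are hypothetical; with a real build, the bytes of the generated .wasm file would go in instead.

```javascript
// Smallest valid module: magic number plus version 1.
const EMPTY_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
// Same header with an unsupported version number, so engines reject it.
const BROKEN_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x02, 0x00, 0x00, 0x00]);

// Report whether this engine accepts the binary, and why not if it doesn't.
async function explainWasm(bytes) {
  if (WebAssembly.validate(bytes)) return 'accepted by this engine';
  try {
    await WebAssembly.compile(bytes);
    return 'accepted by this engine';
  } catch (err) {
    return `rejected: ${err.message}`;
  }
}

explainWasm(EMPTY_MODULE).then(console.log);
explainWasm(BROKEN_MODULE).then(console.log);
```

Running the same check in the older and the newer engine would at least confirm whether the binary itself uses something the older engine does not understand.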

@retorquere
Owner

What are the full contents of the config.toml?

@ZnqbuZ
Author

ZnqbuZ commented Nov 22, 2023

config.zip
