The tokenizing performance of mixed language #113

Open
kwkwvenusgod opened this issue Dec 16, 2016 · 0 comments


The library performs well on pure Japanese text. However, when I tokenize text that mixes English and Japanese and the language recognizer classifies it as Japanese, the library filters out some English words that may be key information in the text. Is there a parameter or setting I can tune or change to handle these cases? You can try the following plain text first. For example, the English tokens "[ 1 ] SINGAPORE 2" are filtered out.

"ELECTRONIC e チケットお客様控 TICKET ITINERARY/RECEIPT 国際線自動 チェックイン 機用2次元 バーコード For International Self Service Unit ・搭乗手続き時、又は、出入国審査時に提示を求められた場合には、出入国に必要な全ての書類又は滞在先住所等の情報、 e チケットお客様控 ( Itine- rary/Receipt ) 、及びパスポート等の公的書類をご提示ください。コードシェア便の搭乗手続きは、運航会社で承ります。 ・ e チケットお客様控えは、旅程変更又は払い戻しの際に必要となる場合がありますので、ご旅行終了までお待ちください。 ・ Please present all necessary country specific travel documentation or data such as staying address , Itinerary/Receipt , and positive identification such as passport , when you are requested to do so at check-in , or at Immigration/Customs . ・ Please retain Itinerary/Receipt throughout your journey . Itinerary/Receipt may be required in case of itinerary change or refund . 搭乗者名: CHINOMI/KENJI MR PASSENGER NAME 航空券番号: 2052402171368 予約番号: YIFRBL 発行日: 24JAN16 TICKET NUMBER RESERVATION CODE DATE OF ISSUE OF ISSUE ISS . OFFICE CODE PLACE 発行所: SINGAPORE - NH SKY WEB SG R 発行店舗コード: 32393852 旅程表 ITINERARY 都市 /空港 ターミナル 便名 日付 曜日 時間 クラス 運賃種別 予約状況 手荷物 有効期限 CITY/AIRPORT TERMINAL FLIGHT NO . DATE DAY TIME CLASS FARE BASIS STATUS BAGGAGE INVALID BEFORE/AFTER 出発 DEPARTURE 出発 DEPARTURE [ 1 ] SINGAPORE 2 NH844 27JAN16 WED 2220 W ( Y ) WRCS0 OK 2PC 27JAN/27JAN 到着 ARRIVAL 座席 SEAT 到着 ARRIVAL 運航航空会社 OPERATING CARRIER 備考 REMARKS TOKYO ( HANEDA ) INT 28JAN16 THU 0600 ALL NIPPON AIRWAYS 出発 DEPARTURE 出発 DEPARTURE [ 2 ] TOKYO ( HANEDA ) SURFACE 到着 ARRIVAL 座席 SEAT 到着 ARRIVAL 運航航空会社 OPERATING CARRIER 備考 REMARKS TOKYO ( NARITA ) 出発 DEPARTURE 出発 DEPARTURE [ 3 ] TOKYO ( NARITA ) 1 NH801 30JAN16 SAT 1805 V ( Y ) VRCS0 OK 2PC 30JAN/30JAN 到着 ARRIVAL 座席 SEAT 到着 ARRIVAL 運航航空会社 OPERATING CARRIER 備考 REMARKS SINGAPORE 2 31JAN16 SUN 0040 ALL NIPPON AIRWAYS 全日本空輸株式会社 ALL NIPPON AIRWAYS CO . , LTD . PAGE 1 / 2 PRINTED 24JAN16 ELECTRONIC e チケットお客様控 TICKET ITINERARY/RECEIPT 国際線自動 チェックイン 機用2次元 バーコード For International Self Service Unit 搭乗者名: CHINOMI/KENJI MR PASSENGER NAME 航空券番号: 2052402171368 予約番号: YIFRBL 発行日: 24JAN16 TICKET NUMBER RESERVATION CODE DATE OF ISSUE OF ISSUE ISS . 
OFFICE CODE PLACE 発行所: SINGAPORE - NH SKY WEB SG R 発行店舗コード: 32393852 運賃/航空券情報 FARE/TICKET INFORMATION 運賃額: 支払運賃額: FARE SGD830.00 EQUIV . FARE PAID 税金・料金等合計: 航空会社手数料: TAXES/FEES/CHARGES/AIRLINE CHARGES TOTAL SGD74.90 AIRLINE SERVICE CHARGE SGD0.00 ツアーコード: TOTAL ( AIRLINE SERVICE CHARGE is not included . ) SGD904.90 TOUR CODE 支払手段: FORM OF PAYMENT CCCAXXXXXXXXXXXX3426**/XX-XX S 811289 制限事項: FLT/CNX/CHG RESTRICTED CHECK FARE RULE ENDORSEMENTS/RESTRICTIONS 運賃詳細: SIN NH TYO Q14.23 259.84NH SIN316.79NUC590.86END ROE1.404690 FARE CALCULATION 税金・料金等 詳細: SGD8.80YQ/ SGD19.90SG/ SGD6.10OP/ SGD8.00OO/ SGD25.70SW/ SGD6.40OI/ TAXES/FEES/CHARGES/ AIRLINE CHARGES DETAILS シンガポールの空港から出発する場合、上記金額には OP TAX ( Aviation Levy ) が含まれています。 OP tax ( Aviation Levy ) is included on a ticket when departing from the airport in Singapore . 原券: 交換券: ORIGINAL ISSUE ISSUED IN EXCHANGE FOR ご注意及び契約条件 /TICKET NOTICE ・運送やその他のサービスは、各運送人の運送約款に従います。運送約款については発行運送人にご確認ください。なお、 ANA の運送による日本国内区間のみの旅行であって国際運送の一環ではない場合、 ANAの 国内旅客運送約款が適用となります。 ・旅客が出発国以外の国に最終到達地又は寄港地を有する旅行を行なう場合は、その旅客の旅程全体 ( 同一国内の区間を含む ) についてモントリオール条約又はその前身のワルソー条約 ( その改正を含む ) の適用を 受けることがあります。その旅客に対し適用となる条約 ( 適用タリフに含まれる特別運送契約を含む ) が、運送人の責任を制限することがあります。詳細は、各運送人へお問い合わせください。 ・エアゾール、花火、可燃性液体などの危険物は航空機へ持込はできません。これら制限の詳細は航空会社へお問い合わせください。 ・このお客様控とともに、航空券の一部を成し、かつ「契約条件及びその他重要事項」を含む、一連のご案内書をお受け取りになります。これらのご案内書をお受け取りになられたことを必ずご確認いただき、 もしお受け取りになられていない場合には、旅行開始前に次の URL : https : //www . ana . co . jp/other/int/meta/0192.html ? CONNECTION_KIND\u003djp\u0026LANG\u003dj で入手いただくか、又は、発行運送人若しくは旅行会社へ 連絡ください。 ・このお客様控は、モントリオール条約及びワルソー条約第3条でいう「航空券」の一部をなします。ただし、航空会社が第3条の要件を満たす別の書類を旅客へ渡す場合を除きます。 ・ ANA のコンピュータシステムに保管されている eチケット情報と e チケットお客様控の情報に相違がある場合、コンピュータシステム上の e チケット情報を有効と致します。 ・ Carriage and other services provided by the carrier are subject to conditions of carriage , which are hereby incorporated by reference . These conditions may be obtained from the issuing carrier . 
Please note that if you travel on ANA \u0027s domestic sector flights within Japan only , without any international connecting flights , ANA \u0027s Conditions of Carriage for Passengers and Baggage for domestic flights will apply . ・ Passengers on a journey involving an ultimate destination or a stop in a country other than the country of departure are advised that international treaties known as the Montreal Convention , or its predecessor , the Warsaw Convention , including its amendments ( the Warsaw Convention System ) , may apply to the entire journey , including any portion thereof within a country . For such passengers , the applicable treaty , including special contracts of carriage embodied in any applicable tariffs , governs and may limit the liability of the carrier . Check with your carrier for more information . ・ The carriage of certain hazardous materials , like aerosols , fireworks , and flammable liquids , aboard the aircraft is forbidden . If you do not understand these restrictions , further information may be obtained from your airline . ・ Further information may be obtained from the carrier . With this ticket you will receive a set of notices which forms part of the ticket and contains the " Conditions of Contract and Other Important Notices " . Please make sure that you have received these notices , and if not , obtain copies prior to the commencement of your journey at the following URL : https : //www . ana . co . jp/other/int/meta/0192.html ? CONNECTION_KIND\u003djp\u0026LANG\u003de , or contact the issuing airline or travel agent . ・ This Itinerary/Receipt constitutes the " passenger ticket " for the purposes of Article 3 of the Montreal Convention and the Warsaw Convention , except where the carrier delivers to the passenger another document complying with the requirements of Article 3. 
・ Ticketing information contained in ANA \u0027s computer system shall prevail should any discrepancy occur between the Itinerary/Receipt held by the customer and the ticketing information in our computer system . 全日本空輸株式会社 ALL NIPPON AIRWAYS CO . , LTD . PAGE 2 / 2 PRINTED 24JAN16"
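
One possible workaround that does not depend on any library setting is to split the text into Japanese and non-Japanese runs before tokenizing, so that only the Japanese runs go through the Japanese tokenizer and the English runs are kept as whitespace-separated tokens. The sketch below is an illustration under assumptions, not this library's API: `split_by_script`, `tokenize_mixed`, and the `ja_tokenize` callback are hypothetical names, and the Unicode ranges cover only hiragana, katakana, common CJK ideographs, and half-width katakana.

```python
import re

# Runs of hiragana/katakana, CJK ideographs, and half-width katakana.
# These ranges are an assumption; extend them if your text needs more.
_JA_RUN = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uff66-\uff9f]+")


def split_by_script(text):
    """Yield (is_japanese, chunk) pairs that cover the text in order."""
    pos = 0
    for match in _JA_RUN.finditer(text):
        if match.start() > pos:
            # Text between two Japanese runs: Latin letters, digits, symbols.
            yield False, text[pos:match.start()]
        yield True, match.group()
        pos = match.end()
    if pos < len(text):
        yield False, text[pos:]


def tokenize_mixed(text, ja_tokenize):
    """Tokenize Japanese runs with ja_tokenize; keep other runs word by word.

    ja_tokenize is a hypothetical callback: any function that takes a
    string and returns a list of token strings.
    """
    tokens = []
    for is_japanese, chunk in split_by_script(text):
        if is_japanese:
            tokens.extend(ja_tokenize(chunk))
        else:
            # Non-Japanese runs are split on whitespace, nothing is dropped.
            tokens.extend(chunk.split())
    return tokens


if __name__ == "__main__":
    sample = "出発 DEPARTURE [ 1 ] SINGAPORE 2 NH844"
    # A stand-in tokenizer that returns each Japanese run unchanged.
    print(tokenize_mixed(sample, lambda chunk: [chunk]))
```

With this split, tokens such as "[", "1", "]", "SINGAPORE", and "2" never reach the Japanese tokenizer, so it cannot filter them. The trade-off is that digits or Latin letters embedded in Japanese words (for example "2次元") are separated from the surrounding Japanese run.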
