Question: how to obtain multiple parsings? #99

fasiha · 2016-03-29T01:52:59Z

MeCab has a -N flag with which a user can specify the top-N results to get back. On http://www.atilika.org/ the Viterbi algorithm's output graph shows all possible morphemes, along with the cost of each path, so I'm sure it's possible to get the top, say, five results, but is there a simpler way to get this, the equivalent of mecab -N 5? I'm using UniDic. Thank you 🙇!

fasiha · 2016-09-26T13:38:20Z

I see that, since I asked this, multiTokenizeNBest has been added to TokenizerBase via d0ed0eb by EmanuelGedin! Awesome!

Question: I’ve been using Maven, e.g., for kuromoji-unidic, but the artifacts there were last updated in September 2015. I’ll try to get my build tool (leiningen) to use a cloned Git repo, but I was wondering whether there are plans to bump the versions on Maven soon?

Thanks!

cmoen · 2016-09-27T06:38:45Z

We're planning on publishing a new version to Maven soon. We'd like to let the n-best APIs bake a little bit before we release the new version, though. Any feedback you have on usage, etc. would be greatly appreciated. Thanks!

fasiha · 2016-09-28T02:10:15Z

Thanks for the info!

This is probably not the best place to ask about it, forgive me, but:

何できた？, vs.
バスできた。

With the first sentence, multiTokenizeNBest with UniDic returns several tokenizations 😄, but with the second, it returns an empty list 😭. I’m testing with 1fad6cc (HEAD as of yesterday), and in my informal testing, there’re a number of other sentences for which multiTokenizeNBest returns the empty list.

Any suggestions?

(In both of these sentences, the lowest-cost tokenization uses 出来る instead of で＋来る, which is what I expected. For the first sentence above, multiTokenizeNBest includes this expected tokenization in its top-3 list.)

Update: same problem happens with IPADIC too, so it’s not a UniDic issue.

Update the second: multiTokenize with a high, but not too high, costSlack does work! multiTokenize with costSlack = 214748364 (one-tenth of Integer.MAX_VALUE) works as expected. Somewhere between 214'748'364 and 2'147'483'647=Integer.MAX_VALUE, something bad happens.

cmoen · 2016-11-01T10:35:18Z

Sorry for the slow response here. Emanuel is looking into a fix and we hope to have something you can test soon.

emmanuellegedin · 2016-11-07T17:47:27Z

I found the overflow error. A fix will be coming soon!
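
For readers following along: the thread does not show where the overflow occurred, so the following is only a sketch of how this kind of bug typically arises, not Kuromoji's actual code. It assumes a hypothetical check that accepts a path when its cost is at most the best cost plus costSlack. The class name, method names, and cost values are all made up for illustration. With int arithmetic, a costSlack near Integer.MAX_VALUE makes the sum wrap around to a negative number, so every path is rejected, including the best one, which would be consistent with the empty list reported above.

```java
public class CostSlackOverflow {

    // Hypothetical check (an assumption, not taken from Kuromoji):
    // accept a path whose cost is within costSlack of the best path.
    // The int addition silently wraps around when the sum exceeds
    // Integer.MAX_VALUE.
    static boolean withinSlackNaive(int pathCost, int bestCost, int costSlack) {
        return pathCost <= bestCost + costSlack;
    }

    // Same check with the sum widened to long, so it cannot wrap.
    static boolean withinSlackSafe(int pathCost, int bestCost, int costSlack) {
        return (long) pathCost <= (long) bestCost + (long) costSlack;
    }

    public static void main(String[] args) {
        int bestCost = 5000; // made-up cost of the lowest-cost path
        int pathCost = 5200; // made-up cost of an alternative path

        // One-tenth of Integer.MAX_VALUE: the sum still fits in an int.
        System.out.println(withinSlackNaive(pathCost, bestCost, 214748364));

        // Integer.MAX_VALUE: the sum wraps to a negative value.
        System.out.println(withinSlackNaive(pathCost, bestCost, Integer.MAX_VALUE));

        // Widened arithmetic handles the same input.
        System.out.println(withinSlackSafe(pathCost, bestCost, Integer.MAX_VALUE));
    }
}
```

Under these assumptions the naive check fails only when bestCost + costSlack exceeds Integer.MAX_VALUE, which matches the observation that a large slack works but a slack at Integer.MAX_VALUE does not. Math.addExact is another option if an exception is preferable to silent wrapping.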
