FONEM Rule V-10 #175

Open
gewy opened this issue Aug 20, 2020 · 14 comments

@gewy

gewy commented Aug 20, 2020

Hi,
Rule V-10 seems to be incorrect.
The paper says: "Replace Y by I except if Y is between two vowels".
TYOU and YOU should give TIOU and IOU instead of being left unchanged.
Regards
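
For reference, the rule as quoted from the paper can be sketched without any regex. This is only an illustration: `applyV10` and `VOWELS` are hypothetical names, not the library's code, and treating Y itself as a vowel when it is a neighbour is an assumption.

```javascript
// Hypothetical helper, not the library's actual implementation.
// Assumption: Y counts as a vowel when it appears as a neighbour.
const VOWELS = new Set(['A', 'E', 'I', 'O', 'U', 'Y']);

function applyV10(word) {
  const chars = Array.from(word.toUpperCase());

  return chars
    .map((char, i) => {
      if (char !== 'Y') return char;

      // Neighbours are read from the original letters. A missing neighbour
      // (start or end of the word) is undefined, which is not a vowel.
      const betweenVowels =
        VOWELS.has(chars[i - 1]) && VOWELS.has(chars[i + 1]);

      return betweenVowels ? 'Y' : 'I';
    })
    .join('');
}

console.log(applyV10('TYOU')); // TIOU
console.log(applyV10('YOU')); // IOU
```

Under these assumptions a Y with a vowel on each side, as in POUYEZ, is left alone.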

@Yomguithereal
Owner

Hello @gewy. Would you have some time to open a PR on the subject along with a unit test?

Yomguithereal added a commit that referenced this issue Sep 2, 2020
@Yomguithereal
Owner

Hello @gewy. I just pushed a commit fixing rule V-10. I had to interpret some details of the paper to make this work, because the way the algorithm is described is not completely sound. What do you think of the solution?

@gewy
Author

gewy commented Sep 2, 2020

Hi, my implementation in Java:
new Rule("V-10", "(?<=^|[^aeiouy])y|y(?=[^aeiouy]|$)", "I");
A test on vowels is not necessary IMHO.
Having a consonant (or ^ / $) on one side is enough to prove that we don't have vowels on both sides.

BTW I will check, but I am not sure that C-27 and C-28 are correct either.
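
The "one side is enough" argument about V-10 is De Morgan's law: "not (vowel before and vowel after)" is the same as "non-vowel before or non-vowel after". A small exhaustive check, with hypothetical helper names and a word boundary modelled as `undefined`, could look like this:

```javascript
// Hypothetical names throughout; a word boundary is modelled as undefined.
const isVowel = (c) => c !== undefined && 'AEIOUY'.includes(c);

// Paper wording: Y is replaced unless it sits between two vowels.
const paperSaysReplace = (prev, next) => !(isVowel(prev) && isVowel(next));

// Shortcut: a non-vowel or a boundary on either side is enough.
const shortcutSaysReplace = (prev, next) => !isVowel(prev) || !isVowel(next);

function countDisagreements() {
  // One plain vowel, Y, one consonant, and the boundary.
  const neighbours = [undefined, 'A', 'Y', 'T'];
  let disagreements = 0;

  for (const prev of neighbours) {
    for (const next of neighbours) {
      if (paperSaysReplace(prev, next) !== shortcutSaysReplace(prev, next)) {
        disagreements++;
      }
    }
  }

  return disagreements;
}

console.log(countDisagreements()); // 0
```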

@Yomguithereal
Owner

> new Rule("V-10", "(?<=^|[^aeiouy])y|y(?=[^aeiouy]|$)", "I");

Unfortunately, JavaScript does not support lookbehind assertions in regexes everywhere: they were only added to the spec recently (ES2018), so not all engines implement them.

> BTW I will check, but I am not sure that C-27 and C-28 are correct either.

Fair enough. Tell me when you know and I'll make the required changes on my side.
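
On the lookbehind point: one common way to find out whether the running engine accepts lookbehinds is to try compiling one. This is a minimal sketch, and `supportsLookbehind` is a hypothetical name rather than anything the library exposes.

```javascript
// Hypothetical helper: runtime detection of lookbehind support.
function supportsLookbehind() {
  try {
    // The pattern is built from a string on purpose: a regex literal with an
    // unsupported lookbehind would be a syntax error for the whole script.
    new RegExp('(?<=a)b');
    return true;
  } catch (error) {
    return false;
  }
}

console.log(supportsLookbehind());
```

A library that must run on every engine would still need a fallback pattern, so detection alone does not remove the need for a lookbehind-free rule.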

@gewy
Author

gewy commented Sep 2, 2020

new Rule("V-10", "(^|[^aeiouy])y|y([^aeiouy]|$)", "$1I$2");
Does this not work in JS?
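
A JavaScript sketch of this capture-group idea follows, with one caveat: a group that consumes the letter after the Y hides that letter from the next match. With the pattern above, the hypothetical input AYBYA would come out as AIBYA, because the B is consumed by the first match and can no longer serve as the left context of the second Y. Capturing only the left side and using a lookahead (supported by all JavaScript engines) on the right avoids this. `applyV10WithGroups` is a hypothetical name, and uppercase input is assumed.

```javascript
// Left context is captured and written back; right context is a lookahead,
// so it is never consumed. Assumes uppercase input.
const V10_PATTERN = /(^|[^AEIOUY])Y|Y(?=[^AEIOUY]|$)/g;

function applyV10WithGroups(word) {
  // `left` is undefined when the second alternative matched.
  return word.replace(V10_PATTERN, (match, left) => (left || '') + 'I');
}

console.log(applyV10WithGroups('TYOU')); // TIOU
console.log(applyV10WithGroups('YOU')); // IOU
console.log(applyV10WithGroups('POUYEZ')); // POUYEZ
console.log(applyV10WithGroups('AYBYA')); // AIBIA
```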

@gewy
Author

gewy commented Sep 2, 2020

C-27: the document says Z with vowels BEFORE, but your regex is Z(?=${V}), which checks the letter after.
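
To make the difference concrete, here is a sketch comparing the two readings. Several things are assumptions: the vowel class `V`, the helper names, and the idea that the rule rewrites Z as S (inferred from the POUYEZ to POUYES example discussed in this thread, not taken from the paper).

```javascript
// Assumption: same vowel class as the V-10 patterns quoted in this thread.
const V = '[AEIOUY]';

// Reading 1: vowel AFTER the Z, expressed as a lookahead.
const vowelAfter = new RegExp(`Z(?=${V})`, 'g');

// Reading 2: vowel BEFORE the Z. The vowel is captured and written back,
// which avoids a lookbehind.
const vowelBefore = new RegExp(`(${V})Z`, 'g');

// Assumption: the rule rewrites Z as S.
const withVowelAfter = (word) => word.replace(vowelAfter, 'S');
const withVowelBefore = (word) => word.replace(vowelBefore, '$1S');

console.log(withVowelAfter('POUYEZ')); // POUYEZ
console.log(withVowelBefore('POUYEZ')); // POUYES
console.log(withVowelAfter('OZOUADE')); // OSOUADE
console.log(withVowelBefore('OZOUADE')); // OSOUADE
```

The two readings only differ when the Z has a vowel on one side but not the other, as with a word-final Z.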

Yomguithereal added a commit that referenced this issue Sep 2, 2020
@gewy
Author

gewy commented Sep 2, 2020

C-28 excludes SS between vowels, but your regex checks the right side only (cf. V-10).

@Yomguithereal
Owner

Yomguithereal commented Sep 2, 2020

I have simplified the V-10 rule as per your suggestion. Concerning C-27, I have a question of interpretation: should OZOUADE end up as OSWADE then? (I am fine with this.) But should POUYEZ become POUYES as per C-27? (I am less fine with this.) Sorry if this is obvious, but I have not read this paper in a very long time.

Yomguithereal added a commit that referenced this issue Sep 2, 2020
@Yomguithereal
Owner

I have updated rule C-28.

@gewy
Author

gewy commented Sep 3, 2020 via email

@Yomguithereal
Owner

So what did you choose regarding C-27? Do you get POUYES?

@gewy
Author

gewy commented Sep 3, 2020 via email

@gewy
Author

gewy commented Sep 3, 2020 via email

@Yomguithereal
Owner

Yes, this algorithm is not very good outside of its original goal of matching names from Saguenay etc. I am working on a personal algorithm for French that is way better, but it is geared toward keeping vocalization.
