NAME

Lingua::Stem::UniNE - University of Neuchâtel stemmers

VERSION

This document describes Lingua::Stem::UniNE v0.08.

SYNOPSIS

use Lingua::Stem::UniNE;

# create Bulgarian stemmer
$stemmer = Lingua::Stem::UniNE->new(language => 'bg');

# get stem for word
$stem = $stemmer->stem($word);

# get list of stems for list of words
@stems = $stemmer->stem(@words);

DESCRIPTION

This module contains a collection of stemmers for multiple languages based on stemming algorithms provided by Jacques Savoy of the University of Neuchâtel (UniNE). The languages currently implemented are Bulgarian, Czech, German, and Persian. Work is ongoing for Arabic, Bengali, Finnish, French, Hindi, Hungarian, Italian, Portuguese, Marathi, Russian, Spanish, and Swedish. The top priority is languages for which there are no stemmers available on CPAN.

Attributes

language

The following language codes are currently supported.

  ┌───────────┬────┐
  │ Bulgarian │ bg │
  │ Czech     │ cs │
  │ German    │ de │
  │ Persian   │ fa │
  └───────────┴────┘

They are in the two-letter ISO 639-1 format and are case-insensitive but are always returned in lowercase when requested.

# instantiate a stemmer object
$stemmer = Lingua::Stem::UniNE->new(language => $language);

# get current language
$language = $stemmer->language;

# change language
$stemmer->language($language);

Country codes such as cz for the Czech Republic are not supported, nor are IETF language tags such as fa-AF or fa-IR.

aggressive

By default, if there are multiple strengths of stemmers, a light stemmer will be used. When aggressive is set to true, an aggressive stemmer will be used if available.
```
$stemmer->aggressive(1);
```
Czech and German have aggressive options.

Methods

stem

Accepts a list of words, stems each word, and returns a list of stems. The list returned will always have the same number of elements in the same order as the list provided. When no stemming rules apply to a word, the original word is returned.
```
@stems = $stemmer->stem(@words);

# get the stem for a single word
$stem = $stemmer->stem($word);
```
The words should be provided as character strings and the stems are returned as character strings. Byte strings in arbitrary character encodings are intentionally not supported.

languages

Returns a list of supported two-letter language codes using lowercase letters.

# object method
@languages = $stemmer->languages;

# class method
@languages = Lingua::Stem::UniNE->languages;

AUTHOR

Nick Patch patch@cpan.org

This project is brought to you by Shutterstock. Additional open source projects from Shutterstock can be found at code.shutterstock.com.

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Name		Name	Last commit message	Last commit date
Latest commit History 117 Commits
lib/Lingua/Stem		lib/Lingua/Stem
t		t
xt/author		xt/author
.gitignore		.gitignore
.travis.yml		.travis.yml
Build.PL		Build.PL
Changes		Changes
INSTALL		INSTALL
LICENSE		LICENSE
MANIFEST		MANIFEST
MANIFEST.SKIP		MANIFEST.SKIP
README.md		README.md

License

patch/lingua-stem-unine-pm5

Folders and files

Latest commit

History

Repository files navigation

NAME

VERSION

SYNOPSIS

DESCRIPTION

Attributes

Methods

SEE ALSO

AUTHOR

COPYRIGHT AND LICENSE

About

Topics

Resources

License

Stars

Watchers

Forks

Languages