Update various regex escape sequences #2378

Closed
pombredanne opened this issue Aug 28, 2019 · 14 comments · Fixed by #2747

Comments

@pombredanne
Contributor

Recent versions of Python are stricter about escape sequences in string literals, which affects many regex patterns.
For instance, with 3.6.8 there are 10+ warnings like this one:

...
lib/python3.6/site-packages/nltk/featstruct.py:2092: DeprecationWarning: invalid escape sequence \d
    RANGE_RE = re.compile('(-?\d+):(-?\d+)')

The regex(es) should be updated to silence these warnings.
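
For reference, a minimal sketch of the kind of change being asked for, using the `RANGE_RE` line from the warning above. The second name, `RANGE_RE_ESCAPED`, is made up here only to show the doubled-backslash alternative; it is not in NLTK.

```python
import re

# Raw string: the backslashes reach the regex engine untouched, and the
# Python compiler sees no escape sequences at all, so nothing is deprecated.
RANGE_RE = re.compile(r'(-?\d+):(-?\d+)')

# Hypothetical alternative spelling: same pattern with doubled backslashes.
RANGE_RE_ESCAPED = re.compile('(-?\\d+):(-?\\d+)')

match = RANGE_RE.match('-3:12')
print(match.groups())  # ('-3', '12')
```

Both spellings produce the same pattern text, so either should behave identically at match time; the raw string is usually easier to read.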

@PabloDino

If no one is working on this, I would like to. Can you tell me the steps to reproduce the issue, please?

@pombredanne
Contributor Author

@PabloDino Install Python 3.6.8 or later and try to import every module. Then fix the regexes, either by using raw strings or by using proper escapes, so that this works on both Python 2 and 3.
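
One way to reproduce this without importing anything might be to compile each file and record the warnings. The sketch below is an assumption about how that could be scripted, not something from the NLTK repo; `invalid_escape_warnings` and the throwaway `bad.py` file are hypothetical names. The warning category differs between Python versions (`DeprecationWarning` on older ones, `SyntaxWarning` on newer ones), so the sketch matches on the message text instead.

```python
import pathlib
import tempfile
import warnings


def invalid_escape_warnings(root):
    """Compile every *.py file under root and collect escape warnings."""
    found = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # do not let filters hide anything
            compile(source, str(path), "exec")
        for w in caught:
            if "invalid escape sequence" in str(w.message):
                found.append((path.name, w.lineno, str(w.message)))
    return found


# Demo on a throwaway file. The doubled backslash below writes a single
# backslash into bad.py, so only that file contains the bad escape.
with tempfile.TemporaryDirectory() as tmp:
    bad_source = 'import re\nPAT = re.compile("\\d+")\n'
    pathlib.Path(tmp, "bad.py").write_text(bad_source, encoding="utf-8")
    hits = invalid_escape_warnings(tmp)

for name, lineno, message in hits:
    print(name, lineno, message)
```

Pointing the function at a checkout of the repository instead of the temporary directory should list the offending lines per file.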

@PabloDino

I'm on it. I've been working through some exercises but am not seeing any warnings. Can you post a code snippet in which the warnings manifest, please?

@pombredanne
Contributor Author

@PabloDino :

$ python --version
Python 3.6.8
$ git clone git://github.com/nltk/nltk.git
$ pip install pytest
$ pytest -vvs nltk/ --collect-only
========================================= warnings summary =========================================
nltk/nltk/featstruct.py:1295
  /home/pombreda/tmp/nl/nltk/nltk/featstruct.py:1295: DeprecationWarning: invalid escape sequence \d
    name, n = re.sub("\d+$", "", var.name), 2

nltk/nltk/featstruct.py:2091
  /home/pombreda/tmp/nl/nltk/nltk/featstruct.py:2091: DeprecationWarning: invalid escape sequence \d
    RANGE_RE = re.compile("(-?\d+):(-?\d+)")

nltk/nltk/sem/evaluate.py:307
  /home/pombreda/tmp/nl/nltk/nltk/sem/evaluate.py:307: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/sem/relextract.py:128
  /home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:128: DeprecationWarning: invalid escape sequence \w
    ENT = re.compile("&(\w+?);")

nltk/nltk/sem/relextract.py:407
  /home/pombreda/tmp/nl/nltk/nltk/sem/relextract.py:407: DeprecationWarning: invalid escape sequence \s
    """

nltk/nltk/sem/boxer.py:776
  /home/pombreda/tmp/nl/nltk/nltk/sem/boxer.py:776: DeprecationWarning: invalid escape sequence \d
    assert re.match("^[exps]\d+$", var), var

nltk/nltk/sem/drt.py:716
  /home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:716: DeprecationWarning: invalid escape sequence \ 
    + [" \  " + blank + line for line in term_lines[1:2]]

nltk/nltk/sem/drt.py:717
  /home/pombreda/tmp/nl/nltk/nltk/sem/drt.py:717: DeprecationWarning: invalid escape sequence \ 
    + [" /\ " + var_string + line for line in term_lines[2:3]]

nltk/nltk/grammar.py:1291
  /home/pombreda/tmp/nl/nltk/nltk/grammar.py:1291: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/grammar.py:1463
  /home/pombreda/tmp/nl/nltk/nltk/grammar.py:1463: DeprecationWarning: invalid escape sequence \w
    _STANDARD_NONTERM_RE = re.compile("( [\w/][\w/^<>-]* ) \s*", re.VERBOSE)

nltk/nltk/text.py:650
  /home/pombreda/tmp/nl/nltk/nltk/text.py:650: DeprecationWarning: invalid escape sequence \w
    _CONTEXT_RE = re.compile("\w+|[\.\!\?]")

nltk/nltk/tokenize/punkt.py:1462
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/punkt.py:1462: DeprecationWarning: invalid escape sequence \s
    pat = "\s*".join(re.escape(c) for c in tok)

nltk/nltk/tokenize/regexp.py:100
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:100: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/tokenize/regexp.py:193
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/regexp.py:193: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/tokenize/repp.py:133
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/repp.py:133: DeprecationWarning: invalid escape sequence \(
    line_regex = re.compile("^\((\d+), (\d+), (.+)\)$", re.MULTILINE)

nltk/nltk/tokenize/texttiling.py:96
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:96: DeprecationWarning: invalid escape sequence \-
    c for c in lowercase_text if re.match("[a-z\-' \n\t]", c)

nltk/nltk/tokenize/texttiling.py:229
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/texttiling.py:229: DeprecationWarning: invalid escape sequence \w
    matches = re.finditer("\w+", text)

nltk/nltk/tokenize/toktok.py:53
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:53: DeprecationWarning: invalid escape sequence \]
    FUNKY_PUNCT_1 = re.compile(u'([،;؛¿!"\])}»›”؟¡%٪°±©®।॥…])'), r" \1 "

nltk/nltk/tokenize/toktok.py:55
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:55: DeprecationWarning: invalid escape sequence \[
    FUNKY_PUNCT_2 = re.compile(u"([({\[“‘„‚«‹「『])"), r" \1 "

nltk/nltk/tokenize/toktok.py:62
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/toktok.py:62: DeprecationWarning: invalid escape sequence \|
    PIPE = re.compile("\|"), " &#124; "

nltk/nltk/tokenize/treebank.py:269
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:269: DeprecationWarning: invalid escape sequence \]
    """

nltk/nltk/tokenize/treebank.py:273
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:273: DeprecationWarning: invalid escape sequence \s
    re.compile(pattern.replace("(?#X)", "\s"))

nltk/nltk/tokenize/treebank.py:277
  /home/pombreda/tmp/nl/nltk/nltk/tokenize/treebank.py:277: DeprecationWarning: invalid escape sequence \s
    re.compile(pattern.replace("(?#X)", "\s"))

nltk/nltk/tree.py:99
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:99: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/tree.py:652
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:652: DeprecationWarning: invalid escape sequence \s
    if re.search("\s", brackets):

nltk/nltk/tree.py:658
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:658: DeprecationWarning: invalid escape sequence \s
    node_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)

nltk/nltk/tree.py:660
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:660: DeprecationWarning: invalid escape sequence \s
    leaf_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)

nltk/nltk/tree.py:662
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:662: DeprecationWarning: invalid escape sequence \s
    "%s\s*(%s)?|%s|(%s)"

nltk/nltk/tree.py:900
  /home/pombreda/tmp/nl/nltk/nltk/tree.py:900: DeprecationWarning: invalid escape sequence \$
    reserved_chars = re.compile("([#\$%&~_\{\}])")

nltk/nltk/parse/chart.py:1034
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1034: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1073
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1073: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1128
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1128: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1148
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1148: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1218
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1218: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/chart.py:1241
  /home/pombreda/tmp/nl/nltk/nltk/parse/chart.py:1241: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/featurechart.py:270
  /home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:270: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/parse/featurechart.py:369
  /home/pombreda/tmp/nl/nltk/nltk/parse/featurechart.py:369: DeprecationWarning: invalid escape sequence \*
    """

nltk/nltk/tag/sequential.py:730
  /home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:730: DeprecationWarning: invalid escape sequence \w
    elif re.match("\w+$", word):

nltk/nltk/tag/sequential.py:724
  /home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:724: DeprecationWarning: invalid escape sequence \W
    elif re.match("\W+$", word):

nltk/nltk/tag/sequential.py:722
  /home/pombreda/tmp/nl/nltk/nltk/tag/sequential.py:722: DeprecationWarning: invalid escape sequence \.
    if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word):

nltk/nltk/classify/rte_classify.py:61
  /home/pombreda/tmp/nl/nltk/nltk/classify/rte_classify.py:61: DeprecationWarning: invalid escape sequence \w
    tokenizer = RegexpTokenizer("[\w.@:/]+|\w+|\$[\d.]+")

nltk/nltk/classify/maxent.py:1351
  /home/pombreda/tmp/nl/nltk/nltk/classify/maxent.py:1351: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/chunk/util.py:371
  /home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:371: DeprecationWarning: invalid escape sequence \S
    _LINE_RE = re.compile("(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")

nltk/nltk/chunk/util.py:517
  /home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:517: DeprecationWarning: invalid escape sequence \w
    _IEER_TYPE_RE = re.compile('<b_\w+\s+[^>]*?type="(?P<type>\w+)"')

nltk/nltk/chunk/util.py:526
  /home/pombreda/tmp/nl/nltk/nltk/chunk/util.py:526: DeprecationWarning: invalid escape sequence \s
    for piece_m in re.finditer("<[^>]+>|[^\s<]+", s):

nltk/nltk/chunk/regexp.py:70
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:70: DeprecationWarning: invalid escape sequence \{
    _BRACKETS = re.compile("[^\{\}]+")

nltk/nltk/chunk/regexp.py:215
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:215: DeprecationWarning: invalid escape sequence \{
    s = re.sub("\{\}", "", s)

nltk/nltk/chunk/regexp.py:426
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:426: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "{\g<chunk>}", descr)

nltk/nltk/chunk/regexp.py:471
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:471: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "}\g<chink>{", descr)

nltk/nltk/chunk/regexp.py:510
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:510: DeprecationWarning: invalid escape sequence \{
    regexp = re.compile("\{(?P<chunk>%s)\}" % tag_pattern2re_pattern(tag_pattern))

nltk/nltk/chunk/regexp.py:511
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:511: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "\g<chunk>", descr)

nltk/nltk/chunk/regexp.py:575
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:575: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "\g<left>", descr)

nltk/nltk/chunk/regexp.py:708
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:708: DeprecationWarning: invalid escape sequence \{
    "(?P<left>%s)\{(?P<right>%s)"

nltk/nltk/chunk/regexp.py:714
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:714: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "{\g<left>\g<right>", descr)

nltk/nltk/chunk/regexp.py:778
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:778: DeprecationWarning: invalid escape sequence \}
    "(?P<left>%s)\}(?P<right>%s)"

nltk/nltk/chunk/regexp.py:784
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:784: DeprecationWarning: invalid escape sequence \g
    RegexpChunkRule.__init__(self, regexp, "\g<left>\g<right>}", descr)

nltk/nltk/chunk/regexp.py:896
nltk/nltk/chunk/regexp.py:896
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
    r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")

nltk/nltk/chunk/regexp.py:1175
  /home/pombreda/tmp/nl/nltk/nltk/chunk/regexp.py:1175: DeprecationWarning: invalid escape sequence \.
    """

nltk/nltk/inference/discourse.py:44
  /home/pombreda/tmp/nl/nltk/nltk/inference/discourse.py:44: DeprecationWarning: invalid escape sequence \ 
    """

nltk/nltk/stem/lancaster.py:192
  /home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:192: DeprecationWarning: invalid escape sequence \*
    valid_rule = re.compile("^[a-z]+\*?\d[a-z]*[>\.]?$")

nltk/nltk/stem/lancaster.py:225
  /home/pombreda/tmp/nl/nltk/nltk/stem/lancaster.py:225: DeprecationWarning: invalid escape sequence \*
    valid_rule = re.compile("^([a-z]+)(\*?)(\d)([a-z]*)([>\.]?)$")

nltk/nltk/stem/porter.py:177
  /home/pombreda/tmp/nl/nltk/nltk/stem/porter.py:177: DeprecationWarning: invalid escape sequence \m
    """

nltk/nltk/corpus/__init__.py:116
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:116: DeprecationWarning: invalid escape sequence \.
    ".*\.(test|train).*",

nltk/nltk/corpus/__init__.py:123
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:123: DeprecationWarning: invalid escape sequence \.
    ".*\.(test|train).*",

nltk/nltk/corpus/__init__.py:126
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:126: DeprecationWarning: invalid escape sequence \.
    crubadan = LazyCorpusLoader("crubadan", CrubadanCorpusReader, ".*\.txt")

nltk/nltk/corpus/__init__.py:128
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:128: DeprecationWarning: invalid escape sequence \.
    "dependency_treebank", DependencyCorpusReader, ".*\.dp", encoding="ascii"

nltk/nltk/corpus/__init__.py:311
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:311: DeprecationWarning: invalid escape sequence \.
    "timit", TimitTaggedCorpusReader, ".+\.tags", tagset="wsj", encoding="ascii"

nltk/nltk/corpus/__init__.py:335
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:335: DeprecationWarning: invalid escape sequence \.
    twitter_samples = LazyCorpusLoader("twitter_samples", TwitterCorpusReader, ".*\.json")

nltk/nltk/corpus/__init__.py:364
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:364: DeprecationWarning: invalid escape sequence \.
    wordnet_ic = LazyCorpusLoader("wordnet_ic", WordNetICCorpusReader, ".*\.dat")

nltk/nltk/corpus/__init__.py:374
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:374: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/__init__.py:383
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:383: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/__init__.py:392
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:392: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/__init__.py:401
  /home/pombreda/tmp/nl/nltk/nltk/corpus/__init__.py:401: DeprecationWarning: invalid escape sequence \.
    "frames/.*\.xml",

nltk/nltk/corpus/reader/plaintext.py:62
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/plaintext.py:62: DeprecationWarning: invalid escape sequence \.
    """

nltk/nltk/corpus/reader/util.py:635
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:635: DeprecationWarning: invalid escape sequence \d
    if re.match("^\d+-\d+", line) is not None:

nltk/nltk/corpus/reader/util.py:859
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/util.py:859: DeprecationWarning: invalid escape sequence \s
    if re.match("======+\s*$", line):

nltk/nltk/corpus/reader/api.py:77
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/api.py:77: DeprecationWarning: invalid escape sequence \.
    m = re.match("(.*\.zip)/?(.*)$|", root)

nltk/nltk/corpus/reader/timit.py:165
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/timit.py:165: DeprecationWarning: invalid escape sequence \.
    encoding = [(".*\.wav", None), (".*", encoding)]

nltk/nltk/corpus/reader/bracket_parse.py:214
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bracket_parse.py:214: DeprecationWarning: invalid escape sequence \.
    "alpino\.xml",

nltk/nltk/corpus/reader/xmldocs.py:232
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/xmldocs.py:232: DeprecationWarning: invalid escape sequence \s
    _XML_TAG_NAME = re.compile("<\s*/?\s*([^\s>]+)")

nltk/nltk/toolbox.py:209
  /home/pombreda/tmp/nl/nltk/nltk/toolbox.py:209: DeprecationWarning: invalid escape sequence \_
    """

nltk/nltk/corpus/reader/bnc.py:29
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/bnc.py:29: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/corpus/reader/switchboard.py:113
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/switchboard.py:113: DeprecationWarning: invalid escape sequence \w
    _UTTERANCE_RE = re.compile("(\w+)\.(\d+)\:\s*(.*)")

nltk/nltk/corpus/reader/childes.py:281
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/childes.py:281: DeprecationWarning: invalid escape sequence \d
    m = re.match("P(\d+)Y(\d+)M?(\d?\d?)D?", age_year)

nltk/nltk/corpus/reader/framenet.py:2753
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/framenet.py:2753: DeprecationWarning: invalid escape sequence \w
    """

nltk/nltk/corpus/reader/udhr.py:30
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/udhr.py:30: DeprecationWarning: invalid escape sequence \-
    ("Abkhaz\-Cyrillic\+Abkh", "cp1251"),

nltk/nltk/corpus/reader/twitter.py:54
  /home/pombreda/tmp/nl/nltk/nltk/corpus/reader/twitter.py:54: DeprecationWarning: invalid escape sequence \.
    """

nltk/nltk/ccg/combinator.py:225
  /home/pombreda/tmp/nl/nltk/nltk/ccg/combinator.py:225: DeprecationWarning: invalid escape sequence \Y
    """

nltk/nltk/treetransforms.py:108
  /home/pombreda/tmp/nl/nltk/nltk/treetransforms.py:108: DeprecationWarning: invalid escape sequence \ 
    """

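Several entries above point at a `"""` line or at a plain string such as the ASCII art in drt.py, so not every hit is a regex. A small sketch of how those cases could be handled, assuming a raw docstring is acceptable there; the `draw` function is hypothetical and only stands in for the real code.

```python
def draw():
    r"""Return the top of a tiny ASCII tree.

      /\
     /  \
    """
    # Raw string: the backslash stays literal without any escaping.
    top = r" /\ "
    # Same value with the backslash doubled instead of the r prefix.
    assert top == " /\\ "
    return top


print(draw())
```

The `r` prefix on the docstring keeps the drawing readable in the source, while doubling the backslash is the option when a string also needs real escapes such as `\n`.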
@pombredanne
Contributor Author

And FWIW: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals.

Changed in version 3.6: Unrecognized escape sequences produce a DeprecationWarning. In some future version of Python they will be a SyntaxError.
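
To make the quoted behavior concrete, here is a small sketch that evaluates a literal containing `\d` and records what the compiler reports. It uses `eval` on a raw source string only so that the example itself stays free of the warning; the variable names are illustrative. The category is not printed as a fixed expectation because it depends on the interpreter version.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The source text passed to eval is the four characters "\d"
    # including the quotes, i.e. a literal with an unrecognized escape.
    value = eval(r'"\d"')

# The backslash is left in the result: two characters, backslash and d.
print(len(value), value == "\\d")  # 2 True

# Version dependent: DeprecationWarning or SyntaxWarning.
print([w.category.__name__ for w in caught])
```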

@PabloDino

$ python --version
Python 3.6.7
$ pytest --version
This is pytest version 5.1.2, imported from ***/pytest.py
$ pytest -vvs nltk/ --collect-only
============================= test session starts ==============================
platform linux -- Python 3.6.7, pytest-5.1.2, py-1.8.0, pluggy-0.12.0 -- ***/python3
cachedir: .pytest_cache
rootdir: ***/nltk
collected 381 items
<Package ***/nltk/test/unit>

Unit tests for nltk.compat.
See also nltk/test/compat.doctest.

Unit tests for nltk.metrics.aline

Test Aline algorithm for aligning phonetic sequences


Test aline for computing the difference between two segments

Tests for Brill tagger.

Test for bug #1597

    Ensures that curly bracket quantifiers can be used inside a chunk rule.
    This type of quantifier has been used for the supplementary example
    in http://www.nltk.org/book/ch07.html#exploring-text-corpora.
Unit tests for nltk.classify. See also: nltk/test/classify.doctest Text constructed using: http://www.nltk.org/book/ch01.html Mock test for Stanford CoreNLP wrappers. Corpus View Regression Tests Class containing unit tests for nltk.metrics.agreement.Disagreement. More advanced test, based on http://www.agreestat.com/research_papers/onkrippendorffalpha.pdf Same more advanced example, but with 1 rating removed. Again, removal of that 1 rating shoudl not matter. Simple test, based on https://github.com/foolswood/krippendorffs_alpha/raw/master/krippendorff.pdf. Same simple test with 1 rating removed. Removal of that rating should not matter: K-Apha ignores items with only 1 rating. Regression tests for `json2csv()` and `json2csv_entities()` in Twitter package. Sanity check that file comparison is not giving false positives. Unit tests for nltk.corpus.nombank Tests for nltk.pos_tag The following test performs a random series of reads, seeks, and tells, and checks that the results are consistent. Unit tests for Senna Unittest for nltk.classify.senna Senna pipeline interface Unittest for nltk.tag.senna this unit testing for test the snowball arabic light stemmer this stemmer deals with prefixes and suffixes Test for bug https://github.com//issues/1581
    Ensures that 'oed' can be stemmed without throwing an error.
  <TestCaseFunction test_vocabulary_martin_mode>
    Tests all words from the test vocabulary provided by M Porter
    
    The sample vocabulary and output were sourced from:
    http://tartarus.org/martin/PorterStemmer/voc.txt
    http://tartarus.org/martin/PorterStemmer/output.txt
    and are linked to from the Porter Stemmer algorithm's homepage
    at
    http://tartarus.org/martin/PorterStemmer/
  <TestCaseFunction test_vocabulary_nltk_mode>
  <TestCaseFunction test_vocabulary_original_mode>
Unit tests for nltk.tgrep. Class containing unit tests for nltk.tgrep. Test error handling of undefined tgrep operators. Test that comments are correctly filtered out of tgrep search strings. Test the Basic Examples from the TGrep2 manual. Test labeled nodes.
    Test case from Emily M. Bender.
  <TestCaseFunction test_multiple_conjs>
    Test that multiple (3 or more) conjunctions of node relations are
    handled properly.
  <TestCaseFunction test_node_encoding>
    Test that tgrep search strings handles bytes and strs the same
    way.
  <TestCaseFunction test_node_nocase>
    Test selecting nodes using case insensitive node names.
  <TestCaseFunction test_node_noleaves>
    Test node name matching with the search_leaves flag set to False.
  <TestCaseFunction test_node_printing>
    Test that the tgrep print operator ' is properly ignored.
  <TestCaseFunction test_node_quoted>
    Test selecting nodes using quoted node names.
  <TestCaseFunction test_node_regex>
    Test regex matching on nodes.
  <TestCaseFunction test_node_regex_2>
    Test regex matching on nodes.
  <TestCaseFunction test_node_simple>
    Test a simple use of tgrep for finding nodes matching a given
    pattern.
  <TestCaseFunction test_node_tree_position>
    Test matching on nodes based on NLTK tree position.
  <TestCaseFunction test_rel_precedence>
    Test matching nodes based on precedence relations.
  <TestCaseFunction test_rel_sister_nodes>
    Test matching sister nodes in a tree.
  <TestCaseFunction test_tokenize_encoding>
    Test that tokenization handles bytes and strs the same way.
  <TestCaseFunction test_tokenize_examples>
    Test tokenization of the TGrep2 manual example patterns.
  <TestCaseFunction test_tokenize_link_types>
    Test tokenization of basic link types.
  <TestCaseFunction test_tokenize_macros>
    Test tokenization of macro definitions.
  <TestCaseFunction test_tokenize_node_labels>
    Test tokenization of labeled nodes.
  <TestCaseFunction test_tokenize_nodenames>
    Test tokenization of node names.
  <TestCaseFunction test_tokenize_quoting>
    Test tokenization of quoting.
  <TestCaseFunction test_tokenize_segmented_patterns>
    Test tokenization of segmented patterns.
  <TestCaseFunction test_tokenize_simple>
    Simple test of tokenization.
  <TestCaseFunction test_trailing_semicolon>
    Test that semicolons at the end of a tgrep2 search string won't
    cause a parse failure.
  <TestCaseFunction test_use_macros>
    Test defining and using tgrep2 macros.
  <TestCaseFunction tests_rel_dominance>
    Test matching nodes based on dominance relations.
  <TestCaseFunction tests_rel_indexed_children>
    Test matching nodes based on their index in their parent node.
Unit tests for nltk.tokenize. See also nltk/test/tokenize.doctest Test padding of asterisk for word tokenization. Test padding of dotdot* for word tokenization. Test a string that resembles a phone number but contains a newline Test remove_handle() from casual.py with specially crafted edge cases Test SyllableTokenizer tokenizer. Test the Stanford Word Segmenter for Arabic (default config) Test the Stanford Word Segmenter for Chinese (default config) Test TreebankWordTokenizer.span_tokenize function Test TweetTokenizer using words with special and accented characters. Test word_tokenize function Tests for static parts of Twitter package Tests that Twitter credentials information from file is handled correctly. Default credentials file is identified Default credentials file has been read correctluy Path to default credentials file is well-formed, given specified subdir. Setting subdir to empty path should raise an error. Setting subdir to `None` should raise an error. Test that environment variable has been read correctly. Credentials file 'bad_oauth1-1.txt' is incomplete First key in credentials file 'bad_oauth1-2.txt' is ill-formed First key in credentials file 'bad_oauth1-2.txt' is ill-formed Setting subdir to nonexistent directory should raise an error. Defaults for authentication will fail since 'credentials.txt' not present in default subdir, as read from `os.environ['TWITTER']`. Credentials file 'foobar' cannot be found in default subdir. Unit tests for nltk.corpus.wordnet See also nltk/test/wordnet.doctest Tests for NgramCounter that only involve lookup, no modification. Unit tests for MLE ngram model. MLE trigram model tests Unit tests for Lidstone class

@stevenbird
Member

I'm seeing the same output as @pombredanne.

@ab-10
Contributor

ab-10 commented Sep 30, 2019

Hi, is @PabloDino still planning to work on the issue?

I have been able to replicate @pombredanne's output and would like to work on fixing this issue.

@PabloDino

PabloDino commented Oct 12, 2019 via email

@gertjanwytynck

@ab-10 Have you been able to fix those deprecation warnings?

@tirkarthi
Contributor

An updated list from Python 3.8, produced by running the command below:

find . -iname '*.py' | xargs -P 4 -I{} python3.8 -Wall -m py_compile {}
./nltk/chat/iesha.py:52: DeprecationWarning: invalid escape sequence \<
  "u think I can%2??! really?? kekeke \<_\<",
./nltk/tag/sequential.py:730: DeprecationWarning: invalid escape sequence \w
  elif re.match("\w+$", word):
./nltk/tag/sequential.py:724: DeprecationWarning: invalid escape sequence \W
  elif re.match("\W+$", word):
./nltk/tag/sequential.py:722: DeprecationWarning: invalid escape sequence \.
  if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word):
./nltk/app/chunkparser_app.py:206: DeprecationWarning: invalid escape sequence \#
  "\t<regexp><\#><CD> # This is a comment...</regexp>\n"
./nltk/app/chunkparser_app.py:315: DeprecationWarning: invalid escape sequence \s
  grammar = re.sub("\n\s+", "\n", grammar)
./nltk/app/chunkparser_app.py:1061: DeprecationWarning: invalid escape sequence \w
  key=lambda t_w: re.match("\w+", t_w[0])
./nltk/app/chunkparser_app.py:1422: DeprecationWarning: invalid escape sequence \#
  "^\# Regexp Chunk Parsing Grammar[\s\S]*" "F-score:.*\n", "", grammar
./nltk/sem/cooper_storage.py:48: DeprecationWarning: invalid escape sequence \P
  """
./nltk/sem/relextract.py:128: DeprecationWarning: invalid escape sequence \w
  ENT = re.compile("&(\w+?);")
./nltk/sem/relextract.py:382: DeprecationWarning: invalid escape sequence \s
  roles = """
./nltk/sem/boxer.py:776: DeprecationWarning: invalid escape sequence \d
  assert re.match("^[exps]\d+$", var), var
./nltk/sem/drt.py:716: DeprecationWarning: invalid escape sequence \ 
  + [" \  " + blank + line for line in term_lines[1:2]]
./nltk/sem/drt.py:717: DeprecationWarning: invalid escape sequence \ 
  + [" /\ " + var_string + line for line in term_lines[2:3]]
./nltk/sem/chat80.py:9: DeprecationWarning: invalid escape sequence \P
  """
./nltk/sem/chat80.py:705: DeprecationWarning: invalid escape sequence \P
  template = "PropN[num=sg, sem=<\P.(P %s)>] -> '%s'\n"
./nltk/sem/evaluate.py:257: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/corpus/reader/util.py:635: DeprecationWarning: invalid escape sequence \d
  if re.match("^\d+-\d+", line) is not None:
./nltk/corpus/reader/util.py:859: DeprecationWarning: invalid escape sequence \s
  if re.match("======+\s*$", line):
./nltk/corpus/reader/framenet.py:2748: DeprecationWarning: invalid escape sequence \w
  """
./nltk/corpus/reader/bracket_parse.py:215: DeprecationWarning: invalid escape sequence \.
  "alpino\.xml",
./nltk/corpus/reader/twitter.py:25: DeprecationWarning: invalid escape sequence \.
  """
./nltk/corpus/reader/xmldocs.py:232: DeprecationWarning: invalid escape sequence \s
  _XML_TAG_NAME = re.compile("<\s*/?\s*([^\s>]+)")
./nltk/corpus/reader/bnc.py:15: DeprecationWarning: invalid escape sequence \w
  """Corpus reader for the XML version of the British National Corpus.
./nltk/corpus/reader/udhr.py:30: DeprecationWarning: invalid escape sequence \-
  ("Abkhaz\-Cyrillic\+Abkh", "cp1251"),
./nltk/corpus/reader/timit.py:165: DeprecationWarning: invalid escape sequence \.
  encoding = [(".*\.wav", None), (".*", encoding)]
./nltk/corpus/reader/childes.py:281: DeprecationWarning: invalid escape sequence \d
  m = re.match("P(\d+)Y(\d+)M?(\d?\d?)D?", age_year)
./nltk/corpus/reader/plaintext.py:47: DeprecationWarning: invalid escape sequence \.
  """
./nltk/corpus/reader/switchboard.py:113: DeprecationWarning: invalid escape sequence \w
  _UTTERANCE_RE = re.compile("(\w+)\.(\d+)\:\s*(.*)")
./nltk/corpus/reader/api.py:77: DeprecationWarning: invalid escape sequence \.
  m = re.match("(.*\.zip)/?(.*)$|", root)
./nltk/corpus/__init__.py:116: DeprecationWarning: invalid escape sequence \.
  ".*\.(test|train).*",
./nltk/corpus/__init__.py:123: DeprecationWarning: invalid escape sequence \.
  ".*\.(test|train).*",
./nltk/corpus/__init__.py:126: DeprecationWarning: invalid escape sequence \.
  crubadan = LazyCorpusLoader("crubadan", CrubadanCorpusReader, ".*\.txt")
./nltk/corpus/__init__.py:128: DeprecationWarning: invalid escape sequence \.
  "dependency_treebank", DependencyCorpusReader, ".*\.dp", encoding="ascii"
./nltk/corpus/__init__.py:311: DeprecationWarning: invalid escape sequence \.
  "timit", TimitTaggedCorpusReader, ".+\.tags", tagset="wsj", encoding="ascii"
./nltk/corpus/__init__.py:335: DeprecationWarning: invalid escape sequence \.
  twitter_samples = LazyCorpusLoader("twitter_samples", TwitterCorpusReader, ".*\.json")
./nltk/corpus/__init__.py:364: DeprecationWarning: invalid escape sequence \.
  wordnet_ic = LazyCorpusLoader("wordnet_ic", WordNetICCorpusReader, ".*\.dat")
./nltk/corpus/__init__.py:374: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/corpus/__init__.py:383: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/corpus/__init__.py:392: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/corpus/__init__.py:401: DeprecationWarning: invalid escape sequence \.
  "frames/.*\.xml",
./nltk/text.py:650: DeprecationWarning: invalid escape sequence \w
  _CONTEXT_RE = re.compile("\w+|[\.\!\?]")
./nltk/inference/discourse.py:9: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/tree.py:38: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/tree.py:652: DeprecationWarning: invalid escape sequence \s
  if re.search("\s", brackets):
./nltk/tree.py:658: DeprecationWarning: invalid escape sequence \s
  node_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
./nltk/tree.py:660: DeprecationWarning: invalid escape sequence \s
  leaf_pattern = "[^\s%s%s]+" % (open_pattern, close_pattern)
./nltk/tree.py:662: DeprecationWarning: invalid escape sequence \s
  "%s\s*(%s)?|%s|(%s)"
./nltk/tree.py:900: DeprecationWarning: invalid escape sequence \$
  reserved_chars = re.compile("([#\$%&~_\{\}])")
./nltk/ccg/combinator.py:220: DeprecationWarning: invalid escape sequence \Y
  """
./nltk/tokenize/toktok.py:53: DeprecationWarning: invalid escape sequence \]
  FUNKY_PUNCT_1 = re.compile(u'([،;؛¿!"\])}»›”؟¡%٪°±©®।॥…])'), r" \1 "
./nltk/tokenize/toktok.py:55: DeprecationWarning: invalid escape sequence \[
  FUNKY_PUNCT_2 = re.compile(u"([({\[“‘„‚«‹「『])"), r" \1 "
./nltk/tokenize/toktok.py:62: DeprecationWarning: invalid escape sequence \|
  PIPE = re.compile("\|"), " &#124; "
./nltk/tokenize/punkt.py:1462: DeprecationWarning: invalid escape sequence \s
  pat = "\s*".join(re.escape(c) for c in tok)
./nltk/tokenize/repp.py:133: DeprecationWarning: invalid escape sequence \(
  line_regex = re.compile("^\((\d+), (\d+), (.+)\)$", re.MULTILINE)
./nltk/tokenize/nist.py:81: DeprecationWarning: invalid escape sequence \{
  PUNCT = re.compile("([\{-\~\[-\` -\&\(-\+\:-\@\/])"), " \\1 "
./nltk/tokenize/nist.py:83: DeprecationWarning: invalid escape sequence \.
  PERIOD_COMMA_PRECEED = re.compile("([^0-9])([\.,])"), "\\1 \\2 "
./nltk/tokenize/nist.py:85: DeprecationWarning: invalid escape sequence \.
  PERIOD_COMMA_FOLLOW = re.compile("([\.,])([^0-9])"), " \\1 \\2"
./nltk/tokenize/treebank.py:194: DeprecationWarning: invalid escape sequence \]
  """
./nltk/tokenize/treebank.py:255: DeprecationWarning: invalid escape sequence \s
  re.compile(pattern.replace("(?#X)", "\s"))
./nltk/tokenize/treebank.py:259: DeprecationWarning: invalid escape sequence \s
  re.compile(pattern.replace("(?#X)", "\s"))
./nltk/tokenize/texttiling.py:96: DeprecationWarning: invalid escape sequence \-
  c for c in lowercase_text if re.match("[a-z\-' \n\t]", c)
./nltk/tokenize/texttiling.py:229: DeprecationWarning: invalid escape sequence \w
  matches = re.finditer("\w+", text)
./nltk/tokenize/regexp.py:76: DeprecationWarning: invalid escape sequence \w
  """
./nltk/tokenize/regexp.py:184: DeprecationWarning: invalid escape sequence \w
  """
./nltk/classify/maxent.py:1292: DeprecationWarning: invalid escape sequence \ 
  """
./nltk/classify/rte_classify.py:61: DeprecationWarning: invalid escape sequence \w
  tokenizer = RegexpTokenizer("[\w.@:/]+|\w+|\$[\d.]+")
./nltk/parse/chart.py:1024: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1057: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1123: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1140: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1213: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/chart.py:1232: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/featurechart.py:251: DeprecationWarning: invalid escape sequence \*
  """
./nltk/parse/featurechart.py:353: DeprecationWarning: invalid escape sequence \*
  """
./nltk/chunk/util.py:371: DeprecationWarning: invalid escape sequence \S
  _LINE_RE = re.compile("(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")
./nltk/chunk/util.py:517: DeprecationWarning: invalid escape sequence \w
  _IEER_TYPE_RE = re.compile('<b_\w+\s+[^>]*?type="(?P<type>\w+)"')
./nltk/chunk/util.py:526: DeprecationWarning: invalid escape sequence \s
  for piece_m in re.finditer("<[^>]+>|[^\s<]+", s):
./nltk/chunk/named_entity.py:178: DeprecationWarning: invalid escape sequence \w
  elif re.match("\w+$", word, re.UNICODE):
./nltk/chunk/named_entity.py:176: DeprecationWarning: invalid escape sequence \W
  elif re.match("\W+$", word, re.UNICODE):
./nltk/chunk/named_entity.py:174: DeprecationWarning: invalid escape sequence \.
  if re.match("[0-9]+(\.[0-9]*)?|[0-9]*\.[0-9]+$", word, re.UNICODE):
./nltk/chunk/named_entity.py:250: DeprecationWarning: invalid escape sequence \s
  text = re.sub("[\s\S]*<TEXT>", subfunc, text)
./nltk/chunk/named_entity.py:251: DeprecationWarning: invalid escape sequence \s
  text = re.sub("</TEXT>[\s\S]*", "", text)
./nltk/chunk/regexp.py:70: DeprecationWarning: invalid escape sequence \{
  _BRACKETS = re.compile("[^\{\}]+")
./nltk/chunk/regexp.py:215: DeprecationWarning: invalid escape sequence \{
  s = re.sub("\{\}", "", s)
./nltk/chunk/regexp.py:426: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "{\g<chunk>}", descr)
./nltk/chunk/regexp.py:471: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "}\g<chink>{", descr)
./nltk/chunk/regexp.py:510: DeprecationWarning: invalid escape sequence \{
  regexp = re.compile("\{(?P<chunk>%s)\}" % tag_pattern2re_pattern(tag_pattern))
./nltk/chunk/regexp.py:511: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "\g<chunk>", descr)
./nltk/chunk/regexp.py:575: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "\g<left>", descr)
./nltk/chunk/regexp.py:708: DeprecationWarning: invalid escape sequence \{
  "(?P<left>%s)\{(?P<right>%s)"
./nltk/chunk/regexp.py:714: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "{\g<left>\g<right>", descr)
./nltk/chunk/regexp.py:778: DeprecationWarning: invalid escape sequence \}
  "(?P<left>%s)\}(?P<right>%s)"
./nltk/chunk/regexp.py:784: DeprecationWarning: invalid escape sequence \g
  RegexpChunkRule.__init__(self, regexp, "\g<left>\g<right>}", descr)
./nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
  r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")
./nltk/chunk/regexp.py:896: DeprecationWarning: invalid escape sequence \{
  r"^((%s|<%s>)*)$" % ("([^\{\}<>]|\{\d+,?\}|\{\d*,\d+\})+", "[^\{\}<>]+")
./nltk/chunk/regexp.py:1136: DeprecationWarning: invalid escape sequence \.
  """
./nltk/featstruct.py:1295: DeprecationWarning: invalid escape sequence \d
  name, n = re.sub("\d+$", "", var.name), 2
./nltk/featstruct.py:2091: DeprecationWarning: invalid escape sequence \d
  RANGE_RE = re.compile("(-?\d+):(-?\d+)")
./nltk/draw/cfg.py:166: DeprecationWarning: invalid escape sequence \s
  _ARROW_RE = re.compile("\s*(->|(" + ARROW + "))\s*")
./nltk/draw/cfg.py:166: DeprecationWarning: invalid escape sequence \s
  _ARROW_RE = re.compile("\s*(->|(" + ARROW + "))\s*")
./nltk/draw/cfg.py:171: DeprecationWarning: invalid escape sequence \s
  + "))\s*"
./nltk/toolbox.py:159: DeprecationWarning: invalid escape sequence \_
  """
./nltk/grammar.py:1278: DeprecationWarning: invalid escape sequence \*
  """
./nltk/grammar.py:1463: DeprecationWarning: invalid escape sequence \w
  _STANDARD_NONTERM_RE = re.compile("( [\w/][\w/^<>-]* ) \s*", re.VERBOSE)
./nltk/stem/porter.py:145: DeprecationWarning: invalid escape sequence \m
  """Returns the 'measure' of stem, per definition in the paper
./nltk/stem/lancaster.py:192: DeprecationWarning: invalid escape sequence \*
  valid_rule = re.compile("^[a-z]+\*?\d[a-z]*[>\.]?$")
./nltk/stem/lancaster.py:225: DeprecationWarning: invalid escape sequence \*
  valid_rule = re.compile("^([a-z]+)(\*?)(\d)([a-z]*)([>\.]?)$")
./nltk/treetransforms.py:8: DeprecationWarning: invalid escape sequence \ 
  """
./tools/nltk_term_index.py:52: DeprecationWarning: invalid escape sequence \s
  SCAN_RE1 = "<programlisting>[\s\S]*?</programlisting>"
./tools/nltk_term_index.py:53: DeprecationWarning: invalid escape sequence \s
  SCAN_RE2 = "<literal>[\s\S]*?</literal>"
./tools/nltk_term_index.py:56: DeprecationWarning: invalid escape sequence \w
  TOKEN_RE = re.compile('[\w\.]+')
./tools/find_deprecated.py:43: DeprecationWarning: invalid escape sequence \s
  '"""[\s\S]*?"""|'
./tools/find_deprecated.py:45: DeprecationWarning: invalid escape sequence \s
  "'''[\s\S]*?'''|"
./tools/find_deprecated.py:47: DeprecationWarning: invalid escape sequence \s
  ")\s*"
./tools/find_deprecated.py:64: DeprecationWarning: invalid escape sequence \.
  '({})\.read\('.format('|'.join(re.escape(n) for n in dir(nltk.corpus)))
./tools/find_deprecated.py:67: DeprecationWarning: invalid escape sequence \s
  CLASS_DEF_RE = re.compile('^\s*class\s+(\w+)\s*[:\(]')
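
Most of the warnings listed above can probably be silenced by making the pattern a raw string or by doubling the backslashes. Below is a minimal sketch using the `RANGE_RE` pattern from `featstruct.py`; the name `RANGE_RE_ESCAPED` is made up for illustration and is not in the codebase:

```python
import re

# Before (triggers the warning): re.compile("(-?\d+):(-?\d+)")

# Option 1: raw string, so the backslash reaches the regex engine untouched.
RANGE_RE = re.compile(r"(-?\d+):(-?\d+)")

# Option 2: doubled backslashes in an ordinary string (hypothetical name).
RANGE_RE_ESCAPED = re.compile("(-?\\d+):(-?\\d+)")

# Both spellings produce the same pattern text.
assert RANGE_RE.pattern == RANGE_RE_ESCAPED.pattern

match = RANGE_RE.match("-3:12")
print(match.groups())
```

Where the warning points at a `"""` line, the offending backslash is inside a docstring, so the same two options apply there (an `r"""` prefix or doubled backslashes). One caveat: a raw prefix also stops `\n` and `\t` from being processed by Python. That is harmless in a regex pattern, because `re` interprets those escapes itself, but it would change the meaning of a non-regex string.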

@ab-10
Contributor

ab-10 commented Jan 21, 2020

@gertjanwytynck I'm currently fixing them one by one and should be done by the end of the week.

@morrme

morrme commented Oct 19, 2020

Has this been completed?

@pombredanne
Contributor Author

It looks like there are still a few left. I wonder if adding a unit test could help.

  • ./nltk/tools/nltk_term_index.py
  • ./nltk/tools/find_deprecated.py
  • ./nltk/nltk/tokenize/punkt.py

... and even though deprecation warnings in the tools have little impact, there is a bit of irony in the find_deprecated.py script itself using deprecated syntax :)

$ git clone https://github.com/nltk/nltk.git
$ find . -iname '*.py' | xargs -P 4 -I{} python3.8 -Wall -m py_compile {}
./nltk/tools/nltk_term_index.py:51: DeprecationWarning: invalid escape sequence \s
  SCAN_RE1 = "<programlisting>[\s\S]*?</programlisting>"
./nltk/tools/nltk_term_index.py:52: DeprecationWarning: invalid escape sequence \s
  SCAN_RE2 = "<literal>[\s\S]*?</literal>"
./nltk/tools/nltk_term_index.py:55: DeprecationWarning: invalid escape sequence \w
  TOKEN_RE = re.compile('[\w\.]+')
./nltk/tools/find_deprecated.py:42: DeprecationWarning: invalid escape sequence \s
  '"""[\s\S]*?"""|'
./nltk/tools/find_deprecated.py:44: DeprecationWarning: invalid escape sequence \s
  "'''[\s\S]*?'''|"
./nltk/tools/find_deprecated.py:46: DeprecationWarning: invalid escape sequence \s
  ")\s*"
./nltk/tools/find_deprecated.py:63: DeprecationWarning: invalid escape sequence \.
  '({})\.read\('.format('|'.join(re.escape(n) for n in dir(nltk.corpus)))
./nltk/tools/find_deprecated.py:66: DeprecationWarning: invalid escape sequence \s
  CLASS_DEF_RE = re.compile('^\s*class\s+(\w+)\s*[:\(]')
./nltk/nltk/tokenize/punkt.py:223: DeprecationWarning: invalid escape sequence \]
  return "(?:[)\";}\]\*:@\'\({\[%s])" % re.escape("".join(set(self.sent_end_chars) - {"."}))
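
The unit test idea mentioned above could be sketched roughly as follows. This is an assumption about how such a test might look, not something that exists in the repository: the helper `find_invalid_escapes`, the test name, and the `"nltk"` directory path are all hypothetical. It compiles each file with warnings recorded and keeps only the escape-sequence ones:

```python
import warnings
from pathlib import Path


def find_invalid_escapes(root):
    """Return (path, line, message) for each invalid-escape warning under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        with warnings.catch_warnings(record=True) as caught:
            # Record every warning instead of printing or deduplicating it.
            warnings.simplefilter("always")
            compile(source, str(path), "exec")
        for w in caught:
            # The category differs between Python versions, so match on the text.
            if "invalid escape sequence" in str(w.message):
                hits.append((str(path), w.lineno, str(w.message)))
    return hits


def test_no_invalid_escape_sequences():
    # "nltk" is an assumed path to the package directory from the repo root.
    assert find_invalid_escapes("nltk") == []
```

Matching on the message text rather than the warning class is a deliberate choice here, since newer Python versions report these as `SyntaxWarning` instead of `DeprecationWarning`.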
