This repository has been archived by the owner on Sep 3, 2021. It is now read-only.

Bump chardet from 3.0.4 to 4.0.0 #163

Open
wants to merge 1 commit into
base: dev-indep
from

Conversation

dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Dec 11, 2020

Bumps chardet from 3.0.4 to 4.0.0.

Release notes

Sourced from chardet's releases.

chardet 4.0.0

⚠️ This will be the last release of chardet to support Python 2.7. chardet 5.0 will only support 3.6+ ⚠️

Major Changes

This release is multiple years in the making, and provides some quality of life improvements to chardet. The primary user-facing changes are:

  1. Single-byte charset probers now use nested dictionaries under the hood, so they are usually a little faster than before. (See #121 for details)
  2. The CharsetGroupProber class now properly short-circuits when one of the probers in the group is considered a definite match. This led to a substantial speedup.
  3. There is now a chardet.detect_all function that returns a list of possible encodings for the input with associated confidences.
  4. We have dropped support for Python 2.6, 3.4, and 3.5 as they are all past end-of-life.
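
To illustrate the short-circuiting described in item 2, here is a minimal sketch using only the standard library. It is not chardet's code: `FixedProber` and `GroupProber` are hypothetical stand-ins, and the `ProbingState` enum is only modeled loosely on the `FOUND_IT` state named in the commit list. The idea shown is that the group stops feeding its remaining probers as soon as one reports a definite match.

```python
from enum import Enum


class ProbingState(Enum):
    """Hypothetical states, loosely modeled on the FOUND_IT state in the notes."""
    DETECTING = 0
    FOUND_IT = 1
    NOT_ME = 2


class FixedProber:
    """Hypothetical stand-in prober that always reports a preset state."""

    def __init__(self, name, state, confidence):
        self.name = name
        self.confidence = confidence
        self.calls = 0  # counts how often feed() was invoked
        self._state = state

    def feed(self, data):
        self.calls += 1
        return self._state


class GroupProber:
    """Hypothetical group that short-circuits on the first definite match."""

    def __init__(self, probers):
        self.probers = list(probers)
        self.state = ProbingState.DETECTING
        self.best = None

    def feed(self, data):
        candidates = []
        for prober in self.probers:
            state = prober.feed(data)
            if state is ProbingState.FOUND_IT:
                # Definite match: record it and skip the remaining probers.
                self.best = prober
                self.state = ProbingState.FOUND_IT
                return self.state
            if state is not ProbingState.NOT_ME:
                candidates.append(prober)
        # No definite match: keep the most confident prober still in play.
        self.best = max(candidates, key=lambda p: p.confidence, default=None)
        if self.best is None:
            self.state = ProbingState.NOT_ME
        return self.state


probers = [
    FixedProber("a", ProbingState.DETECTING, 0.4),
    FixedProber("b", ProbingState.FOUND_IT, 0.99),
    FixedProber("c", ProbingState.DETECTING, 0.6),
]
group = GroupProber(probers)
group.feed(b"sample bytes")
print(group.best.name, [p.calls for p in probers])  # prints: b [1, 1, 0]
```

The third prober is never fed, which is where the saving comes from when a group holds many probers.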

The changes in this release have also laid the groundwork for retraining the models to make them more accurate, and to support some more encodings/languages (see #99 for progress). This is our main focus for chardet 5.0 (beyond dropping Python 2 support).

Benchmarks

Running on a MacBook Pro (15-inch, 2018) with 2.2GHz 6-core i7 processor and 32GB RAM

old version (chardet 3.0.4)

Benchmarking chardet 3.0.4 on CPython 3.7.5 (default, Sep  8 2020, 12:19:42)
[Clang 11.0.3 (clang-1103.0.32.62)]
--------------------------------------------------------------------------------
Calls per second for each encoding:
ascii: 25559.439366240098
big5: 7.187002209518091
cp932: 4.71090956645177
cp949: 2.937256786994428
euc-jp: 4.870580412090848
euc-kr: 6.6910755971933416
euc-tw: 87.71098043480079
gb2312: 6.614302607154443
ibm855: 27.595893549680685
ibm866: 29.93483661732791
iso-2022-jp: 3379.5052775763434
iso-2022-kr: 26181.67290886392
iso-8859-1: 120.63424740403983
iso-8859-5: 32.65106262196898
iso-8859-7: 62.480089080556084
koi8-r: 13.72481001727257
maccyrillic: 33.018537255804496
shift_jis: 4.996013583677438
tis-620: 14.323112928341818
utf-16: 166771.53081510935
utf-32: 198782.18009478672
utf-8: 13.966236809766901
utf-8-sig: 193732.28637413395
windows-1251: 23.038910006925768

... (truncated)
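
For readers who want to reproduce "calls per second" figures like those in the benchmark log, the following is a rough sketch of one way to measure them. It is an assumption about the general approach, not the script the chardet maintainers used; `calls_per_second` and its `min_duration` parameter are hypothetical names, and the demo times a plain UTF-8 decode so that it runs without chardet installed.

```python
import time


def calls_per_second(func, data, min_duration=0.05):
    """Call func(data) repeatedly for at least min_duration seconds.

    Returns the average number of completed calls per second.
    """
    calls = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_duration:
        func(data)
        calls += 1
        elapsed = time.perf_counter() - start
    return calls / elapsed


# Stand-in workload: decode a UTF-8 byte string (not a chardet call).
sample = "naïve café ".encode("utf-8") * 100
rate = calls_per_second(lambda raw: raw.decode("utf-8"), sample)
print(f"utf-8 decode: {rate} calls per second")
```

With chardet installed, passing `chardet.detect` as `func` and a sample file's bytes as `data` would give a per-input rate comparable in spirit to the numbers above, though absolute values depend on the machine and the input.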

Commits
  • a808ed1 Merge pull request #140 from chardet/master
  • 53854fb Add language to detect_all output
  • 1e208b7 Properly set CharsetGroupProber.state to FOUND_IT (#203)
  • a9286f7 Try to switch from Travis to GitHub Actions (#204)
  • 1db0347 Handle weird logging edge case in universaldetector.py
  • 056a2a4 Remove shebang and executable bit from chardet/cli/chardetect.py (#171)
  • 55ef330 Update links (#152)
  • e4290b6 Remove unnecessary numeric placeholders from format strings (#176)
  • 6a59c4b Remove use of deprecated 'setup.py test' (#187)
  • 4650dbf Remove shebang from nonexecutable script (#192)
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot bot added the dependencies and python labels Dec 11, 2020
@dependabot dependabot bot force-pushed the dependabot/pip/chardet-4.0.0 branch from 8e7c9cb to 508b780 December 11, 2020 08:41
Bumps [chardet](https://github.com/chardet/chardet) from 3.0.4 to 4.0.0.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Commits](chardet/chardet@3.0.4...4.0.0)

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot force-pushed the dependabot/pip/chardet-4.0.0 branch from 508b780 to 66d4047 February 3, 2021 07:38