Bytes and string handling in Python 3
Python 3 handles strings and bytes very differently from Python 2 (you can read more about this in this article or in this Pragmatic Unicode talk). Basically, strings (`str`) in Python 3 are Unicode by default, and “bytes” (`bytes`) are immutable sequences of integers from 0 to 255 (each integer is one 8-bit byte). There is no implicit conversion between `str` and `bytes` in Python 3, so any conversion needs to be done explicitly using the `encode` (`str` → `bytes`) and `decode` (`bytes` → `str`) methods.
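As a minimal sketch of the explicit conversions described above (the sample text is arbitrary and only for illustration):

```python
text = 'café'

# str -> bytes: encode explicitly, naming the encoding.
data = text.encode('utf-8')
print(type(data))   # <class 'bytes'>
print(list(data))   # [99, 97, 102, 195, 169]

# The non-ASCII character takes two bytes in UTF-8, so the lengths differ.
print(len(text), len(data))  # 4 5

# bytes -> str: decode explicitly with the same encoding.
round_trip = data.decode('utf-8')

# There is no implicit conversion: mixing the two types raises TypeError.
try:
    text + data
    mixing_failed = False
except TypeError:
    mixing_failed = True
```

Note that `len()` counts characters for `str` and bytes for `bytes`, which is one reason the two types should not be treated as interchangeable.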
Throughout Oppia, we typically use strings. However, you may come across bytes in places where there is an interaction with some outside library or API — for example, when standard input or output is read or written, or when data is read from or written to files. Some standard Python libraries also only accept bytes.
The general rule you should follow is to keep all text in Oppia as strings, where possible. If a conversion to bytes is necessary, that conversion should happen as close to the “edges” of the app as possible. So, for example:
- When you receive bytes from some library, immediately convert them to a string using `decode`.
- If you need to use a function that needs bytes, use `encode` to convert the string to bytes immediately before you call the function.
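The two rules above could look like the following sketch. The helper names `compute_text_hash` and `decode_base64_text` are hypothetical and are not taken from the Oppia codebase; `hashlib` and `base64` are used only as examples of standard libraries that work with bytes.

```python
import base64
import hashlib


def compute_text_hash(text: str) -> str:
    """Hypothetical helper: takes and returns str.

    hashlib only accepts bytes, so the string is encoded immediately
    before the call and nowhere else.
    """
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


def decode_base64_text(encoded_text: str) -> str:
    """Hypothetical helper: takes and returns str.

    base64.b64decode returns bytes, so the result is decoded to str
    immediately, before it leaves this function.
    """
    raw_bytes = base64.b64decode(encoded_text)
    return raw_bytes.decode('utf-8')


print(decode_base64_text('aGVsbG8='))  # hello
```

With this shape, callers only ever see `str`, and the bytes stay confined to the lines that touch the external library.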
In the Oppia codebase, all data whose encoding we control should be encoded and decoded using UTF-8 (for example, `encode('utf-8')`). If you find a case where UTF-8 cannot be used, please raise this with the Core Maintainers team.
If, in some case, an external source returns or receives data with a different encoding, it is fine to use that encoding only for that source. However, please first be sure to investigate whether that source can be configured to use utf-8 instead.
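As an illustration of confining a different encoding to a single source, here is a sketch that assumes a hypothetical external source which can only return Latin-1 bytes. The constant and function names are invented for this example.

```python
# Assumption: this particular source cannot be configured to use utf-8.
LEGACY_SOURCE_ENCODING = 'latin-1'


def read_legacy_payload(raw_bytes: bytes) -> str:
    """Hypothetical helper: decodes bytes from the legacy source only."""
    return raw_bytes.decode(LEGACY_SOURCE_ENCODING)


# Simulated payload from the legacy source.
payload = 'café'.encode(LEGACY_SOURCE_ENCODING)

# Decode with the source's own encoding at the edge...
text = read_legacy_payload(payload)

# ...and use utf-8 everywhere else.
stored = text.encode('utf-8')

# Decoding the legacy payload as utf-8 fails, which is why the source's
# encoding has to be named explicitly at this one boundary.
try:
    payload.decode('utf-8')
    utf8_decode_failed = False
except UnicodeDecodeError:
    utf8_decode_failed = True
```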