Bytes and string handling in Python 3

Introduction

Python 3 handles strings and bytes very differently from Python 2 (you can read more about this in this article or in this Pragmatic Unicode talk). In Python 3, strings (str) are sequences of Unicode code points, while bytes (bytes) are immutable sequences of integers from 0 to 255 (that is, 8-bit values). There is no implicit conversion between str and bytes in Python 3, so every conversion must be done explicitly using the encode (str → bytes) and decode (bytes → str) methods.
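As a minimal sketch of the explicit conversions described above (the sample text and variable names are illustrative only and are not taken from the Oppia codebase):

```python
# Illustrative sample text; '\u00e9' is the precomposed character "é".
text = 'caf\u00e9'

# str -> bytes: encode produces an immutable sequence of integers (0 to 255).
data = text.encode('utf-8')
byte_values = list(data)  # [99, 97, 102, 195, 169]

# bytes -> str: decode reverses the conversion.
restored = data.decode('utf-8')

# Python 3 never converts implicitly, so mixing the two types fails.
try:
    text + data
    mixing_error = None
except TypeError as error:
    mixing_error = error
```

Note that the four-character string becomes five bytes, because the non-ASCII character takes two bytes in utf-8.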

Throughout Oppia, we typically use strings. However, you may come across bytes in places where there is an interaction with some outside library or API — for example, when standard input or output is read or written, or when data is read from or written to files. Some standard Python libraries also only accept bytes.

Rules for handling strings and bytes

Bytes outside, strings inside

The general rule you should follow is to keep all text in Oppia as strings, where possible. If a conversion to bytes is necessary, that conversion should happen as close to the “edges” of the app as possible. So, for example:

When you receive bytes from a library, immediately convert them to a string using decode.
If you need to call a function that requires bytes, use encode to convert the string to bytes immediately before the call.
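The two rules above might look like this in practice. This is only a sketch: to_base64 is a hypothetical helper, not an existing Oppia function, and the standard-library base64 module stands in for any library that accepts and returns bytes.

```python
import base64


def to_base64(text: str) -> str:
    """Hypothetical helper: takes and returns str, using bytes only at the edge."""
    # Encode immediately before calling the bytes-only function.
    raw = text.encode('utf-8')
    encoded = base64.b64encode(raw)  # bytes in, bytes out
    # Decode immediately after receiving bytes from the library.
    return encoded.decode('utf-8')


result = to_base64('hello')  # 'aGVsbG8='
```

Callers of a helper like this only ever see strings, so the bytes never spread into the rest of the code.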

Use utf-8 (or ascii)

In the Oppia codebase, all data whose encoding we control should be encoded and decoded using utf-8 (for example, encode('utf-8') and decode('utf-8')). If you find a case where utf-8 cannot be used, please raise this with the Core Maintainers team.
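For file access, one way to follow this rule is to pass the encoding explicitly rather than rely on the platform default. The following is a sketch using a temporary file; the file name and sample text are made up for illustration.

```python
import os
import tempfile

text = 'na\u00efve \u2713'  # Illustrative text containing non-ASCII characters.

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, 'example.txt')  # Hypothetical file name.

    # Text mode with an explicit encoding: Python encodes on write.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # Binary mode returns the raw bytes stored on disk.
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode with the same encoding decodes back to str.
    with open(path, 'r', encoding='utf-8') as f:
        round_tripped = f.read()
```

Here raw equals text.encode('utf-8') and round_tripped equals the original string.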

If, in some case, an external source returns or receives data with a different encoding, it is fine to use that encoding only for that source. However, please first be sure to investigate whether that source can be configured to use utf-8 instead.
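As a sketch of confining a different encoding to a single source, suppose a hypothetical external service returns ISO-8859-1 bytes and cannot be reconfigured. The function name, the service, and its encoding are all assumptions made for illustration.

```python
def decode_legacy_response(raw: bytes) -> str:
    """Hypothetical adapter for a source assumed to emit ISO-8859-1."""
    # The source-specific encoding is used only here, at the edge.
    return raw.decode('iso-8859-1')


legacy_bytes = b'caf\xe9'  # "café" as ISO-8859-1 bytes.
text = decode_legacy_response(legacy_bytes)

# Everywhere else, the text is encoded as utf-8 as usual.
stored = text.encode('utf-8')

# Decoding the legacy bytes as utf-8 would fail, which is why the
# adapter must know the encoding of its source.
try:
    legacy_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```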

