Python Styleguide

Python

We follow the PEP8 style guide for Python. Docstrings follow PEP257. The rest of the document describes additions and clarifications to the PEP documents that we follow at MapStory. This document was compiled from many sources, primarily from the Khan Academy Python Guidelines. A complete list of sources will be listed at the bottom of this document.

Django

We also follow the guidelines set out by Django itself. You can review these at https://docs.djangoproject.com/en/1.8/internals/contributing/writing-code/coding-style/.

Modular Code

Write your code in highly modular, pluggable django applications. These will go in the mapstory-geonode/mapstory/apps directory to keep all of the code clean and easy to read. Each app should have it's own admin, models, tests, urls, views, migrations and templates. You should be able to move the functionality of the app into any other django project and know that it's going to work.

The journal app is an example of proper structure - https://github.com/MapStory/mapstory-geonode/tree/master/mapstory/journal

Indentation

Use 4 spaces -- never tabs -- for indentation.

This is a strict rule and ignoring this can (has) cause(d) bugs.

init.py

__init__.py files should be empty.

Rationale: when you do import foo.bar python imports two files: foo/bar.py, and foo/__init__.py. If foo/__init__.py has imports of its own, those will be run as well -- even if you don’t plan to run any of the code defined in __init__.py. This slows down execution, and worse causes circular-import problems that could be entirely avoided.

If you have code that you think every user of every function inside this directory needs to run first, __init__.py may be appropriate, but you should also consider just creating a function that executes that code, and running the function at the top level (that is, not indented) inside each file in your directory. This makes it more obvious what's going on, and also makes it easier to special-case certain files if the need ever arises.

Using __init__.py to bring variables from sub-modules into the main module space totally defeats the point of having sub-modules in the first place; don’t do it.

For more discussion, see http://stackoverflow.com/questions/1944569/how-do-i-write-good-correct-init-py-files.

TODO: Ask Tyler why handlers are stored in init.py for importer project.

Top of the file

Start your file with a module docstring. Do not put a shebang line (#!/usr/bin/python) or copyright notice, or anything else. Follow the docstring with your imports; don't put an __author__ line.

Exception: if the python file is meant to be executable, it should start with the following shebang line:

#!/usr/bin/env python

Rationale: a shebang line is useless for non-executable files. An __author__ line just gets out of date, and is better determined by looking at source control history in any case.

Imports

from...import

Only import entire modules, never individual symbols from a module. For top-level modules, that means import foo. For sub-modules, you can do either import foo.bar or from foo import bar.

import auth_util                      # module import: importing the file auth_util.py
import auth.oauth_credentials         # module import: importing the file auth/oauth_credentials.py
from auth import oauth_credentials    # module import: importing the file auth/oauth_credentials.py
from auth_util import get_response    # BAD: symbol import: importing the function get_response from auth_util.py
from auth_util import Authorize       # BAD: symbol import: importing the class Authorize from auth_util.py
from auth_util import AUTH_HOST       # BAD: symbol import: importing the variable AUTH_HOST from auth_util.py

Exception: for third-party code, where the module documentation explicitly says to import individual symbols.

If the basename of a sub-module is generic, prefer the import form to the from form, otherwise either is fine:

import auth.models                   # code uses auth.models.Token -- clear

from auth import models              # code uses models.Token -- ambiguous!

from auth import oauth_credentials   # code uses oauth_credentials.Token -- clear

Rationale: This is the single best -- and easiest -- way to avoid the circular-import problem. To simplify, when you say import x, Python executes the x.py file, but doesn't need to get any attributes from it. When you say from x import foo, Python needs x.py to be executed enough that foo is defined in it. When x does a 'from-import' from y, and y does a 'from-import' from x, the circular import can succeed if a partial module is enough (as it will be with import x), but it can fail if the circularity happens before the needed attribute is defined (as it might be with from x import foo).

Side note: While this rule helps with most circular-import problems, it doesn’t help with all: a module x may still need symbols from a module y while x is still being imported. For instance, if you say class xclass(y.yclass): ..., Python needs the yclass attribute at import-time.

The downside of this rule is that code gets more verbose: where before you could do

from possibly_a_very_long_name import MyClass
[...]
m = MyClass(...)

now you have to do

import possibly_a_very_long_name
[...]
m = possibly_a_very_long_name.MyClass(...)

I argue, though, this verbiage is beneficial: in the same way that self.xxx is an immediate sign at the point of use that xxx is a member of the current class, x.MyClass is an immediate sign that MyClass comes from file x (which often tells you more about what MyClass is for, and makes it easier to find in the source as well). This often tells you enough that you can continue reading the code, without needing a context switch to look up the where MyClass is defined and what it says.

Import style

All imports should be at the top of the file, separated (by blank lines) into four sections:

Standard library imports.
Imports from core Django.
Imports from third-party apps including those unrelated to Django.
Imports from the internal MapStory apps that have been created.

All non-system imports should be specified relative to the root of the ka python tree; do not use absolute_import. Each section should be sorted alphabetically. Only one import should be on each line.

Rationale: When I see autocomplete.foo() in the code, and I want to know what it does, it’s helpful to know if I should be looking on the web (because autocomplete is part of the python distribution), or in the local source tree (because autocomplete is written by us). It’s also helpful to know if it’s code we wrote (and the barrier to hacking on it is low) or code someone else wrote (which means it may be easier to just work around any problems I have with it). The three sections tell me that with just a quick glance at the top of the file. Plus, since each section is alphabetical, it’s easy for me to find the import within the section.

Alphabetical sorting is by the main module name (so second word of the line), and ignores case:

import api
from app import App
from app import Zebra
import auth_models
import NonExistentModule
[...]

Here are some constructs that are not consistent with this style rule:

from app import App, Zebra    # BAD: two imports on the same line
import .models                # BAD: relative import.  Alternative: from <curmodule> import models

Docstrings

All non-trivial methods should have docstrings. Docstrings should follow guidelines here: PEP257. For more examples, see the Google style guide around docstrings.

To summarize: There are two types of docstrings, long-form and short-form.

A short-form docstring fits entirely on one line, including the triple quotes. It is used for simple functions, especially (though by no means exclusively) ones that are not part of a public API:

"""Return a user-readable form of a Frobnozz, html-escaped."""

Note that the text is specified as an action (“return”) rather than a description (“returns”). This has the added advantage of taking less space, so the comment is more likely to fit on a single line. :-)

If the description spills past one line, you should move to the long-form docstring: a summary line (one physical line) starting with a triple-quote ("""), terminated by a period, question mark, or exclamation point, followed by a blank line, followed by the rest of the doc string starting at the same cursor position as the first quote of the first line, ending with triple-quotes on a line by themselves. (Unlike what the BDFL suggests in PEP 257, we do not put a blank line before the ending quotes.)

"""This comment serves to demonstrate the format of a docstring.

Note that the summary line is always at most one line long, and
on the same line as the opening triple-quote, and with no spaces
between the triple-quote and the text.  Always use double-quotes
(") rather than single-quotes (') for your docstring.   (This
lets us reserve ''' (3 single-quotes) as a way of commenting out
big blocks of code.
"""

A function (including methods and generators) must have a docstring, unless it meets all of the following criteria:

not externally visible
very short
obvious

The docstring should describe the function's calling syntax and its semantics, not its implementation.

The docstring should end with the following special sections (see the Google style guide for more details).

Arguments: List each parameter by name, and a description of it. The description can span several lines (use a hanging indent if so). Use instead of "Args".
Returns: (or Yields: for generators): Describe the type and semantics of the return value. If the function only returns None, this section is not required.
Raises: List all exceptions that are relevant to the interface.

Classes should follow a similar format: a single line describing the class, plus more details, but instead of Arguments/Returns/Raises, it should have an Attributes: section that lists and describes the public attributes of the class (if any).

Modules (files) should have a docstring too, at the top of the file, starting with the usual one-line summary:

"""One line summary.

Longer description.
"""

Rationale: People will read a piece of code many more times than they will write it. Time spent documenting at write-time more than pays off at read time. What is obvious to you as the code-author, well versed in the module where this function lives, may not be at all obvious to a code reader, who is possibly jumping into this function from some unrelated part of the codebase.

The rules here may seem like overkill, especially the need to document every argument and return value. I can say from experience two things: it often does seem like overkill when writing it (especially when the docstring is longer than the function!) but I've almost never thought it was overkill when reading unfamiliar code. You may find, as you write the docstring, you're putting down something that wasn't as obvious as you thought it was:

def WriteTimestamp(now):
    """Write the current time to stdout in an arbitrary user-readable format.

    Arguments:
        now: the current time as a time_t. Should be time.time() except for tests.
    """
    ...

Even though the meaning of now may seem obvious, it's not obvious that it's only being passed in so it can be mocked out for tests. Its type also isn't obvious (a time_t? a tuple? a datetime object?), so it's nice to have that documented as well.

Unused variables

If you want to assign a value to a variable that will not be used again, name the variable either _ (python convention) or unused_<something> (less-well-known python convention). This will keep our lint checkers from complaining.

Naming

Variables, functions, methods, packages, modules
- lower_case_with_underscores
Classes and Exceptions
- CapWords
Protected methods and internal functions
- _single_leading_underscore(self, ...)
Private methods
- __double_leading_underscore(self, ...)
Constants
- ALL_CAPS_WITH_UNDERSCORES

General Naming Guidelines

Avoid one-letter variables (esp. l, O, I).

Avoid redundant labeling.

Yes

import audio

core = audio.Core()
controller = audio.Controller()

No

import audio

core = audio.AudioCore()
controller = audio.AudioController()

Prefer "reverse notation".

Yes

elements = ...
elements_active = ...
elements_defunct = ...

No

elements = ...
active_elements = ...
defunct_elements ...

Avoid getter and setter methods.

Yes

person.age = 42

No

person.set_age(42)

Todo comments

Use TODO comments for code that is temporary, a short-term solution, or good-enough but not perfect. TODOs should include the string TODO in all caps, followed by the name, e-mail address, or other identifier of the person who can best provide context about the problem referenced by the TODO, in parentheses. A colon is optional. A comment explaining what there is to do is required. The main purpose is to have a consistent TODO format that can be searched to find the person who can provide more details upon request. A TODO is not a commitment that the person referenced will fix the problem. Thus when you create a TODO, it is almost always your name that is given.

TODO(first_name): Use a "*" here for string repetition.
TODO(first_name): Change this to use relations.

If your TODO is of the form "At a future date do something" make sure that you either include a very specific date ("Fix by November 2009") or a very specific event ("Remove this code when all clients can handle XML responses.").

Testing

Strive for 100% code coverage, but don't get obsessed over the coverage score.

General testing guidelines

Use long, descriptive names. This often obviates the need for doctrings in test methods.
Tests should be isolated. Don't interact with a real database or network. Use a separate test database that gets torn down or use mock objects.
Prefer factories to fixtures.
Never let incomplete tests pass, else you run the risk of forgetting about them. Instead, add a placeholder like assert False, "TODO: finish me".

Unit Tests

Focus on one tiny bit of functionality.
Should be fast, but a slow test is better than no test.
It often makes sense to have one testcase class for a single class or model.

import unittest
import factories

class PersonTest(unittest.TestCase):
    def setUp(self):
        self.person = factories.PersonFactory()

    def test_has_age_in_dog_years(self):
        self.assertEqual(self.person.dog_years, self.person.age / 7)

Functional Tests

Functional tests are higher level tests that are closer to how an end-user would interact with your application. They are typically used for web and GUI applications.

Write tests as scenarios. Testcase and test method names should read like a scenario description.
Use comments to write out stories, before writing the test code.

import unittest

class TestAUser(unittest.TestCase):

    def test_can_write_a_blog_post(self):
        # Goes to the her dashboard
        ...
        # Clicks "New Post"
        ...
        # Fills out the post form
        ...
        # Clicks "Submit"
        ...
        # Can see the new post
        ...