
Notes - Book - The Hacker's Guide to Python By Julien Danjou

Contents

1: Starting your project
2: Modules and Libraries
3: Documentation
4: Distribution
5: Virtual Environments
6: Unit Testing
7: Methods and decorators
8: Functional Programming
9: The AST
10: Performances and optimizations
11: Scaling and architecture
12: RDBMS and ORM
13: Python 3 support strategies
14: Write less, code more

Starting your project

Project Layout

One common mistake is leaving unit tests outside the package directory. These tests should definitely be included in a sub-package of your software.

setup.py is the standard name for the Python installation script.

distutils -> Python distribution utilities

Having a functions.py file or exceptions.py file is a terrible approach. It doesn't help anything at all with code organization and forces a reader to jump between files for no good reason.

Organize your code based on features, not type.

Don't create hooks/__init__.py where hooks.py would have been enough. If you create a directory, it should contain several other Python files that belong to the category/module the directory represents

Coding style & automated checks

Encode files using ASCII or UTF-8

One module import per import statement per line, at the top of the file, after comments and docstrings, grouped first by standard, then third-party and finally local library imports

Name classes in CamelCase

Suffix exceptions with Error (if applicable)

Name functions in lowercase with words separated by underscores

Use a leading underscore for _private attributes or methods

Use pep8 checks. Also use pylint. If you already have a codebase, a good approach is to run them with most of the warnings disabled and fix issues one category at a time

Modules and Libraries

The import system

sys.path variable tells Python where to look for modules to load. You can also use the PYTHONPATH variable for this.
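A quick sketch of inspecting and extending the search path at runtime (the /opt/mylib directory is made up):

```python
import sys

# sys.path is a plain list of directories, searched in order
print(sys.path[0])

# Appending a (hypothetical) directory makes its modules importable
sys.path.append('/opt/mylib')
assert '/opt/mylib' in sys.path
```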

Some useful Standard Library modules

  • atexit allows you to register functions to call when your program exits
  • argparse provides functions for parsing command line arguments
  • bisect provides bisection algorithms for sorted lists
  • calendar provides a number of date-related functions
  • codecs provides functions for encoding and decoding data
  • collections provides a variety of useful data structures
  • copy provides functions for copying data
  • csv
  • datetime
  • fnmatch provides functions for matching Unix-style filename patterns
  • glob provides functions for matching Unix-style path patterns
  • io provides functions for handling I/O streams. In Python 3, it also contains StringIO (which is in the module of the same name in Python 2), which allows you to treat strings as files
  • json
  • logging
  • multiprocessing
  • operator
  • os
  • random
  • re
  • select provides the select() and poll() functions for creating event loops
  • shutil provides access to high-level file functions
  • signal provides functions for handling POSIX signals
  • tempfile
  • threading
  • urllib
  • uuid

Much of the standard library is written in Python.

External libraries

There's no way you can know for sure whether a library that is zealously maintained today will still be like that in a few months.

Openstack checklist for deciding if a library is likely to be supported in the future:

  • Python 3 compatibility
  • Active development
  • Active maintenance
  • Packaged with OS distribution

It is sometimes better to write your own API - a wrapper that encapsulates your external libraries and keeps them out of your source code

Frameworks

Difference between frameworks and external libraries is that applications make use of frameworks by building on top of them: your code will extend the framework rather than vice versa. Unlike a library, which is basically an add-on you can bring in to give your code some extra oomph, a framework forms the chassis of your code: everything you do is going to build on that chassis in some way, which can be a double-edged sword

Interview with Doug Hellmann

  • When creating a new application, I create some code and run it by hand, then write tests to make sure I've covered all of the edge cases after I have the basic aspect of a feature working. Creating the tests may also lead to some refactoring to make the code easier to work with.

  • While designing an app, I think about how the user interface works, but for libraries, I focus on how a developer will use the API

  • I have also found that writing the documentation for a library before writing any code at all gives me a way to think through the features and workflows for using it without committing to the implementation details

  • I like to use namedtuple for creating small class-like data structures that just need to hold data but don't have any associated logic

  • If I have more than a handful of imports, I reconsider the design of the module and think about splitting it up into a package

  • Applications are collections of "glue code" holding libraries together for a specific purpose. Design based on implementing those features as a library first and then building the application ensures that code is properly organized into logical units, which in turn makes testing simpler. It also means the features of an application are accessible through the library and can be remixed to create other applications. Failing to take this approach means the features of the application are tightly bound to the user interface, which makes them harder to modify and reuse.

  • Design libraries and APIs from the top down

  • Single Responsibility Principle (SRP) for each layer

  • Convert filtering loops to generator expressions

  • Use a dict() as a lookup table instead of a long if:then:else block

  • functions should always return the same type of object (e.g. an empty list instead of None)

  • Reduce the number of arguments to a function by combining related values into an object with either a tuple or a new class

  • You may end up fighting with the framework if you try to use different patterns or idioms than it recommends
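The dict-as-lookup-table advice above can be sketched as follows; the event names and handlers are made up:

```python
def handle_start():
    return 'started'

def handle_stop():
    return 'stopped'

# Map keys to callables instead of a long if/elif/else chain
HANDLERS = {
    'start': handle_start,
    'stop': handle_stop,
}

def handle_event(name):
    # dict.get supplies a fallback instead of raising KeyError
    return HANDLERS.get(name, lambda: 'unknown event')()

assert handle_event('start') == 'started'
assert handle_event('oops') == 'unknown event'
```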

Managing API Changes

When building an API, it's rare to get everything right the first try. Your API will have to evolve, adding, removing or changing the features it provides.

The first thing and the most important step when modifying an API is to heavily document the change. This includes:

  • documenting the new interface

  • documenting that the old interface is deprecated

  • documenting how to migrate to the new interface

Example

class Car(object):
    def turn_left(self):
        """Turn the car left.

        .. deprecated:: 1.1
           Use :func:`turn` instead with the direction argument set to "left"
        """
        self.turn(direction='left')

    def turn(self, direction):
        """Turn the car in some direction.

        :param direction: The direction to turn to.
        :type direction: str
        """
        # Write the actual code here instead
        pass

Python provides an interesting module called warnings. This module allows your code to issue various kinds of warnings, such as PendingDeprecationWarning and DeprecationWarning.

import warnings

class Car(object):
    def turn_left(self):
        """Turn the car left.

        .. deprecated:: 1.1
           Use :func:`turn` instead with the direction argument set to "left"
        """
        warnings.warn("turn_left is deprecated, use turn instead",
                      DeprecationWarning)
        self.turn(direction='left')

Run test suites with the -W error option, which transforms warnings into exceptions. Every time an obsolete function is called an error is raised, making it easy for developers using your library to know exactly where their code needs to be fixed
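A sketch of checking that the deprecation warning is actually emitted, e.g. in the library's own test suite (the turn_left function here is a stand-in):

```python
import warnings

def turn_left():
    # stacklevel=2 attributes the warning to the caller
    warnings.warn("turn_left is deprecated, use turn instead",
                  DeprecationWarning, stacklevel=2)

# catch_warnings(record=True) collects warnings instead of printing them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    turn_left()

assert len(caught) == 1
assert issubclass(caught[0].category, DeprecationWarning)
```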

Interview with Christophe de Vienne

  • Coming up with good use cases makes it easier to design an API

  • Most web frameworks assume they're running on a multi-threaded server and treat all this information as TSD (Thread-Specific Data)

  • Document early and include your documentation build in continuous integration

  • Use docstrings to document classes and functions in your API. Use PEP 257

Documentation

reStructuredText or reST

Sphinx

doctest is a standard Python module which searches your documentation for code snippets and runs them to test whether they accurately reflect what your code actually does. Every paragraph starting with >>> (i.e. the primary prompt) is treated as a code snippet to test

It's easy to end up leaving your examples unchanged as your API evolves; doctest helps you make sure this doesn't happen
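A minimal doctest sketch; it can be run with python -m doctest file.py or via doctest.testmod():

```python
def add(a, b):
    """Add two values.

    >>> add(1, 2)
    3
    >>> add('a', 'b')
    'ab'
    """
    return a + b

if __name__ == '__main__':
    import doctest
    # testmod() scans this module's docstrings and runs every snippet
    results = doctest.testmod()
    assert results.failed == 0
```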

Documentation-Driven Development (DDD): write your documentation and examples first, and then write your code to match your documentation

Distribution

distutils

setuptools is the distribution library to use for the time being, but keep an eye out for distlib in the future

pbr (Python Build Reasonableness). Use it to write your next setup.py

Virtual Environments

To have access to your system installed packages, enable them when creating virtual environment by passing the --system-site-packages flag to the virtualenv command

Virtual environments are very useful for automated runs of unit test suites >> tox

The -m flag runs a module as a script.

To create a virtual environment named myvenv:

python3 -m venv myvenv

Unit Testing

Writing code that is not tested is essentially useless, as there's no way to conclusively prove that it works

Your tests should be stored inside a tests submodule of your application or library

Use a hierarchy in your tests that mimics the hierarchy you have in your module tree. This means that the tests covering the code of mylib/foobar.py should be inside mylib/tests/test_foobar.py

To deliberately fail a test right away, use the fail(msg) method

import unittest

class TestFail(unittest.TestCase):
    def test_range(self):
        for x in range(5):
            if x > 3:
                # range(5) reaches 4, so this branch runs and the test fails
                self.fail("Testing manual fail")

To run a test conditionally based on the presence of a particular library, you can raise the unittest.SkipTest exception. When this exception is raised by a test, it is simply marked as having been skipped. Alternatives are unittest.TestCase.skipTest() method and using the unittest.skip decorator

import unittest

try:
    import mylib  # hypothetical optional dependency
except ImportError:
    mylib = None

class TestSkipped(unittest.TestCase):
    @unittest.skip("Do not run this")
    def test_fail(self):
        self.fail("this should not be run")

    @unittest.skipIf(mylib is None, 'mylib is not available')
    def test_mylib(self):
        self.assertEqual(mylib.foobar(), 42)

    def test_skip_at_runtime(self):
        if True:
            self.skipTest("Finally I don't want to run it")

Fixtures represent components that are set up before a test and cleaned up after the test is done
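A sketch using unittest's built-in fixture hooks; addCleanup registers teardown code that runs whether the test passes or fails:

```python
import unittest

class TestWithFixture(unittest.TestCase):
    def setUp(self):
        # Runs before every test method
        self.data = [1, 2, 3]
        # Registered cleanups run after the test, pass or fail
        self.addCleanup(self.data.clear)

    def test_append(self):
        self.data.append(4)
        self.assertEqual(len(self.data), 4)
```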

Mocking

Mock objects are simulated objects that mimic the behaviour of real application objects, but in particular, controlled ways

Standard library >> mock. In Python 3.3+, it has been merged into the Python standard library as unittest.mock

try:
    from unittest import mock
except ImportError:
    import mock

Basic Mock usage:

>>> import mock
>>> m = mock.Mock()
>>> m.some_method.return_value = 42
>>> m.some_method()
42
>>> def print_hello():
...     print('hello world !')
...
>>> m.some_method.side_effect = print_hello
>>> m.some_method()
hello world !
>>> def print_hello():
...     print('hello world !')
...     return 43
...
>>> m.some_method.side_effect = print_hello
>>> m.some_method()
hello world !
43
>>> m.some_method.call_count
3

Even using just this set of features, you should be able to mimic a lot of your internal objects in order to fake various data scenarios

Mock uses the action/assertion pattern: this means that once your test has run, you will have to check that the actions you are mocking were correctly executed.

>>> import mock
>>> m = mock.Mock()
>>> m.some_method('foo', 'bar')
<Mock name='mock.some_method()' id='...'>
>>> m.some_method.assert_called_once_with('foo', 'bar')
>>> m.some_method.assert_called_once_with('foo', mock.ANY)
>>> m.some_method.assert_called_once_with('foo', 'baz')
Traceback (most recent call last):
  ...
AssertionError: Expected call: some_method('foo', 'baz')

Using mock.patch

>>> import mock
>>> import os
>>> def fake_os_unlink(path):
...     raise IOError('Testing!')
...
>>> with mock.patch('os.unlink', fake_os_unlink):
...     os.unlink('foobar')
...
Traceback (most recent call last):
  ...
IOError: Testing!

With the mock.patch method, it's possible to change any part of an external piece of code - making it behave in the required way in order to test all conditions in your software

There is also a decorator version of mock.patch

import unittest

import mock
import requests

def get_fake_get(status_code, content):
    m = mock.Mock()
    m.status_code = status_code
    m.content = content
    def fake_get(url):
        return m
    return fake_get

class WhereIsPythonError(Exception):
    pass

def check_for_something():
    try:
        r = requests.get('http://python.org')
    except IOError:
        pass
    else:
        if r.status_code == 200:
            return 'Check successful !'
    raise WhereIsPythonError('Something bad happened')

class TestPythonError(unittest.TestCase):
    @mock.patch('requests.get', get_fake_get(404, 'Whatever'))
    def test_ioerror(self):
        self.assertRaises(WhereIsPythonError, check_for_something)

Use testscenarios to run a test class against a different set of scenarios generated at run-time

import mock
import requests
import testscenarios

class CustomTestError(Exception):
    pass

def check_something_online():
    r = requests.get('http://some.url')
    if r.status_code == 200:
        return 'Test data' in r.content
    raise CustomTestError('Something bad happened')

def get_fake_get(status_code, content):
    m = mock.Mock()
    m.status_code = status_code
    m.content = content
    def fake_get(url):
        return m
    return fake_get

class MyTestErrorCode(testscenarios.TestWithScenarios):
    scenarios = [
        ('Not found', dict(status=404)),
        ('Client error', dict(status=400)),
        ('Server error', dict(status=500)),
    ]

    def test_some_external_stuff(self):
        with mock.patch('requests.get',
                        get_fake_get(self.status, 'Test data string')):
            self.assertRaises(CustomTestError, check_something_online)

Construct the scenario list as a list of tuples that consists of the scenario name as the first argument, and the dictionary of attributes to be added to the test class for this scenario as the second argument

Tox

Creates a virtual environment, installs setuptools and then installs all of the dependencies required for both your application/library runtime and unit tests.

tox.ini

By default tox can simulate many environments: py27, py34, etc. To add an environment or customize an existing one, add another section named [testenv:envname]

Sample tox.ini file

[tox]
envlist=py27,py34,pep8

[testenv]
deps=pytest
     -r requirements.txt
commands=pytest

[testenv:pep8]
deps=flake8
commands=flake8

To run tox environments in parallel, use detox, which runs all of the default environments from envlist in parallel

Testing Policy

You should have a zero tolerance policy on untested code. No code should be merged unless there is a proper set of unit tests to cover it

Methods and decorators

Creating Decorators

A decorator is essentially a function that takes another function as an argument and replaces it with a new, modified function.

The primary use case for decorators is factoring out common code that needs to be called before, after or around multiple functions.

Use the functools module's update_wrapper to copy the attributes of the original function to the wrapper itself

It can get tedious to use update_wrapper manually when creating decorators, so functools provides a decorator for decorators called wraps.

import functools

def check_is_admin(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        if kwargs.get('username') != 'admin':
            raise Exception('This user is not allowed here')
        return f(*args, **kwargs)
    return wrapper

The inspect module allows us to retrieve a function's signature and operate on it:

import functools
import inspect

def check_is_admin(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        func_args = inspect.getcallargs(f, *args, **kwargs)
        if func_args.get('username') != 'admin':
            raise Exception('This user is not allowed here')
        return f(*args, **kwargs)
    return wrapper

@check_is_admin
def get_food(username, type='chocolate'):
    return type + ' nom nom nom!'

In this case, inspect.getcallargs returns {'username': 'admin', 'type': 'chocolate'}. The advantage of this approach is that our decorator doesn't have to check whether the username parameter was passed as a positional or a keyword argument: all it has to do is look it up in the dictionary

How methods work in Python

A method is a function that is stored as a class attribute

In Python 3, the concept of unbound methods has been removed entirely: accessing a method through the class gives you a plain function, and calling it without an instance raises a TypeError about the missing positional argument 'self'

If you have a reference to a method and want to find out which object it's bound to, use the method's __self__ property

>>> m = Pizza(42).get_size
>>> m.__self__
<__main__.Pizza object at 0x...>
>>> m == m.__self__.get_size
True

Static Methods

methods which belong to a class, but don't actually operate on class instances

When we see @staticmethod, we know that the method does not depend on the state of the object
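A quick sketch (the Pizza class here is illustrative, in the style of the book's examples):

```python
class Pizza(object):
    @staticmethod
    def mix_ingredients(x, y):
        # No self or cls: the method touches no instance or class state
        return x + ' and ' + y

    def cook(self):
        return self.mix_ingredients('cheese', 'tomato')

# Callable on the class or on an instance alike
assert Pizza.mix_ingredients('cheese', 'tomato') == 'cheese and tomato'
assert Pizza().cook() == 'cheese and tomato'
```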

Class method

methods that are bound directly to a class rather than its instances

However you choose to access this method (by class name or object), it will be always bound to the class it is attached to, and its first argument will be the class itself (remember classes are objects too!)
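A minimal sketch of that binding behaviour (illustrative class, not from the book verbatim):

```python
class Pizza(object):
    radius = 42

    @classmethod
    def get_radius(cls):
        # cls is the class itself, however the method is reached
        return cls.radius

# Bound to the class whether accessed via the class or an instance
assert Pizza.get_radius() == 42
assert Pizza().get_radius() == 42
```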

implement your abstract methods using Python's built-in abc module

import abc

class BasePizza(object):
    # Python 2 syntax; in Python 3, declare the metaclass in the
    # class statement instead: class BasePizza(metaclass=abc.ABCMeta)
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def get_radius(self):
        """Method that should do something"""

It is also possible to use the @staticmethod and @classmethod decorators on top of @abstractmethod:

import abc

class BasePizza(object):
    __metaclass__ = abc.ABCMeta

    default_ingredients = ['cheese']

    @classmethod
    @abc.abstractmethod
    def get_ingredients(cls):
        """Returns the ingredient list"""
        return cls.default_ingredients

class DietPizza(BasePizza):
    def get_ingredients(self):
        return [Egg()] + super(DietPizza, self).get_ingredients()

There's no way to force subclasses to implement abstract methods as a specific kind of method

The truth about super

Multiple inheritance is still used in many places, and especially in code where the mixin pattern is involved

A mixin is a class that inherits from two or more other classes, combining their features together

mro() >> method resolution order used to resolve attributes
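A diamond-inheritance sketch showing how the MRO drives attribute lookup (classes are made up):

```python
class A(object):
    def hello(self):
        return 'A'

class B(A):
    pass

class C(A):
    def hello(self):
        return 'C'

class D(B, C):
    pass

# C3 linearization: D, B, C, A, object
assert [cls.__name__ for cls in D.mro()] == ['D', 'B', 'C', 'A', 'object']
# Attribute lookup follows the MRO, so D finds C.hello before A.hello
assert D().hello() == 'C'
```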

super() is actually a constructor, and you instantiate a super object each time you call it. It takes either one or two arguments: the first argument is a class, and the second is either a subclass or an instance of the first. The object returned by the constructor functions as a proxy for the parent classes of the first argument.

Descriptor protocol is the mechanism in Python that allows an object that's stored as an attribute to return something other than itself. (__get__)

In Python 3, super() can be called from within a method without any arguments

class B(A):
    def foo(self):
        super().foo()

super is the standard way of accessing parent attributes in subclasses, and you should always use it. It allows cooperative calls of parent methods without any surprises, such as parent methods not being called or being called twice when using multiple inheritance

Functional Programming

Functional programming allows you to write more concise and efficient code.

When you write code in a functional style, your functions are designed to have no side effects: they take an input and produce an output without keeping state or modifying anything not reflected in the return value - such functions are called purely functional

Generators

an object that returns a value on each call of its next() method until it raises StopIteration.

Iterator protocol

generator statements >> yield statement
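A minimal generator sketch; each yield suspends the function and hands a value back:

```python
def countdown(n):
    # Each yield suspends the function, preserving its local state
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
assert next(gen) == 3
assert next(gen) == 2
assert list(gen) == [1]  # exhausting it raises StopIteration internally
```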

To check whether a function is a generator, use inspect.isgeneratorfunction()

Python 3 >> inspect.getgeneratorstate

  • waiting to be run for the first time - GEN_CREATED
  • currently being executed by the interpreter - GEN_RUNNING
  • waiting to be resumed by a call to next() - GEN_SUSPENDED
  • finished running - GEN_CLOSED
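Three of the four states can be observed directly (GEN_RUNNING is only visible from inside the running generator itself):

```python
import inspect

def gen():
    yield 1

g = gen()
assert inspect.getgeneratorstate(g) == 'GEN_CREATED'
next(g)
assert inspect.getgeneratorstate(g) == 'GEN_SUSPENDED'
g.close()
assert inspect.getgeneratorstate(g) == 'GEN_CLOSED'
```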

Generators allow you to handle large data sets with minimal consumption of memory and processing cycles by generating values on-the fly.

One-line generators - the syntax is similar to list comprehensions

>>> (x.upper() for x in ['hello', 'world'])
<generator object <genexpr> at 0x...>
>>> gen = (x.upper() for x in ['hello', 'world'])
>>> list(gen)
['HELLO', 'WORLD']

Using first

>>> from first import first
>>> first([0, False, None, [], (), 42])
42
>>> first([-1, 0, 1])
-1
>>> first([-1, 0, 2], key=lambda x: x > 0)
2

lambda was actually added to Python in the first place to facilitate functional programming functions such as map() and filter()

Use partial
functools.partial is typically useful as a replacement for lambda, and should be considered a superior alternative. lambda is something of an anomaly in the Python language, due to its body being limited to a single one-line expression
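A quick sketch with a made-up multiply function:

```python
import functools

def multiply(x, y):
    return x * y

# partial freezes the first positional argument
double = functools.partial(multiply, 2)
assert double(21) == 42

# Unlike a lambda, a partial exposes what it wraps
assert double.func is multiply
assert double.args == (2,)
```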

Use operator module
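The operator module supplies ready-made functions that often replace trivial lambdas; a sketch with made-up data:

```python
import operator

rows = [('alice', 30), ('bob', 25), ('carol', 35)]

# itemgetter replaces `lambda row: row[1]` as a sort key
by_age = sorted(rows, key=operator.itemgetter(1))
assert by_age[0] == ('bob', 25)

# methodcaller replaces `lambda s: s.upper()`
upper = operator.methodcaller('upper')
assert upper('abc') == 'ABC'
```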

The AST

Abstract Syntax Tree
A tree representation of the abstract structure of the source code of any programming language
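A small sketch with the standard ast module, parsing a trivial assignment and compiling the tree back to runnable code:

```python
import ast

# Parse source text into its tree representation
tree = ast.parse('x = 1 + 2')
assert isinstance(tree, ast.Module)
assert isinstance(tree.body[0], ast.Assign)

# compile() accepts the tree directly, so it can be inspected
# or rewritten before execution
code = compile(tree, '<string>', 'exec')
ns = {}
exec(code, ns)
assert ns['x'] == 3
```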

Performances and optimizations

Data Structures

Often, there is a temptation to code your own custom data structures - this is almost always a vain, doomed idea. Python usually has better data structures and code to offer - learn to use them

The set data structure has methods which can solve many problems that would otherwise need to be addressed by writing nested for/if blocks

def has_invalid_fields(fields):
    for field in fields:
        if field not in ['foo', 'bar']:
            return True
    return False

This can be written without a loop:

def has_invalid_fields(fields):
    return bool(set(fields) - set(['foo', 'bar']))

Each time you try to access a non-existent key in the dict, a defaultdict uses the function passed as argument to its constructor to build a new value, instead of raising a KeyError

OrderedDict

Counter
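Quick sketches of both collections helpers:

```python
from collections import Counter, OrderedDict

# Counter tallies hashable items
c = Counter('abracadabra')
assert c['a'] == 5
assert c.most_common(1) == [('a', 5)]

# OrderedDict remembers insertion order (plain dicts only guarantee
# this from Python 3.7 onwards)
d = OrderedDict([('first', 1), ('second', 2)])
assert list(d) == ['first', 'second']
```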

Profiling

cProfile >> standard tool for profiling

dis module >> a disassembler of Python byte code. It prints the list of bytecode instructions that are run by the function
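A small sketch; the exact opcode names vary by Python version, so the check below accepts both spellings of the addition instruction:

```python
import dis

def add(a, b):
    return a + b

# Prints one line per bytecode instruction to stdout
dis.dis(add)

# The instructions are also available programmatically
ops = [instr.opname for instr in dis.Bytecode(add)]
# The addition opcode is BINARY_ADD before 3.11, BINARY_OP after
assert 'BINARY_ADD' in ops or 'BINARY_OP' in ops
```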

A common bad habit is defining functions inside functions for no reason. This has a cost, as the function is redefined over and over again. Function calls in Python are already expensive. The only case in which defining a function within a function is required is when building a closure.

Ordered list and bisect

bisect module - provides bisection algorithm

bisect.bisect(sorted_list, new_item) - allows you to retrieve the index where a new list element should be inserted, while keeping the list sorted

bisect.insort(sorted_list, new_item) - in case you wish to insert the element immediately
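A quick sketch of both calls on a made-up sorted list:

```python
import bisect

scores = [10, 20, 30, 40]

# Index where 25 would go, keeping the list sorted
assert bisect.bisect(scores, 25) == 2

# insort performs the insertion in place
bisect.insort(scores, 25)
assert scores == [10, 20, 25, 30, 40]
```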

Namedtuple and slots

Classes in Python can define a __slots__ attribute that will list the only attributes allowed for instances of this class. It seems that by using the __slots__ attribute of Python classes, we can halve our memory usage - this means that when creating a large amount of simple objects, the __slots__ attribute is an effective and efficient choice.

The usage of the namedtuple class factory is almost as efficient as using an object with __slots__, the only difference being that it is compatible with the tuple class. It can therefore be passed to many native Python functions and libraries that expect an iterable type as an argument.
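A sketch of both options side by side (the Point classes are made up):

```python
import collections

class Point(object):
    # Only these attributes may exist; no per-instance __dict__
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
assert not hasattr(p, '__dict__')

# namedtuple has a similar footprint but stays tuple-compatible
PointT = collections.namedtuple('PointT', ['x', 'y'])
q = PointT(1, 2)
assert q.x == 1 and tuple(q) == (1, 2)
```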

Memoization

caching

Python 3.3+ >> functools.lru_cache decorator

import functools
import math

@functools.lru_cache(maxsize=2)
def memoized_sin(x):
    return math.sin(x)

Scaling and architecture

RDBMS and ORM

Python 3 support strategies

The only way to be sure that your code works under both Python versions is to have unit testing (use tox to simplify this)

Remember string vs unicode

Write less, code more

Context managers

Use context management protocol if you identify the following pattern:

  • Call method A
  • Execute some code
  • Call method B

Use contextlib >> contextmanager. It builds on the __enter__ and __exit__ methods of the context management protocol
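The call-A / run-code / call-B pattern above maps directly onto contextlib.contextmanager; a sketch with made-up names:

```python
import contextlib

events = []

@contextlib.contextmanager
def managed():
    events.append('A called')      # "method A", before the block
    try:
        yield
    finally:
        events.append('B called')  # "method B", after the block

with managed():
    events.append('code ran')

assert events == ['A called', 'code ran', 'B called']
```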

Remember that the with statement supports multiple context managers in one line, so you should write

with open('file1', 'r') as source, open('file2', 'w') as dest:
    dest.write(source.read())