[Python Review] Part V: Packages
Stepping up Python skills
Packages
This part is about
- how to organize your code into a package structure
- the installation of third-party packages
- preparing to give your own code away to others.
Since the landscape of Python packaging tools is evolving, the main focus of this section is on some general code organization principles that will prove useful no matter what tools you later use to give code away or manage dependencies.
When writing a larger program, you don’t really want to organize it as a large collection of standalone files at the top level. Here is how you can organize the files in a hierarchy.
Modules
Any Python source file is a module.
# foo.py
def grok(a):
    ...

def spam(b):
    ...
An import statement loads and executes a module.
# program.py
import foo
a = foo.grok(2)
b = foo.spam('Hello')
...
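To see that import really does execute the module, here is a minimal sketch. It writes a throwaway module to a temporary directory; the module name foo_demo and the body of grok are stand-ins chosen for illustration, not part of the original example.

```python
import sys
import tempfile
from pathlib import Path

# Write a throwaway module (hypothetical name) into a temporary directory.
tmpdir = tempfile.mkdtemp()
Path(tmpdir, "foo_demo.py").write_text(
    "print('foo_demo is being executed')\n"
    "def grok(a):\n"
    "    return a * 2  # stand-in body for illustration\n"
)
sys.path.insert(0, tmpdir)  # make the directory importable

import foo_demo  # the module body runs, so the message is printed
import foo_demo  # nothing is printed: the module is already cached

print(foo_demo.grok(2))
print("foo_demo" in sys.modules)  # loaded modules are cached in sys.modules
```

The second import is a no-op because Python keeps every loaded module in sys.modules and reuses it.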
Packages vs Modules
For larger collections of code, it is common to organize modules into a package.
# From this
pcost.py
report.py
fileparse.py

# To this
porty/
    __init__.py
    pcost.py
    report.py
    fileparse.py
Pick a name and make a top-level directory (porty in the example above). Picking this name is the most important first step.
Add an __init__.py file to the directory. It may be empty.
Put your source files into the directory.
Using a package
A package serves as a namespace for imports.
This means that there are now multilevel imports.
import porty.report
port = porty.report.read_portfolio('portfolio.csv')
# Or
from porty import report
port = report.read_portfolio('portfolio.csv')
# Or
from porty.report import read_portfolio
port = read_portfolio('portfolio.csv')
There are two main problems with this approach.
- imports between files in the same package break.
- main scripts placed inside the package break.
Problem: imports
Imports between files in the same package must now include the package name in the import.
Remember the structure.
porty/
    __init__.py
    pcost.py
    report.py
    fileparse.py
Modified import example:
# report.py
from porty import fileparse

def read_portfolio(filename):
    return fileparse.parse_csv(...)
All imports are absolute, not relative, so a bare import of a sibling module no longer works.
# report.py
import fileparse # BREAKS. fileparse not found
Use relative imports inside a package
Instead of directly using the package name, you can use . to refer to the current package.
# report.py
from . import fileparse

def read_portfolio(filename):
    return fileparse.parse_csv(...)
Using from . import modname makes it easy to rename the package, because the package name never appears inside its own modules.
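The following sketch illustrates the renaming point. It builds the same two modules under two different package names in a temporary directory and imports both. The package names porty_demo and folio_demo and the body of parse_csv are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def make_package(root, name):
    """Create a small package called `name` under `root` (illustrative only)."""
    pkg = Path(root, name)
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "fileparse.py").write_text(
        "def parse_csv(filename):\n"
        "    return ['parsed', filename]  # stand-in for real parsing\n"
    )
    (pkg / "report.py").write_text(
        "from . import fileparse\n"
        "def read_portfolio(filename):\n"
        "    return fileparse.parse_csv(filename)\n"
    )

root = tempfile.mkdtemp()
sys.path.insert(0, root)
make_package(root, "porty_demo")
make_package(root, "folio_demo")  # identical files, different package name
importlib.invalidate_caches()

report_a = importlib.import_module("porty_demo.report")
report_b = importlib.import_module("folio_demo.report")

print(report_a.read_portfolio("portfolio.csv"))
print(report_b.fileparse.__name__)  # the relative import resolved inside folio_demo
```

Neither copy of report.py mentions its package by name, so the same source works under both directories without any edits.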
Problem: Main Scripts
Running a package submodule as a main script breaks.
bash $ python porty/pcost.py # BREAKS
Reason: You are running Python on a single file, so Python doesn’t see the rest of the package structure correctly (sys.path is wrong: it starts at the script’s own directory, porty/, rather than at the directory that contains the package).
All imports break.
Solution: run your program in a different way, using the -m option.
bash $ python -m porty.pcost # WORKS
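If you want to reproduce the difference without touching your own project, here is a small sketch. It creates a temporary package (the name porty_demo and the file contents are assumptions for illustration) and runs the same file both ways through subprocess.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Build a tiny package whose pcost.py uses a relative import.
root = Path(tempfile.mkdtemp())
pkg = root / "porty_demo"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "fileparse.py").write_text("def parse_csv(name):\n    return name\n")
(pkg / "pcost.py").write_text(
    "from . import fileparse\n"
    "print('pcost ran as', __name__)\n"
)

# 1. Run the file directly: the relative import fails.
as_file = subprocess.run(
    [sys.executable, str(pkg / "pcost.py")],
    cwd=root, capture_output=True, text=True,
)

# 2. Run it as a module of the package, from the directory containing the package.
as_module = subprocess.run(
    [sys.executable, "-m", "porty_demo.pcost"],
    cwd=root, capture_output=True, text=True,
)

print("direct run exit code:", as_file.returncode)
print("direct run error:", as_file.stderr.strip().splitlines()[-1])
print("-m run output:", as_module.stdout.strip())
```

Note that the -m form is run from the directory that contains the package, not from inside the package itself.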
__init__.py files
The primary purpose of these files is to stitch modules together.
Example: consolidating functions
# porty/__init__.py
from .pcost import portfolio_cost
from .report import portfolio_report
This makes names appear at the top-level when importing.
from porty import portfolio_cost
portfolio_cost('portfolio.csv')
instead of using the multilevel imports.
from porty import pcost
pcost.portfolio_cost('portfolio.csv')
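A quick way to convince yourself that both spellings reach the same function is the sketch below. It builds a temporary package whose __init__.py re-exports one name. The package name porty_init_demo and the placeholder function body are assumptions for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "porty_init_demo"
pkg.mkdir()
(pkg / "pcost.py").write_text(
    "def portfolio_cost(filename):\n"
    "    return 0.0  # placeholder result for illustration\n"
)
# The __init__.py stitches the submodule's function into the package namespace.
(pkg / "__init__.py").write_text("from .pcost import portfolio_cost\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import porty_init_demo
from porty_init_demo import pcost

# Both names refer to the very same function object.
print(porty_init_demo.portfolio_cost is pcost.portfolio_cost)
```

The re-export adds a shorter name; it does not hide or replace the multilevel one.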
Another solution for scripts
As mentioned, you now need to use -m package.module to run scripts within your package.
bash % python3 -m porty.pcost portfolio.csv
There is another alternative: Write a new top-level script.
#!/usr/bin/env python3
# pcost.py
import porty.pcost
import sys
porty.pcost.main(sys.argv)
This script lives outside the package. For example, looking at the directory structure:
pcost.py          # top-level-script
porty/            # package directory
    __init__.py
    pcost.py
    ...
Application structure
Code organization and file structure are key to the maintainability of an application.
There is no “one-size-fits-all” approach for Python.
However, one structure that works for a lot of problems is something like this.
porty-app/
    README.txt
    script.py         # SCRIPT
    porty/            # LIBRARY CODE
        __init__.py
        pcost.py
        report.py
        fileparse.py
The top-level porty-app directory is a container for everything else: documentation, top-level scripts, examples, etc.
Again, top-level scripts (if any) need to live outside the code package, one level up.
Third-party packages
Python has a large library of built-in modules.
There are even more third-party modules. Browse them in the Python Package Index (PyPI), or just do a Google search for a specific topic.
How to handle third-party dependencies is an ever-evolving topic with Python. This section merely covers the basics to help you wrap your brain around how it works.
The Module Search Path
sys.path is a list of all directories checked by the import statement. Look at it:
>>> import sys
>>> sys.path
... look at the result ...
If you import something and it’s not located in one of those directories, you will get an ImportError exception.
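The same check can be done from a script. This is a small sketch; the module name no_such_module_demo is made up and assumed not to exist on your system.

```python
import sys

# sys.path is an ordinary list of directory names, searched in order.
print(type(sys.path).__name__)
for entry in sys.path:
    print(repr(entry))

error_name = None
try:
    import no_such_module_demo  # hypothetical name, assumed not installed
except ImportError as exc:
    # On Python 3.6+ the concrete type is ModuleNotFoundError,
    # a subclass of ImportError.
    error_name = type(exc).__name__
    print(error_name, "-", exc)
```

Because sys.path is a plain list, a program can also append or insert directories at runtime, although fixing the project layout is usually the cleaner solution.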
Standard Library Modules
Modules from Python’s standard library usually come from a location such as /usr/local/lib/python3.6. You can find out for certain by trying a short test:
>>> import re
>>> re
<module 're' from '/usr/local/lib/python3.6/re.py'>
Simply looking at a module in the REPL is a good debugging tip to know about. It will show you the location of the file.
Third-party modules
Third-party modules are usually located in a dedicated site-packages directory. You’ll see it if you perform the same steps as above:
>>> import numpy
>>> numpy
<module 'numpy' from '/usr/local/lib/python3.6/site-packages/numpy/__init__.py'>
Again, looking at a module is a good debugging tip if you’re trying to figure out why something related to import isn’t working as expected.
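Outside the REPL, you can get the same information programmatically. The sketch below uses the standard importlib.util.find_spec function; the helper name where_is is made up for illustration.

```python
import importlib.util
import re

# An imported module usually records the file it was loaded from.
print(re.__file__)

def where_is(name):
    """Return where a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin

print(where_is("json"))
print(where_is("no_such_module_demo"))  # hypothetical name, assumed not installed
```

Unlike a plain import, find_spec locates a top-level module without executing it, which is handy when the import itself is what fails.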
Installing modules
The most common technique for installing a third-party module is to use pip. For example:
bash % python3 -m pip install packagename
This command will download the package and install it in the site-packages directory.
Problems
- You may be using an installation of Python that you don’t directly control.
  - A corporate-approved installation.
  - The Python version that comes with the OS.
- You might not have permission to install global packages on the computer.
- There might be other dependencies.
Use a virtual environment!
Virtual environments
A common solution to package installation issues is to create a so-called “virtual environment” for yourself. Naturally, there is no “one way” to do this; in fact, there are several competing tools and techniques. However, if you are using a standard Python installation, you can try typing this:
bash % python -m venv mypython
After a few moments of waiting, you will have a new directory mypython that’s your own little Python install. Within that directory you’ll find a bin/ directory (Unix) or a Scripts/ directory (Windows). If you run the activate script found there, it will “activate” this version of Python, making it the default python command for the shell. For example:
bash % source mypython/bin/activate
(mypython) bash %
From here, you can now start installing Python packages for yourself. For example:
(mypython) bash % python -m pip install pandas
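If you are ever unsure which Python is active, a short check like the following can help. It is a sketch that applies to environments created with the standard venv module; the helper name in_virtualenv is made up for illustration.

```python
import sys

def in_virtualenv():
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print(sys.executable)   # the interpreter actually being used
print(sys.prefix)       # where packages for this interpreter live
print(in_virtualenv())
```

Run it once before and once after activating mypython; the interpreter path and the final value should change.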
For the purposes of experimenting and trying out different packages, a virtual environment will usually work fine. If, on the other hand, you’re creating an application and it has specific package dependencies, that is a slightly different problem.
Handling Third-Party Dependencies in Your Application
If you have written an application and it has specific third-party dependencies, one challenge concerns the creation and preservation of the environment that includes your code and the dependencies.
As of this writing (2020), one commonly recommended tool is Poetry.
Refer to the Python Packaging User Guide for the most up-to-date guide.
Distribution
At some point you might want to give your code to someone else, possibly just a co-worker. This section gives the most basic technique of doing that. For more detailed information, consult the Python Packaging User Guide.
Creating a setup.py file
Add a setup.py file to the top level of your project directory.
# setup.py
import setuptools

setuptools.setup(
    name="porty",
    version="0.0.1",
    author="Your Name",
    author_email="you@example.com",
    description="Practical Python Code",
    packages=setuptools.find_packages(),
)
Creating MANIFEST.in
If there are additional files associated with your project, specify them with a MANIFEST.in file. For example:
# MANIFEST.in
include *.csv
Put the MANIFEST.in file in the same directory as setup.py.
Creating a source distribution
To create a distribution of your code, use the setup.py file. For example:
bash % python setup.py sdist
This will create a .tar.gz or .zip file in the directory dist/. That file is something that you can now give away to others.
Installing your code
Others can install your Python code using pip in the same way that they do for other packages. They simply need to supply the file created in the previous step. For example:
bash % python -m pip install porty-0.0.1.tar.gz
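After installing, one way to confirm that the package is visible to the current interpreter is to ask for its version. This is a minimal sketch using the standard importlib.metadata module (Python 3.8 and later); the helper name installed_version is made up for illustration.

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Prints the version string if porty has been installed, otherwise None.
print(installed_version("porty"))
```

The name passed in is the distribution name given as name= in setup.py, which is not necessarily the same as the import name.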
Comment
The steps above describe the bare basics of creating a package of Python code that you can give to another person. In reality, it can be much more complicated depending on third-party dependencies, whether or not your application includes foreign code (e.g., C/C++), and so forth. We’ve only taken a tiny first step.
Refer to the official guide to see how to upload your package to PyPI.
For a deeper discussion of how to choose among virtual environment and application dependency management tools, see another post dedicated to this topic.