My journey to type checking 7521 lines of Python

I've spent the last couple of months adding type hints to the RQL package. In the first part, I told you why I think Mypy may be useful. In this part, I'll talk about the process I followed to add type annotations to RQL.

Recap of the first part:

Type hints are mainly useful for maintainers, the human readers of large code bases: the people who read, debug and maintain large software systems they did not write.

The major benefit for you won't be fewer bugs or unit tests, but the ability to more easily read and reason about code written by someone else.


Outline:

First, I'll talk about what Mypy is and what its initial setup should be.

Secondly, I'll briefly show how I deconstructed the project (import structure and class hierarchy) to get another perspective on how it is built.

I will also show 3 different ways to generate type annotations and where I think those type annotations should reside.

Finally, I'll talk about things that still need to be improved, mainly the use of Any and type: ignore.

Feel free to jump straight to the part that's most interesting.


Mypy:

Mypy is an optional static type checker for Python. In compiled languages, static type checking happens at compile time. Mypy instead checks types on demand, whenever you run its command-line tool. At Logilab, type checking was done locally during development, and also in the Continuous Integration pipeline, using GitLab and Tox.
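Here is a quick illustration of what "optional" means, using a made-up add_one function. Python itself ignores annotations at runtime, so the mismatch below only shows up when Mypy is run on the file.

```python
def add_one(x: int) -> int:
    """Hypothetical function annotated to take and return an int."""
    return x + 1


# Python does not enforce annotations at runtime: this call succeeds
# and prints 2.5.
result = add_one(1.5)
print(result)

# Running `mypy` on this file, however, reports an error along the lines of:
#   Argument 1 to "add_one" has incompatible type "float"; expected "int"
```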


Initial setup:

We need to ensure that Mypy really runs everywhere it should, every time someone pushes code. We don't want to spend time writing type annotations that won't be checked. If annotations aren't checked regularly, they drift out of sync with the code over time. That means:

1. Configuring Mypy

Configuration is done in a mypy.ini file. At the beginning, we don't want strict type checks. Later, we'll work towards more stringency, for example by disallowing dynamic typing or disallowing untyped definitions. Since we're working with a legacy code base with no type annotations, let's start with relaxed options.
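Below is a minimal sketch of such a relaxed starting point. It is written as a small Python script that uses the standard configparser module to print the mypy.ini content. The options listed are real Mypy settings from its "Disallow dynamic typing" and "Untyped definitions and calls" groups. All of them default to False. They are spelled out here only as the knobs to turn to True later. The exact selection is an assumption, not RQL's actual configuration.

```python
import configparser
import io


def relaxed_mypy_ini() -> str:
    """Return the text of a lenient mypy.ini for a legacy, unannotated code base."""
    config = configparser.ConfigParser()
    config["mypy"] = {
        # "Disallow dynamic typing" options: switch to True once Any is rarer.
        "disallow_any_explicit": "False",
        "disallow_any_generics": "False",
        # "Untyped definitions and calls" options: switch to True once most
        # functions are annotated.
        "disallow_untyped_defs": "False",
        "disallow_incomplete_defs": "False",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(relaxed_mypy_ini())  # redirect into mypy.ini at the project root
```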

2. Ignoring external libraries:

I did this many weeks later, but I could have done it right from the beginning: tell Mypy to ignore missing type hints in external libraries. Maybe we'll annotate them in a later phase, but for now they should be ignored.
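One way to do this uses a per-module [mypy-<pattern>] section that sets ignore_missing_imports only for the external packages. A missing stub inside RQL's own modules is then still reported. The sketch assumes the external libraries from the dependency diagram are imported as logilab.database, yapps and pygments. Setting ignore_missing_imports = True under the global [mypy] section also works, but it silences every missing import.

```python
import configparser
import io
from typing import Iterable


def ignore_external_imports(libraries: Iterable[str]) -> str:
    """Return mypy.ini text that silences missing stubs for these libraries only."""
    config = configparser.ConfigParser()
    config["mypy"] = {}
    for library in libraries:
        # A "[mypy-<pattern>]" section applies only to matching modules;
        # "pkg.*" matches pkg itself and all of its submodules.
        config[f"mypy-{library}.*"] = {"ignore_missing_imports": "True"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Assumed import names for the external libraries seen in the dependency diagram.
print(ignore_external_imports(["logilab.database", "yapps", "pygments"]))
```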

3. Set up a Tox environment and CI job:

Add a Tox environment to automate the boring stuff and reduce boilerplate. Then add a GitLab CI job that runs Mypy every time a new changeset is pushed to the GitLab server.
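Here is a sketch of the Tox side, again generated with configparser. The environment name mypy and the `mypy rql` command are assumptions. With this in place, the GitLab CI job only needs to run `tox -e mypy`.

```python
import configparser
import io


def tox_mypy_env(package: str = "rql") -> str:
    """Return tox.ini text declaring a `mypy` environment that type-checks `package`."""
    config = configparser.ConfigParser()
    config["tox"] = {"envlist": "mypy"}
    config["testenv:mypy"] = {
        "deps": "mypy",  # installed into the isolated environment
        # Run from the project root, so Mypy picks up mypy.ini there.
        "commands": f"mypy {package}",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(tox_mypy_env())
```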


Import structure

I thought a diagram of RQL's import structure would help me understand the project. I also thought it would help me add types a bit faster, since my idea was to annotate modules from the bottom up. I was wrong. More than once, after annotating a module, I had to go back to modules I had already annotated, for example because of method overrides.

I used PyDeps, a module dependency visualization tool, and got the following result:


If you open the diagram in another tab, you'll notice that external libraries appear too, for example logilab-database, yapps and pygments. Those are the ones that need to be ignored during type checking.
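Python's graphlib.TopologicalSorter (Python 3.9+) is enough to compute a bottom-to-top order from such a graph. The edges below are hypothetical, not RQL's real imports. Each module maps to the modules it imports, so static_order() yields a module only after everything it depends on.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical import graph: each module maps to the modules it imports.
# These edges are illustrative only, not RQL's actual dependencies.
IMPORTS: Dict[str, Set[str]] = {
    "rql": {"rql.stmts", "rql.nodes"},
    "rql.stmts": {"rql.nodes", "rql.base"},
    "rql.nodes": {"rql.base", "rql._exceptions"},
    "rql.base": {"rql._exceptions"},
    "rql._exceptions": set(),
}


def bottom_up_order(imports: Dict[str, Set[str]]) -> List[str]:
    """Order modules so that each one comes after every module it imports."""
    return list(TopologicalSorter(imports).static_order())


print(bottom_up_order(IMPORTS))
```

For this chain the order is fully determined: rql._exceptions, rql.base, rql.nodes, rql.stmts, rql. Circular imports make TopologicalSorter raise graphlib.CycleError. Even with a clean order, a subclass overriding a method can still send you back to a module you already annotated.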


Class hierarchy:

Pylint ships with a tool named Pyreverse that analyses Python code and extracts UML class diagrams and package dependencies. With Pyreverse, we get a clear view of all the classes and the relations between them:
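Pyreverse works statically on the source code. For a quick look from a Python shell, a rough runtime alternative is to walk `__subclasses__()`, sketched below with hypothetical node classes. It only sees classes that have actually been imported, and it shows inheritance but not the associations Pyreverse draws.

```python
from typing import List


def class_tree(cls: type, depth: int = 0) -> List[str]:
    """Return cls and its subclasses as indented lines, one level per depth."""
    lines = ["    " * depth + cls.__name__]
    for subclass in cls.__subclasses__():
        lines.extend(class_tree(subclass, depth + 1))
    return lines


# Hypothetical classes, loosely shaped like an AST node hierarchy.
class Node: ...
class BinaryNode(Node): ...
class AndNode(BinaryNode): ...
class OrNode(BinaryNode): ...
class VariableRef(Node): ...


print("\n".join(class_tree(Node)))
```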


Type hints addition:

There are different options for adding type annotations.

1. Manual

We can do it manually, by finding out which types flow through our program. I've been there, and it's painful. Literally:

  • Adding breakpoints here and there
  • Running the function or running the tests
  • Once inside PDB, calling type() to find out who is who

In [73]: def g(x):
    ...:     breakpoint()
    ...:     x[5] = 'foo'
    ...:     y = 42
    ...:     # a long list of non understandable code
    ...:     return y
    ...: 

In [76]: # let's assume we launched the tests and somewhere, g is called

In [77]: g([16, 72, 38, 45, 21, 34])
> <ipython-input-75-ef8444a3ab59>(3)g()
-> x[5] = 'foo'
(Pdb) type(x)  # please, what is the type of the parameter x?
<class 'list'>
(Pdb) n
> <ipython-input-75-ef8444a3ab59>(4)g()
-> y = 42
(Pdb) n
> <ipython-input-75-ef8444a3ab59>(6)g()
-> return y
(Pdb) type(y) # please, what is the type of the returned value?
<class 'int'>
(Pdb)

After that first run, we may be tempted to annotate our function as one that takes a list and returns an integer:

In [80]: # g takes a list and returns an integer

In [81]: def g(x: list) -> int:
    ...:     x[5] = 'foo'
    ...:     y = 42
    ...:     # a long list of non understandable code
    ...:     return y
    ...: 

I said "tempted" because the function might just as well take a dictionary and return an integer: x[5] = 'foo' works on a dict too. Only an extensive set of tests will tell us what the best answer is.
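One way to avoid committing to list or dict too early is to annotate only the behaviour g actually relies on. The sketch below (Python 3.8+) uses a hypothetical protocol, SupportsIntKeyAssignment. Both list and dict satisfy it because both accept obj[5] = value. Whether it is the right annotation for the real function still depends on what the rest of its body does.

```python
from typing import Any, Dict, List, Protocol


class SupportsIntKeyAssignment(Protocol):
    """Hypothetical protocol: any object accepting `obj[some_int] = value`."""

    def __setitem__(self, key: int, value: Any, /) -> None: ...


def g(x: SupportsIntKeyAssignment) -> int:
    x[5] = 'foo'
    y = 42
    # a long list of non understandable code
    return y


as_list: List[Any] = [16, 72, 38, 45, 21, 34]
as_dict: Dict[int, Any] = {}
g(as_list)  # the call we happened to observe in PDB
g(as_dict)  # an equally valid call the debugging session never showed
```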

2. Reveal:

Another option is to use the reveal_type or reveal_locals functions. Put reveal_type(expr) in the code and run Mypy on it, and Mypy prints the type it inferred for expr. I never really used those two functions because, most of the time, the revealed type was Any:
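Here is a sketch of how reveal_type can sit in code that still runs. Wrapping the calls in `if TYPE_CHECKING:` means Python never executes them, while Mypy still analyzes the block. Mypy recognizes reveal_type without an import, and Python 3.11 also added typing.reveal_type for runtime use. The helper names are made up, and the exact wording of the revealed types varies between Mypy versions.

```python
from typing import TYPE_CHECKING


def legacy_helper(x):  # unannotated, so Mypy treats its result as Any
    return x


def typed_helper(x: int) -> int:
    return x


a = legacy_helper(3)
b = typed_helper(3)

if TYPE_CHECKING:
    # TYPE_CHECKING is False at runtime, so Python never runs these calls;
    # only Mypy analyzes them and prints the inferred types.
    reveal_type(a)  # Mypy: Revealed type is "Any"
    reveal_type(b)  # Mypy: Revealed type is "builtins.int"
```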



3. PyAnnotate:

The third, and really helpful, option is to use a tool that gives you an exhaustive list of suggestions, for example PyAnnotate. PyAnnotate records all the call argument types and return types observed at runtime.

Similar projects exist: MonkeyType, created by Instagram; Pytype, from Google; and Pyre, created at Facebook... Ahem, I mean, Meta. I used PyAnnotate simply because it felt like the easiest to set up and use. I should probably give the other options a try.

The PyAnnotate repository includes a configuration example you can use with Pytest. After running your tests, you should find a new file named type_info.json, containing JSON that looks like this:

    {
        "path": "rql/__init__.py",
        "line": 128,
        "func_name": "RQLHelper.compute_solutions",
        "type_comments": [
            "(rql.stmts.Union, None, None, int) -> Set",
            "(rql.stmts.Insert, None, None, int) -> Set",
            "(rql.stmts.Union, Dict[str, function], None, int) -> Set",
            "(rql.stmts.Delete, None, None, int) -> Set",
            "(rql.stmts.Set, None, None, int) -> Set",
            "(rql.stmts.Union, Dict[str, function], Dict[str, str], int) -> Set[str]",
            "(rql.stmts.Union, None, None, int) -> pyannotate_runtime.collect_types.NoReturnType",
            "(rql.stmts.Insert, None, None, int) -> pyannotate_runtime.collect_types.NoReturnType"
        ],
        "samples": 65
    },
    {
        "path": "rql/__init__.py",
        "line": 152,
        "func_name": "RQLHelper.simplify",
        "type_comments": [
            "(rql.stmts.Union) -> None"
        ],
        "samples": 14
    },    

These generated type hints are only suggestions: you will have to tweak them before applying them. Once they are applied, run Mypy on the updated files to verify that everything stays green.
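Before tweaking, it can help to see which functions were observed with the most distinct signatures, since those need the most thought. Below is a minimal sketch using only the json module. rank_by_ambiguity is a hypothetical helper. The sample is trimmed down from the type_info.json excerpt above and wrapped in a JSON list so it parses on its own.

```python
import json
from typing import List, Tuple


def rank_by_ambiguity(type_info_json: str) -> List[Tuple[str, int, int]]:
    """Return (function, distinct signatures, samples), most ambiguous first."""
    entries = json.loads(type_info_json)
    ranked = [
        (entry["func_name"], len(set(entry["type_comments"])), entry["samples"])
        for entry in entries
    ]
    # Functions seen with many different signatures need the most manual tweaking.
    return sorted(ranked, key=lambda row: row[1], reverse=True)


SAMPLE = """[
  {"path": "rql/__init__.py", "line": 152, "func_name": "RQLHelper.simplify",
   "type_comments": ["(rql.stmts.Union) -> None"], "samples": 14},
  {"path": "rql/__init__.py", "line": 128, "func_name": "RQLHelper.compute_solutions",
   "type_comments": ["(rql.stmts.Union, None, None, int) -> Set",
                     "(rql.stmts.Insert, None, None, int) -> Set"], "samples": 65}
]"""

for name, signatures, samples in rank_by_ambiguity(SAMPLE):
    print(f"{name}: {signatures} signature(s), {samples} samples")
```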


How much should be typed?

I won't deny it: I was gradually adding annotations to everything, everywhere, and it felt endless.

Then I asked myself: "How do I know the RQL package has enough annotations? How do I know that I've done enough?"

My first approach was to add Mypy coverage reports to the project. The lower the imprecision percentage, the happier you should be:

Mypy Type Check Coverage Summary
================================

Script: index

+---------------------+-------------------+----------+
| Module              | Imprecision       | Lines    |
+---------------------+-------------------+----------+
| rql                 |  29.63% imprecise |  324 LOC |
| rql._exceptions     |   0.00% imprecise |   48 LOC |
| rql.analyze         |  62.35% imprecise |  603 LOC |
| rql.base            |  13.19% imprecise |  288 LOC |
| rql.interfaces      |  11.54% imprecise |   78 LOC |
| rql.nodes           |  24.20% imprecise | 1438 LOC |
| rql.parser          |  77.19% imprecise | 1245 LOC |
| rql.parser.__main__ |  19.30% imprecise |   57 LOC |
| rql.pygments_ext    |  23.21% imprecise |   56 LOC |
| rql.rqlgen          |   1.62% imprecise |  247 LOC |
| rql.rqltypes        |  12.92% imprecise |  178 LOC |
| rql.stcheck         |  49.25% imprecise |  867 LOC |
| rql.stmts           |  30.98% imprecise | 1372 LOC |
| rql.undo            |  27.39% imprecise |  387 LOC |
| rql.utils           |  20.72% imprecise |  333 LOC |
+---------------------+-------------------+----------+
| Total               |  38.64% imprecise | 7521 LOC |
+---------------------+-------------------+----------+
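The percentages are easier to act on when weighted by module size. The small sketch below parses rows of the report above and estimates imprecise lines per module as imprecision × LOC. Only five of the report's rows are copied in, and the regular expression assumes exactly this table layout.

```python
import re
from typing import List, Tuple

# Matches one data row of the coverage table, e.g.
# "| rql.parser          |  77.19% imprecise | 1245 LOC |"
ROW = re.compile(
    r"^\|\s*(\S+)\s*\|\s*([\d.]+)% imprecise\s*\|\s*(\d+) LOC\s*\|$", re.MULTILINE
)


def imprecise_lines(report: str) -> List[Tuple[str, int]]:
    """Estimate imprecise lines per module (percentage x LOC), largest first."""
    rows = []
    for module, percent, loc in ROW.findall(report):
        if module != "Total":
            rows.append((module, round(float(percent) / 100 * int(loc))))
    return sorted(rows, key=lambda row: row[1], reverse=True)


REPORT = """
| rql                 |  29.63% imprecise |  324 LOC |
| rql.analyze         |  62.35% imprecise |  603 LOC |
| rql.parser          |  77.19% imprecise | 1245 LOC |
| rql.stcheck         |  49.25% imprecise |  867 LOC |
| rql.stmts           |  30.98% imprecise | 1372 LOC |
| Total               |  38.64% imprecise | 7521 LOC |
"""

for module, lines in imprecise_lines(REPORT):
    print(f"{module:<12} ~{lines} imprecise lines")
```

Doing the same arithmetic over all fifteen modules gives about 2,906 imprecise lines. That matches the Total row (38.64% of 7,521 ≈ 2,906), so the total is LOC-weighted. rql.parser alone accounts for about a third of it.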

Mypy performs type inference: it can deduce the types of values from their context. That is why, as the picture below shows, a program is not flagged as 100% imprecise even when it has no type annotations:

I finally gave up on that metric and followed what other Logilabians suggested: find and list RQL's most imported and most used classes and functions. Once those classes and functions are typed, we'll consider that we've reached a milestone.
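Finding the most imported names can be partly automated. The minimal sketch below uses the ast module to count `from rql... import Name` statements. The client sources are hypothetical. Relative imports and plain `import rql.nodes` statements are not counted.

```python
import ast
import collections
from typing import Counter, Dict


def count_imported_names(sources: Dict[str, str], package: str) -> Counter[str]:
    """Count `from <package>... import Name` occurrences across the given sources."""
    counts: Counter[str] = collections.Counter()
    for source in sources.values():
        for node in ast.walk(ast.parse(source)):
            if not isinstance(node, ast.ImportFrom) or node.module is None:
                continue
            if node.module == package or node.module.startswith(package + "."):
                for alias in node.names:
                    counts[f"{node.module}.{alias.name}"] += 1
    return counts


# Hypothetical client modules; in practice, read real files (e.g. via pathlib's rglob).
CLIENTS = {
    "client_a.py": "from rql.nodes import Constant, VariableRef\nfrom rql import RQLHelper\n",
    "client_b.py": "from rql.nodes import Constant\nimport os\n",
}

print(count_imported_names(CLIENTS, "rql").most_common(2))
```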

Type information inline or via stubs?

Mypy lets you add type annotations to your project without ever modifying the original source code, using what are called "stub files". Stub files (.pyi) are Python-like files that contain only the type signatures of variables, functions and classes. Nothing else.

I went the other way and added type annotations directly to the original source code. In the beginning, having everything in the same file really felt like the easiest thing to do. But now that there are more and more types, I think moving them into stub files would improve readability.

With stub files, I could, for example, define type aliases wherever necessary. Interesting inspiration can be found here. I could also write a snippet that strips all those annotations from the source and puts them into .pyi files.

From now on, I think I'll always use stub files.
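Mypy ships a stubgen command that generates stubs properly. To illustrate the kind of snippet mentioned above, here is a deliberately small sketch that needs Python 3.9+ for ast.unparse. It keeps imports, class and function signatures and annotated attributes, and replaces every body with `...`. Plain assignments, including type aliases, are dropped, so a real tool would need more care.

```python
import ast
from typing import List


def _stub_statements(statements: List[ast.stmt]) -> List[ast.stmt]:
    """Keep signature-level statements and replace implementations with `...`."""
    kept: List[ast.stmt] = []
    for stmt in statements:
        if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
            stmt.body = [ast.Expr(value=ast.Constant(value=...))]
        elif isinstance(stmt, ast.ClassDef):
            stmt.body = _stub_statements(stmt.body) or [ast.Expr(value=ast.Constant(value=...))]
        elif isinstance(stmt, ast.AnnAssign):
            stmt.value = None  # `name: str = "node"` becomes `name: str`
        elif not isinstance(stmt, (ast.Import, ast.ImportFrom)):
            continue  # drop docstrings, plain assignments and other runtime code
        kept.append(stmt)
    return kept


def make_stub(source: str) -> str:
    """Return .pyi-style text for `source`."""
    tree = ast.parse(source)
    tree.body = _stub_statements(tree.body)
    return ast.unparse(tree) + "\n"


SOURCE = '''
from typing import Optional


class Node:
    """A tree node (hypothetical example)."""

    name: str = "node"

    def __init__(self, parent: Optional["Node"] = None) -> None:
        self.parent = parent

    def root(self) -> "Node":
        return self if self.parent is None else self.parent.root()


DEBUG = True
'''

print(make_stub(SOURCE))
```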


Type Checking:

The typing module defines a TYPE_CHECKING constant that is False at runtime but treated as True by type checkers. It lets you give Mypy information without that code being evaluated at runtime. A typical use is resolving import cycles caused by imports that are only needed for annotations. Example here, another one here.
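Here is a minimal sketch. It assumes a hypothetical module whose describe function only needs rql.stmts.Insert for its annotation, while rql.stmts itself imports that module. The import guarded by TYPE_CHECKING is seen by Mypy but never executed, and the quoted annotation is not evaluated at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by Mypy only: at runtime TYPE_CHECKING is False, so this
    # import never executes and cannot take part in an import cycle.
    from rql.stmts import Insert


def describe(statement: "Insert") -> str:
    # The quoted annotation is never evaluated at runtime, so `Insert`
    # does not need to exist when this module is imported.
    return type(statement).__name__
```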


Improvements:

1. Less Any:

I've been in situations where I wasn't sure what the correct type annotation should be. Instead of leaving the annotation out, I used Any to show other developers that the lack of a restrictive type hint is a conscious choice: either I don't know, or I'm not sure.

Normally, I should use Any only if it's impossible for me to know what the type is going to be.

From the typing module documentation, we can read:

Notice that no type checking is performed when assigning a value of type Any to a more precise type. For example, the static type checker did not report an error when assigning a to s even though s was declared to be of type str and receives an int value at runtime!
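The quote refers to an example from that documentation which is not reproduced here. A minimal sketch of the same situation follows. It also shows a common alternative when the type genuinely varies. Annotating with object makes Mypy require an explicit check, such as isinstance, before the value is used. The as_int helper is hypothetical.

```python
from typing import Any

a: Any = 2
s: str = ""
s = a  # Mypy accepts this: Any is compatible with str, yet s now holds an int


def as_int(value: object) -> int:
    # With `object` instead of Any, Mypy rejects using `value` as an int
    # until the isinstance check has narrowed its type.
    if isinstance(value, int):
        return value
    raise TypeError(f"expected an int, got {type(value).__name__}")
```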

So, to take advantage of Mypy, I should really avoid using Any and see if I can replace it with a proper annotation.

2. Less type: ignore comments:

It was also impossible to make progress without # type: ignore comments to work around the trickier cases. There are still too many of them. I should take another look and see whether better type annotations would let me remove some of those type: ignore comments.
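Two habits can help. The first is naming the error code in the comment (`# type: ignore[attr-defined]`), so the ignore only covers that one error. The second is enabling Mypy's warn_unused_ignores option, so comments that are no longer needed get reported. The sketch below uses a hypothetical double function. Often the better fix is to restructure the code so the ignore disappears, as the CountingDoubler class does.

```python
def double(x: int) -> int:
    return x * 2


# Mypy cannot see attributes attached to a function object at runtime and
# should report an [attr-defined] error here. Naming the error code keeps
# every other check active on this line.
double.calls = 0  # type: ignore[attr-defined]


class CountingDoubler:
    """Typed alternative: the state is declared, so no ignore is needed."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: int) -> int:
        self.calls += 1
        return x * 2
```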


Thanks:

Special thanks to Romaric who took the time to read this article and provide insightful critiques.


More on this topic:

I really hope you learned something from this article. As always, the topic is wide and cannot be completely covered here. To discover more about this subject, I strongly recommend the following links: