To type or not to type? That is the question

duck type

22 Aug 2021

I've spent the last couple of months adding type hints to the RQL package. In this first part, I'll try to tell you when I think Mypy may be useful. In a second part, I'll talk about the how: the process I've followed for adding type annotations to RQL. In a final part, I'll talk about what I've learned during the process.

What?

Yes, in case you were living under a rock, here is the news: you can now write Python as if it were a statically typed programming language. See the official PEP, the official website, the official documentation, and the official repository ¹. See also the Alore programming language, the project Mypy borrows heavily from.

Why?

I remember when I was asked to work on this task. The (my) skepticism was real. Around me, reactions were mixed (confused, disgusted, interested). I won't deny it: sometimes I really wonder whether I'd be better off learning to write OCaml. Python devs are so used to quickly prototyping their programs, without worrying about the types of their variables, without worrying about their return types, etc. You cannot just come and tell them that type hints are nice! They need to be convinced.

More than once, the discussion ended with the "explicit vs implicit" and "static vs dynamic" holy wars. I'm not trying to fuel those debates. There are pros and cons on each side. You have to know what the tools are and what suits you best, depending on what you're trying to achieve and on what your position is (user, author, maintainer).

Assume you're a person who already writes Python programs, and you want to discover more about Mypy. If you only look at some introductory examples, they will help you understand what's going on, but they won't convince you to jump on the type-hints bandwagon. The problem with most of the examples is that they're too short, or too basic.

No disrespect to the authors: I guess the intent was to make things easier to understand. Short and simple to get to the point. But short and simple will barely convince you to try.

So, what are the situations in which adding types to your program will be useful, necessary, not a waste of time, priceless, meaningful? To this question, I would say this:

Type hints are mainly useful for the maintainers, the human readers of large code bases. The people who read and debug and maintain large software systems they did not write.

The major benefit for you won't be fewer bugs or unit tests, but the ability to more easily read and reason about code written by someone else.

I need to know the types:

When you read/debug/maintain a large software system you've not written, you'll often be in the position in which you ask yourself: "Those function arguments, those variables, what are their types?", or "Is this function returning something?".

Remember, you were reading the source code. Now you need to modify the source code and add a couple of: print(foo), type(bar), isinstance(baz), qux.__class__, corge.__dict__. You will probably add breakpoints here and there, to dive inside pdb. Then start a REPL, then launch the program, then do some introspection, then see what's going on. And finally, discover that the type is not what you were expecting.

phew

What about just reading the type?

Let's be honest: as a reader, as the one who reviews millions of lines of code, you just want to be able to read what the types are. And I'm not talking about docstrings. Everybody should document their programs and write docstrings. But let's face it, docstrings may be missing, misleading, out of date, not clear enough, etc. Let's compare the following 2 short snippets:

    def get_description(self, tr): # What is tr ???
        foo = self.get_foo()
        if foo != "Bar":
            return tr(foo) # ah, tr is a function. What's returned by tr?
        return "Bar" # The method seems to return a string

Imagine having to find answers to those questions many times a day.

    def get_description(self, tr: rt.TranslationFunction) -> Optional[str]:
        foo = self.get_foo()
        if foo != "Bar":
            return tr(foo)
        return "Bar"

You see that in the second snippet, all the important information is in the function header. You don't even read the function body and you already know that: "get_description is a method that takes a function as its only parameter and returns a string or nothing". That's a game changer when you're a maintainer.

It's no surprise that many of the big companies, the ones with large software systems, are already using type annotations.
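To make the second snippet concrete, here is a self-contained sketch. The real definition of rt.TranslationFunction isn't shown in this article, so the sketch assumes it is an alias for a function that takes a string and returns a string. The Entity class and its get_foo method are hypothetical stand-ins.

```python
from typing import Callable, Optional

# Assumption: the real rt.TranslationFunction is not shown in the article;
# a "str -> str" callable is a plausible stand-in.
TranslationFunction = Callable[[str], str]


class Entity:
    """Hypothetical class, only here to host the method."""

    def __init__(self, foo: str) -> None:
        self._foo = foo

    def get_foo(self) -> str:
        return self._foo

    def get_description(self, tr: TranslationFunction) -> Optional[str]:
        foo = self.get_foo()
        if foo != "Bar":
            return tr(foo)
        return "Bar"


print(Entity("hello").get_description(str.upper))  # HELLO
print(Entity("Bar").get_description(str.upper))    # Bar
```

With the alias in place, the header alone tells the reader that tr must accept a string and give one back.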

Too verbose and confusing to read?

Adding type annotations can quickly lead to source code that is too verbose and disgusting to read. The following is something I've written for RQL. I think I can do better (define type aliases, use stub files, use dataclasses, refactor, ...).

class ScopeNode(BaseNode):
    def __init__(self):
        self.defined_vars: Dict[str, "rql.nodes.Variable"] = {}
        self.with_: List["rql.nodes.SubQuery"] = []
        self.solutions: rt.SolutionsList = []
        self._varmaker = None
        self.where: Optional["rql.base.Node"] = None
        self.having: Iterable["rql.base.Node"] = ()
        self.schema: Optional[Any] = None
        self.aliases: Dict[str, "rql.nodes.ColumnAlias"] = {}

Stub files?

If you want to keep your source code free of type annotations, but still want to use type annotations, you can write stub files. A stub file is just a file that contains the annotated declarations of all your variables and function signatures. Function bodies in stub files are just a single ellipsis. Another really short example:

    # foo.py
    
    def get_description(self, tr):
        foo = self.get_foo()
        if foo != "Bar":
            return tr(foo)
        return "Bar"

    # foo.pyi <--- notice the extension used for stub files
    
    def get_description(self, tr: rt.TranslationFunction) -> Optional[str]: ...

I want type annotations, what's next?

Mypy is a static type checker for Python. Let's see what Mypy can do for us.

1. Generating type annotations:

Let's suppose you want to add type annotations to your code base. How do you proceed?

Mypy includes the stubgen tool, which can automatically generate stub files. For example, here is a .pyi file generated by stubgen for the dateparser package.

from typing import Any, Optional

def parse(
    date_string: Any, 
    date_formats: Optional[Any] = ..., 
    languages: Optional[Any] = ..., 
    locales: Optional[Any] = ..., 
    region: Optional[Any] = ..., 
    settings: Optional[Any] = ...): ...

Keep in mind that most type annotations will default to Any. You'll need to update the .pyi files and write more precise types (we agree that Any everywhere is not really useful). Here is an example of the same stub file with more precise type annotations:

import datetime
import sys
from typing import Set, Tuple

from dateparser.date import DateDataParser

if sys.version_info >= (3, 8):
    from typing import Literal, TypedDict
else:
    from typing_extensions import Literal, TypedDict

__version__: str

_default_parser: DateDataParser

_Part = Literal["day", "month", "year"]
_ParserKind = Literal["timestamp", "relative-time", "custom-formats", "absolute-time", "no-spaces-time"]

class _Settings(TypedDict, total=False):
    DATE_ORDER: str
    PREFER_LOCALE_DATE_ORDER: bool
    TIMEZONE: str
    TO_TIMEZONE: str
    RETURN_AS_TIMEZONE_AWARE: bool
    PREFER_DAY_OF_MONTH: Literal["current", "first", "last"]
    PREFER_DATES_FROM: Literal["current_period", "future", "past"]
    RELATIVE_BASE: datetime.datetime
    STRICT_PARSING: bool
    REQUIRE_PARTS: list[_Part]
    SKIP_TOKENS: list[str]
    NORMALIZE: bool
    RETURN_TIME_AS_PERIOD: bool
    PARSERS: list[_ParserKind]

def parse(
    date_string: str,
    date_formats: list[str] | Tuple[str] | Set[str] | None = ...,
    languages: list[str] | Tuple[str] | Set[str] | None = ...,
    locales: list[str] | Tuple[str] | Set[str] | None = ...,
    region: str | None = ...,
    settings: _Settings | None = ...,
) -> datetime.datetime | None: ...

Do we agree that the:

parse function parameters are more precise,
settings parameter is annotated with a more detailed type,²
returned value is more precise,
refactoring of this function will be easier?
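To see what Literal and TypedDict buy you, here is a small sketch that is independent of dateparser. Settings, date_order and the "MDY" default are hypothetical, and the sketch assumes Python 3.8 or later so that both names can be imported from typing. At runtime a TypedDict is a plain dict; the checks happen when a type checker such as Mypy reads the code.

```python
from typing import Literal, TypedDict


class Settings(TypedDict, total=False):
    # total=False means every key is optional.
    DATE_ORDER: str
    PREFER_DATES_FROM: Literal["current_period", "future", "past"]


def date_order(settings: Settings) -> str:
    # .get() is used because the key may be absent;
    # "MDY" is an arbitrary default chosen for this example.
    return settings.get("DATE_ORDER", "MDY")


ok: Settings = {"PREFER_DATES_FROM": "future"}
print(date_order(ok))                     # MDY
print(date_order({"DATE_ORDER": "DMY"}))  # DMY

# A type checker is expected to reject the next line, because "tomorrow"
# is not one of the allowed literal values. Python itself would run it.
# bad: Settings = {"PREFER_DATES_FROM": "tomorrow"}
```

With Any, a typo in a settings key or value would only surface at runtime, if at all. With the more precise type, the checker can point at it before the program runs.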

2. Type checking:

Before running your tests, before running your program, Mypy can detect some types of bugs:

I claim that foo should return a string, yet there are cases where foo does return None. That should be fixed. Notice how the program runs fine. Mypy will type check the program and tell you what's wrong. Mypy won't stop you from running the program.
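A minimal sketch of that situation (this foo is a made-up example, not code from RQL): the annotation promises a str, but one branch returns None.

```python
def foo(name: str) -> str:
    # The annotation promises a string...
    if not name:
        return None  # ...but this branch returns None.
    return name.upper()


print(foo("bar"))  # BAR
print(foo(""))     # None
```

Python runs this without complaint. Checking the same file with Mypy should report an incompatible return value on the "return None" line (got "None", expected "str"). From there you can either handle the empty case or change the annotation to Optional[str].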

Conclusion:

Most people, including myself, don't work on projects with 4 million lines of Python code, and hence don't have to deal with the constraints of such a big code base.

Most people can perfectly well write their Python programs without using type annotations. Mypy is just another tool inside your toolbox. I just want you to give it a try; that will certainly make you a better software developer.

In the next article, I will talk a little bit more about the actual process I've followed for adding type annotations to the RQL package.

To learn more about this topic:

¹ Which, by the way, is where you'll find more interesting answers than what's inside the documentation, imho.

² Yes. I won't deny it, there are suddenly more lines to read.