Type Variances

Posted on Mar 21, 2026

The culprit

I was minding my own business writing some type-annotated Python code when my type checker of trust did me dirty and nerd-sniped me.

error: Return type "list[A]" of "list_method" incompatible with return type "list[A | B]" in supertype "ParentClass"  [override]

Oof. Let’s provide some context.

Say we have a simple inheritance structure, where ChildClass inherits from ParentClass. Both classes only come with two methods: sequence_method and list_method. As the name suggests, one returns a Sequence object and one returns a list object. Here comes the more or less subtle difference between both classes: the parent returns containers, in which every element can either be an instance of class A or class B. The child class narrows down this option space by only ever returning elements which are instances of class A in its containers.

Ok – enough of the prose – here’s the aforementioned in code:

from typing import Sequence
from dataclasses import dataclass

@dataclass
class A:
    field: str

@dataclass
class B:
    field: int

class ParentClass:

    def sequence_method(self, flip: bool) -> Sequence[A | B]:
        if flip:
            return [A("hey"), B(5)]
        return (A("hey"), B(5))

    def list_method(self) -> list[A | B]:
        return [A("hey"), B(5)]

class ChildClass(ParentClass):

    def sequence_method(self, flip: bool) -> Sequence[A]:
        if flip:
            return [A("hey"), A("ho")]
        return (A("hey"), A("ho"))

    def list_method(self) -> list[A]:
        return [A("hey"), A("ho")]

Since I was caught off guard when noticing that one of my pre-commit hooks – admittedly only run pro forma, overconfidently not expecting any issues – complained regarding typing, I wanted to be sure this wasn’t a type checking bug and ran a couple of type checkers.

mypy

error: Return type "list[A]" of "list_method" incompatible with return type "list[A | B]" in supertype "ParentClass"  [override]
Found 1 error in 1 file (checked 1 source file)

ty

error[invalid-method-override]: Invalid override of method `list_method`
  --> example.py:29:9
   |
27 |         return (A("hey"), A("ho"))
28 |
29 |     def list_method(self) -> list[A]:
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Definition is incompatible with `ParentClass.list_method`
30 |         return [A("hey"), A("ho")]
   |
  ::: example.py:19:9
   |
17 |         return (A("hey"), B(5))
18 |
19 |     def list_method(self) -> list[A | B]:
   |         -------------------------------- `ParentClass.list_method` defined here
20 |         return [A("hey"), B(5)]
   |
info: This violates the Liskov Substitution Principle
info: rule `invalid-method-override` is enabled by default

pyrefly

ERROR Class member `ChildClass.list_method` overrides parent class `ParentClass` in an inconsistent manner [bad-override]
  --> example.py:29:9
   |
29 |     def list_method(self) -> list[A]:
   |         ^^^^^^^^^^^
   |
  `ChildClass.list_method` has type `BoundMethod[ChildClass, (self: ChildClass) -> list[A]]`, which is not assignable to `BoundMethod[ChildClass, (self: ChildClass) -> list[A | B]]`, the type of `ParentClass.list_method`
 INFO 1 error

And so it had to be me, not them. Reason enough to take a step back and revisit the basics.

LSP

In 1994, Barbara Liskov published ‘A Behavioral Notion of Subtyping’, in which she states that

$$ T_1 \text{ is a subtype of } T_2 \rightarrow (\forall x:T_2,\psi(x) \rightarrow \forall y:\ T_1,\psi(y)) $$

where

$T_1$ and $T_2$ are types
$x$ is instantiating $T_2$ and $y$ is instantiating $T_1$, respectively
$\psi$ is a provable property

In other words, if a type $T_1$ is a subtype of $T_2$, every property that is provable for instances of $T_2$ is also provable for instances of $T_1$.

Some provable properties could, for instance, be:

$\psi_1(x) = \mathbb{I}[x \text{ can only take on values from } \mathcal{V}]$
$\psi_2(x) = \mathbb{I}[x \text{ has a method called walk, which accepts a parameter of type } T_k \text{ and returns a parameter of type } T_j]$

Shifting the lens on program behaviors – hence the keyword behavioral subtyping – rather than provable properties, Liskov and her co-author Jeannette Wing say:

[…] the objects of the subtype ought to behave the same as those of the supertype as far as anyone or any program using supertype objects can.

Parenthesis: What is a type, actually?

Liskov defines a type $T$ as a triplet $(\mathcal{O}, \mathcal{V}, \mathcal{M})$ where

$\mathcal{O}$ is the space of objects that can instantiate the type
$\mathcal{V}$ is the space of values that the type can take on
$\mathcal{M}$ is the space of methods that the type offers to its user

In order to see why the object space is necessary to distinguish two types, consider the following example.

	$T_1$	$T_2$
$\mathcal{V}$	$\mathbb{Z}$	$\mathbb{Z}$
$\mathcal{O}$	$\{o\}$ (singleton)	$\{o_i \| i \in \mathbb{Z}\}$
$\mathcal{M}$	{`len(): int`, `append(x: int) -> None`}	{`len(): int`, `append(x: int) -> None`}

Based on $\mathcal{V}$ and $\mathcal{M}$ alone, both types $T_1$ and $T_2$ would seem indistinguishable. Yet their observable behavior may differ noticeably:

object_1 = T()
object_2 = T()

object_1.append(12)
assert object_1.len() == object_2.len()

Here, one could turn the assertion into a provable property $\psi$.

So we now have a guiding principle regarding types and subtypes, generally speaking.

As the name of the theorem suggests, we can use the information on subtyping to derive information on substitution.

If S is a subtype of T, then objects of type T in a program can be replaced with objects of type S without changing the desirable properties of that program.

Importantly, we should think about what exactly that means for the type’s methods. For the sake of concreteness, let’s suppose that our type $T_1$ has one method $m$, i.e. $\mathcal{M} = \{m\}$, based on which we can define the following provable properties:

$\psi_1(o) = \mathbb{I}[o \text{ has a method m which ‘works’ with input values } \{4, 5, 6\}]$
$\psi_2(o) = \mathbb{I}[o \text{ has a method m which returns values from } \{4, 5, 6\}]$

If $T_2$ actually wants to be a valid subtype of $T_1$ and therefore be a substitution candidate for it, it needs to have these provable properties, too. Yet, we note that both desired $T_1$ properties follow, for instance, from these two properties:

$\psi_1(o) = \mathbb{I}[o \text{ has a method m which ‘works’ with input values } \{2, 3, 4, 5, 6, 7\}]$
$\psi_2(o) = \mathbb{I}[o \text{ has a method m which returns values from } \{5, 6\}]$

So, in other words, the subtype’s methods need to have laxer or equal pre-conditions (‘contravariance’) and stricter or equal post-conditions than those of the supertype’s corresponding methods (‘covariance’). Elisa Baniassad illustrates this expectation beautifully in her equally beautifully titled work Making the Liskov Substitution Principle Happy and Sad:

Substitutability in practice

Now the above LSP defines in which case a certain type can be substituted for another. Yet, evaluating whether the sub-typing implication is contradicted may require a whole lot of work: we need to figure out all provable properties about the supertype candidate and evaluate it on the subtype candidate. How would we even go about this in practice? Can we even capture all provable properties of a type?

Fortunately, the questions asked in practice are a little easier to tackle. Static type checkers typically make their lives easier by simply focusing on particular kinds of provable properties of a type. So, in essence, they can’t say whether a type is actually a fair subtype of another, they can only sometimes rule it out. One might say that they come with false negatives but without false positives in classifying whether a type is a valid subtype of another. More concretely, PEP 544 talks about how properties to be validated to determine whether a type complies with a Protocol:

A concrete type X is a subtype of protocol P if and only if X implements all protocol members of P with compatible types. In other words, subtyping with respect to a protocol is always structural.
A protocol P1 is a subtype of another protocol P2 if P1 defines all protocol members of P2 with compatible types.

The mypy docs keep it simple:

Class Dog is a structural subtype of class Animal if the former has all attributes and methods of the latter, and with compatible types.

Note how these two conditions are substantially more forgiving than Liskov’s behavioral subtyping condition from before. To my understanding we might therefore call what we are doing with Python type checkers structural subtyping.

Concretely, assessing whether a Python type $T_1$ is a valid subtype candidate of another type $T_2$ – from a structural subtyping perspective – might naively look as follows:

Does $T_1$ lack some of the attribute names of $T_2$?
Is there an attribute name from $T_1$ with a type that cannot be substituted by the type of the same attribute name from $T_2$?
Does $T_1$ lack some method names of $T_2$?
Is there a method name which is present in both $T_1$ and $T_2$ with a parameter that is more restrictive in $T_1$ than in $T_2$?
Is there a method name which is present in both $T_1$ and $T_2$ with a return value that is laxer in $T_1$ than in $T_2$?

Where it all went south

Now, talking about ClassA, Dog, Rectangle, int and str examples is all nice and simple. But what happens when we create types by not only defining atomic types from scratch or via simple inheritance/protocols but rather rely on composition of types, i.e. generics?

A simple composition of types we can think of is a union, i.e. a disjunction. We can conveniently express such a composition of types $T_1$ and $T_2$ with T1 | T2 or typing.Union[T1, T2] in Python.

Somewhat intuitively, we can draw the following analogy:

$S_1$ is a subset of the union $S_1 \cup S_2$
$T_1$ is a subtype of $T_1$ | $T_2$

Another simple composition of types we can think of is the parametrization of a generic container type.

Also here, it holds somewhat intuitively that

tuple[Dog, ...] is a subtype of tuple[Animal, ...] where Dog inherits from Animal.
Sequence[Dog] is a subtype of Sequence[Animal] where Dog inherits from Animal.

We can also combine both compositions and observe that

tuple[T_1, ...] is a subtype of tuple[T_1 | T_2, ...]

While this intuitive extension from atomic types to generics might inspire a certain confidence, don’t count the chickens before they hatch:

list[T1] is not a subtype of list[T1 | T2]

So, what is the governing rule, here?

Variances¹²

Having encountered contravariance and covariance already, it is now time to talk about the third kind of variance: invariance.

As the name suggests, if two types are invariant, we can’t say anything about their subtyping or substitutability properties.

Let’s recap our variances:

Covariance: $T_1 \leq T_2 \rightarrow TC[T_1] \leq TC[T_2]$
Contravariance: $T_1 \leq T_2 \rightarrow TC[T_1] \geq TC[T_2]$
Invariance: $[T_1 \leq T_2 \not \rightarrow TC[T_1] \leq TC[T_2]] \wedge [T_1 \leq T_2 \not \rightarrow TC[T_1] \geq TC[T_2]]$

where $TC$ is a ’type constructor’. A type constructor could, for instance, be the process of defining a method³ or the process of creating compound or container types.

These definitions can also be found in prose in Python’s documentation on subtying for generics.

So somehow, there are different type constructors – such as generics – that lead to different kinds of subtyping behaviors. But how do we know which type constructor corresponds to what type of variance?

On the one hand, we can usually just look it up in the docs:

Most immutable containers, such as Sequence and FrozenSet are covariant. Union is also covariant in all variables
Callable is an example of type that behaves contravariantly in types of arguments.
List is an invariant generic type.
By default, all user-defined generics are invariant.

So, here we have it: list is invariant. Knowing that the relevant and necessary information is available somewhere in the docs is a great start. But how could we have known this up front, without looking up the conclusion?

It turns out that we already have all the equipment we need and only need to take a closer look at list. In its simplest form, we would expect insert and get methods on it. Let’s compare the method interfaces for these two functions for list[A] and list[A | B]:

	`list[A]`	`list[A \| B]`	change in condition
`insert` parameter	`A`	`A \| B`	widening
`insert` return value	/	/	/
`get` parameter	/	/	/
`get` return value	`A`	`A \| B`	widening

Yet, if we think back to our lovely smiley diagram, we realize that both widening a pre-condition and widening a post-condition is not compatible with a subtyping relationship in either direction. This exactly corresponds to the definition of invariance we just established: neither is a subtype of the other.

I reckon I demonstrated a whole lot of self-discipline by not calling this paragraph types of variance. ↩︎
See e.g. this .NET documentation for variances. ↩︎
Concretely, our Python $TC$ for a method could be TC[T1] := Callable[[T1, int], str]. ↩︎

Type Variances

The culprit

LSP

Substitutability in practice

Where it all went south

Variances12

Variances¹²