Toward a Corpus Study of the Dynamic Gradual Type

Gradual typing allows developers to explore the middle ground between static and dynamic typing. They can introduce type annotations to the most critical parts of their codebase (however that may be defined) and incrementally add types to the rest of the code. Gradual typing comes with a special top type, Any or Dyn, which eases type adoption but weakens static guarantees. Any must be used sparingly.

However, to reduce the occurrence of Any, gradual type systems must be robust enough to reason about typing idioms of the dynamic host language so that developers are not forced to weaken their already isolated static guarantees with the Any type. To profer typing solutions to dynamic programming idioms, language designers must be able to identify these idioms (patterns).

Consider this sample code in Python (Mypy) that tries to type a callback parameter. Because of the nature of the function’s definition, being that it should work with any and all functions, the developer tries to reflect this in its type, leaving it intentionally vague.

1
2
3
4
5
from typing import Callable, Any
import inspect, functools

def set_def_ctx(func: Callable[..., Any]) -> Callable[..., Any]: 
	return functools.partial(func, ctx="DEFAULT")

The issue with this typing construct is that we lose precision; our type checker is no longer aware of the shape of the function passed to our parameter, func. This imprecision may not be immediately apparent, but what happens when we try to use the result of our function call?

1
2
3
4
5
6
7
def process_data(data: str, other: str, **kwargs) -> List[str]:
    return [f"{data}{other}", kwargs.get('ctx', '')]
    
wrapped_func = set_def_ctx(process_data)

# we lose precision
res = wrapped_func("foo", "bar") # Type: Any instead of List[str]

Our type checker is flustered; it assumes the weak dynamic type (Any) in place of our actual type (List[str]), and this imprecision will propagate up our type analysis as we employ res. We want to avoid introducing weaknesses.

Mypy now provides typing support, in the form of type variables, for patterns where the return type of the callable should be passed down. But for other patterns, such as dependent dictionaries, the fix is not so easy.

We posit that Any can lead us to more patterns. We find 360,000 typing instances that contain the Any type for various reasons. We distill some of these into patterns and are studying these patterns and their implications for language designers. Can they lead to more precise typing solutions, like with our type variable example? Do precise approaches already exist, and may the frequency of Any s call for improvement of existing documentation? Or maybe they expose some room for additional tooling to support developers such as MonkeyType1.

In our paper, Towards a Corpus Study of the Gradual Dynamic Type2, we present the design of our corpus study, and perform a study on 221 repositories that use a specific Python type checker, Mypy. We find eight patterns of interest — eight areas where the type system deservedly needs to grow to accommodate Python idioms. For each of these patterns, we present examples and, in some cases, an approach to flag them at scale. We also discuss implications for Mypy language designers.

Moving forward, we have plan to scale our study to comb through almost 70,000 repos that use Mypy.


  1. The team at Instagram maintains a tiny utility to generate type annotations for simple Python programs by monitoring runtime types. https://github.com/Instagram/MonkeyType ↩︎

  2. Our paper can be found here ↩︎