Implement mypy plugin for type checking validataclasses #140
This is a minor breaking change, but it's unlikely to affect a lot of code.
ninanomenon left a comment
I am far from understanding all this. I left two comments. Overall, it looks good to me.
```python
assert self.assignment_stmt is not None
lvalue = self.assignment_stmt.lvalues[0]
assert isinstance(lvalue, NameExpr)
return lvalue
```
I use asserts only for conditions that must always be true. So if they fail, it means the code of the plugin is incorrect. They serve both as a basic sanity check and for type narrowing.
In this case for example: lvalue only exists if assignment_stmt is set. If it isn't, it doesn't make sense to even call the property. Having the assert here means I don't need an | None in the return type, which in turn means in the places where I need the lvalue, I can just access the property and be sure that it's set.
Similarly for the assertion that it's a NameExpr: AssignmentStmt.lvalues is typed as list[Expression], because in theory an assignment statement can have an lvalue that is not a NameExpr (e.g. in example.foo = 42, the lvalue is a MemberExpr). However, we can safely assume that in our case it is always a NameExpr (because we've analyzed the assignments before, and skipped all assignments where it's not a NameExpr and/or reported an error for them, since those don't make sense in a validataclass).
So, if we didn't have the asserts here, we would still need to handle these theoretical cases and then have Expression | None as the return type, but then we'd also have to handle the same theoretical cases in the code where we need to access the lvalue (and need it to be a NameExpr), etc... In other words, thanks to the asserts we can simplify a lot of code.
Does that make sense? ^^
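Below is a self-contained sketch of the assert pattern discussed in this thread. Expression, NameExpr, MemberExpr and AssignmentStmt are simplified stand-ins for the classes in mypy.nodes, and ParsedField is a hypothetical container. Only the two asserts mirror the snippet under review. They let the property promise a NameExpr instead of an Expression | None.

```python
from dataclasses import dataclass
from typing import List, Optional


# Simplified stand-ins for mypy.nodes classes (hypothetical, for illustration only)
class Expression:
    pass


@dataclass
class NameExpr(Expression):
    name: str


@dataclass
class MemberExpr(Expression):
    expr: Expression
    name: str


@dataclass
class AssignmentStmt:
    lvalues: List[Expression]


@dataclass
class ParsedField:
    """Hypothetical container for a field whose assignment was already analyzed."""

    assignment_stmt: Optional[AssignmentStmt] = None

    @property
    def lvalue(self) -> NameExpr:
        # Invariant: only accessed after the assignment has been analyzed, so a
        # failing assert means the plugin itself is buggy, not the user's code.
        assert self.assignment_stmt is not None  # narrows away the Optional
        lvalue = self.assignment_stmt.lvalues[0]
        assert isinstance(lvalue, NameExpr)  # narrows Expression to NameExpr
        return lvalue


field_info = ParsedField(AssignmentStmt([NameExpr("example_field")]))
print(field_info.lvalue.name)  # example_field
```

Callers can use field_info.lvalue.name directly without any None or isinstance checks of their own.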
Yes, makes sense. I think it's really unlikely that things go so wrong that this will be an actual problem. BUUUUUT
Did you know Python has the flags -O and -OO, which enable more optimizations? One of them is removing all asserts from the bytecode. That's why I avoid using asserts outside of tests as much as possible.
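Here is a quick way to see the effect of -O. It removes assert statements and any code conditional on __debug__, and -OO additionally discards docstrings. The sketch runs the same two-line snippet in a child interpreter with and without -O. It clears PYTHONOPTIMIZE so that only the explicit flag controls optimization, which is an assumption about how you would want to run the comparison.

```python
import os
import subprocess
import sys

# The same two statements, run with and without -O
SNIPPET = "assert False, 'assert executed'\nprint('assert skipped')"


def run_snippet(*flags: str) -> subprocess.CompletedProcess:
    # Drop PYTHONOPTIMIZE so that only the explicit flags control optimization
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
        env=env,
    )


default_run = run_snippet()
optimized_run = run_snippet("-O")
print(default_run.returncode, "AssertionError" in default_run.stderr)  # 1 True
print(optimized_run.returncode, optimized_run.stdout.strip())  # 0 assert skipped
```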
Yes, I know that. :)
That's why I only use asserts in places where it is absolutely fine if they're optimized away. They mainly serve two purposes here: type narrowing and helping during development and debugging. The assertions will always be true in production, so they can safely be optimized away. I'm really using assertions here as they are intended ^^'
I should also note that mypy internally uses a lot of assertions too (for the same reasons).
Thank you a lot for your review :)
This is a large part of making validataclass mypy-compatible (#116).
Please note that the target branch is dev-mypy instead of main.

This PR implements a mypy plugin for parsing and type-checking validataclasses. It is necessary for using mypy in a project with validataclass without disabling important mypy rules.
There are a lot of tests for this plugin using the pytest-mypy-plugins library. These tests are taken into account for the coverage report. There currently are no unit tests for the plugin because it's difficult to test the individual parts of the plugin, but the plugin is still well tested.
The PR is quite large and also includes a handful of other changes (including some breaking changes and deprecations) that were necessary for or related to the development of the plugin. They will be listed below.
The plugin is not fully complete yet for time reasons. There are some (rather unimportant) edge cases that are not handled yet, as well as some other features that will be added in a future update. One example is proper detection of which fields are required when constructing a validataclass (e.g. ExampleClass() should report an error if the class has a required field). For this reason, you will find a bunch of TODO comments in the code.

There is also no documentation on how to use the plugin yet (see below), but the plugin itself is very well documented. The plugin also doesn't support other validataclass-like decorators or extensions (like the validataclass-search-queries library). Since these are important for our own projects, I will implement a way to support these in a follow-up PR. This follow-up PR will also contain the documentation for how to use the plugin.

List of changes outside of the plugin
These are changes not directly related to the plugin. Some of them are minor breaking changes or deprecations. Changes to tests, the test environment, etc. are not listed.
- … _NoDefault type to _NoDefaultType and don't delete it (51fcdfe)
- validataclass_field(): Change default to keyword-only argument (6a23032)
- validataclass_field(): Add proper typing with overloads (d838332, 6b5cbf9)
  - … dataclasses.Field[T] object, but this is in line with how typeshed types the dataclasses.field() function and makes sense unless the function is used outside the right context.
- validataclass_field(): Deprecate using raw default values and default=dataclasses.MISSING, only allowing objects like Default(...) or NoDefault. (0770764)

How the plugin works
The plugin uses some hacks to work around limitations of how mypy works. Coming up with this solution took a lot of time and brain juice, but the solution itself is actually not too complicated.
The main problem we're trying to solve here: In a validataclass definition, we have assignments like example_field: int = IntegerValidator(). From a typing perspective, these assignments are wrong because you're assigning an expression of the type IntegerValidator to an attribute that is typed as int. At runtime, the @validataclass decorator transforms these "incorrect" assignments and converts the class to a Python dataclass. Basically, we need to tell mypy that the type on the right-hand side is actually int, or more precisely, whatever type the validator outputs (combined with the default type if the field has one).

Essentially, there are two stages relevant to the plugin: the semantic analysis of the code and the type checking. The plugin is called by mypy at specific points within these stages using hooks.
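The toy decorator below illustrates the runtime side with a similar transformation. toy_validataclass and this IntegerValidator are hypothetical, heavily simplified stand-ins. They are not validataclass's real implementation, which also handles defaults, inheritance and validation of input data. The sketch only shows that the class attribute starts out as a validator object and ends up as a dataclass field, which is exactly the part mypy cannot see without a plugin.

```python
import dataclasses
from typing import Any


class IntegerValidator:
    """Hypothetical stand-in for validataclass's IntegerValidator."""

    def validate(self, input_data: Any) -> int:
        if not isinstance(input_data, int) or isinstance(input_data, bool):
            raise ValueError(f"expected an integer, got {input_data!r}")
        return input_data


def toy_validataclass(cls: type) -> type:
    """Hypothetical, heavily simplified version of the runtime transformation."""
    for name in cls.__annotations__:
        validator = cls.__dict__[name]
        # Replace the validator with a required dataclass field that keeps
        # the validator in its metadata for later use
        setattr(cls, name, dataclasses.field(metadata={"validator": validator}))
    return dataclasses.dataclass(cls)


@toy_validataclass
class ExampleClass:
    # Without a plugin, mypy flags this line: IntegerValidator is not an int
    example_field: int = IntegerValidator()


validator = dataclasses.fields(ExampleClass)[0].metadata["validator"]
obj = ExampleClass(example_field=validator.validate(42))
print(obj)  # ExampleClass(example_field=42)
```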
First, during the semantic analysis, mypy calls our plugin using a "decorator hook" whenever it encounters a class definition with the @validataclass decorator. This is where the ValidataclassTransformer class of the plugin comes in: It has access to the entire class definition and can modify it in place, meaning it can replace the expression on the right-hand side of the assignments with something else. In theory, we could directly replace the expression with something of the correct type (just pretend the right-hand side is some random integer and the types are correct).

For several reasons, it's not that simple though. Instead, we wrap the right-hand side in a call expression calling a "virtual function" (meaning it only exists for the type checker, not in actual code). The assignment is transformed from example_field: T = any_expression to example_field: T = _virtual_field_wrapper(any_expression). (The call also includes some additional information, namely the class name, field name and a list of base classes, so the actual call looks more like this: example_field: T = _virtual_field_wrapper(any_expression, "ExampleClass", "example_field", []).)

Then, during the type checking pass, mypy resolves the types of all expressions in the code and checks whether they are correct in the given context, e.g. whether the right-hand side expression in an assignment is type-compatible with the assigned attribute. Here, it calls the "function hook" whenever it encounters a call to the "virtual" function we've used as a wrapper. This function hook has access to the arguments of the call and can modify the return type of this call (e.g. if the hook returns int, mypy will treat the call expression as an int too).
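The plugin operates on mypy's own AST (mypy.nodes), so this is only a rough analogue. The sketch performs the same wrapping on Python's standard-library ast module, and ast.unparse requires Python 3.9+. The wrapper name and argument order come from the description above. WrapFieldAssignments and is_validataclass are hypothetical, and the decorator detection is deliberately naive.

```python
import ast

WRAPPER_NAME = "_virtual_field_wrapper"  # name as given in the description above


def is_validataclass(node: ast.ClassDef) -> bool:
    # Simplification: only recognizes a bare @validataclass decorator
    return any(isinstance(d, ast.Name) and d.id == "validataclass" for d in node.decorator_list)


class WrapFieldAssignments(ast.NodeTransformer):
    """Rewrites name: T = expr into name: T = _virtual_field_wrapper(expr, ...)."""

    def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef:
        self.generic_visit(node)  # handle nested classes first
        if not is_validataclass(node):
            return node
        base_names = [ast.unparse(base) for base in node.bases]
        for stmt in node.body:
            # Only simple annotated assignments with a right-hand side are wrapped
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name) and stmt.value is not None:
                stmt.value = ast.Call(
                    func=ast.Name(id=WRAPPER_NAME, ctx=ast.Load()),
                    args=[
                        stmt.value,  # original right-hand side
                        ast.Constant(node.name),  # class name
                        ast.Constant(stmt.target.id),  # field name
                        ast.List([ast.Constant(b) for b in base_names], ast.Load()),  # base classes
                    ],
                    keywords=[],
                )
        return node


SOURCE = """
@validataclass
class ExampleClass:
    example_field: int = IntegerValidator()
"""

tree = ast.fix_missing_locations(WrapFieldAssignments().visit(ast.parse(SOURCE)))
print(ast.unparse(tree))
```

The printed assignment reads example_field: int = _virtual_field_wrapper(IntegerValidator(), 'ExampleClass', 'example_field', []). The annotation stays untouched, so a type checker still compares the (now wrapped) right-hand side against int.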
The function hook is handled by the FieldTypeResolver class of the plugin. It parses the (wrapped) right-hand side expression of the assignment as a validataclass field definition and determines the type of the field validator (e.g. IntegerValidator) as well as the field default object (if there is one, e.g. Default[int]). Then it simulates a call to the validate() method of the validator and asks mypy for the return type that this call would have (here: int). The same is done for the default object, and the resulting types are combined (type union) and returned.

The result: mypy reads the assignment as example_field: int = [some expression of type int] and accepts it. If we used the wrong field type (or validator), e.g. example_field: str = IntegerValidator(), mypy will now report an error because the right side (int) is incompatible with the left (str).

There are several more things happening in the plugin, though. For example, things get way more complicated for classes with inheritance (imagine: a base class with IntegerValidator and a subclass that only overrides the default of the field to Default(None) -> the plugin needs to consider both classes and return the type int | None). For several annoying reasons (e.g. mypy's caching system), we need stuff like the ParsedFieldCache (not related to mypy's own cache!). Explaining this here would be too much, but I hope the code is documented well enough to understand what happens there.

The plugin also has its own mypy error codes! Most of the time this is just validataclass, but there are two other error codes. You can use them exactly like the built-in error codes, e.g. you can write # type: ignore[validataclass] in the code or disable an error code completely in the mypy config.

Notes for reviewers
Any reviews are appreciated! It's okay if you don't understand some of the mypy specifics (also feel free to ask if you have questions).
My recommended order of reviewing the plugin code would be as follows (this is roughly the order in which things happen within the plugin):
1. validataclass/mypy/plugin/plugin.py: Defines the mypy plugin and the hooks, otherwise just calls functions from other classes.
2. validataclass/mypy/plugin/validataclass_transformer.py: Handles the "decorator hook", i.e. it transforms validataclass class definitions and wraps the assignments.
3. validataclass/mypy/plugin/field_type_resolver.py: Handles the "function hook" for the virtual wrapper function, i.e. it analyzes the assignment right-hand side expressions and returns the "real" type of the field.
4. validataclass/mypy/plugin/parsed_field_cache.py: Relatively simple class that caches information about parsed fields so that they can be accessed again when parsing subclasses.
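The field type resolution described under "How the plugin works" and the role of parsed_field_cache.py can be illustrated at runtime, using plain Python types instead of mypy types. Everything in this sketch is a hypothetical simplification and not the plugin's actual code. That includes ParsedField, parsed_fields, resolve_field_type, the merge rules and the nearest-first list of base classes. The merge rules are not necessarily validataclass's exact inheritance semantics. Here, typing.get_type_hints stands in for asking mypy about the return type of a simulated validate() call.

```python
import typing
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


class IntegerValidator:
    """Hypothetical stand-in for a validator; only its annotations matter here."""

    def validate(self, input_data: Any) -> int:
        return int(input_data)


@dataclass
class ParsedField:
    """Hypothetical record of what one class itself defines for one field."""

    validator_type: Optional[type] = None  # None: class only overrides the default
    default_type: Optional[Any] = None  # None: no default defined in this class


# Hypothetical analogue of the ParsedFieldCache: (class name, field name) -> ParsedField
parsed_fields: Dict[Tuple[str, str], ParsedField] = {}


def validator_output_type(validator_type: type) -> Any:
    # Runtime analogue of asking mypy for the return type of validate()
    return typing.get_type_hints(validator_type.validate)["return"]


def resolve_field_type(cls_name: str, field_name: str, bases: List[str]) -> Any:
    """Resolve a field's type; bases are the base class names, nearest first."""
    own = parsed_fields[(cls_name, field_name)]
    validator_type, default_type = own.validator_type, own.default_type
    # Fill in whatever the class doesn't define itself from its base classes
    for base in bases:
        inherited = parsed_fields.get((base, field_name))
        if inherited is not None:
            validator_type = validator_type or inherited.validator_type
            if default_type is None:
                default_type = inherited.default_type
    if validator_type is None:
        raise LookupError(f"no validator found for {cls_name}.{field_name}")
    output_type = validator_output_type(validator_type)
    # Combine validator output type and default type (type union)
    return output_type if default_type is None else typing.Union[output_type, default_type]


# Base class: example_field: int = IntegerValidator()
parsed_fields[("BaseClass", "example_field")] = ParsedField(validator_type=IntegerValidator)
# Subclass only overrides the default: example_field: int | None = Default(None)
parsed_fields[("SubClass", "example_field")] = ParsedField(default_type=type(None))

print(resolve_field_type("BaseClass", "example_field", []))  # <class 'int'>
print(resolve_field_type("SubClass", "example_field", ["BaseClass"]) == Optional[int])  # True
```

The subclass alone knows nothing about the validator, which is why information about parsed fields has to be kept around and looked up again when the subclass is processed.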