fix: lazy cross-module imports in generated Python code #5134
dgandhi62 wants to merge 15 commits into
    .update(typeSubmodulePythonName)
    .update('.')
    .update(toImport)
    .update('.*')
Since one lazy proxy now serves all types from a module, the hash needs to be stable regardless of which type triggered the import.
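To illustrate the point about hash stability, here is a rough Python sketch of an alias that depends only on the module path and never on the type name. The function name `lazy_alias`, the SHA-256 choice, and the 8-character truncation are assumptions made for illustration; the parameter names are borrowed from the diff above, and their exact meaning in the generator is also assumed.

```python
import hashlib


def lazy_alias(submodule: str, to_import: str) -> str:
    """Build a per-module alias (hypothetical naming scheme, not the generator's)."""
    digest = hashlib.sha256()
    # Mirror the diff: hash "<submodule>.<to_import>.*" and leave the type name out,
    # so every type imported from the same module yields the same alias.
    for part in (submodule, ".", to_import, ".*"):
        digest.update(part.encode("utf-8"))
    leaf = to_import.rsplit(".", 1)[-1]
    return f"_{leaf}_{digest.hexdigest()[:8]}"


# Two different types from the same module would share this one alias.
alias = lazy_alias("aws_cdk.aws_s3", "aws_iam")
```

Because the type name is not an input, the alias cannot change depending on which type happened to trigger the import first.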
@dgandhi62, please do me a favor, and write the PR body by hand. I want you to reflect on these changes, and describe and motivate them from your own understanding, after looking at the code. Do not copy/paste the PR body your agent came up with, and do not read and rephrase the PR body your agent came up with.

PEP 563

You refer to PEP 563; when I look at that PEP the first thing I see is the text "The features proposed in this PEP never became the default behaviour, and have been replaced with deferred evaluation of annotations, as proposed by PEP 649 and PEP 749". How does that square with the rest of this PR?

Type annotations vs run-time values

If "types" are in annotation position, we should be able to simply use strings, right? And if they're strings, they shouldn't be evaluated? So are all types in annotation position properly being rendered as strings? I thought so but please confirm? If so, do we need to do anything else? Except maybe make sure that the objects referenced in a string are in-scope at the time

If "types" are in value position, like in your example:

    from .aws_iam import IGrantable as _IGrantable_ef567890
    class Bucket(_IGrantable_ef567890): # <-- will be evaluated as part of the class def
        ...

Then changing that to the following doesn't really matter:

    _aws_iam_ef567890 = _LazyImport("aws_cdk.aws_iam")
    class Bucket(_aws_iam_ef567890.IGrantable): # <-- will immediately resolve the LazyImport
        ...

Because that lazy import will immediately be evaluated anyway. Is there something we can do to defer the building of that class (and the evaluation of its ancestors) until it is evaluated for the first time? And can you figure out why we have that

Absolute imports instead of relative

What is being optimized here? The runtime behavior of the Python module, or the complexity of the code necessary to generate it? If you want to do this, it should not be part of this change but a separate change.
After reading up on the PEP a little, it seems it has been deprecated as of Python 3.14, and in any case the behavior would be to treat every annotation as if it had been written in string form, even if it hasn't (an explanation of that in the PR body would have been nice, so I wouldn't have had to chase that down). I suppose we could use that, but since we use codegen, we can also just control the annotation rendering as a string directly, no?
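As a quick check of the "render the annotation as a string" idea, the sketch below defines a function whose annotation names a module that is never imported. The function `grant` and the name `aws_iam.IGrantable` are placeholders for illustration only.

```python
def grant(grantee: "aws_iam.IGrantable") -> "None":
    # `aws_iam` is deliberately never imported or defined in this file.
    return None


# Defining and calling the function works, because the annotation is stored as text.
grant(object())
stored = grant.__annotations__["grantee"]  # the raw string, never evaluated
```

The string is only a problem for code that resolves it later (for example `typing.get_type_hints`), at which point the referenced names do need to be in scope.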
Just seeing the review. Okay, taking a look. Will do
Responding to your questions
The way I understand it is the imports are divided into
I have also updated the PR with my own understanding.
Problem
Every generated `__init__.py` file starts with a block of import statements, including cross-module imports.
These imports run as soon as the module loads, and each imported module in turn loads its own dependencies, which causes a large cascade of imports at module load time. Many of these imports exist only to satisfy type annotations or runtime type checks, which makes them a good target for optimization.
Solution
We can replace the eager imports with lazy ones, using a `TYPE_CHECKING` pattern that defers `importlib.import_module()` until first attribute access. mypy and pyright follow the `if TYPE_CHECKING:` branch, whereas the runtime takes the `else` branch. Types are now accessed as attributes of the lazily imported module rather than being imported individually.
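A minimal sketch of how such a pattern could look, assuming a simplified `_LazyImport` and using the standard-library `decimal` module in place of a generated sibling module. The alias `_decimal_lazy` and the function `to_cents` are hypothetical names, and the real generated code may differ.

```python
import importlib
from typing import TYPE_CHECKING


class _LazyImport:
    """Simplified stand-in; the class in the Python runtime may differ."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __getattr__(self, attr: str):
        return getattr(importlib.import_module(self._name), attr)


if TYPE_CHECKING:
    # Static checkers (mypy, pyright) follow this branch and see a real module.
    import decimal as _decimal_lazy
else:
    # At run time nothing is imported until an attribute is accessed.
    _decimal_lazy = _LazyImport("decimal")


def to_cents(amount: "_decimal_lazy.Decimal") -> int:
    # The string annotation is not evaluated, so it does not trigger the import.
    return int(amount * 100)


price = _decimal_lazy.Decimal("1.25")  # first attribute access performs the import
```

The same alias serves both branches, so annotations and run-time code can refer to `_decimal_lazy.Decimal` without caring which branch was taken.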
Note that `_LazyImport` is a class we define in the Python runtime that defers loading through its own `__getattr__` method. A successfully resolved module is cached after first access, whereas a failed import is not cached. We also add an optional `package` parameter, which lets us keep supporting relative imports (source: https://docs.python.org/3/library/importlib.html). The one edge case is a child submodule that references a type from the root `__init__.py`; for this, we keep absolute imports.
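The behavior described here (cache on success, retry after failure, optional `package` anchor) could be sketched roughly as follows. This is an illustration under assumptions, not the class from the PR: the attribute names are invented, and standard-library modules stand in for generated ones.

```python
import importlib
from typing import Any, Optional


class _LazyImport:
    """Illustrative proxy: import on first attribute access, cache only on success."""

    def __init__(self, name: str, package: Optional[str] = None) -> None:
        self._name = name        # absolute ("json") or relative (".decoder")
        self._package = package  # anchor required when `name` is relative
        self._module = None      # stays None until an import succeeds

    def __getattr__(self, attr: str) -> Any:
        # Reached only for names not set in __init__, i.e. members of the target.
        if self._module is None:
            # If this raises, `_module` stays None, so the next access retries.
            self._module = importlib.import_module(self._name, self._package)
        return getattr(self._module, attr)


absolute = _LazyImport("json")
relative = _LazyImport(".decoder", package="json")  # resolves to json.decoder
broken = _LazyImport("module_that_does_not_exist_xyz")
```

`importlib.import_module(name, package)` only accepts a relative `name` when `package` is given, which is why the proxy forwards both arguments unchanged.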