
fix: lazy cross-module imports in generated Python code #5134

Open
dgandhi62 wants to merge 15 commits into main from python-imports-4

@dgandhi62
Contributor

@dgandhi62 dgandhi62 commented May 28, 2026

Problem

Every generated __init__.py file has import statements at the top that look like this, for example:

from .aws_iam import IGrantable as _IGrantable_ef567890
from .aws_kms import IKey as _IKey_abc12345
...

and cross-module imports that can look like this, for example:

import scope.jsii_calc_base
...

These imports run as soon as each module loads, and every imported module in turn loads its own dependencies. This causes a huge cascade of imports at module load time. However, a large share of these imports exist only to satisfy type annotations or runtime type checks, which makes them good candidates for deferral.

Solution

We can replace the eager imports with lazy ones, using a TYPE_CHECKING pattern that defers importlib.import_module() until first attribute access. The generated code would look something like this:

_LazyImport = jsii._LazyImport

if typing.TYPE_CHECKING:
    from .. import composition as _composition_4f38e801
    import scope.jsii_calc_lib as _scope_jsii_calc_lib_c61f082f
else:
    _composition_4f38e801 = _LazyImport(".composition", __name__)
    _scope_jsii_calc_lib_c61f082f = _LazyImport("scope.jsii_calc_lib")

Static type checkers such as mypy and pyright follow the if branch, whereas the Python runtime executes the else branch. Types are now accessed as attributes of the module proxy rather than being imported individually.

Note that _LazyImport is a class we define in the Python runtime that defers loading through its own __getattr__ method. A successfully resolved module is cached after first access, whereas failed imports are not cached. We also add an optional package parameter, which lets us keep supporting relative imports (source: https://docs.python.org/3/library/importlib.html). The one edge case is a child submodule that references a type from the root __init__.py. For this case, we keep absolute imports.
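To make the description above concrete, here is a minimal sketch of what such a proxy could look like. The actual jsii._LazyImport implementation is not shown in this thread, so the class name LazyModule, the _resolve helper, and the private field names are hypothetical. It only illustrates the three properties described: deferral until first attribute access, caching on success only, and an optional package argument for relative names.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


class LazyModule:
    """Hypothetical stand-in for jsii._LazyImport (not the real runtime class)."""

    def __init__(self, name: str, package: Optional[str] = None) -> None:
        self._name = name          # absolute ("a.b") or relative (".b") module name
        self._package = package    # anchor package, required for relative names
        self._module: Optional[ModuleType] = None

    def _resolve(self) -> ModuleType:
        if self._module is None:
            # If import_module raises, _module stays None, so nothing is cached
            # for a failed import and a later access will try again.
            self._module = importlib.import_module(self._name, self._package)
        return self._module

    def __getattr__(self, attr: str) -> Any:
        # __getattr__ only runs when normal lookup fails, so the fields set in
        # __init__ never reach this point. The guard avoids infinite recursion
        # if an instance is ever created without running __init__.
        if attr in ("_name", "_package", "_module"):
            raise AttributeError(attr)
        return getattr(self._resolve(), attr)


# Relative name resolved against a package, using a stdlib module as the target.
decoder = LazyModule(".decoder", "json")
assert decoder._module is None       # nothing imported yet
decoder.JSONDecoder                  # first attribute access triggers the import
assert decoder._module is not None   # resolved module is now cached
```

The package argument is passed straight through to importlib.import_module, which is what allows a relative name such as ".composition" to be resolved against the generated module's own __name__.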

@mergify mergify Bot added the contribution/core This is a PR that came from AWS. label May 28, 2026
Comment thread packages/jsii-pacmak/lib/targets/python/type-name.ts Outdated
@dgandhi62 dgandhi62 force-pushed the python-imports-4 branch 3 times, most recently from 01c71e2 to 535bb9b Compare June 2, 2026 15:53
.update(typeSubmodulePythonName)
.update('.')
.update(toImport)
.update('.*')
Contributor Author

Since one lazy proxy now serves all types from a module, the hash needs to be stable regardless of which type triggered the import.
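As an illustration of the point above, the following sketch shows an alias scheme in which the hash input names only the module, so every type reference from that module shares one alias. The real generator is written in TypeScript, and its hash algorithm and full set of inputs are not visible in this thread; module_alias, reference, and the use of SHA-256 are assumptions made purely for illustration.

```python
import hashlib


def module_alias(module_name: str) -> str:
    """Hypothetical alias scheme: the hash covers the module only, never a type."""
    digest = hashlib.sha256(f"{module_name}.*".encode("utf-8")).hexdigest()[:8]
    return f"_{module_name.replace('.', '_')}_{digest}"


def reference(module_name: str, type_name: str) -> str:
    """Render a type reference as an attribute of the shared module alias."""
    return f"{module_alias(module_name)}.{type_name}"


# Both references go through the same proxy name, whichever type asked first.
first = reference("aws_cdk.aws_iam", "IGrantable")
second = reference("aws_cdk.aws_iam", "IRole")
assert first.split(".")[0] == second.split(".")[0]
```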

@rix0rrr
Collaborator

rix0rrr commented Jun 3, 2026

@dgandhi62, please do me a favor, and write the PR body by hand. I want you to reflect on these changes, and describe and motivate them from your own understanding, after looking at the code.

Do not copy/paste the PR body your agent came up with, and do not read and rephrase the PR body your agent came up with.

PEP 563

You refer to PEP 563; when I look at that PEP the first thing I see is the text "The features proposed in this PEP never became the default behaviour, and have been replaced with deferred evaluation of annotations, as proposed by PEP 649 and PEP 749".

How does that square with the rest of this PR?

Type annotations vs run-time values

If "types" are in annotation position, we should be able to simply use strings, right? And if they're strings, they shouldn't be evaluated? So are all types in annotation position properly being rendered as strings? I thought so but please confirm? If so, do we need to do anything else? Except maybe make sure that the objects referenced in a string are in scope at the time check_type gets called on them?

If "types" are in value position, like in your example:

from .aws_iam import IGrantable as _IGrantable_ef567890

class Bucket(_IGrantable_ef567890):  # <-- will be evaluated as part of the class def
  ...

Then changing that to the following doesn't really matter:

_aws_iam_ef567890 = _LazyImport("aws_cdk.aws_iam")

class Bucket(_aws_iam_ef567890.IGrantable):  # <-- will immediately resolve the LazyImport
  ...

Because that lazy import will immediately be evaluated anyway.

Is there something we can do to defer the building of that class (and the evaluation of its ancestors) until it is evaluated for the first time?

And can you figure out why we have that from .aws_iam import IGrantable as _IGrantable_ef567890 to begin with? It must be there for a reason but I'm not sure what problem it is trying to solve. If we perhaps have solved that problem in a different way already, we might get rid of this and simplify that way?

Absolute imports instead of relative

What is being optimized here? The runtime behavior of the Python module, or the complexity of the code necessary to generate it? If you want to do this, it should not be part of this change but a separate change.

@rix0rrr
Collaborator

rix0rrr commented Jun 3, 2026

After reading up on the PEP a little, it seems it has been deprecated as of Python 3.14, and in any case the behavior would be to treat every annotation as if it had been written in string form, even if it hasn't (an explanation of that in the PR body would have been nice, so I wouldn't have had to chase that down).

I suppose we could use that, but since we use codegen, we can also just control the annotation rendering as a string directly, no?
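A small sketch of the behavior discussed here: when an annotation is written as a string literal, defining the function does not look the name up, and the name is only needed once something asks for the evaluated hints. The function check and the choice of Decimal as the annotated type are illustrative only. The namespace is passed to typing.get_type_hints explicitly to keep the example self-contained; by default it would use the function's module globals.

```python
import typing


def check(value: "Decimal") -> None:
    """The annotation is a plain string, so `Decimal` need not exist yet."""


# Defining the function did not evaluate the annotation; it is stored as text.
raw = check.__annotations__
assert raw["value"] == "Decimal"

# The name only has to be available when the hints are actually evaluated.
from decimal import Decimal

hints = typing.get_type_hints(check, globalns={"Decimal": Decimal})
assert hints["value"] is Decimal
```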

@dgandhi62
Contributor Author

Just seeing the review. Okay, taking a look. Will do

@dgandhi62 dgandhi62 force-pushed the python-imports-4 branch 2 times, most recently from 1e03acf to 10fd1dd Compare June 3, 2026 16:25
@dgandhi62
Contributor Author

Responding to your questions

  1. Yes, all type annotations are rendered as quoted strings. They are only resolved when typing.get_type_hints() is called inside the type-checking stubs at runtime. Those stubs resolve the string annotations by looking up the names in module globals, so the names need to exist at runtime, but they can be lazy proxies that only resolve when actually accessed. Python (to my knowledge) doesn't seem to have a built-in lazy proxy; it only gives us the module-level __getattr__, which makes a module's own attributes lazy (we used this in the first PR). Here, however, we are referencing other modules from within our module, and it does not help with that. So we define a lazy class ourselves.

  2. The benefit here (and the main time saving) does not come from deferring resolution of the base class proxy, but from skipping all the annotation-only imports (and their own cascades). I think deferring the base class may not be possible (or may be a lot harder). The reason is that the class statement is executable, so IGrantable needs to be resolvable as soon as Python executes it.

The way I understand it, the imports divide into base class imports and annotation-only imports. We are saving on the latter.

  3. Absolute imports are still needed for the one case where a child submodule references a type from the root __init__.py of its package. Since we are now lazily importing entire modules rather than individual types, we can't use a relative import there. I've changed the code to use absolute imports only for this case; it keeps relative imports for everything else.

  4. You're right, PEP 563 isn't needed. Removed it.

@dgandhi62 dgandhi62 requested a review from rix0rrr June 3, 2026 20:45
@dgandhi62
Contributor Author

I have also updated the PR with my own understanding.

