Fix project resolution in admin ingest and improve TF operator transparency#50
Conversation
There was a problem hiding this comment.
Code Review
This pull request enhances the infrastructure and CLI for the CDC data ingestion process. Key changes include exposing the GCP project ID and CDC service name via Terraform outputs, allowing the IngestionJobClient to resolve the project ID dynamically, and adding support for a configurable CDC web service image. The CLI now also provides more detailed console links by parsing operation names. Feedback suggests avoiding variable shadowing in the CLI logic and making the hardcoded GCP region configurable to improve flexibility.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request updates the infrastructure and CLI to support multi-region deployments by exposing and utilizing GCP project_id and region from Terraform outputs. Key changes include updating the IngestionJobClient to accept these parameters, enhancing the CLI's console link generation, and adding pyopenssl as a dependency. Feedback suggests removing the hardcoded us-central1 default in the IngestionJobClient and instead raising an error if the region is not provided, which would prevent accidental resource management in the wrong region.
gmechali
left a comment
There was a problem hiding this comment.
Thanks Christie! Couple small questions but no objections
| click.secho(job_url, fg="blue", underline=True) | ||
| click.secho("Execution Console Link: ", fg="cyan", bold=True, nl=False) | ||
| click.secho(exec_url, fg="blue", underline=True) | ||
| elif ( |
There was a problem hiding this comment.
When would it go through here?
There was a problem hiding this comment.
I briefly looked into this and from my initial understanding the endpoint actually only returns operations, not executions.
I might submit this as is and then remove the previous block after doing some more testing and also refactor the method a little bit.
This PR addresses a bug where the
admin ingest startcommand would infer the target GCP project from the user's ambientgcloudcontext instead of using the project explicitly configured in the Terraform state. It also removes the hardcoded region (us-central1) limitation by reading the region from Terraform outputs.It also improves operator transparency and observability by parsing Cloud Run "operations" to provide direct, clickable links to the job in the GCP Console.
Finally, it resolves SSL/impersonation issues encountered when running the CLI tool in isolated environments (like
uvxon macOS) by addingpyopensslas a dependency.Key Changes
CLI (
datacommons-admin)ingest startnow readsproject_idandregionfrom Terraform outputs instead of falling back togoogle.auth.default()or hardcoded defaults.operationspaths returned by the Cloud Run API. This allows the CLI to output direct, clickable links to the job in the GCP Console even for long-running async operations.main.tfandterraform.tfvarstemplates to make the web service image configurable (defaulting tostable) and to exposeproject_id,cdc_service_name, andregionoutputs.Terraform
project_id,cdc_service_name, andregionas outputs in the modules to support the CLI.Dependencies
datacommons-admin: Addedpyopenssl>=24.0.0to resolve SSL validation warnings in isolated environments.How to Test
You can test all these commands directly from this branch without installing them locally by using
uvx.1. Generate Scaffolding:
uvx --from "git+https://github.com/clincoln8/datacommons.git@uvx#subdirectory=packages/datacommons-cli" datacommons admin init2. Initialize Database:
uvx --from "git+https://github.com/clincoln8/datacommons.git@uvx#subdirectory=packages/datacommons-cli" datacommons admin init-db3. Start Ingestion Job:
uvx --from "git+https://github.com/clincoln8/datacommons.git@uvx#subdirectory=packages/datacommons-cli" datacommons admin ingest start