Refactor: Use consistent URL representation for all storage paths (including file://) #1326

@dimitri-yatsenko

Summary

Following review feedback on PR #1311, we should refactor the storage layer to use consistent URL representation for all data sources, including local files.

Current Behavior

  • Remote paths use URLs: s3://bucket/path, gs://bucket/path
  • Local paths use raw filesystem paths: /path/to/file
  • The is_remote_url() function distinguishes between the two

Proposed Change

  1. Accept both formats from users: /path/to/file and file:///path/to/file
  2. Normalize to URLs internally: Convert all paths to URL format (file:// for local)
  3. Store URLs consistently in the database
  4. Leverage fsspec uniformity: fsspec already treats all backends (including local) uniformly via URLs
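
A minimal sketch of steps 1 and 2, assuming a hypothetical helper named normalize_to_url and a hypothetical URL_PROTOCOLS tuple (neither name is taken from the codebase). It uses only the standard library and leaves anything that already carries a known protocol untouched:

```python
import os
from pathlib import Path

# Hypothetical protocol list for illustration; the project's actual list may differ.
URL_PROTOCOLS = ("file://", "s3://", "gs://")


def normalize_to_url(path: str) -> str:
    """Return `path` as a URL, converting bare local paths to file:// URLs."""
    if path.startswith(URL_PROTOCOLS):
        # Already a URL with a known protocol: pass through unchanged.
        return path
    # Path.as_uri() requires an absolute path, so expand "~" and make it absolute first.
    absolute = os.path.abspath(os.path.expanduser(path))
    return Path(absolute).as_uri()


print(normalize_to_url("s3://bucket/path"))      # unchanged
print(normalize_to_url("file:///path/to/file"))  # unchanged
print(normalize_to_url("/path/to/file"))         # file:///path/to/file on a POSIX system
```

Note that Path.as_uri() percent-encodes characters such as spaces, so the stored URL may differ textually from the path the user typed.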

Benefits

  • Coherent internal representation
  • Simpler codebase: no special-casing for local vs. remote paths
  • Better alignment with fsspec's design philosophy
  • Avoids potential bugs from inconsistent handling

Implementation Notes

  • Add file:// to REMOTE_PROTOCOLS (or rename to URL_PROTOCOLS)
  • Create helper to normalize user paths to URLs
  • Update StorageBackend to work with URLs consistently
  • Ensure backward compatibility for existing stored paths
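
One possible way to handle the backward-compatibility point, sketched under assumptions: the helper name split_stored_path and the URL_PROTOCOLS set below are hypothetical, and legacy rows are assumed to hold raw local filesystem paths with no protocol prefix. Such values are read as if they were file:// URLs, so no database migration is needed before the new code can read old records:

```python
from typing import Tuple
from urllib.parse import unquote

# Hypothetical set of supported protocols (REMOTE_PROTOCOLS plus "file").
URL_PROTOCOLS = frozenset({"file", "s3", "gs"})


def split_stored_path(stored: str) -> Tuple[str, str]:
    """Return (protocol, path) for a stored value.

    Values without a recognized protocol are treated as legacy local paths.
    """
    scheme, sep, rest = stored.partition("://")
    if sep and scheme in URL_PROTOCOLS:
        if scheme == "file":
            # Undo percent-encoding so the result is a usable filesystem path.
            return "file", unquote(rest)
        return scheme, rest
    # Legacy record: a raw filesystem path stored before URLs were adopted.
    return "file", stored


print(split_stored_path("/path/to/file"))         # ('file', '/path/to/file')
print(split_stored_path("file:///path/to/file"))  # ('file', '/path/to/file')
print(split_stored_path("s3://bucket/path"))      # ('s3', 'bucket/path')
```

This sketch ignores file:// URLs that carry a host component (for example file://localhost/...); whether those need support is an open question for the implementation.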

Labels

enhancement (Indicates new improvements)
