
Discuss: add an artifact-type suffix (Talk/Poster/Demo) to standardized filenames #1404

Description

@jonfroehlich

Idea

Add an artifact-type suffix to the standardized filename scheme so a downloaded file is self-describing out of context. Today the scheme (from get_filename_without_ext_for_artifact in website/utils/fileutils.py) is:

LastName_TitleInTitleCase_VenueYear.ext
e.g. Froehlich_MakingInTheHCIL_CSDepartmentExternalReview2014.pdf
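For anyone following along without the code open, here's a rough sketch of what the scheme produces. This is a hypothetical re-creation, not the real get_filename_without_ext_for_artifact. The helper names and the input title "Making in the HCIL" are assumptions, and the real function may do more, such as truncating long titles:

```python
import re


def to_title_case_token(text: str) -> str:
    """Join words into one CamelCase token, keeping existing capitals (HCIL stays HCIL)."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return "".join(w[0].upper() + w[1:] for w in words)


def standardized_basename(last_name: str, title: str, venue: str, year: int) -> str:
    """Hypothetical LastName_TitleInTitleCase_VenueYear builder (no extension)."""
    return (
        f"{to_title_case_token(last_name)}_"
        f"{to_title_case_token(title)}_"
        f"{to_title_case_token(venue)}{year}"
    )


print(standardized_basename("Froehlich", "Making in the HCIL", "CS Department External Review", 2014))
# Froehlich_MakingInTheHCIL_CSDepartmentExternalReview2014
```

Nothing in that output depends on whether the artifact is a talk, paper, or poster, which is the gap this issue is about.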

A talk, paper, and poster can all produce the same basename. That isn't a storage problem: each type is saved in its own subdirectory (talks/, posters/, publications/), and serve_pdf serves only /media/publications/ and queries only Publication, so they never collide. But once a file is downloaded, Froehlich_MakingInTheHCIL_CHI2024.pdf in someone's Downloads folder doesn't reveal whether it's the talk, the paper, or the poster.
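A quick illustration of "safe on disk, ambiguous once downloaded", using the example basename above and plain pathlib (no Django; the bare subdirectory names stand in for the real media paths):

```python
from pathlib import PurePosixPath

basename = "Froehlich_MakingInTheHCIL_CHI2024.pdf"

# On the server each artifact type lives in its own subdirectory, so full paths differ.
stored = [PurePosixPath(subdir) / basename for subdir in ("talks", "posters", "publications")]
assert len(set(stored)) == 3

# A download keeps only the final path component, which is identical for all three.
downloaded = {p.name for p in stored}
print(downloaded)  # {'Froehlich_MakingInTheHCIL_CHI2024.pdf'}
```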

Proposed shape (one option):

LastName_TitleInTitleCase_VenueYear_Talk.pdf
LastName_TitleInTitleCase_VenueYear_Poster.pdf
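One way the proposal could look, as a sketch only: TYPE_SUFFIX and with_type_suffix are hypothetical names. It also assumes the paper keeps the bare name, which this issue doesn't settle:

```python
# Hypothetical label map: which artifact types get a trailing suffix, and with what text.
# Publications are deliberately absent, assuming the paper keeps the bare basename.
TYPE_SUFFIX = {"talk": "Talk", "poster": "Poster"}


def with_type_suffix(basename: str, artifact_type: str) -> str:
    """Append _<Type> to a standardized basename (no extension) if the type has a label."""
    label = TYPE_SUFFIX.get(artifact_type.lower())
    return f"{basename}_{label}" if label else basename


base = "Froehlich_MakingInTheHCIL_CHI2024"
for kind in ("talk", "poster", "publication"):
    print(with_type_suffix(base, kind) + ".pdf")
# Froehlich_MakingInTheHCIL_CHI2024_Talk.pdf
# Froehlich_MakingInTheHCIL_CHI2024_Poster.pdf
# Froehlich_MakingInTheHCIL_CHI2024.pdf
```

Adding Demo (or Video) would then be a one-line change to the map, which keeps the "which types?" decision separate from the naming code.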

get_filename_without_ext_for_artifact already takes an optional suffix param (LastName_Title_suffix_VenueYear) — though we'd likely want the type at the end, so the exact placement is part of the discussion.
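To make the placement question concrete, here's a hypothetical builder that can put the suffix in either slot. build_name and its placement parameter are illustrative, not the real function's signature:

```python
def build_name(last: str, title: str, venue_year: str, suffix: str = "", placement: str = "end") -> str:
    """Hypothetical builder showing both candidate slots for an optional suffix.

    placement="mid" mirrors the existing LastName_Title_suffix_VenueYear ordering;
    placement="end" is the proposed LastName_Title_VenueYear_suffix ordering.
    """
    if placement not in ("mid", "end"):
        raise ValueError(f"unknown placement: {placement!r}")
    parts = [last, title]
    if suffix and placement == "mid":
        parts.append(suffix)
    parts.append(venue_year)
    if suffix and placement == "end":
        parts.append(suffix)
    return "_".join(parts)


print(build_name("Froehlich", "MakingInTheHCIL", "CHI2024", "Talk", "mid"))  # Froehlich_MakingInTheHCIL_Talk_CHI2024
print(build_name("Froehlich", "MakingInTheHCIL", "CHI2024", "Talk", "end"))  # Froehlich_MakingInTheHCIL_CHI2024_Talk
```

One possible argument for the trailing slot: the paper and its talk then share the whole LastName_Title_VenueYear prefix, so they tend to sit next to each other in a name-sorted listing.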

Why this is a Discuss (not just a do)

It changes generate_filename, which is the scheme used everywhere: Artifact.save(), the backfill_original_filenames comparison, and the restandardize_artifact_filenames idempotency gate (#1401). Consequences:

  • Every artifact file gets re-renamed once (more churn, more chances for -<timestamp> collisions).
  • Every external link to a current standardized PDF shifts. Publication links are covered by the #1391 original_pdf_filename + serve_pdf fallback; talk/poster links have no such fallback.
  • Need to decide: types to cover (Talk/Poster/Demo/Video?), placement (end vs the existing mid suffix slot), label text (Talk vs InvitedTalk vs talk_type-derived), and whether to apply retroactively or only to new uploads.
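Why "every file gets re-renamed once": a gate that skips files already matching generate_filename will see a mismatch on every talk/poster the moment the scheme changes. A minimal sketch, assuming the gate is a plain name comparison; needs_rename, resolve_collision, and the -<timestamp> format below are hypothetical:

```python
import os
from datetime import datetime


def needs_rename(current_name: str, expected_name: str) -> bool:
    """Hypothetical idempotency gate: only rename files that don't match the current scheme."""
    return current_name != expected_name


def resolve_collision(expected_name: str, existing: set[str], now: datetime) -> str:
    """If the target name is taken, insert a -<timestamp> tag before the extension (format assumed)."""
    if expected_name not in existing:
        return expected_name
    stem, ext = os.path.splitext(expected_name)
    return f"{stem}-{now:%Y%m%d%H%M%S}{ext}"


old = "Froehlich_MakingInTheHCIL_CHI2024.pdf"        # standardized under today's scheme
new = "Froehlich_MakingInTheHCIL_CHI2024_Talk.pdf"   # what the changed scheme would expect

print(needs_rename(old, old))  # False: stable while the scheme is unchanged
print(needs_rename(old, new))  # True: fails the gate once after the scheme change
print(resolve_collision(new, {new}, datetime(2025, 1, 2, 3, 4, 5)))
# Froehlich_MakingInTheHCIL_CHI2024_Talk-20250102030405.pdf
```

Each extra rename pass is another chance to land in the collision branch, which is where the timestamped names come from.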

Sequencing

Best done after #1401 lands and prod is standardized, so it's a single well-understood scheme change with the #1391 provenance safety net already in place. Until then, leave the scheme as-is.

Decisions to make

  1. Worth doing at all? (cosmetic/provenance benefit vs. re-rename churn + link risk)
  2. Which types, and what label text? (static Talk/Poster vs. derived from talk_type)
  3. Suffix placement (trailing _Talk vs. the existing mid suffix slot).
  4. Retroactive (re-rename everything) or forward-only (new uploads only)?
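For decision 2, a sketch of what "derived from talk_type" could mean, falling back to a static Talk label. It assumes talk_type is a free-text string; the real field may be a choices field with different values, and label_from_talk_type is a hypothetical helper:

```python
import re
from typing import Optional


def label_from_talk_type(talk_type: Optional[str], default: str = "Talk") -> str:
    """Turn a talk_type string (e.g. 'Invited Talk') into a CamelCase filename label.

    Falls back to the static default when talk_type is missing or has no usable characters.
    """
    words = re.findall(r"[A-Za-z0-9]+", talk_type or "")
    if not words:
        return default
    return "".join(w[0].upper() + w[1:] for w in words)


print(label_from_talk_type("Invited Talk"))  # InvitedTalk
print(label_from_talk_type("keynote"))       # Keynote
print(label_from_talk_type(None))            # Talk
```

The "keynote" case shows the trade-off: derived labels are more specific but don't always say "talk", and editing a talk's talk_type would change its filename and trigger another rename.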

Spun out of discussion on #1401.
