Skip to content

fix(trashbin): avoid full filecache scan when restoring versions (#31682)#41641

Open
DeepDiver1975 wants to merge 2 commits into
masterfrom
fix/issue-31682-trash-version-scan
Open

fix(trashbin): avoid full filecache scan when restoring versions (#31682)#41641
DeepDiver1975 wants to merge 2 commits into
masterfrom
fix/issue-31682-trash-version-scan

Conversation

@DeepDiver1975

Copy link
Copy Markdown
Member

Summary

Trashbin::getVersionsFromTrash() looks up a deleted file's versions with View::searchRaw('<name>.v%[.d<timestamp>]'), which reaches Cache::search() as:

WHERE `storage` = ? AND `name` ILIKE ?

The pattern is a leading literal followed by a trailing %, which is already non-sargable; on MySQL/MariaDB the ILIKECOLLATE ..._general_ci LIKE rewrite (AdapterMySQL) additionally defeats any index on name. As a result every version restore — and every ExpireTrash / ExpireVersions cron run that restores versions — performs a full table scan of oc_filecache. On installations with a large filecache this is a heavy, repeated DB load spike. (Fixes #31682.)

Change

The version files for a given deleted file all live in a single, known directory under files_trashbin/versions (the folder the file was in, or the versions root). This PR replaces the whole-cache name search with an index-backed directory listing of that one directory (View::getDirectoryContent(), which uses the existing (parent, name) index) and filters the <name>.v<version>[.d<timestamp>] entries in PHP.

The directory to list is passed in from both call sites, so version files of files deleted from inside a folder are still found (the previous whole-cache search matched those by name regardless of location — the directory listing must look in the right sub-folder, not just the versions root).

Scope

  • No DB schema / db_structure.xml change.
  • No change to the generic Cache::search() / AdapterMySQL ILIKE behavior — the fix is contained to the trashbin version-restore hot path.

Behavior / compatibility

  • Returns the same versions, in the same shape, as before.
  • Literal PHP matching is slightly stricter than the prior SQL LIKE: filenames containing _ or % are no longer treated as wildcards (a latent edge-case bug in the old code), so if anything the result set is more correct, never broader.

Tests

Adds StorageTest::testGetVersionsFromTrash() covering:

  • the timestamped lookup in the versions root,
  • the non-timestamped lookup inside a sub-folder (the regression guard for nested deleted files),
  • and that version files belonging to other files / other deletion timestamps are correctly ignored.

Note: owncloud/core is in maintenance mode; this targets installations still on classic ownCloud 10.x.


🤖 This PR was prepared by the Claude Code review agent from the analysis of #31682. Please review carefully before merging.

`Trashbin::getVersionsFromTrash()` looked up a deleted file's versions with
`View::searchRaw('<name>.v%[.d<timestamp>]')`, which reaches
`Cache::search()` as `WHERE storage = ? AND name ILIKE ?`. The leading
literal + trailing `%` is non-sargable, and on MySQL the `ILIKE`->`COLLATE`
rewrite defeats any `name` index entirely, so every version restore — and
every `ExpireTrash`/`ExpireVersions` cron run that restores versions — full
scans `oc_filecache`. On large installations this is a heavy, repeated load
spike (issue #31682).

The version files for a given deleted file all live in a single known
directory under `files_trashbin/versions` (the folder the file was in, or
the versions root). Replace the whole-cache name search with an index-backed
`getDirectoryContent()` of that one directory (uses the existing
`(parent, name)` index) and filter the `<name>.v<version>[.d<timestamp>]`
entries in PHP. The directory to list is now passed in from both call sites
so version files of files deleted from inside a folder are still found.

Behavior is preserved: the same versions are returned, in the same shape.
Literal PHP matching is also slightly stricter than the previous SQL `LIKE`
(filenames containing `_`/`%` are no longer treated as wildcards).

Adds a regression test covering both the timestamped root lookup and the
non-timestamped sub-folder lookup, and asserting that version files of other
files / other timestamps are ignored.

Fixes #31682

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Thomas Müller <1005065+DeepDiver1975@users.noreply.github.com>
@update-docs

update-docs Bot commented Jun 19, 2026

Copy link
Copy Markdown

Thanks for opening this pull request! The maintainers of this repository would appreciate it if you would create a changelog item based on your changes.

@DeepDiver1975 DeepDiver1975 left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Automated code review by Claude Code review agent

Overview

This PR fixes a real and well-diagnosed performance bug (#31682): Trashbin::getVersionsFromTrash() ran a non-sargable name ILIKE '<file>.v%[.d<ts>]' search that, combined with the MySQL ILIKECOLLATE rewrite, full-scanned oc_filecache on every version restore / trash+version cron run. Replacing it with an index-backed getDirectoryContent() of the single known versions directory + PHP-side filtering is the right, appropriately-scoped approach (no schema change, no change to the generic Cache::search()).

The core fix looks correct, including the subtle part: passing $dir from both call sites so version files of files deleted from inside a sub-folder are still found (the old whole-cache search matched those by name regardless of location). 👍

🔴 Blocking — the new test will fail CI

In testGetVersionsFromTrash(), the first assertion is:

$versions = $method->invoke(null, 'test.txt', $timestamp, $this->user);
\sort($versions);
$this->assertEquals(['1', '10', '2'], $versions);

sort() defaults to SORT_REGULAR, which sorts the numeric strings '1','2','10' numerically['1', '2', '10']. But the test asserts ['1', '10', '2']. assertEquals compares sequential arrays element-by-element (no canonicalization by default), so this fails:

mismatch at [1]: expected '10', got '2'
mismatch at [2]: expected '2',  got '10'

Fix — make the expectation match numeric sort order:

$this->assertEquals(['1', '2', '10'], $versions);

(The second assertion, ['5', '6'], is fine.) An even more robust option is sort($versions, SORT_STRING) paired with a string-sorted expectation, or assertEqualsCanonicalizing() — but the minimal correct change is just reordering the expected array. Since the suite is @group DB it can't be run in the agent's sandbox, which is exactly why this slipped through; worth running locally/CI before merge.

Minor / non-blocking

  • $prefix matching is a substring-prefix, not a segment boundary. \strpos($name, $filename . '.v') !== 0 correctly anchors at the start, and the '.backup.v9' ignore-case in the test confirms a longer name isn't matched. Good. One residual edge: a sibling file literally named <filename>.vNN... where <filename> is a prefix of another real filename is already handled because you match the full $filename + .v. No action needed — just noting it was checked.
  • Comment/style: matches surrounding code (tabs, \-prefixed globals). Good.
  • Behavior note in the PR description is accurate: literal PHP matching is stricter than SQL LIKE for filenames containing _/%; that's a latent correctness improvement, not a regression.

Test coverage

Good instinct adding both the timestamped-root and non-timestamped-subfolder cases plus the "ignore other files/timestamps" entries — that subfolder case is precisely the regression risk of this change. Once the assertion above is corrected, coverage is solid. Consider also asserting the empty-result case (lookup for a filename with no versions) since getDirectoryContent on a non-existent dir returns [].

Verdict

Approach and core logic are correct; one blocking test bug (assertEquals ordering) must be fixed before this can pass CI. Everything else is minor.

The first assertion in testGetVersionsFromTrash() expected
['1', '10', '2'], but sort() uses SORT_REGULAR and sorts the numeric
version strings numerically -> ['1', '2', '10'], so the test failed.
Correct the expected order and add a case asserting an empty result for
a file that has no stored versions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Thomas Müller <1005065+DeepDiver1975@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sudden DB load peak caused by (apparent) old files cleanup.

1 participant