fix: FilterError when comparing string metadata dates with datetime objects#11700

Open
milljer wants to merge 1 commit into
deepset-ai:mainfrom
milljer:fix-filter-date-comparison

milljer commented Jun 20, 2026

Contributor

Related Issues

Proposed Changes:

document_matches_filter raised a FilterError when comparing a string date in Document.meta (e.g. "2024-01-01") against a datetime filter value using ordering operators (>, >=, <, <=), even though storing dates as ISO strings in metadata is a common, valid pattern.

Root cause: _prepare_ordering_comparison in haystack/utils/filters.py unconditionally called _parse_date on both operands whenever either one was a string, but _parse_date only accepts str and raised when given an already-parsed datetime.

Fix: only call _parse_date on an operand that isn't already a datetime.
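The guard described above can be sketched as follows. This is a minimal illustration, not the actual code in haystack/utils/filters.py: FilterError, parse_date and prepare_ordering_comparison are local stand-ins written for this example, and parse_date is assumed to accept only ISO 8601 strings.

```python
from datetime import datetime
from typing import Any, Tuple


class FilterError(Exception):
    """Local stand-in for Haystack's FilterError (assumption for this sketch)."""


def parse_date(value: Any) -> datetime:
    """Stand-in for _parse_date: only ISO 8601 strings are accepted."""
    try:
        return datetime.fromisoformat(value)
    except (ValueError, TypeError) as exc:
        # ValueError: malformed string; TypeError: not a string at all (e.g. int).
        raise FilterError(f"Cannot parse {value!r} as an ISO 8601 date") from exc


def prepare_ordering_comparison(document_value: Any, filter_value: Any) -> Tuple[Any, Any]:
    """Normalize operands for >, >=, <, <= when at least one of them is a string."""
    if isinstance(document_value, str) or isinstance(filter_value, str):
        # The guard: an operand that is already a datetime is left untouched.
        if not isinstance(document_value, datetime):
            document_value = parse_date(document_value)
        if not isinstance(filter_value, datetime):
            filter_value = parse_date(filter_value)
    return document_value, filter_value


# ISO string in metadata compared against a datetime filter value: no error.
doc_value, flt_value = prepare_ordering_comparison("2024-01-01", datetime(2023, 6, 1))
print(doc_value > flt_value)  # True

# A string compared against a non-date operand still fails loudly.
try:
    prepare_ordering_comparison("2024-01-01", 5)
except FilterError as err:
    print(f"FilterError: {err}")
```

Because only datetime operands are skipped, an int paired with a string is still passed to the parser and rejected, which matches the behavior the description says is preserved.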

How did you test it?

  • Added a regression test reproducing the issue (> operator with datetime filter value and ISO 8601 string Document value) in test/utils/test_filters.py.
  • Ran hatch run test:unit -k test_filters (91 passed, including the 4 existing tests that cover non-date string/int comparisons still raising FilterError as before).
  • Ran hatch run test:types and hatch run fmt, both clean.

Notes for the reviewer

The fix only skips _parse_date when an operand is already a datetime; it still attempts _parse_date on non-string, non-datetime operands (e.g. int) so existing behavior — raising FilterError for nonsensical string/non-date comparisons — is preserved.

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title: fix:.
  • I have documented my code.
  • I have added a release note file.
  • I have run pre-commit hooks and fixed any issue.

…to datetime objects

_prepare_ordering_comparison unconditionally parsed both operands as
date strings, so a datetime filter value would fail to parse and
raise FilterError. Only parse operands that are actually strings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
milljer requested a review from a team as a code owner June 20, 2026 02:45
milljer requested review from sjrl and removed request for a team June 20, 2026 02:45

vercel Bot commented Jun 20, 2026


@milljer is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

Contributor

Coverage report

Files with changed coverage: haystack/utils/filters.py

This report was generated by python-coverage-comment-action

@zeweihan


Good fix. Skipping _parse_date for operands that are already datetime objects is the right call, and the regression test covering > with a datetime filter value against an ISO-string document value locks it in.

One thing worth flagging from a regulated-domain (legal/compliance) angle: this FilterError is actually the good failure mode. The crash tells you something is misconfigured and is debuggable. The dangerous sibling is the pure-string case where ordering comparison silently succeeds but gives the wrong answer.

Specifically, ISO-8601 strings ("2024-01-01") compare correctly under lexicographic ordering only because the YYYY-MM-DD format is fixed-width and runs from the most significant field to the least. But document metadata in the wild frequently carries:

  • "01/15/2024" (US locale) vs "2024-01-15" — lexicographic ordering is wrong, no error raised
  • "Jan 1, 2024" — parsed by some _parse_date implementations, silently misordered by others
  • null / empty string for "date not applicable" — ordering against a real date gives nonsense
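The failure modes in the list above are easy to reproduce with plain string comparison. The snippet below is a small illustration using made-up date values; it does not call any Haystack code.

```python
from datetime import datetime

# Same-format ISO strings: lexicographic order agrees with chronological order.
iso_ok = "2023-12-31" < "2024-01-15"

# US-locale strings: "12/..." sorts after "01/...", so Dec 31, 2023 is
# wrongly treated as later than Jan 15, 2024, and nothing is raised.
us_ok = "12/31/2023" < "01/15/2024"

# Mixed formats: Dec 31, 2025 sorts before Jan 1, 2024 because "1" < "2".
mixed_says_earlier = "12/31/2025" < "2024-01-01"

# An empty string sorts before every non-empty date string.
empty_says_earlier = "" < "2024-01-01"

# Parsing with an explicit format restores the chronological answer.
us_parsed_ok = (
    datetime.strptime("12/31/2023", "%m/%d/%Y")
    < datetime.strptime("01/15/2024", "%m/%d/%Y")
)

print(iso_ok, us_ok, mixed_says_earlier, empty_says_earlier, us_parsed_ok)
# True False True True True
```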

For legal/regulatory corpora this carries compliance weight, not just convenience weight. The filter effective_date < today AND expiry_date > today is the exact predicate for "clauses currently in force". A date filter that silently returns the wrong set means you either retrieve superseded clauses (the amended v2 is sitting next to the executed v1 and the filter can't tell which is current) or miss active ones. Both are the kind of "plausible-but-wrong" answer that ships to a partner or regulator with full confidence and no signal that anything degraded.
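As a rough illustration of the "currently in force" predicate, the sketch below evaluates it over a few hypothetical clause records after parsing their ISO date strings. The records, field values and the in_force helper are invented for this example and are not part of the PR.

```python
from datetime import date

# Hypothetical clause metadata with ISO 8601 date strings.
clauses = [
    {"id": "v1", "effective_date": "2022-01-01", "expiry_date": "2023-12-31"},
    {"id": "v2", "effective_date": "2024-01-01", "expiry_date": "2026-12-31"},
    {"id": "v3", "effective_date": "2027-01-01", "expiry_date": "2029-12-31"},
]


def in_force(clause: dict, today: date) -> bool:
    """effective_date < today AND expiry_date > today, compared as real dates."""
    effective = date.fromisoformat(clause["effective_date"])
    expiry = date.fromisoformat(clause["expiry_date"])
    return effective < today and expiry > today


today = date(2025, 6, 1)  # fixed so the example is reproducible
current = [c["id"] for c in clauses if in_force(c, today)]
print(current)  # ['v2']
```

Parsing before comparing means a malformed date fails with a ValueError instead of quietly placing a superseded clause in the result set.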

A small extension that would harden this: when _prepare_ordering_comparison ends up comparing two strings that are not ISO-8601 parseable, raise (or at least warn) rather than falling through to lexicographic comparison. The existing FilterError message is already a good template, since it names the operator and points at ISO formatting, so reusing it for the silent-mismatch case would turn a silent wrong answer into a visible error without changing the happy path. The PR here fixes the visible half; that would close the invisible half.
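One possible shape for that stricter behavior is sketched below. It is an assumption-laden illustration, not a proposal for the exact Haystack change: FilterError is a local stand-in, strict_less_than is a hypothetical helper, and the error wording is invented. It also does not reconcile timezone-aware and naive values, which a real implementation would need to handle.

```python
from datetime import datetime


class FilterError(Exception):
    """Local stand-in for Haystack's FilterError (assumption for this sketch)."""


def strict_less_than(left: str, right: str) -> bool:
    """Compare two strings as dates, refusing to fall back to string ordering."""
    try:
        left_dt = datetime.fromisoformat(left)
        right_dt = datetime.fromisoformat(right)
    except ValueError as exc:
        # Name the operator and the expected format so the failure is debuggable.
        raise FilterError(
            f"Cannot apply '<' to {left!r} and {right!r}: "
            "string operands must be ISO 8601 dates, e.g. '2024-01-15'."
        ) from exc
    return left_dt < right_dt


print(strict_less_than("2023-12-31", "2024-01-15"))  # True

try:
    strict_less_than("12/31/2023", "2024-01-15")
except FilterError as err:
    print(f"FilterError: {err}")
```

Raising keeps the ISO happy path unchanged, while US-locale strings, free-form dates and empty strings surface as errors instead of being ordered character by character.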

Development

Successfully merging this pull request may close these issues.

bug: FilterError when comparing string metadata dates with datetime objects using ordering operators

2 participants