Skip to content

fix(hnsw): de-duplicate dead-node UIDs on vector delete#9756

Open
shaunpatterson wants to merge 1 commit into
dgraph-io:mainfrom
shaunpatterson:sp/fix-hnsw-deadnode-dup
Open

fix(hnsw): de-duplicate dead-node UIDs on vector delete#9756
shaunpatterson wants to merge 1 commit into
dgraph-io:mainfrom
shaunpatterson:sp/fix-hnsw-deadnode-dup

Conversation

@shaunpatterson

Copy link
Copy Markdown
Contributor

Problem

When a vfloat (vector) value is deleted, addIndexMutations records the uid in the HNSW dead-node list (posting/index.go). That list is a single postingDataKey(<pred>__vector_dead, 1) holding a JSON array — which is read, appended to, and re-marshalled in full on every delete:

deadNodes, err = hnsw.ParseEdges(string(deadData.Value.([]byte)))
...
deadNodes = append(deadNodes, uid)   // unconditional
deadNodesBytes, _ := json.Marshal(deadNodes)
pl.addMutation(ctx, txn, edge)

The uid is appended unconditionally, so a uid already recorded as dead — delete/reinsert churn, or a Raft re-apply of the same mutation — is appended again. That grows the posting with duplicates and re-marshals the whole blob for no effect. removeDeadNodes de-dupes on read (it builds a set), so the duplicates were functionally invisible but still bloated the posting and its parse cost.

Fix

Skip the rewrite when the uid is already present, via a small addDeadNode helper that reports whether the list changed:

func addDeadNode(deadNodes []uint64, uid uint64) ([]uint64, bool) {
	if slices.Contains(deadNodes, uid) {
		return deadNodes, false
	}
	return append(deadNodes, uid), true
}

Behaviour is unchanged for the first delete of a uid; repeated deletes become a no-op instead of appending a duplicate and rewriting the posting.

Scope

This intentionally does not address the deeper design issue that the dead list grows unbounded with distinct deletions and is never garbage-collected — that needs the index.Remove() path the existing // TODO look into better alternatives calls for, and is left as-is.

Test

TestAddDeadNode covers first-insert, distinct-append, and already-present (no-op) cases.

  • go test ./posting/ — pass
  • go vet / gofmt — clean (no new findings)
  • golangci-lint (repo config) — no findings in the changed code

🤖 Generated with Claude Code

When a vfloat (vector) value is deleted, addIndexMutations records the uid
in the HNSW dead-node list — a single posting (DataKey(<pred>__vector_dead, 1))
holding a JSON array that is read, appended to, and re-marshalled in full on
every delete. The uid was appended unconditionally, so a uid that is already
recorded as dead (delete/reinsert churn, or a Raft re-apply of the same
mutation) was appended again, growing the posting with duplicates and
re-marshalling the whole blob for no effect. removeDeadNodes de-dupes on read,
so the duplicates were invisible but still bloated the posting and its parse.

Skip the rewrite when the uid is already present (new addDeadNode helper).
Behaviour is unchanged for the first delete of a uid; repeated deletes become
a no-op instead of appending a duplicate.

This does not address the deeper design issue that the dead list grows
unbounded with distinct deletions and is never garbage-collected — that needs
the index.Remove() path the existing TODO calls for and is left as-is.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shaunpatterson shaunpatterson requested a review from a team as a code owner June 19, 2026 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant