2 changes: 2 additions & 0 deletions docs/docs/configuration/environment-variables.mdx
@@ -40,6 +40,8 @@ The following environment variables allow you to configure your Sourcebot deploy
| `SOURCEBOT_TELEMETRY_DISABLED` | `false` | <p>Enables/disables telemetry collection in Sourcebot. See [this doc](/docs/misc/telemetry) for more info.</p> |
| `DEFAULT_MAX_MATCH_COUNT` | `10000` | <p>The default maximum number of search results to return when using search in the web app.</p> |
| `ALWAYS_INDEX_FILE_PATTERNS` | - | <p>A comma separated list of glob patterns matching file paths that should always be indexed, regardless of size or number of trigrams.</p> |
| `SOURCEBOT_CHAT_ATTACHMENT_MAX_IMAGE_BYTES` | `10485760` (10 MiB) | <p>Maximum size in bytes of a single image attachment uploaded to Ask Sourcebot. Enforced server-side at upload time.</p> |
| `SOURCEBOT_CHAT_ATTACHMENT_ORPHAN_TTL_HOURS` | `24` | <p>How long in hours an uploaded-but-unsent attachment is retained before being deleted by the orphan sweep. Set to `0` to disable the sweep.</p> |
| `NODE_USE_ENV_PROXY` | `0` | <p>Enables Node.js to automatically use `HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY` environment variables for network requests. Set to `1` to enable or `0` to disable. See [this doc](https://nodejs.org/en/learn/http/enterprise-network-configuration) for more info.</p> |
| `HTTP_PROXY` | - | <p>HTTP proxy URL for routing non-SSL requests through a proxy server (e.g., `http://proxy.company.com:8080`). Requires `NODE_USE_ENV_PROXY=1`.</p> |
| `HTTPS_PROXY` | - | <p>HTTPS proxy URL for routing SSL requests through a proxy server (e.g., `http://proxy.company.com:8080`). Requires `NODE_USE_ENV_PROXY=1`.</p> |
92 changes: 92 additions & 0 deletions packages/backend/src/attachmentPruner.ts
@@ -0,0 +1,92 @@
import { AttachmentStatus, PrismaClient } from "@sourcebot/db";
import { createLogger, env } from "@sourcebot/shared";
import { unlink } from "fs/promises";
import path from "path";
import { setIntervalAsync } from "./utils.js";

const BATCH_SIZE = 1_000;
const ONE_HOUR_MS = 60 * 60 * 1000;

const logger = createLogger('attachment-pruner');

/**
 * Periodically deletes PENDING (uploaded-but-never-linked) attachment blobs
 * older than the configured TTL, along with their stored bytes. These are the
 * orphans produced when a user selects a file in the chat box but never sends
 * the message. COMMITTED attachments are never touched here; their byte
 * lifecycle is handled by the chat-delete sweep in the web app.
 *
 * @note Mirrors the local-FS layout used by `LocalFsStorageBackend` in the web
 * package (`DATA_CACHE_DIR/attachments/<storageKey>`). When an S3 driver is
 * added (Followup B), this deletion path must be generalized accordingly.
 */
export class AttachmentPruner {
    private interval?: NodeJS.Timeout;
    private readonly attachmentsDir = path.join(env.DATA_CACHE_DIR, 'attachments');

    constructor(private db: PrismaClient) {}

    startScheduler() {
        const ttlHours = env.SOURCEBOT_CHAT_ATTACHMENT_ORPHAN_TTL_HOURS;
        if (ttlHours <= 0) {
            logger.info('SOURCEBOT_CHAT_ATTACHMENT_ORPHAN_TTL_HOURS is 0, attachment orphan pruning is disabled.');
            return;
        }

        logger.debug(`Attachment pruner started. Pruning PENDING attachments older than ${ttlHours} hours.`);

        // Run immediately on startup, then every hour.
        this.pruneOrphanedAttachments();
        this.interval = setIntervalAsync(() => this.pruneOrphanedAttachments(), ONE_HOUR_MS);
Comment on lines +38 to +40

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Handle the startup prune promise.

Line 39 kicks off pruneOrphanedAttachments() without awaiting or catching it. Any DB/filesystem failure there becomes an unhandled rejection, and this backend already exits on unhandledRejection, so the first prune can take the worker down during startup.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/backend/src/attachmentPruner.ts` around lines 38 - 40, The startup
call to pruneOrphanedAttachments() is firing without any await or error
handling, which can surface as an unhandled rejection and crash the worker
during initialization. Update the startup path in attachmentPruner so the first
prune is either awaited from an async initialization flow or wrapped with
explicit catch/log handling, while keeping the recurring setIntervalAsync
scheduling intact.
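
A minimal sketch of the "explicit catch/log" option, under stated assumptions: `runDetached` and the stand-in `logger` are hypothetical names that do not appear in the PR, and awaiting the first prune from an async startup path would be an equally valid fix. The sketch only shows that a fire-and-forget call can be wrapped so a failed first prune is logged instead of rejecting unhandled.

```typescript
// Hypothetical stand-in for the backend logger; it records messages so the
// behavior is easy to observe in isolation.
const logged: string[] = [];
const logger = {
    error: (message: string) => { logged.push(message); },
};

// Hypothetical helper: runs an async task that nobody awaits and logs any
// failure instead of letting the rejection escape.
async function runDetached(label: string, task: () => Promise<void>): Promise<void> {
    try {
        await task();
    } catch (error) {
        // Both synchronous throws and rejected promises from `task` land here.
        logger.error(`${label} failed: ${error}`);
    }
}
```

Because the promise returned by `runDetached` never rejects, the startup call in `startScheduler` could discard it safely, for example `void runDetached('Startup attachment prune', () => this.pruneOrphanedAttachments());`, while the `setIntervalAsync` line stays as it is.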

    }

    async dispose() {
        if (this.interval) {
            clearInterval(this.interval);
            this.interval = undefined;
        }
    }

    private async pruneOrphanedAttachments() {
        const cutoff = new Date(Date.now() - env.SOURCEBOT_CHAT_ATTACHMENT_ORPHAN_TTL_HOURS * ONE_HOUR_MS);
        let totalDeleted = 0;

        while (true) {
            const batch = await this.db.attachment.findMany({
                where: {
                    status: AttachmentStatus.PENDING,
                    createdAt: { lt: cutoff },
                },
                select: { id: true, storageKey: true },
                take: BATCH_SIZE,
            });

            if (batch.length === 0) {
                break;
            }

            await Promise.all(batch.map(async (attachment) => {
                try {
                    await unlink(path.join(this.attachmentsDir, attachment.storageKey));
                } catch (error) {
                    if ((error as NodeJS.ErrnoException)?.code !== 'ENOENT') {
                        logger.warn(`Failed to delete bytes for orphaned attachment ${attachment.id}: ${error}`);
                    }
                }
            }));

            const result = await this.db.attachment.deleteMany({
                where: { id: { in: batch.map((attachment) => attachment.id) } },
            });
Comment on lines +55 to +80

🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Re-check that a batch is still orphaned before deleting bytes.

This loop selects rows once as old PENDING attachments, but then Line 70 deletes bytes before any second status/link check and Line 78 deletes by id alone. If a user sends the message while this batch is in flight, you can unlink a now-committed attachment; even if the row deletion later fails on a new FK, the committed image is already broken.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/backend/src/attachmentPruner.ts` around lines 55 - 80, The orphan
cleanup in attachmentPruner’s batch loop deletes files based only on the initial
findMany result, so a PENDING attachment can be unlinked even if it becomes
linked or non-orphaned before deletion. Re-check each attachment’s current state
in the same batch before calling unlink, ideally by verifying the row still
matches the orphan criteria in this method before file removal and before
deleteMany. Use the existing attachmentPruner loop, the db.attachment queries,
and the unlink call to keep only still-orphaned attachments eligible for byte
deletion.
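
A minimal sketch of one possible ordering, assuming the row can be claimed with a conditional delete: remove the row only if it is still an expired PENDING orphan, then unlink bytes only for rows that were actually removed. Everything below is hypothetical scaffolding (`Row`, `FakeAttachmentTable`, `pruneBatch`, the `removeBytes` callback) and is not the PR's code. With Prisma, the claim might be a `deleteMany` whose `where` repeats the status and `createdAt` filter and whose returned `count` is checked, but that mapping is an assumption.

```typescript
type Status = 'PENDING' | 'COMMITTED';

interface Row {
    id: string;
    storageKey: string;
    status: Status;
    createdAt: Date;
}

// Hypothetical in-memory stand-in for the Attachment table.
class FakeAttachmentTable {
    private rows = new Map<string, Row>();

    constructor(rows: Row[]) {
        for (const row of rows) {
            this.rows.set(row.id, { ...row });
        }
    }

    // Simulates a user sending the message while the prune is in flight.
    commit(id: string): void {
        const row = this.rows.get(id);
        if (row) {
            row.status = 'COMMITTED';
        }
    }

    // Deletes the row only if it still matches the orphan criteria.
    // Returns true when this call removed the row.
    deleteIfStillOrphaned(id: string, cutoff: Date): boolean {
        const row = this.rows.get(id);
        const isOrphan = row !== undefined
            && row.status === 'PENDING'
            && row.createdAt.getTime() < cutoff.getTime();
        if (isOrphan) {
            this.rows.delete(id);
        }
        return isOrphan;
    }

    has(id: string): boolean {
        return this.rows.has(id);
    }
}

// Claims each candidate first, then removes bytes only for claimed rows.
// Returns the number of attachments pruned.
function pruneBatch(
    candidates: Row[],
    cutoff: Date,
    table: FakeAttachmentTable,
    removeBytes: (storageKey: string) => void,
): number {
    let pruned = 0;
    for (const candidate of candidates) {
        if (!table.deleteIfStillOrphaned(candidate.id, cutoff)) {
            continue; // Committed or already gone: leave its bytes alone.
        }
        removeBytes(candidate.storageKey);
        pruned += 1;
    }
    return pruned;
}
```

The trade-off in this ordering is that a byte removal failing after a successful claim leaves a file with no row, so a real implementation would likely need a way to retry or to sweep stray files.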

            totalDeleted += result.count;

            if (batch.length < BATCH_SIZE) {
                break;
            }
        }

        if (totalDeleted > 0) {
            logger.debug(`Pruned ${totalDeleted} orphaned PENDING attachment(s).`);
        }
    }
}
4 changes: 4 additions & 0 deletions packages/backend/src/index.ts
@@ -8,6 +8,7 @@ import 'express-async-errors';
import { existsSync } from 'fs';
import { mkdir } from 'fs/promises';
import { Api } from "./api.js";
import { AttachmentPruner } from "./attachmentPruner.js";
import { ConfigManager } from "./configManager.js";
import { ConnectionManager } from './connectionManager.js';
import { INDEX_CACHE_DIR, REPOS_CACHE_DIR, SHUTDOWN_SIGNALS } from './constants.js';
@@ -55,10 +56,12 @@ const accountPermissionSyncer = new AccountPermissionSyncer(prisma, settings, re
const repoIndexManager = new RepoIndexManager(prisma, settings, redis, promClient);
const configManager = new ConfigManager(prisma, connectionManager, env.CONFIG_PATH);
const auditLogPruner = new AuditLogPruner(prisma);
const attachmentPruner = new AttachmentPruner(prisma);

connectionManager.startScheduler();
await repoIndexManager.startScheduler();
auditLogPruner.startScheduler();
attachmentPruner.startScheduler();

if (env.PERMISSION_SYNC_ENABLED === 'true' && !await hasEntitlement('permission-syncing')) {
logger.warn('Permission syncing is not supported in current plan. Please contact team@sourcebot.dev for assistance.');
@@ -99,6 +102,7 @@ const listenToShutdownSignals = () => {
await repoPermissionSyncer.dispose()
await accountPermissionSyncer.dispose()
await auditLogPruner.dispose()
await attachmentPruner.dispose()
await configManager.dispose()

await prisma.$disconnect();
49 changes: 49 additions & 0 deletions packages/db/prisma/migrations/20260627000032_add_chat_attachments/migration.sql
@@ -0,0 +1,49 @@
-- CreateEnum
CREATE TYPE "AttachmentStatus" AS ENUM ('PENDING', 'COMMITTED');

-- CreateTable
CREATE TABLE "Attachment" (
    "id" TEXT NOT NULL,
    "orgId" INTEGER NOT NULL,
    "storageKey" TEXT NOT NULL,
    "filename" TEXT NOT NULL,
    "mediaType" TEXT NOT NULL,
    "sizeBytes" INTEGER NOT NULL,
    "checksum" TEXT NOT NULL,
    "uploadedById" TEXT NOT NULL,
    "status" "AttachmentStatus" NOT NULL DEFAULT 'PENDING',
    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT "Attachment_pkey" PRIMARY KEY ("id")
);

-- CreateTable
CREATE TABLE "ChatAttachment" (
    "id" TEXT NOT NULL,
    "chatId" TEXT NOT NULL,
    "attachmentId" TEXT NOT NULL,
    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT "ChatAttachment_pkey" PRIMARY KEY ("id")
);

-- CreateIndex
CREATE INDEX "Attachment_status_createdAt_idx" ON "Attachment"("status", "createdAt");

-- CreateIndex
CREATE INDEX "ChatAttachment_attachmentId_idx" ON "ChatAttachment"("attachmentId");

-- CreateIndex
CREATE UNIQUE INDEX "ChatAttachment_chatId_attachmentId_key" ON "ChatAttachment"("chatId", "attachmentId");

-- AddForeignKey
ALTER TABLE "Attachment" ADD CONSTRAINT "Attachment_orgId_fkey" FOREIGN KEY ("orgId") REFERENCES "Org"("id") ON DELETE CASCADE ON UPDATE CASCADE;

-- AddForeignKey
ALTER TABLE "Attachment" ADD CONSTRAINT "Attachment_uploadedById_fkey" FOREIGN KEY ("uploadedById") REFERENCES "User"("id") ON DELETE CASCADE ON UPDATE CASCADE;
Comment on lines +42 to +43

🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Keep committed attachments when the uploader is deleted.

Line 43's ON DELETE CASCADE removes the Attachment row, and Line 49 then cascades that into ChatAttachment. Deleting a user will therefore strip images out of every surviving chat they uploaded to. This FK should preserve committed attachment rows instead of deleting them.

Suggested direction
-    "uploadedById" TEXT NOT NULL,
+    "uploadedById" TEXT,

-ALTER TABLE "Attachment" ADD CONSTRAINT "Attachment_uploadedById_fkey" FOREIGN KEY ("uploadedById") REFERENCES "User"("id") ON DELETE CASCADE ON UPDATE CASCADE;
+ALTER TABLE "Attachment" ADD CONSTRAINT "Attachment_uploadedById_fkey" FOREIGN KEY ("uploadedById") REFERENCES "User"("id") ON DELETE SET NULL ON UPDATE CASCADE;

Update the Prisma model alongside the migration so historical chats can still resolve their attachments after uploader deletion.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/db/prisma/migrations/20260627000032_add_chat_attachments/migration.sql`
around lines 42 - 43, The Attachment foreign key on uploadedById currently
cascades deletes from User, which causes committed attachments and their
ChatAttachment links to disappear when an uploader is removed. Update the
migration and the corresponding Prisma model for Attachment/uploadedById so
deletion of a User does not delete Attachment rows; use a non-cascading delete
behavior that preserves historical attachments for existing chats.


-- AddForeignKey
ALTER TABLE "ChatAttachment" ADD CONSTRAINT "ChatAttachment_chatId_fkey" FOREIGN KEY ("chatId") REFERENCES "Chat"("id") ON DELETE CASCADE ON UPDATE CASCADE;

-- AddForeignKey
ALTER TABLE "ChatAttachment" ADD CONSTRAINT "ChatAttachment_attachmentId_fkey" FOREIGN KEY ("attachmentId") REFERENCES "Attachment"("id") ON DELETE CASCADE ON UPDATE CASCADE;
71 changes: 71 additions & 0 deletions packages/db/prisma/schema.prisma
@@ -24,6 +24,14 @@ enum ChatVisibility {
  PUBLIC
}

/// Lifecycle status of an uploaded attachment blob.
/// PENDING: uploaded but not yet linked to a chat (orphan until a message
/// referencing it is sent). COMMITTED: linked to at least one chat.
enum AttachmentStatus {
  PENDING
  COMMITTED
}

/// @note: The @map annotation is required to maintain backwards compatibility
/// with the existing database.
/// @note: In the generated client, these mapped values will be in pascalCase.
@@ -272,6 +280,7 @@ model Org {
connections Connection[]
repos Repo[]
apiKeys ApiKey[]
attachments Attachment[]
isOnboarded Boolean @default(false)
imageUrl String?

@@ -456,6 +465,7 @@ model User {
chats Chat[]
sharedChats ChatAccess[]
repoVisits RepoVisit[]
uploadedAttachments Attachment[]

oauthTokens OAuthToken[]
oauthAuthCodes OAuthAuthorizationCode[]
@@ -608,6 +618,67 @@ model Chat {
messages Json // This is a JSON array of `Message` types from @ai-sdk/ui-utils.

sharedWith ChatAccess[]

attachments ChatAttachment[]
}

/// A user-uploaded binary attachment blob (e.g. an image). The bytes live in
/// the configured StorageBackend (keyed by `storageKey`), never in the DB.
/// Attachments are NOT chat-bound: they are uploaded before any chat
/// association exists, and linked to chats via `ChatAttachment`. Permissions
/// are derived entirely from the linked chat(s); there are no independent ACLs.
model Attachment {
  id String @id @default(cuid())

  org Org @relation(fields: [orgId], references: [id], onDelete: Cascade)
  orgId Int

  /// Opaque key the StorageBackend uses to locate the bytes.
  storageKey String

  /// Original (sanitized) filename supplied by the uploader.
  filename String

  /// Final media type of the stored bytes (validated by magic bytes at upload).
  mediaType String

  /// Size of the stored bytes.
  sizeBytes Int

  /// Hex SHA-256 of the stored bytes (integrity / debugging; not used for dedup).
  checksum String

  /// The user who uploaded this blob. Uploads require authentication, so this
  /// is always set (anonymous users cannot upload binary attachments).
  uploadedBy User @relation(fields: [uploadedById], references: [id], onDelete: Cascade)
  uploadedById String

  status AttachmentStatus @default(PENDING)

  createdAt DateTime @default(now())

  chats ChatAttachment[]

  @@index([status, createdAt])
}

/// Join table linking an `Attachment` blob to a `Chat`. This is the linker
/// that makes chat duplication metadata-only (no byte copy) and keeps
/// attachment access purely chat-derived. Deleting a chat cascades these rows;
/// a separate sweep deletes `Attachment`s left with zero links (and their bytes).
model ChatAttachment {
  id String @id @default(cuid())

  chat Chat @relation(fields: [chatId], references: [id], onDelete: Cascade)
  chatId String

  attachment Attachment @relation(fields: [attachmentId], references: [id], onDelete: Cascade)
  attachmentId String

  createdAt DateTime @default(now())

  @@unique([chatId, attachmentId])
  @@index([attachmentId])
}

/// Represents a user's access to a chat that has been shared with them.
17 changes: 17 additions & 0 deletions packages/shared/src/env.server.ts
@@ -321,6 +321,23 @@ const options = {
SOURCEBOT_CHAT_PROMPT_CACHE_BREAK_DETECTION_ENABLED: booleanSchema.default('false'),
SOURCEBOT_MCP_TOOL_CALL_TIMEOUT_MS: numberSchema.int().positive().max(maxTimerDelayMs).default(60000),

/**
* Maximum size (in bytes) of a single image attachment uploaded to the
* Ask chat. Enforced server-side at upload time. Distinct from the
* inline-text cap (which lives as a web-package constant).
* @default 10 MiB
*/
SOURCEBOT_CHAT_ATTACHMENT_MAX_IMAGE_BYTES: numberSchema.int().positive().default(10 * 1024 * 1024),

/**
* How long (in hours) an uploaded-but-unlinked (PENDING) attachment
* blob is retained before the orphan sweep deletes it and its bytes.
* Covers "select a file then never send" abandonment. Set to 0 to
* disable the orphan sweep entirely.
* @default 24 hours
*/
SOURCEBOT_CHAT_ATTACHMENT_ORPHAN_TTL_HOURS: numberSchema.int().nonnegative().default(24),

DEBUG_WRITE_CHAT_MESSAGES_TO_FILE: booleanSchema.default('false'),
DEBUG_ENABLE_REACT_SCAN: booleanSchema.default('false'),
DEBUG_ENABLE_REACT_GRAB: booleanSchema.default('false'),
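
To illustrate how the TTL setting is meant to behave, here is a dependency-free sketch of the same rules: integer, non-negative, default 24, and `0` disables the sweep. `parseOrphanTtlHours` and `orphanCutoff` are hypothetical helpers, not part of the PR, and treating an empty string as "unset" is an assumption, since how `numberSchema` handles that case is not shown in this diff.

```typescript
const ONE_HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper approximating `numberSchema.int().nonnegative().default(24)`.
// Assumption: an unset or empty variable falls back to the default.
function parseOrphanTtlHours(raw: string | undefined): number {
    if (raw === undefined || raw.trim() === '') {
        return 24;
    }
    const value = Number(raw);
    if (!Number.isInteger(value) || value < 0) {
        throw new Error(`SOURCEBOT_CHAT_ATTACHMENT_ORPHAN_TTL_HOURS must be a non-negative integer, got "${raw}"`);
    }
    return value;
}

// Returns the createdAt cutoff for the orphan sweep, or undefined when the
// sweep is disabled (TTL of 0).
function orphanCutoff(ttlHours: number, now: Date): Date | undefined {
    if (ttlHours <= 0) {
        return undefined;
    }
    return new Date(now.getTime() - ttlHours * ONE_HOUR_MS);
}
```

With the default of 24, a PENDING attachment becomes eligible for pruning once its `createdAt` is more than a day older than the sweep's start time.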