
Join Fan-out issue: Publicly shared dataset/workflows rows duplicated in the hub. (Has RCA and suggested fix) #5957

Description

@Mrudhulraj

What happened?

Any dataset/workflow that is shared publicly and also shared explicitly with a user shows up as duplicate rows on the hub page for that user.

How to reproduce?

Consider a dataset that is shared both publicly and with a user named user_A:

  1. Create a dataset/workflow from the owner account, then share it publicly and also explicitly with user_A with READ/WRITE permissions.
    a. Users who see it only through public sharing have the READ privilege by default. -> No issue here
    b. user_A, with whom the dataset/workflow is explicitly shared, has the READ/WRITE privilege. -> Issue observed

  2. Log in as user_A and view the dataset/workflow on the hub page.

  3. user_A sees the dataset/workflow with the same entity ID listed twice.

RCA:

  1. When listing entities in the hub, the query filters on the following:
    a. The user has access to the corresponding entity.
    b. The user is an owner of the entity.
    c. The access condition is combined via OR with the entity's public status when the includePublic flag is set.
    d. This results in duplicate entries when the user has explicit access and the includePublic flag is enabled.
Code: amber\src\main\scala\org\apache\texera\web\resource\dashboard\DatasetSearchQueryBuilder.scala

  override protected def constructFromClause(
      uid: Integer,
      params: DashboardResource.SearchQueryParams,
      includePublic: Boolean = false
  ): TableLike[_] = {
    val baseJoin = DATASET
      .leftJoin(DATASET_USER_ACCESS)
      .on(DATASET_USER_ACCESS.DID.eq(DATASET.DID))
      .leftJoin(USER)
      .on(USER.UID.eq(DATASET.OWNER_UID))

    // Default condition starts as true, ensuring all datasets are selected initially.
    var condition: Condition = DSL.trueCondition()

    if (uid == null) {
      // If `uid` is null, the user is not logged in or performing a public search
      // We only select datasets marked as public
      condition = DATASET.IS_PUBLIC.eq(true)
    } else {
      // When `uid` is present, we add a condition to only include datasets with direct user access.
      val userAccessCondition = DATASET_USER_ACCESS.UID.eq(uid)

      if (includePublic) {
        // If `includePublic` is true, we extend visibility to public datasets as well.
        condition = userAccessCondition.or(DATASET.IS_PUBLIC.eq(true))
      } else {
        condition = userAccessCondition
      }
    }
    baseJoin.where(condition)
  }

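This is a fan-out issue on the join: the left join to DATASET_USER_ACCESS yields one row per access grant on a dataset, and when includePublic is true the OR on DATASET.IS_PUBLIC lets every one of those rows through for a public dataset, not only the row that belongs to the current user.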
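The fan-out can be reproduced without a database. The following is a minimal sketch that models the two tables with plain Scala collections and mirrors the shape of the filter in constructFromClause. The case classes `Dataset` and `Access`, the `search` helper, and the sample rows are hypothetical stand-ins, and it assumes the owner also has a row in the access table next to user_A's grant.

```scala
object FanOutDemo {
  // Hypothetical, simplified stand-ins for the DATASET and DATASET_USER_ACCESS tables.
  final case class Dataset(did: Int, ownerUid: Int, isPublic: Boolean)
  final case class Access(did: Int, uid: Int, privilege: String)

  // Mirrors the query shape: LEFT JOIN on did, then filter with
  // (access.uid = uid), OR-ed with dataset.isPublic when includePublic is true.
  def search(
      datasets: Seq[Dataset],
      accesses: Seq[Access],
      uid: Int,
      includePublic: Boolean
  ): Seq[(Dataset, Option[Access])] = {
    // LEFT JOIN: one output row per matching access row, or a single
    // row with no access data when the dataset has no grants at all.
    val joined: Seq[(Dataset, Option[Access])] = datasets.flatMap { d =>
      val matches = accesses.filter(_.did == d.did)
      if (matches.isEmpty) Seq((d, None)) else matches.map(a => (d, Some(a)))
    }
    joined.filter {
      case (d, access) =>
        val hasAccess = access.exists(_.uid == uid)
        if (includePublic) hasAccess || d.isPublic else hasAccess
    }
  }

  def main(args: Array[String]): Unit = {
    val owner = 1
    val userA = 2
    val datasets = Seq(Dataset(did = 10, ownerUid = owner, isPublic = true))
    // Assumption: the owner has an access row in addition to user_A's grant.
    val accesses = Seq(Access(10, owner, "WRITE"), Access(10, userA, "WRITE"))

    val rows = search(datasets, accesses, userA, includePublic = true)
    rows.foreach(println)
    println(s"rows returned for did=10: ${rows.size}")
  }
}
```

Under these assumptions, the public dataset comes back once per access row because the public branch of the OR accepts the owner's row as well as user_A's. With `includePublic = false`, only user_A's own row passes the filter.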

Will suggest the fix in the comments below for review.
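Independently of whatever fix is proposed in the comments, one possible direction is to restrict the join itself to the current user's grant, so that other users' access rows never multiply the result. The sketch below models that idea with plain Scala collections. It is not the project's implementation: `Dataset`, `Access`, and `search` are hypothetical names, and it assumes that (did, uid) is unique in the access table.

```scala
object FanOutFixSketch {
  // Hypothetical, simplified stand-ins for the DATASET and DATASET_USER_ACCESS tables.
  final case class Dataset(did: Int, ownerUid: Int, isPublic: Boolean)
  final case class Access(did: Int, uid: Int, privilege: String)

  // LEFT JOIN restricted to the caller's own grant: the uid predicate is part
  // of the join condition, so each dataset yields at most one output row.
  def search(
      datasets: Seq[Dataset],
      accesses: Seq[Access],
      uid: Int,
      includePublic: Boolean
  ): Seq[(Dataset, Option[Access])] =
    datasets.flatMap { d =>
      // Assumption: (did, uid) is unique, so at most one access row matches.
      val own = accesses.find(a => a.did == d.did && a.uid == uid)
      val visible = own.isDefined || (includePublic && d.isPublic)
      if (visible) Seq((d, own)) else Seq.empty
    }

  def main(args: Array[String]): Unit = {
    val datasets = Seq(Dataset(did = 10, ownerUid = 1, isPublic = true))
    val accesses = Seq(Access(10, 1, "WRITE"), Access(10, 2, "WRITE"))
    // user_A (uid 2) gets a single row that still carries their own privilege.
    search(datasets, accesses, uid = 2, includePublic = true).foreach(println)
  }
}
```

In jOOQ terms this would roughly correspond to adding `DATASET_USER_ACCESS.UID.eq(uid)` to the `.on(...)` clause of the left join and filtering on `DATASET_USER_ACCESS.UID.isNotNull.or(DATASET.IS_PUBLIC.eq(true))`. That mapping is untested here and would need to be checked against the rest of the query builder, including the workflow counterpart.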

Scenarios to be addressed:

Version/Branch

1.3.0-incubating-SNAPSHOT (main)

Commit Hash (Optional)

No response

What browsers are you seeing the problem on?

No response
