
docs: restructure dataset schema page with introduction and guidance#2554

Open
jancurn wants to merge 17 commits into master from claude/slack-session-oDBaR


@jancurn jancurn commented May 21, 2026

Summary

Restructures the dataset schema documentation to provide proper context before diving into details:

  • Adds introduction explaining what dataset schema is and its two components (fields and views)
  • Moves file structure section before examples so readers know where to put code
  • Reorganizes content into clear Fields and Views sections as parallel concepts
  • Adds guidance on why/when to use views (addressing user feedback)
  • Documents what views are NOT for (anti-patterns, export format misconceptions)
  • Adds multi-view example for different use cases (Marketing vs Pricing)
  • Consolidates reference tables at the end
  • Links to Google Maps Scraper as real-world example

The page now follows the same pattern as other actor definition pages (input_schema, output_schema).

Context

Based on feedback from Martin Sabo and Jaroslav Hejlek in #dev-docs: the documentation explained how to configure views, but not why or when to use them.

Slack thread: https://apify.slack.com/archives/C010Q0FBYG3/p1779357816377359?thread_ts=1779115904.940779&cid=C010Q0FBYG3

Test plan

  • Verify page renders correctly in preview
  • Check all code examples are valid JSON
  • Verify Google Maps Scraper link works
  • Confirm new headings appear in table of contents
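
The "valid JSON" item in the test plan can be checked mechanically. The following is a minimal sketch, not part of this PR: it assumes the page's examples sit in fenced blocks tagged `json`, and the helper name `find_invalid_json_blocks` and the sample text are hypothetical.

```python
import json
import re

# Three backticks, built indirectly so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# Matches a fenced block opened with the json tag and captures its body.
JSON_FENCE = re.compile(rf"^{FENCE}json[^\n]*\n(.*?)^{FENCE}", re.DOTALL | re.MULTILINE)


def find_invalid_json_blocks(markdown: str) -> list[tuple[int, str]]:
    """Return (block_index, error_message) for each json fence that fails to parse."""
    problems = []
    for index, match in enumerate(JSON_FENCE.finditer(markdown), start=1):
        try:
            json.loads(match.group(1))
        except json.JSONDecodeError as exc:
            problems.append((index, str(exc)))
    return problems


# Hypothetical sample: the second block contains a comment, which strict JSON rejects.
sample = "\n".join([
    FENCE + "json",
    '{"fields": {}, "views": {}}',
    FENCE,
    "",
    FENCE + "json",
    '{"fields": { /* JSON Schema describing each item */ }}',
    FENCE,
    "",
])

for index, message in find_invalid_json_blocks(sample):
    print(f"block {index}: {message}")
```

A strict parser rejects blocks that use comments as placeholders, so a check like this would flag them.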

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG

Adds a comprehensive "Why use views" section to dataset schema docs that
explains the purpose and benefits of views, when to use them, how to organize
views by use case, and what views are NOT for. Also includes a practical
multi-view example for an e-commerce scraper.

This addresses feedback that the documentation explained HOW to configure
views but not WHY or WHEN to use them.

Slack thread: https://apify.slack.com/archives/C010Q0FBYG3/p1779357816377359?thread_ts=1779115904.940779&cid=C010Q0FBYG3

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG

apify-service-account commented May 21, 2026

✅ Preview for this PR (commit 140028ab) is ready at https://pr-2554.preview.docs.apify.com (see action run).

claude added 2 commits May 21, 2026 11:12
- Shorten the why/when content significantly
- Add link to Google Maps Scraper as real-world example
- Keep the anti-pattern note (useful guidance)
- Remove redundant explanations

https://claude.ai/code/session_018Upw3aA9syy5Jm84F1xp9f
Adds clarification that views only affect Console UI display, not how
data is exported to JSON, CSV, or other formats.

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
@jancurn jancurn requested a review from TC-MO May 21, 2026 11:28
Reorganizes the page to provide better context before diving into details:

- Adds introduction explaining what dataset schema is and its two components
- Moves file structure section before examples
- Reorganizes into clear Fields and Views sections as parallel concepts
- Consolidates reference tables at the end
- Maintains all existing content but in a more logical flow

The page now follows the same pattern as other actor definition pages
(input_schema, output_schema) where concepts are introduced before examples.

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
@jancurn jancurn changed the title docs: add guidance on why and when to use dataset views [WIP] docs: add guidance on why and when to use dataset views May 21, 2026
@jancurn jancurn changed the title [WIP] docs: add guidance on why and when to use dataset views docs: restructure dataset schema page with introduction and guidance May 21, 2026
}
}
}
"fields": { /* JSON Schema describing each item */ },

Instead of these comments, keep the short examples there.

Addresses review feedback to show actual field/view examples instead of
comments in the schema components overview.

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
@jancurn jancurn changed the title docs: restructure dataset schema page with introduction and guidance [WIP] docs: restructure dataset schema page with introduction and guidance May 21, 2026
@jancurn jancurn changed the title [WIP] docs: restructure dataset schema page with introduction and guidance docs: restructure dataset schema page with introduction and guidance May 21, 2026

@TC-MO TC-MO left a comment


Thanks Jan! Left some inline suggestions, mostly:

  • Bold reserved for UI elements (a few bullet lists use term as a label pattern)
  • "Output tab UI" > "Output tab" for consistency (the tab is in the UI)
  • A couple of gerund headings and one all-caps "NOT" to soften
  • Small tightening passes where prose restates what's just above
  • One technical fix: $schema in the example uses draft-07 but the reference table specifies Draft 2020-12, changed it for consistency's sake

One pattern that's worth applying consistently across the page (not just the bullets at LOC15-16): views is required, fields is optional, so required-first ordering would apply to:

  • The example JSON at LOC20-38 (views before fields)
  • The major sections (move Views section above Fields section)
  • The reference table at LOC427-428 (views row before fields row)

Happy to make those changes myself if that is easier.
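
The `$schema` mismatch flagged in the review above could also be caught by a small script. This is a sketch under stated assumptions: the examples are available as JSON strings, the expected draft is 2020-12 as the review says the reference table specifies, and the example names and the helper `find_draft_mismatches` are hypothetical.

```python
import json

# Meta-schema URI for JSON Schema Draft 2020-12.
EXPECTED_SCHEMA_URI = "https://json-schema.org/draft/2020-12/schema"


def iter_schema_uris(node):
    """Yield every "$schema" string found anywhere in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$schema" and isinstance(value, str):
                yield value
            else:
                yield from iter_schema_uris(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_schema_uris(item)


def find_draft_mismatches(examples: dict[str, str]) -> dict[str, list[str]]:
    """Map example name -> declared URIs that differ from the expected draft."""
    mismatches = {}
    for name, raw in examples.items():
        wrong = [
            uri
            for uri in iter_schema_uris(json.loads(raw))
            if uri.rstrip("#") != EXPECTED_SCHEMA_URI
        ]
        if wrong:
            mismatches[name] = wrong
    return mismatches


# Hypothetical examples: one declares draft-07, the other Draft 2020-12.
examples = {
    "overview": '{"fields": {"$schema": "http://json-schema.org/draft-07/schema#"}, "views": {}}',
    "reference": '{"fields": {"$schema": "https://json-schema.org/draft/2020-12/schema"}, "views": {}}',
}

print(find_draft_mismatches(examples))
```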

Comment thread sources/platform/actors/development/actor_definition/dataset_schema/index.md Outdated
}
```

The first view defined becomes the default tab.

I would cut this; it is already stated at LOC 213.

Comment on lines +416 to +418
### Flatten in Actor code

Alternatively, flatten nested structures in your Actor code before calling `Actor.pushData()`.

Single-sentence subsections are not great practice. I would recommend removing the H3 and folding the sentence into the previous section.
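
For readers of this thread, the flattening step mentioned in the quoted line could look roughly like the sketch below. It is plain Python with no Apify SDK dependency (the quoted docs refer to `Actor.pushData()`), the helper name `flatten_item` is hypothetical, and the dot separator is an assumption, not something this PR specifies.

```python
def flatten_item(item: dict, separator: str = ".") -> dict:
    """Collapse nested dicts into one level, joining keys with `separator`.

    Lists and empty dicts are kept as values unchanged.
    """
    flat = {}
    for key, value in item.items():
        if isinstance(value, dict) and value:
            # Recurse, then prefix each child key with the parent key.
            for child_key, child_value in flatten_item(value, separator).items():
                flat[f"{key}{separator}{child_key}"] = child_value
        else:
            flat[key] = value
    return flat


# Hypothetical scraped item with one nested object.
item = {
    "title": "Cafe",
    "location": {"lat": 50.08, "lng": 14.43},
    "tags": ["coffee"],
}

print(flatten_item(item))
```

The flattened dictionary would then be passed to whatever push call the Actor uses.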

jancurn and others added 11 commits May 31, 2026 23:32
…chema/index.md

Co-authored-by: Michał Olender <92638966+TC-MO@users.noreply.github.com>
- Remove redundant "UI" from "Output tab UI" throughout
- Change "two main components" to "two components"
- Remove sentence that rephrases the bullets
- Simplify field and AI agent descriptions
- Rename "Example with field metadata" to "Field metadata example"
- Update JSON Schema to 2020-12 draft for consistency
- Reword validation link to start with verb
- Remove bold from bullet points (reserved for UI elements)
- Change "What views are NOT for" to "What views are not for"
- Simplify Google Maps example sentence
- Fold single-sentence subsection into previous section
- Various prose cleanup for conciseness

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG