4 changes: 4 additions & 0 deletions docs/console/validate.md
@@ -14,3 +14,7 @@ With `validate` command you can validate your tabular files (indivisual or the w
```bash script tabs=CLI
frictionless validate table.csv invalid.csv
```

+The `--parallel` option enables multiprocessing for validation jobs that contain multiple
+independent resources or tasks, such as Data Packages and Inquiries. It does not split validation
+of a single file into multiple processes.
2 changes: 1 addition & 1 deletion docs/framework/inquiry.md
@@ -30,7 +30,7 @@ Tasks in the Inquiry accept the same arguments written in camelCase as the corre
frictionless validate capital.inquiry-example.yaml
```

-At first sight, it's no clear why such a construct exists but when your validation workflow gets complex, the Inquiry can provide a lot of flexibility and power. Last but not least, the Inquiry will use multiprocessing if there are more than 1 task provided.
+At first sight, it's no clear why such a construct exists but when your validation workflow gets complex, the Inquiry can provide a lot of flexibility and power. If the `parallel` flag is provided, Inquiry validation can use multiprocessing to run independent tasks concurrently; it does not split validation of a single file/resource across multiple processes.

## Reference

9 changes: 8 additions & 1 deletion docs/guides/validating-data.md
@@ -176,6 +176,11 @@ print(report)

As we can see, the result is in a similar format to what we have already seen, and shows errors as we expected: we have one invalid resource and one valid resource.

+> Package validation can use multiprocessing if the `parallel` flag is provided. This runs
+> independent resources in the package concurrently; it does not split validation of a single
+> file/resource across multiple processes. Parallel execution is also disabled when foreign keys
+> are used, because those checks can depend on multiple resources.

## Validating an Inquiry

> The Inquiry is an advanced concept mostly used by software integrators. For example, under the hood, Frictionless Framework uses inquiries to implement client-server validation within the built-in API. Please skip this section if this information feels unnecessary for you.
@@ -208,7 +213,9 @@ print(report)

At first sight, it might not be clear why such a construct exists, but when your validation workflow gets complex, the Inquiry can provide a lot of flexibility and power.

-> The Inquiry will use multiprocessing if there is the `parallel` flag provided. It might speed up your validation dramatically especially on a 4+ cores processor.
+> The Inquiry will use multiprocessing if there is the `parallel` flag provided. This runs
+> independent inquiry tasks concurrently; it does not split validation of a single file/resource
+> across multiple processes.

## Validation Report

2 changes: 1 addition & 1 deletion frictionless/console/common.py
@@ -266,7 +266,7 @@

parallel = Option(
default=None,
-help="Enable multiprocessing",
+help="Enable multiprocessing for package/inquiry validation",
)

output_path = Option(
3 changes: 2 additions & 1 deletion frictionless/resource/resource.py
@@ -611,7 +611,8 @@ def validate(
checklist: a Checklist object
name: limit validation to one resource (if applicable)
on_row: callbacke for every row
-paraller: allow parallel validation (multiprocessing)
+parallel: accepted for API compatibility; resource validation itself
+is not split across multiple processes
limit_rows: limit amount of rows to this number
limit_errors: limit amount of errors to this number

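For readers of this change, the behavior the updated docs describe can be illustrated with a minimal standard-library sketch. This is not the frictionless implementation: `count_rows`, `should_run_parallel`, and `run_tasks` are hypothetical names, and the rule that a single task stays serial is an assumption made for the illustration. The sketch only shows the distinction the docs draw: worker processes each receive a whole task, and no task is split across processes.

```python
from multiprocessing import Pool


def count_rows(text):
    """Hypothetical stand-in for validating one resource.

    The whole resource is handled inside a single process.
    """
    return len(text.splitlines())


def should_run_parallel(task_count, parallel=False, has_foreign_keys=False):
    """Decide whether tasks are dispatched to worker processes.

    Assumption: one task gains nothing from a pool, so it stays serial.
    Foreign keys force serial execution because those checks can depend
    on multiple resources.
    """
    return bool(parallel) and task_count > 1 and not has_foreign_keys


def run_tasks(tasks, parallel=False, has_foreign_keys=False):
    """Run one job per task; an individual task is never split up."""
    if should_run_parallel(len(tasks), parallel, has_foreign_keys):
        with Pool() as pool:
            # Each worker receives a complete task; results keep task order.
            return pool.map(count_rows, tasks)
    return [count_rows(task) for task in tasks]
```

With this sketch, `run_tasks(["id\n1\n2\n", "id\n1\n"])` runs serially and returns `[3, 2]`. Passing `parallel=True` with two or more tasks and no foreign keys hands each task to a pool worker instead; on platforms that spawn worker processes, such a call belongs under an `if __name__ == "__main__":` guard.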