1 change: 1 addition & 0 deletions r/.Rbuildignore
@@ -32,3 +32,4 @@ STYLE.md
^inst/__pycache__$
^bootstrap.R$
air.toml
AGENTS.md
209 changes: 209 additions & 0 deletions r/AGENTS.md
@@ -0,0 +1,209 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# AI Agent Development Guidelines for the Arrow R Package

## Project Overview

The `arrow` R package provides an interface to the
[Apache Arrow](https://arrow.apache.org/) C++ library. It lives at `r/` inside
the `apache/arrow` monorepo, which also contains the C++ library (`cpp/`),
Python (`python/`), and other language implementations.

This means:

- The R package depends on the C++ library built from `cpp/`.
- Some bugs or features require changes in C++ (`cpp/`), not just R (`r/`).
Fix problems at the right layer.
- R functions often call C++ functions via bindings in `r/src/*.cpp`.

## AI Disclosure

Any pull request that used generative AI tools must disclose this. Include
a note in the PR description stating which AI tools were used.

## Code Generation — Do Not Edit Generated Files

The files `r/R/arrowExports.R` and `r/src/arrowExports.cpp` are
**auto-generated** by `r/data-raw/codegen.R`. Never edit them manually — your
changes will be silently overwritten.

To expose a C++ function to R:

1. Add the function to a `.cpp` file in `r/src/` with the `// [[arrow::export]]`
   annotation above it.
2. Use the naming convention `ClassName__methodName` (double underscore), e.g.:

   ```cpp
   // [[arrow::export]]
   std::shared_ptr<arrow::DataType> Int8__initialize() {
     return arrow::int8();
   }
   ```

3. Run `r/data-raw/codegen.R` to regenerate the export files (this also runs
   automatically during package configure when `ARROW_R_DEV=true`).
4. The R side can then call `Int8__initialize()` directly.
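
As a quick sanity check before running codegen, the annotated functions in a binding file can be listed with a small shell helper. This is only a sketch: `list_arrow_exports` is a hypothetical name, not something in the repository, and it assumes the annotation sits on its own line directly above the function signature, as in the example above. It does not replicate what `r/data-raw/codegen.R` does.

```bash
# Hypothetical helper: print the name of each function that follows a
# `// [[arrow::export]]` annotation in the given C++ files.
list_arrow_exports() {
  awk '
    # Remember that the next signature we see is exported.
    /^\/\/ \[\[arrow::export\]\]/ { annotated = 1; next }
    # The first identifier directly followed by "(" is taken as the function name.
    annotated && match($0, /[A-Za-z_][A-Za-z0-9_]*[ \t]*\(/) {
      name = substr($0, RSTART, RLENGTH)
      sub(/[ \t]*\($/, "", name)
      print name
      annotated = 0
    }
  ' "$@"
}
```

Calling `list_arrow_exports r/src/*.cpp` from the repository root prints one name per line, which makes it easy to spot a name that does not follow the `ClassName__methodName` pattern.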

Other generated files that should not be edited by hand:

- `r/R/dplyr-funcs-doc.R` — generated documentation
- `r/man/*.Rd` — generated by roxygen2 (edit the roxygen comments in `r/R/*.R`
instead, then run `make doc`)
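
One way to guard against hand edits is a path check over the generated files named in this section. The sketch below is an assumption, not an existing project tool: `is_generated_file` is a hypothetical name, and paths are assumed to be relative to the repository root.

```bash
# Hypothetical helper: succeed if the path (relative to the repository root)
# is one of the generated files named in this section.
is_generated_file() {
  case "$1" in
    r/R/arrowExports.R | r/src/arrowExports.cpp | r/R/dplyr-funcs-doc.R | r/man/*.Rd)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Example use: flag generated files among the staged changes.
# git diff --cached --name-only | while IFS= read -r path; do
#   if is_generated_file "$path"; then echo "generated file staged: $path"; fi
# done
```

A staged match is not necessarily a mistake, since `make doc` stages regenerated `.Rd` files. Treat it as a prompt to confirm that the change came from the generator and not from a manual edit.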

## Development Setup

### Loading and Building

```r
# Load the package for interactive development (from the r/ directory)
devtools::load_all()
```

### Documentation

After modifying roxygen2 comments in any `r/R/*.R` file, regenerate
documentation:

```bash
# From the r/ directory
make doc
```

This runs code formatting (via pre-commit), generates documentation helpers,
runs roxygen2, and stages the updated `.Rd` files. Always do this before
committing documentation changes.

### Code Formatting

The package uses the [air](https://posit-dev.github.io/air/) formatter
configured in `r/air.toml` (line width 120, excludes generated files). Run
formatting via pre-commit:

```bash
# From the r/ directory — format only changed files
make style

# Format all files
make style-all
```

To run the full R pre-commit suite (lint, format, C++ format, C++ lint):

```bash
pre-commit run --show-diff-on-failure --color=always --all-files r
```

### Testing

Tests use testthat (edition 3). Test files live in `r/tests/testthat/` and
follow the naming convention `test-<topic>.R`, with shared helpers in
`helper-*.R` files.

```r
# Run all tests
devtools::test()

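# Run tests for a single file. `filter` is a regular expression matched against
# the file name without the "test-" prefix and ".R" suffix, so anchor it to
# select only test-array.R (a bare "array" also matches other files).
devtools::test(filter = "^array$")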
```

```bash
# Full R CMD check
make check
```

### Package Check

```r
devtools::check()
```

## Code Style

Follow the [tidyverse style guide](https://style.tidyverse.org/). See also
`r/STYLE.md` for documentation-specific conventions.

- Use `snake_case` for functions and arguments.
- The base pipe `|>` has no `.` placeholder, so `%>%` idioms such as
  `x %>% { f(.) }` do not translate directly. Use a function with an explicit
  parameter instead, e.g. `x |> (\(d) f(d))()`.

## File Structure

| Path | Contents |
|------|----------|
| `r/R/*.R` | R source code |
| `r/src/*.cpp` | C++ bindings (using cpp11) |
| `r/src/arrowExports.cpp` | **Generated** — do not edit |
| `r/R/arrowExports.R` | **Generated** — do not edit |
| `r/data-raw/codegen.R` | Code generation script |
| `r/data-raw/docgen.R` | Documentation generation script |
| `r/tests/testthat/test-*.R` | Test files |
| `r/tests/testthat/helper-*.R` | Shared test helpers |
| `r/man/*.Rd` | **Generated** — edit roxygen comments in `r/R/`, then `make doc` |
| `r/vignettes/` | Package vignettes |

## Pull Requests

### Title Format

```
GH-<issue-number>: [R] <description>
```

For example: `GH-49607: [R] Add AGENTS.md file to R package`
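
The title format can be checked mechanically before opening a PR. A minimal sketch, assuming the issue number is all digits and the description is non-empty; `is_valid_pr_title` is a hypothetical helper, not part of the repository:

```bash
# Hypothetical helper: succeed if the title follows
# `GH-<issue-number>: [R] <description>`.
is_valid_pr_title() {
  printf '%s\n' "$1" | grep -Eq '^GH-[0-9]+: \[R\] .+$'
}
```

For instance, `is_valid_pr_title "GH-49607: [R] Add AGENTS.md file to R package"` succeeds, while a title without the `GH-` prefix or the `[R]` tag fails.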

### PR Description

Use all four template sections:

```markdown
### Rationale for this change

[1-2 sentences: why is this change needed?]

### What changes are included in this PR?

[1-2 sentences: what was changed?]

### Are these changes tested?

[How the changes were tested]

### Are there any user-facing changes?

[Yes/No with brief explanation]
```

Do not skip any section.
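
A draft description can be checked for the four headings before it is submitted. This is a sketch under the assumption that each heading appears on its own line exactly as in the template; `missing_pr_sections` is a hypothetical helper name.

```bash
# Hypothetical helper: print any template heading missing from the PR
# description passed as $1; return non-zero if at least one is missing.
missing_pr_sections() {
  body=$1
  status=0
  for heading in \
    "### Rationale for this change" \
    "### What changes are included in this PR?" \
    "### Are these changes tested?" \
    "### Are there any user-facing changes?"
  do
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! printf '%s\n' "$body" | grep -Fxq -- "$heading"; then
      printf 'missing section: %s\n' "$heading"
      status=1
    fi
  done
  return "$status"
}
```

With the draft saved to a file, `missing_pr_sections "$(cat pr-body.md)"` prints nothing and succeeds when all four sections are present (`pr-body.md` is a placeholder file name).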

## Git Workflow

This project uses a fork-based workflow:

- **Push to your fork** (`origin`), never directly to `upstream`
(`apache/arrow`).
- **Create PRs** from your fork to `upstream/main`.
- Fetch latest upstream before creating a branch:

```bash
git fetch upstream
git checkout upstream/main
git checkout -b GH-<issue>-description
```
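
The branch name in the last step can be derived from the issue number and a short description. The sketch below is illustrative only: `branch_name_for_issue` is a hypothetical helper, and the lower-case, hyphen-separated form of the description is an assumption, not a project rule.

```bash
# Hypothetical helper: build a branch name of the form GH-<issue>-<slug>.
# The lower-case, hyphen-separated slug is an assumption, not a project rule.
branch_name_for_issue() {
  issue=$1
  slug=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' |
    sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  printf 'GH-%s-%s\n' "$issue" "$slug"
}
```

It can be combined with the commands above as `git checkout -b "$(branch_name_for_issue 49607 "Add AGENTS.md file")"`.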