
[R] read_ipc_stream fails to unify nested uint64 fields inside a Struct array across record batches #50339

Description

@jo-vogel

The Arrow IPC stream in question contains a nested Struct array named regions. Some of its fields, e.g. regional_key, are explicitly typed as uint64 in the source API schema (described at https://md.umwelt.info/swagger-ui/, at the bottom under Dataset/regions/RegionalKey).

Across different record batches in the stream, the R arrow package converts this nested uint64 field inconsistently:
- In chunks with smaller values, it is converted into an R integer.
- In chunks with larger values, it is converted into an R double to preserve precision.

Because the type is not unified across chunks for nested elements, the resulting R data frame ends up with conflicting types from row to row within the list-column. This causes standard tidyverse tools like tidyr::unnest_wider() to fail with a "loss of precision" conversion error.

In contrast, Python's pyarrow handling of the same IPC stream correctly unifies the entire nested regional_key field into a consistent float type across all records.

According to https://arrow.apache.org/docs/r/articles/data_types.html, there is an option that can be set with
options(arrow.int64_downcast = FALSE)
but it seems to take effect only at the top level and not for nested structures.

Minimal Reproducible Example

library(arrow)
library(httr2)
library(tidyr)

# 1. Fetch the exact IPC stream that contains the heterogeneous nested structures
req <- httr2::request("https://md.umwelt.info/search/all?format=arrow_ipc") |>
  httr2::req_url_query(
    query = "type:'/Daten und Messstellen/Wasser/Flüsse' AND measuring_station:true", 
    language = "de"
  )

resp <- httr2::req_perform(req)
raw_bytes <- httr2::resp_body_raw(resp)

# 2. Parse the stream into an R data frame via Arrow
df <- as.data.frame(arrow::read_ipc_stream(raw_bytes))

# 3. Attempt to unnest the 'regions' struct column
# This fails because 'regional_key' inside 'regions' alternates between integer and double across rows
df |> unnest_wider(any_of(c("regions")), names_sep = "_")

# Throws:
# Error in `unnest_wider()`:
# ! Can't convert from `..1` <double> to <integer> due to loss of precision.
# • Locations: 2
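
To check which rows carry which R type, a small diagnostic along the following lines may help. This is only a sketch: the `regions` object below is a hypothetical stand-in for `df$regions` (a plain list of one-row data frames), and the real column's element shape may differ.

```r
# Hypothetical stand-in for df$regions: small keys arrive as integer,
# keys beyond the 32-bit integer range arrive as double.
regions <- list(
  data.frame(regional_key = 12L),
  data.frame(regional_key = 2^40),
  data.frame(regional_key = 7L)
)

# Record the storage type of regional_key in every list element.
key_types <- vapply(
  regions,
  function(x) typeof(x$regional_key),
  character(1)
)

# Tabulate the types; more than one distinct type indicates the mismatch.
print(table(key_types))
```

With the real data, `regions` would be replaced by `df$regions` from step 2 above.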

Expected behavior
If a uint64 field inside a regions struct chunk requires upcasting to an R double, the entire nested column across all record batches should be uniformly converted to double to maintain data structure integrity upon data frame extraction.
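
Until the conversion is uniform, one possible R-side workaround is to coerce the nested field to double in every list element before unnesting. The helper below is a sketch, not part of arrow or tidyr: `normalize_regional_key()` is a hypothetical name, it assumes each element of the list-column is a data frame (or list) with a `regional_key` field, and it has not been verified against the actual stream.

```r
# Hypothetical helper: force one nested field to double in every element
# of a list-column so that all rows share the same type.
normalize_regional_key <- function(regions, field = "regional_key") {
  lapply(regions, function(x) {
    # Elements that are NULL or lack the field are returned unchanged.
    if (!is.null(x) && field %in% names(x)) {
      x[[field]] <- as.double(x[[field]])
    }
    x
  })
}

# Stand-in data mimicking the mixed integer/double elements.
regions <- list(
  data.frame(regional_key = 12L, name = "a"),
  data.frame(regional_key = 2^40, name = "b")
)

fixed <- normalize_regional_key(regions)
```

Applied to the example above, this would be `df$regions <- normalize_regional_key(df$regions)` ahead of the `unnest_wider()` call. Note that doubles represent integers exactly only up to 2^53, so very large uint64 keys could still lose precision.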

Environment

OS: Ubuntu 26.04 LTS

R Version: 4.5.2

arrow R Package Version: 24.0.0

Component(s)

R

Metadata

Labels: Component: R, Status: needs champion, Type: bug