This repository was archived by the owner on Apr 30, 2026. It is now read-only.
This issue is called out in one of the commits in PR #117
The second issue is specific to map():
ValueError: The features can't be aligned because the key score of features {'task_description': Value(dtype='string', id=None), 'seed_question': Value(dtype='string', id=None), 'seed_response': Value(dtype='string', id=None), 'num_samples': Value(dtype='int64', id=None), 'question': Value(dtype='string', id=None), '__index_level_0__': Value(dtype='int64', id=None), 'evaluation': Value(dtype='string', id=None), 'score': Value(dtype='string', id=None)} has unexpected type - Value(dtype='string', id=None) (expected either Value(dtype='float64', id=None) or Value("null").
It appears that in the datasets library, only in the case of num_proc>1,
when we hit the "error converting dtype" case and set the column
to None, the column is still considered a string column rather
than the new type.
This second issue deserves further investigation and may require
a fix to the datasets library.
The related code in filterblock.py as of that PR is:
def _map_dtype(samples, column, dtype, num_proc=1):
    def convert_column(sample):
        try:
            sample[column] = dtype(sample[column])
        except ValueError as e:
            logger.error(
                "Error converting dtype: %s, filling with None to be filtered later", e
            )
            sample[column] = None
        return sample

    # FIXME: it appears multiprocessing map has issues with
    # None columns. If we pass num_proc>1 here and the error
    # case is triggered above, we get:
    #   ValueError: The features can't be aligned ...
    # because the column is still considered a string not
    # the new dtype.
    num_proc = 1

    return samples.map(convert_column, num_proc=num_proc)
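To make the conversion behaviour easier to reason about, here is a minimal sketch that exercises the same per-sample logic on plain dicts, without the datasets library. The helper `convert_shard` and the two example shards are hypothetical and only stand in for the way `map()` splits work across processes when num_proc>1. It illustrates one situation that may be relevant: a shard in which every value fails to convert ends up holding only None, so it contains no float values at all. Whether this is what actually triggers the alignment error is an assumption that still needs to be confirmed.

```python
import logging

logger = logging.getLogger(__name__)


def convert_column(sample, column, dtype):
    """Convert sample[column] to dtype, or set it to None if conversion fails."""
    try:
        sample[column] = dtype(sample[column])
    except ValueError as e:
        logger.error(
            "Error converting dtype: %s, filling with None to be filtered later", e
        )
        sample[column] = None
    return sample


def convert_shard(rows, column, dtype):
    """Hypothetical stand-in for one worker processing its shard of rows."""
    # Copy each row so the input shard is left unmodified.
    return [convert_column(dict(row), column, dtype) for row in rows]


# Two hypothetical shards, as if the samples were split across two workers.
shard_a = [{"score": "1.0"}, {"score": "2.5"}]      # every value converts
shard_b = [{"score": "n/a"}, {"score": "unknown"}]  # every value fails

converted_a = convert_shard(shard_a, "score", float)
converted_b = convert_shard(shard_b, "score", float)

# Rows that were filled with None can be dropped afterwards.
kept = [row for row in converted_a + converted_b if row["score"] is not None]
```

After this runs, `converted_a` holds only floats and `converted_b` holds only None, so the two shards give no common evidence about the column's new type.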
We need to investigate this error more deeply to figure out the best fix.