Summary
#1438 / PR #1443 fixed json column handling on MariaDB by recovering attr.json from the :json: comment marker. External store codec columns (<filepath@store>, <blob@store>) have the same root cause but are not covered by that fix — their encode() returns a Python dict that reaches pymysql unserialized, raising TypeError: dict can not be used as parameter.
Reproduction
Using DataJoint 2.2.2 against MariaDB 10.3.28:
import datajoint as dj
schema = dj.Schema("test_filepath_mariadb")
dj.config["stores"] = {
"local": {
"protocol": "file",
"location": "/tmp/dj_test_store",
"stage": "/tmp/dj_test_store",
}
}
@schema
class FileTest(dj.Manual):
definition = """
id : int
---
data_file : <filepath@local>
"""
# Create a dummy file
import os
os.makedirs("/tmp/dj_test_store", exist_ok=True)
with open("/tmp/dj_test_store/test.bin", "wb") as f:
f.write(b"test")
FileTest.insert1({"id": 1, "data_file": "test.bin"})
# TypeError: dict can not be used as parameter
Same error for <blob@store> columns.
Root cause
The filepath codec's encode() returns a dict ({path, store, size, timestamp}). The blob codec behaves similarly. On insert, __make_placeholder() (table.py line ~1383) checks attr.json to decide whether to json.dumps() the value:
elif attr.json:
value = json.dumps(value)
On MySQL, the column type is reported as json from information_schema, so attr.json is set to True and the dict is serialized. On MariaDB, the column type is reported as longtext. PR #1443 recovers attr.json for plain json columns using the :json: comment marker, but codec columns have a :<filepath@store>: marker instead — their original_type resolves through the codec's get_dtype() → "json", which should hit the same category == "JSON" recovery path, but empirically attr.json remains False for these columns on MariaDB.
The unserialized dict reaches pymysql, which raises TypeError in converters.escape_dict().
Impact
This affects all external store columns on MariaDB:
<filepath@store> — file reference metadata
<blob@store> — serialized data references
Both insert and fetch are affected (fetch returns a raw JSON string instead of a dict on MariaDB).
Current workaround
We are patching pymysql's converters at import time to serialize dicts as JSON strings:
import json
import pymysql.converters
def _escape_dict_as_json(val, charset, mapping=None):
return pymysql.converters.escape_str(json.dumps(val, default=str), charset)
pymysql.converters.encoders[dict] = _escape_dict_as_json
pymysql.converters.conversions[dict] = _escape_dict_as_json
This must be applied before any DataJoint connection is created. On MySQL, it is inert (DataJoint serializes dicts before they reach pymysql). This workaround is in production at the Sainsbury Wellcome Centre.
Suggested fix
Extend the attr.json recovery from PR #1443 to cover codec columns whose get_dtype() resolves to "json". Alternatively, have __make_placeholder() call json.dumps() on any dict value destined for a longtext column, regardless of attr.json.
Environment
- DataJoint 2.2.2
- MariaDB 10.3.28
- pymysql 1.1.1
- Python 3.11
Summary
#1438 / PR #1443 fixed
jsoncolumn handling on MariaDB by recoveringattr.jsonfrom the:json:comment marker. External store codec columns (<filepath@store>,<blob@store>) have the same root cause but are not covered by that fix — theirencode()returns a Python dict that reaches pymysql unserialized, raisingTypeError: dict can not be used as parameter.Reproduction
Using DataJoint 2.2.2 against MariaDB 10.3.28:
Same error for
<blob@store>columns.Root cause
The
filepathcodec'sencode()returns a dict ({path, store, size, timestamp}). Theblobcodec behaves similarly. On insert,__make_placeholder()(table.py line ~1383) checksattr.jsonto decide whether tojson.dumps()the value:On MySQL, the column type is reported as
jsonfrominformation_schema, soattr.jsonis set toTrueand the dict is serialized. On MariaDB, the column type is reported aslongtext. PR #1443 recoversattr.jsonfor plainjsoncolumns using the:json:comment marker, but codec columns have a:<filepath@store>:marker instead — theiroriginal_typeresolves through the codec'sget_dtype()→"json", which should hit the samecategory == "JSON"recovery path, but empiricallyattr.jsonremainsFalsefor these columns on MariaDB.The unserialized dict reaches pymysql, which raises
TypeErrorinconverters.escape_dict().Impact
This affects all external store columns on MariaDB:
<filepath@store>— file reference metadata<blob@store>— serialized data referencesBoth insert and fetch are affected (fetch returns a raw JSON string instead of a dict on MariaDB).
Current workaround
We are patching pymysql's converters at import time to serialize dicts as JSON strings:
This must be applied before any DataJoint connection is created. On MySQL, it is inert (DataJoint serializes dicts before they reach pymysql). This workaround is in production at the Sainsbury Wellcome Centre.
Suggested fix
Extend the
attr.jsonrecovery from PR #1443 to cover codec columns whoseget_dtype()resolves to"json". Alternatively, have__make_placeholder()calljson.dumps()on any dict value destined for alongtextcolumn, regardless ofattr.json.Environment