These JSON files represent sessions of queries performed by either real users or simulated runs. They share a similar hierarchical structure.
Both user and simulated files have this general form:
{
"id": "...",
"sid": "...",
"rank": "...", //optional parameter
"interactions": [ ... ]
}

- Optional parameter: If the simulator's query prediction candidates are not ranked, the `rank` field can be omitted. In that case, all candidates for one query are treated as rank 1, and separate values are calculated for each of them.
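The fallback for a missing `rank` could be handled when reading a run, roughly as in the sketch below. The helper name `effective_rank` is hypothetical, and converting the string value to an integer is an assumption made for convenience, not something the format requires.

```python
from typing import Dict


def effective_rank(run: Dict) -> int:
    """Return the rank of a simulated run as an integer.

    Assumption: a run without a "rank" field is treated as rank 1,
    following the note on unranked candidates.
    """
    return int(run.get("rank", "1"))


ranked_run = {"id": "Run_1", "sid": "2", "rank": "3", "interactions": []}
unranked_run = {"id": "Run_2", "sid": "2", "interactions": []}

print(effective_rank(ranked_run))
print(effective_rank(unranked_run))
```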

| Field | Type | Description |
|---|---|---|
| `id` | string | Unique identifier of the session/run. For user files, e.g., "Session_2"; for simulated runs, e.g., "Run_1_core-bm25-1-query-advanced_question-200td.log" |
| `sid` | string | Session ID, critical for matching simulated runs with real user sessions |
| `rank` | string | (optional) Describes the order of the candidate queries by the expected likelihood to reproduce the original query (ascending order) |
| `interactions` | List[dict] | Chronologically ordered queries for this session/run |

Each element of `interactions` has the following structure:
{
"q": "...",
"serp": [ ... ],
"clicks": [ ... ]
}

| Field | Type | Description |
|---|---|---|
| `q` | string | Search query text |
| `serp` | List[dict] | Results returned in the Search Engine Results Page (SERP) for this query, ordered by rank. Each entry holds a `docid` and a `score` (`null` if no score is available) |
| `clicks` | List[int] | Document IDs that were clicked |
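Before running an evaluation, it can help to check that a file has the fields described in the two tables. The following is a minimal sketch of such a check; `check_session` and the wording of its messages are hypothetical and not part of this repository.

```python
from typing import Dict, List

# Required fields as described in the tables; "rank" is optional.
REQUIRED_TOP_LEVEL = ("id", "sid", "interactions")
REQUIRED_PER_INTERACTION = ("q", "serp", "clicks")


def check_session(session: Dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in session:
            problems.append(f"missing top-level field: {key}")
    for index, interaction in enumerate(session.get("interactions", [])):
        for key in REQUIRED_PER_INTERACTION:
            if key not in interaction:
                problems.append(f"interaction {index}: missing field: {key}")
    return problems


session = {
    "id": "Session_2",
    "sid": "2",
    "interactions": [{"q": "passivation", "serp": [], "clicks": []}],
}
print(check_session(session))
```

A file read with `json.load` can be passed to the function directly. The sketch only checks that the fields are present, not their types.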
{
"id": "Session_2",
"sid": "2",
"interactions": [
{
"q": "passivation",
"serp": [
{
"docid": "5184714",
"score": null
},
{
"docid": "717105",
"score": null
},
{
"docid": "4712986",
"score": null
},
{
"docid": "5096442",
"score": null
},
{
"docid": "2249793",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
}
],
"clicks": []
},
{
"q": "acid passivation",
"serp": [
{
"docid": "40843926",
"score": null
},
{
"docid": "4712986",
"score": null
},
{
"docid": "4488527",
"score": null
},
{
"docid": "42872005",
"score": null
},
{
"docid": "51414810",
"score": null
},
{
"docid": "132317361",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
}
],
"clicks": [24919459]
}
]
...
}

- Contains all interactions in chronological order
- SERP arrays contain the ranked result lists with corresponding DocIDs (and scores if available)
- Click arrays show documents clicked by the user
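Reading the ranked document IDs and clicks out of one interaction might look like the sketch below. Two assumptions are made here that the format description does not state: entries with an empty `docid` are unused result slots and can be skipped, and clicked IDs are converted to strings so they can be compared with the string `docid` values. Both helper names are hypothetical.

```python
from typing import Dict, List


def serp_docids(interaction: Dict) -> List[str]:
    """Return the ranked document IDs of one interaction, without empty slots."""
    # Assumption: an empty "docid" string marks an unused result slot.
    return [entry["docid"] for entry in interaction["serp"] if entry.get("docid")]


def clicked_docids(interaction: Dict) -> List[str]:
    """Return clicked document IDs as strings, matching the type of SERP docids."""
    return [str(docid) for docid in interaction["clicks"]]


interaction = {
    "q": "passivation",
    "serp": [
        {"docid": "5184714", "score": None},
        {"docid": "717105", "score": None},
        {"docid": "", "score": None},
    ],
    "clicks": [],
}
print(serp_docids(interaction))
print(clicked_docids(interaction))
```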
{
"id": "Run_1_core-bm25-2-query-advanced_question-200td.log",
"rank": "1",
"sid": "2",
"interactions": [
{
"q": "passivation",
"serp": [
{
"docid": "5184714",
"score": null
},
{
"docid": "717105",
"score": null
},
{
"docid": "4712986",
"score": null
},
{
"docid": "5096442",
"score": null
},
{
"docid": "2249793",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
}
],
"clicks": []
},
{
"q": "acid passivation",
"serp": [
{
"docid": "40843926",
"score": null
},
{
"docid": "4712986",
"score": null
},
{
"docid": "4488527",
"score": null
},
{
"docid": "42872005",
"score": null
},
{
"docid": "51414810",
"score": null
},
{
"docid": "132317361",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
},
{
"docid": "",
"score": null
}
],
"clicks": [24919459]
}
]
...
}

- If multiple candidate queries are predicted for the original query, `rank` indicates how likely this candidate is to resemble the original query
- `sid` matches the corresponding user session for evaluation

The examples are taken from the example files in the data folder. For a better understanding of how the files need to be structured, have a look at those files. Every .log file corresponds to one simulation run or one original log file that was converted into this format.

- Matching by sid:
  - Only simulated runs with the same `sid` should be compared to the user session
  - `id` is descriptive and mainly for logging
- Chronology preserved:
  - Queries are listed in order
- SERP alignment:
  - `serp` contains ranked documents
- Multiple runs per session:
  - `rank` distinguishes multiple simulated SERPs per user session
  - Each run can be evaluated independently
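
The matching rule above can be sketched as a grouping step: simulated runs are collected per `sid` and then paired with the user session that has the same `sid`. The function name `match_runs_to_sessions` is hypothetical, and the inputs are assumed to be already-parsed dictionaries in the format described here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def match_runs_to_sessions(
    user_sessions: List[Dict], simulated_runs: List[Dict]
) -> Dict[str, Tuple[Dict, List[Dict]]]:
    """Pair each user session with all simulated runs that share its sid."""
    runs_by_sid = defaultdict(list)
    for run in simulated_runs:
        runs_by_sid[run["sid"]].append(run)
    # Sessions without any matching run are paired with an empty list.
    return {
        session["sid"]: (session, runs_by_sid.get(session["sid"], []))
        for session in user_sessions
    }


sessions = [{"id": "Session_2", "sid": "2", "interactions": []}]
runs = [
    {"id": "Run_1", "sid": "2", "rank": "1", "interactions": []},
    {"id": "Run_2", "sid": "2", "rank": "2", "interactions": []},
    {"id": "Run_3", "sid": "7", "interactions": []},
]
matched = match_runs_to_sessions(sessions, runs)
for sid, (session, session_runs) in matched.items():
    print(sid, session["id"], [run["id"] for run in session_runs])
```

Each run in the resulting list can then be evaluated against its user session on its own; runs whose `sid` has no user session are left out of the result.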