The NFDI Search Engine is an ad hoc research gateway developed as part of NFDI4DataScience. It searches multiple external data sources for scientific resources and aggregates the results in a single web interface.
It provides a unified entry point to discover publications, researchers, datasets, projects, organizations, and related research resources across a broad set of integrated sources.
The system follows an ad hoc federated search approach. For each user query, multiple external data sources are queried in parallel. The responses are mapped to a common schema, aggregated, and presented in a unified interface.
There is no central index of external data. All results are retrieved on demand from the configured sources.
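The parallel fan-out described above can be sketched with the standard library's thread pool. This is a minimal illustration, not the project's implementation. The `Adapter` type and `federated_search` function are hypothetical names. Each adapter is assumed to be a callable that returns records already mapped to the shared schema. Per-request timeouts are assumed to live inside each adapter's HTTP call.

```python
import logging
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor, as_completed

log = logging.getLogger(__name__)

# Hypothetical adapter shape: query string in, records in the shared schema out.
Adapter = Callable[[str], list[dict]]


def federated_search(query: str, adapters: dict[str, Adapter]) -> list[dict]:
    """Send the query to every source in parallel and merge whatever comes back."""
    results: list[dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(adapters))) as pool:
        # Map each future back to its source name for error reporting.
        futures = {pool.submit(fn, query): name for name, fn in adapters.items()}
        for future in as_completed(futures):
            try:
                results.extend(future.result())
            except Exception:
                # A failing upstream API should not sink the whole query.
                log.exception("source %s failed", futures[future])
    return results
```

Results arrive in completion order, not source order. Any stable ordering has to happen in a later aggregation step.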
- Ad hoc federated search across multiple external research data providers
- Adapter-based source integration with a shared internal data model
- Mapping of heterogeneous source responses to a common schema
- Aggregation of results into common categories such as publications and researchers
- Session-based result handling for pagination and incremental loading
- User accounts with configurable source preferences
- A user submits a query through the web interface
- The backend executes the query against external repositories via their APIs
- Each adapter maps its response to the shared internal schema
- A controller deduplicates and aggregates results across sources
- Results are grouped and stored in the user session
- The interface renders the results and supports loading more items per category
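The deduplication, grouping and "load more" steps of this flow might look roughly like the sketch below. Several details are assumptions made for illustration:

- The duplicate key is the same category plus the same DOI, falling back to a normalized title.
- The record fields `doi`, `title`, `category` and `source` are assumed.
- A plain dict stands in for the user session.
- `dedupe_and_group` and `next_page` are hypothetical names.

```python
from collections import defaultdict


def dedupe_and_group(records: list[dict]) -> dict[str, list[dict]]:
    """Merge duplicates and group the survivors by category.

    Duplicate key (an assumption): same category plus same DOI, or same
    normalized title when a record has no DOI.
    """
    merged: dict[tuple, dict] = {}
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        ident = ("doi", doi) if doi else ("title", rec.get("title", "").strip().lower())
        key = (rec["category"], *ident)
        if key in merged:
            # Keep the first copy, but remember every source that returned it.
            merged[key]["sources"].add(rec["source"])
        else:
            merged[key] = {**rec, "sources": {rec["source"]}}

    grouped: dict[str, list[dict]] = defaultdict(list)
    for rec in merged.values():
        rec["sources"] = sorted(rec["sources"])  # lists serialize, sets do not
        grouped[rec["category"]].append(rec)
    return dict(grouped)


def next_page(session: dict, category: str, page_size: int = 10) -> list[dict]:
    """Return the next slice of one category and advance its offset in the session."""
    offsets = session.setdefault("offsets", {})
    start = offsets.get(category, 0)
    page = session["results"].get(category, [])[start:start + page_size]
    offsets[category] = start + len(page)
    return page
```

Keeping a per-category offset lets each category load more items independently. Flask's default session is a signed cookie with a small size limit, so large result sets would need a server-side session store.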
The application is implemented as a Flask-based web service.
Source integrations are implemented as adapters. Each adapter is a dedicated integration module that queries a specific external provider API and maps the response into the shared internal data model.
Each adapter is responsible for:
- Querying its upstream API or endpoint
- Handling authentication when required
- Mapping responses into the internal representation
The active sources are configured in config.py. Adding a new source typically means implementing a new adapter and registering it in the configuration.
Examples of integrated source types include scholarly repositories, research knowledge graphs, researcher identifier services, dataset repositories, and project or funding databases. The exact set of enabled sources depends on configuration and available credentials.
Configuration is managed through environment variables and config.py.
Typical configuration includes:
- A secret key for session handling
- Optional API keys for selected external providers
- Feature flags for optional components such as the chatbot
Sensitive values such as API keys should be provided through environment variables and not committed to the repository.
Clone the repository and change into the project directory:

```bash
git clone https://github.com/semantic-systems/nfdi-search-engine.git
cd nfdi-search-engine
```
- Python 3.11 or a compatible Python 3 version
- A local virtual environment is recommended
Before running the application, copy the example configuration files and adjust them as needed.
- Copy .env.example to .env
- Set SECRET_KEY, which is required for session handling
- Optionally set API keys for the external sources you enable
- Optionally set chatbot or analytics related settings
The file config.py defines the available configuration options and enabled sources.
For logging configuration, you may optionally copy logging.conf.example to logging.conf and adjust log levels or handlers.
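If logging.conf is optional, startup code can load it when present and fall back to a basic setup otherwise. This is a sketch of that idea using the standard `logging.config.fileConfig`. `setup_logging` is a hypothetical helper, and the project's own startup code may handle logging differently.

```python
import logging
import logging.config
from pathlib import Path


def setup_logging(path: str = "logging.conf") -> None:
    """Use logging.conf if it exists, otherwise fall back to basic INFO logging."""
    if Path(path).is_file():
        # Keep loggers created before this call (e.g. by imported modules) active.
        logging.config.fileConfig(path, disable_existing_loggers=False)
    else:
        logging.basicConfig(
            level=logging.INFO,
            format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        )
```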
Install the dependencies and start the application:

```bash
pip install -r requirements.txt
python main.py
```

After starting, the web interface is available on the configured port.
A Dockerfile and Docker Compose configuration are provided for container-based deployment.
```bash
docker compose up --build
```

By default, the application is exposed on port 6000 of the host system.
This project is intended for:
- Developers who want to run or extend an ad hoc federated research search system
- Research infrastructure teams integrating multiple scholarly data sources
- Contributors working on adapters, backend logic, or the user interface
It is not intended to replace domain-specific repositories or to act as a long-term archival system.
See the LICENSE file in this repository.

