NFDI Search Engine

The NFDI Search Engine is an ad hoc research gateway developed within the NFDI4DataScience context. It aggregates and searches scientific resources from multiple external data sources through a single web interface.

It provides a unified entry point to discover publications, researchers, datasets, projects, organizations, and related research resources across a broad set of integrated sources.

Overview

The system follows an ad hoc federated search approach. For each user query, multiple external data sources are queried in parallel. The responses are mapped to a common schema, aggregated, and presented in a unified interface.

There is no central index of external data. All results are retrieved on demand from the configured sources.
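The on-demand fan-out described above can be sketched with a thread pool. This is a minimal illustration, not the project's actual implementation: the function names, the stub sources, and the result dictionaries are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


# Hypothetical stand-ins for real source adapters; each takes a query
# string and returns a list of result dictionaries.
def search_source_a(query):
    return [{"source": "a", "title": f"{query} paper"}]


def search_source_b(query):
    return [{"source": "b", "title": f"{query} dataset"}]


def failing_source(query):
    raise TimeoutError("upstream did not respond")


def federated_search(query, sources):
    """Query every source in parallel and collect results and errors."""
    results, errors = [], {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {pool.submit(fn, query): name for name, fn in sources.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results.extend(future.result())
            except Exception as exc:
                # One failing source must not break the whole search.
                errors[name] = str(exc)
    return results, errors
```

Because nothing is indexed centrally, a slow or failing provider only affects its own share of the results. Results arrive in completion order, so any ordering shown to the user has to be applied afterwards.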

Main capabilities

Ad hoc federated search across multiple external research data providers
Adapter-based source integration with a shared internal data model
Mapping of heterogeneous source responses to a common schema
Aggregation of results into common categories such as publications and researchers
Session-based result handling for pagination and incremental loading
User accounts with configurable source preferences
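To illustrate the schema-mapping capability listed above, the following sketch maps two differently shaped payloads onto one record type. The `Publication` class, the two provider formats, and all field names are hypothetical and chosen only for the example; the real internal data model may look different.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    """Hypothetical common schema for one result category."""
    title: str
    identifier: str = ""
    authors: list[str] = field(default_factory=list)
    source: str = ""


def from_provider_a(raw: dict) -> Publication:
    # Assumed format: flat keys, authors as a ";"-separated string.
    creators = raw.get("creators", "")
    return Publication(
        title=raw["title"].strip(),
        identifier=raw.get("doi", ""),
        authors=[a.strip() for a in creators.split(";") if a.strip()],
        source="provider_a",
    )


def from_provider_b(raw: dict) -> Publication:
    # Assumed format: nested metadata, authors as a list of objects.
    meta = raw["metadata"]
    return Publication(
        title=meta["name"].strip(),
        identifier=meta.get("id", ""),
        authors=[p["fullName"] for p in meta.get("contributors", [])],
        source="provider_b",
    )
```

Once every source produces the same record type, aggregation and rendering no longer need to know which provider a result came from.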

Architecture in brief

1. A user submits a query through the web interface
2. The backend executes the query against external repositories via their APIs
3. Each adapter maps its response to the shared internal schema
4. A controller deduplicates and aggregates results across sources
5. Results are grouped and stored in the user session
6. The interface renders the results and supports loading more items per category
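The deduplication, grouping, and load-more steps could look roughly like the sketch below. It is an assumption-laden illustration: the item fields (`category`, `identifier`, `title`, `sources`), the matching rule, and the plain dictionary standing in for the user session are all hypothetical.

```python
from collections import defaultdict


def dedupe_and_group(items):
    """Merge items that share a key, then group them by category."""
    merged = {}
    for item in items:
        # Assumed rule: match on identifier, else on title, case-insensitively.
        ident = (item.get("identifier") or item["title"]).strip().lower()
        key = (item["category"], ident)
        if key in merged:
            for src in item["sources"]:
                if src not in merged[key]["sources"]:
                    merged[key]["sources"].append(src)
        else:
            merged[key] = {**item, "sources": list(item["sources"])}
    grouped = defaultdict(list)
    for item in merged.values():
        grouped[item["category"]].append(item)
    return dict(grouped)


def load_more(session, category, page_size=2):
    """Return the next batch for a category and advance its offset."""
    offset = session["offsets"].get(category, 0)
    batch = session["results"].get(category, [])[offset:offset + page_size]
    session["offsets"][category] = offset + len(batch)
    return batch
```

Keeping a per-category offset in the session lets each result group be extended independently, which matches the "load more per category" behaviour described in the last step.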

The application is implemented as a Flask-based web service.

Source adapters

Source integrations are implemented as adapters. Each adapter is a dedicated integration module that queries a specific external provider API and maps the response into the shared internal data model.

Each adapter is responsible for:

Querying its upstream API or endpoint
Handling authentication when required
Mapping responses into the internal representation

The active sources are configured in config.py. Adding a new source typically means implementing a new adapter and registering it in the configuration.
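As a rough sketch of what an adapter and its registration might look like, consider the following. The class names, method names, and the `ENABLED_SOURCES` mapping are hypothetical; the actual base class and the structure of config.py in the repository may differ.

```python
from abc import ABC, abstractmethod


class SourceAdapter(ABC):
    """Hypothetical base class covering the three adapter responsibilities."""
    name = ""

    def __init__(self, api_key=None):
        self.api_key = api_key  # only needed by sources that require auth

    @abstractmethod
    def fetch(self, query):
        """Query the upstream API and return raw records."""

    @abstractmethod
    def to_internal(self, raw):
        """Map one raw record into the internal representation."""

    def search(self, query):
        return [self.to_internal(raw) for raw in self.fetch(query)]


class ExampleAdapter(SourceAdapter):
    name = "example"

    def fetch(self, query):
        # Stand-in for an HTTP request; a real adapter would call the
        # provider API here and attach credentials when an API key is set.
        return [{"heading": f"Result for {query}"}]

    def to_internal(self, raw):
        return {"title": raw["heading"], "source": self.name}


# Assumed shape of the registration: source name mapped to adapter class.
ENABLED_SOURCES = {"example": ExampleAdapter}


def build_adapters(enabled, api_keys):
    return [cls(api_key=api_keys.get(name)) for name, cls in enabled.items()]
```

With this layout, adding a source means writing one subclass and adding one entry to the mapping; the rest of the pipeline only calls `search`.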

Examples of integrated source types include scholarly repositories, research knowledge graphs, researcher identifier services, dataset repositories, and project or funding databases. The exact set of enabled sources depends on configuration and available credentials.

Configuration

Configuration is managed through environment variables and config.py.

Typical configuration includes:

A secret key for session handling
Optional API keys for selected external providers
Feature flags for optional components such as the chatbot

Sensitive values such as API keys should be provided through environment variables and not committed to the repository.
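A minimal sketch of reading such settings from the environment is shown below. Only `SECRET_KEY` is named in this README; `EXAMPLE_API_KEY` and `CHATBOT_ENABLED` are hypothetical variable names used for illustration, and the real config.py may be organised differently.

```python
import os


class Config:
    """Illustrative settings object populated from environment variables."""

    def __init__(self, env=None):
        env = os.environ if env is None else env

        # Required: used for session handling.
        self.SECRET_KEY = env.get("SECRET_KEY")
        if not self.SECRET_KEY:
            raise RuntimeError("SECRET_KEY must be set")

        # Optional: hypothetical API key for one external provider.
        self.EXAMPLE_API_KEY = env.get("EXAMPLE_API_KEY")

        # Optional: hypothetical feature flag, off unless explicitly enabled.
        flag = env.get("CHATBOT_ENABLED", "false")
        self.CHATBOT_ENABLED = flag.strip().lower() in {"1", "true", "yes"}
```

Failing fast on a missing secret key avoids starting the service with insecure sessions, while optional keys simply leave the corresponding source or feature disabled.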

Local development

Clone the repository

Clone the repository and move into the project directory.

git clone https://github.com/semantic-systems/nfdi-search-engine.git
cd nfdi-search-engine

Requirements

Python 3.11 or a compatible Python 3 version
A local virtual environment is recommended

Environment setup

Before running the application, copy the example configuration files and adjust them as needed.

Copy .env.example to .env
Set the following variables. Only SECRET_KEY is required; the others depend on the sources and features you enable:
1. SECRET_KEY for session handling
2. Optional API keys for external sources you enable
3. Optional chatbot or analytics related settings

The file config.py defines the available configuration options and enabled sources.

For logging configuration, you may optionally copy logging.conf.example to logging.conf and adjust log levels or handlers.

Install dependencies

pip install -r requirements.txt

Run the application locally

python main.py

After starting, the web interface will be available on the configured port.

Docker usage

A Dockerfile and Docker Compose configuration are provided for container-based deployment.

Build and run using Docker Compose

docker compose up --build

By default, the application is exposed on port 6000 on the host system.

Intended audience

This project is intended for:

Developers who want to run or extend an ad hoc federated research search system
Research infrastructure teams integrating multiple scholarly data sources
Contributors working on adapters, backend logic, or the user interface

It is not intended to replace domain-specific repositories or act as a long-term archival system.

License

See the LICENSE file in this repository.
