Hi,
We are currently using the code-based EFI-EST tool for Sequence Similarity Network (SSN) analysis in FASTA mode. We appreciate its capabilities and flexibility.
We understand that the default configuration of the code-based tool uses UniProt Version 2025_02 and InterPro Version 104 as its primary databases for sequence comparisons and annotation retrieval.
Our research requires us to analyze sequences from an in-house database or from public NCBI resources (e.g., GenBank/RefSeq), many of which are not in UniProt. Our goal is to perform SSN analysis on these specific sequence sets.
We've noticed that although the online EFI-EST tool's FASTA mode accepts custom sequence input, the resulting SSN nodes for these sequences often lack taxonomy information. This appears to be because the tool retrieves such metadata (including taxonomy) by looking up identified UniProt/UniRef IDs in its internal databases, which are derived from UniProt/InterPro. If a user-provided sequence does not map to an ID covered by these pre-built databases, the taxonomy fields may be missing.
Therefore, our primary questions regarding the code-based version are:
- Can we substitute the default FASTA sequence database (specified by --fasta-db, e.g., data/efi/blastdb/uniref50.fasta) with our own custom FASTA file for BLAST comparisons when running the SSN analysis? If so, are there specific formatting requirements for the custom FASTA headers, or recommended steps to configure this?
- If we use a custom FASTA database (containing sequences not in UniProt), how can we incorporate associated metadata, such as taxonomy information, for these sequences into the generated SSN nodes? Is there a mechanism or a specific file format (e.g., a tab-separated file) that the code-based tool can accept to parse and integrate custom annotations (such as organism and taxonomy ID) into the XGMML output, similar to how efi_db.sqlite provides annotations for the default database?
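To make the second question concrete, here is a minimal Python sketch of how we currently prepare our sequences. It assumes NCBI-style headers of the form ">ACCESSION description [Organism]", rewrites each header to the bare accession (short, unique, whitespace-free IDs seem the safest choice for BLAST-based tools), and writes the remaining fields to a tab-separated side file. The function names and the TSV columns (id, description, organism) are our own placeholders, not a format EFI-EST is known to accept; we would adapt them to whatever the tool expects.

```python
import csv
import re
from typing import Iterable, Iterator, TextIO, Tuple

# Assumed (hypothetical) header convention, modeled on NCBI protein FASTA:
#   >WP_012345678.1 hypothetical protein [Escherichia coli]
HEADER_RE = re.compile(r"^>(\S+)\s*(.*?)(?:\s*\[([^\]]+)\])?\s*$")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header_line, sequence) pairs from FASTA text."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line.strip():
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def split_custom_fasta(fasta_lines: Iterable[str], fasta_out: TextIO, tsv_out: TextIO) -> None:
    """Write an ID-only FASTA plus a TSV of per-sequence metadata (hypothetical layout)."""
    writer = csv.writer(tsv_out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "description", "organism"])
    for header, seq in read_fasta(fasta_lines):
        m = HEADER_RE.match(header)
        if m is None:
            raise ValueError(f"unparseable header: {header!r}")
        acc, desc, organism = m.group(1), m.group(2), m.group(3) or ""
        fasta_out.write(f">{acc}\n")
        # Wrap sequence lines at 60 residues.
        for i in range(0, len(seq), 60):
            fasta_out.write(seq[i:i + 60] + "\n")
        writer.writerow([acc, desc, organism])
```

For example, ">WP_000001.1 hypothetical protein [Escherichia coli]" becomes ">WP_000001.1" in the FASTA output and a TSV row with "hypothetical protein" and "Escherichia coli". A taxonomy ID column would have to come from a separate lookup, which is not shown here.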
Any guidance, examples, or pointers to relevant documentation for using custom sequence databases and integrating external metadata would be immensely helpful.
Thank you very much for your time and support!