Replies: 9 comments
-
|
Yes, the EST v2.0 tool uses a default blast-num-matches value of 250. As far as I know, the default value for the equivalent parameter in the v1.0 EST tool (currently hosted on EFI-Web) is 1,000,000. Due to this change in default values, you will see drastically different results between the EFI-Web results and the v2.0 tools being run on the command line for sequence sets that are larger than 250 sequences. Setting the command line tool's value to 1 million should recapture the EFI-Web results. But there are other reasons why your local and EFI-Web results may be different. If results still differ, let us know which version of the EFI database you are using for your local runs. The public EFI-Web tools are currently using version 105. Unfortunately, the change in default values is not well documented. The logic of doing this change is:
I hope this helps. Please let us know if you still cannot reproduce the EFI-Web results. Thanks for testing the code! |
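As a rough illustration of why a per-query hit cap changes the edge count, here is a toy Python sketch. It does not run BLAST or any EFI code. It assumes that `blast-num-matches` limits how many hits are kept per query sequence, and the helper names (`toy_all_by_all`, `count_edges`) and the random scores are hypothetical, made up for this example.

```python
import random


def toy_all_by_all(n, seed=0):
    """Build a fake all-by-all hit table: every sequence hits every other one.

    Scores are random and symmetric; each query's hit list is sorted best-first.
    This stands in for real BLAST output purely for illustration.
    """
    rng = random.Random(seed)
    score = {(i, j): rng.random() for i in range(n) for j in range(i + 1, n)}
    hits = {}
    for i in range(n):
        row = [(j, score[(min(i, j), max(i, j))]) for j in range(n) if j != i]
        row.sort(key=lambda item: item[1], reverse=True)
        hits[i] = row
    return hits


def count_edges(hits_per_query, cap):
    """Count distinct undirected edges when each query keeps only its top `cap` hits."""
    edges = set()
    for query, hits in hits_per_query.items():
        for target, _score in hits[:cap]:
            edges.add(frozenset((query, target)))
    return len(edges)


hits = toy_all_by_all(400)
capped = count_edges(hits, cap=250)          # assumed v2.0 default
uncapped = count_edges(hits, cap=1_000_000)  # assumed EFI-Web (v1.0) default
print(f"edges with cap 250: {capped}, edges with cap 1,000,000: {uncapped}")

# With 250 or fewer other sequences the cap never applies.
small = toy_all_by_all(100)
print(count_edges(small, cap=250) == count_edges(small, cap=1_000_000))
```

In this toy model an edge survives if either endpoint keeps the other in its top hits, so the capped count is lower than the uncapped one only once the set is large enough for the cap to apply, which matches the behavior described above.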
-
|
@ycchenchn Could I ask you to share how you are running this on the command line? The exact command and parameters would be useful. |
-
|
@rbdavid Thank you very much for your detailed explanation regarding the Following your suggestion, I adjusted the Regarding the impact of Thank you again for your assistance and for developing these valuable tools. |
-
|
@nilsoberg Sure, here are the exact command-line commands I used for my runs:
|
-
|
@ycchenchn Thanks for the information. We are interested in real-world benchmarks. Could I ask if you are running the software on a PC or on a cluster? Do you know the approximate number of sequences you are using in the computations? |
-
|
@nilsoberg Thank you for your follow-up. Sorry for not specifying the family ID earlier. The family I am working with is IPR002123. The full size is 219,292 sequences, the UniRef90 size is 115,176, and the UniRef50 size is 26,830. In params.yml:
In conf/est/docker.config (for the blastreduce process):
However, I am not sure how much this configuration exceeds the actual requirements. According to the Nextflow report, the blastreduce step used 14.762 GB of virtual memory (vmem), 11.842 GB of resident memory (rss), with peak values of 15.634 GB (peak_vmem) and 12.582 GB (peak_rss). The total run time is 1h2m35s. I can provide the full Nextflow report if needed. |
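To put the family sizes above in perspective, the following back-of-the-envelope Python sketch compares the number of possible sequence pairs with the largest number of hits a per-query cap could let through. It assumes, as elsewhere in this thread, that `blast-num-matches` caps hits per query; `hit_budget` is a hypothetical helper, and the real number of significant BLAST hits will be far below the all-pairs figure.

```python
def hit_budget(n_sequences, cap):
    """Return (possible undirected pairs, upper bound on directed hits kept under the cap)."""
    all_pairs = n_sequences * (n_sequences - 1) // 2
    # Each query can keep at most `cap` hits, and never more than the other sequences.
    max_reported = n_sequences * min(cap, n_sequences - 1)
    return all_pairs, max_reported


# Family sizes for IPR002123 as quoted in this thread.
sizes = {"full": 219_292, "UniRef90": 115_176, "UniRef50": 26_830}

for name, n in sizes.items():
    all_pairs, capped = hit_budget(n, cap=250)
    print(f"{name}: {all_pairs:,} possible pairs, at most {capped:,} hits kept with cap 250")
```

The gap between the two numbers grows with the square of the family size, which is one way to see why the default cap has such a visible effect on large families.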
-
|
@nilsoberg @rbdavid Hi, I have two follow-up questions regarding the differences between the web-based and code-based SSN pipelines:
Thank you very much for your help! |
-
|
Could you email me at noberg@illinois.edu? I would like to help you out and it might be best to do that over email. |
-
Thank you for your reply! I have sent you an email as requested. Looking forward to your help. |
-
Hi,
I am using both the web-based EFI-EST and the code-based (GitHub) version for SSN analysis in family mode. I noticed that, with the same input family and similar parameters (e.g., family ID, database, e-value, domain, taxonomy filter, etc.), the number of edges in the SSN generated by the code-based version (with the default blast-num-matches=250) is significantly lower than that from the web-based version. When I increase the blast-num-matches parameter in the code-based version, the number of edges increases and gets closer to the web-based result.
Thank you very much for your help!