#!/bin/bash
usage(){
echo "
Written by Chloe
Last modified October 12, 2025
Description: Starts a CladeServer for taxonomic classification using QuickClade
architecture. CladeServer is a high-performance HTTP server that loads a
reference clade database once into memory and then handles multiple client
requests efficiently. This server-based approach dramatically reduces memory
requirements for clients and enables high-throughput taxonomic classification
for multiple users or batch processing workflows.
CladeServer receives text-encoded Clade objects (NOT raw FASTA) from SendClade
clients and performs fast k-mer frequency comparisons against the preloaded
reference database. The server architecture separates database loading from
query processing, allowing the expensive initialization to be done once while
serving many classification requests quickly.
Results can be returned in human-readable format or tab-delimited machine format
suitable for downstream analysis pipelines.
Usage Examples:
cladeserver.sh ref=refseqA48_with_ribo.spectra.gz
cladeserver.sh ref=refseqA48_with_ribo.spectra.gz port=3069 killcode=magical_girl_2025
cladeserver.sh ref=refseqA48_with_ribo.spectra.gz verbose=t localhost=f
cladeserver.sh ref=my_custom_db.spectra.gz port=8080 heap=10 verbose2=t
cladeserver.sh ref=bacteria_only.spectra.gz port=3069 prefix=/10.0.0
Server Parameters:
port=3069       Server listening port. Choose an available port for the HTTP
                server. Default is 3069. Clients must specify this port
                when connecting to the server.
killcode=       Security code for remote server shutdown. When specified,
                allows remote shutdown by accessing the /kill/<killcode>
                endpoint. Without a kill code, the server can only be stopped
                locally. Choose a secure, unpredictable password.
localhost=t     Allow connections from localhost (127.0.0.1). Set to false
                to restrict localhost access in security-sensitive environments.
prefix=<string> Required address prefix for client connections. Only clients
                connecting from IP addresses starting with this prefix will
                be allowed. Useful for restricting access to specific subnets
                or IP ranges, e.g., prefix=/10.0.0 or prefix=/192.168.1
remotefileaccess=f
                Allow remote file access through the server. When enabled,
                clients can potentially access files on the server filesystem.
                For security, keep disabled unless specifically needed.
Processing Parameters:
ref=<file>      Reference clade database file (REQUIRED). Should be a .spectra
                file generated by CladeLoader or similar BBTools clade utilities.
                This database is loaded once at server startup and used for all
                subsequent taxonomic classifications. Large databases may require
                several minutes to load and significant memory.
hits=1          Default number of top taxonomic hits to return per query.
                Clients can override this parameter in their requests. More
                hits provide alternative classifications but increase response
                size and processing time.
heap=1          Default number of intermediate comparison results to store
                during processing. Higher values may improve accuracy for
                complex queries but increase memory usage. Clients can
                override this in individual requests.
format=human    Default output format. Options are 'human' for readable
                output with detailed information, or 'oneline'/'machine' for
                tab-delimited format suitable for parsing. Clients can
                specify format preferences in their requests.
banself=f       Default setting for banning self-matches. When true, ignores
                records with the same TaxID as the query, useful for accuracy
                testing. Clients can override this per request.
bandupes=f      Default setting for banning duplicate matches. When true,
                prevents the same reference from appearing multiple times,
                ensuring all hits represent distinct classifications.
printqtid=f     Default setting for printing query TaxIDs when present in
                sequence headers. Useful for benchmarking with labeled data
                containing taxonomic information in headers.
Verbose Parameters:
verbose=f       Enable standard verbose logging. Shows request processing,
                timing information, and basic server statistics. Useful for
                monitoring server activity and performance.
verbose2=f      Enable detailed debug logging. Shows extensive debugging
                information including HTTP headers, request parsing details,
                and step-by-step processing. Generates significant log output;
                use only for debugging specific issues.
Server Architecture:
CladeServer uses Java HTTP server infrastructure to handle concurrent requests
efficiently. The server creates separate handlers for different endpoints:
- /clade: Main classification endpoint for processing taxonomic queries
- /kill: Secure shutdown endpoint (requires kill code)
- /stats: Server statistics including uptime and query counts
- /: Help information and usage guidance
Memory Requirements:
Server memory usage depends primarily on reference database size. Typical
requirements range from 4-16GB for standard databases. The default memory
allocation is 8GB (-Xmx8g -Xms8g). Large custom databases may require
additional memory. Memory is allocated once at startup and reused for all
subsequent requests.
Security Considerations:
- Use killcode parameter for secure remote shutdown capability
- Configure localhost and prefix parameters to restrict access appropriately
- Keep remotefileaccess=false unless specifically required
- Monitor logs for unauthorized access attempts
- Choose non-standard ports for production deployments
Performance Notes:
Database loading occurs once at startup and may take several minutes for large
references. Once loaded, individual queries are processed quickly. The server
is designed for high-throughput scenarios where many classification requests
need to be processed efficiently. Concurrent requests are handled safely with
thread-safe data structures.
Server Endpoints:
POST /clade       - Main classification endpoint
GET  /kill/<code> - Shut down server (requires kill code)
GET  /stats       - Server statistics and uptime
GET  /            - Usage help and server information

To shut down remotely:
1. Start server with killcode: cladeserver.sh ref=db.spectra killcode=secret123
2. Shut down via HTTP: curl http://server:port/kill/secret123
Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.
For documentation and the latest version, visit: https://bbmap.org
"
}
# Print the help text when run without arguments or with -h/--help.
if [ -z "$1" ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
    usage
    exit
fi
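# Resolve the real location of this script (following symlinks), then set
# DIR to its directory and CP to the Java classpath.
resolveSymlinks(){
    SCRIPT="$(cd "$(dirname "$0")" && pwd)/$(basename "$0")"
    while [ -h "$SCRIPT" ]; do
        DIR="$(dirname "$SCRIPT")"
        SCRIPT="$(readlink "$SCRIPT")"
        # A relative link target is resolved against the link's directory.
        [ "${SCRIPT#/}" = "$SCRIPT" ] && SCRIPT="$DIR/$SCRIPT"
    done
    DIR="$(cd "$(dirname "$SCRIPT")" && pwd)"
    # Prefer a packaged jar; otherwise fall back to the class directory.
    if [ -f "$DIR/bbtools.jar" ]; then
        CP="$DIR/bbtools.jar"
    else
        CP="$DIR/current/"
    fi
}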
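# Source the helper scripts that sit next to this one, then pass the default
# heap settings (8 GB, fixed mode) followed by the user's arguments to
# parseJavaArgs. parseJavaArgs and setEnvironment are not defined in this
# file; they are expected to come from the sourced helpers.
setEnv(){
    . "$DIR/javasetup.sh"
    . "$DIR/memdetect.sh"
    parseJavaArgs "--xmx=8g" "--xms=8g" "--mode=fixed" "$@"
    setEnvironment
}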
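# Start the server. The command is built as an array rather than a string
# passed to eval, so arguments containing spaces or shell metacharacters
# reach Java unchanged.
launch() {
    # $EA, $EOOM, $SIMD, $XMX and $XMS are deliberately unquoted: each may
    # be empty or hold several space-separated JVM flags.
    local -a CMD=(java $EA $EOOM $SIMD $XMX $XMS -cp "$CP" clade.CladeServer "$@")
    echo "${CMD[*]}" >&2
    "${CMD[@]}"
}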
resolveSymlinks
setEnv "$@"
launch "$@"
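
# The help text above lists the /stats and /kill/<code> endpoints and notes
# that loading a large reference can take several minutes. The following is a
# minimal client-side sketch of how those endpoints might be used from another
# shell. It is not part of BBTools: clade_url, clade_stats, clade_kill and
# clade_wait are hypothetical helper names, curl is assumed to be installed,
# and clade_wait assumes that a successful GET /stats means the server is
# ready to accept queries, which the source does not confirm.

```shell
# Hypothetical client-side helpers; none of these names exist in BBTools.

# Build a URL for one of the endpoints listed in the help text.
# Arguments: host, port, path (for example "stats" or "kill/<code>").
clade_url(){
    local host="$1" port="$2" path="${3#/}"
    printf 'http://%s:%s/%s\n' "$host" "$port" "$path"
}

# Fetch server statistics (GET /stats). Requires curl.
clade_stats(){
    curl --silent --show-error --fail "$(clade_url "$1" "$2" stats)"
}

# Request a remote shutdown (GET /kill/<code>). This only works if the
# server was started with a matching killcode= value.
# Arguments: host, port, kill code.
clade_kill(){
    curl --silent --show-error --fail "$(clade_url "$1" "$2" "kill/$3")"
}

# Poll /stats until the server answers or the attempts run out.
# Arguments: host, port, max attempts (default 60), delay in seconds (default 5).
# Returns 0 once /stats responds, 1 if it never does.
clade_wait(){
    local host="$1" port="$2" tries="${3:-60}" delay="${4:-5}" i
    for ((i = 0; i < tries; i++)); do
        clade_stats "$host" "$port" >/dev/null 2>&1 && return 0
        sleep "$delay"
    done
    return 1
}
```

# Example use from a separate terminal, with placeholder host, port and code:
#   clade_wait localhost 3069 && clade_stats localhost 3069
#   clade_kill localhost 3069 secret123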