#!/bin/bash

usage(){
echo "
Written by Chloe
Last modified October 12, 2025

Description:  Starts a CladeServer for taxonomic classification using the
QuickClade architecture.  CladeServer is a high-performance HTTP server that loads a
reference clade database once into memory and then handles multiple client
requests efficiently.  This server-based approach dramatically reduces memory
requirements for clients and enables high-throughput taxonomic classification
for multiple users or batch processing workflows.

CladeServer receives text-encoded Clade objects (NOT raw FASTA) from SendClade
clients and performs fast k-mer frequency comparisons against the preloaded
reference database.  The server architecture separates database loading from
query processing, allowing the expensive initialization to be done once while
serving many classification requests quickly.

Results can be returned in human-readable format or tab-delimited machine format
suitable for downstream analysis pipelines.

Usage Examples:
cladeserver.sh ref=refseqA48_with_ribo.spectra.gz
cladeserver.sh ref=refseqA48_with_ribo.spectra.gz port=3069 killcode=magical_girl_2025
cladeserver.sh ref=refseqA48_with_ribo.spectra.gz verbose=t localhost=f
cladeserver.sh ref=my_custom_db.spectra.gz port=8080 heap=10 verbose2=t
cladeserver.sh ref=bacteria_only.spectra.gz port=3069 prefix=/10.0.0

Server Parameters:
port=3069       Server listening port.  Choose an available port for the HTTP
                server.  Default is 3069.  Clients must specify this port
                when connecting to the server.
killcode=       Security code for remote server shutdown.  When specified,
                allows remote shutdown via the /kill/<killcode> endpoint.
                Without a kill code, the server can only be stopped locally.
                Choose a secure, unpredictable password.
localhost=t     Allow connections from localhost (127.0.0.1).  Set to false
                to restrict localhost access in security-sensitive environments.
prefix=<string> Required address prefix for client connections.  Only clients
                connecting from IP addresses starting with this prefix will
                be allowed.  Useful for restricting access to specific subnets
                or IP ranges, e.g., prefix=/10.0.0 or prefix=/192.168.1.
remotefileaccess=f
                Allow remote file access through the server.  When enabled,
                clients can potentially access files on the server filesystem.
                For security, keep this disabled unless specifically needed.

Processing Parameters:
ref=<file>      Reference clade database file (REQUIRED).  Should be a .spectra
                file generated by CladeLoader or similar BBTools clade utilities.
                This database is loaded once at server startup and used for all
                subsequent taxonomic classifications.  Large databases may require
                several minutes to load and significant memory.
hits=1          Default number of top taxonomic hits to return per query.
                Clients can override this parameter in their requests.  More
                hits provide alternative classifications but increase response
                size and processing time.
heap=1          Default number of intermediate comparison results to store
                during processing.  Higher values may improve accuracy for
                complex queries but increase memory usage.  Clients can
                override this in individual requests.
format=human    Default output format.  Options are 'human' for readable
                output with detailed information, or 'oneline'/'machine' for
                tab-delimited format suitable for parsing.  Clients can
                specify format preferences in their requests.
banself=f       Default setting for banning self-matches.  When true, ignores
                records with the same TaxID as the query, useful for accuracy
                testing.  Clients can override this per request.
bandupes=f      Default setting for banning duplicate matches.  When true,
                prevents the same reference from appearing multiple times,
                ensuring all hits represent distinct classifications.
printqtid=f     Default setting for printing query TaxIDs when present in
                sequence headers.  Useful for benchmarking with labeled data
                containing taxonomic information in headers.
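
Example of changing these defaults at startup (an illustrative sketch; the
values shown are arbitrary, and the database name is reused from the usage
examples above):
cladeserver.sh ref=refseqA48_with_ribo.spectra.gz hits=5 format=oneline banself=t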

Verbose Parameters:
verbose=f       Enable standard verbose logging.  Shows request processing,
                timing information, and basic server statistics.  Useful for
                monitoring server activity and performance.
verbose2=f      Enable detailed debug logging.  Shows extensive debugging
                information including HTTP headers, request parsing details,
                and step-by-step processing.  Generates significant log output;
                use only for debugging specific issues.

Server Architecture:
CladeServer uses Java HTTP server infrastructure to handle concurrent requests
efficiently.  The server creates separate handlers for different endpoints:
- /clade: Main classification endpoint for processing taxonomic queries
- /kill: Secure shutdown endpoint (requires kill code)
- /stats: Server statistics including uptime and query counts
- /: Help information and usage guidance

Memory Requirements:
Server memory usage depends primarily on reference database size.  Typical
requirements range from 4 to 16 GB for standard databases.  The default memory
allocation is 8 GB (-Xmx8g -Xms8g).  Large custom databases may require
additional memory.  Memory is allocated once at startup and reused for all
subsequent requests.
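
Example of requesting more memory (a sketch that assumes this script honors
the -Xmx flag accepted by other BBTools shell scripts; 16g is an arbitrary
value).  Note that the heap= parameter above sets the number of stored
comparison results, not the JVM heap size:
cladeserver.sh ref=my_custom_db.spectra.gz -Xmx16g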

Security Considerations:
- Use the killcode parameter for secure remote shutdown capability
- Configure localhost and prefix parameters to restrict access appropriately
- Keep remotefileaccess=false unless specifically required
- Monitor logs for unauthorized access attempts
- Choose non-standard ports for production deployments

Performance Notes:
Database loading occurs once at startup and may take several minutes for large
references.  Once loaded, individual queries are processed quickly.  The server
is designed for high-throughput scenarios where many classification requests
need to be processed efficiently.  Concurrent requests are handled safely with
thread-safe data structures.

Server Endpoints:
POST /clade - Main classification endpoint
GET /kill/<killcode> - Shut down server (requires kill code)
GET /stats - Server statistics and uptime
GET / - Usage help and server information
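
Example of checking a running server from the command line (a sketch; server
and port are placeholders for the actual host and port, and the response
format is not described here):
curl http://server:port/stats
curl http://server:port/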

To shut down remotely:
1. Start server with killcode: cladeserver.sh ref=db.spectra killcode=secret123
2. Shutdown via HTTP: curl http://server:port/kill/secret123

Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.
For documentation and the latest version, visit: https://bbmap.org
"
}

if [ -z "$1" ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
	usage
	exit
fi

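# Find the real location of this script, following any symlinks, and set the
# Java classpath: bbtools.jar if it sits next to the script, else current/.
resolveSymlinks(){
	SCRIPT="$(cd "$(dirname "$0")" && pwd)/$(basename "$0")"
	while [ -h "$SCRIPT" ]; do
		DIR="$(dirname "$SCRIPT")"
		SCRIPT="$(readlink "$SCRIPT")"
		# A relative link target is resolved against the link's own directory.
		[ "${SCRIPT#/}" = "$SCRIPT" ] && SCRIPT="$DIR/$SCRIPT"
	done
	DIR="$(cd "$(dirname "$SCRIPT")" && pwd)"
	if [ -f "$DIR/bbtools.jar" ]; then
		CP="$DIR/bbtools.jar"
	else
		CP="$DIR/current/"
	fi
}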

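# Source the shared helper scripts and configure the JVM with fixed 8 GB
# defaults (-Xmx8g -Xms8g), passing along any command-line arguments.
setEnv(){
	. "$DIR/javasetup.sh"
	. "$DIR/memdetect.sh"

	parseJavaArgs "--xmx=8g" "--xms=8g" "--mode=fixed" "$@"
	setEnvironment
}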

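# Build the Java command line, echo it to stderr, and run it.
launch() {
	# $* joins the arguments into one string; $@ is meant for separate words.
	CMD="java $EA $EOOM $SIMD $XMX $XMS -cp $CP clade.CladeServer $*"
	echo "$CMD" >&2
	eval "$CMD"
}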

resolveSymlinks
setEnv "$@"
launch "$@"