**Date:** 2026-02-03
**Cluster:** ks-gcp-test (GCP)
**Galaxy image:** ksuderman/galaxy-guerler:26.0.dev0
**Tools:** ~2,776 total (~2,673 from CVMFS, 103 local)
## Test Configurations

| Test | Workers | Configuration |
| --- | --- | --- |
| 1 | 4 | `parallel_tool_loading_workers: 4` |
| 2 | 8 | `parallel_tool_loading_workers: 8` |
| 3 | 16 | `parallel_tool_loading_workers: 16` |
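For reference, the tested setting can be sketched as a `galaxy.yml` fragment. The option name comes from this test; its placement under the standard `galaxy:` section is an assumption, since the option ships with the development image under test rather than a released Galaxy config schema:

```yaml
# Sketch only: placement of this option under the `galaxy:` section
# of galaxy.yml is assumed, not confirmed against a released schema.
galaxy:
  parallel_tool_loading_workers: 8   # tested values: 4, 8, 16
```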
## Results: 4 Workers

### Job Handler (4 workers)

| Phase | Time | Notes |
| --- | --- | --- |
| Parallel pre-parse | 23.2 sec | 3.99x speedup |
| Serial tool creation | 25.3 sec | 100% cache hit rate |
| **TOTAL** | 48.5 sec | |
| App startup | 57.7 sec | |

### Web Pod Initial Startup (4 workers)

| Phase | Time | Notes |
| --- | --- | --- |
| Parallel pre-parse | 23.2 sec | 3.99x speedup |
| Serial tool creation | 1,421.8 sec (~23.7 min) | 100% cache hit rate |
| **TOTAL** | 1,445.0 sec (~24 min) | |
| App startup | 1,465.3 sec (~24.4 min) | |

### Web Pod Worker Reload (4 workers)

| Phase | Time | Notes |
| --- | --- | --- |
| Parallel pre-parse | 20.0 sec | 3.99x speedup |
| Serial tool creation | 1.1 sec | |
| **TOTAL** | 21.1 sec | |
## Results: 8 Workers

### Job Handler (8 workers)

| Phase | Time | Notes |
| --- | --- | --- |
| Parallel pre-parse | 22.7 sec | 7.87x speedup |
| Serial tool creation | 25.2 sec | 100% cache hit rate |
| **TOTAL** | 47.9 sec | |
| App startup | 55.1 sec | |

### Web Pod Initial Startup (8 workers)

| Phase | Time | Notes |
| --- | --- | --- |
| Parallel pre-parse | 22.6 sec | 7.87x speedup |
| Serial tool creation | 1,283.9 sec (~21 min) | 100% cache hit rate |
| **TOTAL** | 1,306.5 sec (~22 min) | |
| App startup | 1,323 sec (~22 min) | |

### Web Pod Worker Reload (8 workers)

| Phase | Time | Notes |
| --- | --- | --- |
| Parallel pre-parse | 16.4 sec | Faster due to OS caching |
| Serial tool creation | 0.75 sec | |
| **TOTAL** | 17.1 sec | |
## Results: 16 Workers

### Job Handler (16 workers)

| Phase | Time | Notes |
| --- | --- | --- |
| Parallel pre-parse | 26.0 sec | 14.61x speedup |
| Serial tool creation | 27.1 sec | 100% cache hit rate |
| **TOTAL** | 53.1 sec | |
| App startup | 60.5 sec | |

### Web Pod Initial Startup (16 workers)

| Phase | Time | Notes |
| --- | --- | --- |
| Parallel pre-parse | 26.3 sec | 14.60x speedup |
| Serial tool creation | 1,597.3 sec (~26.6 min) | 100% cache hit rate |
| **TOTAL** | 1,623.6 sec (~27 min) | |
| App startup | 1,654.5 sec (~27.5 min) | |

### Web Pod Worker Reload (16 workers)

| Phase | Time | Notes |
| --- | --- | --- |
| Parallel pre-parse | 20.6 sec | 14.11x speedup |
| Serial tool creation | 1.0 sec | |
| **TOTAL** | 21.7 sec | |
## Comparison: Worker Count Scaling

### Pre-parsing Phase

| Metric | 4 Workers | 8 Workers | 16 Workers |
| --- | --- | --- | --- |
| Wall clock time | ~23 sec | ~22 sec | ~26 sec |
| Speedup factor | 3.99x | 7.87x | 14.6x |
| Avg time per tool | 26.5 ms | 50.8 ms | 114.4 ms |
| Sequential time estimate | 74 sec | 141 sec | 317 sec |
**Analysis:**

- The speedup factor scales nearly linearly with worker count (4x, 8x, 15x).
- Wall clock time stays at ~22-26 seconds regardless of worker count.
- Per-tool time increases with more workers due to I/O contention on CVMFS.
- 4 workers has the lowest per-tool latency (26.5 ms), indicating the least contention.
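The "sequential time estimate" row follows directly from the tool count and the measured per-tool average. A quick check, assuming the full ~2,776-tool count applies at every worker setting:

```python
# Reproduces the "sequential time estimate" row: N tools at the measured
# average per-tool parse time, if parsed one at a time.
N_TOOLS = 2776  # total tools in this deployment

avg_ms_per_tool = {4: 26.5, 8: 50.8, 16: 114.4}  # measured ms/tool

for workers, ms in avg_ms_per_tool.items():
    sequential_s = N_TOOLS * ms / 1000
    print(f"{workers:>2} workers: sequential estimate ~{sequential_s:.0f} sec")
```

The estimates land within a second of the 74 / 141 / 317 figures in the table; the growth in "sequential time" reflects contention-inflated per-tool latency, not extra work.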
### Serial Tool Creation Phase

| Pod | 4 Workers | 8 Workers | 16 Workers |
| --- | --- | --- | --- |
| Job Handler | 25.3 sec | 25.2 sec | 27.1 sec |
| Web Pod | 1,422 sec (~24 min) | 1,284 sec (~21 min) | 1,597 sec (~27 min) |
**Analysis:**

- The job handler's serial phase is consistent (~25-27 sec) across all worker counts.
- The web pod's serial phase varies significantly (21-27 min), with 8 workers the fastest.
- The serial phase is not parallelized, so the variation is likely due to I/O caching effects.
### Total Startup Time

| Pod | 4 Workers | 8 Workers | 16 Workers |
| --- | --- | --- | --- |
| Job Handler | 57.7 sec | 55.1 sec | 60.5 sec |
| Web Pod | 1,465 sec (~24.4 min) | 1,323 sec (~22 min) | 1,654 sec (~27.5 min) |

**Winner:** 8 workers, which provides the best balance for web pod startup time.
## Comparison: Parallel vs Serial Loading

| Configuration | Job Handler | Web Pod |
| --- | --- | --- |
| Serial loading (`parallel=false`) | 45.8 sec | 1,079 sec (~18 min) |
| Parallel loading (4 workers) | 48.5 sec | 1,445 sec (~24 min) |
| Parallel loading (8 workers) | 47.9 sec | 1,306 sec (~22 min) |
| Parallel loading (16 workers) | 53.1 sec | 1,624 sec (~27 min) |
**Conclusion:** Parallel loading actually makes startup slower for the web pod because:

- The bottleneck is the serial tool creation phase, not pre-parsing.
- Parallel pre-parsing adds ~23 seconds of overhead with no net benefit.
- The web pod's serial phase is ~50-60x slower than the job handler's.
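This conclusion is Amdahl's law in miniature. A minimal sketch using the measured web pod numbers, taking ~1,080 s of serial tool creation and treating only the ~74 s sequential pre-parse estimate as parallelizable:

```python
def max_total_speedup(serial_s: float, parallelizable_s: float, workers: int) -> float:
    """Amdahl's law: best-case total speedup when only part of the work scales."""
    before = serial_s + parallelizable_s
    after = serial_s + parallelizable_s / workers
    return before / after

# Web pod: ~1,080 s serial creation, ~74 s of parallelizable parsing, 4 workers.
print(round(max_total_speedup(1080, 74, 4), 2))  # ~1.05: almost no headroom
```

Even with perfect scaling and zero coordination cost, parallelizing the parse phase can shave at most ~5% off web pod startup; in practice the extra pre-parse pass makes it a net loss.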
## Slowest Tools to Parse

### With 4 Workers (lowest contention)

| Tool | Time |
| --- | --- |
| picard | 5,841 ms |
| text_processing | 5,176 ms |
| scanpy_plot | 2,224 ms |
| scanpy_plot | 2,181 ms |
| scanpy_plot | 2,127 ms |
| scanpy_plot | 1,305 ms |
| scanpy_plot | 1,241 ms |
| scanpy_plot | 1,191 ms |
| maxquant | 942 ms |
| scanpy_plot | 864 ms |
### With 16 Workers (high contention)

| Tool | Time |
| --- | --- |
| scanpy_plot | 7,042 ms |
| scanpy_plot | 6,386 ms |
| scanpy_plot | 6,311 ms |
| scanpy_plot | 6,012 ms |
| scanpy_plot | 6,007 ms |
| scanpy_plot | 5,904 ms |
| scanpy_plot | 5,852 ms |
| maxquant | 2,634 ms |
| pygenometracks | 2,528 ms |
| multiqc | 2,472 ms |

**Note:** Per-tool times are 2-3x higher with 16 workers than with 4 due to I/O contention.
## Key Observations

1. **Parallel pre-parsing provides no net benefit.** Wall clock time for pre-parsing is ~22-26 seconds regardless of worker count; the speedup factor rises with workers, but per-tool latency rises almost proportionally.
2. **Serial creation is the real bottleneck.** Despite a 100% cache hit rate, the web pod's serial creation phase takes ~21-27 minutes vs ~25-27 seconds for the job handler (~50-60x slower).
3. **8 workers is the sweet spot.** If parallel loading is used, 8 workers gives the best web pod startup time (22 min vs 24-27 min for the other configurations).
4. **Serial loading is fastest for the web pod.** Without parallel loading, the web pod starts in ~18 minutes; parallel loading adds overhead without benefit.
5. **CVMFS I/O contention is significant.** More workers means more contention and higher per-tool latency:
   - 4 workers: 26.5 ms/tool
   - 8 workers: 50.8 ms/tool (+92%)
   - 16 workers: 114.4 ms/tool (+332%)
6. **Worker reload is always fast.** Subsequent worker reloads (post-fork) complete in ~17-22 seconds regardless of worker count.
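The flat wall-clock time and the latency inflation are two views of the same effect: per-tool latency grows almost as fast as the worker count, so the ideal wall time `N * latency / workers` barely moves. A sketch with the measured figures (ignoring scheduling overhead, which accounts for the remaining few seconds of observed wall time):

```python
# Ideal pre-parse wall time if N tools are split evenly across W workers,
# using the contention-inflated per-tool latency measured at each W.
N_TOOLS = 2776
latency_ms = {4: 26.5, 8: 50.8, 16: 114.4}  # measured ms/tool

for workers, ms in latency_ms.items():
    ideal_wall_s = N_TOOLS * ms / 1000 / workers
    print(f"{workers:>2} workers: ideal wall ~{ideal_wall_s:.1f} s")
```

The model gives roughly 18 / 18 / 20 seconds, with 8 workers the lowest and 16 the highest, matching the ordering of the observed ~23 / ~22 / ~26 second wall times.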
## Recommendations

1. **Disable parallel loading for the web pod.** Serial loading (18 min) is faster than any parallel configuration (22-27 min).
2. **If parallel loading is needed, use 8 workers.** This gives the best balance of speedup and contention.
3. **Investigate the web pod's serial phase.** The ~50-60x slowdown in serial tool creation on the web pod is the primary issue; profile this phase to identify what additional work the web pod performs.
4. **Optimize the slowest tools.** scanpy_plot (7 versions), picard, and text_processing account for a significant share of parsing time.
## Summary Table

| Configuration | Job Handler Startup | Web Pod Startup | Web Pod Overhead vs Serial |
| --- | --- | --- | --- |
| Serial (no parallel) | 45.8 sec | 18.0 min | baseline |
| 4 workers | 57.7 sec | 24.4 min | +36% |
| 8 workers | 55.1 sec | 22.0 min | +22% |
| 16 workers | 60.5 sec | 27.5 min | +53% |
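The overhead column is each web pod startup time relative to the serial baseline; a quick check of the percentages:

```python
# Verifies the "overhead vs serial" column: (parallel - baseline) / baseline.
BASELINE_MIN = 18.0  # web pod startup with serial loading

web_pod_startup_min = {"4 workers": 24.4, "8 workers": 22.0, "16 workers": 27.5}

for config, minutes in web_pod_startup_min.items():
    overhead_pct = (minutes - BASELINE_MIN) / BASELINE_MIN * 100
    print(f"{config}: +{overhead_pct:.0f}%")  # +36%, +22%, +53%
```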
## Next Steps

1. Profile the web pod's serial tool creation phase to identify the specific bottleneck.
2. Investigate whether validation can be deferred or disabled during initial startup.
3. Consider lazy-loading tools that are not immediately needed.
4. Test with local tool storage instead of CVMFS to isolate the I/O impact.
5. Consider disabling parallel tool loading entirely for production deployments.