Summary Command
The summary command generates a consolidated quality control report from multiple samples analyzed with the collect command. qc does this step for you. If you want to run bactscout at scale on HPC or cloud, consider using the Scaling up Guide
Overview
After processing multiple samples individually or in batches, the summary command:
- Reads individual sample result CSVs
- Merges them into a single
final_summary.csv - Provides a unified view of all samples
- Enables easy comparison and filtering
Basic Usage
# Generate summary from QC output directory
pixi run bactscout summary bactscout_output/
# Specify custom output location
pixi run bactscout summary bactscout_output/ -o final_results/
# Use different configuration
pixi run bactscout summary bactscout_output/ -c config.yml
Arguments
| Argument | Required | Description |
|---|---|---|
input_dir |
✅ Yes | Directory containing sample subdirectories (output from collect) |
Options
| Option | Short | Default | Description |
|---|---|---|---|
--output |
-o |
bactscout_output |
Location to write final_summary.csv |
--config |
-c |
bactscout_config.yml |
Configuration file path |
Input Format
The summary command expects the standard output directory structure from collect:
bactscout_output/
├── Sample_001/
│ ├── sample_results.csv # Per-sample results
│ ├── fastp_report.json
│ ├── sylph_matches.csv
│ └── [other files]
├── Sample_002/
│ ├── sample_results.csv
│ └── ...
├── Sample_003/
│ ├── sample_results.csv
│ └── ...
└── final_summary.csv # Output file (created by summary command)
Output
final_summary.csv
A consolidated CSV file with one row per sample, containing:
sample_id— Sample identifiera_final_status— Overall QC status (PASSED / WARNING / FAILED)species— Detected species name(s) (top species or semicolon-separated list)species_coverage— Sylph-derived coverage values for reported speciescontamination_status— Contamination QC status (PASSED/WARNING/FAILED)coverage_status— Coverage QC status (PASSED/WARNING/FAILED)mlst_status— MLST QC status (PASSED/WARNING)read_q30_rate— Fraction (0-1) of bases with Q≥30duplication_rate— Read duplication rate (0-1)gc_content— GC content percentage
See Output Format for detailed column descriptions.
Usage Scenarios
Workflow: Individual Processing + Summary
Process all samples, then generate report:
# 1. Run QC on all samples
pixi run bactscout collect samples/ -o batch_results/ -t 4
# 2. Generate summary
pixi run bactscout summary batch_results/ -o batch_results/
Result: batch_results/final_summary.csv with all results
Troubleshooting
"No results found"
Ensure:
- Directory contains sample subdirectories
- Each subdirectory has sample_results.csv
- Path is correct and accessible
"File permission denied"
Check permissions:
ls -l bactscout_output/
Empty Summary
Verify sample directories exist and contain results:
find bactscout_output/ -name "sample_results.csv"
See Troubleshooting Guide for more help.