Usage guide

The pipeline runs in five sequential stages (A → E). Each stage reads the output of the previous one, so the stages must be run in order. A convenience script, run_pipeline.py, at the project root chains all five stages for one or more birds.
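
The ordering constraint can be pictured as a small driver loop. The sketch below is an illustration only and does not reproduce run_pipeline.py: run_stages is a hypothetical helper, and each stage is assumed to be a callable that takes a bird ID. When a stage fails for a bird, the later stages for that bird are skipped because their input would be missing.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: a stage is any callable that takes a bird ID.
Stage = Callable[[str], None]


def run_stages(
    stages: Sequence[Tuple[str, Stage]], birds: Sequence[str]
) -> Dict[str, List[str]]:
    """Run every stage in order for each bird; stop a bird at its first failure.

    Returns a mapping of bird ID to the names of the stages that completed.
    """
    completed: Dict[str, List[str]] = {}
    for bird in birds:
        completed[bird] = []
        for name, stage in stages:
            try:
                stage(bird)
            except Exception as exc:
                # Later stages depend on this one's output, so skip them.
                print(f"[{bird}] stage {name} failed: {exc}")
                break
            completed[bird].append(name)
    return completed


# Placeholder stages; in practice these would wrap the Stage A-E calls.
stages = [
    (name, lambda bird, n=name: print(f"stage {n} done for {bird}"))
    for name in "ABCDE"
]
print(run_stages(stages, ["or18or24"]))
```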


Quick start

Edit the paths at the top of run_pipeline.py, then run:

python run_pipeline.py

Outputs land under SAVE_PATH/<bird>/ on your local drive.
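
The stage sections below each name a subdirectory of <bird>/syllable_data/. As a rough aid for checking how far a run got, the following sketch collects those names in one place. stage_dirs and missing_stages are hypothetical helpers and not part of the package; only the directory names are taken from this guide.

```python
from pathlib import Path
from typing import Dict, List

# Subdirectory names as they appear in this guide.
STAGE_SUBDIRS = {
    "A": "specs",
    "B": "flattened",
    "C": "embeddings",
    "D": "labelling",
    "E": "phenotype_detailed",
    "catalogs": "html",
}


def stage_dirs(save_path: str, bird: str) -> Dict[str, Path]:
    """Return the per-stage output directory for one bird."""
    base = Path(save_path) / bird / "syllable_data"
    return {stage: base / sub for stage, sub in STAGE_SUBDIRS.items()}


def missing_stages(save_path: str, bird: str) -> List[str]:
    """List stages whose output directory is absent or empty."""
    return [
        stage
        for stage, path in stage_dirs(save_path, bird).items()
        if not path.is_dir() or not any(path.iterdir())
    ]


print(stage_dirs("/data/pipeline_runs", "or18or24")["A"])
```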


Stage A — Spectrogram ingestion

Stage A reads raw audio recordings and their segmentation metadata, computes a mel spectrogram for each detected syllable, and saves the results to <bird>/syllable_data/specs/syllables_<song_id>.h5.

Two annotation formats are supported:

evsonganaly (batch.txt.labeled metadata files):

from song_phenotyping.ingestion import (
    filepaths_from_evsonganaly,
    save_specs_for_evsonganaly_birds,
)

meta, audio = filepaths_from_evsonganaly(
    wav_directory="/data/raw/evsonganaly",
    bird_subset=["or18or24"],          # omit to process all birds
)
save_specs_for_evsonganaly_birds(
    metadata_file_paths=meta,
    audio_file_paths=audio,
    save_path="/data/pipeline_runs",
    songs_per_bird=50,                 # None = all songs
)

wseg (<bird>/song/*.wav.not.mat segmentation files):

from song_phenotyping.ingestion import (
    filepaths_from_wseg,
    save_specs_for_wseg_birds,
)

meta, audio = filepaths_from_wseg(
    seg_directory="/data/raw/wseg",
    bird_subset=["bu78bu77"],
)
save_specs_for_wseg_birds(
    metadata_file_paths=meta,
    audio_file_paths=audio,
    save_path="/data/pipeline_runs",
    songs_per_bird=50,
)

Each output HDF5 file contains the arrays spectrograms, manual, position_idxs, and hashes.

Stage A1 — Fixed-length slicing (optional)

If you prefer equal-duration windows instead of boundary-segmented syllables, use song_phenotyping.slicing as a drop-in replacement for Stage A:

from song_phenotyping.slicing import slice_syllable_files_from_evsonganaly

slice_syllable_files_from_evsonganaly(
    wav_directory="/data/raw/evsonganaly",
    save_path="/data/pipeline_runs",
    slice_length=50,   # ms
)
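
To make "equal-duration windows" concrete, here is a minimal sketch of how window boundaries could be computed. It assumes non-overlapping windows and assumes that a trailing remainder shorter than slice_length is dropped; how the package treats overlap or padding is not stated here, and fixed_windows is a hypothetical helper.

```python
from typing import List, Tuple


def fixed_windows(
    duration_ms: float, slice_length: float = 50.0
) -> List[Tuple[float, float]]:
    """Return (start, end) times in ms of consecutive non-overlapping windows.

    Assumption: a trailing remainder shorter than slice_length is discarded.
    """
    if slice_length <= 0:
        raise ValueError("slice_length must be positive")
    n_windows = int(duration_ms // slice_length)
    return [(i * slice_length, (i + 1) * slice_length) for i in range(n_windows)]


# A 170 ms recording yields three full 50 ms windows; the last 20 ms are dropped.
print(fixed_windows(170, 50))
```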

Stage B — Flattening

Stage B reshapes each (n_syllables, n_freq, n_time) spectrogram stack into a (n_features, n_syllables) matrix, where n_features = n_freq × n_time, ready for UMAP:

from song_phenotyping.flattening import flatten_bird_spectrograms

flatten_bird_spectrograms(
    directory="/data/pipeline_runs",
    bird="or18or24",
)

Output: <bird>/syllable_data/flattened/flattened_<song_id>.h5
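
The reshaping itself can be illustrated without any dependencies. The sketch below is not the package's implementation: flatten_stack is a hypothetical name, nested lists stand in for arrays, and the row-by-row reading order within each spectrogram is an assumption. With a NumPy array the same result would be stack.reshape(n_syllables, -1).T.

```python
from typing import List, Sequence

Spectrogram = Sequence[Sequence[float]]  # shape (n_freq, n_time)


def flatten_stack(stack: Sequence[Spectrogram]) -> List[List[float]]:
    """Flatten (n_syllables, n_freq, n_time) into (n_features, n_syllables).

    n_features = n_freq * n_time; values are read row by row (assumed order).
    """
    # One flat feature vector per syllable: shape (n_syllables, n_features).
    per_syllable = [[value for row in spec for value in row] for spec in stack]
    if len({len(vec) for vec in per_syllable}) > 1:
        raise ValueError("all spectrograms must share the same shape")
    # Transpose so that each output row holds one feature across all syllables.
    return [list(feature) for feature in zip(*per_syllable)]


stack = [
    [[1, 2, 3], [4, 5, 6]],     # syllable 0: 2 frequency bins x 3 time bins
    [[7, 8, 9], [10, 11, 12]],  # syllable 1
]
flat = flatten_stack(stack)
print(len(flat), len(flat[0]))  # 6 2
```

Libraries that follow the scikit-learn convention expect one sample per row, so a matrix in this orientation would need transposing before being passed to such an API directly; whether the package handles this internally is not stated here.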


Stage C — UMAP embedding

Stage C projects the high-dimensional feature vectors into 2-D using UMAP, exploring a grid of hyperparameters:

from song_phenotyping.embedding import explore_embedding_parameters_robust

explore_embedding_parameters_robust(
    save_path="/data/pipeline_runs",
    bird="or18or24",
    n_neighbors_list=[10, 30, 50],
    min_dists=[0.1, 0.3],
    metrics=["euclidean", "cosine"],
)

One .h5 file per parameter combination is written to <bird>/syllable_data/embeddings/. Each file contains arrays embeddings, hashes, and labels.
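
To estimate how many embedding files a grid will produce, the combinations can be enumerated up front. This sketch assumes the grid is the full Cartesian product of the three lists, which "one file per parameter combination" suggests but does not guarantee. The dictionary keys follow the parameter names of umap-learn's UMAP class.

```python
from itertools import product

n_neighbors_list = [10, 30, 50]
min_dists = [0.1, 0.3]
metrics = ["euclidean", "cosine"]

# Assumption: the grid is the full Cartesian product of the three lists.
grid = [
    {"n_neighbors": n, "min_dist": d, "metric": m}
    for n, d, m in product(n_neighbors_list, min_dists, metrics)
]

print(len(grid))  # 12 combinations (3 x 2 x 2)
print(grid[0])    # {'n_neighbors': 10, 'min_dist': 0.1, 'metric': 'euclidean'}
```

Under that assumption, the example call above would write 12 embedding files per bird, and Stage D would then cluster each of them.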


Stage D — Clustering and labelling

Stage D runs HDBSCAN over every embedding, evaluates each clustering with a set of internal metrics, and writes a ranked master_summary.csv:

from song_phenotyping.labelling import label_bird, DEFAULT_HDBSCAN_GRID

label_bird(
    save_path="/data/pipeline_runs",
    bird="or18or24",
    metrics=["silhouette", "dbi", "ch"],
    hdbscan_params=[p.to_dict() for p in DEFAULT_HDBSCAN_GRID],
)

Cluster label files are written to <bird>/syllable_data/labelling/<umap_id>/.
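
How the scores are combined into a ranking is not described in this guide. One common approach is to rank the candidates on each metric and average the ranks, as sketched below. This is an illustration, not the method behind master_summary.csv: rank_clusterings is a hypothetical helper, the scores are made up, and "dbi" and "ch" are assumed to mean the Davies-Bouldin index (lower is better) and the Calinski-Harabasz index (higher is better).

```python
from typing import Dict, List

# Direction of each internal metric: True means higher is better.
# Assumption: "dbi" = Davies-Bouldin, "ch" = Calinski-Harabasz.
HIGHER_IS_BETTER = {"silhouette": True, "dbi": False, "ch": True}


def rank_clusterings(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Order clustering IDs from best to worst by mean per-metric rank."""
    ids = list(scores)
    mean_rank = {cid: 0.0 for cid in ids}
    for metric, higher in HIGHER_IS_BETTER.items():
        ordered = sorted(ids, key=lambda cid: scores[cid][metric], reverse=higher)
        for position, cid in enumerate(ordered):
            mean_rank[cid] += position / len(HIGHER_IS_BETTER)
    return sorted(ids, key=lambda cid: mean_rank[cid])


# Made-up scores for three hypothetical clusterings.
scores = {
    "run_a": {"silhouette": 0.62, "dbi": 0.80, "ch": 950.0},
    "run_b": {"silhouette": 0.48, "dbi": 1.10, "ch": 700.0},
    "run_c": {"silhouette": 0.55, "dbi": 0.95, "ch": 1200.0},
}
print(rank_clusterings(scores))  # ['run_a', 'run_c', 'run_b']
```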

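Custom HDBSCAN grid

To search a different set of HDBSCAN settings, build a list of HDBSCANParams and pass it in place of DEFAULT_HDBSCAN_GRID:
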
from song_phenotyping.labelling import HDBSCANParams

my_grid = [
    HDBSCANParams(min_cluster_size=5,  min_samples=2),
    HDBSCANParams(min_cluster_size=10, min_samples=3),
    HDBSCANParams(min_cluster_size=20, min_samples=5),
]
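# Other arguments as in the Stage D call above
label_bird(
    save_path="/data/pipeline_runs",
    bird="or18or24",
    metrics=["silhouette", "dbi", "ch"],
    hdbscan_params=[p.to_dict() for p in my_grid],
)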

Stage E — Phenotyping

Stage E reads the top-ranked clusterings and computes phenotypic measures (vocabulary, entropy, transition matrices, repeat patterns):

from song_phenotyping.phenotyping import phenotype_bird, PhenotypingConfig

cfg = PhenotypingConfig(
    use_top_n_clusterings=5,
    generate_plots=True,
)
phenotype_bird(
    bird_path="/data/pipeline_runs/or18or24",
    config=cfg,
)

Outputs:

  • phenotype_results.csv — one row per clustering rank with all phenotype columns.

  • syllable_data/phenotype_detailed/automated_phenotype_data_rank*.pkl — full data structures for downstream PDF generation.
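
For readers unfamiliar with the measures named above, the sketch below shows textbook versions of three of them computed from a sequence of syllable labels: Shannon entropy of the label distribution, a first-order transition matrix, and runs of repeated labels. These are generic definitions with hypothetical function names. The package's own definitions may differ, for example in how song boundaries or noise labels are handled.

```python
import math
from collections import Counter
from itertools import groupby
from typing import List, Sequence, Tuple


def label_entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of the syllable-label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def transition_matrix(labels: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """First-order transition probabilities between consecutive labels.

    Returns the sorted vocabulary and a row-normalised matrix where
    matrix[i][j] = P(next == vocab[j] | current == vocab[i]).
    """
    vocab = sorted(set(labels))
    index = {label: i for i, label in enumerate(vocab)}
    counts = [[0] * len(vocab) for _ in vocab]
    for current, nxt in zip(labels, labels[1:]):
        counts[index[current]][index[nxt]] += 1
    matrix = []
    for row in counts:
        row_total = sum(row)
        # A label that only occurs last has no outgoing transitions.
        matrix.append([c / row_total if row_total else 0.0 for c in row])
    return vocab, matrix


def repeat_runs(labels: Sequence[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive identical labels into (label, run_length) pairs."""
    return [(label, len(list(group))) for label, group in groupby(labels)]


song = list("aabbbcaabbbc")  # toy label sequence, one character per syllable
vocab, matrix = transition_matrix(song)
print(vocab)                # ['a', 'b', 'c']
print(matrix[0])            # [0.5, 0.5, 0.0]
print(repeat_runs(song)[:3])  # [('a', 2), ('b', 3), ('c', 1)]
```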


Visual inspection — HTML catalogs

After Stage E you can generate interactive HTML catalogs for visual inspection of the clustering results:

from song_phenotyping.catalog import generate_all_catalogs

results = generate_all_catalogs(
    bird_path="/data/pipeline_runs/or18or24",
    rank=0,   # use the best-ranked clustering
)
print(results["song_catalog"])         # continuous song view
print(results["syllable_types_auto"])  # per-label spectrogram grid

HTML files are written to <bird>/syllable_data/html/. Open them in any browser — no server required.


Configuration

Path configuration is loaded from config.yaml via ProjectConfig:

from song_phenotyping.tools.project_config import ProjectConfig

cfg = ProjectConfig.load()

# Resolve a bird directory on the local cache drive
bird_path = cfg.bird_dir("or18or24", experiment="evsong test")

# Access the Macaw server root (None if not mounted)
print(cfg.macaw_root)

Spectrogram parameters are controlled by SpectrogramParams. The defaults match the settings used for Bengalese finch recordings but can be overridden:

from song_phenotyping.tools.spectrogram_configs import SpectrogramParams

params = SpectrogramParams(
    min_freq=500,
    max_freq=8000,
    song_gap=200,    # ms between songs
)
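
The comment on song_gap describes it as the time in ms between songs. One plausible reading, sketched below, is that syllables separated by a silence longer than song_gap are assigned to different songs. This interpretation is an assumption, and split_into_songs is a hypothetical helper rather than package code.

```python
from typing import List, Sequence, Tuple

Syllable = Tuple[float, float]  # (onset_ms, offset_ms)


def split_into_songs(
    syllables: Sequence[Syllable], song_gap: float = 200.0
) -> List[List[Syllable]]:
    """Group time-ordered syllables, starting a new song after a long silence."""
    songs: List[List[Syllable]] = []
    for onset, offset in syllables:
        # Silence is measured from the previous offset to this onset.
        if songs and onset - songs[-1][-1][1] <= song_gap:
            songs[-1].append((onset, offset))
        else:
            songs.append([(onset, offset)])
    return songs


syllables = [(0, 80), (120, 200), (260, 330), (900, 980), (1020, 1100)]
print([len(song) for song in split_into_songs(syllables)])  # [3, 2]
```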