Usage Guide

This page describes the full workflow from raw audio to scored results, including the prescreen quality-control step.


Preparing a pairings file

Create a plain CSV listing the nest-father × genetic-father pairings you want to score. A header row is optional.

nest_father,genetic_father
pk24bu3,wh88br85
ab12cd3,ef45gh6

Save this file in the project root, for example as pairings.csv.
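As a minimal sketch of how such a file could be read, the snippet below parses the two columns and skips the header row when it is present. The function name read_pairings is hypothetical and is not part of the project's scripts; the actual loader in run_pipeline.py may behave differently.

```python
import csv
import io

# Header cells as shown in the example file above.
HEADER = ("nest_father", "genetic_father")


def read_pairings(text):
    """Return (nest_father, genetic_father) tuples, skipping an optional header."""
    pairings = []
    for row in csv.reader(io.StringIO(text)):
        cells = tuple(cell.strip() for cell in row)
        if not any(cells):
            continue  # ignore blank lines
        if len(cells) != 2:
            raise ValueError(f"expected 2 columns, got {len(cells)}: {row!r}")
        if not pairings and tuple(c.lower() for c in cells) == HEADER:
            continue  # the header row is optional
        pairings.append(cells)
    return pairings


example = "nest_father,genetic_father\npk24bu3,wh88br85\nab12cd3,ef45gh6\n"
print(read_pairings(example))
```

The same function accepts a file without a header, since the first row is only skipped when it matches the header names exactly.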


Single-family workflow

Use this when processing one pairing at a time.

Step 1 — Build the batch

python prepare_batch.py --nest-father pk24bu3 --genetic-father wh88br85

Writes batches/pk24bu3_wh88br85_<YYYYMMDD>/batch.h5.
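The output location follows the pattern batches/<nest_father>_<genetic_father>_<YYYYMMDD>/batch.h5. The sketch below reconstructs that path for a given date; batch_path is a hypothetical helper, and which date the script actually stamps (presumably the build date) is an assumption.

```python
from datetime import date
from pathlib import Path


def batch_path(nest_father, genetic_father, day, root="batches"):
    """Build <root>/<nest>_<genetic>_<YYYYMMDD>/batch.h5 for the given date."""
    batch_id = f"{nest_father}_{genetic_father}_{day:%Y%m%d}"
    return Path(root) / batch_id / "batch.h5"


print(batch_path("pk24bu3", "wh88br85", date(2026, 4, 14)).as_posix())
# batches/pk24bu3_wh88br85_20260414/batch.h5
```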

Options:

  • --snippets-per-bird N — override config (default 6)

  • --workers N — parallel spectrogram workers

  • --exclude-csv path/to/prescreen_<date>.csv — skip snippets labelled not_song or rendering_error in a prior prescreen run

  • --existing-batch path/to/batch.h5 — carry over valid snippets from a prior run and only compute the shortfall per bird (use with --exclude-csv)

Step 2 — Export spectrograms and audio

python export_batch.py batches/pk24bu3_wh88br85_20260414

Writes PNG spectrograms and WAV clips to batches/.../export/.

Options:

  • --dpi 150 — higher-resolution PNGs

  • --no-audio — skip WAV export

  • --force — overwrite existing files

  • --workers N

Step 3 — Prescreen the batch

Run the prescreen app locally to label every spectrogram before expert scorers see the batch. Use keyboard shortcuts for fast review.

python prescreen_app.py batches/pk24bu3_wh88br85_20260414

Open http://localhost:5001 in a browser. For each spectrogram press:

  • S — song (keep)

  • N — not song (exclude)

  • E — rendering error (exclude)

Labels are saved to batches/.../prescreen_<YYYYMMDD>.csv after each decision, so a session can be interrupted safely: close the browser and restart to resume where you left off.
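To check a prescreen file before rebuilding, something like the following could tally the labels and collect the snippets that will be excluded. The label values (song, not_song, rendering_error) come from this guide, but the column names uid and label are assumptions; check the header of your own prescreen CSV before relying on them.

```python
import csv
import io
from collections import Counter

# Labels that cause a snippet to be skipped on rebuild.
EXCLUDED_LABELS = {"not_song", "rendering_error"}


def summarize_prescreen(text):
    """Return (label counts, set of UIDs to exclude) from prescreen CSV text."""
    counts = Counter()
    excluded = set()
    # Assumed columns: "uid" and "label" (hypothetical names).
    for row in csv.DictReader(io.StringIO(text)):
        label = row["label"].strip()
        counts[label] += 1
        if label in EXCLUDED_LABELS:
            excluded.add(row["uid"].strip())
    return counts, excluded


sample = "uid,label\na1,song\na2,not_song\na3,rendering_error\na4,song\n"
counts, excluded = summarize_prescreen(sample)
print(dict(counts), sorted(excluded))
```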

Step 4 — Rebuild excluding non-song snippets, topping up where possible

Pass both --exclude-csv and --existing-batch so that the script carries over valid snippets from the batch built in Step 1 and computes only the shortfall.

python prepare_batch.py --nest-father pk24bu3 --genetic-father wh88br85 \
    --exclude-csv batches/pk24bu3_wh88br85_20260414/prescreen_20260414.csv \
    --existing-batch batches/pk24bu3_wh88br85_20260414/batch.h5

What this does:

  • Loads all valid (non-excluded) snippets from the existing HDF5 — no recomputation

  • For each bird below the target count, samples additional positions from audio files, respecting the min-gap constraint against existing positions

  • Writes a new HDF5 combining carried-over and newly computed snippets
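The top-up step can be illustrated with a small rejection-sampling sketch. This is not the code in prepare_batch.py: the function name, the parameters, and the reading of "min-gap" as a minimum distance between snippet start times are all assumptions made for illustration.

```python
import random


def top_up_positions(existing, target, duration, snippet_len, min_gap,
                     seed=0, max_tries=10_000):
    """Sample extra start times so all starts are at least min_gap apart.

    Returns only the newly sampled positions; may return fewer than needed
    if the recording is too crowded to fit more snippets.
    """
    rng = random.Random(seed)
    positions = list(existing)          # carried-over starts are never moved
    latest_start = duration - snippet_len
    tries = 0
    while len(positions) < target and tries < max_tries and latest_start >= 0:
        tries += 1
        candidate = rng.uniform(0.0, latest_start)
        # Accept only candidates that respect the gap to every kept position.
        if all(abs(candidate - p) >= min_gap for p in positions):
            positions.append(candidate)
    return positions[len(existing):]


existing = [10.0, 40.0, 95.0]           # starts (s) that survived prescreening
new = top_up_positions(existing, target=6, duration=300.0,
                       snippet_len=8.0, min_gap=12.0)
print(len(new), "new positions sampled")
```

Because existing positions seed the accepted list, new snippets can never crowd the ones carried over from the previous batch.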

Then re-export the cleaned batch:

python export_batch.py batches/pk24bu3_wh88br85_20260414 --force

When using run_pipeline.py --phase 2, the --existing-batch flag is passed automatically, so no manual intervention is needed.

Step 5 — Run the scoring app

python ranking_app.py batches/pk24bu3_wh88br85_20260414

Open http://localhost:5000/. Enter a scorer name, choose a trait, and drag song cards into ranked order. Rankings are saved after every round.

For remote scoring over EC2, see the "EC2 deployment for remote scorers" section below.

Step 6 — Analyze rankings

python analyze_rankings.py batches/pk24bu3_wh88br85_20260414 --trait all

When --scoring-mode is given, the mode is included in all output filenames (e.g. <batch>_same_tutor_<trait>_elo.csv) so results from different modes do not overwrite each other.

Options:

  • --trait <name> — stereotypy, repeat_propensity, or all (default)

  • --scoring-mode <mode> — same_tutor or all; loads sessions from sessions/<mode>/ and prefixes outputs with the mode name

  • --k <float> — Elo K-factor (default 32)

  • --min-rounds <n> — warn if a scorer has fewer than n rounds

  • --no-plots — skip PNG plot generation

  • --output-dir <path> — override the default results/ directory
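To make the role of the K-factor concrete, here is a sketch of a standard Elo update applied to one ranked round, treating every higher-ranked snippet as having "won" against every lower-ranked one. How analyze_rankings.py actually converts rankings into comparisons is not documented here, so the pairwise expansion, the starting rating of 1500, and the function names are assumptions.

```python
from itertools import combinations


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_from_round(ratings, ranked_uids, k=32.0, start=1500.0):
    """Apply one ranked round in place; ranked_uids[0] is the top-ranked snippet."""
    # combinations() preserves list order, so the first element of each pair
    # is always the higher-ranked snippet (the "winner").
    for winner, loser in combinations(ranked_uids, 2):
        r_w = ratings.get(winner, start)
        r_l = ratings.get(loser, start)
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return ratings


ratings = update_from_round({}, ["a", "b", "c"])
print({uid: round(r, 1) for uid, r in ratings.items()})
```

A larger K lets each round move ratings further; between two equally rated snippets, a single comparison shifts each rating by K/2.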

Outputs written to results/:

| File | Description |
|---|---|
| <stem>_<trait>_elo.csv | Per-snippet Elo scores, sorted highest to lowest |
| <stem>_<trait>_birds.csv | Per-bird mean Elo ± SD, sorted by role then score |
| <stem>_<trait>_consistency.csv | Per-snippet rank consistency: mean and SD of normalised rank position across all (scorer, round) appearances |
| <stem>_<trait>_irr.csv | Pairwise Kendall τ-b between scorers on shared UIDs |
| <stem>_<trait>_summary.txt | Human-readable digest: role means, offspring vs. father gaps, IRR |
| <stem>_flagged.csv | Noise/call snippets flagged during scoring |
| <stem>_<trait>_bird_elo.png | Bar chart of per-bird mean Elo ± SD, coloured by role |
| <stem>_<trait>_snippet_elo.png | Strip plot of per-snippet Elo by role with median line |
| <stem>_<trait>_rank_consistency.png | Scatter: mean normalised rank vs. SD, one point per snippet |
| <stem>_<trait>_scorer_agreement.png | Pairwise scorer rank scatter with Kendall τ annotation |

<stem> is <batch_id> when no scoring mode is given, or <batch_id>_<scoring_mode> otherwise.
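The naming rule can be written out as a short sketch. The helper names are hypothetical, and treating the batch directory name as <batch_id> is an assumption.

```python
def result_stem(batch_id, scoring_mode=None):
    """<batch_id> when no mode is given, else <batch_id>_<scoring_mode>."""
    return f"{batch_id}_{scoring_mode}" if scoring_mode else batch_id


def elo_csv_name(batch_id, trait, scoring_mode=None):
    """File name of the per-snippet Elo table for one trait."""
    return f"{result_stem(batch_id, scoring_mode)}_{trait}_elo.csv"


print(elo_csv_name("pk24bu3_wh88br85_20260414", "stereotypy"))
# pk24bu3_wh88br85_20260414_stereotypy_elo.csv
print(elo_csv_name("pk24bu3_wh88br85_20260414", "stereotypy", "same_tutor"))
# pk24bu3_wh88br85_20260414_same_tutor_stereotypy_elo.csv
```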

Interactive exploration

Open explore_rankings.ipynb in Jupyter for an interactive walkthrough of all analyses. Set BATCH_DIR, SCORING_MODE, and TRAIT at the top and run all cells. The notebook covers:

  • Per-bird and per-snippet ranking tables

  • All four plots inline

  • Within-scorer vs. across-scorer consistency breakdown

  • Side-by-side scoring-mode comparison (same_tutor vs all)
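The per-snippet consistency statistic (mean and SD of normalised rank position across all (scorer, round) appearances) can be sketched as below. The normalisation used here, position divided by the number of cards minus one, and the use of the population SD are assumptions; the notebook and analyze_rankings.py may define them differently.

```python
from collections import defaultdict
from statistics import mean, pstdev


def rank_consistency(rounds):
    """rounds: list of ranked UID lists, one per (scorer, round) appearance.

    Returns {uid: (mean normalised rank, SD)}, where 0.0 is the top of a
    round and 1.0 is the bottom.
    """
    positions = defaultdict(list)
    for ranked in rounds:
        n = len(ranked)
        for i, uid in enumerate(ranked):
            positions[uid].append(i / (n - 1) if n > 1 else 0.0)
    return {uid: (mean(p), pstdev(p)) for uid, p in positions.items()}


rounds = [["a", "b", "c"], ["a", "c", "b"]]
stats = rank_consistency(rounds)
print(stats)
```

A snippet that is always placed at the same relative position gets an SD of 0, as "a" does in this example.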


Multi-family workflow

Use run_pipeline.py to process many pairings with a single command. The two-phase design pauses after the initial build so you can prescreen each batch before the rebuild.

Phase 1 — Build and export all batches

python run_pipeline.py pairings.csv --phase 1

For each pairing this runs prepare_batch.py then export_batch.py. At the end it prints the prescreen_app.py command for each batch.
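As an illustration of what Phase 1 amounts to, the sketch below builds the two command lines per pairing using only flags documented in this guide. The function phase1_commands is hypothetical and does not reflect how run_pipeline.py is actually implemented; the date stamp in the batch directory is assumed to be the build date.

```python
from datetime import date


def phase1_commands(pairings, day, workers=None):
    """Return a (prepare_cmd, export_cmd) pair of argument lists per pairing."""
    stamp = f"{day:%Y%m%d}"
    commands = []
    for nest, genetic in pairings:
        prepare = ["python", "prepare_batch.py",
                   "--nest-father", nest, "--genetic-father", genetic]
        export = ["python", "export_batch.py",
                  f"batches/{nest}_{genetic}_{stamp}"]
        if workers is not None:
            # --workers is a pass-through option accepted by both scripts.
            prepare += ["--workers", str(workers)]
            export += ["--workers", str(workers)]
        commands.append((prepare, export))
    return commands


cmds = phase1_commands([("pk24bu3", "wh88br85")], date(2026, 4, 14), workers=4)
for prepare, export in cmds:
    print(" ".join(prepare))
    print(" ".join(export))
```

Each argument list could then be handed to subprocess.run(cmd, check=True) so that a failed build stops the loop before the export step.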

Prescreen — review each batch

Run the prescreen app on each batch and label all spectrograms before continuing. Phase 2 will skip any pairing that has no prescreen CSV.

Phase 2 — Rebuild with exclusions and re-export

python run_pipeline.py pairings.csv --phase 2

For each pairing this finds the most recent prescreen CSV, runs prepare_batch.py with --exclude-csv (and --existing-batch, which is passed automatically), then re-runs export_batch.py. At the end it prints the scp commands to upload each batch to EC2.
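One plausible way to pick the most recent prescreen CSV is to sort by file name, since the YYYYMMDD stamp orders chronologically as text. This is a sketch with a hypothetical helper name; run_pipeline.py may use a different criterion, such as modification time.

```python
import tempfile
from pathlib import Path


def latest_prescreen_csv(batch_dir):
    """Return the newest prescreen_<YYYYMMDD>.csv in batch_dir, or None."""
    candidates = sorted(Path(batch_dir).glob("prescreen_*.csv"),
                        key=lambda p: p.name)
    return candidates[-1] if candidates else None


# Demonstration on a throwaway directory with two prescreen runs.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("prescreen_20260414.csv", "prescreen_20260420.csv"):
        (Path(tmp) / name).touch()
    print(latest_prescreen_csv(tmp).name)  # prescreen_20260420.csv
```

Returning None for a batch without any prescreen file matches the documented behaviour that Phase 2 skips pairings with no prescreen CSV.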

Pass-through options (both phases):

  • --workers N

  • --snippets-per-bird N

  • --dpi N


EC2 deployment for remote scorers

After Phase 2, upload each clean batch to EC2 so expert scorers can access the app from any machine without a local installation.

# Upload batch (run in PowerShell on local machine)
scp -i "C:\Users\Eric\.ssh\scoring-key" -r `
  "E:\scoring\batches\<batch_name>" `
  ubuntu@<public-ip>:~/supervised_phenotype_scoring/batches/

# SSH in and launch the app
ssh -i "C:\Users\Eric\.ssh\scoring-key" ubuntu@<public-ip>
# On EC2
conda activate supervised_phenotype_scoring
cd ~/supervised_phenotype_scoring
nohup python ranking_app.py batches/<batch_name> > app.log 2>&1 &

Share http://<public-ip>:5000 with scorers. No installation required on their end.

After scoring, retrieve session files and analyze locally:

# Retrieve sessions (PowerShell)
scp -i "C:\Users\Eric\.ssh\scoring-key" -r `
  ubuntu@<public-ip>:~/supervised_phenotype_scoring/batches/<batch_name>/sessions `
  "E:\scoring\batches\<batch_name>"
# Analyze locally
python analyze_rankings.py batches/<batch_name> --trait all

See docs/ec2_deployment_guide.txt for the full EC2 setup walkthrough.


Scoring app interface

  • Drag cards to rank songs from left (most of the trait) to right (least).

  • Click a spectrogram image to play its audio clip.

  • Click A or B on any card to load it into a side-by-side comparison panel below (interactive Plotly heatmap — scroll to zoom, drag to pan).

  • Click flag noise/call to exclude a snippet from the ranking. Flagged snippets are shown with a red border and excluded from Elo scoring.


Output files reference

| File | Created by | Description |
|---|---|---|
| batch.h5 | prepare_batch.py | HDF5 batch: spectrograms, audio, manifest |
| batch_index.json | prepare_batch.py | Public batch metadata (no bird identity) |
| prescreen_<date>.csv | prescreen_app.py | Per-snippet labels: song / not_song / rendering_error |
| export/spectrograms/<uid>.png | export_batch.py | Spectrogram PNGs (magma colorscale) |
| export/audio/<uid>.wav | export_batch.py | 8-second audio clips |
| sessions/<scorer>_<trait>_<date>.json | ranking_app.py | Per-scorer ranking records |
| results/<stem>_<trait>_elo.csv | analyze_rankings.py | Per-snippet Elo scores |
| results/<stem>_<trait>_birds.csv | analyze_rankings.py | Per-bird mean Elo ± SD by role |
| results/<stem>_<trait>_consistency.csv | analyze_rankings.py | Per-snippet rank consistency stats |
| results/<stem>_<trait>_irr.csv | analyze_rankings.py | Pairwise Kendall τ between scorers |
| results/<stem>_<trait>_summary.txt | analyze_rankings.py | Human-readable digest |
| results/<stem>_flagged.csv | analyze_rankings.py | Noise/call snippets flagged during scoring |
| results/<stem>_<trait>_bird_elo.png | analyze_rankings.py | Per-bird Elo bar chart |
| results/<stem>_<trait>_snippet_elo.png | analyze_rankings.py | Per-snippet Elo strip plot by role |
| results/<stem>_<trait>_rank_consistency.png | analyze_rankings.py | Mean rank vs. SD scatter |
| results/<stem>_<trait>_scorer_agreement.png | analyze_rankings.py | Pairwise scorer rank scatter |