Ingestion (Stage A)

Syllable spectrogram extraction from raw audio and segmentation files (Stage A).

This module handles the first stage of the song phenotyping pipeline: reading segmentation metadata (.wav.not.mat files, either listed in evsonganaly batch files or produced by WhisperSeg), locating the corresponding audio, computing short-time Fourier transform spectrograms for each labelled syllable, and saving the results to HDF5 files.

Two segmentation formats are supported:

evsonganaly

Produced by the EvSongAnaly MATLAB package. Audio and metadata (.wav.not.mat) files are co-located under dated subdirectories; a batch.txt.keep file lists the valid recordings.

wseg / WhisperSeg

Metadata (.wav.not.mat) files live in a <bird>/song/ hierarchy separate from the audio. The fname field inside each metadata file points to the original audio path.

Public API

song_phenotyping.ingestion.copy_audio_and_partner_rec(audio_path, copied_data_dir)[source]

Copy the audio file to copied_data_dir and also find/copy its .rec partner.

An existing local copy is reused if it appears identical to the source (same size and source mtime not newer than local mtime by more than 1 s). If the local copy looks stale or is missing it is overwritten.

Returns (local_audio_path, local_rec_path). local_rec_path is None when no matching .rec file can be found.

Parameters:
  • audio_path (str)

  • copied_data_dir (str)

Return type:

tuple[str | None, str | None]
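The reuse rule described above (same size, and source mtime not more than 1 s newer than the local copy) can be sketched with the standard library alone. This is an illustrative sketch, not the module's implementation: `local_copy_is_fresh` and `copy_if_stale` are hypothetical names, and the `.rec` partner lookup is left out.

```python
import os
import shutil
import tempfile


def local_copy_is_fresh(src, dst, tolerance_s=1.0):
    """Return True if dst exists and appears identical to src."""
    if not os.path.exists(dst):
        return False
    src_stat, dst_stat = os.stat(src), os.stat(dst)
    same_size = src_stat.st_size == dst_stat.st_size
    # The source may be at most tolerance_s seconds newer than the local copy.
    not_newer = (src_stat.st_mtime - dst_stat.st_mtime) <= tolerance_s
    return same_size and not_newer


def copy_if_stale(src, dst_dir):
    """Copy src into dst_dir unless a fresh local copy already exists."""
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, os.path.basename(src))
    if not local_copy_is_fresh(src, dst):
        shutil.copy2(src, dst)  # copy2 also preserves the source mtime
    return dst


# Small demonstration on temporary files.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "song.wav")
    with open(src, "wb") as fh:
        fh.write(b"\x00" * 16)
    local = copy_if_stale(src, os.path.join(tmp, "cache"))
    reused = local_copy_is_fresh(src, local)
    with open(src, "ab") as fh:  # grow the source so the sizes differ
        fh.write(b"\x00")
    stale = not local_copy_is_fresh(src, local)
```

Comparing size and mtime avoids reading file contents, which matters when the source sits on a slow network share; a checksum would be stricter but far more expensive.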

song_phenotyping.ingestion.create_empty_segmented_data()[source]

Create empty segmented data structure.

Return type:

Dict[str, Any]

song_phenotyping.ingestion.create_segmented_audio_data(specs, wavs, ts, onsets, offsets, labels, tempos, valid_indices, file_identifier, inst_freq_list=None, group_delay_list=None)[source]

Create organized segmented audio data structure from processing results.

Parameters:
  • specs – List of spectrogram arrays.

  • wavs – List of waveform arrays.

  • ts – List of time reference arrays.

  • onsets – Array of onset times.

  • offsets – Array of offset times.

  • labels – Array of syllable labels.

  • tempos – Dict from tempo_estimates (or None); saved as-is into the HDF5.

  • valid_indices – Indices of successfully processed syllables.

  • file_identifier – Base string for generating unique hashes.

Returns:

Dictionary with organized segmented audio data.

Return type:

Dict[str, Any]

song_phenotyping.ingestion.filepaths_from_evsonganaly(wav_directory=None, save_path=None, batch_file_naming='batch.txt.keep', bird_subset=None, copy_locally=False, preferred_subdirs=None)[source]

Discover file paths from evsonganaly batch.txt.keep files.

Walks wav_directory recursively, finds batch.txt.keep files, and extracts paired metadata (.wav.not.mat) and audio (.wav) paths for each bird. Bird IDs are detected via a letter–digit pattern applied to the directory path components.

When copy_locally is True the function first checks whether a populated local cache already exists under save_path. If so, it uses that directly (no server access needed). Otherwise it scans wav_directory, copies audio and metadata to save_path, and returns the local paths. Existing local files are only overwritten when the source appears to have changed (different size or newer mtime).

Parameters:
  • wav_directory (str) – Root directory containing dated subdirectories with audio and .wav.not.mat files (e.g. or18or24/18-08-2023/).

  • save_path (str, optional) – If provided, bird output subdirectories are created here.

  • batch_file_naming (str, optional) – Name (or substring) of the batch file. Default is 'batch.txt.keep'.

  • bird_subset (list of str, optional) – Restrict discovery to these bird IDs. None returns all birds.

  • copy_locally (bool, optional) – If True, copy audio and metadata files to save_path and use those local paths for all downstream processing. On subsequent runs with the same save_path the local cache is reused automatically. Default is False.

  • preferred_subdirs (list of str, optional) – If given, only scan directories whose name matches one of these values. None scans all subdirectories.

Returns:

  • metadata_file_paths (dict mapping str to list of str) – {bird_id: [path_to_not_mat_file, ...]}.

  • audio_file_paths (dict mapping str to list of str) – {bird_id: [path_to_wav_file, ...]}.

Return type:

tuple[dict[str, list[str]], dict[str, list[str]]]

See also

save_specs_for_evsonganaly_birds

Run Stage A using paths returned by this function.
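As a rough illustration of detecting a bird ID from directory path components, the sketch below scans each component for a letter–digit pattern. The regular expression and the helper name `bird_id_from_path` are assumptions made for this example; the pattern the module actually uses is not shown here and may differ.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: two colour-code/number groups such as "or18or24".
BIRD_ID_RE = re.compile(r"^(?:[a-z]{1,2}\d+){2}$", re.IGNORECASE)


def bird_id_from_path(path):
    """Return the first path component that looks like a bird ID, else None."""
    for part in PurePosixPath(path).parts:
        if BIRD_ID_RE.match(part):
            return part.lower()
    return None


example = bird_id_from_path("/data/or18or24/18-08-2023/batch.txt.keep")
missing = bird_id_from_path("/data/misc/18-08-2023/batch.txt.keep")
```

Anchoring the pattern to the whole component keeps dated folders such as 18-08-2023 from being mistaken for bird IDs.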

song_phenotyping.ingestion.filepaths_from_local_cache(save_path, bird_subset=None)[source]

Discover metadata file paths from the local cache by reading audio_paths.txt files. Returns local metadata paths, so the pipeline can run fully offline.

Parameters:
  • save_path (str)

  • bird_subset (list)

Return type:

Tuple[Dict[str, List[str]], Dict[str, List[str]]]

song_phenotyping.ingestion.filepaths_from_wseg(seg_directory, save_path=None, song_or_call='song', file_ext='.wav.not.mat', bird_subset=None, copy_locally=False, wav_root=None)[source]

Discover WhisperSeg metadata file paths organised by bird ID.

Walks seg_directory recursively, collecting .wav.not.mat metadata files from subdirectories whose path contains song_or_call. Bird IDs are inferred from the parent directory of the song/ folder, i.e. two levels above each metadata file (the structure <seg_directory>/<bird>/song/*.wav.not.mat).

When copy_locally is True the function first checks whether a populated local cache already exists under save_path. If so, it uses that directly (no server access needed). Otherwise it scans seg_directory, copies audio and metadata to save_path, and returns the local paths. Existing local files are only overwritten when the source appears to have changed (different size or newer mtime).

Parameters:
  • seg_directory (str) – Root directory to scan (e.g. metadata/). Must follow the layout <seg_directory>/<bird>/song/.

  • save_path (str, optional) – If provided, bird subdirectories are created here. Required when copy_locally is True.

  • song_or_call (str, optional) – Subdirectory name to match — 'song' (default) or 'call'.

  • file_ext (str, optional) – Metadata file extension. Default is '.wav.not.mat'.

  • bird_subset (list of str, optional) – Restrict discovery to these bird IDs. None returns all birds.

  • copy_locally (bool, optional) – If True, copy audio and metadata files to save_path and use those local paths for all downstream processing. On subsequent runs with the same save_path the local cache is reused automatically. Default is False.

  • wav_root (str)

Returns:

  • metadata_file_paths (dict mapping str to list of str) – {bird_id: [path_to_metadata_file, ...]}.

  • audio_file_paths (dict mapping str to list of str) – {bird_id: [path_to_audio_file, ...]}. Populated only when copy_locally is True; otherwise an empty list per bird.

Return type:

Tuple[Dict[str, List[str]], Dict[str, List[str]]]

See also

save_specs_for_wseg_birds

Run Stage A using paths returned by this function.
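The directory walk described for filepaths_from_wseg can be approximated as follows. This is a simplified sketch under stated assumptions: `group_metadata_by_bird` is a hypothetical helper, it matches the song/ folder by exact name rather than by substring, and it does no copying or caching.

```python
import os
import tempfile
from collections import defaultdict
from pathlib import Path


def group_metadata_by_bird(seg_directory, song_or_call="song",
                           file_ext=".wav.not.mat"):
    """Map bird ID -> metadata files found under <seg>/<bird>/<song_or_call>/."""
    grouped = defaultdict(list)
    for root, _dirs, files in os.walk(seg_directory):
        root_path = Path(root)
        if root_path.name != song_or_call:
            continue
        bird_id = root_path.parent.name  # the folder that contains song/
        for name in sorted(files):
            if name.endswith(file_ext):
                grouped[bird_id].append(str(root_path / name))
    return dict(grouped)


# Build a throwaway layout and run the walk over it.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("or18or24/song/a.wav.not.mat",
                "or18or24/call/b.wav.not.mat",
                "bk1bk2/song/c.wav.not.mat",
                "bk1bk2/song/notes.txt"):
        target = Path(tmp, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    found = group_metadata_by_bird(tmp)
    summary = {bird: [Path(p).name for p in paths]
               for bird, paths in found.items()}
```

Files under call/ and files with other extensions are ignored, which mirrors the song_or_call and file_ext filters in the signature.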

song_phenotyping.ingestion.main()[source]

Main processing pipeline with configurable parameters.

song_phenotyping.ingestion.process_and_save_audio(audio_file_path, output_path, metadata, params, split_syllables=False, verbose=False, save_manual=True)[source]

Process an audio file and save the segmented data, with progress tracking. Uses the consolidated ProcessingResult.

Return type:

bool

song_phenotyping.ingestion.process_pipeline(pipeline_name, settings)[source]

Process a single pipeline (evsonganaly or wseg).

Parameters:
  • pipeline_name (str)

  • settings (dict)

song_phenotyping.ingestion.process_single_file(metadata_file_path, audio_file_path, save_path, params, read_songpath_from_metadata, verbose, prefer_local=True, run_name='default', save_manual=True, bird_name=None)[source]

Process a single metadata file and save spectrograms if conditions are met.

Parameters:
  • prefer_local – If True, prefer local audio files over server files.

  • bird_name – If provided, use this as the bird ID for output path construction instead of parsing it from the audio filename. Needed when audio files are named after tutor birds rather than the foster bird being processed.

Return type:

Dict[str, str]

song_phenotyping.ingestion.reconstruct_server_path(stored_path)[source]

Reconstruct full server path from stored relative path using current platform.

Parameters:

stored_path (str)

Return type:

str

song_phenotyping.ingestion.resolve_audio_file_path(metadata_file_path, metadata_matfile, read_songpath_from_metadata, bird_folder=None, prefer_local=True)[source]

Resolve the path to the audio file and return offset.

Parameters:
  • metadata_file_path (str)

  • metadata_matfile (dict)

  • read_songpath_from_metadata (bool)

  • bird_folder (str, optional) – Path to the bird folder used for audio path mapping.

  • prefer_local (bool) – If True and bird_folder is provided, prefer local files.

Returns:

(audio_file_path, wseg_offset), or (None, offset) if the file is not found.

Return type:

tuple[str, float]

song_phenotyping.ingestion.save_data_specs(candidate_files, save_path, params, verbose=False, read_songpath_from_metadata=True, prefer_local=True, run_name='default', save_manual=True, bird_name=None, max_workers=None)[source]

Process metadata files and save spectrograms to HDF5 files with detailed progress tracking.

Parameters:
  • bird_name (str, optional) – If provided, use this name for output path construction instead of parsing it from the audio filename. Required for cross-foster data where audio files are named after tutor birds rather than the foster bird being processed.

  • max_workers (int, optional) – Number of parallel worker processes. Defaults to half of the available CPU cores (conservative, since each worker does I/O plus FFT computation). Pass 1 to disable parallelism.

Return type:

Dict[str, List[str]]
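The max_workers default described above (half of the available cores, with 1 meaning no parallelism) might be computed along these lines. `default_worker_count` is a hypothetical helper written for illustration; the clamping to at least one worker is an assumption.

```python
import os


def default_worker_count(max_workers=None):
    """Resolve the number of worker processes to use."""
    if max_workers is not None:
        return max(1, int(max_workers))
    # os.cpu_count() can return None, so fall back to a single core.
    return max(1, (os.cpu_count() or 1) // 2)


explicit = default_worker_count(1)   # serial processing
implicit = default_worker_count()    # half of the cores, never below 1
```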

song_phenotyping.ingestion.save_specs_for_evsonganaly_birds(metadata_file_paths, audio_file_paths, save_path=None, songs_per_bird=5, params=None, verbose=False, songs_seed=None, run_name='default')[source]

Run Stage A for evsonganaly birds: extract and save syllable spectrograms.

For each bird in metadata_file_paths, selects up to songs_per_bird unprocessed recordings, computes syllable spectrograms, and saves them as HDF5 files under <save_path>/<bird>/syllable_data/specs/.

Parameters:
  • metadata_file_paths (dict mapping str to list of str) – {bird_id: [path_to_not_mat_file, ...]}, as returned by filepaths_from_evsonganaly().

  • audio_file_paths (dict mapping str to list of str or None) – {bird_id: [path_to_wav_file, ...]}. If None, audio paths are resolved from the metadata files directly.

  • save_path (str) – Root output directory. Bird subdirectories are created automatically.

  • songs_per_bird (int or None, optional) – Maximum number of songs to process per bird. None processes all available recordings. Default is 5.

  • params (SpectrogramParams, optional) – Spectrogram computation parameters. Defaults to SpectrogramParams.

  • verbose (bool, optional) – Enable verbose per-file logging. Default is False.

  • songs_seed (int or None, optional) – Random seed for song subset selection. None (default) gives non-deterministic selection; set an integer for reproducible subsets.

  • run_name (str)

Notes

Already-processed songs are detected by counting files in the output specs/ directory; only the remaining quota is processed. Re-running is therefore safe and incremental.

See also

filepaths_from_evsonganaly

Discover input file paths.

save_specs_for_wseg_birds

Equivalent function for WhisperSeg data.
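The incremental behaviour in the Notes for save_specs_for_evsonganaly_birds (count the files already in specs/, then process only the remaining quota) can be sketched as below. `remaining_quota` is a hypothetical helper, and treating every regular file in the directory as one processed song is an assumption.

```python
import os
import tempfile


def remaining_quota(specs_dir, songs_per_bird):
    """Return how many more songs to process; None means no limit."""
    if songs_per_bird is None:
        return None
    if not os.path.isdir(specs_dir):
        return songs_per_bird
    done = sum(1 for entry in os.scandir(specs_dir) if entry.is_file())
    return max(0, songs_per_bird - done)


# Three songs already saved, quota of five.
with tempfile.TemporaryDirectory() as specs_dir:
    for i in range(3):
        open(os.path.join(specs_dir, f"song_{i}.h5"), "w").close()
    left = remaining_quota(specs_dir, songs_per_bird=5)
```

Because the quota never goes below zero, re-running after it has been met simply processes nothing.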

song_phenotyping.ingestion.save_specs_for_wseg_birds(metadata_file_paths, audio_file_paths, save_path, songs_per_bird=20, params=None, verbose=False, copy_locally=False, songs_seed=None, run_name='default')[source]

Run Stage A for WhisperSeg birds: extract and save syllable spectrograms.

For each bird in metadata_file_paths, resolves audio paths from the embedded fname field in each .wav.not.mat file, selects up to songs_per_bird unprocessed recordings, computes syllable spectrograms, and saves them as HDF5 files under <save_path>/<bird>/syllable_data/specs/.

Parameters:
  • metadata_file_paths (dict mapping str to list of str) – {bird_id: [path_to_not_mat_file, ...]}, as returned by filepaths_from_wseg().

  • audio_file_paths (dict mapping str to list of str) – {bird_id: [path_to_wav_file, ...]}. Populated when copy_locally was True in filepaths_from_wseg(); otherwise pass an empty-list dict and audio is resolved from metadata.

  • save_path (str) – Root output directory. Bird subdirectories are created automatically.

  • songs_per_bird (int or None, optional) – Maximum number of songs to process per bird. None processes all available recordings. Default is 20.

  • params (SpectrogramParams, optional) – Spectrogram computation parameters. Defaults to SpectrogramParams.

  • verbose (bool, optional) – Enable verbose per-file logging. Default is False.

  • copy_locally (bool, optional) – If True, audio_file_paths contains local copies (as populated by filepaths_from_wseg() with copy_locally=True) and those paths are used directly. Default is False.

  • songs_seed (int or None, optional) – Random seed for song subset selection. None (default) gives non-deterministic selection; set an integer for reproducible subsets.

  • run_name (str)

Notes

Already-processed songs are detected by counting files in the output specs/ directory; only the remaining quota is processed.

See also

filepaths_from_wseg

Discover input file paths.

save_specs_for_evsonganaly_birds

Equivalent function for evsonganaly data.

song_phenotyping.ingestion.select_new_file_pairs(available_metadata_files, available_audio_files, already_saved_files, needed_count, seed=None)[source]

Return up to needed_count (metadata_path, audio_path) pairs whose base names are not present in already_saved_files. Matching is done by filename stem (filename without extension). If a metadata file has no matching audio file, it is skipped and a warning is logged.

Parameters:
  • seed (int or None) – Random seed for reproducible subset selection. None (default) uses system entropy so different songs may be chosen each run. Set an integer to always select the same subset from a larger library.

  • available_metadata_files (list[str])

  • available_audio_files (list[str])

  • already_saved_files (list[str])

  • needed_count (int)

Return type:

list[tuple[str, str]]
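A minimal sketch of the stem-based pairing described for select_new_file_pairs follows. The names `base_stem` and `pick_new_pairs`, the list of recognised suffixes (including .h5 for saved files), and the shuffle-then-truncate selection are assumptions for illustration; the real function may select its subset differently.

```python
import logging
import os
import random

logger = logging.getLogger(__name__)


def base_stem(path, suffixes=(".wav.not.mat", ".wav", ".h5")):
    """Strip a known suffix so metadata, audio and saved files share a stem."""
    name = os.path.basename(path)
    for suffix in suffixes:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return os.path.splitext(name)[0]


def pick_new_pairs(metadata_files, audio_files, already_saved,
                   needed_count, seed=None):
    """Return up to needed_count unprocessed (metadata, audio) pairs."""
    saved = {base_stem(p) for p in already_saved}
    audio_by_stem = {base_stem(p): p for p in audio_files}
    pairs = []
    for meta in metadata_files:
        stem = base_stem(meta)
        if stem in saved:
            continue  # already processed
        audio = audio_by_stem.get(stem)
        if audio is None:
            logger.warning("No audio file for %s; skipping", meta)
            continue
        pairs.append((meta, audio))
    random.Random(seed).shuffle(pairs)  # seeded shuffle is reproducible
    return pairs[: max(0, needed_count)]


meta = [f"m/{s}.wav.not.mat" for s in "abcd"]
audio = ["w/a.wav", "w/b.wav", "w/d.wav"]   # no audio for "c"
chosen = pick_new_pairs(meta, audio, ["specs/a.h5"], needed_count=5, seed=0)
```

A private `random.Random(seed)` instance keeps the selection reproducible without disturbing the global random state.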

song_phenotyping.ingestion.select_new_files(available_metadata_files, already_saved_files, needed_count)[source]

Select files that haven’t been processed yet.

Parameters:
  • available_metadata_files (list[str])

  • already_saved_files (list[str])

  • needed_count (int)

Return type:

list[str]

song_phenotyping.ingestion.select_wseg_file_pairs_from_metadata(metadata_files, already_saved_files, needed_count)[source]

For wseg files, extract audio paths from metadata and create pairs.

Return type:

List[Tuple[str, str]]

song_phenotyping.ingestion.standardize_bird_band(band_string)[source]

Convert bird band strings from various formats to standardized ‘co#co#’ format.

Args:

band_string (str): Bird band identifier in various formats

Returns:

str: Standardized band string in ‘co#co#’ format, or None if invalid