Add separate beam defaults for alignment #464

Open
lenzo-ka wants to merge 4 commits into main from kal-beams

@lenzo-ka lenzo-ka commented Apr 14, 2026

Summary

Adds separate configuration for forced-alignment FSG (ps_set_align_text, CLI align) so pruning does not use the same beam / pbeam / wbeam triple as LVCSR. Default alignment beams (align_beam, align_pbeam, align_wbeam, all 1e-48) avoid the fragile interaction where stock wbeam differed from beam. Adds align_use_main_beams to opt back into the main decoder beams when comparing behavior or debugging.

Motivation

Forced alignment uses a linear FSG (_align). Asymmetric word vs frame beams in the global defaults could produce empty or broken word segments and fsg_search errors (“Final result does not match the grammar”) even when audio, transcript, and lexicon match. Alignment-specific parameters keep LVCSR defaults unchanged while giving alignment a sensible, documentable default.

Changes

  • src/config_macro.h: align_beam, align_pbeam, align_wbeam (float); align_use_main_beams (bool, default no). Wired into POCKETSPHINX_OPTIONS.
  • src/fsg_search.c: For the _align search only, read align_* unless align_use_main_beams is set; other FSG searches unchanged.
  • Docs: include/pocketsphinx/search.h, README align subsection, doxygen/pocketsphinx.1(.in), cython/_pocketsphinx.pyx (set_align_text), usage_align in pocketsphinx_main.c.
  • Test: test/unit/test_align_fsg_beam.c + CMakeLists.txt entry (asserts _align FSG uses align_* under asymmetric wbeam, and main beams when align_use_main_beams is on).
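
The selection rule described above can be sketched in isolation. This is a minimal illustration, not the code in src/fsg_search.c: the struct types and the select_fsg_beams helper are hypothetical names, and matching the search by the string "_align" is an assumption about how the alignment search is identified.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the decoder configuration; the real code reads
 * these values from the PocketSphinx config object. */
typedef struct {
    double beam, pbeam, wbeam;                   /* main decoder beams */
    double align_beam, align_pbeam, align_wbeam; /* alignment-only beams */
    bool align_use_main_beams;                   /* opt back into main beams */
} beam_config_t;

typedef struct {
    double beam, pbeam, wbeam;
} beam_triple_t;

/* Choose the pruning beams for an FSG search named `search_name`.
 * Only the "_align" search (assumed name) ever sees the align_* values. */
beam_triple_t
select_fsg_beams(const beam_config_t *cfg, const char *search_name)
{
    assert(cfg != NULL && search_name != NULL);

    bool is_align = strcmp(search_name, "_align") == 0;
    beam_triple_t out;

    if (is_align && !cfg->align_use_main_beams) {
        /* Alignment search with the new defaults. */
        out.beam = cfg->align_beam;
        out.pbeam = cfg->align_pbeam;
        out.wbeam = cfg->align_wbeam;
    } else {
        /* Any other FSG search, or alignment with the opt-out set. */
        out.beam = cfg->beam;
        out.pbeam = cfg->pbeam;
        out.wbeam = cfg->wbeam;
    }
    return out;
}
```

With an asymmetric main wbeam and align_use_main_beams left off, the sketch returns the align_* triple for "_align" and the main triple for every other search name, which is the behavior the unit test is described as asserting.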

Follow-up in the same change set: For the _align FSG only, bestpath is forced off (fsgs->bestpath = FALSE) after the usual config read. When global bestpath is yes, FSG hyp() otherwise uses the lattice bestpath string, which can be shorter than the forced transcript; the CLI align command already disables bestpath for the whole run, but the Python Decoder() does not, so cython/test/alignment_test.py::TestAlignment::test_default_lm failed in CI. Aligning FSG behavior with the CLI fixes that without requiring tests to tweak config.
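
The bestpath override can be pictured with a similarly small sketch. The fsg_search_sketch_t type and the apply_bestpath helper are hypothetical; only the bestpath field name and the idea of overriding it after the usual config read come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical reduction of the FSG search state to the one field at issue. */
typedef struct {
    bool bestpath;
} fsg_search_sketch_t;

/* Apply the global bestpath setting, then force it off for the alignment
 * search so the hypothesis follows the full forced path. */
void
apply_bestpath(fsg_search_sketch_t *fsgs, bool global_bestpath,
               const char *search_name)
{
    assert(fsgs != NULL && search_name != NULL);

    fsgs->bestpath = global_bestpath;        /* the usual config read */
    if (strcmp(search_name, "_align") == 0)  /* assumed search name */
        fsgs->bestpath = false;              /* alignment-only override */
}
```

The override runs after the config read so that a global bestpath of yes still applies to every non-alignment FSG search.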

How to review

  • Confirm fsg_search_init branch: _align + not align_use_main_beams → align_*; else → beam / pbeam / wbeam (same as before for non-align FSG).
  • Skim docs for consistency with align_use_main_beams.
  • Optional local check: cmake --build build --target test_align_fsg_beam (or full check).

Testing

  • cmake --build <build> --target check (or CI)
  • test_align_fsg_beam passes

Follow-ups (not in this PR)

  • Optional: mirror align_* in GStreamer props if maintainers want parity there.
  • Remaining alignment failures on a corpus are likely due to OOV words, text, or audio problems rather than the beam triple (see separate experiments).

… FSG

Register align_beam, align_pbeam, align_wbeam (defaults 1e-48) and
align_use_main_beams (default no). fsg_search_init uses them for the
_align search only; other FSG and LVCSR paths unchanged.

Document CLI flags in usage_align.
Describe align_* vs main beams and align_use_main_beams in API header,
README, man page, and Python set_align_text.
Verify _align FSG uses align_* under asymmetric wbeam and main beams
when align_use_main_beams is enabled.
@lenzo-ka lenzo-ka requested a review from dhdaines April 14, 2026 14:45
When global bestpath is enabled, fsg_search_hyp() can return the lattice
bestpath string, which may be shorter than the forced transcript. The
align CLI already turns bestpath off; Decoder() does not. Force
fsgs->bestpath false for _align so ps_get_hyp matches the full path and
cython/test/alignment_test.py::test_default_lm passes.

@dhdaines dhdaines left a comment

I think this is good - I presume the reason for this is so that you can use the same ps_decoder_t for both recognition and alignment?

Or is it just to get better defaults in alignment mode? In this case, I still think it's fine, though another option would simply be to ignore -pbeam and -wbeam when aligning. This might not be as obvious to the user, though. What do you think?

Comment thread cython/_pocketsphinx.pyx
segmentation in the usual manner. For phone-level alignment,
see `set_alignment` and `get_alignment`.

Pruning for this pass uses ``align_beam``, ``align_pbeam``, and

Should these be single or double backquotes?
