Search and Discovery

How transcript search, OpenSearch indexing, subtitles, and multimodal AI search work in Nomad Media.

Nomad Media supports several layers of search and discovery, from classic metadata and keyword search to transcript-driven discovery and richer multimodal AI search.

The most important concept is that different search experiences are powered by different outputs. A transcript, a subtitle file, an image description, and a vector-based multimodal index are related, but they are not interchangeable.

Programmatic search: To run these searches from code, use the SDK search method. See "search" and "Real-world search patterns".


Search Layers in Nomad Media

Search layer | What it searches | Typical use case
Metadata / filename search | Titles, metadata fields, filenames, tags | Structured search when assets are already well-described
Transcript full-text search | Spoken words transcribed from audio/video | Finding an interview, meeting, podcast, or speech by what was said
Image search | Visual descriptions, detected text, concepts from images | Finding still images by what appears in them
Deep video search | Visual and spoken content with time-coded relevance | Finding the exact segment where a scene, action, or discussion happens

What Happens When Transcription Is Enabled

When AI transcription is enabled for audio or video:

  1. the audio is extracted and transcribed
  2. the transcript text is indexed for search
  3. a VTT subtitle file is generated for playback in web players
  4. an SRT subtitle file can optionally be created for other downstream workflows

This means transcription improves two things at once:

  • searchability of spoken content
  • playback usability through subtitles

Important: it is the indexed transcript text that makes content searchable, not the presence of the VTT file by itself.
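
The split between the two outputs can be illustrated with a minimal sketch. It assumes the transcription step yields time-coded segments; the `Segment` type, the `asset_id` and `transcript` field names, and both helper functions are hypothetical and are not part of the Nomad Media API. The same segments feed a text-only index record (what makes content searchable) and a WebVTT file (what the player displays).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the media
    end: float
    text: str


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_vtt(segments: list[Segment]) -> str:
    """Render segments as a WebVTT document for playback."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{vtt_timestamp(seg.start)} --> {vtt_timestamp(seg.end)}")
        lines.append(seg.text)
        lines.append("")
    return "\n".join(lines)


def to_index_document(asset_id: str, segments: list[Segment]) -> dict:
    """Build the text-only record a search index would store (hypothetical shape)."""
    return {"asset_id": asset_id, "transcript": " ".join(s.text for s in segments)}


segments = [
    Segment(0.0, 2.5, "Welcome to the budget meeting."),
    Segment(2.5, 5.0, "Let us begin."),
]
subtitle_file = to_vtt(segments)                      # used by the web player
index_record = to_index_document("a1", segments)      # used by search
```

Deleting the VTT file in this sketch would leave `index_record` untouched, which mirrors the point above: search depends on the indexed text, not on the subtitle file.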


Subtitle Files vs. Search Indexes

These are often confused, so it is worth separating them clearly.

Subtitle outputs

  • VTT is the web-native subtitle format used by web video players
  • SRT is a closely related subtitle format commonly used in editing and delivery workflows
  • subtitle files are valuable for playback, accessibility, review, and export

Search outputs

  • transcript text is indexed into the search layer
  • once indexed, users can search the spoken content as free text
  • richer AI layers can be added later on top of the same content

If you need the file format details for manually uploaded subtitle files, see Metadata Management.


OpenSearch-Based Transcript Search

Once transcripts are indexed, their contents become searchable as free text.

This is the main search benefit of enabling transcription for audio and video libraries.
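
For readers familiar with OpenSearch, a free-text transcript search corresponds roughly to a `match` query. The sketch below builds such a query body using the standard OpenSearch query DSL. The `transcript` field name is an assumption made for illustration; Nomad Media's actual index layout is not described here, and the SDK search method remains the documented way to search from code.

```python
def build_transcript_query(phrase: str, size: int = 10) -> dict:
    """Build an OpenSearch query body that matches spoken words.

    "transcript" is a hypothetical field name used for illustration.
    """
    return {
        "size": size,
        # operator "and" requires every analyzed term of the phrase to be present
        "query": {"match": {"transcript": {"query": phrase, "operator": "and"}}},
        # ask OpenSearch to return the matching snippets from the transcript
        "highlight": {"fields": {"transcript": {}}},
    }


body = build_transcript_query("budget cuts")
```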

What it is good at

  • locating assets based on words or phrases that were spoken
  • expanding search beyond manually entered metadata
  • helping users discover relevant content even when filenames are weak

Important behavior to understand

  • transcript search is still a search index, not a verbatim transcript viewer
  • small or common words (stop words such as "the" or "and") may be ignored during indexing
  • indexing behavior is language-aware
  • in non-LLM search flows, users usually get the best results when they search in the same language as the indexed transcript
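
The behaviors above can be illustrated with a toy analyzer. This is a simplified sketch, not the analysis chain Nomad Media or OpenSearch actually uses: the stop-word lists are tiny stand-ins, and real analyzers also apply stemming and other language-specific rules.

```python
import re

# Tiny illustrative stop-word lists; real analyzers ship much larger ones.
STOP_WORDS = {
    "en": {"the", "a", "an", "and", "of", "to", "in", "is"},
    "es": {"el", "la", "los", "las", "de", "y", "en", "un", "una"},
}


def analyze(text: str, language: str) -> list[str]:
    """Lowercase, split into word tokens, and drop stop words for the language."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS[language]]


def matches(query: str, transcript: str, language: str) -> bool:
    """Return True when every surviving query term appears in the transcript."""
    query_terms = analyze(query, language)
    indexed_terms = set(analyze(transcript, language))
    return bool(query_terms) and all(t in indexed_terms for t in query_terms)


transcript = "The mayor announced cuts to the budget in the spring."

found_keywords = matches("budget cuts", transcript, "en")             # terms are indexed
found_stop_word = matches("the", transcript, "en")                    # stop word is dropped
found_spanish = matches("recortes de presupuesto", transcript, "es")  # wrong language
```

In this sketch only the first query matches: the stop word never reaches the index, and the Spanish query shares no terms with the English transcript.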

This is why transcript search is powerful, but still different from richer multimodal or vector-based search.


LLM-Enhanced Search on Transcript Content

After baseline transcription is working, richer AI search can be layered on top.

This generally enables:

  • more natural-language queries
  • concept-oriented retrieval rather than exact-word matching only
  • better matching of descriptive searches
  • segment-level relevance in supported experiences

For example, instead of searching only for exact quoted words, users can search in a more conversational way such as "find the interview where they discuss budget cuts and staff reductions."

Language behavior

Multimodal and vector-based search can reduce some of the friction of strict same-language keyword search. In many deployments this improves discovery across languages, although it is not perfect and should still be tested with representative content.
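
Concept-oriented retrieval is commonly built on vector similarity. The sketch below is a generic illustration under that assumption, not a description of Nomad Media's internals. The three-dimensional vectors are hand-made stand-ins for model embeddings, and the segment names are hypothetical. Segments are ranked by cosine similarity to the query vector, so a segment can rank highly without sharing any exact words with the query.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made vectors standing in for embeddings.
# Dimensions loosely mean: finance, staffing, sports.
SEGMENTS = {
    "interview-budget": [0.9, 0.8, 0.0],
    "match-highlights": [0.0, 0.1, 0.9],
    "hiring-update": [0.2, 0.9, 0.1],
}


def rank(query_vector: list[float], segments: dict[str, list[float]]) -> list[str]:
    """Return segment names ordered from most to least similar."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in segments.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Stand-in embedding for "budget cuts and staff reductions".
query = [0.8, 0.7, 0.0]
ranking = rank(query, SEGMENTS)
```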


Image Search

Image enrichment expands search from text and spoken words into still imagery.

Depending on the enabled processors, image search can use:

  • generated visual descriptions
  • detected text inside the image
  • object and concept detection
  • richer multimodal understanding for natural-language search

This lets users search for images using descriptive phrases rather than relying only on filenames or manual tags.


Deep Video Search

Deep video search extends the same idea into moving images.

Instead of only finding the asset, supported deep video search experiences can also help users jump toward the relevant moment in the video.

Typical capabilities include:

  • time-coded visual descriptions
  • time-coded on-screen text detection
  • concept detection over time
  • segment-oriented search results

Deep video search is one of the most striking AI capabilities in demos, but it is usually rolled out after transcription because it is a larger cost commitment.
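
Segment-oriented results can be pictured as records that carry their own time range. The sketch below is a simplified illustration with a hypothetical `VideoSegment` type. Plain keyword overlap stands in for whatever relevance scoring a real deployment uses. The point is that each hit carries a start time a player can seek to.

```python
from dataclasses import dataclass


@dataclass
class VideoSegment:
    asset_id: str
    start: float  # seconds
    end: float
    description: str  # e.g. a generated visual description or detected on-screen text


def find_moments(query: str, segments: list[VideoSegment], limit: int = 3) -> list[VideoSegment]:
    """Rank segments by how many query words appear in their description."""
    terms = set(query.lower().split())
    scored = []
    for seg in segments:
        overlap = len(terms & set(seg.description.lower().split()))
        if overlap:
            scored.append((overlap, seg))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for _, seg in scored[:limit]]


segments = [
    VideoSegment("a1", 0.0, 30.0, "presenter standing at a podium"),
    VideoSegment("a1", 30.0, 75.0, "red car driving along a coastal road"),
    VideoSegment("a2", 10.0, 40.0, "crowd cheering in a stadium"),
]
hits = find_moments("red car", segments)
seek_to = hits[0].start  # the player would jump to this offset in asset "a1"
```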


Common Search Limitations

Music-heavy content

Speech transcription models perform poorly on music-heavy material. Songs, dense mixes, and heavily stylized audio often produce weak transcripts, which means weak transcript search.

First audio track matters

For AI transcription workflows, the system analyzes the first audio track from the proxy used for transcription. If your source content has multiple separated tracks, transcription quality may depend on proxy/transcode configuration. See Proxy Generation Overview and Transcoding.

Subtitle output is not the same as closed captions

Auto-generated outputs are subtitle files intended primarily for playback. If your workflow requires broadcast-style closed-caption behavior or highly specialized caption compliance, validate the output requirements separately.

Search quality depends on content quality

Clear speech, representative languages, readable on-screen text, and clean source media all improve results. Low-quality source media leads to lower-quality search outputs.


Best Practice: Test Before Broad Rollout

Before enabling richer AI search across a full catalog, create a test bed that is representative of your content, including:

  • audio files
  • video files
  • image files
  • a mix of languages
  • both clear and difficult examples

Then run the same searches against each AI tier and compare the results. See Phased AI Rollout.
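
One way to make the comparison concrete is to record which assets each tier returns for a fixed set of queries and score them against the assets you expected. The sketch below assumes you have already collected those results by hand or through the SDK. The tier names, queries, and asset IDs are hypothetical, and mean recall is only one possible measure.

```python
def recall(found: list[str], expected: set[str]) -> float:
    """Fraction of expected assets that appear in the results."""
    if not expected:
        return 0.0
    return len(expected & set(found)) / len(expected)


def compare_tiers(
    results_by_tier: dict[str, dict[str, list[str]]],
    expected_by_query: dict[str, set[str]],
) -> dict[str, float]:
    """Return the mean recall per tier across all test queries."""
    summary = {}
    for tier, results in results_by_tier.items():
        scores = [
            recall(results.get(query, []), expected)
            for query, expected in expected_by_query.items()
        ]
        summary[tier] = sum(scores) / len(scores)
    return summary


expected = {"budget cuts": {"a1", "a2"}, "red car": {"a3"}}
observed = {
    "metadata": {"budget cuts": ["a1"], "red car": []},
    "transcript": {"budget cuts": ["a1", "a2"], "red car": []},
    "multimodal": {"budget cuts": ["a1", "a2"], "red car": ["a3", "a9"]},
}
summary = compare_tiers(observed, expected)
```

Recall ignores irrelevant extra results (such as `a9` above), so a fuller evaluation would also look at precision and at result ordering.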


Related Pages