Search and Discovery

Nomad Media supports several layers of search and discovery, from classic metadata and keyword search to transcript-driven discovery and richer multimodal AI search.

The most important concept is that different search experiences are powered by different outputs. A transcript, a subtitle file, an image description, and a vector-based multimodal index are related, but they are not interchangeable.

Programmatic search: To run these searches from code, use the SDK search method. See SDK search and Real-world search patterns.

Search Layers in Nomad Media

Search layer	What it searches	Typical use case
Metadata / filename search	Titles, metadata fields, filenames, tags	Structured search when assets are already well-described
Transcript full-text search	Spoken words transcribed from audio/video	Finding an interview, meeting, podcast, or speech by what was said
Image search	Visual descriptions, detected text, concepts from images	Finding still images by what appears in them
Deep video search	Visual and spoken content with time-coded relevance	Finding the exact segment where a scene, action, or discussion happens

What Happens When Transcription Is Enabled

When AI transcription is enabled for audio or video:

the audio is extracted and transcribed
the transcript text is indexed for search
a VTT subtitle file is generated for playback in web players
an SRT subtitle file can optionally be created for other downstream workflows

This means transcription improves two things at once:

searchability of spoken content
playback usability through subtitles

Important: it is the indexed transcript text that makes content searchable, not the presence of the VTT file by itself.
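To make the distinction concrete, the following minimal sketch pulls the spoken text out of a WebVTT document, which is the kind of plain text a search index works with. It is an illustration only: the sample cues and the function name vtt_to_plain_text are hypothetical, and this is not a description of how Nomad Media builds its index internally.

```python
import re

# Hypothetical sample subtitle content, for illustration only.
SAMPLE_VTT = """WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the quarterly budget review.

00:00:04.500 --> 00:00:08.000
<v Dana>We will start with staffing.
"""


def vtt_to_plain_text(vtt: str) -> str:
    """Return only the cue text from a WebVTT document, without timings."""
    spoken = []
    in_cue = False
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current cue.
            in_cue = False
            continue
        if "-->" in line:
            # A timing line starts the cue payload.
            in_cue = True
            continue
        if in_cue:
            # Drop inline markup such as <v Dana> voice tags.
            spoken.append(re.sub(r"<[^>]+>", "", line))
    return " ".join(spoken)


print(vtt_to_plain_text(SAMPLE_VTT))
```

The timings and markup matter for playback; only the remaining text is useful as searchable content.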

Subtitle Files vs. Search Indexes

These are often confused, so it is worth separating them clearly.

Subtitle outputs

VTT is the web-native subtitle format used by web video players
SRT is a closely related subtitle format commonly used in editing and delivery workflows
subtitle files are valuable for playback, accessibility, review, and export
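The two formats are close enough that a simple conversion is mostly mechanical: SRT numbers each cue, uses a comma instead of a period before the milliseconds, and has no WEBVTT header. The sketch below is a simplified illustration with a hypothetical function name (vtt_to_srt). It assumes hh:mm:ss.mmm timestamps and ignores cue settings, NOTE blocks, and styling, so it is not a full converter.

```python
import re

# Matches a VTT timing line that uses full hh:mm:ss.mmm timestamps.
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2})\.(\d{3})\s+-->\s+(\d{2}:\d{2}:\d{2})\.(\d{3})"
)

# Hypothetical sample content, for illustration only.
SAMPLE_VTT = """WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the review.

00:00:04.500 --> 00:00:08.000
We will start with staffing.
"""


def vtt_to_srt(vtt: str) -> str:
    """Convert simple WebVTT cues to SRT text."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                start = f"{match.group(1)},{match.group(2)}"
                end = f"{match.group(3)},{match.group(4)}"
                cues.append((start, end, "\n".join(lines[i + 1:])))
                break  # Blocks without a timing line (the header) are skipped.
    numbered = [
        f"{n}\n{start} --> {end}\n{text}"
        for n, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n\n".join(numbered) + "\n"


print(vtt_to_srt(SAMPLE_VTT))
```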

Search outputs

transcript text is indexed into the search layer
once indexed, users can search the spoken content as free text
richer AI layers can be added later on top of the same content

If you need the file format details for manually uploaded subtitle files, see Metadata Management.

OpenSearch-Based Transcript Search

Once transcripts are indexed, their contents become searchable as free text.

This is the main search benefit of enabling transcription for audio and video libraries.

What it is good at

locating assets based on words or phrases that were spoken
expanding search beyond manually entered metadata
helping users discover relevant content even when filenames are weak

Important behavior to understand

transcript search queries a search index; it is not a verbatim transcript viewer
small or common words may be ignored during indexing
indexing behavior is language-aware
in non-LLM search flows, users usually get the best results when they search in the same language as the indexed transcript

This is why transcript search is powerful, but still different from richer multimodal or vector-based search.
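The behaviors listed above can be illustrated with a toy inverted index. This is a deliberately small stand-in, not the analyzer the platform actually uses: the stop-word list, asset IDs, and function names are all hypothetical. It shows why a query made only of common words returns nothing, and why matching happens on indexed terms rather than on the verbatim transcript.

```python
import re
from collections import defaultdict

# Hypothetical, tiny English stop-word list. Real analyzers use longer,
# language-specific lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "we", "will"}


def analyze(text: str) -> list[str]:
    """Lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each indexed term to the set of assets that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for asset_id, text in transcripts.items():
        for term in analyze(text):
            index[term].add(asset_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return assets that contain every non-stop-word term in the query."""
    terms = analyze(query)
    if not terms:
        return set()  # The query consisted only of ignored words.
    return set.intersection(*(index.get(t, set()) for t in terms))


transcripts = {
    "interview-01": "We will discuss the budget cuts and staff reductions.",
    "podcast-07": "The band is in the studio.",
}
index = build_index(transcripts)

print(search(index, "budget cuts"))  # {'interview-01'}
print(search(index, "the"))          # set()
```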

LLM-Enhanced Search on Transcript Content

After baseline transcription is working, richer AI search can be layered on top.

This generally enables:

more natural-language queries
concept-oriented retrieval rather than exact-word matching only
better matching of descriptive searches
segment-level relevance in supported experiences

For example, instead of searching only for exact quoted words, users can search in a more conversational way such as "find the interview where they discuss budget cuts and staff reductions."
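One common way to support this kind of concept-oriented matching is to compare vectors instead of words. The sketch below ranks assets by cosine similarity. The three-dimensional vectors are invented by hand purely for illustration; a real system would obtain embeddings from a model, and nothing here should be read as the specific approach used by Nomad Media.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical hand-made "embeddings" with axes (finance, staffing, music).
ASSET_VECTORS = {
    "interview-01": [0.9, 0.8, 0.0],
    "concert-12": [0.0, 0.1, 0.9],
    "town-hall-03": [0.6, 0.2, 0.1],
}


def rank(query_vector: list[float], k: int = 2) -> list[str]:
    """Return the k asset IDs most similar to the query vector."""
    scored = [
        (cosine(query_vector, vector), asset_id)
        for asset_id, vector in ASSET_VECTORS.items()
    ]
    scored.sort(reverse=True)
    return [asset_id for _, asset_id in scored[:k]]


# Stand-in vector for a query about budget cuts and staff reductions.
query = [0.8, 0.9, 0.0]
print(rank(query))  # ['interview-01', 'town-hall-03']
```

Because ranking depends on closeness in meaning rather than shared words, an asset can rank well even when the query wording never appears in its transcript.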

Language behavior

Multimodal and vector-based search can reduce some of the friction of strict same-language keyword search. In many deployments this improves discovery across languages, although it is not perfect and should still be tested with representative content.

Image Search

Image enrichment expands search from text and spoken words into still imagery.

Depending on the enabled processors, image search can use:

generated visual descriptions
detected text inside the image
object and concept detection
richer multimodal understanding for natural-language search

This lets users search for images using descriptive phrases rather than relying only on filenames or manual tags.

Deep Video Search

Deep video search extends the same idea into moving images.

Instead of only finding the asset, supported deep video search experiences can also help users jump toward the relevant moment in the video.

Typical capabilities include:

time-coded visual descriptions
time-coded on-screen text detection
concept detection over time
segment-oriented search results
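A segment-oriented result can be pictured as a record that carries a time range alongside the matched description. The sketch below is a hypothetical data shape, not the actual response format: the Segment fields and the substring match are assumptions, with the substring match standing in for real relevance scoring.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical time-coded search hit within a video asset."""
    asset_id: str
    start: float       # Seconds from the start of the video.
    end: float
    description: str   # Visual description or detected on-screen text.


def format_timecode(seconds: float) -> str:
    """Format a second offset as hh:mm:ss for display."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def find_segments(segments: list[Segment], phrase: str) -> list[Segment]:
    """Return segments whose description contains the phrase, in time order."""
    needle = phrase.lower()
    hits = [s for s in segments if needle in s.description.lower()]
    return sorted(hits, key=lambda s: (s.asset_id, s.start))


segments = [
    Segment("keynote-04", 3725.0, 3760.0, "Speaker shows a slide titled Budget 2025"),
    Segment("keynote-04", 120.0, 150.0, "Audience applauds as the speaker walks on stage"),
    Segment("keynote-04", 900.0, 930.0, "Close-up of a budget spreadsheet on screen"),
]

for hit in find_segments(segments, "budget"):
    print(hit.asset_id, format_timecode(hit.start), hit.description)
```

Returning the start offset with each hit is what lets a player jump to the relevant moment instead of opening the video at the beginning.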

Deep video search is often the most striking AI capability in demos, but it is usually rolled out after transcription because it is a larger cost commitment.

Common Search Limitations

Music-heavy content

Speech transcription models perform poorly on music-heavy material. Songs, dense mixes, and heavily stylized audio often produce weak transcripts, which means weak transcript search.

First audio track matters

For AI transcription workflows, the system analyzes the first audio track from the proxy used for transcription. If your source content has multiple separated tracks, transcription quality may depend on proxy/transcode configuration. See Proxy Generation Overview and Transcoding.

Subtitle output is not the same as closed captions

Auto-generated subtitle outputs are intended primarily for playback. If your workflow requires broadcast-style closed-caption behavior or highly specialized caption compliance, validate the output requirements separately.

Search quality depends on content quality

Clear speech, representative languages, readable on-screen text, and clean source media all improve results. Low-quality source media leads to lower-quality search outputs.

Best Practice: Test Before Broad Rollout

Before enabling richer AI search across a full catalog, create a test bed that is representative of your content. Include:

audio files
video files
image files
content in each language you expect to support
both clear and difficult examples

Then run the same searches across each AI tier and compare the difference in results. See Phased AI Rollout.
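One simple way to compare tiers is to record which assets each tier returned for a fixed set of queries and score them against assets a reviewer judged relevant. The sketch below uses recall as the score. The tier names, queries, asset IDs, and results are all made-up placeholders; in practice you would fill them in from your own test bed.

```python
def recall(found: list[str], expected: set[str]) -> float:
    """Fraction of the expected assets that appear in the found list."""
    if not expected:
        return 1.0
    return len(set(found) & expected) / len(expected)


# Hypothetical relevance judgments: query -> assets a reviewer expects.
EXPECTED = {
    "budget cuts": {"interview-01", "town-hall-03"},
    "red bicycle": {"photo-22"},
}

# Hypothetical results recorded from each tier for the same queries.
RESULTS = {
    "metadata": {"budget cuts": ["town-hall-03"], "red bicycle": []},
    "transcript": {"budget cuts": ["interview-01", "town-hall-03"], "red bicycle": []},
    "multimodal": {
        "budget cuts": ["interview-01", "town-hall-03"],
        "red bicycle": ["photo-22"],
    },
}


def compare_tiers(
    results: dict[str, dict[str, list[str]]],
    expected: dict[str, set[str]],
) -> dict[str, float]:
    """Average recall per tier across all test queries."""
    report = {}
    for tier, by_query in results.items():
        scores = [
            recall(by_query.get(query, []), relevant)
            for query, relevant in expected.items()
        ]
        report[tier] = sum(scores) / len(scores)
    return report


print(compare_tiers(RESULTS, EXPECTED))
# {'metadata': 0.25, 'transcript': 0.5, 'multimodal': 1.0}
```

Recall alone ignores ranking and false positives, so treat a report like this as a starting point for review rather than a final measure of search quality.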

Related Documentation

AI Enablement Guide: rollout order, costs, dependencies, and capability planning
Phased AI Rollout: how to test and stage AI before go-live
Metadata Management: subtitle file handling and VTT/SRT details
Proxy Generation Overview: proxy dependencies that affect transcription and playback
Rules Engine Overview: folder and rule scoping for AI processing
SDK search: run searches programmatically with the SDK
Real-world search patterns: recipe combining filters, sorting, and pagination