Search and Discovery
How transcript search, OpenSearch indexing, subtitles, and multimodal AI search work in Nomad Media.
Nomad Media supports several layers of search and discovery, from classic metadata and keyword search to transcript-driven discovery and richer multimodal AI search.
The most important concept is that different search experiences are powered by different outputs. A transcript, a subtitle file, an image description, and a vector-based multimodal index are related, but they are not interchangeable.
Programmatic search: To run these searches from code, use the SDK `search` method. See `search` and Real-world search patterns.
Search Layers in Nomad Media
| Search layer | What it searches | Typical use case |
|---|---|---|
| Metadata / filename search | Titles, metadata fields, filenames, tags | Structured search when assets are already well-described |
| Transcript full-text search | Spoken words transcribed from audio/video | Finding an interview, meeting, podcast, or speech by what was said |
| Image search | Visual descriptions, detected text, concepts from images | Finding still images by what appears in them |
| Deep video search | Visual and spoken content with time-coded relevance | Finding the exact segment where a scene, action, or discussion happens |
What Happens When Transcription Is Enabled
When AI transcription is enabled for audio or video:
- the audio is extracted and transcribed
- the transcript text is indexed for search
- a VTT subtitle file is generated for playback in web players
- an SRT subtitle file can optionally be created for other downstream workflows
This means transcription improves two things at once:
- searchability of spoken content
- playback usability through subtitles
Important: it is the indexed transcript text that makes content searchable, not the presence of the VTT file by itself.
Subtitle Files vs. Search Indexes
These are often confused, so it is worth separating them clearly.
Subtitle outputs
- VTT is the web-native subtitle format used by web video players
- SRT is a closely related subtitle format commonly used in editing and delivery workflows
- subtitle files are valuable for playback, accessibility, review, and export
Search outputs
- transcript text is indexed into the search layer
- once indexed, users can search the spoken content as free text
- richer AI layers can be added later on top of the same content
If you need the file format details for manually uploaded subtitle files, see Metadata Management.
OpenSearch-Based Transcript Search
Once transcripts are indexed, their contents become searchable as free text.
This is the main search benefit of enabling transcription for audio and video libraries.
What it is good at
- locating assets based on words or phrases that were spoken
- expanding search beyond manually entered metadata
- helping users discover relevant content even when filenames are weak
Important behavior to understand
- transcript search queries a search index; it is not a verbatim transcript viewer
- small or common words may be ignored during indexing
- indexing behavior is language-aware
- in non-LLM search flows, users usually get the best results when they search in the same language as the indexed transcript
This is why transcript search is powerful, but still different from richer multimodal or vector-based search.
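
The behaviors above can be illustrated with a toy inverted index. This is a simplified model under stated assumptions, not Nomad Media's or OpenSearch's actual analyzer: the stop-word lists, the `analyze` function, and the sample transcripts are all invented for illustration. It shows why a query made only of common words returns nothing, and why a keyword query matches only transcripts indexed in the same language.

```python
import re
from collections import defaultdict

# Illustrative stop-word lists only; a real analyzer's lists are longer and differ.
STOP_WORDS = {
    "en": {"the", "a", "an", "of", "and", "to", "in", "is"},
    "es": {"el", "la", "de", "y", "en", "un", "una"},
}


def analyze(text: str, language: str) -> list[str]:
    """Lowercase, split into word tokens, and drop that language's stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS.get(language, set())]


def build_index(transcripts: dict[str, tuple[str, str]]) -> dict[str, set[str]]:
    """Map each kept token to the set of asset IDs whose transcript contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for asset_id, (language, text) in transcripts.items():
        for token in analyze(text, language):
            index[token].add(asset_id)
    return index


def search_transcripts(index: dict[str, set[str]], query: str, language: str) -> set[str]:
    """Return assets containing every non-stop-word token in the query."""
    tokens = analyze(query, language)
    if not tokens:
        return set()  # the query consisted only of ignored words
    return set.intersection(*(index.get(t, set()) for t in tokens))


# Hypothetical sample data: asset ID -> (language, transcript text).
transcripts = {
    "interview-01": ("en", "The director announced a reduction in the budget."),
    "entrevista-02": ("es", "La directora anunció una reducción del presupuesto."),
}
index = build_index(transcripts)

print(search_transcripts(index, "the budget", "en"))   # matches the English asset only
print(search_transcripts(index, "the", "en"))          # empty: only a stop word
print(search_transcripts(index, "presupuesto", "es"))  # matches the Spanish asset only
```

In this model the English query for "budget" never reaches the Spanish transcript, even though both assets discuss the same topic. That gap is what the richer search layers are meant to narrow.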
LLM-Enhanced Search on Transcript Content
After baseline transcription is working, richer AI search can be layered on top.
This generally enables:
- more natural-language queries
- concept-oriented retrieval rather than exact-word matching only
- better matching of descriptive searches
- segment-level relevance in supported experiences
For example, instead of searching only for exact quoted words, users can search in a more conversational way such as "find the interview where they discuss budget cuts and staff reductions."
Language behavior
Multimodal and vector-based search can reduce some of the friction of strict same-language keyword search. In many deployments this improves discovery across languages, although it is not perfect and should still be tested with representative content.
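
As a rough sketch of how vector-based retrieval differs from keyword matching, the example below ranks assets by cosine similarity between a query vector and per-asset vectors. Everything here is an assumption for illustration: the three-dimensional vectors are hand-written stand-ins for embeddings that a real model would produce, and the asset IDs are hypothetical. It does not describe how Nomad Media computes or stores its vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings. Dimensions loosely stand for: finance, staffing, sports.
ASSET_VECTORS = {
    "interview-01": [0.9, 0.8, 0.0],      # English interview about budget and staff
    "entrevista-02": [0.85, 0.75, 0.05],  # Spanish interview on the same topic
    "match-recap-07": [0.1, 0.0, 0.95],   # sports recap
}


def rank(query_vector: list[float], top_k: int = 2) -> list[str]:
    """Return the top_k asset IDs by descending cosine similarity."""
    scored = sorted(
        ASSET_VECTORS.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [asset_id for asset_id, _ in scored[:top_k]]


# Stand-in for the embedding of "budget cuts and staff reductions".
query_vector = [0.8, 0.9, 0.0]
print(rank(query_vector))
```

Because similarity is computed on meaning-bearing vectors rather than on shared tokens, the Spanish interview can rank alongside the English one for an English query. How well this works in practice depends on the model and the content, which is why testing with representative material matters.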
Image Search
Image enrichment expands search from text and spoken words into still imagery.
Depending on the enabled processors, image search can use:
- generated visual descriptions
- detected text inside the image
- object and concept detection
- richer multimodal understanding for natural-language search
This lets users search for images using descriptive phrases rather than relying only on filenames or manual tags.
Deep Video Search
Deep video search extends the same idea into moving images.
Instead of only finding the asset, supported deep video search experiences can also help users jump toward the relevant moment in the video.
Typical capabilities include:
- time-coded visual descriptions
- time-coded on-screen text detection
- concept detection over time
- segment-oriented search results
This is one of the most impressive AI capabilities in demos, but it is usually rolled out after transcription because it is a larger cost decision.
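
To show what a segment-oriented result looks like compared with an asset-level result, here is a minimal sketch. The `Segment` type, the sample data, and the plain substring matching are assumptions made for illustration; a real deep video search experience would use its own relevance ranking rather than this toy filter.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A time-coded description of part of a video (hypothetical shape)."""
    asset_id: str
    start: float  # seconds from the beginning of the video
    end: float
    description: str


def find_moments(segments: list[Segment], query_terms: list[str]) -> list[Segment]:
    """Return segments whose description mentions every term,
    grouped by asset and ordered by start time within each asset."""
    terms = [t.lower() for t in query_terms]
    hits = [s for s in segments if all(t in s.description.lower() for t in terms)]
    return sorted(hits, key=lambda s: (s.asset_id, s.start))


# Hypothetical time-coded descriptions for two videos.
segments = [
    Segment("townhall-12", 0.0, 42.0, "Host introduces the panel on stage"),
    Segment("townhall-12", 42.0, 95.5, "Slide showing a budget chart with on-screen text"),
    Segment("lecture-03", 310.0, 355.0, "Presenter draws a budget chart on a whiteboard"),
]

for hit in find_moments(segments, ["budget", "chart"]):
    print(f"{hit.asset_id} @ {hit.start:.0f}s: {hit.description}")
```

The useful part of the result is the start time: a player can seek to that offset instead of making the user scrub through the whole asset.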
Common Search Limitations
Music-heavy content
Speech transcription models perform poorly on music-heavy material. Songs, dense mixes, and heavily stylized audio often produce weak transcripts, which means weak transcript search.
First audio track matters
For AI transcription workflows, the system analyzes the first audio track from the proxy used for transcription. If your source content has multiple separated tracks, transcription quality may depend on proxy/transcode configuration. See Proxy Generation Overview and Transcoding.
Subtitle output is not the same as closed captions
Auto-generated subtitle outputs are primarily subtitle files for playback. If your workflow requires broadcast-style closed-caption behavior or highly specialized caption compliance, validate the output requirements separately.
Search quality depends on content quality
Clear speech, representative languages, readable on-screen text, and clean source media all improve results. Low-quality source media leads to lower-quality search outputs.
Best Practice: Test Before Broad Rollout
Before enabling richer AI search across a full catalog, create a test bed with representative:
- audio files
- video files
- image files
- different languages
- clear and difficult examples
Then run the same searches across each AI tier and compare the difference in results. See Phased AI Rollout.
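
One way to structure that comparison is a small harness that runs each test query against every tier and reports how many of the expected assets were found. The sketch below is an assumption-heavy illustration: the tier functions are stubs with hard-coded results standing in for real searches, and the queries, asset IDs, and `compare_tiers` helper are hypothetical. In a real test you would replace each stub with a call to your own search integration.

```python
from typing import Callable

# A search function takes a query string and returns matching asset IDs.
SearchFn = Callable[[str], list[str]]


def compare_tiers(
    tiers: dict[str, SearchFn],
    expectations: dict[str, set[str]],
) -> dict[str, float]:
    """Return, per tier, the fraction of expected assets found across all queries."""
    total = sum(len(expected) for expected in expectations.values())
    scores: dict[str, float] = {}
    for name, search_fn in tiers.items():
        found = sum(
            len(expected & set(search_fn(query)))
            for query, expected in expectations.items()
        )
        scores[name] = found / total if total else 0.0
    return scores


# Test queries mapped to the assets a reviewer expects each one to surface.
expectations = {
    "budget cuts": {"interview-01"},
    "whiteboard diagram": {"lecture-03"},
}

# Stub tiers with hard-coded results, standing in for real search calls.
tiers: dict[str, SearchFn] = {
    "metadata": lambda q: [],
    "transcript": lambda q: ["interview-01"] if q == "budget cuts" else [],
    "multimodal": lambda q: {
        "budget cuts": ["interview-01"],
        "whiteboard diagram": ["lecture-03"],
    }.get(q, []),
}

for tier, score in compare_tiers(tiers, expectations).items():
    print(f"{tier}: {score:.0%} of expected assets found")
```

Keeping the expected results in one place makes it easy to rerun the same comparison after each configuration change.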
Related Pages
- AI Enablement Guide — rollout order, costs, dependencies, and capability planning
- Phased AI Rollout — how to test and stage AI before go-live
- Metadata Management — subtitle file handling and VTT/SRT details
- Proxy Generation Overview — proxy dependencies that affect transcription and playback
- Rules Engine Overview — folder and rule scoping for AI processing
- SDK `search` — run searches programmatically with the SDK
- Real-world search patterns — recipe combining filters, sorting, and pagination
