The Challenge: Extracting and Generating Metadata at Scale
DPG Media, a leading media company in Benelux, operates multiple online platforms and TV channels. With a growing library of long-form video content, they recognized the importance of efficiently managing and enhancing video metadata. This metadata includes actor information, genre, summary of episodes, mood, and more. Accurate metadata is key to providing TV guide descriptions, improving content recommendations, and enhancing the consumer’s ability to explore content that aligns with their interests and current mood.
The company receives video productions accompanied by a wide range of marketing materials, such as visual media and brief descriptions. However, these materials often lack standardization and vary in quality. As a result, DPG Media producers have to run a screening process to consume and understand the content sufficiently to generate the missing metadata.
Solution Overview
To address the challenges of automation, DPG Media decided to implement a combination of AI techniques and existing metadata to generate new, accurate content and category descriptions, mood, and context. The project focused solely on audio processing due to its cost-efficiency and faster processing time.
Step 1: Generate Transcriptions of Audio Tracks
To generate the necessary audio transcripts for metadata extraction, DPG Media used speech recognition models to generate accurate transcripts of the audio content. The team evaluated two different transcription strategies: Whisper-v3-large, which requires at least 10 GB of vRAM and high operational processing, and Amazon Transcribe, a managed service with the added benefit of automatic model updates from AWS over time and speaker diarization.
Step 2: Generate Metadata
DPG Media used LLMs through Amazon Bedrock to generate the various categories of metadata (summaries, genre, mood, key events, and so on). The team selected the Anthropic Claude 3 Sonnet model based on internal testing and tuned the prompts to ensure the generated metadata matched the expected format and style.
Results and Lessons Learned
The implementation of the AI-powered metadata pipeline has been a transformative journey for DPG Media. Their approach saves days of work generating metadata for a TV series. The solution also stores the direct association between each type of metadata and its corresponding system prompt, making it straightforward to tune, remove, or add prompts as needed.
Conclusion
In this post, we shared how DPG Media introduced AI-powered processes using Amazon Bedrock into its video publication pipelines. This solution can help accelerate audio metadata extraction, create a more engaging user experience, and save time.
About the Authors
Lucas Desard is GenAI Engineer at DPG Media. He helps DPG Media integrate generative AI efficiently and meaningfully into various company processes.
Tom Lauwers is a machine learning engineer on the video personalization team for DPG Media. He builds and architects the recommendation systems for DPG Media’s long-form video platforms, supporting brands like VTM GO, Streamz, and RTL play.
Sam Landuydt is the Area Manager Recommendation & Search at DPG Media. As the manager of the team, he guides ML and software engineers in building recommendation systems and generative AI solutions for the company.
Irina Radu is a Senior Prototyping Engagement Manager, part of AWS EMEA Prototyping and Cloud Engineering. She helps customers get the most out of the latest tech, innovate faster, and think bigger.
Fernanda Machado is a Senior AWS Prototyping Architect. She helps customers bring ideas to life and use the latest best practices for modern applications.
Andrew Shved, Senior AWS Prototyping Architect, helps customers build business solutions that use innovations in modern applications, big data, and AI.
FAQs
Q: What are the challenges of automating metadata generation?
A: The challenges of automating metadata generation include language diversity, variability in content volume, release frequency, and data aggregation.
Q: What is Amazon Transcribe?
A: Amazon Transcribe is a managed service that provides automatic speech recognition (ASR) capabilities, enabling you to transcribe and analyze audio and video content.
Q: What is Amazon Bedrock?
A: Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Q: How does Amazon Bedrock generate metadata?
A: Amazon Bedrock uses LLMs to generate metadata from transcribed audio content, including summaries, genre, mood, key events, and more.