Construct a video insights and summarization engine utilizing generative AI with Amazon Bedrock

Professionals in all kinds of industries have adopted digital video conferencing instruments as a part of their common conferences with suppliers, colleagues, and clients. These conferences usually contain exchanging data and discussing actions that a number of events should take after the session. The normal approach to ensure data and actions aren’t forgotten is to take notes throughout the session; a guide and tedious course of that may be error-prone, significantly in a high-activity or high-pressure situation. Moreover, these notes are normally private and never saved in a central location, which is a misplaced alternative for companies to be taught what does and doesn’t work, in addition to easy methods to enhance their gross sales, buying, and communication processes.

This put up presents an answer the place you possibly can add a recording of your assembly (a characteristic out there in most fashionable digital communication providers comparable to Amazon Chime) to a centralized video insights and summarization engine. This engine makes use of synthetic intelligence (AI) and machine studying (ML) providers and generative AI on AWS to extract transcripts, produce a abstract, and supply a sentiment for the decision. The answer notes the logged actions per particular person and offers steered actions for the uploader. All of this information is centralized and can be utilized to enhance metrics in eventualities comparable to gross sales or name facilities. Many business generative AI options out there are costly and require user-based licenses. In distinction, our resolution is an open-source mission powered by Amazon Bedrock, providing a cheap different with out these limitations.

This resolution may help your organizations’ gross sales, gross sales engineering, and assist features turn into extra environment friendly and customer-focused by lowering the necessity to take notes throughout buyer calls.

Use case overview

The group on this situation has seen that in buyer calls, some actions usually get skipped because of the complexity of the discussions, and that there is likely to be potential to centralize buyer information to raised perceive easy methods to enhance buyer interactions in the long term. The group already information classes in video format, however these movies are sometimes saved in particular person repositories, and a assessment of the entry logs has proven that workers not often use them of their day-to-day actions.

To extend effectivity, scale back the load, and achieve higher insights, this resolution appears to be like at easy methods to use generative AI to research recorded movies and supply workers with helpful insights regarding their calls. It additionally helps audio recordsdata so you may have flexibility round the kind of name recordings you utilize. Generated name transcripts and insights embody dialog abstract, sentiment, an inventory of logged actions, and a set of steered subsequent greatest actions. These insights are saved in a central repository, unlocking the power for analytics groups to have a single view of interactions and use the information to formulate higher gross sales and assist methods.

Organizations usually can’t predict their name patterns, so the answer depends on AWS serverless providers to scale throughout busy occasions. This allows you to sustain with peak calls for, but additionally scale down to scale back prices throughout occasions comparable to seasonal holidays when the gross sales, engineering, and assist groups are away.

This put up offers steering on how one can create a video insights and summarization engine utilizing AWS AI/ML providers. We stroll by the important thing elements and providers wanted to construct the end-to-end structure, providing instance code snippets and explanations for every vital ingredient that assist obtain the core performance. This strategy ought to allow you to know the underlying architectural ideas and offers flexibility so that you can both combine these into present workloads or use them as a basis to construct a brand new workload.

Answer overview

The next diagram illustrates the pipeline for the video insights and summarization engine.

To allow the video insights resolution, the structure makes use of a mixture of AWS providers, together with the next:

Amazon API Gateway is a totally managed service that makes it simple for builders to create, publish, keep, monitor, and safe APIs at scale.
Amazon Bedrock is a totally managed service that gives a alternative of high-performing basis fashions (FMs) from main AI firms like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon by a single API, together with a broad set of capabilities to construct generative AI functions with safety, privateness, and accountable AI.
Amazon DynamoDB is a totally managed NoSQL database service that gives quick and predictable efficiency with seamless scalability.
AWS Lambda is an event-driven compute service that permits you to run code for just about any kind of utility or backend service with out provisioning or managing servers. You may invoke Lambda features from over 200 AWS providers and software-as-a-service (SaaS) functions.
Amazon Easy Storage Service (Amazon S3) is an object storage service providing industry-leading scalability, information availability, safety, and efficiency. You need to use Amazon S3 to securely retailer objects and in addition serve static web sites.
Amazon Transcribe is an computerized speech recognition (ASR) service that makes it simple for builders so as to add speech-to-text functionality to their functions.

For integration between providers, we use API Gateway as an occasion set off for our Lambda operate, and DynamoDB as a extremely scalable database to retailer our buyer particulars. Lastly, video or audio recordsdata uploaded are saved securely in an S3 bucket.

The top-to-end resolution for the video insights and summarization engine begins with the UI. We construct a easy static internet utility hosted in Amazon S3 and deploy an Amazon CloudFront distribution to serve the static web site for low latency and excessive switch speeds. We use CloudFront origin entry management (OAC) to safe Amazon S3 origins and allow entry to the designated CloudFront distributions solely. With Amazon Cognito, we’re in a position to defend the online utility from unauthenticated customers.

We use API Gateway because the entry level for real-time communications between the frontend and backend of the video insights and summarization engine, whereas controlling entry utilizing Amazon Cognito because the authorizer. With Lambda integration, we are able to create an online API with an endpoint to the Lambda operate.

To start out the workflow, add a uncooked video file instantly into an S3 bucket with the pre-signed URL given by API Gateway and a Lambda operate. The up to date video is fed into Amazon Transcribe, which converts the speech of the video right into a video transcript in textual content format. Lastly, we use massive language fashions (LLMs) out there by Amazon Bedrock to summarize the video transcript and extract insights from the video content material.

The answer shops uploaded movies and video transcripts in Amazon S3, which gives sturdy, extremely out there, and scalable information storage at a low price. We additionally retailer the video summaries, sentiments, insights, and different workflow metadata in DynamoDB, a NoSQL database service that lets you rapidly maintain monitor of the workflow standing and retrieve related data from the unique video.

We additionally use Amazon CloudWatch and Amazon EventBridge to observe each element of the workflow in actual time and reply as essential.

AI/ML workflow

On this put up, we deal with the workflow utilizing AWS AI/ML providers to generate the summarized content material and extract insights from the video transcript.

Beginning with the Amazon Transcribe StartTranscriptionJob API, we transcribe the unique video saved in Amazon S3 right into a JSON file. The next code reveals an instance of this utilizing Python:

job_args = {
    'TranscriptionJobName': jobId,
    'Media': {'MediaFileUri': media_uri},
    'MediaFormat': media_format,
    'LanguageCode': language_code,
    'Subtitles': {'Codecs': ['srt']},
    'OutputBucketName': output_bucket_name,
    'OutputKey': jobId + ".json"
}
if vocabulary_name will not be None:
    job_args['Settings'] = {'VocabularyName': vocabulary_name}
response = transcribe_client.start_transcription_job(**job_args)

The next is an instance of our workload’s Amazon Transcribe output in JSON format:

{
    "jobName": "a37f0f27-0908-45eb-8d98-8efc3a9d4590-1698392975",
    "accountId": "8469761*****",
    "outcomes": {
        "transcripts": [{
                "transcript": "Thank you for calling, my name is Ivy. Can I have your name?..."}],
        "objects": [{
                "start_time": "7.809","end_time": "8.21",
                "alternatives": [{
                        "confidence": "0.998","content": "Thank"}],
                "kind": "pronunciation"
            },
            ...
        ]
    },
    "standing": "COMPLETED"
}

Because the output from Amazon Transcribe is created and saved in Amazon S3, we use Amazon S3 Occasion Notifications to invoke an occasion to a Lambda operate when the transcription job is completed and a video transcript file object has been created.

Within the subsequent step of the workflow, we use LLMs out there by Amazon Bedrock. LLMs are neural network-based language fashions containing a whole lot of tens of millions to over a trillion parameters. The flexibility to generate content material has resulted in LLMs being broadly utilized to be used circumstances comparable to textual content era, summarization, translation, sentiment evaluation, conversational chatbots, and extra. For this resolution, we use Anthropic’s Claude 3 on Amazon Bedrock to summarize the unique textual content, get the sentiment of the dialog, extract logged actions, and recommend additional actions for the gross sales workforce. In Amazon Bedrock, you can too use different LLMs for textual content summarization comparable to Amazon Titan, Meta Llama 3, and others, which could be invoked utilizing the Amazon Bedrock API.

As proven within the following Python code to summarize the video transcript, you possibly can name the InvokeEndpoint API to invoke the required Amazon Bedrock mannequin to run inference utilizing the enter supplied within the request physique:

modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
settle for="utility/json"
contentType="utility/json"
    
prompt_template = """
The next is the transcript from one in every of our gross sales representatives and our buyer.
The AI is a software that the gross sales consultant makes use of to acquire a quick abstract of what the dialog was about. The AI primarily based this abstract on the contents of the dialog and doesn't make up occasions that didn't occur.
     The transcript is:
     
       {}
     
What's the 2 paragraphs abstract of the dialog?
"""
    
PROMPT = prompt_template.format(raw_text)
   	
physique = json.dumps(
     {
     	"messages": [
            {
              "role": "user",
              "content": [
                 {"type": "text", "text": PROMPT}
              ],
             }
            ],
           "anthropic_version": "bedrock-2023-05-31",
           "max_tokens": 512,
           "temperature": 0.1,
           "top_p": 0.9
        }
    )
response = bedrock.invoke_model(physique=physique, modelId=modelId, settle for=settle for, contentType=contentType)
response_body = json.hundreds(response["body"].learn())
abstract = response_body["content"][0]["text"]

You may invoke the endpoint with completely different parameters outlined within the payload to affect the textual content summarization:

temperature – temperature is utilized in textual content era to manage the extent of randomness of the output. A decrease temperature worth ends in a extra conservative and deterministic output; the next temperature worth encourages extra numerous and artistic outputs.
top_p – top_p, often known as nucleus sampling, is one other parameter to manage the variety of the summaries textual content. It signifies the cumulative chance threshold to pick out the following token throughout the textual content era course of. Decrease values of top_p lead to a narrower collection of tokens with excessive chances, resulting in extra deterministic outputs. Conversely, larger values of top_p introduce extra randomness and variety into the generated summaries.

Though there’s no common optimum mixture of top_p and temperature for all eventualities, within the previous code, we show pattern values with excessive top_p and low temperature with a view to generate summaries targeted on key data, sustaining constancy to the unique video transcript whereas nonetheless introducing a point of wording variation.

The next is one other instance of utilizing the Anthropic’s Claude 3 mannequin by the Amazon Bedrock API to offer steered actions to gross sales representatives primarily based on the video transcript:

prompt_template = """
The next is the transcript from one in every of our gross sales representatives and our buyer.
The AI is a software that the gross sales consultant makes use of to look into what further actions they'll use to extend gross sales after the session. The AI bases the steered actions on the contents of the dialog and what it thinks may assist improve the shoppers satisfaction and loyalty.

The transcript is:
     
      {}
     

     Utilizing the transcript above, present a bullet level format for steered actions the gross sales consultant might do to extend comply with on gross sales.
    """


PROMPT = prompt_template.format(raw_text)
    
physique = json.dumps(
   	{
     	"messages": [
         	  {
              "role": "user",
              "content": [
                 {"type": "text", "text": PROMPT}
               ],
             }
            ],
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "temperature": 0.1,
            "top_p": 0.9
        }
    )

response = bedrock.invoke_model(physique=physique, modelId=modelId, settle for=settle for, contentType=contentType)
response_body = json.hundreds(response["body"].learn())
suggested_actions = response_body["content"][0]["text"]

After we efficiently generate video summaries, sentiments, logged actions, and steered actions from the unique video transcript, we retailer these insights in a DynamoDB desk, which is then up to date within the UI by API Gateway.

The next screenshot reveals a easy UI for the video insights and summarization engine. The frontend is constructed on Cloudscape, an open supply design system for the cloud. On common, it takes lower than 5 minutes and prices not more than $2 to course of 1 hour of video, assuming the video’s transcript incorporates roughly 8,000 phrases.

Future enhancements

The answer on this put up reveals how you should use AWS providers with Amazon Bedrock to construct a cheap and highly effective generative AI utility that lets you analyze video content material and extract insights to assist groups turn into extra environment friendly. This resolution is just the start of the worth you possibly can unlock with AWS generative AI and broader ML providers.

One instance of how this resolution could possibly be taken additional is to broaden the scope to assist sort out a number of the logged actions from calls. The addition of providers comparable to Amazon Bedrock Brokers might assist automate a number of the responses, comparable to forwarding related documentation like product specs, tariffs, or perhaps a easy recap electronic mail. All of those might save time and effort, enabling you to focus extra on value-added actions.

Equally, the centralization of all this information might mean you can create an analytics layer on high of a centralized database to assist formulate simpler gross sales and assist methods. This information is normally misplaced or misplaced inside organizations as a result of folks favor completely different strategies for word assortment. The proposed resolution provides you the liberty to centralize information but additionally increase group information with the voice of the client. For instance, the analytics workforce might analyze what workers did effectively in calls which have a optimistic sentiment and provide coaching or steering to assist everybody obtain extra optimistic buyer interactions.

Conclusion

On this put up, we described easy methods to create an answer that ingests video and audio recordsdata to create highly effective, actionable, and correct insights that a company can use by the facility of Amazon Bedrock generative AI capabilities on AWS. The insights supplied may help scale back the undifferentiated heavy lifting that customer-facing groups encounter, and in addition present a centralized dataset of buyer conversations that a company can use to additional enhance efficiency.

For additional data on how you should use Amazon Bedrock in your workloads, see Amazon Bedrock.

Concerning the Authors

Simone Zucchet is a Options Architect Supervisor at AWS. With over 6 years of expertise as a Cloud Architect, Simone enjoys engaged on revolutionary initiatives that assist rework the way in which organizations strategy enterprise issues. He helps assist massive enterprise clients at AWS and is a part of the Machine Studying TFC. Outdoors of his skilled life, he enjoys engaged on automobiles and images.

Vu San Ha Huynh is a Options Architect at AWS. He has a PhD in pc science and enjoys engaged on completely different revolutionary initiatives to assist assist massive enterprise clients.

Adam Raffe is a Principal Options Architect at AWS. With over 8 years of expertise in cloud structure, Adam helps massive enterprise clients resolve their enterprise issues utilizing AWS.

Ahmed Raafat is a Principal Options Architect at AWS, with 20 years of area expertise and a devoted focus of 6 years inside the AWS ecosystem. He makes a speciality of AI/ML options. His intensive expertise spans numerous {industry} verticals, making him a trusted advisor for quite a few enterprise clients, serving to them seamlessly navigate and speed up their cloud journey.

Post Views: 81

Construct a video insights and summarization engine utilizing generative AI with Amazon Bedrock

Use case overview

Answer overview

AI/ML workflow

Future enhancements

Conclusion

Concerning the Authors

Generate single title from this title Chevy built an all-American EV truck — why is nobody buying it? in 100 -150 characters. And it...

The 4 Questions HR Needs to Answer If They Want Teams to Actually Thrive

Generate single title from this title Data Science • AI • Advanced Analytics in 100 -150 characters. And it must return only title i...

MIT student teams win top honors in NASA competition | MIT News

5 Design Considerations for Effective Employee Recognition Programs

Generate single title from this title Chevy built an all-American EV truck — why is nobody buying it? in 100 -150 characters. And it...

The 4 Questions HR Needs to Answer If They Want Teams to Actually Thrive

Generate single title from this title Data Science • AI • Advanced Analytics in 100 -150 characters. And it must return only title i...

MIT student teams win top honors in NASA competition | MIT News

5 Design Considerations for Effective Employee Recognition Programs

Agibot reaches new milestone as its 15,000th humanoid robot rolls off production line

How AI Navigation is Improving the Performance of Robotic Pool Cleaners

Generate single title from this title SAP aligns commerce data for AI personalisation in 100 -150 characters. And it must return only title i...

LEAVE A REPLY Cancel reply

Latest

Generate single title from this title Chevy built an all-American EV truck — why is nobody buying it? in 100 -150 characters. And it...

The 4 Questions HR Needs to Answer If They Want Teams to Actually Thrive

Generate single title from this title Data Science • AI • Advanced Analytics in 100 -150 characters. And it must return only title i...

Categories

Useful Links

Our Newsletter