Amazon Bedrock: A Fully Managed Service for High-Performing Foundation Models
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Common generative AI use cases, including but not limited to chatbots, virtual assistants, conversational search, and agent assistants, use FMs to provide responses. Retrieval Augmented Generation (RAG) is a technique that optimizes the output of FMs for these use cases by providing relevant context around the questions. Fine-tuning the FM is recommended to further tailor the output to a brand's or industry's voice and vocabulary.
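As a minimal sketch of the RAG pattern, the following snippet prepends retrieved context to a question before calling a Bedrock model through the model-agnostic Converse API. The retrieve_context helper and the model ID are illustrative assumptions, not part of this post's solution:

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def retrieve_context(question: str) -> str:
    # Hypothetical retriever: a real RAG system would query a vector store
    # or search index for passages relevant to the question.
    return "Our support hours are 9am-5pm ET, Monday through Friday."

def answer_with_rag(question: str) -> str:
    context = retrieve_context(question)
    # Provide the retrieved context around the question, as described above.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context: {context}\n\nQuestion: {question}"
    )
    response = bedrock_runtime.converse(
        modelId="mistral.mistral-7b-instruct-v0:2",  # example Bedrock model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

print(answer_with_rag("When is support available?"))
```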
Custom Model Import for Amazon Bedrock
In this post, we provide a step-by-step approach to fine-tuning a Mistral model using SageMaker and importing it into Amazon Bedrock using the Custom Model Import feature. We use the OpenOrca dataset to fine-tune the Mistral model and use the SageMaker FMEval library to evaluate the fine-tuned model imported into Amazon Bedrock.
Key Features of Custom Model Import
Some of the key features of Custom Model Import for Amazon Bedrock are:
- This feature lets you bring your fine-tuned models and use the fully managed serverless capabilities of Amazon Bedrock
- The feature currently supports the Llama 2, Llama 3, Flan, and Mistral model architectures, with weights in FP32, FP16, or BF16 precision, and further quantizations are coming soon
- To use this feature, you run the import process with your model weights stored in Amazon Simple Storage Service (Amazon S3), as sketched after this list
- You can also import models created with Amazon SageMaker by referencing the Amazon SageMaker model Amazon Resource Name (ARN), which provides seamless integration with SageMaker
- Amazon Bedrock automatically scales your model as traffic increases and scales it down to zero when it isn't in use, reducing your costs
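As a sketch of the S3-based import path described in the list above, the Boto3 call below starts an import job. The bucket, role ARN, and model names are placeholder assumptions; check the current Boto3 documentation for the exact parameters:

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Kick off an import job that reads the fine-tuned weights from S3.
# All names and ARNs below are placeholders.
response = bedrock.create_model_import_job(
    jobName="mistral-7b-openorca-import",
    importedModelName="mistral-7b-openorca",
    roleArn="arn:aws:iam::111122223333:role/BedrockModelImportRole",
    modelDataSource={
        "s3DataSource": {
            "s3Uri": "s3://amzn-s3-demo-bucket/mistral-7b-finetuned/"
        }
    },
)
print(response["jobArn"])
```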
Solution Overview
At the time of writing, the Custom Model Import feature in Amazon Bedrock supports models following the architectures and patterns in the following figure.
Prerequisites
We use Mistral-7B-v0.3 in this post because it uses an extended vocabulary compared to its prior version from Mistral AI. The model is straightforward to fine-tune, and Mistral AI has provided example fine-tuned models. We chose Mistral for this use case because the model supports a 32,000-token context window and is fluent in English, French, Italian, German, Spanish, and code, which makes it a good fit for multilingual customer support scenarios.
Fine-tune the Model using QLoRA
To fine-tune the Mistral model, we apply the QLoRA and Parameter-Efficient Fine-Tuning (PEFT) optimization techniques. In the provided notebook, you use the PyTorch Fully Sharded Data Parallel (FSDP) API to distribute training across GPUs, and you train the model with supervised fine-tuning (SFT).
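The following is a condensed sketch of the QLoRA setup: a 4-bit quantized base model plus LoRA adapters trained with TRL's SFTTrainer. Hyperparameters are illustrative assumptions rather than the notebook's exact values, argument names vary across TRL versions (newer releases move dataset_text_field and max_seq_length onto SFTConfig), and the sketch shows a single-process setup rather than the FSDP-sharded run:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer

model_id = "mistralai/Mistral-7B-v0.3"

# QLoRA: load the frozen base model quantized to 4-bit NF4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# PEFT: only small low-rank adapters on the attention projections are trained.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Render a small OpenOrca slice into a single "text" column
# (see the dataset section below for the prompt format).
dataset = load_dataset("Open-Orca/OpenOrca", split="train[:1000]").map(
    lambda r: {"text": f"<s>[INST] {r['question']} [/INST] {r['response']}</s>"}
)

# SFT: supervised fine-tuning on the formatted prompt/response pairs.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=1024,
)
trainer.train()
```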
Prepare the Dataset
The first step in the fine-tuning process is to prepare and format the dataset. You transform each record of the OpenOrca dataset into the Mistral prompt format before submitting the training job.
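For instance, here is a sketch of rendering one OpenOrca record (fields system_prompt, question, and response) into the Mistral instruction format; the exact template is an assumption based on Mistral's published chat format:

```python
def format_openorca_record(record: dict) -> str:
    """Render an OpenOrca row into Mistral's [INST] instruction format."""
    system = record.get("system_prompt", "").strip()
    question = record["question"].strip()
    instruction = f"{system}\n\n{question}" if system else question
    # Mistral instruction template: the answer follows the [/INST] tag.
    return f"<s>[INST] {instruction} [/INST] {record['response']}</s>"

example = {
    "system_prompt": "You are a helpful assistant.",
    "question": "What are your support hours?",
    "response": "Support is available 9am-5pm ET, Monday through Friday.",
}
print(format_openorca_record(example))
```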
Fine-tune the Model using SageMaker
To fine-tune the Mistral model, you submit a SageMaker training job from the SageMaker JupyterLab notebook. The job runs the FSDP-based supervised fine-tuning described in the previous section across the GPUs of the training instance.
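A minimal sketch of submitting such a job with the SageMaker Python SDK follows; the entry point script, instance type, framework version, and hyperparameters are placeholder assumptions:

```python
import sagemaker
from sagemaker.pytorch import PyTorch

# All names below are placeholders: train.py is assumed to contain the
# FSDP/SFT training loop from the previous section.
estimator = PyTorch(
    entry_point="train.py",
    source_dir="scripts",
    role=sagemaker.get_execution_role(),
    instance_type="ml.g5.12xlarge",  # placeholder multi-GPU instance
    instance_count=1,
    framework_version="2.2",
    py_version="py310",
    # torch_distributed starts one worker per GPU so FSDP can shard the model.
    distribution={"torch_distributed": {"enabled": True}},
    hyperparameters={"epochs": 1, "model_id": "mistralai/Mistral-7B-v0.3"},
)

# The formatted dataset is assumed to be staged in S3 ahead of time.
estimator.fit({"train": "s3://amzn-s3-demo-bucket/openorca-formatted/"})
```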
Import the Model into Amazon Bedrock
To import the fine-tuned model into Amazon Bedrock, you can use the Amazon Bedrock console, the Boto3 library, or APIs. An import job orchestrates the process of importing the model and making it available in your account. The import job copies all the model artifacts from your account into an AWS managed S3 bucket.
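After submitting the job (see the earlier create_model_import_job sketch), you can poll its status and then invoke the imported model through the Bedrock Runtime. The request body below follows the model's native Mistral-style prompt format; its exact shape is an assumption to verify against the Bedrock documentation:

```python
import json
import time

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN returned by create_model_import_job.
job_arn = "arn:aws:bedrock:us-east-1:111122223333:model-import-job/EXAMPLE"

# Poll until the import job finishes copying artifacts and registering the model.
while True:
    job = bedrock.get_model_import_job(jobIdentifier=job_arn)
    if job["status"] in ("Completed", "Failed"):
        break
    time.sleep(60)

model_arn = job["importedModelArn"]

# Invoke the imported model with its native (Mistral-style) request body.
response = runtime.invoke_model(
    modelId=model_arn,
    body=json.dumps(
        {"prompt": "[INST] What are your support hours? [/INST]", "max_tokens": 256}
    ),
)
print(json.loads(response["body"].read()))
```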
Evaluate the Imported Model
You can use the SageMaker FMEval library to evaluate the imported model. The FMEval library supports out-of-the-box evaluation algorithms for metrics such as accuracy, QA Accuracy, and others detailed in the FMEval documentation. For the question answering task in this post, the key metrics are Exact Match, Quasi-Exact Match, and F1 over words, computed by comparing the model's predicted answers against the ground truth answers; the library also reports Precision Over Words and Recall Over Words.
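Here is a sketch of running the QA accuracy evaluation with fmeval against the imported model. The dataset path, JSON field names, output JMESPath, and templates are assumptions; the model runner and config classes follow the fmeval documentation and are worth checking against your installed version:

```python
from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.data_loaders.data_config import DataConfig
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy, QAAccuracyConfig
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner

# Wrap the imported Bedrock model so fmeval can invoke it.
model_runner = BedrockModelRunner(
    model_id="arn:aws:bedrock:us-east-1:111122223333:imported-model/EXAMPLE",  # placeholder
    content_template='{"prompt": $prompt, "max_tokens": 256}',
    output="outputs[0].text",  # JMESPath to the generated text; assumed shape
)

# Point fmeval at a JSON Lines evaluation set with question/answer fields.
data_config = DataConfig(
    dataset_name="openorca_eval",
    dataset_uri="eval_dataset.jsonl",  # placeholder local file
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answer",
)

eval_algo = QAAccuracy(QAAccuracyConfig(target_output_delimiter="<OR>"))
results = eval_algo.evaluate(
    model=model_runner,
    dataset_config=data_config,
    prompt_template="[INST] $model_input [/INST]",
)
for score in results[0].dataset_scores:
    print(score.name, score.value)
```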
Conclusion
In this post, we explained the different aspects of fine-tuning a Mistral model using SageMaker, importing the model into Amazon Bedrock, invoking the model using both the Amazon Bedrock playground and Boto3, and then evaluating the imported model using the FMEval library. You can use this feature to import base FMs, or FMs fine-tuned on premises, on SageMaker, or on Amazon EC2, into Amazon Bedrock and use the models in your generative AI applications without any heavy lifting.
Authors
- Jay Pillai is a Principal Solutions Architect at Amazon Web Services.
- Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS.
- Evandro Franco is a Sr. AI/ML Specialist Solutions Architect at Amazon Web Services.
- Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS.
- Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services.
- Ragha Prasad is a Principal Engineer and a founding member of Amazon Bedrock.
- Paras Mehra is a Senior Product Manager at AWS.

