Amazon Bedrock: A Fully Managed Service for High-Performing Foundation Models
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Common generative AI use cases, including but not limited to chatbots, virtual assistants, conversational search, and agent assistants, use FMs to provide responses. Retrieval Augmented Generation (RAG) is a technique that optimizes the output of FMs for these use cases by providing relevant context around the questions. Fine-tuning the FM is recommended to further tailor the output to a brand's or industry's voice and vocabulary.
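As a minimal sketch of the RAG pattern, the following snippet prepends retrieved context to a question before calling a Bedrock model through the model-agnostic Converse API. The retrieve_context helper and the model ID are illustrative assumptions, not part of this post's solution:

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def retrieve_context(question: str) -> str:
    # Hypothetical retriever: a real RAG system would query a vector store
    # or search index for passages relevant to the question.
    return "Our support hours are 9am-5pm ET, Monday through Friday."

def answer_with_rag(question: str) -> str:
    context = retrieve_context(question)
    # Provide the retrieved context around the question, as described above.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context: {context}\n\nQuestion: {question}"
    )
    response = bedrock_runtime.converse(
        modelId="mistral.mistral-7b-instruct-v0:2",  # example Bedrock model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

print(answer_with_rag("When is support available?"))
```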
Custom Model Import for Amazon Bedrock
In this post, we provide a step-by-step approach to fine-tuning a Mistral model using SageMaker and importing it into Amazon Bedrock using the Custom Model Import feature. We use the OpenOrca dataset to fine-tune the Mistral model and use the SageMaker FMEval library to evaluate the fine-tuned model imported into Amazon Bedrock.
Key Features of Custom Model Import
Some of the key features of Custom Model Import for Amazon Bedrock are:
- This feature lets you bring your fine-tuned models and use the fully managed serverless capabilities of Amazon Bedrock
- The feature currently supports the Llama 2, Llama 3, Flan, and Mistral model architectures, with weights in FP32, FP16, or BF16 precision, and further quantizations are coming soon
- To use this feature, you run the import process with your model weights stored in Amazon Simple Storage Service (Amazon S3), as sketched after this list
- You can also import models created with Amazon SageMaker by referencing the Amazon SageMaker model Amazon Resource Name (ARN), which provides seamless integration with SageMaker
- Amazon Bedrock automatically scales your model as traffic increases and scales it down to zero when it isn't in use, reducing your costs
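As a sketch of the S3-based import path described in the list above, the Boto3 call below starts an import job. The bucket, role ARN, and model names are placeholder assumptions; check the current Boto3 documentation for the exact parameters:

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Kick off an import job that reads the fine-tuned weights from S3.
# All names and ARNs below are placeholders.
response = bedrock.create_model_import_job(
    jobName="mistral-7b-openorca-import",
    importedModelName="mistral-7b-openorca",
    roleArn="arn:aws:iam::111122223333:role/BedrockModelImportRole",
    modelDataSource={
        "s3DataSource": {
            "s3Uri": "s3://amzn-s3-demo-bucket/mistral-7b-finetuned/"
        }
    },
)
print(response["jobArn"])
```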
Solution Overview
At the time of writing, the Custom Model Import feature in Amazon Bedrock supports models following the architectures and patterns in the following figure.
Prerequisites
We use Mistral-7B-v0.3 in this post because it uses an extended vocabulary compared to its prior version from Mistral AI. The model is straightforward to fine-tune, and Mistral AI has provided example fine-tuned models. We chose Mistral for this use case because the model supports a 32,000-token context window and is fluent in English, French, Italian, German, Spanish, and code, which makes it a good fit for multilingual customer support scenarios.
Fine-tune the Model using QLoRA
To fine-tune the Mistral model, we apply the QLoRA and Parameter-Efficient Fine-Tuning (PEFT) optimization techniques. In the provided notebook, you use the PyTorch Fully Sharded Data Parallel (FSDP) API to distribute training across GPUs, and you train the model with supervised fine-tuning (SFT).
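The following is a condensed sketch of the QLoRA setup: a 4-bit quantized base model plus LoRA adapters trained with TRL's SFTTrainer. Hyperparameters are illustrative assumptions rather than the notebook's exact values, argument names vary across TRL versions (newer releases move dataset_text_field and max_seq_length onto SFTConfig), and the sketch shows a single-process setup rather than the FSDP-sharded run:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer

model_id = "mistralai/Mistral-7B-v0.3"

# QLoRA: load the frozen base model quantized to 4-bit NF4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# PEFT: only small low-rank adapters on the attention projections are trained.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Render a small OpenOrca slice into a single "text" column
# (see the dataset section below for the prompt format).
dataset = load_dataset("Open-Orca/OpenOrca", split="train[:1000]").map(
    lambda r: {"text": f"<s>[INST] {r['question']} [/INST] {r['response']}</s>"}
)

# SFT: supervised fine-tuning on the formatted prompt/response pairs.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=1024,
)
trainer.train()
```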
Prepare the Dataset
The first step in the fine-tuning process is to prepare and format the dataset. You transform each record of the OpenOrca dataset into the Mistral prompt format before submitting the training job.
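For instance, here is a sketch of rendering one OpenOrca record (fields system_prompt, question, and response) into the Mistral instruction format; the exact template is an assumption based on Mistral's published chat format:

```python
def format_openorca_record(record: dict) -> str:
    """Render an OpenOrca row into Mistral's [INST] instruction format."""
    system = record.get("system_prompt", "").strip()
    question = record["question"].strip()
    instruction = f"{system}\n\n{question}" if system else question
    # Mistral instruction template: the answer follows the [/INST] tag.
    return f"<s>[INST] {instruction} [/INST] {record['response']}</s>"

example = {
    "system_prompt": "You are a helpful assistant.",
    "question": "What are your support hours?",
    "response": "Support is available 9am-5pm ET, Monday through Friday.",
}
print(format_openorca_record(example))
```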
Fine-tune the Model using SageMaker
To fine-tune the Mistral model, you submit a SageMaker training job from the SageMaker JupyterLab notebook. The job runs the FSDP-based supervised fine-tuning described in the previous section across the GPUs of the training instance.
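A minimal sketch of submitting such a job with the SageMaker Python SDK follows; the entry point script, instance type, framework version, and hyperparameters are placeholder assumptions:

```python
import sagemaker
from sagemaker.pytorch import PyTorch

# All names below are placeholders: train.py is assumed to contain the
# FSDP/SFT training loop from the previous section.
estimator = PyTorch(
    entry_point="train.py",
    source_dir="scripts",
    role=sagemaker.get_execution_role(),
    instance_type="ml.g5.12xlarge",  # placeholder multi-GPU instance
    instance_count=1,
    framework_version="2.2",
    py_version="py310",
    # torch_distributed starts one worker per GPU so FSDP can shard the model.
    distribution={"torch_distributed": {"enabled": True}},
    hyperparameters={"epochs": 1, "model_id": "mistralai/Mistral-7B-v0.3"},
)

# The formatted dataset is assumed to be staged in S3 ahead of time.
estimator.fit({"train": "s3://amzn-s3-demo-bucket/openorca-formatted/"})
```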
Import the Model into Amazon Bedrock
To import the fine-tuned model into Amazon Bedrock, you can use the Amazon Bedrock console, the Boto3 library, or APIs. An import job orchestrates the process of importing the model and making it available in your account. The import job copies all the model artifacts from your account into an AWS managed S3 bucket.
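After submitting the job (see the earlier create_model_import_job sketch), you can poll its status and then invoke the imported model through the Bedrock Runtime. The request body below follows the model's native Mistral-style prompt format; its exact shape is an assumption to verify against the Bedrock documentation:

```python
import json
import time

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN returned by create_model_import_job.
job_arn = "arn:aws:bedrock:us-east-1:111122223333:model-import-job/EXAMPLE"

# Poll until the import job finishes copying artifacts and registering the model.
while True:
    job = bedrock.get_model_import_job(jobIdentifier=job_arn)
    if job["status"] in ("Completed", "Failed"):
        break
    time.sleep(60)

model_arn = job["importedModelArn"]

# Invoke the imported model with its native (Mistral-style) request body.
response = runtime.invoke_model(
    modelId=model_arn,
    body=json.dumps(
        {"prompt": "[INST] What are your support hours? [/INST]", "max_tokens": 256}
    ),
)
print(json.loads(response["body"].read()))
```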
Evaluate the Imported Model
You can use the SageMaker FMEval library to evaluate the imported model. The FMEval library supports out-of-the-box evaluation algorithms for metrics such as accuracy, QA Accuracy, and others detailed in the FMEval documentation. For the question answering task in this post, the key metrics are Exact Match, Quasi-Exact Match, and F1 over words, computed by comparing the model's predicted answers against the ground truth answers; the library also reports Precision Over Words and Recall Over Words.
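Here is a sketch of running the QA accuracy evaluation with fmeval against the imported model. The dataset path, JSON field names, output JMESPath, and templates are assumptions; the model runner and config classes follow the fmeval documentation and are worth checking against your installed version:

```python
from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.data_loaders.data_config import DataConfig
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy, QAAccuracyConfig
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner

# Wrap the imported Bedrock model so fmeval can invoke it.
model_runner = BedrockModelRunner(
    model_id="arn:aws:bedrock:us-east-1:111122223333:imported-model/EXAMPLE",  # placeholder
    content_template='{"prompt": $prompt, "max_tokens": 256}',
    output="outputs[0].text",  # JMESPath to the generated text; assumed shape
)

# Point fmeval at a JSON Lines evaluation set with question/answer fields.
data_config = DataConfig(
    dataset_name="openorca_eval",
    dataset_uri="eval_dataset.jsonl",  # placeholder local file
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answer",
)

eval_algo = QAAccuracy(QAAccuracyConfig(target_output_delimiter="<OR>"))
results = eval_algo.evaluate(
    model=model_runner,
    dataset_config=data_config,
    prompt_template="[INST] $model_input [/INST]",
)
for score in results[0].dataset_scores:
    print(score.name, score.value)
```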
Conclusion
In this post, we explained the different aspects of fine-tuning a Mistral model using SageMaker, importing the model into Amazon Bedrock, invoking the model using both the Amazon Bedrock playground and Boto3, and then evaluating the imported model using the FMEval library. You can use this feature to import base FMs, or FMs fine-tuned on premises, on SageMaker, or on Amazon EC2, into Amazon Bedrock and use the models in your generative AI applications without any heavy lifting.
Authors
- Jay Pillai is a Principal Solutions Architect at Amazon Web Services.
- Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS.
- Evandro Franco is a Sr. AI/ML Specialist Solutions Architect at Amazon Web Services.
- Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS.
- Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services.
- Ragha Prasad is a Principal Engineer and a founding member of Amazon Bedrock.
- Paras Mehra is a Senior Product Manager at AWS.

