In the rapidly evolving world of AI, the ability to customize language models for specific industries has become more important. Although large language models (LLMs) are adept at handling a wide range of tasks with natural language, they are better suited to general-purpose tasks than to specialized ones. This can create challenges when processing text data from highly specialized domains with their own distinct terminology, or for specialized tasks where the intrinsic knowledge of the LLM is not well suited for solutions such as Retrieval Augmented Generation (RAG).
For instance, in the automotive industry, users might not always provide specific diagnostic trouble codes (DTCs), which are often proprietary to each manufacturer. These codes, such as P0300 for a generic engine misfire or C1201 for an ABS system fault, are crucial for precise diagnosis. Without these specific codes, a general-purpose LLM might struggle to provide accurate information. This lack of specificity can lead to hallucinations in the generated responses, where the model invents plausible but incorrect diagnoses, or sometimes result in no answers at all. For example, if a user simply describes “engine running rough” without providing the specific DTC, a general LLM might suggest a wide range of potential issues, some of which may be irrelevant to the actual problem, or fail to provide any meaningful diagnosis due to insufficient context. Similarly, in tasks like code generation and suggestions through chat-based applications, users might not specify the APIs they want to use. Instead, they often request help in resolving a general issue or in generating code that utilizes proprietary APIs and SDKs.
Moreover, consumer-facing generative AI applications can offer valuable insights into the types of interactions coming from end users. With appropriate feedback mechanisms, these applications can also gather important data to continuously improve the behavior of these models and the responses they generate.
For these reasons, there is a growing trend in the adoption and customization of small language models (SLMs). SLMs are compact transformer models, primarily using decoder-only or encoder-decoder architectures, typically with 1–8 billion parameters. They are generally more efficient and cost-effective to train and deploy than LLMs, and are highly effective when fine-tuned for specific domains or tasks. SLMs offer faster inference times and lower resource requirements, and are suitable for deployment on a wider range of devices, making them particularly valuable for specialized applications and edge computing scenarios. Additionally, more efficient techniques for customizing both LLMs and SLMs, such as Low-Rank Adaptation (LoRA), are making these capabilities increasingly accessible to a broader range of customers.
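To make the idea concrete, the following is a minimal sketch of applying LoRA with the Hugging Face peft library. The model ID, rank, and target modules shown here are illustrative assumptions, not the exact configuration used later in this post.

```python
# Minimal LoRA sketch using Hugging Face transformers and peft.
# The model ID, rank, and target modules are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Any causal LM works here; Llama weights require access approval on the Hub.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Because only the small injected matrices are trained while the base weights stay frozen, the memory and compute footprint of fine-tuning drops dramatically compared to updating all parameters.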
AWS offers a wide range of solutions for interacting with language models. Amazon Bedrock is a fully managed service that offers foundation models (FMs) from Amazon and other AI companies to help you build generative AI applications and host customized models. Amazon SageMaker is a comprehensive, fully managed machine learning (ML) service to build, train, and deploy LLMs and other FMs at scale. You can fine-tune and deploy models with Amazon SageMaker JumpStart or directly through Hugging Face containers.
Solution overview
This solution uses multiple features of SageMaker and Amazon Bedrock, and can be divided into four main steps:
- Data analysis and preparation: In this step, we assess the available data, understand how it can be used to develop the solution, select data for fine-tuning, and identify the required data preparation steps. We use Amazon SageMaker Studio, a comprehensive web-based integrated development environment (IDE) designed to facilitate all aspects of ML development. We also use SageMaker jobs to access additional computational power on demand through the SageMaker Python SDK.
- Model fine-tuning: In this step, we prepare prompt templates for fine-tuning the SLM. For this post, we use Meta Llama 3.1 8B Instruct from Hugging Face as the SLM. We run our fine-tuning script directly from the SageMaker Studio JupyterLab environment, using the @remote decorator feature of the SageMaker Python SDK to launch a remote training job (a minimal sketch of this step follows the list). The fine-tuning script uses LoRA, distributing compute across all available GPUs on a single instance.
- Model deployment: When the fine-tuning job is complete and the model is ready, we have two deployment options (sketched after the list):
- Deploy in SageMaker by selecting the best instance and container options available.
- Deploy in Amazon Bedrock by importing the fine-tuned model for on-demand use.
- Model evaluation: In this final step, we evaluate the fine-tuned model against a similar base model and a larger model available from Amazon Bedrock. Our evaluation focuses on how well the model uses terminology specific to the automotive space, as well as the improvements that fine-tuning provides in the generated answers (an evaluation sketch follows the deployment examples below).
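As referenced in the fine-tuning step above, here is a minimal sketch of launching LoRA fine-tuning as a remote SageMaker training job with the @remote decorator. The instance type, hyperparameters, dataset format (a JSON Lines file with a "text" field of templated prompts), and paths are illustrative assumptions rather than the exact configuration used in this post.

```python
# Sketch: launching LoRA fine-tuning as a SageMaker training job via the
# @remote decorator. Instance type, hyperparameters, and paths are assumptions.
from sagemaker.remote_function import remote

@remote(instance_type="ml.g5.12xlarge", volume_size=100)
def fine_tune(model_id: str, train_data_s3: str, epochs: int = 3) -> str:
    # This body executes on the remote training instance, not in the notebook.
    import torch
    from datasets import load_dataset  # reading from S3 requires s3fs
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM"))

    # Assumes a JSON Lines file with a "text" field holding templated prompts.
    dataset = load_dataset("json", data_files=train_data_s3, split="train")
    dataset = dataset.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
        remove_columns=dataset.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="/opt/ml/model",
                               num_train_epochs=epochs,
                               per_device_train_batch_size=2,
                               bf16=True),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
    trainer.train()
    trainer.save_model("/opt/ml/model")  # artifacts here are uploaded to S3
    return "training complete"

# Calling the decorated function submits the training job.
fine_tune("meta-llama/Llama-3.1-8B-Instruct", "s3://my-bucket/data/train.jsonl")
```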
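The two deployment options can be sketched as follows. The container, instance type, S3 URIs, and IAM role ARNs are placeholder assumptions you would replace with your own values; the Amazon Bedrock path uses the Custom Model Import feature.

```python
# Sketch: the two deployment options. Instance types, S3 URIs, and IAM role
# ARNs below are placeholder assumptions.
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Option 1: deploy to a SageMaker real-time endpoint using the
# Hugging Face Text Generation Inference (TGI) container.
sm_model = HuggingFaceModel(
    model_data="s3://my-bucket/fine-tuned-model/model.tar.gz",
    role="arn:aws:iam::111122223333:role/MySageMakerRole",
    image_uri=get_huggingface_llm_image_uri("huggingface"),
)
predictor = sm_model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

# Option 2: import the fine-tuned weights into Amazon Bedrock
# (Custom Model Import) for on-demand invocation.
bedrock = boto3.client("bedrock")
import_job = bedrock.create_model_import_job(
    jobName="llama31-automotive-import",
    importedModelName="llama31-automotive",
    roleArn="arn:aws:iam::111122223333:role/MyBedrockImportRole",
    modelDataSource={"s3DataSource": {"s3Uri": "s3://my-bucket/fine-tuned-model/"}},
)
```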
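And for the evaluation step, here is a minimal sketch that sends the same prompt to the imported fine-tuned model and to a larger Amazon Bedrock model for comparison. The model identifiers and the request fields (which follow the Meta Llama schema on Amazon Bedrock) are assumptions to adapt to your account.

```python
# Sketch: sending the same prompt to the imported fine-tuned model and to a
# larger base model on Amazon Bedrock. Model identifiers are assumptions.
import json
import boto3

runtime = boto3.client("bedrock-runtime")
prompt = "A customer reports DTC P0300. What does it indicate, and what are common causes?"

def generate(model_id: str) -> str:
    # Request/response fields follow the Meta Llama schema on Bedrock.
    response = runtime.invoke_model(
        modelId=model_id,
        body=json.dumps({"prompt": prompt, "max_gen_len": 512, "temperature": 0.2}),
    )
    return json.loads(response["body"].read())["generation"]

# Imported custom model (ARN returned by the import job) vs. a larger base model.
print(generate("arn:aws:bedrock:us-east-1:111122223333:imported-model/abc123"))
print(generate("meta.llama3-1-70b-instruct-v1:0"))
```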
Using the Automotive_NER dataset
The Automotive_NER dataset, available on the Hugging Face platform, is designed for named entity recognition (NER) tasks specific to the automotive domain. This dataset is specifically curated to help identify and classify various entities related to the automotive industry and uses domain-specific terminologies.
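As a quick sketch, the dataset can be loaded with the Hugging Face datasets library. The repository namespace below is a placeholder, because the exact path depends on the account that publishes the dataset on the Hugging Face Hub.

```python
# Sketch: loading the Automotive_NER dataset and inspecting one record.
# "<namespace>" is a placeholder for the publishing account on the Hub.
from datasets import load_dataset

dataset = load_dataset("<namespace>/Automotive_NER")
print(dataset)              # splits, features, and row counts
print(dataset["train"][0])  # one annotated example
```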
Conclusion
This post demonstrated the process of customizing SLMs on AWS for domain-specific applications, focusing on automotive terminology for diagnostics. The provided steps and source code show how to analyze data, fine-tune models, deploy them efficiently, and evaluate their performance against larger base models using SageMaker and Amazon Bedrock. We further highlighted how customization enhances the model's command of vocabulary within specialized domains.
FAQs
Q: What are the benefits of customizing language models for specific industries?
A: Customizing language models for specific industries enables them to learn domain-specific terminology and provide more accurate and relevant responses.
Q: What is the difference between large language models (LLMs) and small language models (SLMs)?
A: LLMs are generally more powerful and capable of handling a wide range of tasks, while SLMs are more efficient and cost-effective, making them suitable for deployment on a wider range of devices.
Q: What is LoRA, and how does it improve the customization of language models?
A: LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the base model's weights and trains small low-rank update matrices injected into selected layers. Because only a small fraction of the parameters is updated, it makes customizing language models for specific tasks or domains significantly faster and cheaper while remaining effective.
Q: What are the benefits of using Amazon SageMaker and Amazon Bedrock for customizing language models?
A: Amazon SageMaker and Amazon Bedrock provide a range of features and services for customizing language models, including data preparation, model fine-tuning, deployment, and evaluation, making it easier to build and deploy customized models.

