
Data Filtering Challenge

Introduction and Motivation

The rapid development of language models (LMs) has catalyzed breakthroughs across various domains, including natural language understanding, robotics, and digital human interaction. Compared with general large LMs, which are difficult to deploy on resource-constrained edge devices, edge LMs fine-tuned for target downstream tasks have the potential to achieve both greater efficiency and higher task accuracy. However, this fine-tuning hinges on the availability of high-quality, diverse datasets. The Data Filtering Challenge for Training Edge Language Models seeks to unite academic researchers, industry experts, and AI enthusiasts to develop data filtering techniques that refine datasets driving the next generation of edge LMs.

The Challenge

This challenge invites participants to create data filtering techniques and submit datasets refined by these methods, aiming to significantly enhance the achievable performance of edge LMs on downstream tasks deployed on edge devices. With a focus on improving model accuracy and applicability across crucial domains, participants will have the opportunity to push the frontier of edge LMs and gain recognition within the AI community. For fine-tuning, the challenge focuses on Low-Rank Adaptation (LoRA), which keeps the pre-trained weights frozen and trains only a small low-rank update for each adapted weight matrix. This allows efficient task-specific edge LMs to be created from pre-trained ones using fewer resources, making it well suited to devices such as smartphones and portable robots.
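To make the "fewer resources" point concrete, here is a minimal sketch of the LoRA idea in plain Python. It is not the challenge's training code. The function names, the 4096 x 4096 layer size, and the rank of 8 are illustrative assumptions. The sketch compares the number of trainable parameters with and without LoRA, and shows how the frozen weight W and the low-rank factors B and A combine into the effective weight W + (alpha / r) * B A.

```python
from typing import List

Matrix = List[List[float]]


def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for LoRA: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, kept dependency-free for illustration."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * (B @ A), where r is the shared inner dimension."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from the challenge).
    full = full_finetune_params(4096, 4096)
    lora = lora_params(4096, 4096, rank=8)
    print(f"full fine-tuning: {full} trainable parameters")
    print(f"LoRA (rank 8):    {lora} trainable parameters")
    print(f"reduction factor: {full // lora}x")
```

Only B and A are trained, so the per-task state that must be stored and updated is a small fraction of the original matrix.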

Methodology

The proposed methodology involves the following steps:

  1. Data Selection
  2. Data Preprocessing
  3. Data Filtering
  4. Model Fine-Tuning
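The article does not say how the filtering step is implemented, so the following is only a sketch of what a simple heuristic filter could look like. The function names, the thresholds, and the choice of heuristics are assumptions for illustration, not the challenge's method. The heuristics are exact-duplicate removal after normalization, a word-count range, and a minimum share of alphabetic characters.

```python
import hashlib
from typing import Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())


def passes_quality_checks(
    text: str,
    min_words: int = 5,
    max_words: int = 2000,
    min_alpha_ratio: float = 0.6,
) -> bool:
    """Hypothetical heuristics: word-count bounds and share of alphabetic characters."""
    words = text.split()
    if not (min_words <= len(words) <= max_words):
        return False
    non_space = [c for c in text if not c.isspace()]
    if not non_space:
        return False
    alpha_ratio = sum(c.isalpha() for c in non_space) / len(non_space)
    return alpha_ratio >= min_alpha_ratio


def filter_dataset(samples: Iterable[str]) -> List[str]:
    """Drop exact duplicates (after normalization), then apply the quality checks."""
    seen = set()
    kept = []
    for text in samples:
        key = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if key in seen:
            continue  # duplicate of an earlier sample
        seen.add(key)
        if passes_quality_checks(text):
            kept.append(text)
    return kept


if __name__ == "__main__":
    raw = [
        "Edge models need clean task-specific training data.",
        "edge models  need clean task-specific training data.",  # near-identical copy
        "too short",
        "1234 5678 9012 3456 7890 !!!! ####",  # mostly non-alphabetic
    ]
    for sample in filter_dataset(raw):
        print(sample)
```

In practice, the filtered output would then feed the fine-tuning step. Stronger filters, such as model-based quality scoring, would slot into `passes_quality_checks` without changing the surrounding loop.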

Conclusion

The Data Filtering Challenge for Training Edge Language Models aims to unite researchers, industry experts, and AI enthusiasts to develop data filtering techniques that refine the datasets driving the next generation of edge LMs. By pairing their filtered datasets with LoRA fine-tuning, participants will have the opportunity to push the frontier of edge LMs and gain recognition within the AI community.

FAQs

Q: What is the goal of the Data Filtering Challenge?

The goal is to develop data filtering techniques, and to submit datasets refined by them, that improve the performance of edge LMs on downstream tasks.

Q: What is LoRA and why is it used?

LoRA (Low-Rank Adaptation) is a fine-tuning method that creates efficient task-specific edge LMs from pre-trained ones using fewer resources, making it well suited to devices such as smartphones and portable robots.

Q: What are the key steps in the proposed methodology?

The key steps are data selection, data preprocessing, data filtering, and model fine-tuning.
