Developing a 172B LLM with Strong Japanese Capabilities

LLM-jp Initiatives at GENIAC

The Ministry of Economy, Trade and Industry (METI) launched the Generative AI Accelerator Challenge (GENIAC) to strengthen foundation model development capabilities in Japan and to encourage creativity among companies and other organizations. GENIAC has provided computational resources, supported matching between companies and data holders, fostered collaboration with global technology companies, held community events, and evaluated the performance of the developed foundation models.

Training the Model using NVIDIA Megatron-LM

Megatron-LM is a lightweight, research-oriented framework that leverages Megatron-Core for training LLMs efficiently at scale. Megatron-Core is an open-source library of GPU-optimized training techniques and cutting-edge system-level optimizations essential for large-scale training.
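
To give a concrete feel for building on top of Megatron-Core, the sketch below constructs a tiny GPT model with the Megatron-Core v0.6 Python API. The single-GPU setup, toy layer sizes, and parallel sizes of 1 are illustrative assumptions only; they are not the LLM-jp 172B configuration.

    # Minimal sketch: constructing a small GPT model with Megatron-Core.
    # Assumes one GPU and tensor/pipeline parallel sizes of 1; all
    # hyperparameters are toy values, not the LLM-jp 172B settings.
    import os
    import torch
    from megatron.core import parallel_state
    from megatron.core.tensor_parallel.random import model_parallel_cuda_manual_seed
    from megatron.core.transformer.transformer_config import TransformerConfig
    from megatron.core.models.gpt.gpt_model import GPTModel
    from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_local_spec

    # Initialize torch.distributed and Megatron's model-parallel state.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch.cuda.set_device(0)
    torch.distributed.init_process_group(backend="nccl", world_size=1, rank=0)
    parallel_state.initialize_model_parallel(
        tensor_model_parallel_size=1, pipeline_model_parallel_size=1
    )
    model_parallel_cuda_manual_seed(123)

    # Toy transformer configuration (the real model is far deeper and wider).
    config = TransformerConfig(
        num_layers=2,
        hidden_size=128,
        num_attention_heads=4,
        use_cpu_initialization=True,
        pipeline_dtype=torch.float32,
    )

    gpt_model = GPTModel(
        config=config,
        transformer_layer_spec=get_gpt_layer_local_spec(),
        vocab_size=1024,
        max_sequence_length=64,
    ).cuda()

    print(sum(p.numel() for p in gpt_model.parameters()), "parameters")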

Model Architecture and Training Settings

The LLM-jp 172B model is being trained from scratch on 2.1 trillion tokens of a multilingual corpus developed for the project, consisting mainly of Japanese and English. Training runs on NVIDIA H100 Tensor Core GPUs in Google Cloud A3 instances, with FP8 hybrid precision enabled through NVIDIA Transformer Engine. Megatron-Core v0.6 and Transformer Engine v1.4 are used in the experiment.
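
For context on the "FP8 hybrid" setting, Transformer Engine provides an FP8 recipe in which forward-pass tensors use the E4M3 format and backward-pass gradients use E5M2. The minimal sketch below applies that recipe to a single standalone layer on an H100-class GPU; the layer shape and recipe parameters are assumptions for illustration, not the values used for the 172B run.

    # Minimal sketch of FP8 "hybrid" execution with Transformer Engine:
    # E4M3 for forward activations/weights, E5M2 for backward gradients.
    # Layer size and recipe parameters are illustrative assumptions.
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common.recipe import DelayedScaling, Format

    fp8_recipe = DelayedScaling(
        fp8_format=Format.HYBRID,   # E4M3 forward, E5M2 backward
        amax_history_len=16,
        amax_compute_algo="max",
    )

    layer = te.Linear(4096, 4096, params_dtype=torch.bfloat16)
    x = torch.randn(32, 4096, device="cuda", dtype=torch.bfloat16, requires_grad=True)

    # Forward pass runs inside the FP8 autocast region; the backward pass
    # is invoked outside it, as in the Transformer Engine examples.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        y = layer(x)

    y.float().sum().backward()
    print(x.grad.shape)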

Training Throughput and Results

Pretraining of the LLM-jp 172B model is currently underway, with periodic evaluations every few thousand iterations to monitor training progress and confirm accuracy on Japanese and English downstream tasks. So far, over 80% of the targeted 2.1 trillion tokens has been processed.
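
For a rough sense of what this token budget means per training iteration, the short sketch below converts iteration counts into a fraction of the 2.1-trillion-token target. The global batch size and sequence length are assumed round numbers for illustration and are not the published 172B hyperparameters.

    # Rough illustration of tracking pretraining progress in tokens.
    # Batch size and sequence length are assumed values, not the
    # actual LLM-jp 172B settings.
    GLOBAL_BATCH_SIZE = 1024        # sequences per optimizer step (assumed)
    SEQ_LENGTH = 4096               # tokens per sequence (assumed)
    TARGET_TOKENS = 2.1e12          # 2.1 trillion tokens

    tokens_per_iteration = GLOBAL_BATCH_SIZE * SEQ_LENGTH

    def progress(iteration: int) -> float:
        """Fraction of the token budget consumed after `iteration` steps."""
        return iteration * tokens_per_iteration / TARGET_TOKENS

    # Example: how many iterations correspond to ~80% of the budget.
    iters_for_80_percent = int(0.8 * TARGET_TOKENS / tokens_per_iteration)
    print(f"~80% of the budget after {iters_for_80_percent:,} iterations")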

Conclusion

As mentioned above, the training of LLM-jp 172B is still ongoing using Megatron-LM. Based on downstream-task evaluations of the current checkpoints, the model appears to have already acquired strong Japanese language capabilities, and the complete model is expected to be ready early next year. Training time is often a significant challenge in pretraining LLMs, where vast datasets are required. Therefore, efficient training frameworks like Megatron-LM are crucial for accelerating generative AI research and development.

FAQs

  1. What is the current status of the LLM-jp 172B model training?
    The training is still ongoing, with periodic evaluations every few thousand iterations to monitor training progress and confirm accuracy on Japanese and English downstream tasks.

  2. What is the expected completion date for the LLM-jp 172B model?
    The complete model is expected to be ready early next year.

  3. What is the purpose of the Generative AI Accelerator Challenge (GENIAC)?
    The purpose of GENIAC is to strengthen foundation model development capabilities in Japan and to encourage creativity among companies and other organizations.

  4. What is the role of Megatron-LM in the LLM-jp 172B model training?
    Megatron-LM is a lightweight, research-oriented framework that leverages Megatron-Core to train LLMs efficiently at large scale.

  5. What is the current performance of the LLM-jp 172B model?
    Downstream-task evaluations of the current checkpoints indicate that the model has already acquired strong Japanese language capabilities.
