Developing a 172B LLM with Strong Japanese Capabilities

LLM-jp Initiatives at GENIAC

The Ministry of Economy, Trade and Industry (METI) launched the Generative AI Accelerator Challenge (GENIAC) to strengthen foundation model development capabilities in Japan and to encourage creativity among companies and other organizations. GENIAC has provided computational resources, supported matching between companies and data holders, fostered collaboration with global technology companies, held community events, and evaluated the performance of the developed foundation models.

Training the Model using NVIDIA Megatron-LM

Megatron-LM is a lightweight, research-oriented framework that leverages Megatron-Core for training LLMs efficiently at scale. Megatron-Core is an open-source library of GPU-optimized training techniques and cutting-edge system-level optimizations essential for large-scale training.
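
To give a concrete feel for building on top of Megatron-Core, the sketch below constructs a tiny GPT model with the Megatron-Core v0.6 Python API. The single-GPU setup, toy layer sizes, and parallel sizes of 1 are illustrative assumptions only; they are not the LLM-jp 172B configuration.

    # Minimal sketch: constructing a small GPT model with Megatron-Core.
    # Assumes one GPU and tensor/pipeline parallel sizes of 1; all
    # hyperparameters are toy values, not the LLM-jp 172B settings.
    import os
    import torch
    from megatron.core import parallel_state
    from megatron.core.tensor_parallel.random import model_parallel_cuda_manual_seed
    from megatron.core.transformer.transformer_config import TransformerConfig
    from megatron.core.models.gpt.gpt_model import GPTModel
    from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_local_spec

    # Initialize torch.distributed and Megatron's model-parallel state.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch.cuda.set_device(0)
    torch.distributed.init_process_group(backend="nccl", world_size=1, rank=0)
    parallel_state.initialize_model_parallel(
        tensor_model_parallel_size=1, pipeline_model_parallel_size=1
    )
    model_parallel_cuda_manual_seed(123)

    # Toy transformer configuration (the real model is far deeper and wider).
    config = TransformerConfig(
        num_layers=2,
        hidden_size=128,
        num_attention_heads=4,
        use_cpu_initialization=True,
        pipeline_dtype=torch.float32,
    )

    gpt_model = GPTModel(
        config=config,
        transformer_layer_spec=get_gpt_layer_local_spec(),
        vocab_size=1024,
        max_sequence_length=64,
    ).cuda()

    print(sum(p.numel() for p in gpt_model.parameters()), "parameters")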

Model Architecture and Training Settings

The LLM-jp 172B model is being trained from scratch on 2.1 trillion tokens of a multilingual corpus developed for the project, consisting mainly of Japanese and English. Training runs on NVIDIA H100 Tensor Core GPUs in Google Cloud A3 instances, with FP8 hybrid precision enabled through NVIDIA Transformer Engine. Megatron-Core v0.6 and Transformer Engine v1.4 are used in the experiment.
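
For context on the "FP8 hybrid" setting, Transformer Engine provides an FP8 recipe in which forward-pass tensors use the E4M3 format and backward-pass gradients use E5M2. The minimal sketch below applies that recipe to a single standalone layer on an H100-class GPU; the layer shape and recipe parameters are assumptions for illustration, not the values used for the 172B run.

    # Minimal sketch of FP8 "hybrid" execution with Transformer Engine:
    # E4M3 for forward activations/weights, E5M2 for backward gradients.
    # Layer size and recipe parameters are illustrative assumptions.
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common.recipe import DelayedScaling, Format

    fp8_recipe = DelayedScaling(
        fp8_format=Format.HYBRID,   # E4M3 forward, E5M2 backward
        amax_history_len=16,
        amax_compute_algo="max",
    )

    layer = te.Linear(4096, 4096, params_dtype=torch.bfloat16)
    x = torch.randn(32, 4096, device="cuda", dtype=torch.bfloat16, requires_grad=True)

    # Forward pass runs inside the FP8 autocast region; the backward pass
    # is invoked outside it, as in the Transformer Engine examples.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        y = layer(x)

    y.float().sum().backward()
    print(x.grad.shape)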

Training Throughput and Results

Pretraining of the LLM-jp 172B model is currently underway, with periodic evaluations every few thousand iterations to monitor training progress and confirm accuracy on Japanese and English downstream tasks. So far, over 80% of the targeted 2.1 trillion tokens has been processed.
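
For a rough sense of what this token budget means per training iteration, the short sketch below converts iteration counts into a fraction of the 2.1-trillion-token target. The global batch size and sequence length are assumed round numbers for illustration and are not the published 172B hyperparameters.

    # Rough illustration of tracking pretraining progress in tokens.
    # Batch size and sequence length are assumed values, not the
    # actual LLM-jp 172B settings.
    GLOBAL_BATCH_SIZE = 1024        # sequences per optimizer step (assumed)
    SEQ_LENGTH = 4096               # tokens per sequence (assumed)
    TARGET_TOKENS = 2.1e12          # 2.1 trillion tokens

    tokens_per_iteration = GLOBAL_BATCH_SIZE * SEQ_LENGTH

    def progress(iteration: int) -> float:
        """Fraction of the token budget consumed after `iteration` steps."""
        return iteration * tokens_per_iteration / TARGET_TOKENS

    # Example: how many iterations correspond to ~80% of the budget.
    iters_for_80_percent = int(0.8 * TARGET_TOKENS / tokens_per_iteration)
    print(f"~80% of the budget after {iters_for_80_percent:,} iterations")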

Conclusion

As mentioned above, the training of LLM-jp 172B is still ongoing using Megatron-LM. Based on downstream-task evaluations of the current checkpoints, the model appears to have already acquired strong Japanese language capabilities, and the complete model is expected to be ready early next year. Training time is often a significant challenge in pretraining LLMs, where vast datasets are required. Therefore, efficient training frameworks like Megatron-LM are crucial for accelerating generative AI research and development.

FAQs

  1. What is the current status of the LLM-jp 172B model training?
    The training is still ongoing, with periodic evaluations every few thousand iterations to monitor training progress and confirm accuracy on Japanese and English downstream tasks.

  2. What is the expected completion date for the LLM-jp 172B model?
    The complete model is expected to be ready early next year.

  3. What is the purpose of the Generative AI Accelerator Challenge (GENIAC)?
    The purpose of GENIAC is to strengthen foundation model development capabilities in Japan and to encourage creativity among companies and other organizations.

  4. What is the role of Megatron-LM in the LLM-jp 172B model training?
    Megatron-LM is a lightweight, research-oriented framework that leverages Megatron-Core to train LLMs efficiently at large scale.

  5. What is the current performance of the LLM-jp 172B model?
    Downstream-task evaluations of the current checkpoints indicate that the model has already acquired strong Japanese language capabilities.
