Seven open-source GPT-based large language models, 'Cerebras-GPT', are now available for anyone to download

Cerebras, an AI company, has released Cerebras-GPT, a family of seven open-source large language models ranging from 111 million to 13 billion parameters. Cerebras-GPT is based on OpenAI's GPT-3 and was trained using the Chinchilla method that DeepMind published in March 2022. It is characterized by low power consumption.

Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models - Cerebras

Cerebras (Cerebras)

GitHub - Cerebras/modelzoo

The table below summarizes the openness and licensing of major large language models. OpenAI's GPT-4 is not open at all, not even its model architecture, while for DeepMind's Chinchilla only the model architecture is public. Meta's OPT is mostly open, but its model weights are available only to researchers and its license is limited to non-commercial use. Cerebras-GPT, by contrast, has all of its models, weights, and checkpoints published on Hugging Face and GitHub under the Apache 2.0 license.

“For LLMs to be an open and accessible technology, we believe that access to state-of-the-art models that are open, reproducible, and royalty-free is critical for both research and commercial applications,” said Cerebras.

Cerebras-GPT was reportedly trained in a few weeks on the CS-2 systems that make up Andromeda, an AI supercomputer owned by Cerebras. There are seven models: 111M (111 million parameters), 256M (256 million parameters), 590M (590 million parameters), 1.3B (1.3 billion parameters), 2.7B (2.7 billion parameters), 6.7B (6.7 billion parameters), and 13B (13 billion parameters). According to Cerebras, training each model on the compute-optimal number of tokens for its size yields the lowest loss per unit of compute at every model size.
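To make the idea of a compute-optimal token budget concrete, here is a minimal Python sketch for the seven model sizes. It assumes the Chinchilla rule of thumb of roughly 20 training tokens per parameter and the common approximation that training compute is about 6 × parameters × tokens FLOPs. Neither figure comes from this article, and the helper names (`optimal_tokens`, `training_flops`, `budget_table`) are hypothetical, so treat the output as an illustration rather than Cerebras's actual training configuration.

```python
# Hypothetical sketch: Chinchilla-style token budgets for the seven Cerebras-GPT sizes.
# Assumptions (not from the article): ~20 training tokens per parameter, and
# training compute C ≈ 6 * N * D FLOPs (N = parameters, D = training tokens).

TOKENS_PER_PARAM = 20  # assumed Chinchilla rule of thumb

# Parameter counts as listed in the article.
MODEL_SIZES = {
    "111M": 111e6,
    "256M": 256e6,
    "590M": 590e6,
    "1.3B": 1.3e9,
    "2.7B": 2.7e9,
    "6.7B": 6.7e9,
    "13B": 13e9,
}


def optimal_tokens(n_params: float, tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Return the assumed compute-optimal number of training tokens for a model."""
    return n_params * tokens_per_param


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute with the 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def budget_table() -> list[tuple[str, float, float]]:
    """Return (model name, token budget, approximate FLOPs) for every model size."""
    rows = []
    for name, n_params in MODEL_SIZES.items():
        tokens = optimal_tokens(n_params)
        rows.append((name, tokens, training_flops(n_params, tokens)))
    return rows


if __name__ == "__main__":
    for name, tokens, flops in budget_table():
        print(f"{name:>5}: {tokens / 1e9:7.1f}B tokens, {flops:.2e} FLOPs")
```

Under these assumptions the token budget grows linearly with model size while compute grows quadratically, so the 13B model would need 260 billion tokens, compared with about 2.2 billion for the 111M model.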

The graph below compares the compute efficiency of Cerebras-GPT (orange) and Pythia (green), a family of large language models from EleutherAI. The vertical axis shows the training loss, and the horizontal axis shows the training compute on a logarithmic scale. The lower the loss reached for a given amount of compute, the higher the training efficiency.

Cerebras also emphasizes that Cerebras-GPT maintains high training efficiency on downstream tasks. The graph below shows downstream-task performance relative to training compute for Cerebras-GPT (orange), Pythia (green), and OPT (brown), evaluated on a variety of datasets. Cerebras claims the graph shows that Cerebras-GPT remains highly efficient on downstream tasks as well.

“We hope that Cerebras-GPT will serve as the first publicly available family of large GPT models with state-of-the-art training efficiency, as a recipe for efficient training, and as a reference for further community research,” said Cerebras. “In addition, through the Cerebras AI Model Studio, we are making both the infrastructure and the models available in the cloud. We believe that better training infrastructure and community sharing will further advance the large-scale generative AI industry.”

in Software, Posted by log1i_yk