Kazakhstan has developed a large language model, KazLLM
13.12.2024 21:10:15
Kazakhstan has completed training of the KazLLM large language model on 148 billion tokens in Kazakh, English, Russian and Turkish. The model was developed by the Institute of Smart Systems and Artificial Intelligence (ISSAI) at Nazarbayev University, with the support and coordination of the Ministry of Digital Development, Innovations and Aerospace Industry of the Republic of Kazakhstan (ICRIAP) and the Ministry of Internal Affairs of the Republic of Kazakhstan.
The model will be available to a wide range of users, including the scientific community, startups and large corporations. In line with the Head of State's initiative, KazLLM will serve as the basis for a larger project, TurkLLM, aimed at developing natural language processing technologies across the Turkic-speaking world. The corresponding agreement was signed at the most recent summit of the Organization of Turkic States (OTS).
The project is an important milestone in building a national AI infrastructure and confirms Kazakhstan's status as a technological leader in the region. It has produced not only an advanced artificial intelligence tool but also growth in competence and human capital in the field of artificial intelligence.
Linguistic institutes and research and production organizations, including Til Kazyna, JSC "NIT", Maqsut Narikbayev University and others, contributed to the project.
"The launch of the Kaz LLM open source model represents an important step forward in the development of Kazakhstan's artificial intelligence ecosystem. This initiative reflects our commitment to supporting innovation and advancing scientific achievements that contribute to technological progress. I am confident that this advanced model will help overcome digital inequality by providing affordable and inclusive digital services for every citizen of Kazakhstan," Minister Jaslan Madiyev said.
The model was trained on 148 billion tokens. Two versions were created, with 8 billion and 70 billion parameters. They serve as a basis for developing new artificial intelligence products and surpass similar models in quality and accuracy.
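To give a sense of what the two model sizes mean in hardware terms, the following is a minimal sketch of the memory needed just to hold the weights of an 8-billion- and a 70-billion-parameter model. The numeric precisions listed and the function names are illustrative assumptions, not details from the announcement, and real memory use is higher once activations and inference caches are included.

```python
# Back-of-the-envelope estimate of weights-only memory for the two model sizes.
# Assumption: the precisions below are common choices in practice; the
# announcement does not say which precisions the released models use.

BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}

# Parameter counts quoted in the announcement.
MODEL_SIZES = {"8B": 8e9, "70B": 70e9}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the weights-only memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def footprint_table() -> dict:
    """Map (model size, precision) to the estimated weights-only memory in GB."""
    return {
        (name, precision): weight_memory_gb(params, nbytes)
        for name, params in MODEL_SIZES.items()
        for precision, nbytes in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for (name, precision), gb in footprint_table().items():
        print(f"{name} at {precision}: about {gb:.0f} GB for weights")
```

Under these assumptions, the 8-billion-parameter version needs roughly 4 GB for weights at 4-bit precision, while the 70-billion-parameter version needs roughly 140 GB at 16-bit precision.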
In the first stage, KazLLM will be openly available to developers, startups and companies to stimulate the creation of products and services based on it. Detailed instructions have been prepared to help users quickly integrate the model into their projects.
"This model reflects Kazakhstan's desire for innovation, independence and growth of its technological ecosystem. Our team has prepared two versions of KAZ-LLM with 8 billion and 70 billion parameters, built on the Meta Llama architecture and optimized for high-performance systems and environments with limited resources. In this way, developers will be able to download and run our model on both complex servers and laptops," said NU Professor Hussein Atakan Varol, Director of the Institute of Smart Systems and Artificial Intelligence (ISSAI) at Nazarbayev University.
Beeline Kazakhstan and its IT company QazCode were key partners in creating the national language model. Drawing on their experience in developing language models such as Kaz-RoBERTA and in building AI solutions for small language groups in partnership with foreign organizations, the companies played an important role in creating an innovative and accessible model for Kazakhstanis. Their provision of server capacity with the computing power of 8 DGX H100 systems significantly accelerated training and expanded the model's capabilities.
For comparison, an ordinary computer needs several days to analyze an archive of 1 million photos, whereas the 8 DGX H100 servers used to train KazLLM can handle the same task in just a few seconds.
"Our team actively participated in the development and training of the Kaz-LLM model. A complex process, including the creation of a model that takes into account the peculiarities of the Kazakh language, and 50 days of calculations, made it possible to improve understanding of the context and ensure high-quality interaction with users. Testing has shown that the model effectively solves technical problems, taking into account cultural characteristics. We are confident that Kaz-LLM will become an important tool for the whole of Kazakhstan, helping to overcome the digital language barrier and improve the quality of digital services in the region," commented Alexey Sharavar, CEO of QazCode.
KazLLM is a modern AI language model created for processing, analyzing and generating text in the Kazakh language. It is a unique development aimed at promoting the use of Kazakh in the digital space and supporting business, science and society. It can perform a wide range of tasks, from translation and document processing to automating communication.
The national model will enable businesses to develop chatbots and customer support systems, automate document management and analyze data. For example, local banks will be able to process requests in Kazakh faster, and retailers will be able to improve the user experience by building the model into their processes. Educational and scientific institutions will be able to create applications for teaching the Kazakh language, as well as tools for analyzing scientific texts and helping students. Media and content professionals will be able to generate news, improve translation quality and build writing tools.
KazLLM is available at:
https://huggingface.co/collections/issai/issai-kazllm-10-6732d58c81bcaf177442c362
The press service of the ICRIAP RK
Source: https://www.gov.kz/memleket/entities/mdai/press/news/details/902638