Open-source large language model from China’s search engine pioneer rivals OpenAI

“China needs its own OpenAI,” Sogou founder Wang Xiaochuan wrote on social media in February. Today, the Chinese entrepreneur’s startup, Baichuan Intelligence, launched its next-generation large language model, Baichuan-13B.

Baichuan is considered one of China’s most promising LLM developers, largely because of its founder’s record as a computer science prodigy at Tsinghua University and as the founder of Sogou, which Tencent later acquired.

Wang left Sogou in late 2021. In April, as ChatGPT swept the world, he launched Baichuan and received $50 million from angel investors.

Baichuan-13B, a 13-billion-parameter model based on the Transformer architecture (which also underpins GPT), is trained on Chinese and English data, like other Chinese LLMs. (Parameters are the variables a model learns during training and then uses to generate and analyze text.) According to its GitHub page, the model is open source and optimized for commercial use.

Baichuan-13B is trained on 1.4 trillion tokens. By comparison, Meta’s 13-billion-parameter LLaMA model was trained on 1 trillion tokens. Wang said in an interview that his startup would release a large-scale model comparable to OpenAI’s GPT-3.5 by the end of the year.
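To put the two training-set sizes side by side, the short Python sketch below works out the tokens-per-parameter ratio for each model. It uses only the figures quoted above and assumes both models have exactly 13 billion parameters; the function and variable names are illustrative and do not come from either project.

```python
# Figures as quoted in the article; parameter counts are rounded to 13 billion.
BAICHUAN_13B_TOKENS = 1.4e12
LLAMA_13B_TOKENS = 1.0e12
PARAMS = 13e9


def tokens_per_parameter(tokens: float, params: float) -> float:
    """Return how many training tokens the model saw per parameter."""
    return tokens / params


baichuan_ratio = tokens_per_parameter(BAICHUAN_13B_TOKENS, PARAMS)
llama_ratio = tokens_per_parameter(LLAMA_13B_TOKENS, PARAMS)

# Relative difference in training-set size between the two models.
extra_data = BAICHUAN_13B_TOKENS / LLAMA_13B_TOKENS - 1

print(f"Baichuan-13B: {baichuan_ratio:.1f} tokens per parameter")
print(f"LLaMA-13B:    {llama_ratio:.1f} tokens per parameter")
print(f"Baichuan-13B saw {extra_data:.0%} more training tokens")
```

On these numbers, Baichuan-13B saw roughly 108 tokens per parameter against roughly 77 for LLaMA, or about 40 percent more training data for a model of the same size.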

Baichuan has grown rapidly in its first three months. The team had 50 members by April and released its first LLM, the 7-billion-parameter pre-trained model Baichuan-7B, in June.

The foundational model Baichuan-13B is now available free of charge to academics, as well as to developers who obtain official permission to use it commercially. Importantly, the model can run on consumer-grade hardware, including Nvidia’s 3090 graphics cards, despite U.S. AI chip sanctions on China.
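A rough calculation shows why a 13-billion-parameter model is within reach of a card like the 3090, which has 24 GB of memory. The Python sketch below estimates the memory needed for the weights alone at a few common numeric precisions. The precisions listed are assumptions for illustration, since the article does not say how the model is run on a 3090, and the estimate ignores activations, the attention cache, and framework overhead, so real usage is higher.

```python
GIB = 1024 ** 3            # bytes in one gibibyte
PARAMS = 13e9              # parameter count, rounded as in the article
RTX_3090_VRAM_GIB = 24     # memory on Nvidia's RTX 3090

# Assumed storage cost per parameter at each precision (illustrative).
BYTES_PER_PARAM = {"float16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights only, in GiB."""
    return params * bytes_per_param / GIB


def fits_on_card(precision: str, vram_gib: float = RTX_3090_VRAM_GIB) -> bool:
    """True if the weights alone fit in the card's memory at this precision."""
    return weight_memory_gib(PARAMS, BYTES_PER_PARAM[precision]) <= vram_gib


for precision, nbytes in BYTES_PER_PARAM.items():
    needed = weight_memory_gib(PARAMS, nbytes)
    verdict = "fits" if fits_on_card(precision) else "does not fit"
    print(f"{precision:>8}: {needed:5.1f} GiB of weights, {verdict} in 24 GiB")
```

Under these assumptions, 16-bit weights slightly exceed a single 3090's memory, while 8-bit or 4-bit quantized weights fit with room to spare. That is consistent with the claim that the model can run on consumer-grade hardware.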

Other Chinese organizations have also invested heavily in large language models, including Baidu, Zhipu.ai, and the research institute IDEA, which is led by Harry Shum.

China’s large language models are proliferating as the country prepares to implement some of the world’s strictest AI regulations. According to the Financial Times, China will regulate generative AI with a focus on content, tightening control compared with the rules introduced in April. Companies may need to obtain a license before launching large language models, which could slow China’s efforts to compete with the U.S. in the nascent industry.
