DeepSeek’s new AI model looks to be one of the best “open” competitors yet


A Chinese lab has created what appears to be one of the most powerful “open” artificial intelligence models to date.

The model, DeepSeek V3, was developed by the AI firm DeepSeek and released on Wednesday under a permissive license that allows developers to download and modify it for most applications, including commercial ones.

DeepSeek V3 can handle a range of text-based workloads and tasks, such as coding, translating, and writing essays and emails from a descriptive prompt.

According to DeepSeek’s internal benchmark testing, DeepSeek V3 outperforms both downloadable “open” models and “closed” AI models available only via API. In a subset of coding competitions hosted on Codeforces, a platform for programming contests, DeepSeek V3 outperforms other models, including Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B.

DeepSeek V3 also crushes the competition on Aider Polyglot, a test designed to measure, among other things, whether a model can successfully write new code that integrates into existing code.

DeepSeek claims that DeepSeek V3 was trained on a dataset of 14.8 trillion tokens. In data science, tokens are used to represent bits of raw data; 1 million tokens is equal to about 750,000 words.
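As a rough back-of-envelope illustration, assuming the ~0.75 words-per-token ratio cited above (an approximation for English-like text, not an exact tokenizer property), that training set works out to roughly 11 trillion words:

```python
# Rough size of DeepSeek V3's training corpus in words,
# assuming ~0.75 words per token (the approximation cited above).
TOKENS = 14.8e12        # 14.8 trillion training tokens
WORDS_PER_TOKEN = 0.75  # assumed ratio for English-like text

words = TOKENS * WORDS_PER_TOKEN
print(f"~{words / 1e12:.1f} trillion words")  # prints: ~11.1 trillion words
```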

It’s not just the training set that’s massive. DeepSeek V3 is enormous in size: 671 billion parameters. (Parameters are the internal variables models use to make predictions or decisions.) That’s about 1.6 times larger than Llama 3.1 405B, which has 405 billion parameters.

Parameter count often (but not always) correlates with capability; models with more parameters tend to outperform models with fewer parameters. But larger models also demand beefier hardware to run. An unoptimized version of DeepSeek V3 would need a bank of high-end GPUs to answer queries at reasonable speeds, as the sketch below suggests.
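Here’s a minimal back-of-envelope sketch of why. It assumes 16-bit weights (2 bytes per parameter) and 80 GB of memory per GPU; both figures are illustrative assumptions, not DeepSeek’s published serving configuration:

```python
# Back-of-envelope GPU count just to hold a 671B-parameter model's weights.
# Assumes FP16/BF16 weights (2 bytes/param) and 80 GB accelerators;
# ignores activation memory, the KV cache, and parallelism overhead.
PARAMS = 671e9       # 671 billion parameters
BYTES_PER_PARAM = 2  # 16-bit weights (assumption)
GPU_MEMORY_GB = 80   # high-end data-center GPU (assumption)

weight_gb = PARAMS * BYTES_PER_PARAM / 1e9
gpus_needed = weight_gb / GPU_MEMORY_GB
print(f"weights alone: ~{weight_gb:,.0f} GB")           # ~1,342 GB
print(f"GPUs for weights: at least {gpus_needed:.0f}")  # at least 17
```

Even before accounting for serving overhead, the weights alone span well over a terabyte, which is why a single consumer GPU can’t run the model unquantized.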

Although it may not be the most practical model to run, DeepSeek V3 is an achievement in some respects. DeepSeek was able to train the model in about two months using a data center of Nvidia H800 GPUs, chips that Chinese companies were recently restricted by the US Department of Commerce from purchasing. The company also claims it spent just $5.5 million to train DeepSeek V3, a fraction of the development cost of models like OpenAI’s GPT-4.

The downside is that the model’s political responses are constrained. Ask DeepSeek V3 about Tiananmen Square, for example, and it won’t answer.

Being a Chinese company, DeepSeek is subject to benchmarking by China’s internet regulator to ensure that its models’ responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, such as speculation about the Xi Jinping regime.

DeepSeek, which in late November unveiled DeepSeek-R1, an answer to OpenAI’s o1 “reasoning” model, is an interesting organization. It’s backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses artificial intelligence to inform its trading decisions.

High-Flyer builds its own server clusters for model training; one of its most recent reportedly has 10,000 Nvidia A100 GPUs and cost 1 billion yuan (~$138 million). Founded by Liang Wenfeng, a computer science graduate, High-Flyer aims to achieve “superintelligent” AI through its DeepSeek organization.

In an interview earlier this year, Wenfeng characterized closed-source AI like OpenAI’s as a “temporary” moat. “[It] didn’t stop others from catching up,” he said.

Indeed.

