A Chinese lab has created what appears to be one of the most powerful “open” artificial intelligence models to date.
The model, DeepSeek V3, was developed by the AI firm DeepSeek and released on Wednesday under a permissive license that allows developers to download and modify it for most applications, including commercial ones.
DeepSeek V3 can handle a range of text-based workloads and tasks, such as coding, translating, and writing essays and emails from a descriptive prompt.
According to DeepSeek’s internal benchmark testing, DeepSeek V3 outperforms both downloadable “open” models and “closed” AI models that can only be accessed through an API. In a subset of coding contests hosted on Codeforces, a platform for programming competitions, DeepSeek beats models including Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B.
DeepSeek V3 also crushes the competition on Aider Polyglot, a test designed to measure, among other things, whether a model can successfully write new code that integrates with existing code.
DeepSeek-V3!
60 tokens/second (3x faster than V2!)
API compatibility is intact
Fully open source models and documentation
671B MoE parameters
37B activated parameters
Trained on 14.8T high-quality tokens
Beats Llama 3.1 405B on almost every benchmark https://t.co/OiHu17hBSI pic.twitter.com/jVwJU07dqf
— Chubby♨️ (@kimmonismus) December 26, 2024
DeepSeek claims that DeepSeek V3 was trained on a dataset of 14.8 trillion tokens. In data science, tokens are used to represent bits of raw data; 1 million tokens is equal to about 750,000 words.
It’s not just the training set that’s massive. DeepSeek V3 itself is enormous: 671 billion parameters. (Parameters are the internal variables models use to make predictions or decisions.) That’s about 1.6 times the size of Llama 3.1 405B, which has 405 billion parameters.
Parameter count often (but not always) correlates with skill; models with more parameters tend to outperform models with fewer parameters. But larger models also require beefier hardware to run. An unoptimized version of DeepSeek V3 would need a bank of high-end GPUs to answer queries at reasonable speeds.
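As a rough back-of-envelope check, using the ratio cited above (roughly 750,000 words per 1 million tokens; the exact figure depends on the tokenizer and language mix), the claimed training set works out to around 11 trillion words:

```python
# Back-of-envelope conversion from tokens to words, using the
# ~750,000 words per 1,000,000 tokens ratio cited in the article.
# (The true ratio depends on the tokenizer and the language mix.)

WORDS_PER_TOKEN = 750_000 / 1_000_000  # ~0.75 words per token

training_tokens = 14.8e12  # 14.8 trillion tokens claimed for DeepSeek V3
approx_words = training_tokens * WORDS_PER_TOKEN

print(f"~{approx_words / 1e12:.1f} trillion words")  # ~11.1 trillion words
```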
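To see why, here is a rough sketch of the memory math, assuming standard 16-bit weights (actual serving setups vary with quantization and parallelism). Just holding the weights of a 671-billion-parameter model takes on the order of 1.3 TB, versus roughly 0.8 TB for Llama 3.1 405B:

```python
# Rough memory footprint for storing model weights alone (no activations,
# KV cache, or runtime overhead), assuming 2 bytes per parameter (fp16/bf16).
# Real deployments vary: 8-bit or 4-bit quantization cuts this substantially.

BYTES_PER_PARAM = 2  # fp16 / bf16

def weight_memory_tb(num_params: float) -> float:
    """Approximate weight storage in terabytes."""
    return num_params * BYTES_PER_PARAM / 1e12

for name, params in [("DeepSeek V3", 671e9), ("Llama 3.1 405B", 405e9)]:
    print(f"{name}: ~{weight_memory_tb(params):.2f} TB of weights")

# At ~80 GB of memory per high-end GPU, ~1.34 TB of weights alone implies
# a cluster of well over a dozen such GPUs before any runtime overhead.
```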
Although it’s not the most practical model, DeepSeek V3 is an achievement in some respects. DeepSeek says it was able to train the model in around two months using a data center of Nvidia H800 GPUs, chips that Chinese companies were recently restricted from purchasing by the US Department of Commerce. The company also claims it spent just $5.5 million to train DeepSeek V3, a fraction of the development cost of models like OpenAI’s GPT-4.
The downside is that the model’s political answers are noticeably filtered. Ask DeepSeek V3 about Tiananmen Square, for instance, and it won’t answer.
As a Chinese company, DeepSeek is subject to benchmarking by China’s internet regulator to ensure its models’ responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, such as speculation about the Xi Jinping regime.
DeepSeek, which in late November released DeepSeek-R1, an answer to OpenAI’s o1 “reasoning” model, is a curious organization. It’s backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions.
High-Flyer builds its own server clusters for model training, one of the most recent of which reportedly has 10,000 Nvidia A100 GPUs and cost 1 billion yuan (~$138 million). Founded by Liang Wenfeng, a computer science graduate, High-Flyer aims to achieve “superintelligent” AI through its DeepSeek organization.
In an interview earlier this year, Wenfeng characterized closed-source AI like OpenAI’s as a “temporary” moat. “(It) didn’t stop others from catching up,” he said.
Indeed.