OpenAI’s agent tool may be nearing release

OpenAI may be close to releasing an AI tool that can monitor your computer and take actions on your behalf.

Tibor Blaho, a software engineer with a reputation for accurately leaking upcoming AI products, claims to have unearthed evidence of OpenAI’s long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, which is said to be an “agent” system capable of autonomously handling tasks such as coding and booking travel.

According to reports, OpenAI is targeting January as the launch month for Operator. The code discovered by Blaho this weekend adds credence to that reporting.

OpenAI’s ChatGPT client for macOS has gained options, hidden for now, to define shortcuts for “Change Operator” and “Force Operator,” according to Blaho. OpenAI has also added references to Operator on its website, Blaho said, although those references aren’t yet publicly visible.

The OpenAI website already has references to Operator/OpenAI CUA (Computer Use Agent): “Operator System Card Table”, “Operator Research Evaluation Table” and “Operator Abandonment Rate Table”.

Including comparison with Claude 3.5 Sonnet Computer use, Google Mariner, etc.

(see tables… pic.twitter.com/OOBgC3ddkU

— Tibor Blaho (@btibor91) January 20, 2025

According to Blaho, OpenAI’s website also contains not-yet-public tables comparing Operator’s performance with that of other computer-using AI systems. The tables may be placeholders. However, if the numbers are accurate, they suggest that Operator is not 100% reliable, depending on the task.

On OSWorld, a benchmark that tries to mimic a real computer environment, “OpenAI Computer Use Agent (CUA)”, possibly the AI model behind Operator, scores 38.1%. That is ahead of Anthropic’s computer control model but far below the 72.4% that humans score. OpenAI CUA outperforms humans on WebVoyager, which assesses an AI’s ability to navigate and interact with websites. But according to the leaked benchmarks, the model falls short of human-level scores on WebArena, another web-based benchmark.

Operator also struggles with tasks that a human could easily perform, if the leak is to be believed. In a test that tasked Operator with signing up with a cloud provider and launching a virtual machine, Operator succeeded only 60% of the time. Tasked with creating a Bitcoin wallet, Operator succeeded only 10% of the time.

OpenAI’s imminent entry into the AI agent space comes as competitors, including the aforementioned Anthropic, Google, and others, make plays for the emerging segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to analyst firm Markets and Markets, the market for AI agents could be worth $47.1 billion by 2030.

Agents today are pretty primitive. But some experts have expressed concerns about their safety if the technology improves rapidly.

One of the leaked charts shows that Operator performed well in selected safety evaluations, including tests that check whether the system will perform “illegal activities” and search for “sensitive personal data.” Reportedly, safety testing is one of the reasons for Operator’s long development cycle. In a recent X post, Wojciech Zaremba, a co-founder of OpenAI, criticized Anthropic for releasing an agent he claimed lacked safety mitigations.

“I can only imagine the backlash if OpenAI did a similar release,” Zaremba said.

It should be noted that OpenAI has been criticized by AI researchers, including former employees, for allegedly de-emphasizing safety work in favor of quickly productizing its technology.
