AI benchmarking organization criticized for waiting to disclose funding from OpenAI

An organization developing math benchmarks for artificial intelligence didn’t disclose funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grant-making foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI’s mathematical skills, was one of the benchmarks OpenAI used to showcase its upcoming advanced AI, o3.

In a post on the LessWrong forum, a contractor for Epoch AI who goes by the username “Meemi” says that many contributors to the FrontierMath benchmark were not informed of OpenAI’s involvement until it was made public.

“Communication on this has been opaque,” Meemi said. “In my opinion, Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential for their work to be used for capabilities when they choose whether to work on a benchmark.”

On social media, some users raised concerns that the secrecy could damage FrontierMath’s reputation as an objective benchmark. In addition to supporting FrontierMath, OpenAI had access to many of the problems and solutions in the benchmark, a fact Epoch AI didn’t disclose before December 20, when o3 was announced.

In response to Meemi’s post, Tamay Besiroglu, deputy director of Epoch AI and one of the organization’s co-founders, asserted that FrontierMath’s integrity had not been compromised, but acknowledged that Epoch AI “made a mistake” by not being more transparent.

“We were restricted from disclosing the partnership until around the time o3 launched, and in retrospect we should have negotiated more seriously to be transparent with the benchmark contributors as soon as possible,” Besiroglu said. “Our mathematicians deserved to know who had access to their work. While we are contractually limited in what we can say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI.”

Besiroglu added that while OpenAI has access to FrontierMath, it has a “verbal agreement” with Epoch AI not to use the FrontierMath problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also has a “separate holdout set” that serves as an additional safeguard for independent verification of FrontierMath benchmark results, Besiroglu said.

“OpenAI … fully supported our decision to maintain a separate, unseen holdout set,” Besiroglu wrote.

However, muddying the waters, Epoch AI lead mathematician Elliot Glazer noted in a post on Reddit that Epoch AI has not been able to independently verify OpenAI’s FrontierMath o3 results.

“My personal opinion is that (OpenAI’s) scores are legitimate (i.e., they didn’t train on the dataset) and they have no incentive to lie about their internal benchmarks,” Glazer said. “However, we cannot vouch for them until our independent assessment is complete.”

The episode is yet another example of how hard it is to develop empirical benchmarks for evaluating AI, and to secure the necessary resources for benchmark development, without creating the perception of conflicts of interest.
