10 Ways GPT-4 Is Impressive but Still Flawed

The system seemed to respond appropriately. But the answer did not consider the height of the doorway, which might also prevent a tank or a car from traveling through.

OpenAI’s chief executive, Sam Altman, said the new bot could reason “a little bit.” But its reasoning skills break down in many situations. The previous version of ChatGPT handled the question a little better because it recognized that height and width mattered.

It can ace standardized tests.

OpenAI said the new system could score among the top 10 percent or so of students on the Uniform Bar Examination, which qualifies lawyers in 41 states and territories. It can also score a 1,300 (out of 1,600) on the SAT and a five (out of five) on Advanced Placement high school exams in biology, calculus, macroeconomics, psychology, statistics and history, according to the company’s tests.

Previous versions of the technology failed the Uniform Bar Exam and did not score nearly as high on most Advanced Placement tests.

On a recent afternoon, to demonstrate its test skills, Mr. Brockman fed the new bot a paragraphs-long bar exam question about a man who runs a diesel-truck repair business.

The answer was correct but filled with legalese. So Mr. Brockman asked the bot to explain the answer in plain English for a layperson. It did that, too.

It is not good at discussing the future.

Though the new bot seemed to reason about things that have already happened, it was less adept when asked to form hypotheses about the future. It seemed to draw on what others had already said rather than making new guesses of its own.

When Dr. Etzioni asked the new bot, “What are the important problems to solve in N.L.P. research over the next decade?” — referring to the kind of “natural language processing” research that drives the development of systems like ChatGPT — it could not formulate entirely new ideas.

And it is still hallucinating.

The new bot still makes stuff up. Called "hallucination," the problem haunts all the leading chatbots. Because the systems do not understand what is true and what is not, they may generate text that is completely false.

When asked for the addresses of websites that described the latest cancer research, it sometimes generated internet addresses that did not exist.
