To produce the SDGs, IIUC, they clustered the world's problems in an international collaborative exercise, to succeed the MDGs (2000-2015).
Each country voluntarily produces an annual SDG report on their progress on their Targets according to the Indicators.
IMHO, priorities should include clean energy and AI efficiency, given the growth projections for AI energy use (and for our electric bills, given the expected continued energy supply shortages).
Which real-world SDG tasks can be AI eval'd?
Snuggly73 3 hours ago
Apparently producing a React component that returns a piece of HTML with ARIA attributes set up. Long horizon my ass.
westurner 3 hours ago
Did the LLM in that case suggest adopting an open-source UI library that already implements, and has tests for, W3C ARIA accessibility features, like React Aria or an alternative?
Or did it just do the job as prompted and not mention suggestions for continuous improvement like reusing tested open source components?
Snuggly73 3 hours ago
Not sure how it went in their tests. I've tried Opus and GPT-5 and it was a few lines of React + tests, so I guess 'no'.
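For a sense of scale, here is a rough, dependency-free TypeScript sketch of the kind of task being discussed: a small function that returns a piece of HTML with ARIA attributes set. It is not the actual eval task and not real React; the names `DisclosureProps`, `escapeHtml`, and `renderDisclosureButton` are made up for illustration, and the choice of a disclosure button (`aria-expanded` plus `aria-controls`) is an assumption.

```typescript
// Hypothetical illustration only: none of these names come from the eval.
interface DisclosureProps {
  label: string;      // visible button text, which is also its accessible name
  expanded: boolean;  // whether the controlled region is currently shown
  controlsId: string; // id of the element this button shows or hides
}

// Escape characters that are unsafe in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Return a button marked up as a disclosure control.
function renderDisclosureButton(props: DisclosureProps): string {
  const label = escapeHtml(props.label);
  const controls = escapeHtml(props.controlsId);
  const expanded = String(props.expanded);
  return `<button type="button" aria-expanded="${expanded}" aria-controls="${controls}">${label}</button>`;
}
```

A tested library such as React Aria would also cover the keyboard and focus behavior that a sketch like this leaves out, which is the point of the question above.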
They reported the competitors' performance for a change. Especially curious because OpenAI is not first. Kudos?
Claude's low-noise message style and good common sense are baiting people into thinking they can rely on it for hard stuff.
"GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks" (2025) https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf1...
GDP? GlobalGoals ... The Sustainable Development Goals (SDGs) include 17 goals, 169 targets, and over 230 indicators.
For strategic alignment,
Strategic alignment: https://en.wikipedia.org/wiki/Strategic_alignment
Sustainable Development Goals: https://en.wikipedia.org/wiki/Sustainable_Development_Goals
Couldn’t find their open source evals dataset
https://huggingface.co/datasets/openai/gdpval/viewer/default...
thanks!