Open Questions - AGI
What to work on in an age of AGI? What skills are worth building? How do timelines impact decision making today?
If frontier models get exponentially smarter, when does it make sense to fine-tune for a specific use case, if ever?
What is the supply-chain bottleneck to exponential intelligence? The scaling hypothesis is an empirical claim: as compute, parameters, and data scale, loss improves. But do the inputs to compute, parameters, and data themselves scale?
How can “intelligence too cheap to meter” and “exponential token demand + sub-exponential compute production = rising token prices” both be true?
The cost to serve a fixed level of intelligence has decreased 2 OOMs per year for 3 years.
GPT-5.4 is much cheaper to serve than GPT-4 on a per-token basis, despite being much smarter.
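Both claims can hold at once: the unit cost of a fixed capability collapses while the market price of a token rises, so long as demand outgrows supply. A minimal sketch, where the 2 OOMs/year cost decline comes from the note above and the demand and supply growth rates are assumptions for illustration:

```python
# Sketch: per-token serving cost vs. market-clearing token price.
# Only the cost decline is from the note; the other rates are assumptions.

cost_decline_per_year = 100.0  # 2 OOMs/year cheaper to serve a fixed capability
demand_growth_per_year = 10.0  # assumed: token demand grows 1 OOM/year
supply_growth_per_year = 3.0   # assumed: compute supply grows more slowly

cost = 1.0   # normalized cost to serve a fixed level of intelligence
price = 1.0  # normalized token price, taken to scale with demand/supply

for year in range(3):
    cost /= cost_decline_per_year
    price *= demand_growth_per_year / supply_growth_per_year

print(f"serving cost after 3y: {cost:.0e}")  # 1e-06: a millionfold cheaper
print(f"token price after 3y: {price:.1f}")  # ~37.0: price still rose ~37x
```

The point is that “too cheap to meter” is a statement about cost per unit of intelligence, while rising token prices are a statement about the market clearing under a supply shortage; they describe different quantities.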
How do RL's sample efficiency and compute requirements relate to the debated 'data wall'?
How does domain-specific privacy hinder data scaling, if at all?
As of 2026: 1 GW ≈ $10B ≈ 1M H100s. Leopold suggested in 2024 that this would increase by 0.5 OOMs per year. Total US electricity production averages about 500 GW (≈ $5T at that rate).
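Taking these figures at face value, a quick extrapolation shows why power looks like a plausible bottleneck; the 0.5 OOM/year rate is Leopold's, everything else is the note's own numbers:

```python
import math

# Extrapolating 0.5 OOMs/year growth in AI power draw from ~1 GW in 2026.
start_year, start_gw = 2026, 1.0
oom_per_year = 0.5
us_avg_production_gw = 500.0  # total US electricity production, per the note

years_to_parity = math.log10(us_avg_production_gw / start_gw) / oom_per_year
print(f"AI draw matches US production around {start_year + years_to_parity:.1f}")
# → around 2031.4, i.e. the trend must break within roughly five years
```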
Hermes agents with largely the same prompt, sitting in Discord and Hub, are unable to maintain goals set by their human when other agents talk to them. Either every interlocutor is assigned equal importance, or the agents don't understand what is important to their human and what isn't.
Is maintaining owner goals just a matter of good user labeling in the harness plus good prompt engineering? Or does it require fine-tuning?
Multiplayer environment where an AI is trained to maximize its fulfillment vector at the end of the run: could this use the NLA on Llama that Anthropic open sourced? What would its attractor states be?
How does existing post-training shape assistant behavior? How does assistant behavior impact utility? What emotional vectors exist pre- and post-assistant training?
If superposition leads to scaling, and intelligence is just a search over the space of Turing machines, then there's no fundamental reason why scaling plus larger context windows would not lead to social intelligence.
Anthropic is heavily compute bottlenecked and currently focused on larger context windows and multi agent orchestration.
Does algorithmic progress (continue to) outpace the disappearance of low-hanging fruit?
Why and how does any layer of the stack actually capture outsized marginal value?
Is it possible to quantify the impact of the feedback loop today, given that future timelines are so sensitive to it, and there are many people who think this is already happening to some extent?
To what extent is existing compute utilized, who owns the compute coming online in the next year, and what are the plans for compute growth after that?
Which epoch.ai trends are breaking as of May 2026? In which direction? Why?
To what extent are models already smart enough, with the bottleneck being packaging them correctly for consumers or enterprises? Versus: does exponential intelligence remove the need for packaging, and/or do frontier labs already provide the minimum necessary packaging?
To what extent was the AI industry ‘saved’ by the discovery of agentic use? Large labs made deals they couldn’t pay for in fall of 2025, then agentic token use gave them the money to pay for it.
Does algorithmic progress translate into “effective” compute? Can the extent to which this has occurred so far be quantified?
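One common way to make this quantifiable is to treat effective compute as physical compute multiplied by an algorithmic-efficiency factor. A minimal sketch; both annual growth rates are assumptions for illustration, not measurements:

```python
# Effective compute = physical compute x algorithmic efficiency.
# Both annual growth rates below are illustrative assumptions.

physical_growth = 4.0  # assumed: physical training compute grows 4x/year
algo_growth = 3.0      # assumed: same loss reachable with 3x less compute/year
years = 3

effective_multiplier = (physical_growth * algo_growth) ** years
print(f"effective compute after {years}y: {effective_multiplier:.0f}x")  # 1728x
```

Under this framing, quantifying algorithmic progress reduces to measuring how much less compute is needed over time to reach a fixed loss or benchmark score.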
Why are frontier LLMs extremely good at finding bugs but not fixing them?
When will frontier LLMs be able to develop financial software? What’s blocking them?
For multiplayer agents, consumers often don't want to share, and enterprises are difficult to access and build trust with. Is there a different framing that reveals a more promising direction?
What harness engineering lasts through 2 OOMs of intelligence, and what doesn’t? Why?
As compute needs grow, with more of the demand side sitting outside frontier labs and more of the supply side outside Big Tech, how does quality degradation impact the market?
Do decentralized compute marketplaces make sense if compute supply is so limited? When does a frontier lab resort to something like this, if ever?
In 2024 the dominant interface was the web UI, in early 2025 Claude Code and Codex were released, in late 2025 Openclaw and Hermes were launched. We’ve gone from web UI → harness → agent, or more specifically, LLM → LLM + tools → LLM + tools + loop. What’s the next unhobbling?
How does that impact where the supply chain bottlenecks are and where value accrues?
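The progression above can be made concrete. A minimal sketch of the “LLM + tools + loop” pattern; the model interface here is a hypothetical stand-in for any frontier API that returns either a tool call or a final answer:

```python
# Sketch of the "LLM + tools + loop" pattern: the loop is the latest
# unhobbling layered on top of the raw model and its tools.

def agent(task, call_llm, tools, max_steps=10):
    """Keep calling the model until it answers or the step budget runs out."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):                        # the "loop" unhobbling
        reply = call_llm(messages)
        if "answer" in reply:                         # model decides it is done
            return reply["answer"]
        result = tools[reply["tool"]](reply["args"])  # the "tools" unhobbling
        messages.append({"role": "tool", "content": str(result)})
    return None  # step budget exhausted

# Toy usage: a scripted stand-in "model" that calls one tool, then answers.
scripted = iter([{"tool": "add", "args": {"a": 2, "b": 3}},
                 {"answer": "5"}])
print(agent("what is 2+3?", lambda msgs: next(scripted),
            {"add": lambda a: a["a"] + a["b"]}))  # prints 5
```

Whatever the next unhobbling is, it presumably wraps this loop the way the loop wrapped tool use.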
What is the historical relationship between labor and capital? How does AGI influence this?
Why would enterprise revenue growth slow down? Why would it speed up?
After building Slate in 2023 (an LLM fine-tuned to make transactions onchain) and struggling with the accuracy and speed users required, I believe we came away somewhat discouraged about applied AI progress. The timing of the build was good and users were retained, but they were not engaged, and it did not work out. We ended up being right about many of the feedback loops, but wrong about the extent of the problem and the direction in which it grew. In some sense, the problem itself needs to grow as well, which usually means the market is growing, not necessarily in quantity but in dollars. If you're starting small, that typically means the market needs to be value-creative.
Does weak-to-strong alignment even work? Or is frontier research too hacky to generalize?
Why would a system that collects data and fine-tunes on it in the background, delivering a better model to the end user, not work (i.e. online RL)? Multiple companies are attempting this today, but few are purely hands-off.
Amanda Askell at Anthropic suggests that having a Constitution helps generalization: having different preferences across different types of tasks makes it extremely difficult to anticipate edge-case behavior or to scale effectively.

