Aleksei Ivanov

Actually, Claude Computer Use is not there yet

We are all in awe of agentic AI these days: “it can do this” and “it can do that”, all by itself.

Except, does it really?

Personally, I have never had a moment where I told an agent to do a thing and it just nailed it: not exceeding my expectations tenfold, not even meeting them onefold.

Chatting with Claude? Yes, it produces amazing zero-shot stuff. But multi-step agentic mode? So far, never.

And I know what they say: “this is the worst it will ever be”. But when it comes to computer use, I would actually argue that it will not improve much, and the reason is simple:

LLMs are just too slow.

That, and the fact that language models are good at modelling text, but not at representing complex state across time and space. And visual information on a screen lives in 2D space.

So for computer use to be truly viable, there needs to be a much bigger breakthrough than simply teaching a language model to take screenshots and click on buttons.

And this is why I am still convinced that the technology is not there yet.