A quick trial of claude's new computer use

Yesterday, Anthropic released a new feature for their Claude AI chatbot: the ability to use a computer. This is a big step for AI assistants, and I thought I'd quickly try it out and see what it's like.

All of the computer use is via tools: you prompt Claude and it calls agentic computer-specific tools: moving the mouse, typing, creating a file, taking a screenshot, and running bash commands. However, this is just the API surface. Their client reference implementation isolates the system in a docker container, but because it's all API calls, you could just as easily run it on your own machine. Someone, of course, already has with agent.exe.

For now, we'll just use the docker version to get a feel for things.

Browsing

The example that it gave was asking it to book flight tickets. It opened up firefox (the only browser provided in the docker container) and went to google flights and had an OK time selecting flights. However, this was a bit slow. I wonder if it'd be faster on more complicated browsing tasks if we did beam search: looking at a few options and then selecting the best one over and over again. While I may be able to serially select flights faster than it, I'm not sure I could do better than it on something that required looking at multiple sites with different filters, such as finding flights and activities and everything else needed for a trip and jointly optimizing, whereas Claude could perform hundreds of queries in parallel and have sync operations.

This is, of course, a programming blog, so the last thing I did today with it was ask it some questions.

Can you solve my earlier blog post?

YES! It solved that whole blog post with only a hint as to where the xcode project directory was mounted in the docker container and instrucitons to use ripgrep instead of grep. It has a bit of a problem submitting commands without a sensible timeout or checking for senses of scale, but if you ensure that it either checks or the datasets and tools you're working with will respond properly, it can blitz out commands and iterate quickly enough to turn my half day of work into a few minutes of mind-bending binary mining.