We are entering the third phase of generative AI. First came the chatbots, followed by the assistants. Now we are beginning to see agents: systems that aspire to greater autonomy and can work in “teams” or use tools to accomplish complex tasks.
The latest hot product is OpenAI’s ChatGPT agent. This combines two pre-existing products (Operator and Deep Research) into a single, more powerful system which, according to the developer, “thinks and acts”.
These new systems represent a step up from earlier AI tools. Knowing how they work and what they can do – as well as their drawbacks and risks – is rapidly becoming essential.
From chatbots to agents
ChatGPT launched the chatbot era in November 2022, but despite its huge popularity, the conversational interface limited what could be done with the technology.
Enter the AI assistant, or copilot. These are systems built on top of the same large language models that power generative AI chatbots, but designed to carry out tasks under human instruction and supervision.
Agents are another step up. They are intended to pursue goals (rather than just complete tasks) with varying degrees of autonomy, supported by more advanced capabilities such as reasoning and memory.
Multiple AI agent systems may be able to work together, communicating with each other to plan, schedule, decide and coordinate to solve complex problems.
Agents are also “tool users”: they can call on software tools for specialised tasks – things such as web browsers, spreadsheets, payment systems and more.
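To make the idea concrete, the sketch below shows the loop most agent frameworks are built around: a model proposes the next tool call, the tool runs, and the result is fed back in as “memory” until the goal is judged complete. It is a minimal illustration only; the tool names, the toy planner and the example goal are hypothetical stand-ins, not any vendor’s actual API.

```python
# Minimal sketch of an agent "tool use" loop. Everything here is illustrative:
# the tools, the toy decide_next_step() planner and the goal are hypothetical
# stand-ins for a real language model and real software tools.

def search_web(query: str) -> str:
    """Stand-in for a web-browsing tool."""
    return f"Top result for '{query}': example.com/best-laptop-deals"

def fill_spreadsheet(row: str) -> str:
    """Stand-in for a spreadsheet tool."""
    return f"Added row: {row}"

TOOLS = {"search_web": search_web, "fill_spreadsheet": fill_spreadsheet}

def decide_next_step(goal: str, history: list[str]) -> tuple[str, str] | None:
    """Toy planner standing in for the language model: choose a tool and an
    argument, or return None once the goal is considered complete."""
    if not history:
        return ("search_web", goal)
    if len(history) == 1:
        return ("fill_spreadsheet", history[-1])
    return None  # goal considered done

def run_agent(goal: str) -> list[str]:
    history: list[str] = []
    while (step := decide_next_step(goal, history)) is not None:
        tool_name, argument = step
        result = TOOLS[tool_name](argument)  # call the chosen tool
        history.append(result)               # feed the result back as memory
    return history

if __name__ == "__main__":
    for line in run_agent("cheapest laptop under $800"):
        print(line)
```

In a real system, the planner would be a large language model deciding which tool to call and with what arguments, and the tools would be live services such as a browser or payment system; the loop structure, however, is essentially the same.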
A year of rapid development
Agentic AI has felt imminent since late last year. A big moment came last October, when Anthropic gave its Claude chatbot the ability to interact with a computer in much the same way a human does. This system could search multiple data sources, find relevant information and submit online forms.
Other AI developers were quick to follow. OpenAI released a web browsing agent named Operator, Microsoft announced Copilot agents, and we saw the launch of Google’s Vertex AI agents and Meta’s Llama agents.
Earlier this year, the Chinese startup Monica demonstrated its Manus AI agent buying real estate and converting lecture recordings into summary notes. Another Chinese startup, Genspark, released a search engine agent that returns a single-page overview (similar to what Google does now) with embedded links to online tasks such as finding the best shopping deals. Another startup, Cluely, offers a somewhat unhinged “cheat at anything” agent that has gained attention but is yet to deliver meaningful results.
Not all agents are made for general-purpose activity. Some are specialised for particular areas.
Coding and software engineering are at the vanguard here, with Microsoft’s Copilot coding agent and OpenAI’s Codex among the frontrunners. These agents can independently write, evaluate and commit code, while also assessing human-written code for errors and performance lags.
Search, summarisation and more
One core strength of generative AI models is search and summarisation. Agents can use this to carry out research tasks that might take a human expert days to complete.
OpenAI’s Deep Research tackles complex tasks using multi-step online research. Google’s AI “co-scientist” is a more sophisticated multi-agent system that aims to help scientists generate new ideas and research proposals.