Amazon's Nova Act joins OpenAI's and Anthropic's computer-using AI agents

Amazon unveiled Nova Act, an AI model for browser-based task completion. Available as a research preview, Nova Act prioritizes reliable execution of simple commands over higher-level workflows, an approach Amazon sees as the key to unlocking genuine AI agents that are both capable and autonomous.

On Monday, Amazon unveiled Nova Act, a new AI model designed to perform actions within a web browser. Currently available as a research preview at nova.amazon.com, the Nova Act SDK lets developers experiment with Nova Act's capabilities and build agents that use a web browser to complete tasks such as submitting an out-of-office request through an internal system, placing the corresponding calendar hold, and even configuring an automatic 'out-of-office' reply.

Building reliable action-capable AI agents

Like many companies selling the idea of an AI agent as the natural next generation of AI-powered products, Amazon offers no concrete definition of what an AI agent is beyond the generic slogans that have been in circulation for a while now. The company defines genuine AI agents negatively: conversation and knowledge retrieval are not their primary focus, which sets them apart from most current AI-powered assistants.

As for a positive characterization of what an AI agent should be, Amazon says only that Nova Act is built to "complete tasks and act in a range of digital and physical environments on behalf of the user". This definition is backed by an ambitious vision in which AI agents will be capable of "organizing a wedding or handling complex IT tasks to increase business productivity". These examples appear meant to illustrate the kind of complex, multi-step workflows Amazon expects its agents to tackle eventually.

Performance and automation

Although Amazon never mentions its rivals by name, the company does point out that rival browser-using agents are usually evaluated on high-level task benchmarks like OSWorld, WebArena, and WebVoyager. Instead of focusing on the larger picture, Amazon designed Nova Act to prioritize reliability by accurately completing simpler, low-level actions that, according to the company, trip up rival models more often, such as picking dates or navigating drop-downs and pop-ups.

Moreover, to further promote reliability, the Nova Act SDK allows developers to break complex workflows into atomic commands, add detailed instructions, call APIs, and integrate browser manipulation with Python code. According to Amazon, ensuring agents can reliably complete these low-level tasks first will lay the groundwork for genuine AI agents that do not require constant supervision. Amazon says Nova Act agents can run autonomously in headless mode once configured, functioning as APIs or operating on schedules.
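To make the idea of atomic commands concrete, here is a minimal Python sketch of how a workflow might be split into small steps that are each run and retried independently. It does not use the Nova Act SDK: `run_workflow`, `fake_act`, the `max_retries` parameter, and the step strings are all hypothetical names chosen for illustration, and the toy agent stands in for whatever call actually drives the browser.

```python
from typing import Callable, List

# Hypothetical stand-in for a browser agent: it takes one natural-language
# instruction and reports whether the step succeeded.
# This is NOT the Nova Act SDK API.
ActFn = Callable[[str], bool]


def run_workflow(act: ActFn, steps: List[str], max_retries: int = 1) -> List[str]:
    """Run atomic steps in order, retrying each failed step up to max_retries times.

    Returns the steps that completed. Execution stops at the first step
    that fails on every attempt, so later steps never run on a bad state.
    """
    completed: List[str] = []
    for step in steps:
        for _attempt in range(max_retries + 1):
            if act(step):
                completed.append(step)
                break
        else:
            # The step failed on every attempt: stop the workflow here.
            break
    return completed


def fake_act(instruction: str) -> bool:
    # Toy agent used only in this example: "succeeds" on any non-empty instruction.
    return bool(instruction.strip())


# Illustrative steps, loosely based on the out-of-office example above.
steps = [
    "open the time-off request form",
    "pick the start date in the date picker",
    "pick the end date in the date picker",
    "submit the form",
]
done = run_workflow(fake_act, steps)
```

Because each step is a separate call, ordinary Python control flow (retries, early exit, logging) can wrap the individual browser actions, which is the kind of integration the SDK description points to.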

Looking forward

Amazon describes Nova Act as just the beginning of its vision for useful agents at scale. Commenting on current approaches to "agentic" AI, the company notes that agents capable of handling increasingly complex tasks require reinforcement learning in diverse environments as a supplement to supervised fine-tuning. Amazon also committed to sharing more about its strategy and results as its research progresses.
