On Monday, Amazon unveiled Nova Act, a new AI model designed to perform actions within a web browser. Released as a research preview at nova.amazon.com, the Nova Act SDK lets developers experiment with the model's capabilities and build agents that use a web browser to complete tasks such as submitting an out-of-office request through an internal system, placing the corresponding hold on a calendar, and even configuring an automatic 'out-of-office' reply.
Building reliable action-capable AI agents
Like many companies selling the idea of an AI agent as the natural next generation of AI-powered products, Amazon offers no concrete definition of what an AI agent is beyond the generic slogans that have been circulating for some time. Instead, it defines genuine AI agents negatively: their primary focus is not conversation or knowledge retrieval, which sets them apart from most current AI-powered assistants.
As for a positive characterization of what an AI agent should be, Amazon only tells us that Nova Act is built to "complete tasks and act in a range of digital and physical environments on behalf of the user". This definition is backed by an ambitious vision in which AI agents will be capable of "organizing a wedding or handling complex IT tasks to increase business productivity". These examples seem meant to illustrate the kind of complex, multi-step workflows Amazon expects its agents to tackle eventually.
Performance and Automation
Although Amazon never mentions its rivals by name, the company does point out that competing browser-using agents are usually evaluated on high-level task benchmarks such as OSWorld, WebArena, and WebVoyager. Rather than focusing on the larger picture, Amazon designed Nova Act to prioritize reliability by accurately completing simpler, low-level actions that, according to the company, trip up rival models more often, such as picking dates or navigating drop-downs and pop-ups.
To further promote reliability, the Nova Act SDK allows developers to break complex workflows into atomic commands, add detailed instructions, call APIs, and combine browser manipulation with Python code. According to Amazon, making sure agents can reliably complete these low-level tasks first will lay the groundwork for genuine AI agents that do not require constant supervision. Amazon says that, once configured, Nova Act agents can run autonomously in headless mode, functioning as APIs or running on schedules.
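The idea of splitting a workflow into atomic commands interleaved with ordinary Python can be illustrated with a minimal sketch. It does not use the Nova Act SDK: `StubBrowserAgent`, its `act` method, `business_days`, and `request_time_off` are hypothetical names invented for this example, and the stub merely records commands instead of driving a browser. What it shows is the structure: dates are computed in plain Python rather than left to the model, and the out-of-office workflow from the opening paragraph is issued as small steps that are checked one at a time.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class StubBrowserAgent:
    """Hypothetical stand-in for a browser-using agent; NOT the Nova Act API.

    It only records each natural-language command so the workflow structure
    can be inspected without a real browser or model.
    """
    log: list[str] = field(default_factory=list)

    def act(self, command: str) -> bool:
        # A real agent would drive the browser here; the stub just records
        # the command and always reports success.
        self.log.append(command)
        return True


def business_days(start: date, end: date) -> list[date]:
    """Deterministic Python logic: weekdays in [start, end], inclusive."""
    days = []
    current = start
    while current <= end:
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days.append(current)
        current += timedelta(days=1)
    return days


def request_time_off(agent: StubBrowserAgent, start: date, end: date) -> int:
    """Split a hypothetical out-of-office workflow into small commands.

    Returns the number of business days covered by the request.
    """
    days = business_days(start, end)
    steps = [
        "open the internal time-off request form",
        f"set the start date to {start.isoformat()}",
        f"set the end date to {end.isoformat()}",
        "submit the request",
        f"create a calendar hold from {start.isoformat()} to {end.isoformat()}",
        f"turn on an automatic out-of-office reply until {end.isoformat()}",
    ]
    for step in steps:
        # Stop at the first failing step instead of continuing blindly.
        if not agent.act(step):
            raise RuntimeError(f"step failed: {step}")
    return len(days)


if __name__ == "__main__":
    agent = StubBrowserAgent()
    n = request_time_off(agent, date(2025, 4, 7), date(2025, 4, 11))
    print(n, "business days requested")
    for line in agent.log:
        print("-", line)
```

Issuing several short commands and stopping at the first failure, rather than sending one long instruction, makes it easier to see where a run went wrong; in a real setup, the stub's `act` would be replaced by whatever call the agent framework actually provides.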
Looking Forward
Amazon describes Nova Act as just the beginning of its vision for useful agents at scale. Commenting on current approaches to "agentic" AI, the company notes that agents capable of handling increasingly complex tasks require reinforcement learning across diverse environments in addition to supervised fine-tuning. Amazon also committed to sharing more about its strategy and results as its research progresses.