Alibaba Group has introduced Page Agent, an open-source JavaScript library that enables natural language control of web interfaces directly within the browser. Unlike traditional automation tools that operate externally, Page Agent runs inside the webpage and reads the live Document Object Model (DOM) as text. This approach, termed DOM dehydration, converts the DOM into a compact text map, allowing smaller language models to precisely identify and interact with elements like buttons and forms. The library is model-agnostic, supporting any OpenAI-compatible endpoint, and is best suited for applications where developers can embed the code, such as SaaS copilots or smart form-filling tools.
AI IMPACT Enables more integrated AI copilots and automation within web applications by leveraging in-browser DOM manipulation.
RANK_REASON This is a new open-source library for web automation, not a frontier model release or significant industry shift.
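The DOM dehydration idea described in the summary can be illustrated with a minimal sketch. This is not Page Agent's actual API or output format: the `SimpleNode` type, the `dehydrate` function, the set of interactive tags, and the `[index]<tag>` line format are all assumptions made for illustration, and a plain object tree stands in for the real DOM so the example runs outside a browser.

```typescript
// Hypothetical minimal node shape standing in for a real DOM element.
// Not Page Agent's actual types.
interface SimpleNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: SimpleNode[];
}

// Tags treated as interactive in this sketch (an assumption).
const INTERACTIVE = new Set(["a", "button", "input", "select", "textarea"]);

interface Dehydrated {
  text: string;                   // compact text map that would be sent to the model
  index: Map<number, SimpleNode>; // lookup from a numeric index back to the element
}

// Walk the tree depth-first, emit one line per interactive element
// (tagged with a numeric index) and one line per plain text node.
function dehydrate(root: SimpleNode): Dehydrated {
  const lines: string[] = [];
  const index = new Map<number, SimpleNode>();

  const visit = (node: SimpleNode): void => {
    if (INTERACTIVE.has(node.tag)) {
      const id = index.size;
      index.set(id, node);
      const attrs = Object.entries(node.attrs ?? {})
        .map(([k, v]) => ` ${k}="${v}"`)
        .join("");
      lines.push(`[${id}]<${node.tag}${attrs}>${node.text ?? ""}</${node.tag}>`);
    } else if (node.text) {
      lines.push(node.text);
    }
    for (const child of node.children ?? []) visit(child);
  };

  visit(root);
  return { text: lines.join("\n"), index };
}

// Example page: a small sign-up form.
const page: SimpleNode = {
  tag: "form",
  children: [
    { tag: "label", text: "Email" },
    { tag: "input", attrs: { type: "email", name: "email" } },
    { tag: "div", children: [{ tag: "button", text: "Sign up" }] },
  ],
};

const result = dehydrate(page);
console.log(result.text);
```

In this sketch, a model that replies with an index such as `1` can be mapped back to the "Sign up" button through `result.index`, which is the kind of precise element targeting the summary attributes to the text map. Layout-only wrappers such as the `div` are dropped, which is what keeps the map compact.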
- Alibaba Group
- Browser Usage
- Document Object Model
- FlatDomTree
- JavaScript
- MIT License
- OpenAI
- Page Agent
- @page-agent/core
- @page-agent/page-controller
- PageController
- playwright
- puppeteer
- selenium
- SimulatorMask
- TypeScript
AI-generated summary · Google Gemini · from 1 source.