Web
Arcade.dev LLM tools for web scraping related tasks
Version 2.0.1

Arcade.dev provides a toolkit for web scraping tasks, enabling developers to efficiently manage and extract data from websites. The toolkit is built on the Firecrawl API and offers functionality for both synchronous and asynchronous crawling.
Capabilities:
- Initiate and manage web crawls, with options for both synchronous and asynchronous operations.
- Retrieve crawl data and status updates for ongoing or recently completed tasks.
- Map entire websites starting from a single URL.
- Scrape specific URLs and receive data in various formats.
Secrets:
- `FIRECRAWL_API_KEY`: the API key required for accessing Firecrawl.
Available tools (6)

| Tool name | Description | Secrets |
|---|---|---|
| Web.CancelCrawl | Cancel an asynchronous crawl job that is in progress using the Firecrawl API. | 1 |
| Web.CrawlWebsite | Crawl a website using Firecrawl. If the crawl is asynchronous, it returns the crawl ID; if synchronous, it returns the crawl data. | 1 |
| Web.GetCrawlData | Get the data of a Firecrawl crawl that is either in progress or recently completed. | 1 |
| Web.GetCrawlStatus | Get the status of a Firecrawl crawl that is either in progress or recently completed. | 1 |
| Web.MapWebsite | Map an entire website starting from a single URL. | 1 |
| Web.ScrapeUrl | Scrape a URL using Firecrawl and return the data in specified formats. | 1 |
Web.CancelCrawl
Cancel an asynchronous crawl job that is in progress using the Firecrawl API.
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
crawl_id | string | Required | The ID of the asynchronous crawl job to cancel |
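As a minimal sketch of preparing this tool's input, the helper below validates the required `crawl_id` before building the payload. `make_cancel_input` is a hypothetical helper for illustration, not part of any Arcade SDK.

```python
def make_cancel_input(crawl_id: str) -> dict:
    """Build a Web.CancelCrawl input payload, rejecting empty IDs."""
    if not crawl_id or not crawl_id.strip():
        raise ValueError("crawl_id is required")
    return {"crawl_id": crawl_id}

# Example payload for a previously started asynchronous crawl.
payload = make_cancel_input("abc123")
```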
Requirements
- Secret: `FIRECRAWL_API_KEY`
Output
json: Cancellation status information

Web.CrawlWebsite
Crawl a website using Firecrawl. If the crawl is asynchronous, it returns the crawl ID; if synchronous, it returns the crawl data.
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
url | string | Required | URL to crawl |
exclude_paths | array<string> | Optional | URL patterns to exclude from the crawl |
include_paths | array<string> | Optional | URL patterns to include in the crawl |
max_depth | integer | Optional | Maximum depth to crawl relative to the entered URL |
ignore_sitemap | boolean | Optional | Ignore the website sitemap when crawling |
limit | integer | Optional | Limit the number of pages to crawl |
allow_backward_links | boolean | Optional | Allow navigation to previously linked pages and crawling of sublinks that are not children of the 'url' input parameter. |
allow_external_links | boolean | Optional | Allow following links to external websites |
webhook | string | Optional | The URL to send a POST request to when the crawl is started, updated and completed. |
async_crawl | boolean | Optional | Run the crawl asynchronously |
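The parameter table above can be turned into a payload with a small sketch like the following; `make_crawl_input` is a hypothetical helper (not an Arcade API) that includes optional fields only when they are explicitly set.

```python
def make_crawl_input(url: str, **options) -> dict:
    """Assemble a Web.CrawlWebsite payload; optional fields are included
    only when explicitly set, mirroring the parameter table above."""
    allowed = {
        "exclude_paths", "include_paths", "max_depth", "ignore_sitemap",
        "limit", "allow_backward_links", "allow_external_links",
        "webhook", "async_crawl",
    }
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"url": url}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload

payload = make_crawl_input("https://example.com", max_depth=2, async_crawl=True)
```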
Requirements
- Secret: `FIRECRAWL_API_KEY`
Output
json: Crawl status and data

Web.GetCrawlData
Get the data of a Firecrawl 'crawl' that is either in progress or recently completed.
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
crawl_id | string | Required | The ID of the crawl job |
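A sketch of how this tool might be wired into a client: the function takes a caller-supplied `execute(tool_name, input)` callable standing in for whatever mechanism actually invokes Arcade tools. Both `fetch_crawl_data` and the stub executor are hypothetical, for illustration only.

```python
def fetch_crawl_data(execute, crawl_id: str) -> dict:
    """Retrieve crawl data via a caller-supplied `execute(tool_name, input)`
    callable (a stand-in for a real tool-calling client)."""
    if not crawl_id:
        raise ValueError("crawl_id is required")
    return execute("Web.GetCrawlData", {"crawl_id": crawl_id})

# A stub executor standing in for a real tool call.
def fake_execute(tool_name, tool_input):
    return {"tool": tool_name, "crawl_id": tool_input["crawl_id"], "data": []}

result = fetch_crawl_data(fake_execute, "abc123")
```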
Requirements
- Secret: `FIRECRAWL_API_KEY`
Output
json: Crawl data information

Web.GetCrawlStatus
Get the status of a Firecrawl 'crawl' that is either in progress or recently completed.
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
crawl_id | string | Required | The ID of the crawl job |
Requirements
- Secret: `FIRECRAWL_API_KEY`
Output
json: Crawl status information

Web.MapWebsite
Generate a map of an entire website starting from a single URL.
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
url | string | Required | The base URL to start crawling from |
search | string | Optional | Search query to use for mapping |
ignore_sitemap | boolean | Optional | Ignore the website sitemap when crawling |
include_subdomains | boolean | Optional | Include subdomains of the website |
limit | integer | Optional | Maximum number of links to return |
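A minimal sketch of assembling this tool's input from the parameters above; `make_map_input` is a hypothetical helper, and the positive-`limit` check is an assumption rather than a documented constraint.

```python
def make_map_input(url, search=None, ignore_sitemap=None,
                   include_subdomains=None, limit=None):
    """Build a Web.MapWebsite payload, skipping optionals left at None."""
    if limit is not None and limit < 1:
        raise ValueError("limit must be a positive integer")
    fields = {
        "url": url, "search": search, "ignore_sitemap": ignore_sitemap,
        "include_subdomains": include_subdomains, "limit": limit,
    }
    return {k: v for k, v in fields.items() if v is not None}

payload = make_map_input("https://example.com", limit=50)
```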
Requirements
- Secret: `FIRECRAWL_API_KEY`
Output
json: Website map data

Web.ScrapeUrl
Scrape a URL using Firecrawl and return the data in specified formats.
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
url | string | Required | URL to scrape |
formats | array<string> | Optional | Formats to retrieve: markdown, html, rawHtml, links, screenshot, screenshot@fullPage. Defaults to ['markdown']. |
only_main_content | boolean | Optional | Only return the main content of the page excluding headers, navs, footers, etc. |
include_tags | array<string> | Optional | List of tags to include in the output |
exclude_tags | array<string> | Optional | List of tags to exclude from the output |
wait_for | integer | Optional | Specify a delay in milliseconds before fetching the content, allowing the page sufficient time to load. |
timeout | integer | Optional | Timeout in milliseconds for the request |
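The sketch below validates the `formats` values listed in the parameter table and applies the documented default; `make_scrape_input` is a hypothetical helper, not part of any Arcade SDK.

```python
VALID_FORMATS = {"markdown", "html", "rawHtml", "links",
                 "screenshot", "screenshot@fullPage"}

def make_scrape_input(url, formats=None, **options):
    """Build a Web.ScrapeUrl payload, checking `formats` against the
    values listed in the parameter table; defaults to ['markdown']."""
    formats = list(formats) if formats is not None else ["markdown"]
    bad = [f for f in formats if f not in VALID_FORMATS]
    if bad:
        raise ValueError(f"unsupported formats: {bad}")
    payload = {"url": url, "formats": formats}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload

payload = make_scrape_input("https://example.com",
                            formats=["markdown", "links"], wait_for=1000)
```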
Requirements
- Secret: `FIRECRAWL_API_KEY`
Output
json— Scraped data in specified formats