Web
Arcade.dev LLM tools for web scraping related tasks
Version 2.0.1

Arcade.dev provides a toolkit for web scraping tasks, enabling developers to efficiently manage and extract data from websites. The toolkit is built on the Firecrawl API and offers functionality for both synchronous and asynchronous crawling.
Capabilities:
- Initiate and manage web crawls, with options for both synchronous and asynchronous operations.
- Retrieve crawl data and status updates for ongoing or recently completed tasks.
- Map entire websites starting from a single URL.
- Scrape specific URLs and receive data in various formats.
Secrets:
- `FIRECRAWL_API_KEY`: the API key required for accessing Firecrawl.
Available tools (6)

| Tool name | Description | Secrets |
|---|---|---|
| Web.CancelCrawl | Cancel an asynchronous crawl job that is in progress using the Firecrawl API. | 1 |
| Web.CrawlWebsite | Crawl a website using Firecrawl. If the crawl is asynchronous, it returns the crawl ID; if synchronous, it returns the crawl data. | 1 |
| Web.GetCrawlData | Get the data of a Firecrawl crawl that is either in progress or recently completed. | 1 |
| Web.GetCrawlStatus | Get the status of a Firecrawl crawl that is either in progress or recently completed. | 1 |
| Web.MapWebsite | Map an entire website starting from a single URL. | 1 |
| Web.ScrapeUrl | Scrape a URL using Firecrawl and return the data in specified formats. | 1 |
Web.CancelCrawl
Cancel an asynchronous crawl job that is in progress using the Firecrawl API.
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
crawl_id | string | Required | The ID of the asynchronous crawl job to cancel |
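As a minimal sketch of preparing this tool's input, the helper below validates the required `crawl_id` before building the payload. `make_cancel_input` is a hypothetical helper for illustration, not part of any Arcade SDK.

```python
def make_cancel_input(crawl_id: str) -> dict:
    """Build a Web.CancelCrawl input payload, rejecting empty IDs."""
    if not crawl_id or not crawl_id.strip():
        raise ValueError("crawl_id is required")
    return {"crawl_id": crawl_id}

# Example payload for a previously started asynchronous crawl.
payload = make_cancel_input("abc123")
```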
Requirements
- Secret: `FIRECRAWL_API_KEY`
Output
json: Cancellation status information

Web.CrawlWebsite
Crawl a website using Firecrawl. If the crawl is asynchronous, it returns the crawl ID; if synchronous, it returns the crawl data.
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
url | string | Required | URL to crawl |
exclude_paths | array<string> | Optional | URL patterns to exclude from the crawl |
include_paths | array<string> | Optional | URL patterns to include in the crawl |
max_depth | integer | Optional | Maximum depth to crawl relative to the entered URL |
ignore_sitemap | boolean | Optional | Ignore the website sitemap when crawling |
limit | integer | Optional | Limit the number of pages to crawl |
allow_backward_links | boolean | Optional | Allow navigation to previously linked pages and crawling of sublinks that are not children of the 'url' input parameter. |
allow_external_links | boolean | Optional | Allow following links to external websites |
webhook | string | Optional | The URL to send a POST request to when the crawl is started, updated and completed. |
async_crawl | boolean | Optional | Run the crawl asynchronously |
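The parameter table above can be turned into a payload with a small sketch like the following; `make_crawl_input` is a hypothetical helper (not an Arcade API) that includes optional fields only when they are explicitly set.

```python
def make_crawl_input(url: str, **options) -> dict:
    """Assemble a Web.CrawlWebsite payload; optional fields are included
    only when explicitly set, mirroring the parameter table above."""
    allowed = {
        "exclude_paths", "include_paths", "max_depth", "ignore_sitemap",
        "limit", "allow_backward_links", "allow_external_links",
        "webhook", "async_crawl",
    }
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"url": url}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload

payload = make_crawl_input("https://example.com", max_depth=2, async_crawl=True)
```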
Requirements
- Secret: `FIRECRAWL_API_KEY`
Output
json: Crawl status and data

Web.GetCrawlData
Get the data of a Firecrawl 'crawl' that is either in progress or recently completed.
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
crawl_id | string | Required | The ID of the crawl job |
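A sketch of how this tool might be wired into a client: the function takes a caller-supplied `execute(tool_name, input)` callable standing in for whatever mechanism actually invokes Arcade tools. Both `fetch_crawl_data` and the stub executor are hypothetical, for illustration only.

```python
def fetch_crawl_data(execute, crawl_id: str) -> dict:
    """Retrieve crawl data via a caller-supplied `execute(tool_name, input)`
    callable (a stand-in for a real tool-calling client)."""
    if not crawl_id:
        raise ValueError("crawl_id is required")
    return execute("Web.GetCrawlData", {"crawl_id": crawl_id})

# A stub executor standing in for a real tool call.
def fake_execute(tool_name, tool_input):
    return {"tool": tool_name, "crawl_id": tool_input["crawl_id"], "data": []}

result = fetch_crawl_data(fake_execute, "abc123")
```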
Requirements
- Secret: `FIRECRAWL_API_KEY`
Output
json: Crawl data information

Web.GetCrawlStatus
Get the status of a Firecrawl 'crawl' that is either in progress or recently completed.
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
crawl_id | string | Required | The ID of the crawl job |
Requirements
- Secret: `FIRECRAWL_API_KEY`
Output
json: Crawl status information

Web.MapWebsite
Generate a map of an entire website starting from a single URL.
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
url | string | Required | The base URL to start crawling from |
search | string | Optional | Search query to use for mapping |
ignore_sitemap | boolean | Optional | Ignore the website sitemap when crawling |
include_subdomains | boolean | Optional | Include subdomains of the website |
limit | integer | Optional | Maximum number of links to return |
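A minimal sketch of assembling this tool's input from the parameters above; `make_map_input` is a hypothetical helper, and the positive-`limit` check is an assumption rather than a documented constraint.

```python
def make_map_input(url, search=None, ignore_sitemap=None,
                   include_subdomains=None, limit=None):
    """Build a Web.MapWebsite payload, skipping optionals left at None."""
    if limit is not None and limit < 1:
        raise ValueError("limit must be a positive integer")
    fields = {
        "url": url, "search": search, "ignore_sitemap": ignore_sitemap,
        "include_subdomains": include_subdomains, "limit": limit,
    }
    return {k: v for k, v in fields.items() if v is not None}

payload = make_map_input("https://example.com", limit=50)
```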
Requirements
- Secret: `FIRECRAWL_API_KEY`
Output
json: Website map data

Web.ScrapeUrl
Scrape a URL using Firecrawl and return the data in specified formats.
Parameters
| Parameter | Type | Req. | Description |
|---|---|---|---|
url | string | Required | URL to scrape |
formats | array<string> | Optional | Formats to retrieve: markdown, html, rawHtml, links, screenshot, screenshot@fullPage. Defaults to ['markdown']. |
only_main_content | boolean | Optional | Only return the main content of the page excluding headers, navs, footers, etc. |
include_tags | array<string> | Optional | List of tags to include in the output |
exclude_tags | array<string> | Optional | List of tags to exclude from the output |
wait_for | integer | Optional | Specify a delay in milliseconds before fetching the content, allowing the page sufficient time to load. |
timeout | integer | Optional | Timeout in milliseconds for the request |
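The sketch below validates the `formats` values listed in the parameter table and applies the documented default; `make_scrape_input` is a hypothetical helper, not part of any Arcade SDK.

```python
VALID_FORMATS = {"markdown", "html", "rawHtml", "links",
                 "screenshot", "screenshot@fullPage"}

def make_scrape_input(url, formats=None, **options):
    """Build a Web.ScrapeUrl payload, checking `formats` against the
    values listed in the parameter table; defaults to ['markdown']."""
    formats = list(formats) if formats is not None else ["markdown"]
    bad = [f for f in formats if f not in VALID_FORMATS]
    if bad:
        raise ValueError(f"unsupported formats: {bad}")
    payload = {"url": url, "formats": formats}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload

payload = make_scrape_input("https://example.com",
                            formats=["markdown", "links"], wait_for=1000)
```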
Requirements
- Secret: `FIRECRAWL_API_KEY`
Output
json— Scraped data in specified formats