ACTION_ID: scrape_web_page_using_firecrawl
NAME: Scrape Web Page Content
CATEGORY: scrape
CREDITS: 0.1

Scrape web pages for information using a text prompt.

INDEX:
  1. Inputs
  2. Outputs
  3. How to configure
  4. Key notes
  5. Where it fits in a workflow
  6. When to use
  7. When not to use

================================================================================
1. INPUTS
================================================================================

url (type: url, required) — URL
    URL of the page to scrape.

extract_only_main_content (type: string, required) — Extract Only Main Content
    Whether to extract only the main body of the page (stripping navigation,
    footers, sidebars, etc).
    Allowed values: "true" / "false". Standard config: "true".

extract_only_text (type: string, required) — Extract Only Text
    Whether to return plain text only (no HTML markup, no markdown).
    Allowed values: "true" / "false". Standard config: "true".

include_html_content (type: string, required) — Include HTML Content
    Whether to include the raw HTML in the output.
    Allowed values: "true" / "false". Standard config: "false".

include_page_links (type: string, required) — Include Page Links
    Whether to return all links found on the page in the output.
    Allowed values: "true" / "false". Standard config: "false".

proxy_type (type: string, optional) — Proxy Type
    Which proxy strategy to use. Default: "auto".
    Allowed values:
      - "basic" — Proxies for sites with none-to-basic anti-bot. Fast and
        usually works.
      - "stealth" — Proxies for sites with advanced anti-bot solutions. Slower
        but more reliable.
      - "auto" — Automatically retries with stealth if basic fails.
        Recommended default.

================================================================================
2. OUTPUTS
================================================================================

page_title (type: string) — Page Title
    Title of the scraped page.

page_url (type: url) — Page URL
    Resolved URL of the scraped page (after redirects).

content (type: string) — Content
    Firecrawl's page-scraping result — the extracted body text of the page
    (plain text when `extract_only_text` is `"true"`, otherwise markdown).

html (type: string) — HTML
    Raw HTML of the page as a single string. Only populated when
    `include_html_content` is `"true"`.

links (type: raw_array) — Links
    All links found on the page. Only populated when `include_page_links`
    is `"true"`.

metadata (type: raw_array) — Metadata
    Page metadata such as meta title, meta description, og:title,
    og:description, etc.

================================================================================
3. HOW TO CONFIGURE
================================================================================

Configure Action body:

{
  "inputs": {
    "url": "{{input.url}}",
    "extract_only_main_content": "true",
    "extract_only_text": "true",
    "include_html_content": "false",
    "include_page_links": "false",
    "proxy_type": "auto"
  }
}

Standard config (recommended starting point):
  - extract_only_main_content = "true"   (extract only main body content)
  - extract_only_text         = "true"   (plain text, no HTML or markdown)
  - include_html_content      = "false"  (skip raw HTML)
  - include_page_links        = "false"  (skip page links)
  - proxy_type                = "auto"   (retry with stealth if basic fails)

Override the flags only when you specifically need the raw HTML (e.g. you're
going to parse a structured table) or the page-link list (e.g. crawling
outbound links from the page).

================================================================================
4. KEY NOTES
================================================================================

- Pair with `format_data_using_js_expression` to pull structured data out of a
  scraped page. The `metadata` / `links` outputs are nested raw_array blobs,
  and `content` is a single string; use `format_data_using_js_expression` to
  write a JS expression that extracts specific fields (e.g.
  pricing tiers from a pricing page, the list of customer logos from a
  homepage, all email addresses on a contact page) into a raw array, then push
  that raw array into `raw_to_structured_array` if you want one row per
  extracted item on a downstream sheet. See
  https://floqer.com/docs/action-detail/format_data_using_js_expression.txt and
  https://floqer.com/docs/action-detail/raw_to_structured_array.txt.

================================================================================
5. WHERE IT FITS IN A WORKFLOW
================================================================================

Pattern (page scrape -> per-prospect insight): scrape a page tied to the row
(e.g. the prospect's company website, a job posting, a news article), then run
an LLM step over the scraped content to produce a per-row insight or score for
outreach.

  input (row data — typically a URL on the row)
    -> scrape_web_page_using_firecrawl (returns content as a string, plus
       metadata / links as nested raw_arrays)
    -> llm_models (prompt: extract per-prospect insights or score company fit
       from the scraped content)
    -> outreach.

================================================================================
6. WHEN TO USE
================================================================================

Use scrape_web_page_using_firecrawl to grab the content of a single known URL —
typically tied to the row (the prospect's company website, a specific landing
page, a news article, a job posting) — for downstream LLM processing or
structured-data extraction.

================================================================================
7. WHEN NOT TO USE
================================================================================

Need step-by-step browser navigation -> ai_web_navigator
    (https://floqer.com/docs/action-detail/ai_web_navigator.txt)
Need open-ended web research -> llm_web_agents
    (https://floqer.com/docs/action-detail/llm_web_agents.txt)

================================================================================

This file is maintained manually. Last updated: 2026-04-30.
Full interactive reference: https://floqer.com/docs/reference
Action catalog: https://floqer.com/docs/action-catalog.txt
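
Example (illustrative only): the Key notes section suggests pairing this action with `format_data_using_js_expression` to pull fields such as email addresses out of a scraped page. The sketch below shows, in plain JavaScript, the kind of logic such an expression might apply to the `content` string. The helper name `extractEmails` and the sample text are hypothetical, and how `format_data_using_js_expression` actually exposes the scraper's outputs to an expression is not specified here; check that action's documentation before adapting this.

```javascript
// Hypothetical helper, not part of any Floqer API. It only illustrates the
// kind of logic a JS expression could apply to the scraper's `content` string.
function extractEmails(content) {
  // Simple pattern for common address shapes; not a full RFC 5322 validator.
  const pattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  // Treat a missing value as an empty string so the function never throws.
  const matches = String(content ?? "").match(pattern) ?? [];
  // Lowercase and de-duplicate while preserving first-seen order.
  return [...new Set(matches.map((m) => m.toLowerCase()))];
}

// Made-up text standing in for the `content` output of a scraped contact page.
const sampleContent =
  "Contact sales at Sales@example.com or support@example.com. " +
  "Press: press@example.org. Sales again: sales@example.com";

const emails = extractEmails(sampleContent);
console.log(emails);
```

The result is a flat array of strings, which is the shape the Key notes describe passing on to `raw_to_structured_array` when one row per extracted item is wanted.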