Scrapfly · Capability

Web Data Collection

A unified capability for web data collection workflows built on Scrapfly's scraping, screenshot, and extraction APIs. It lets data engineers and researchers collect, extract, and transform web content at scale, with anti-bot bypass, proxy rotation, and AI-assisted extraction.

Tags: Web Scraping, Data Collection, Data Extraction, Screenshots, Anti-Bot, Proxies

What You Can Do

GET /v1/scrape

Scrape URL: scrape a URL with optional JavaScript rendering, anti-bot bypass, proxy country selection, and a choice of output format.

GET /v1/screenshots

Capture screenshot: capture a screenshot of a full webpage or of a single page element.
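
A minimal Python sketch of building requests for the two REST operations above. It assumes the REST adapter from the spec runs locally on port 8080 and reads each operation's inputs (url, render_js, asp, country, format, capture) from the query string; neither the host nor how parameters are transported is stated on this page.

```python
from urllib.parse import urlencode

# Assumption: the REST adapter listens on localhost:8080 (port taken from the
# spec) and reads operation inputs from the query string.
BASE_URL = "http://localhost:8080"


def _encode(params: dict) -> str:
    """Drop unset values and render booleans as lowercase strings."""
    cleaned = {}
    for key, value in params.items():
        if value is None:
            continue
        cleaned[key] = str(value).lower() if isinstance(value, bool) else value
    return urlencode(cleaned)


def scrape_request_url(url, render_js=None, asp=None, country=None, fmt=None):
    """Build a GET /v1/scrape URL from the operation's inputs."""
    query = _encode({"url": url, "render_js": render_js, "asp": asp,
                     "country": country, "format": fmt})
    return f"{BASE_URL}/v1/scrape?{query}"


def screenshot_request_url(url, capture=None, fmt=None):
    """Build a GET /v1/screenshots URL from the operation's inputs."""
    query = _encode({"url": url, "capture": capture, "format": fmt})
    return f"{BASE_URL}/v1/screenshots?{query}"


if __name__ == "__main__":
    print(scrape_request_url("https://example.com", render_js=True, fmt="markdown"))
```

Leaving unset inputs out of the query, rather than sending empty values, lets the upstream defaults apply.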

MCP Tools

scrape-webpage

Scrape any webpage and return its content. Supports anti-bot bypass, JavaScript rendering for dynamic sites, proxy rotation across 190+ countries, and output in HTML, markdown, or plain text format.

read-only
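
MCP clients invoke a tool with a JSON-RPC 2.0 "tools/call" request whose params carry the tool name and its arguments. The sketch below builds that request for scrape-webpage; the argument names mirror the tool's with block (tools.url, tools.render_js, and so on). The HTTP path the server exposes on port 9090 is not stated here, so the request is only built, not sent.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 tools/call request for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names follow the scrape-webpage tool's with block in the spec.
request = tools_call_request(1, "scrape-webpage", {
    "url": "https://example.com",
    "render_js": True,
    "asp": True,
    "country": "us",
    "format": "markdown",
})
body = json.dumps(request)  # JSON text to POST to the MCP endpoint
```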

extract-structured-data

Scrape a webpage and extract structured data from it using an AI prompt, returning the result as JSON.

read-only
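
A sketch of the client side of extract-structured-data. The prompt argument feeds the upstream extraction_prompt. It assumes the extracted JSON comes back as a string inside a standard MCP text content item; the spec maps the raw upstream response ("$."), so verify the real result shape first. The sample result is illustrative, not real service output.

```python
import json


def structured_data_from_result(result):
    """Parse the first text content item of an MCP CallToolResult as JSON.

    Assumption: the extraction output arrives as a JSON string inside a
    {"type": "text", "text": ...} content item.
    """
    if result.get("isError"):
        raise RuntimeError("tool call failed")
    for item in result.get("content", []):
        if item.get("type") == "text":
            return json.loads(item["text"])
    raise ValueError("no text content in tool result")


# Arguments for extract-structured-data; "prompt" maps to extraction_prompt.
arguments = {
    "url": "https://example.com/product/1",
    "prompt": "Return the product name and price as JSON",
    "render_js": True,
}

# Illustrative result shape only.
sample_result = {"content": [{"type": "text", "text": '{"name": "Widget", "price": 9.99}'}]}
data = structured_data_from_result(sample_result)
```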

scrape-with-session

Scrape a webpage with session persistence, maintaining cookies and browser fingerprint across multiple requests to the same site.

read-only
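
Session persistence only helps if consecutive calls reuse the same session value. One possible convention, sketched below, derives the session name from the target host so that all requests to one site share cookies and fingerprint while other sites stay isolated. The naming scheme and the "wdc" prefix are hypothetical; hashing simply keeps names short and uniform.

```python
import hashlib
from urllib.parse import urlsplit


def session_name_for(url, prefix="wdc"):
    """Derive a stable session name from a URL's host (hypothetical scheme)."""
    host = urlsplit(url).hostname or ""
    if not host:
        raise ValueError(f"URL has no host: {url!r}")
    digest = hashlib.sha256(host.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}-{digest}"


# Two scrape-with-session calls to the same site share one session.
first = {"url": "https://shop.example.com/login",
         "session": session_name_for("https://shop.example.com/login")}
second = {"url": "https://shop.example.com/cart",
          "session": session_name_for("https://shop.example.com/cart")}
```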

scrape-with-cache

Scrape a URL with caching enabled to avoid redundant requests. Ideal for repeatedly accessed URLs that don't change frequently.

read-only idempotent
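
A sketch of preparing scrape-with-cache arguments from a Python timedelta. It assumes cache_ttl is expressed in whole seconds, as in Scrapfly's scrape API; confirm the unit in the consumed scrapfly-scrape spec before relying on it.

```python
from datetime import timedelta


def cached_scrape_arguments(url, ttl=timedelta(hours=1)):
    """Arguments for scrape-with-cache.

    Assumption: cache_ttl is in whole seconds.
    """
    seconds = int(ttl.total_seconds())
    if seconds <= 0:
        raise ValueError("cache TTL must be positive")
    return {"url": url, "cache": True, "cache_ttl": seconds}


args = cached_scrape_arguments("https://example.com/pricing", timedelta(days=1))
```

A page that changes daily is a natural fit for a one-day TTL; shorter TTLs trade freshness against more upstream requests.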

capture-full-page-screenshot

Capture a full-page screenshot of any website. Useful for visual verification, archiving, or accessibility testing.

read-only
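
A sketch of turning a screenshot tool result into image bytes. It assumes the server returns a standard MCP image content item (base64 data plus mimeType); the spec maps the raw upstream response ("$."), so the actual shape may differ. Note that capture-full-page-screenshot pins capture to "fullpage", so callers pass only url and format.

```python
import base64

# Assumed mapping from MIME type to file extension.
_EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def decode_screenshot(result):
    """Return (image_bytes, file_extension) from an MCP CallToolResult.

    Assumption: the image arrives as {"type": "image", "data": <base64>, "mimeType": ...}.
    """
    for item in result.get("content", []):
        if item.get("type") == "image":
            ext = _EXTENSIONS.get(item.get("mimeType", ""), ".bin")
            return base64.b64decode(item["data"]), ext
    raise ValueError("no image content in tool result")


# capture is fixed to "fullpage" by the spec, so it is not an argument here.
arguments = {"url": "https://example.com", "format": "png"}
```

For archiving, the returned bytes can be written with pathlib.Path("homepage" + ext).write_bytes(data).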

capture-element-screenshot

Capture a screenshot of a specific HTML element using a CSS selector. Useful for extracting visual data from specific page components.

read-only
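
When several components of one page need capturing, one capture-element-screenshot call per selector is required. The helper below builds those argument sets, skipping blank and duplicate selectors while preserving order. The selectors shown are hypothetical.

```python
def element_screenshot_batch(url, selectors, fmt="png"):
    """One capture-element-screenshot argument set per distinct selector."""
    seen = set()
    batch = []
    for raw in selectors:
        selector = raw.strip()
        if not selector or selector in seen:
            continue  # skip blank or repeated selectors
        seen.add(selector)
        batch.append({"url": url, "css_selector": selector, "format": fmt})
    return batch


# Hypothetical selectors for a dashboard page.
batch = element_screenshot_batch(
    "https://example.com/dashboard",
    ["#revenue-chart", ".kpi-card", "#revenue-chart", " "],
)
```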

APIs Used

scrapfly-scrape
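
The capability consumes scrapfly-scrape from ./shared/scrapfly-scrape.yaml, which is not shown here. As a sketch only, assuming that file targets Scrapfly's public scrape endpoint and passes the key bound from SCRAPFLY_API_KEY as the key query parameter, an equivalent upstream request URL could be built like this:

```python
import os
from urllib.parse import urlencode

# Assumption: the shared spec calls Scrapfly's public scrape endpoint.
SCRAPFLY_SCRAPE_URL = "https://api.scrapfly.io/scrape"


def upstream_scrape_url(target_url, api_key=None, **options):
    """Build an upstream scrape URL; unset options are omitted."""
    key = api_key or os.environ.get("SCRAPFLY_API_KEY")  # same name as in binds
    if not key:
        raise RuntimeError("SCRAPFLY_API_KEY is not set")
    params = {"key": key, "url": target_url}
    for name, value in options.items():
        if value is not None:
            params[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{SCRAPFLY_SCRAPE_URL}?{urlencode(params)}"
```

Reading the key from the environment mirrors the binds section of the spec, which keeps the secret out of the capability file.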

Capability Spec

naftiko: "1.0.0-alpha1"

info:
  label: "Web Data Collection"
  description: >-
    Unified capability for web data collection workflows using Scrapfly's
    scraping, screenshot, and extraction APIs. Enables data engineers and
    researchers to collect, extract, and transform web content at scale
    with anti-bot bypass, proxy rotation, and AI-assisted extraction.
  tags:
    - Web Scraping
    - Data Collection
    - Data Extraction
    - Screenshots
    - Anti-Bot
    - Proxies
  created: "2026-05-02"
  modified: "2026-05-02"

binds:
  - namespace: env
    keys:
      SCRAPFLY_API_KEY: SCRAPFLY_API_KEY

capability:
  consumes:
    - import: scrapfly-scrape
      location: ./shared/scrapfly-scrape.yaml

  exposes:
    - type: rest
      port: 8080
      namespace: web-data-collection-api
      description: "Unified REST API for web data collection, scraping, and screenshot capture."
      resources:
        - path: /v1/scrape
          name: scraping
          description: Web page scraping with anti-bot bypass
          operations:
            - method: GET
              name: scrape-url
              description: Scrape a URL with configurable rendering and extraction
              call: "scrapfly-scrape.scrape-url"
              with:
                url: "rest.url"
                render_js: "rest.render_js"
                asp: "rest.asp"
                country: "rest.country"
                format: "rest.format"
              outputParameters:
                - type: object
                  mapping: "$."

        - path: /v1/screenshots
          name: screenshots
          description: Web page screenshot capture
          operations:
            - method: GET
              name: capture-screenshot
              description: Capture a screenshot of a webpage or element
              call: "scrapfly-scrape.capture-screenshot"
              with:
                url: "rest.url"
                capture: "rest.capture"
                format: "rest.format"
              outputParameters:
                - type: object
                  mapping: "$."

    - type: mcp
      port: 9090
      namespace: web-data-collection-mcp
      transport: http
      description: "MCP server for AI-assisted web data collection and extraction workflows."
      tools:
        - name: scrape-webpage
          description: >-
            Scrape any webpage and return its content. Supports anti-bot bypass,
            JavaScript rendering for dynamic sites, proxy rotation across 190+
            countries, and output in HTML, markdown, or plain text format.
          hints:
            readOnly: true
            openWorld: true
          call: "scrapfly-scrape.scrape-url"
          with:
            url: "tools.url"
            render_js: "tools.render_js"
            asp: "tools.asp"
            country: "tools.country"
            format: "tools.format"
          outputParameters:
            - type: object
              mapping: "$."

        - name: extract-structured-data
          description: >-
            Scrape a webpage and extract structured data using an AI prompt.
            Returns structured JSON data extracted from the page content.
          hints:
            readOnly: true
            openWorld: true
          call: "scrapfly-scrape.scrape-url"
          with:
            url: "tools.url"
            extraction_prompt: "tools.prompt"
            render_js: "tools.render_js"
            asp: "tools.asp"
          outputParameters:
            - type: object
              mapping: "$."

        - name: scrape-with-session
          description: >-
            Scrape a webpage with session persistence, maintaining cookies and
            browser fingerprint across multiple requests to the same site.
          hints:
            readOnly: true
          call: "scrapfly-scrape.scrape-url"
          with:
            url: "tools.url"
            session: "tools.session"
            render_js: "tools.render_js"
          outputParameters:
            - type: object
              mapping: "$."

        - name: scrape-with-cache
          description: >-
            Scrape a URL with caching enabled to avoid redundant requests.
            Ideal for repeatedly accessed URLs that don't change frequently.
          hints:
            readOnly: true
            idempotent: true
          call: "scrapfly-scrape.scrape-url"
          with:
            url: "tools.url"
            cache: "tools.cache"
            cache_ttl: "tools.cache_ttl"
          outputParameters:
            - type: object
              mapping: "$."

        - name: capture-full-page-screenshot
          description: >-
            Capture a full-page screenshot of any website. Useful for visual
            verification, archiving, or accessibility testing.
          hints:
            readOnly: true
          call: "scrapfly-scrape.capture-screenshot"
          with:
            url: "tools.url"
            capture: "fullpage"
            format: "tools.format"
          outputParameters:
            - type: object
              mapping: "$."

        - name: capture-element-screenshot
          description: >-
            Capture a screenshot of a specific HTML element using a CSS selector.
            Useful for extracting visual data from specific page components.
          hints:
            readOnly: true
          call: "scrapfly-scrape.capture-screenshot"
          with:
            url: "tools.url"
            capture: "tools.css_selector"
            format: "tools.format"
          outputParameters:
            - type: object
              mapping: "$."
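
One way to read the with blocks above: a value such as "tools.url" or "rest.url" names a caller input, while any other value, such as "fullpage", is passed through as a literal. The resolver below is a hypothetical illustration of that reading, not Naftiko's implementation; it shows why capture-full-page-screenshot needs no capture argument while capture-element-screenshot forwards css_selector as capture.

```python
def resolve_with(with_block, namespace, inputs):
    """Resolve a with mapping against caller inputs (hypothetical semantics).

    "<namespace>.<name>" takes the caller's input "name"; other values are
    literals. Inputs the caller omitted are left out.
    """
    prefix = namespace + "."
    resolved = {}
    for param, source in with_block.items():
        if isinstance(source, str) and source.startswith(prefix):
            name = source[len(prefix):]
            if name in inputs:
                resolved[param] = inputs[name]
        else:
            resolved[param] = source  # literal value, e.g. "fullpage"
    return resolved


full_page_with = {"url": "tools.url", "capture": "fullpage", "format": "tools.format"}
element_with = {"url": "tools.url", "capture": "tools.css_selector", "format": "tools.format"}
```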