Sitemap Crawl

Sitemap Crawl is a Naftiko capability published by The Philadelphia Inquirer, one of 3 capabilities the APIs.io network indexes for this provider.

It can be deployed as a REST endpoint, MCP tool, or Agent Skill via Naftiko.

Capability Spec

sitemap-crawl.yaml
name: Philadelphia Inquirer Sitemap Crawl
description: >-
  Capability that walks the Inquirer.com sitemap index, expands child URL
  sitemaps for a given date range, and emits a stream of article URLs with
  lastmod and image metadata for downstream crawling or freshness checks.
category: Discovery
trigger: scheduled
inputs:
- name: dateRange
  description: Inclusive YYYY-MM-DD start and end.
  type: object
  properties:
    start:
      type: string
      format: date
    end:
      type: string
      format: date
outputs:
- name: urls
  description: Article URL entries with sitemap metadata.
  type: array
  items:
    $ref: ../json-schema/sitemap-url-schema.json
steps:
- name: fetch-sitemap-index
  call:
    api: philadelphia-inquirer:sitemaps
    operationId: getSitemapIndex
- name: filter-children-by-date
- name: fetch-daily-sitemaps
  forEach: childSitemaps
  call:
    api: philadelphia-inquirer:sitemaps
    operationId: getDailySitemap
    pathParams:
      date: "{{ item.date }}"
- name: flatten-urls
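
The four steps in the spec above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not Naftiko's runtime or the Inquirer's actual API: the function names, the `fetch_child` callback (a stand-in for the `getDailySitemap` call), the output keys `loc`, `lastmod`, and `images`, and the choice to derive each child sitemap's date from its `<lastmod>` value are all hypothetical. Only the XML namespaces come from the public sitemap protocol and Google's image extension.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespaces from the sitemaps.org protocol and Google's image sitemap extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}


def parse_sitemap_index(xml_text):
    """fetch-sitemap-index: list child sitemaps as {'loc', 'date'} dicts."""
    root = ET.fromstring(xml_text)
    children = []
    for node in root.findall("sm:sitemap", NS):
        loc = node.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = node.findtext("sm:lastmod", default="", namespaces=NS).strip()
        # Assumption: the child's date is the YYYY-MM-DD prefix of <lastmod>.
        children.append({"loc": loc, "date": lastmod[:10]})
    return children


def filter_children_by_date(children, start, end):
    """filter-children-by-date: keep children whose date is in [start, end]."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    kept = []
    for child in children:
        try:
            day = date.fromisoformat(child["date"])
        except ValueError:
            continue  # skip children with a missing or malformed date
        if lo <= day <= hi:
            kept.append(child)
    return kept


def parse_url_sitemap(xml_text):
    """Parse one child sitemap into URL entries with lastmod and image URLs."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.findall("sm:url", NS):
        entries.append({
            "loc": node.findtext("sm:loc", default="", namespaces=NS).strip(),
            "lastmod": node.findtext("sm:lastmod", default="", namespaces=NS).strip(),
            "images": [
                (img.text or "").strip()
                for img in node.findall("image:image/image:loc", NS)
            ],
        })
    return entries


def crawl(index_xml, fetch_child, start, end):
    """fetch-daily-sitemaps + flatten-urls: one flat list of URL entries."""
    children = filter_children_by_date(parse_sitemap_index(index_xml), start, end)
    urls = []
    for child in children:
        # fetch_child is a hypothetical callback returning the child's XML text.
        urls.extend(parse_url_sitemap(fetch_child(child)))
    return urls
```

Passing the fetcher in as a callback keeps the parsing and filtering logic free of network access, so it can be exercised against saved XML before being pointed at a live sitemap.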