Documentation Index

Fetch the complete documentation index at: https://webscraping.titannet.io/docs/llms.txt

Use this file to discover all available pages before exploring further.

The scrape action type is the closest analogue to classic scraping: you supply concrete URLs (or URLs from previous_step / task_url_inventory) and an output_schema, and workers return structured records (and media when the workflow and template support it). This is the right primitive when you already know which pages must become rows in your dataset.
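For instance, with an output_schema describing price fields, each scraped page yields one record of that shape. The values below are illustrative, and the exact result envelope (wrapper fields, metadata) may differ by template:

```json
{
  "price": 19.99,
  "currency": "USD",
  "in_stock": true
}
```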
Examples use the TITAN_API_URL and TITAN_TOKEN environment variables. Rust examples use ureq and serde_json.

When to use scrape

  • You have a stable URL list (product pages, filings, dashboards).
  • You need schema-shaped JSON for downstream analytics, RAG chunks, or alerts.
  • An upstream search or crawl step produced the URLs you want to read deeply.

Single-action example (POST /api/v1/tasks)

schedule must satisfy validation for execution_type: scheduled in your environment. If you only need a one-off run, use execution_type: single and omit schedule.
curl -sS -X POST "$TITAN_API_URL/api/v1/tasks" \
  -H "Authorization: Bearer $TITAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "PDP price monitor",
    "objective": "Extract price, currency, and availability for each PDP URL",
    "execution_type": "scheduled",
    "urls": [
      "https://shop.example.com/p/100",
      "https://shop.example.com/p/200"
    ],
    "action_type": "scrape",
    "input_source": "static_urls",
    "template_slug": "commerce-pdp",
    "output_schema": {
      "type": "object",
      "properties": {
        "price": { "type": "number" },
        "currency": { "type": "string" },
        "in_stock": { "type": "boolean" }
      },
      "required": ["price", "currency", "in_stock"]
    },
    "schedule": {
      "cron": "15 */6 * * *",
      "timezone": "UTC"
    }
  }'
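The same request can be issued from Python. This is a minimal sketch using only the standard library, mirroring the curl payload above; the helper names here are illustrative, not part of any SDK:

```python
import json
import os
import urllib.request

def build_task_body():
    # Mirrors the curl payload above: a scheduled scrape over static URLs.
    return {
        "name": "PDP price monitor",
        "objective": "Extract price, currency, and availability for each PDP URL",
        "execution_type": "scheduled",
        "urls": [
            "https://shop.example.com/p/100",
            "https://shop.example.com/p/200",
        ],
        "action_type": "scrape",
        "input_source": "static_urls",
        "template_slug": "commerce-pdp",
        "output_schema": {
            "type": "object",
            "properties": {
                "price": {"type": "number"},
                "currency": {"type": "string"},
                "in_stock": {"type": "boolean"},
            },
            "required": ["price", "currency", "in_stock"],
        },
        "schedule": {"cron": "15 */6 * * *", "timezone": "UTC"},
    }

def create_task():
    # POST the body to /api/v1/tasks with a bearer token, as in the curl call.
    req = urllib.request.Request(
        f"{os.environ['TITAN_API_URL']}/api/v1/tasks",
        data=json.dumps(build_task_body()).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TITAN_TOKEN']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```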

Consume output from a prior step

When input_source is previous_step, the scrape step belongs inside an execution_plan; the previous step must emit URLs your scrape template understands.
{
  "step_id": "extract_pdps",
  "action_type": "scrape",
  "input_source": "previous_step",
  "template_slug": "commerce-pdp",
  "output_schema": {
    "type": "object",
    "properties": {
      "price": { "type": "number" }
    }
  }
}
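For context, one plausible shape for the surrounding execution_plan. Only the scrape step itself is taken from above; the upstream step (an assumed "crawl" action) and the plan-level fields are illustrative assumptions, not confirmed API values:

```json
{
  "name": "Crawl then extract PDPs",
  "execution_type": "single",
  "execution_plan": [
    {
      "step_id": "discover_pdps",
      "action_type": "crawl",
      "objective": "Collect product detail page URLs"
    },
    {
      "step_id": "extract_pdps",
      "action_type": "scrape",
      "input_source": "previous_step",
      "template_slug": "commerce-pdp",
      "output_schema": {
        "type": "object",
        "properties": {
          "price": { "type": "number" }
        }
      }
    }
  ]
}
```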

Run with execution-time secrets

Do not send secret_payload on create. Pass sensitive material when you trigger the run:
curl -sS -X POST "$TITAN_API_URL/api/v1/tasks/$TASK_ID/run" \
  -H "Authorization: Bearer $TITAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "secret_payload": {
      "vendor_session": "read-from-vault-at-runtime"
    }
  }'
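The same trigger call can be sketched in Python, reading the secret from the process environment at run time so it never lands in the task definition. The VENDOR_SESSION variable name is an assumption standing in for your vault integration:

```python
import json
import os
import urllib.request

def build_run_body(secrets: dict) -> dict:
    # secret_payload travels only in the run request, never in the stored task.
    if not secrets or not all(secrets.values()):
        raise ValueError("refusing to trigger a run with empty secrets")
    return {"secret_payload": secrets}

def trigger_run(task_id: str) -> dict:
    # VENDOR_SESSION is a placeholder: read it from your vault at runtime.
    body = build_run_body({"vendor_session": os.environ["VENDOR_SESSION"]})
    req = urllib.request.Request(
        f"{os.environ['TITAN_API_URL']}/api/v1/tasks/{task_id}/run",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TITAN_TOKEN']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```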

Fetch results (after execution completes)

curl -sS "$TITAN_API_URL/api/v1/executions/$EXECUTION_ID/results" \
  -H "Authorization: Bearer $TITAN_TOKEN"
Use the OpenAPI Task Service pages for export and media download options.
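A Python sketch of the same results fetch; the URL follows the curl call above, and the JSON shape of the response is template-dependent:

```python
import json
import os
import urllib.request

def results_url(base_url: str, execution_id: str) -> str:
    # Same path as the curl example above.
    return f"{base_url}/api/v1/executions/{execution_id}/results"

def fetch_results(execution_id: str):
    # Authenticated GET; returns the parsed JSON body.
    req = urllib.request.Request(
        results_url(os.environ["TITAN_API_URL"], execution_id),
        headers={"Authorization": f"Bearer {os.environ['TITAN_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```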