
The search action type is for discovery-first work: turn queries and ranking into candidate URLs or documents before you lock onto specific pages. It pairs with scrape or crawl when the hard part is “what should we read?” rather than only “extract these known URLs.”
Examples use TITAN_API_URL (the Task Service origin, without the /api/v1 suffix) and TITAN_TOKEN. You need the tasks:write scope, plus the related scopes for the run and read paths. Use search when:
  • You have questions or keywords, not a fixed PDP list.
  • You want fresh candidates from the open web on each run.
  • You will downstream either crawl for more coverage or scrape into a stable schema.

Inputs and wiring

  • With input_source: static_urls, seeds usually live in the task’s urls list (validated per action rules).
  • In multi-step plans, a later scrape step often sets input_source: previous_step so discovered URLs flow into extraction.
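The two wiring rules above can be linted client-side before you submit a task. The helper below is a hypothetical convenience, not part of the API (the real validation happens server-side per action rules); it encodes only what the bullets state, assuming seeds live in the task-level urls list:

```python
def check_wiring(task: dict) -> list[str]:
    """Hypothetical pre-submit lint for input_source wiring.

    Encodes the two rules from this page: static_urls steps need seed URLs
    (usually the task-level "urls" list), and previous_step only makes sense
    after at least one earlier step.
    """
    problems = []
    steps = task.get("execution_plan", {}).get("steps", [])
    for i, step in enumerate(steps):
        src = step.get("input_source")
        if src == "static_urls" and not task.get("urls"):
            problems.append(f"step {i}: static_urls but no task-level urls")
        if src == "previous_step" and i == 0:
            problems.append("step 0: previous_step has no predecessor")
    return problems
```

Run it on a candidate body before the POST; an empty list means the wiring at least matches the rules on this page.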

Single-action example (POST /api/v1/tasks)

The exact limits and payload keys depend on your template and script contract—treat the JSON below as structural, not a guarantee of field names.
curl -sS -X POST "$TITAN_API_URL/api/v1/tasks" \
  -H "Authorization: Bearer $TITAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Nightly patent news discovery",
    "objective": "Find recent articles matching our keywords and return candidate URLs with titles",
    "execution_type": "single",
    "urls": ["https://example.com/seed-blog"],
    "action_type": "search",
    "input_source": "static_urls",
    "template_slug": "example-search-template",
    "limits": { "max_results": 25 }
  }'
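The same request in Python, using only the standard library. The body mirrors the curl example exactly; as noted above, the field names are structural assumptions that depend on your template and script contract:

```python
import json
import os
import urllib.request

def build_search_task() -> dict:
    """Same body as the curl example; keys depend on your template contract."""
    return {
        "name": "Nightly patent news discovery",
        "objective": "Find recent articles matching our keywords "
                     "and return candidate URLs with titles",
        "execution_type": "single",
        "urls": ["https://example.com/seed-blog"],
        "action_type": "search",
        "input_source": "static_urls",
        "template_slug": "example-search-template",
        "limits": {"max_results": 25},
    }

def create_task(payload: dict) -> dict:
    """POST /api/v1/tasks; expects TITAN_API_URL and TITAN_TOKEN in the env."""
    req = urllib.request.Request(
        f"{os.environ['TITAN_API_URL']}/api/v1/tasks",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TITAN_TOKEN']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Call create_task(build_search_task()) with real credentials; the response JSON should include the created task's ID for the run and read calls below.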

Multi-step example (search → scrape)

Create a task whose execution_plan has two steps: search with static_urls, then scrape with previous_step and an output_schema carried on the scrape step (self-contained plan). Sketch:
{
  "name": "Discover then extract headlines",
  "objective": "Search for product mentions, then scrape top results into records",
  "execution_type": "single",
  "execution_plan": {
    "steps": [
      {
        "step_id": "discover",
        "action_type": "search",
        "input_source": "static_urls",
        "template_slug": "vertical-search",
        "limits": { "max_results": 15 }
      },
      {
        "step_id": "extract",
        "action_type": "scrape",
        "input_source": "previous_step",
        "template_slug": "article-headline-scrape",
        "output_schema": {
          "type": "object",
          "properties": {
            "url": { "type": "string" },
            "title": { "type": "string" }
          },
          "required": ["url", "title"]
        }
      }
    ]
  }
}
Do not send top-level action_type or output_schema when the body is an execution_plan-only create; the plan must include everything each step needs.
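The plan-only rule is easy to enforce in code. Here is the same sketch as a Python dict, plus a small guard of our own (not an API feature) that rejects a body mixing execution_plan with top-level action_type or output_schema:

```python
def build_plan_body() -> dict:
    """Plan-only create: everything each step needs lives inside the steps."""
    return {
        "name": "Discover then extract headlines",
        "objective": "Search for product mentions, "
                     "then scrape top results into records",
        "execution_type": "single",
        "execution_plan": {
            "steps": [
                {
                    "step_id": "discover",
                    "action_type": "search",
                    "input_source": "static_urls",
                    "template_slug": "vertical-search",
                    "limits": {"max_results": 15},
                },
                {
                    "step_id": "extract",
                    "action_type": "scrape",
                    "input_source": "previous_step",
                    "template_slug": "article-headline-scrape",
                    "output_schema": {
                        "type": "object",
                        "properties": {
                            "url": {"type": "string"},
                            "title": {"type": "string"},
                        },
                        "required": ["url", "title"],
                    },
                },
            ]
        },
    }

def assert_plan_only(body: dict) -> None:
    """Client-side guard: plan-only creates must not also carry
    top-level action_type or output_schema."""
    for forbidden in ("action_type", "output_schema"):
        if forbidden in body:
            raise ValueError(f"top-level {forbidden!r} not allowed "
                             "alongside execution_plan")
```

Running assert_plan_only on a body before the POST catches the mixed-shape mistake locally instead of relying on a server-side validation error.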

Submit the multi-step task (same body via API)

curl -sS -X POST "$TITAN_API_URL/api/v1/tasks" \
  -H "Authorization: Bearer $TITAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d @search-scrape-plan.json
Save the JSON sketch as search-scrape-plan.json, or build the same object in memory in your language.

Run and inspect

Run task

curl -sS -X POST "$TITAN_API_URL/api/v1/tasks/$TASK_ID/run" \
  -H "Authorization: Bearer $TITAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'

Get task

curl -sS "$TITAN_API_URL/api/v1/tasks/$TASK_ID" \
  -H "Authorization: Bearer $TITAN_TOKEN"
Dry-run a plan without persisting a task:
curl -sS -X POST "$TITAN_API_URL/api/v1/executions/preview" \
  -H "Authorization: Bearer $TITAN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "objective": "Probe a two-step search→scrape graph",
    "execution_plan": { "steps": [] }
  }'
Replace the empty steps array with a real plan before calling the endpoint in earnest.
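Filling in that placeholder, a preview body with a real two-step plan might look like the sketch below. The template slugs are reused from the example plan earlier on this page; substitute your own:

```python
def build_preview_body() -> dict:
    """Preview body carrying a minimal real search -> scrape plan
    instead of the empty steps placeholder."""
    return {
        "objective": "Probe a two-step search\u2192scrape graph",
        "execution_plan": {
            "steps": [
                {
                    "step_id": "discover",
                    "action_type": "search",
                    "input_source": "static_urls",
                    "template_slug": "vertical-search",
                },
                {
                    "step_id": "extract",
                    "action_type": "scrape",
                    "input_source": "previous_step",
                    "template_slug": "article-headline-scrape",
                },
            ]
        },
    }
```

POST this to /api/v1/executions/preview with the same headers as the other calls; because preview persists nothing, it is a cheap way to validate step wiring before creating the task.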