Skip to content

Command-Line Interface

Warning: pre-1.0.0 - APIs and contracts may change.

wxpath provides a CLI for executing expressions directly from the terminal.

Basic Usage

wxpath "url('https://quotes.toscrape.com')//a/@href"

Output is streamed as newline-delimited JSON (NDJSON).

Options

Option Description Default
--depth <n> Maximum crawl depth 1
--verbose Enable verbose output false
--debug Enable debug logging false
--concurrency <n> Global concurrent fetches 16
--concurrency-per-host <n> Per-host concurrent fetches 8
--header "Key:Value" Add custom header (repeatable) -
--respect-robots Respect robots.txt true
--cache Enable response caching false

Examples

wxpath "url('https://quotes.toscrape.com')//a/@href"

Deep Crawl with Custom Headers

wxpath --depth 1 \
    --header "User-Agent: my-bot/1.0 (contact: me@example.com)" \
    "url('https://quotes.toscrape.com')///url(//a/@href)//title/text()"

Structured Data Extraction

wxpath --depth 1 \
    --header "User-Agent: my-app/0.1 (contact: you@example.com)" \
    "url('https://en.wikipedia.org/wiki/Python_(programming_language)') \
    /map{ \
        'title': (//h1//text())[1] ! normalize-space(.), \
        'url': string(base-uri(.)) \
    }"

Wikipedia Crawl with Filters

wxpath --depth 1 \
    --header "User-Agent: my-app/0.1 (contact: you@example.com)" \
    "url('https://en.wikipedia.org/wiki/Expression_language') \
    ///url(//div[@id='mw-content-text']//a/@href[starts-with(., '/wiki/') \
        and not(matches(@href, '^(?:/wiki/)?(?:Wikipedia|File|Template|Special|Template_talk|Help):'))]) \
    /map{ \
        'title':(//span[contains(@class, 'mw-page-title-main')]/text())[1], \
        'short_description':(//div[contains(@class, 'shortdescription')]/text())[1], \
        'url':string(base-uri(.)), \
        'backlink':wx:backlink(.), \
        'depth':wx:depth(.) \
    }"

Cached Crawl

# Requires wxpath[cache-sqlite] or wxpath[cache-redis]
wxpath --cache --depth 1 \
    "url('https://example.com')///url(//a/@href)//title/text()"

Output Format

The CLI outputs NDJSON (newline-delimited JSON):

{"title": "Page 1", "url": "https://example.com/page1"}
{"title": "Page 2", "url": "https://example.com/page2"}
{"title": "Page 3", "url": "https://example.com/page3"}

This format is easy to pipe to other tools:

# Count results
wxpath "..." | wc -l

# Filter with jq
wxpath "..." | jq 'select(.title != null)'

# Save to file
wxpath "..." > results.jsonl

Environment Variables

Variable Description
WXPATH_OUT Output file path for JSONLWriter hook