Data Extraction API

Web-to-JSON

Any URL into clean, structured JSON in one API call.

POST a URL and get normalized JSON back. No brittle CSS selectors. Built for single-page apps (SPAs), dynamic pages, and login-gated or paywall-heavy sites where simple HTML scraping fails.

Simple pricing: $0.01/page pay-as-you-go or $15/month unlimited.

Example request:

curl -X POST https://api.webtojson.dev/api/extract \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{"url":"https://example.com/post"}'

Response shape (the fields returned inside `data`):

{
  "title": "Page title",
  "summary": "Concise factual summary",
  "entities": [...],
  "fields": {...},
  "keyPoints": [...]
}

Why Most Scrapers Miss the Data

Solo builders need output they can trust in production, not a game of selector whack-a-mole. Web-to-JSON extracts from the fully rendered page, so content quality comes first.

Selectors Break Every Week

Many sites ship JavaScript-heavy UI changes constantly. Hardcoded selectors drift, and your scraper silently returns junk.

SPAs Hide Real Content

A basic HTTP fetch misses the rendered content because the data arrives only after hydration, lazy loading, or client-side route transitions.

Auth Walls & Paywalls

The useful data often sits behind soft walls or anti-bot messaging, where naive extraction returns only fragments.

What You Get In One Call

The API is intentionally narrow: send a URL in, get practical JSON out, with enough structure to plug directly into lead enrichment, content indexing, or monitoring workflows.
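For a monitoring workflow, one simple approach is to fingerprint each extraction and compare fingerprints between runs. This is a minimal sketch that assumes you already hold the `data` object from a response; the function names and the choice of tracked keys are illustrative, not part of the API.

```python
import hashlib
import json

# Keys treated as the page's substance; "entities" is left out here as an example choice.
TRACKED_KEYS = ("title", "summary", "keyPoints", "fields")

def content_fingerprint(data: dict) -> str:
    """Return a stable SHA-256 hex digest of the tracked keys of one extraction."""
    subset = {key: data.get(key) for key in TRACKED_KEYS}
    # sort_keys makes the serialization independent of dict insertion order
    canonical = json.dumps(subset, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def has_changed(previous: dict, current: dict) -> bool:
    """True when the tracked content differs between two extractions."""
    return content_fingerprint(previous) != content_fingerprint(current)

if __name__ == "__main__":
    before = {"title": "Pricing", "summary": "Plans start at $10.", "keyPoints": [], "fields": {"price": "$10"}}
    after = dict(before, fields={"price": "$12"})
    print(has_changed(before, after))
```

Storing only the digest per URL keeps the monitoring state small; when it changes, you can diff the full JSON.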

Rendered Browser Capture

Puppeteer loads each page in a real browser session, so SPAs, delayed scripts, and dynamic layouts are captured as rendered.

AI Structured Extraction

An LLM (Anthropic Claude or OpenAI) converts noisy page text into stable JSON with key points, entities, and typed fields.
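Turning a model's reply into stable JSON can be sketched as below, assuming the model is asked for a JSON object but may surround it with extra prose. The function name `coerce_extraction` and its default values are hypothetical; the key names mirror the response body shown in the API Quickstart.

```python
import json
from typing import Any

# Keys every extraction exposes under "fields" (from the documented example response).
FIELD_KEYS = ("author", "publishedDate", "price")

def coerce_extraction(model_reply: str) -> dict[str, Any]:
    """Parse a model reply into a dict with a fixed set of keys and types."""
    # Models sometimes add prose around the JSON, so take the outermost {...} span.
    start, end = model_reply.find("{"), model_reply.rfind("}")
    raw: dict[str, Any] = {}
    if 0 <= start < end:
        try:
            raw = json.loads(model_reply[start:end + 1])
        except json.JSONDecodeError:
            raw = {}
    fields = raw.get("fields") if isinstance(raw.get("fields"), dict) else {}
    key_points = raw.get("keyPoints") if isinstance(raw.get("keyPoints"), list) else []
    entities = raw.get("entities") if isinstance(raw.get("entities"), list) else []
    return {
        "title": str(raw.get("title") or ""),
        "summary": str(raw.get("summary") or ""),
        "keyPoints": [str(point) for point in key_points],
        "entities": entities,
        # Expected fields are always present (None when missing); extra fields are kept.
        "fields": {**{key: None for key in FIELD_KEYS}, **fields},
    }
```

Filling every expected key, even with `None`, is what keeps downstream code from branching on missing keys.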

Paywall-Aware Signals

Returns paywall/auth-wall indicators so your pipeline knows when extraction quality is likely constrained.
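A pipeline might use these indicators to route possibly partial extractions away from its main index. The response examples on this page do not show the indicator fields, so the names below (`signals`, `paywall`, `authWall`) are hypothetical placeholders; adapt them to whatever the API actually returns.

```python
from typing import Any

def extraction_is_constrained(data: dict[str, Any]) -> bool:
    """Return True if the wall indicators suggest the content may be partial."""
    # Hypothetical field names: the real response may expose these differently.
    signals = data.get("signals") or {}
    return bool(signals.get("paywall") or signals.get("authWall"))

def route(data: dict[str, Any]) -> str:
    """Send constrained pages to manual review instead of the main index."""
    return "review" if extraction_is_constrained(data) else "index"
```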

Cookie-Based Paid Access

Paying users unlock `/tool` and `/api/extract` through a secure cookie, which keeps the paid features behind the wall.

Simple Pricing for Indie Builders

Start with a low variable cost while you are validating an idea. Switch to Unlimited once extraction becomes core to your product.

Pay-as-you-go

$0.01/page

Ideal for data experiments, side projects, and low-volume jobs where every call should stay cheap.

- Full rendered extraction
- Structured JSON output
- API + dashboard tool access

Best Value

Unlimited

$15/month

Built for founders shipping daily automations, ingestion pipelines, and recurring data jobs.

- Unlimited page extraction
- Priority processing lane
- Cookie-unlocked private tool
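To choose a plan, compare the monthly pay-as-you-go cost with the flat fee. At $0.01 per page, $15 covers 1,500 pages, so Unlimited is cheaper once you extract more than 1,500 pages a month. The helper below works in integer cents to avoid floating-point rounding; it compares price only and ignores Unlimited's priority lane.

```python
PAYG_CENTS_PER_PAGE = 1   # $0.01/page
UNLIMITED_CENTS = 1500    # $15/month

def monthly_cost_cents(pages: int) -> dict[str, int]:
    """Monthly cost of each plan, in cents, for a given page volume."""
    return {"pay_as_you_go": pages * PAYG_CENTS_PER_PAGE, "unlimited": UNLIMITED_CENTS}

def cheaper_plan(pages: int) -> str:
    """Name the cheaper plan; at exactly 1,500 pages both cost the same."""
    costs = monthly_cost_cents(pages)
    # Ties go to pay-as-you-go: same price, no monthly commitment.
    return "unlimited" if costs["unlimited"] < costs["pay_as_you_go"] else "pay_as_you_go"

for volume in (200, 1500, 5000):
    print(volume, cheaper_plan(volume), monthly_cost_cents(volume))
```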

API Quickstart

The extractor endpoint accepts a URL and returns normalized JSON from rendered page content.

POST /api/extract

Request body

{
  "url": "https://example.com"
}
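In Python, the same request can be built with the standard library alone. The base URL and Bearer header are taken from the curl example at the top of this page; `build_extract_request` and `extract` are hypothetical helper names, and the 60-second timeout is an assumed value that leaves room for browser rendering.

```python
import json
import urllib.request

API_BASE = "https://api.webtojson.dev"  # from the curl example at the top of the page

def build_extract_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/extract request for one page."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/extract",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )

def extract(url: str, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    # Assumed generous timeout: rendering a page in a browser can take several seconds.
    with urllib.request.urlopen(build_extract_request(url, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `extract("https://example.com", "YOUR_KEY")` sends the request; building it separately makes it easy to inspect or log before sending.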

Response body (example)

{
  "ok": true,
  "data": {
    "title": "Example Domain",
    "summary": "...",
    "keyPoints": ["..."],
    "entities": [],
    "fields": {
      "author": null,
      "publishedDate": null,
      "price": null
    }
  }
}
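On the client, it helps to check the `ok` flag before touching `data`. This sketch assumes that any response without `"ok": true` and a `data` object is a failure; the error payload is not documented on this page, so the hypothetical `ExtractionError` simply carries the whole body.

```python
import json
from typing import Any

class ExtractionError(RuntimeError):
    """Raised when an /api/extract response does not report success (hypothetical)."""

def unwrap(response_text: str) -> dict[str, Any]:
    """Return the `data` object from an /api/extract response, or raise."""
    body = json.loads(response_text)
    if not isinstance(body, dict) or body.get("ok") is not True or not isinstance(body.get("data"), dict):
        raise ExtractionError(f"extraction failed: {body!r}")
    # Documented fields such as author may be JSON null, which arrives as None.
    return body["data"]
```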

If neither `OPENAI_API_KEY` nor `ANTHROPIC_API_KEY` is set on the server, the endpoint falls back to heuristic extraction and still returns JSON, so your workflow keeps running.
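The fallback can be pictured as a provider choice made from environment variables. This is a sketch under stated assumptions: the page does not say which provider wins when both keys are set, so preferring Anthropic here is arbitrary, and `heuristic_extract` (page `<title>` plus the first paragraphs) is a hypothetical stand-in for the real heuristic extractor.

```python
import os
from html.parser import HTMLParser
from typing import Mapping, Optional

def pick_provider(env: Mapping[str, str] = os.environ) -> str:
    """Choose an extraction backend from configured API keys."""
    # Assumption: Anthropic is tried first when both keys are set.
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return "heuristic"  # no LLM key configured: rule-based extraction

class _TitleAndParagraphs(HTMLParser):
    """Collects the <title> text and the text of each <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs: list[str] = []
        self._in: Optional[str] = None
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._in, self._buf = tag, []

    def handle_data(self, data):
        if self._in:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._in:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if tag == "title":
                self.title = text
            elif text:
                self.paragraphs.append(text)
            self._in = None

def heuristic_extract(html: str, max_points: int = 3) -> dict:
    """Hypothetical non-LLM extractor returning the documented response keys."""
    parser = _TitleAndParagraphs()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "summary": parser.paragraphs[0] if parser.paragraphs else "",
        "keyPoints": parser.paragraphs[:max_points],
        "entities": [],
        "fields": {"author": None, "publishedDate": None, "price": None},
    }
```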

Frequently Asked Questions

How is this different from web-access skills?

Web-access-style skills are built for conversational agents. Web-to-JSON is an API-first service designed for production pipelines and automation jobs.

Do I need to maintain selectors?

No. You send only the URL. The extractor renders the page, captures the meaningful content, and returns normalized JSON without CSS selector maintenance.

Can it handle SPAs and delayed content?

Yes. Puppeteer executes client-side JavaScript, waits for network idle, and scrolls to trigger lazy-loaded sections before extraction.

How does paid access work with a Stripe Payment Link?

After checkout, Stripe sends a webhook event (such as `checkout.session.completed`) carrying the Checkout Session ID and redirects the buyer to your success page. The success page exchanges that session ID for an httpOnly cookie that unlocks the protected tool and API endpoint.
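A minimal sketch of the cookie exchange, assuming the server has already retrieved the Checkout Session from Stripe (for example with the `stripe` Python library's `stripe.checkout.Session.retrieve`) and read its `payment_status`. The cookie name `wtj_access`, the HMAC signing scheme, and the 30-day lifetime are hypothetical choices; the page only states that the cookie is secure and httpOnly.

```python
import hashlib
import hmac
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "wtj_access"            # hypothetical cookie name
MAX_AGE_SECONDS = 30 * 24 * 60 * 60   # hypothetical 30-day access window

def _mac(session_id: str, secret: bytes) -> str:
    return hmac.new(secret, session_id.encode("utf-8"), hashlib.sha256).hexdigest()

def access_cookie_header(session_id: str, payment_status: str, secret: bytes) -> Optional[str]:
    """Return a Set-Cookie value that unlocks paid access, or None if the session is unpaid."""
    if payment_status != "paid":  # value read from the retrieved Checkout Session
        return None
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = f"{session_id}.{_mac(session_id, secret)}"
    morsel = cookie[COOKIE_NAME]
    morsel["path"] = "/"
    morsel["httponly"] = True  # not readable from page JavaScript
    morsel["secure"] = True    # only sent over HTTPS
    morsel["samesite"] = "Lax"
    morsel["max-age"] = MAX_AGE_SECONDS
    return morsel.OutputString()

def has_access(cookie_value: str, secret: bytes) -> bool:
    """Check a cookie value on protected routes such as /tool and /api/extract."""
    session_id, sep, mac = cookie_value.rpartition(".")
    if not sep or not session_id:
        return False
    # compare_digest avoids leaking timing information about the expected MAC
    return hmac.compare_digest(mac, _mac(session_id, secret))
```

Signing the session ID stops a client from forging a cookie for an arbitrary ID; it does not by itself handle refunds or revocation.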

What should I configure in Stripe?

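Configure the payment link to redirect buyers to your success page after payment, for example `https://your-domain.com/success?session_id={CHECKOUT_SESSION_ID}`. Stripe replaces `{CHECKOUT_SESSION_ID}` with the real Checkout Session ID, so buyers are unlocked automatically after payment.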
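On the success page, the server first reads `session_id` from the query string. A sketch using `urllib.parse`; the `cs_` prefix check reflects how Stripe formats Checkout Session IDs and is only a cheap sanity check (it also rejects an unreplaced template), not a substitute for retrieving the session from Stripe. The helper name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

def session_id_from_success_url(url: str) -> Optional[str]:
    """Extract the Checkout Session ID that Stripe substituted into the success URL."""
    values = parse_qs(urlsplit(url).query).get("session_id", [])
    session_id = values[0] if values else None
    # Reject missing IDs, an unreplaced {CHECKOUT_SESSION_ID} template, or non-Checkout IDs.
    if not session_id or not session_id.startswith("cs_"):
        return None
    return session_id
```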