Build a web scraping service with V0: a REST API that accepts URLs and CSS selectors, extracts structured data with Cheerio (not Puppeteer, since Cheerio runs on Vercel serverless without a headless browser), authenticates with API keys, caches results in Supabase, and includes a job management dashboard. You'll handle Vercel's timeout limits with maxDuration and add a ScrapingBee fallback for JavaScript-rendered pages, all in about 2-4 hours.
What you're building
Web scraping extracts structured data from websites — product prices, article titles, contact information, or any data visible on a page. Building a scraping API lets you automate this extraction with simple HTTP requests.
V0 generates the API endpoints, job dashboard, and configuration forms from prompts. Cheerio parses HTML using CSS selectors, running efficiently on Vercel serverless without the overhead of a headless browser. Supabase stores scrape jobs, results, and API keys.
The architecture uses an API route for the public scraping endpoint with API key authentication, an internal route for the actual scraping logic with Cheerio, Server Components for the dashboard, and route segment config for extending Vercel's timeout limits.
Final result
A web scraping API with configurable CSS selectors, API key authentication, result caching, and a management dashboard.
Tech stack
- V0 for generating the API routes, dashboard, and forms (Next.js App Router)
- Cheerio for HTML parsing with CSS selectors
- Supabase for scrape jobs, results, and API key storage (Postgres with RLS)
- Vercel serverless for hosting, with maxDuration for extended timeouts
- ScrapingBee (optional) as a fallback for JavaScript-rendered pages
Prerequisites
- A V0 account (Premium recommended given the project's complexity)
- A Supabase project (free tier works — connect via V0's Connect panel)
- Optional: a ScrapingBee account for JavaScript-rendered pages (free tier: 1,000 requests)
- No other API keys or services needed
Build steps
Set up the schema for scrape jobs, results, and API keys
Open V0 and create a new project. Use the Connect panel to add Supabase. Create tables for scrape job definitions, extraction results, and API key management.
```sql
CREATE TABLE api_keys (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  owner_id uuid NOT NULL,
  key_hash text UNIQUE NOT NULL,
  key_prefix text NOT NULL,
  name text NOT NULL,
  is_active boolean DEFAULT true,
  created_at timestamptz DEFAULT now()
);

CREATE TABLE scrape_jobs (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  owner_id uuid NOT NULL,
  target_url text NOT NULL,
  selector_config jsonb NOT NULL DEFAULT '{}',
  schedule_cron text,
  status text DEFAULT 'pending'
    CHECK (status IN ('pending','running','completed','failed')),
  last_run_at timestamptz,
  created_at timestamptz DEFAULT now()
);

CREATE TABLE scrape_results (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  job_id uuid REFERENCES scrape_jobs(id) ON DELETE CASCADE,
  data jsonb NOT NULL,
  html_snapshot text,
  status_code int,
  duration_ms int,
  scraped_at timestamptz DEFAULT now()
);

CREATE INDEX idx_results_job_id ON scrape_results(job_id);
CREATE INDEX idx_results_scraped_at ON scrape_results(scraped_at DESC);

-- RLS: users only see their own data
ALTER TABLE scrape_jobs ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users manage own jobs" ON scrape_jobs
  FOR ALL USING (owner_id = auth.uid());

ALTER TABLE scrape_results ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users see own results" ON scrape_results
  FOR SELECT USING (
    job_id IN (SELECT id FROM scrape_jobs WHERE owner_id = auth.uid())
  );
```

Pro tip: The selector_config is a JSON map like {"title": "h1", "price": ".price-tag", "image": "img.hero@src"}. The @src suffix tells the scraper to extract an attribute instead of text content.
Expected result: Three tables with RLS policies. API keys use hashed storage (only the prefix is visible). Scrape results are indexed by job and timestamp for fast queries.
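To make the @attr convention concrete, here's a minimal sketch of how a selector spec splits into a CSS selector and an optional attribute name. The helper name is illustrative, not part of the generated code:

```ts
// Everything before '@' is a CSS selector; everything after it names an
// attribute to extract instead of the element's text content.
function parseSelector(spec: string): { selector: string; attr?: string } {
  const [selector, attr] = spec.split('@')
  return { selector, attr }
}

parseSelector('img.hero@src') // { selector: 'img.hero', attr: 'src' }
parseSelector('.price-tag')   // { selector: '.price-tag', attr: undefined }
```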
Build the scraping API with Cheerio and API key auth
Create the public API endpoint that authenticates via the X-API-Key header, fetches the target URL, parses it with Cheerio using the provided CSS selectors, and returns structured data.
```ts
import { NextRequest, NextResponse } from 'next/server'
import { createClient } from '@supabase/supabase-js'
import * as cheerio from 'cheerio'
import crypto from 'crypto'

export const maxDuration = 60

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

export async function POST(req: NextRequest) {
  const apiKey = req.headers.get('x-api-key')
  if (!apiKey) {
    return NextResponse.json({ error: 'Missing API key' }, { status: 401 })
  }

  // Look up the key by its SHA-256 hash; raw keys are never stored
  const keyHash = crypto.createHash('sha256').update(apiKey).digest('hex')
  const { data: keyRecord } = await supabase
    .from('api_keys')
    .select('owner_id, is_active')
    .eq('key_hash', keyHash)
    .single()

  if (!keyRecord?.is_active) {
    return NextResponse.json({ error: 'Invalid API key' }, { status: 401 })
  }

  const { url, selectors } = await req.json()
  const start = Date.now()

  const res = await fetch(url, {
    headers: { 'User-Agent': 'RapidScraper/1.0' },
    signal: AbortSignal.timeout(15000), // abort slow targets well before maxDuration
  })

  const html = await res.text()
  const $ = cheerio.load(html)
  const data: Record<string, string | string[]> = {}

  for (const [key, selector] of Object.entries(selectors as Record<string, string>)) {
    // 'img.hero@src' -> selector 'img.hero', attribute 'src'
    const [sel, attr] = selector.split('@')
    const elements = $(sel)
    if (elements.length > 1) {
      data[key] = elements
        .map((_, el) => (attr ? $(el).attr(attr) ?? '' : $(el).text().trim()))
        .get()
    } else {
      data[key] = attr ? elements.attr(attr) ?? '' : elements.text().trim()
    }
  }

  await supabase.from('scrape_results').insert({
    job_id: null,
    data,
    status_code: res.status,
    duration_ms: Date.now() - start,
  })

  return NextResponse.json({ data, status_code: res.status, duration_ms: Date.now() - start })
}
```

Pro tip: Set maxDuration = 60 at the top of the route file. This extends the Vercel serverless timeout from the default 10 seconds (Hobby) to 60 seconds (Pro plan). Without it, slow websites will time out.
Expected result: POST to /api/scrape with an X-API-Key header and a JSON body containing url and selectors returns the extracted data, HTTP status code, and duration.
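For reference, a call from any TypeScript client might look like this; the deployment URL and key are placeholders:

```ts
// POST a URL plus a selector map; the x-api-key header authenticates the call.
// Run inside an async function (or a script with top-level await).
const res = await fetch('https://your-app.vercel.app/api/scrape', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'rsk_your_generated_key',
  },
  body: JSON.stringify({
    url: 'https://example.com/product',
    selectors: { title: 'h1', price: '.price-tag', image: 'img.hero@src' },
  }),
})
const { data, status_code, duration_ms } = await res.json()
```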
Build the API key generation system
Create a secure API key generation flow. Keys are generated as random strings, the full key is shown once to the user, and only the SHA-256 hash is stored in the database.
```ts
'use server'

import { createClient } from '@/lib/supabase/server'
import crypto from 'crypto'

export async function generateApiKey(name: string) {
  const supabase = await createClient()
  const user = (await supabase.auth.getUser()).data.user
  if (!user) return { error: 'Unauthorized' }

  const rawKey = `rsk_${crypto.randomBytes(32).toString('hex')}`
  const keyHash = crypto.createHash('sha256').update(rawKey).digest('hex')
  const keyPrefix = rawKey.slice(0, 8)

  const { error } = await supabase.from('api_keys').insert({
    owner_id: user.id,
    key_hash: keyHash,
    key_prefix: keyPrefix,
    name,
  })

  if (error) return { error: error.message }

  return { key: rawKey, prefix: keyPrefix }
}

export async function revokeApiKey(keyId: string) {
  const supabase = await createClient()
  await supabase
    .from('api_keys')
    .update({ is_active: false })
    .eq('id', keyId)
}
```

Pro tip: The raw key is returned only once during generation. After that, only the prefix (rsk_xxxx) is visible in the dashboard. This follows the same pattern as Stripe and GitHub API keys.
Expected result: Generating an API key shows the full key in a Dialog with a copy button. The key is hashed before storage — the full key is never retrievable again.
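A minimal client-side sketch of the show-once flow is below; the import path for the server action is an assumption based on where you place it, and the markup is deliberately bare (V0 would generate the shadcn/ui Dialog version):

```tsx
'use client'

import { useState } from 'react'
// Assumed path -- adjust to wherever the server action file lives
import { generateApiKey } from '@/app/actions/api-keys'

export function GenerateKeyButton() {
  const [rawKey, setRawKey] = useState<string | null>(null)

  async function handleGenerate() {
    const result = await generateApiKey('Dashboard key')
    // The raw key exists only in this response; the server keeps only its hash
    if ('key' in result && result.key) setRawKey(result.key)
  }

  return (
    <div>
      <button onClick={handleGenerate}>Generate API key</button>
      {rawKey && <code>{rawKey}</code>}
    </div>
  )
}
```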
Build the job management dashboard
Create a dashboard where users manage scrape jobs, view result history, and configure CSS selectors for each target URL.
Paste this prompt into V0's AI chat:

```text
Build a scraping job dashboard at app/dashboard/page.tsx with:
1. Server Component fetching all scrape_jobs for the current user
2. shadcn/ui Table with columns: Target URL (truncated), Status Badge (pending=outline, running=secondary, completed=default, failed=destructive), Last Run, Results Count, Actions
3. Dialog for creating new jobs: Input for target URL, Textarea for JSON selector config with example placeholder ({"title": "h1", "price": ".price"}), optional Input for cron schedule
4. Per-job detail page at app/dashboard/jobs/[id]/page.tsx showing: result history Table, latest extracted data in a pre/code block, re-run Button
5. API Keys section with Table showing key prefix, name, status Badge, and Revoke Button with AlertDialog
6. Generate API Key Button opening Dialog with Input for key name, showing the generated key once with copy-to-clipboard
7. Summary Cards: total jobs, successful scrapes today, failed scrapes, active API keys
```

Expected result: A dashboard with job Table, status Badges, API key management, and per-job result history with extracted data preview.
Add result caching and ScrapingBee fallback
Add result caching to avoid re-scraping the same URL within a TTL period, and integrate ScrapingBee as a fallback for JavaScript-rendered pages that Cheerio cannot parse.
Paste this prompt into V0's AI chat:

```text
Enhance the scraping API with:
1. Result caching: before scraping, check scrape_results for a result with the same URL scraped within the last hour. If found, return the cached data with a 'cached: true' flag.
2. ScrapingBee fallback: add a 'render_js' boolean parameter to the API. When true, call ScrapingBee's API (api.scrapingbee.com/api/v1?api_key=KEY&url=URL&render_js=true) instead of direct fetch. This handles JavaScript-rendered SPAs.
3. Set SCRAPINGBEE_API_KEY in V0's Vars tab (server-only).
4. Add a 'Test Scrape' Button on the job form that runs a single scrape and shows the result in a preview Card before saving the job.
5. Error handling: catch ETIMEDOUT, ECONNREFUSED, and HTTP 4xx/5xx with descriptive error messages.
Update the scraping API route to support both modes.
```

Expected result: The API checks for cached results before scraping. JavaScript-heavy pages can be scraped via ScrapingBee fallback. A test button lets users preview extraction results before saving jobs.
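If you'd rather wire this by hand, here is a minimal sketch under two assumptions: scrape_results gains a target_url column for cache lookups, and the ScrapingBee endpoint matches the one in the prompt above (verify against their docs):

```ts
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

// Return a result for this URL if one was scraped within the last hour.
async function getCachedResult(url: string) {
  const oneHourAgo = new Date(Date.now() - 60 * 60 * 1000).toISOString()
  const { data } = await supabase
    .from('scrape_results')
    .select('data, scraped_at')
    .eq('target_url', url) // assumes an added target_url column
    .gte('scraped_at', oneHourAgo)
    .order('scraped_at', { ascending: false })
    .limit(1)
    .maybeSingle()
  return data
}

// Fetch HTML directly, or through ScrapingBee when the page needs JS rendering.
async function fetchHtml(url: string, renderJs: boolean): Promise<string> {
  if (!renderJs) {
    const res = await fetch(url, { signal: AbortSignal.timeout(15000) })
    return res.text()
  }
  const beeUrl =
    `https://api.scrapingbee.com/api/v1?api_key=${process.env.SCRAPINGBEE_API_KEY}` +
    `&render_js=true&url=${encodeURIComponent(url)}`
  // Rendered pages take longer; still keep a hard timeout under maxDuration
  const res = await fetch(beeUrl, { signal: AbortSignal.timeout(45000) })
  return res.text()
}
```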
Complete code
```ts
import { NextRequest, NextResponse } from 'next/server'
import { createClient } from '@supabase/supabase-js'
import * as cheerio from 'cheerio'
import crypto from 'crypto'

export const maxDuration = 60

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

export async function POST(req: NextRequest) {
  const apiKey = req.headers.get('x-api-key')
  if (!apiKey) {
    return NextResponse.json({ error: 'Missing API key' }, { status: 401 })
  }

  const keyHash = crypto.createHash('sha256').update(apiKey).digest('hex')
  const { data: keyRecord } = await supabase
    .from('api_keys')
    .select('owner_id, is_active')
    .eq('key_hash', keyHash)
    .single()

  if (!keyRecord?.is_active) {
    return NextResponse.json({ error: 'Invalid API key' }, { status: 401 })
  }

  const { url, selectors } = await req.json()
  const start = Date.now()

  const res = await fetch(url, {
    headers: { 'User-Agent': 'RapidScraper/1.0' },
    signal: AbortSignal.timeout(15000),
  })

  const html = await res.text()
  const $ = cheerio.load(html)
  const data: Record<string, string | string[]> = {}

  for (const [key, selector] of Object.entries(
    selectors as Record<string, string>
  )) {
    const [sel, attr] = selector.split('@')
    const elements = $(sel)
    if (elements.length > 1) {
      data[key] = elements
        .map((_, el) =>
          attr ? $(el).attr(attr) ?? '' : $(el).text().trim()
        )
        .get()
    } else {
      data[key] = attr
        ? elements.attr(attr) ?? ''
        : elements.text().trim()
    }
  }

  return NextResponse.json({
    data,
    status_code: res.status,
    duration_ms: Date.now() - start,
  })
}
```

Customization ideas
Add scheduled scraping with Vercel Cron
Configure vercel.json cron jobs to trigger scrape jobs on a schedule. Store the cron expression per job and check which jobs are due on each cron invocation.
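A sketch of the cron entry point, assuming Vercel's CRON_SECRET convention and a simplified due-check (proper cron-expression matching would use a library such as cron-parser):

```ts
// app/api/cron/route.ts
// vercel.json would include:
// { "crons": [{ "path": "/api/cron", "schedule": "*/15 * * * *" }] }
import { NextRequest, NextResponse } from 'next/server'
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

export async function GET(req: NextRequest) {
  // Vercel sends Authorization: Bearer <CRON_SECRET> when CRON_SECRET is set
  if (req.headers.get('authorization') !== `Bearer ${process.env.CRON_SECRET}`) {
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
  }

  // Simplified due-check: any scheduled job not run in the past hour
  const cutoff = new Date(Date.now() - 60 * 60 * 1000).toISOString()
  const { data: jobs } = await supabase
    .from('scrape_jobs')
    .select('id')
    .not('schedule_cron', 'is', null)
    .or(`last_run_at.is.null,last_run_at.lt.${cutoff}`)

  // Trigger each due job via your internal scraping logic here
  return NextResponse.json({ due: jobs?.length ?? 0 })
}
```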
Add webhook notifications
Let users configure a webhook URL per job. After each scrape, POST the results to their webhook. Useful for triggering downstream workflows when prices change.
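A minimal sketch, assuming you add a webhook_url column to scrape_jobs:

```ts
// Fire-and-forget delivery: a slow or broken receiver shouldn't fail the scrape.
async function notifyWebhook(webhookUrl: string, jobId: string, data: unknown) {
  try {
    await fetch(webhookUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        job_id: jobId,
        data,
        scraped_at: new Date().toISOString(),
      }),
      signal: AbortSignal.timeout(5000),
    })
  } catch {
    // Log and move on; webhook delivery is best-effort
  }
}
```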
Add data diff detection
Compare each scrape result against the previous one. Highlight changed fields and optionally send an email alert when monitored values (like prices) change.
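One way to sketch the comparison, treating each result's extracted data as a flat key/value map:

```ts
// Return the keys whose extracted values differ between two scrape results.
function diffResults(
  prev: Record<string, string | string[]>,
  next: Record<string, string | string[]>
): string[] {
  const keys = new Set([...Object.keys(prev), ...Object.keys(next)])
  return [...keys].filter(
    (k) => JSON.stringify(prev[k]) !== JSON.stringify(next[k])
  )
}

diffResults({ price: '$19.99' }, { price: '$24.99' }) // ['price']
```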
Add CSV/JSON export
Add download buttons on the results page that export scrape history as CSV or JSON files for analysis in spreadsheets or data tools.
Add rate limiting per API key
Track API calls per key in a Redis counter (Upstash) and enforce limits like 100 requests/hour. Return 429 Too Many Requests when exceeded.
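With @upstash/ratelimit (an assumed dependency), the per-key check could look like this, keyed by the key hash the scrape route already computes:

```ts
import { NextResponse } from 'next/server'
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(), // reads UPSTASH_REDIS_REST_URL and _TOKEN
  limiter: Ratelimit.slidingWindow(100, '1 h'),
})

// Call inside the POST handler after key validation: returns a 429 response
// when the key is over quota, or null to continue with the scrape.
export async function enforceRateLimit(keyHash: string) {
  const { success, reset } = await ratelimit.limit(keyHash)
  if (success) return null
  return NextResponse.json(
    { error: 'Rate limit exceeded', retry_after: reset },
    { status: 429 }
  )
}
```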
Common pitfalls
Pitfall: Using Puppeteer or Playwright for web scraping on Vercel
How to avoid: Use Cheerio for HTML parsing. It loads the HTML string and provides jQuery-like CSS selectors without a browser. For JavaScript-rendered pages, use a third-party service like ScrapingBee.
Pitfall: Not setting maxDuration in the route segment config
How to avoid: Add export const maxDuration = 60 at the top of the route file. This extends the timeout to 60 seconds on the Pro plan. On Hobby, the maximum is 10 seconds.
Pitfall: Storing API keys in plain text
How to avoid: Hash API keys with SHA-256 before storing. Show the full key only once during generation. Store only the prefix (first 8 chars) for display in the dashboard.
Pitfall: Not setting a fetch timeout for target URLs
How to avoid: Use AbortSignal.timeout(15000) in the fetch call. This aborts the request after 15 seconds, leaving time for Cheerio parsing within the overall maxDuration limit.
Best practices
- Use Cheerio instead of headless browsers for Vercel serverless — it parses HTML in milliseconds without Chromium overhead
- Set export const maxDuration = 60 in the route segment config to extend Vercel's serverless timeout
- Hash API keys with SHA-256 before storage and only show the full key once during generation
- Use AbortSignal.timeout() on fetch calls to prevent hanging requests from consuming your entire timeout budget
- Cache scrape results in Supabase with a TTL to avoid re-scraping unchanged pages repeatedly
- Use the @attr suffix convention in selectors (e.g., 'img.hero@src') to extract HTML attributes instead of text content
- Set SUPABASE_SERVICE_ROLE_KEY and SCRAPINGBEE_API_KEY in V0's Vars tab without the NEXT_PUBLIC_ prefix, so they stay server-only
AI prompts to try
Copy these prompts to build this project faster.
I'm building a web scraping API with Next.js App Router, Supabase, and Cheerio. I need: 1) API key authentication using SHA-256 hashing, 2) Cheerio-based HTML extraction using configurable CSS selectors from a JSON map, 3) Vercel serverless timeout handling with maxDuration, 4) Result caching to avoid duplicate scrapes. Help me design the API and key management patterns.
Create a scraping API route at app/api/scrape/route.ts with: 1) export const maxDuration = 60 for extended timeout, 2) X-API-Key header authentication using SHA-256 hash lookup in Supabase, 3) fetch() with AbortSignal.timeout(15000) for the target URL, 4) Cheerio parsing with configurable CSS selectors from the request body, 5) Support for @attr suffix to extract HTML attributes, 6) Duration tracking and structured JSON response.
Frequently asked questions
Why Cheerio instead of Puppeteer?
Cheerio parses HTML strings using CSS selectors in milliseconds without needing a browser. Puppeteer requires 200MB+ of Chromium, exceeds Vercel serverless size limits, and takes 5-10 seconds just to launch. For static HTML pages, Cheerio is 100x faster and actually works on Vercel.
What about JavaScript-rendered pages?
Cheerio only parses the initial HTML response. For SPAs that render content with JavaScript, integrate with ScrapingBee (1,000 free requests) which runs a headless browser on their infrastructure and returns the rendered HTML for Cheerio to parse.
What is the Vercel timeout limit?
Hobby plan: 10 seconds. Pro plan: 60 seconds (with maxDuration = 60). Enterprise: 300 seconds. Set export const maxDuration = 60 in your route file to use the extended timeout on Pro.
How do API keys work?
When a user generates a key, the server creates a random string (rsk_...), shows it once, and stores only the SHA-256 hash. On each API request, the provided key is hashed and compared against stored hashes. This means even database access does not reveal the actual keys.
What V0 plan do I need?
V0 Premium ($20/month) is recommended. The scraping API involves multiple API routes, API key management, and a dashboard, which typically take several prompt iterations to build.
How do I deploy this?
Click Share then Publish in V0. Set SUPABASE_SERVICE_ROLE_KEY in V0's Vars tab (no NEXT_PUBLIC_ prefix). Optionally add SCRAPINGBEE_API_KEY for JavaScript rendering fallback. The scraping API is immediately accessible at your Vercel production URL.
Can RapidDev help build a custom data extraction platform?
Yes. RapidDev has built 600+ apps including data pipeline platforms with scraping, transformation, and visualization. Book a free consultation to discuss your data extraction requirements.
Talk to an Expert
Our team has built 600+ apps. Get personalized help with your project.
Book a free consultation