Build a web scraping service with V0: a REST API that accepts URLs and CSS selectors, extracts structured data with Cheerio (not Puppeteer, since Cheerio runs on Vercel serverless without a headless browser), authenticates with API keys, caches results in Supabase, and includes a job management dashboard. You'll handle Vercel's timeout limits with maxDuration and add a ScrapingBee fallback for JavaScript-rendered pages, all in about 2-4 hours.
What you're building
Web scraping extracts structured data from websites — product prices, article titles, contact information, or any data visible on a page. Building a scraping API lets you automate this extraction with simple HTTP requests.
V0 generates the API endpoints, job dashboard, and configuration forms from prompts. Cheerio parses HTML using CSS selectors, running efficiently on Vercel serverless without the overhead of a headless browser. Supabase stores scrape jobs, results, and API keys.
The architecture uses an API route for the public scraping endpoint with API key authentication, an internal route for the actual scraping logic with Cheerio, Server Components for the dashboard, and route segment config for extending Vercel's timeout limits.
Final result
A web scraping API with configurable CSS selectors, API key authentication, result caching, and a management dashboard.
Tech stack
- V0 for generating the API routes, dashboard, and forms (Next.js App Router)
- Cheerio for HTML parsing with CSS selectors
- Supabase for scrape jobs, results, and API key storage (Postgres with RLS)
- Vercel serverless for hosting, with maxDuration for extended timeouts
- ScrapingBee (optional) as a fallback for JavaScript-rendered pages
Prerequisites
- A V0 account (Premium recommended given the project's complexity)
- A Supabase project (free tier works — connect via V0's Connect panel)
- Optional: a ScrapingBee account for JavaScript-rendered pages (free tier: 1,000 requests)
- No other API keys or services needed
Build steps
Set up the schema for scrape jobs, results, and API keys
Open V0 and create a new project. Use the Connect panel to add Supabase. Create tables for scrape job definitions, extraction results, and API key management.
```sql
CREATE TABLE api_keys (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  owner_id uuid NOT NULL,
  key_hash text UNIQUE NOT NULL,
  key_prefix text NOT NULL,
  name text NOT NULL,
  is_active boolean DEFAULT true,
  created_at timestamptz DEFAULT now()
);

CREATE TABLE scrape_jobs (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  owner_id uuid NOT NULL,
  target_url text NOT NULL,
  selector_config jsonb NOT NULL DEFAULT '{}',
  schedule_cron text,
  status text DEFAULT 'pending'
    CHECK (status IN ('pending','running','completed','failed')),
  last_run_at timestamptz,
  created_at timestamptz DEFAULT now()
);

CREATE TABLE scrape_results (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  job_id uuid REFERENCES scrape_jobs(id) ON DELETE CASCADE,
  data jsonb NOT NULL,
  html_snapshot text,
  status_code int,
  duration_ms int,
  scraped_at timestamptz DEFAULT now()
);

CREATE INDEX idx_results_job_id ON scrape_results(job_id);
CREATE INDEX idx_results_scraped_at ON scrape_results(scraped_at DESC);

-- RLS: users only see their own data
ALTER TABLE scrape_jobs ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users manage own jobs" ON scrape_jobs
  FOR ALL USING (owner_id = auth.uid());

ALTER TABLE scrape_results ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users see own results" ON scrape_results
  FOR SELECT USING (
    job_id IN (SELECT id FROM scrape_jobs WHERE owner_id = auth.uid())
  );
```

Pro tip: The selector_config is a JSON map like {"title": "h1", "price": ".price-tag", "image": "img.hero@src"}. The @src suffix tells the scraper to extract an attribute instead of text content.
Expected result: Three tables with RLS policies. API keys use hashed storage (only the prefix is visible). Scrape results are indexed by job and timestamp for fast queries.
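To make the @attr convention concrete, here's a minimal sketch of how a selector spec splits into a CSS selector and an optional attribute name. The helper name is illustrative, not part of the generated code:

```ts
// Everything before '@' is a CSS selector; everything after it names an
// attribute to extract instead of the element's text content.
function parseSelector(spec: string): { selector: string; attr?: string } {
  const [selector, attr] = spec.split('@')
  return { selector, attr }
}

parseSelector('img.hero@src') // { selector: 'img.hero', attr: 'src' }
parseSelector('.price-tag')   // { selector: '.price-tag', attr: undefined }
```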
Build the scraping API with Cheerio and API key auth
Create the public API endpoint that authenticates via the X-API-Key header, fetches the target URL, parses it with Cheerio using the provided CSS selectors, and returns structured data.
```ts
import { NextRequest, NextResponse } from 'next/server'
import { createClient } from '@supabase/supabase-js'
import * as cheerio from 'cheerio'
import crypto from 'crypto'

export const maxDuration = 60

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

export async function POST(req: NextRequest) {
  const apiKey = req.headers.get('x-api-key')
  if (!apiKey) {
    return NextResponse.json({ error: 'Missing API key' }, { status: 401 })
  }

  // Look up the key by its SHA-256 hash; raw keys are never stored
  const keyHash = crypto.createHash('sha256').update(apiKey).digest('hex')
  const { data: keyRecord } = await supabase
    .from('api_keys')
    .select('owner_id, is_active')
    .eq('key_hash', keyHash)
    .single()

  if (!keyRecord?.is_active) {
    return NextResponse.json({ error: 'Invalid API key' }, { status: 401 })
  }

  const { url, selectors } = await req.json()
  const start = Date.now()

  const res = await fetch(url, {
    headers: { 'User-Agent': 'RapidScraper/1.0' },
    signal: AbortSignal.timeout(15000), // abort slow targets well before maxDuration
  })

  const html = await res.text()
  const $ = cheerio.load(html)
  const data: Record<string, string | string[]> = {}

  for (const [key, selector] of Object.entries(selectors as Record<string, string>)) {
    // 'img.hero@src' -> selector 'img.hero', attribute 'src'
    const [sel, attr] = selector.split('@')
    const elements = $(sel)
    if (elements.length > 1) {
      data[key] = elements
        .map((_, el) => (attr ? $(el).attr(attr) ?? '' : $(el).text().trim()))
        .get()
    } else {
      data[key] = attr ? elements.attr(attr) ?? '' : elements.text().trim()
    }
  }

  await supabase.from('scrape_results').insert({
    job_id: null,
    data,
    status_code: res.status,
    duration_ms: Date.now() - start,
  })

  return NextResponse.json({ data, status_code: res.status, duration_ms: Date.now() - start })
}
```

Pro tip: Set maxDuration = 60 at the top of the route file. This extends the Vercel serverless timeout from the default 10 seconds (Hobby) to 60 seconds (Pro plan). Without it, slow websites will time out.
Expected result: POST to /api/scrape with an X-API-Key header and a JSON body containing url and selectors returns the extracted data, HTTP status code, and duration.
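For reference, a call from any TypeScript client might look like this; the deployment URL and key are placeholders:

```ts
// POST a URL plus a selector map; the x-api-key header authenticates the call.
// Run inside an async function (or a script with top-level await).
const res = await fetch('https://your-app.vercel.app/api/scrape', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'rsk_your_generated_key',
  },
  body: JSON.stringify({
    url: 'https://example.com/product',
    selectors: { title: 'h1', price: '.price-tag', image: 'img.hero@src' },
  }),
})
const { data, status_code, duration_ms } = await res.json()
```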
Build the API key generation system
Create a secure API key generation flow. Keys are generated as random strings, the full key is shown once to the user, and only the SHA-256 hash is stored in the database.
```ts
'use server'

import { createClient } from '@/lib/supabase/server'
import crypto from 'crypto'

export async function generateApiKey(name: string) {
  const supabase = await createClient()
  const user = (await supabase.auth.getUser()).data.user
  if (!user) return { error: 'Unauthorized' }

  const rawKey = `rsk_${crypto.randomBytes(32).toString('hex')}`
  const keyHash = crypto.createHash('sha256').update(rawKey).digest('hex')
  const keyPrefix = rawKey.slice(0, 8)

  const { error } = await supabase.from('api_keys').insert({
    owner_id: user.id,
    key_hash: keyHash,
    key_prefix: keyPrefix,
    name,
  })

  if (error) return { error: error.message }

  return { key: rawKey, prefix: keyPrefix }
}

export async function revokeApiKey(keyId: string) {
  const supabase = await createClient()
  await supabase
    .from('api_keys')
    .update({ is_active: false })
    .eq('id', keyId)
}
```

Pro tip: The raw key is returned only once during generation. After that, only the prefix (rsk_xxxx) is visible in the dashboard. This follows the same pattern as Stripe and GitHub API keys.
Expected result: Generating an API key shows the full key in a Dialog with a copy button. The key is hashed before storage — the full key is never retrievable again.
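A minimal client-side sketch of the show-once flow is below; the import path for the server action is an assumption based on where you place it, and the markup is deliberately bare (V0 would generate the shadcn/ui Dialog version):

```tsx
'use client'

import { useState } from 'react'
// Assumed path -- adjust to wherever the server action file lives
import { generateApiKey } from '@/app/actions/api-keys'

export function GenerateKeyButton() {
  const [rawKey, setRawKey] = useState<string | null>(null)

  async function handleGenerate() {
    const result = await generateApiKey('Dashboard key')
    // The raw key exists only in this response; the server keeps only its hash
    if ('key' in result && result.key) setRawKey(result.key)
  }

  return (
    <div>
      <button onClick={handleGenerate}>Generate API key</button>
      {rawKey && <code>{rawKey}</code>}
    </div>
  )
}
```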
Build the job management dashboard
Create a dashboard where users manage scrape jobs, view result history, and configure CSS selectors for each target URL.
Paste this prompt into V0's AI chat:

```text
Build a scraping job dashboard at app/dashboard/page.tsx with:
1. Server Component fetching all scrape_jobs for the current user
2. shadcn/ui Table with columns: Target URL (truncated), Status Badge (pending=outline, running=secondary, completed=default, failed=destructive), Last Run, Results Count, Actions
3. Dialog for creating new jobs: Input for target URL, Textarea for JSON selector config with example placeholder ({"title": "h1", "price": ".price"}), optional Input for cron schedule
4. Per-job detail page at app/dashboard/jobs/[id]/page.tsx showing: result history Table, latest extracted data in a pre/code block, re-run Button
5. API Keys section with Table showing key prefix, name, status Badge, and Revoke Button with AlertDialog
6. Generate API Key Button opening Dialog with Input for key name, showing the generated key once with copy-to-clipboard
7. Summary Cards: total jobs, successful scrapes today, failed scrapes, active API keys
```

Expected result: A dashboard with job Table, status Badges, API key management, and per-job result history with extracted data preview.
Add result caching and ScrapingBee fallback
Add result caching to avoid re-scraping the same URL within a TTL period, and integrate ScrapingBee as a fallback for JavaScript-rendered pages that Cheerio cannot parse.
Paste this prompt into V0's AI chat:

```text
Enhance the scraping API with:
1. Result caching: before scraping, check scrape_results for a result with the same URL scraped within the last hour. If found, return the cached data with a 'cached: true' flag.
2. ScrapingBee fallback: add a 'render_js' boolean parameter to the API. When true, call ScrapingBee's API (api.scrapingbee.com/api/v1?api_key=KEY&url=URL&render_js=true) instead of direct fetch. This handles JavaScript-rendered SPAs.
3. Set SCRAPINGBEE_API_KEY in V0's Vars tab (server-only).
4. Add a 'Test Scrape' Button on the job form that runs a single scrape and shows the result in a preview Card before saving the job.
5. Error handling: catch ETIMEDOUT, ECONNREFUSED, and HTTP 4xx/5xx with descriptive error messages.
Update the scraping API route to support both modes.
```

Expected result: The API checks for cached results before scraping. JavaScript-heavy pages can be scraped via ScrapingBee fallback. A test button lets users preview extraction results before saving jobs.
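If you'd rather wire this by hand, here is a minimal sketch under two assumptions: scrape_results gains a target_url column for cache lookups, and the ScrapingBee endpoint matches the one in the prompt above (verify against their docs):

```ts
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

// Return a result for this URL if one was scraped within the last hour.
async function getCachedResult(url: string) {
  const oneHourAgo = new Date(Date.now() - 60 * 60 * 1000).toISOString()
  const { data } = await supabase
    .from('scrape_results')
    .select('data, scraped_at')
    .eq('target_url', url) // assumes an added target_url column
    .gte('scraped_at', oneHourAgo)
    .order('scraped_at', { ascending: false })
    .limit(1)
    .maybeSingle()
  return data
}

// Fetch HTML directly, or through ScrapingBee when the page needs JS rendering.
async function fetchHtml(url: string, renderJs: boolean): Promise<string> {
  if (!renderJs) {
    const res = await fetch(url, { signal: AbortSignal.timeout(15000) })
    return res.text()
  }
  const beeUrl =
    `https://api.scrapingbee.com/api/v1?api_key=${process.env.SCRAPINGBEE_API_KEY}` +
    `&render_js=true&url=${encodeURIComponent(url)}`
  // Rendered pages take longer; still keep a hard timeout under maxDuration
  const res = await fetch(beeUrl, { signal: AbortSignal.timeout(45000) })
  return res.text()
}
```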
Complete code
```ts
import { NextRequest, NextResponse } from 'next/server'
import { createClient } from '@supabase/supabase-js'
import * as cheerio from 'cheerio'
import crypto from 'crypto'

export const maxDuration = 60

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

export async function POST(req: NextRequest) {
  const apiKey = req.headers.get('x-api-key')
  if (!apiKey) {
    return NextResponse.json({ error: 'Missing API key' }, { status: 401 })
  }

  const keyHash = crypto.createHash('sha256').update(apiKey).digest('hex')
  const { data: keyRecord } = await supabase
    .from('api_keys')
    .select('owner_id, is_active')
    .eq('key_hash', keyHash)
    .single()

  if (!keyRecord?.is_active) {
    return NextResponse.json({ error: 'Invalid API key' }, { status: 401 })
  }

  const { url, selectors } = await req.json()
  const start = Date.now()

  const res = await fetch(url, {
    headers: { 'User-Agent': 'RapidScraper/1.0' },
    signal: AbortSignal.timeout(15000),
  })

  const html = await res.text()
  const $ = cheerio.load(html)
  const data: Record<string, string | string[]> = {}

  for (const [key, selector] of Object.entries(
    selectors as Record<string, string>
  )) {
    const [sel, attr] = selector.split('@')
    const elements = $(sel)
    if (elements.length > 1) {
      data[key] = elements
        .map((_, el) =>
          attr ? $(el).attr(attr) ?? '' : $(el).text().trim()
        )
        .get()
    } else {
      data[key] = attr
        ? elements.attr(attr) ?? ''
        : elements.text().trim()
    }
  }

  return NextResponse.json({
    data,
    status_code: res.status,
    duration_ms: Date.now() - start,
  })
}
```

Customization ideas
Add scheduled scraping with Vercel Cron
Configure vercel.json cron jobs to trigger scrape jobs on a schedule. Store the cron expression per job and check which jobs are due on each cron invocation.
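A sketch of the cron entry point, assuming Vercel's CRON_SECRET convention and a simplified due-check (proper cron-expression matching would use a library such as cron-parser):

```ts
// app/api/cron/route.ts
// vercel.json would include:
// { "crons": [{ "path": "/api/cron", "schedule": "*/15 * * * *" }] }
import { NextRequest, NextResponse } from 'next/server'
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

export async function GET(req: NextRequest) {
  // Vercel sends Authorization: Bearer <CRON_SECRET> when CRON_SECRET is set
  if (req.headers.get('authorization') !== `Bearer ${process.env.CRON_SECRET}`) {
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
  }

  // Simplified due-check: any scheduled job not run in the past hour
  const cutoff = new Date(Date.now() - 60 * 60 * 1000).toISOString()
  const { data: jobs } = await supabase
    .from('scrape_jobs')
    .select('id')
    .not('schedule_cron', 'is', null)
    .or(`last_run_at.is.null,last_run_at.lt.${cutoff}`)

  // Trigger each due job via your internal scraping logic here
  return NextResponse.json({ due: jobs?.length ?? 0 })
}
```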
Add webhook notifications
Let users configure a webhook URL per job. After each scrape, POST the results to their webhook. Useful for triggering downstream workflows when prices change.
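A minimal sketch, assuming you add a webhook_url column to scrape_jobs:

```ts
// Fire-and-forget delivery: a slow or broken receiver shouldn't fail the scrape.
async function notifyWebhook(webhookUrl: string, jobId: string, data: unknown) {
  try {
    await fetch(webhookUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        job_id: jobId,
        data,
        scraped_at: new Date().toISOString(),
      }),
      signal: AbortSignal.timeout(5000),
    })
  } catch {
    // Log and move on; webhook delivery is best-effort
  }
}
```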
Add data diff detection
Compare each scrape result against the previous one. Highlight changed fields and optionally send an email alert when monitored values (like prices) change.
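One way to sketch the comparison, treating each result's extracted data as a flat key/value map:

```ts
// Return the keys whose extracted values differ between two scrape results.
function diffResults(
  prev: Record<string, string | string[]>,
  next: Record<string, string | string[]>
): string[] {
  const keys = new Set([...Object.keys(prev), ...Object.keys(next)])
  return [...keys].filter(
    (k) => JSON.stringify(prev[k]) !== JSON.stringify(next[k])
  )
}

diffResults({ price: '$19.99' }, { price: '$24.99' }) // ['price']
```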
Add CSV/JSON export
Add download buttons on the results page that export scrape history as CSV or JSON files for analysis in spreadsheets or data tools.
Add rate limiting per API key
Track API calls per key in a Redis counter (Upstash) and enforce limits like 100 requests/hour. Return 429 Too Many Requests when exceeded.
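With @upstash/ratelimit (an assumed dependency), the per-key check could look like this, keyed by the key hash the scrape route already computes:

```ts
import { NextResponse } from 'next/server'
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(), // reads UPSTASH_REDIS_REST_URL and _TOKEN
  limiter: Ratelimit.slidingWindow(100, '1 h'),
})

// Call inside the POST handler after key validation: returns a 429 response
// when the key is over quota, or null to continue with the scrape.
export async function enforceRateLimit(keyHash: string) {
  const { success, reset } = await ratelimit.limit(keyHash)
  if (success) return null
  return NextResponse.json(
    { error: 'Rate limit exceeded', retry_after: reset },
    { status: 429 }
  )
}
```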
Common pitfalls
Pitfall: Using Puppeteer or Playwright for web scraping on Vercel
How to avoid: Use Cheerio for HTML parsing. It loads the HTML string and provides jQuery-like CSS selectors without a browser. For JavaScript-rendered pages, use a third-party service like ScrapingBee.
Pitfall: Not setting maxDuration in the route segment config
How to avoid: Add export const maxDuration = 60 at the top of the route file. This extends the timeout to 60 seconds on the Pro plan. On Hobby, the maximum is 10 seconds.
Pitfall: Storing API keys in plain text
How to avoid: Hash API keys with SHA-256 before storing. Show the full key only once during generation. Store only the prefix (first 8 chars) for display in the dashboard.
Pitfall: Not setting a fetch timeout for target URLs
How to avoid: Use AbortSignal.timeout(15000) in the fetch call. This aborts the request after 15 seconds, leaving time for Cheerio parsing within the overall maxDuration limit.
Best practices
- Use Cheerio instead of headless browsers for Vercel serverless — it parses HTML in milliseconds without Chromium overhead
- Set export const maxDuration = 60 in the route segment config to extend Vercel's serverless timeout
- Hash API keys with SHA-256 before storage and only show the full key once during generation
- Use AbortSignal.timeout() on fetch calls to prevent hanging requests from consuming your entire timeout budget
- Cache scrape results in Supabase with a TTL to avoid re-scraping unchanged pages repeatedly
- Use the @attr suffix convention in selectors (e.g., 'img.hero@src') to extract HTML attributes instead of text content
- Set SUPABASE_SERVICE_ROLE_KEY and SCRAPINGBEE_API_KEY in V0's Vars tab without the NEXT_PUBLIC_ prefix, so they stay server-only
AI prompts to try
Copy these prompts to build this project faster.
I'm building a web scraping API with Next.js App Router, Supabase, and Cheerio. I need: 1) API key authentication using SHA-256 hashing, 2) Cheerio-based HTML extraction using configurable CSS selectors from a JSON map, 3) Vercel serverless timeout handling with maxDuration, 4) Result caching to avoid duplicate scrapes. Help me design the API and key management patterns.
Create a scraping API route at app/api/scrape/route.ts with: 1) export const maxDuration = 60 for extended timeout, 2) X-API-Key header authentication using SHA-256 hash lookup in Supabase, 3) fetch() with AbortSignal.timeout(15000) for the target URL, 4) Cheerio parsing with configurable CSS selectors from the request body, 5) Support for @attr suffix to extract HTML attributes, 6) Duration tracking and structured JSON response.
Frequently asked questions
Why Cheerio instead of Puppeteer?
Cheerio parses HTML strings using CSS selectors in milliseconds without needing a browser. Puppeteer requires 200MB+ of Chromium, exceeds Vercel serverless size limits, and takes 5-10 seconds just to launch. For static HTML pages, Cheerio is 100x faster and actually works on Vercel.
What about JavaScript-rendered pages?
Cheerio only parses the initial HTML response. For SPAs that render content with JavaScript, integrate with ScrapingBee (1,000 free requests) which runs a headless browser on their infrastructure and returns the rendered HTML for Cheerio to parse.
What is the Vercel timeout limit?
Hobby plan: 10 seconds. Pro plan: 60 seconds (with maxDuration = 60). Enterprise: 300 seconds. Set export const maxDuration = 60 in your route file to use the extended timeout on Pro.
How do API keys work?
When a user generates a key, the server creates a random string (rsk_...), shows it once, and stores only the SHA-256 hash. On each API request, the provided key is hashed and compared against stored hashes. This means even database access does not reveal the actual keys.
What V0 plan do I need?
V0 Premium ($20/month) is recommended. The scraping API involves multiple API routes, API key management, and a dashboard, which typically take several prompt iterations to build.
How do I deploy this?
Click Share then Publish in V0. Set SUPABASE_SERVICE_ROLE_KEY in V0's Vars tab (no NEXT_PUBLIC_ prefix). Optionally add SCRAPINGBEE_API_KEY for JavaScript rendering fallback. The scraping API is immediately accessible at your Vercel production URL.
Can RapidDev help build a custom data extraction platform?
Yes. RapidDev has built 600+ apps including data pipeline platforms with scraping, transformation, and visualization. Book a free consultation to discuss your data extraction requirements.
Talk to an Expert
Our team has built 600+ apps. Get personalized help with your project.
Book a free consultation