Build a recommendations engine in Lovable using pgvector for item embeddings and collaborative filtering via Supabase. OpenAI generates vector embeddings for each item, stored in a pgvector column. A nightly pg_cron job regenerates user preference vectors from interaction history, and a For You feed queries the nearest neighbors using cosine similarity — no external ML infrastructure needed.
What you're building
pgvector is a PostgreSQL extension that adds a vector data type and similarity search operators. You store each item's OpenAI embedding (1536 dimensions for text-embedding-3-small) in a vector(1536) column. To find similar items, you use the <=> cosine distance operator in a SQL query with ORDER BY and LIMIT — the database handles all the math.
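Under the hood, <=> computes standard cosine distance. A minimal TypeScript sketch of the same math (a hypothetical helper for intuition, not part of pgvector) makes the operator less mysterious:

```typescript
// Cosine distance as computed by pgvector's <=> operator:
// 1 - (a · b) / (|a| * |b|). Lower distance = more similar.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

A distance of 0 means identical direction, 1 means unrelated (orthogonal), and 2 means opposite; 1 minus the distance gives the similarity score used later in this guide.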
Content-based filtering works by comparing item vectors directly. When a user views item A, items with a cosine similarity above 0.8 to item A are surfaced as related recommendations. This works even for new items that have no interaction history.
Collaborative filtering creates a user preference vector by averaging the embeddings of all items the user has positively interacted with, weighted by recency and interaction strength (a purchase weighs more than a view). This 'taste vector' is stored in a user_preferences table. The For You feed queries items with the smallest cosine distance to the user's taste vector.
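That averaging step can be sketched as a pure function, assuming a 30-day half-life for recency decay (the Interaction shape and computeTasteVector name are illustrative, not from the codebase):

```typescript
interface Interaction {
  weight: number       // interaction strength, e.g. purchase = 1.0, view = 0.1
  ageDays: number      // days since the interaction happened
  embedding: number[]  // the interacted item's embedding
}

// Weighted average of item embeddings, with weights halved every 30 days.
function computeTasteVector(interactions: Interaction[], dim: number): number[] {
  const sum = new Array(dim).fill(0)
  let totalWeight = 0
  for (const { weight, ageDays, embedding } of interactions) {
    const w = weight * Math.exp(-ageDays * Math.LN2 / 30) // recency decay
    for (let i = 0; i < dim; i++) sum[i] += embedding[i] * w
    totalWeight += w
  }
  return totalWeight > 0 ? sum.map(v => v / totalWeight) : sum
}
```

The Edge Function built in the steps below implements this same logic against live interaction data.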
The nightly pg_cron job calls a Supabase Edge Function that identifies users with new interactions since the last run, recomputes their preference vectors, and updates the user_preferences table. This batch approach scales well because only users with new activity need recomputation.
Final result
A fully functional recommendations engine with pgvector content-based and collaborative filtering, a For You feed, and related-items sections — no external ML infrastructure required.
Tech stack
Prerequisites
- Lovable Pro account for Edge Function and complex query generation
- Supabase project with pgvector extension enabled (Dashboard → Extensions → vector)
- Supabase Pro plan recommended for pg_cron (used for nightly regeneration)
- OpenAI API key saved to Cloud tab → Secrets as OPENAI_API_KEY
- Supabase service role key saved as SUPABASE_SERVICE_ROLE_KEY
Build steps
Enable pgvector and set up the recommendations schema
Prompt Lovable to create the database schema with the pgvector extension, items table with embedding column, interaction tracking, and user preference vectors. The HNSW index is critical for fast similarity search at scale.
Set up a recommendations engine database in Supabase:

First, enable the vector extension: CREATE EXTENSION IF NOT EXISTS vector;

Tables:
- items: id, title, description, category (text), tags (text array), image_url, metadata (jsonb), embedding vector(1536), embedded_at (timestamptz), created_at
- user_interactions: id, user_id, item_id, interaction_type (view|like|save|purchase|skip), weight (float: view=0.1, like=0.5, save=0.7, purchase=1.0, skip=-0.2), created_at
- user_preferences: id (references auth.users), preference_vector vector(1536), vector_updated_at, interaction_count (int)
- recommendation_logs: id, user_id, item_id, algorithm (content_based|collaborative|hybrid), similarity_score (float), shown_at, clicked (bool default false)

RLS:
- items: public SELECT. Service role for INSERT/UPDATE.
- user_interactions: users can INSERT/SELECT their own rows (user_id = auth.uid())
- user_preferences: users can SELECT their own row. Service role for INSERT/UPDATE.
- recommendation_logs: users can INSERT their own logs. Service role for SELECT.

Indexes:
- CREATE INDEX idx_items_embedding ON items USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
- CREATE INDEX idx_interactions_user ON user_interactions(user_id, interaction_type, created_at DESC);
- CREATE INDEX idx_user_preferences ON user_preferences USING hnsw (preference_vector vector_cosine_ops);

Create a Supabase RPC function get_similar_items(p_item_id uuid, p_limit int) that returns items ordered by embedding <=> (SELECT embedding FROM items WHERE id = p_item_id) LIMIT p_limit, excluding the input item itself.

Create an RPC function get_recommendations_for_user(p_user_id uuid, p_limit int) that returns items ordered by embedding <=> (SELECT preference_vector FROM user_preferences WHERE id = p_user_id) LIMIT p_limit.

Pro tip: Start with the HNSW index (ef_construction=64, m=16) rather than IVFFlat. HNSW is faster to query and, unlike IVFFlat, does not need to be built on existing data first; it builds incrementally as you insert vectors.
Expected result: pgvector extension is enabled. All tables are created with correct RLS. HNSW indexes are in place. The two RPC functions are ready. TypeScript types are generated.
Build the embedding generation Edge Function
Create a Supabase Edge Function that accepts an item ID (or a batch of IDs), generates OpenAI text embeddings from the item's title and description, and updates the embedding column. This is called when new items are inserted.
```typescript
// supabase/functions/generate-embeddings/index.ts
import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'

const corsHeaders = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Headers': 'authorization, x-client-info, apikey, content-type',
  'Content-Type': 'application/json',
}

serve(async (req: Request) => {
  if (req.method === 'OPTIONS') {
    return new Response('ok', { headers: corsHeaders })
  }

  const supabase = createClient(
    Deno.env.get('SUPABASE_URL') ?? '',
    Deno.env.get('SUPABASE_SERVICE_ROLE_KEY') ?? ''
  )

  // Fetch items without embeddings (or specific IDs from request body)
  const body = await req.json().catch(() => ({}))
  let query = supabase
    .from('items')
    .select('id, title, description, tags')
    .limit(body.limit ?? 50)

  if (body.item_ids?.length) {
    query = query.in('id', body.item_ids)
  } else {
    query = query.is('embedding', null)
  }

  const { data: items, error } = await query
  if (error) return new Response(JSON.stringify({ error: error.message }), { status: 500, headers: corsHeaders })

  const processed: string[] = []
  const failed: string[] = []

  for (const item of items ?? []) {
    try {
      const inputText = [
        item.title,
        item.description,
        (item.tags ?? []).join(' '),
      ].filter(Boolean).join('. ')

      const embeddingRes = await fetch('https://api.openai.com/v1/embeddings', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${Deno.env.get('OPENAI_API_KEY')}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: 'text-embedding-3-small',
          input: inputText,
        }),
      })

      const embData = await embeddingRes.json()
      const vector = embData.data?.[0]?.embedding

      if (!vector) throw new Error('No embedding returned')

      await supabase
        .from('items')
        .update({ embedding: vector, embedded_at: new Date().toISOString() })
        .eq('id', item.id)

      processed.push(item.id)
    } catch {
      failed.push(item.id)
    }
  }

  return new Response(
    JSON.stringify({ processed: processed.length, failed: failed.length }),
    { headers: corsHeaders }
  )
})
```

Pro tip: Process items in batches of 50 and add a delay between batches to stay under OpenAI's rate limits. The text-embedding-3-small model supports up to 8,191 tokens of input, more than enough for a title + description.
Expected result: The Edge Function processes items without embeddings and updates their vector columns. Calling it with item_ids regenerates specific item embeddings. Processed and failed counts are returned.
Build the user preference vector computation
Create the Edge Function that computes user preference vectors from interaction history. It averages the embeddings of positively-interacted items with recency weighting and stores the result in user_preferences.
```typescript
// supabase/functions/update-user-preferences/index.ts
import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'

const corsHeaders = { 'Content-Type': 'application/json' }

serve(async (req: Request) => {
  const body = await req.json().catch(() => ({}))
  // Process a specific user or all users with recent interactions
  const userId = body.user_id as string | undefined

  const supabase = createClient(
    Deno.env.get('SUPABASE_URL') ?? '',
    Deno.env.get('SUPABASE_SERVICE_ROLE_KEY') ?? ''
  )

  // Get users to update
  let userIds: string[] = []
  if (userId) {
    userIds = [userId]
  } else {
    const since = new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString()
    const { data } = await supabase
      .from('user_interactions')
      .select('user_id')
      .gte('created_at', since)
      .not('interaction_type', 'eq', 'skip')
    userIds = [...new Set((data ?? []).map(r => r.user_id))]
  }

  const results = { updated: 0, skipped: 0 }

  for (const uid of userIds) {
    const { data: interactions } = await supabase
      .from('user_interactions')
      .select('weight, created_at, items(embedding)')
      .eq('user_id', uid)
      .gt('weight', 0)
      .not('items.embedding', 'is', null)
      .order('created_at', { ascending: false })
      .limit(100)

    if (!interactions?.length) { results.skipped++; continue }

    // Weighted average with recency decay
    const now = Date.now()
    const VECTOR_DIM = 1536
    const sumVector = new Array(VECTOR_DIM).fill(0)
    let totalWeight = 0

    for (const interaction of interactions) {
      const embedding = (interaction.items as any)?.embedding as number[] | null
      if (!embedding) continue

      const ageMs = now - new Date(interaction.created_at).getTime()
      const ageDays = ageMs / (1000 * 60 * 60 * 24)
      // Exponential decay: interactions from 30 days ago get half the weight
      const recencyWeight = Math.exp(-ageDays * Math.log(2) / 30)
      const totalW = interaction.weight * recencyWeight

      for (let i = 0; i < VECTOR_DIM; i++) {
        sumVector[i] += embedding[i] * totalW
      }
      totalWeight += totalW
    }

    if (totalWeight === 0) { results.skipped++; continue }

    const preferenceVector = sumVector.map(v => v / totalWeight)

    await supabase.from('user_preferences').upsert({
      id: uid,
      preference_vector: preferenceVector,
      vector_updated_at: new Date().toISOString(),
      interaction_count: interactions.length,
    }, { onConflict: 'id' })

    results.updated++
  }

  return new Response(JSON.stringify(results), { headers: corsHeaders })
})
```

Expected result: The Edge Function computes weighted preference vectors for users with recent interactions and upserts them to user_preferences. Running it manually for a test user populates their preference vector.
Build the For You feed and related items UI
Create the main recommendations feed using the RPC functions built in step 1. Show a For You card grid with similarity scores, and add a related items section to each item detail page.
Build two pages:

1. For You feed at src/pages/ForYouFeed.tsx:
- Call the get_recommendations_for_user(user.id, 20) Supabase RPC function
- Render results as a responsive grid of Cards (2 cols mobile, 3 cols tablet, 4 cols desktop)
- Each Card shows: item image, title, category Badge, a 'Match' Badge showing the similarity score as a percentage (e.g. similarity score 0.89 = '89% Match')
- Below the match badge, show a reason label based on the item category and user's top interaction category (e.g. 'Because you liked: [category]')
- Add a Skeleton grid that shows while loading
- Add an 'Explore' Tab alongside 'For You' that shows random items from categories the user has not interacted with
- Log a recommendation_logs INSERT whenever an item Card is shown (shown_at = now, algorithm = 'collaborative')
- Log clicked = true when a Card is clicked

2. Related items section for item detail pages:
- After the main item content, add a 'Similar Items' section
- Call get_similar_items(item.id, 6) and render as a horizontal ScrollArea of Cards
- Each card shows image, title, and a 'Similarity: X%' muted text
- Show 'No similar items yet' if embeddings have not been generated for this item yet

Expected result: The For You feed shows personalized recommendations from the user's preference vector. The related items section shows content-based neighbors. Impression and click logs populate recommendation_logs.
Set up nightly regeneration and the recommendation dashboard
Configure the pg_cron schedule for nightly vector updates and build the recommendations analytics dashboard showing impression rates, click-through rates, and coverage metrics.
Build two things:

1. Prompt Lovable to create the pg_cron schedule:
Set up a pg_cron job named 'nightly-preference-update' that runs every night at 3am UTC. It should call the Supabase Edge Function update-user-preferences using the net.http_post function from the pg_net extension. Pass no body (process all users with recent interactions). Also set up a second cron job 'embed-new-items' that runs every hour and calls the generate-embeddings Edge Function to process any items with a NULL embedding column.

2. Analytics dashboard at src/pages/RecommendationAnalytics.tsx (admin only):
- Total recommendations shown today (count from recommendation_logs where shown_at >= today)
- Click-through rate: (clicked = true) / total as a percentage, shown as a large stat Card
- Coverage: percentage of the items table that has a non-null embedding
- User preference coverage: percentage of auth.users that have a row in user_preferences
- A Recharts LineChart showing daily CTR over the last 30 days
- A BarChart showing CTR by algorithm (content_based vs collaborative)
- Top 10 most recommended items (by count) as a DataTable with recommendation count and CTR columns

Pro tip: Ask Lovable to add a 'Regenerate All Embeddings' admin Button that calls generate-embeddings with no filter. Use it after updating your item text or switching to a different OpenAI embedding model.
Expected result: pg_cron jobs are scheduled. The analytics dashboard shows live recommendation metrics. Coverage stats expose items and users that still need embeddings or preference vectors.
Complete code
```typescript
import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'

const corsHeaders = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Headers': 'authorization, x-client-info, apikey, content-type',
  'Content-Type': 'application/json',
}

serve(async (req: Request) => {
  if (req.method === 'OPTIONS') {
    return new Response('ok', { headers: corsHeaders })
  }

  const supabase = createClient(
    Deno.env.get('SUPABASE_URL') ?? '',
    Deno.env.get('SUPABASE_SERVICE_ROLE_KEY') ?? ''
  )

  const body = await req.json().catch(() => ({}))

  let query = supabase
    .from('items')
    .select('id, title, description, tags')
    .limit(body.limit ?? 50)

  if (body.item_ids?.length) {
    query = query.in('id', body.item_ids)
  } else {
    query = query.is('embedding', null)
  }

  const { data: items, error } = await query
  if (error) {
    return new Response(JSON.stringify({ error: error.message }), {
      status: 500, headers: corsHeaders,
    })
  }

  const processed: string[] = []
  const failed: string[] = []

  for (const item of items ?? []) {
    try {
      const inputText = [
        item.title,
        item.description,
        (item.tags ?? []).join(' '),
      ].filter(Boolean).join('. ').slice(0, 8000)

      const embRes = await fetch('https://api.openai.com/v1/embeddings', {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${Deno.env.get('OPENAI_API_KEY')}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ model: 'text-embedding-3-small', input: inputText }),
      })

      const embData = await embRes.json()
      const vector: number[] = embData.data?.[0]?.embedding
      if (!vector) throw new Error(`No embedding for item ${item.id}`)

      const { error: uErr } = await supabase
        .from('items')
        .update({ embedding: vector, embedded_at: new Date().toISOString() })
        .eq('id', item.id)

      if (uErr) throw uErr
      processed.push(item.id)
    } catch {
      failed.push(item.id)
    }
  }

  return new Response(
    JSON.stringify({ processed: processed.length, failed: failed.length, failed_ids: failed }),
    { headers: corsHeaders }
  )
})
```

Customization ideas
Hybrid scoring with popularity boost
Blend the cosine similarity score with an item's overall popularity (view count, like count) using a weighted formula: final_score = 0.7 * similarity + 0.3 * normalized_popularity. This prevents the engine from only recommending niche items that happen to be semantically close but have no social proof.
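The blend itself is a one-liner; a sketch in TypeScript (hybridScore is an illustrative name, and the 0.7/0.3 split comes from the formula above):

```typescript
// Blend semantic similarity with popularity normalized to [0, 1].
// final_score = 0.7 * similarity + 0.3 * normalized_popularity
function hybridScore(similarity: number, popularity: number, maxPopularity: number): number {
  const normalizedPopularity = maxPopularity > 0 ? popularity / maxPopularity : 0
  return 0.7 * similarity + 0.3 * normalizedPopularity
}
```

Normalize popularity against the catalog maximum (or a high percentile, to blunt outliers) so both terms live on the same 0 to 1 scale.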
Category diversity injection
After computing the top 20 nearest neighbors for a user, apply a diversification step: ensure no more than 3 items from the same category appear in the first 10 results. Rotate in items from underrepresented categories to expose users to new content areas. Implement this as a JavaScript function on the results array after fetching.
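Under those assumptions (at most 3 per category in the first 10 results), one possible post-fetch pass looks like this; the diversify name and Rec shape are illustrative:

```typescript
interface Rec { id: string; category: string; score: number }

// Cap items per category within the head of the list; demote the
// overflow to the tail instead of dropping it.
function diversify(recs: Rec[], maxPerCategory = 3, windowSize = 10): Rec[] {
  const head: Rec[] = []
  const overflow: Rec[] = []
  const counts = new Map<string, number>()
  for (const rec of recs) {
    const n = counts.get(rec.category) ?? 0
    if (head.length < windowSize && n < maxPerCategory) {
      head.push(rec)
      counts.set(rec.category, n + 1)
    } else {
      overflow.push(rec)
    }
  }
  return [...head, ...overflow]
}
```

No items are discarded; over-represented categories simply slide below the fold, which keeps the similarity ranking intact within each category.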
Explicit feedback collection
Add thumbs up and thumbs down buttons to each recommendation card. Negative feedback (thumbs down) stores an interaction_type='dislike' with weight=-1.5, which pulls the preference vector away from similar items. Show a 'Not interested in [category]' option that adds a category exclusion to the user's preferences table.
Multi-modal embeddings with image content
Use OpenAI's text-embedding model on a combined text prompt that describes the item's image: call GPT-4o Vision to generate a description of the item image, then combine it with the title and description text before embedding. This makes the similarity search aware of visual content, not just text.
Recommendation explanations
Add an explanations column to recommendation_logs that stores the top 2–3 items from the user's interaction history that most contributed to this recommendation (the items with highest cosine similarity to the recommended item). Display these in a 'Because you liked: X, Y' tooltip on each card.
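Selecting those contributing items is a sort over the user's history; a sketch under the assumption that history item embeddings are already loaded (topContributors and HistoryItem are illustrative names):

```typescript
// Cosine similarity (higher = more similar), the inverse of pgvector's <=>.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

interface HistoryItem { title: string; embedding: number[] }

// Rank past items by similarity to the recommended item and keep the
// top N for a "Because you liked: X, Y" explanation.
function topContributors(recommended: number[], history: HistoryItem[], n = 2): string[] {
  return [...history]
    .sort((a, b) =>
      cosineSimilarity(recommended, b.embedding) - cosineSimilarity(recommended, a.embedding))
    .slice(0, n)
    .map(h => h.title)
}
```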
Common pitfalls
Pitfall: Creating an IVFFlat index before the table has data (IVFFlat trains its cluster lists from existing rows, so an index built on an empty table performs poorly)
How to avoid: Use HNSW index instead (as specified in step 1). HNSW builds incrementally and works well from the first insert. Switch to IVFFlat only if you have millions of vectors and need lower memory usage.
Pitfall: Storing embeddings as a JSONB array instead of a vector type
How to avoid: Use the vector(1536) type provided by the pgvector extension. Make sure the extension is enabled before creating the table. Verify with: SELECT * FROM pg_extension WHERE extname = 'vector'.
Pitfall: Including skip interactions in the preference vector
How to avoid: Filter out skip interactions when computing the preference vector. Handle negative feedback separately by computing a 'dislike vector' and subtracting it from the preference vector: final_vector = preference_vector - 0.5 * dislike_vector.
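The subtraction is one operation per dimension; a sketch with the 0.5 factor from the formula above (applyDislikes is an illustrative name):

```typescript
// final_vector = preference_vector - 0.5 * dislike_vector
function applyDislikes(preference: number[], dislike: number[], factor = 0.5): number[] {
  return preference.map((v, i) => v - factor * dislike[i])
}
```

Since cosine distance ignores magnitude, the result does not need renormalizing before it is stored.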
Pitfall: Recomputing all user preference vectors on every page load
How to avoid: Use the nightly batch approach. The pg_cron job updates preference vectors for users who have new interactions. The frontend reads from the pre-computed user_preferences table — a single indexed lookup.
Best practices
- Use text-embedding-3-small (1536 dimensions) rather than text-embedding-ada-002. It is cheaper per token and at least as accurate for recommendation tasks at the same dimensionality, and its dimensions API parameter lets you request shorter vectors if you want to cut storage and query time.
- Cap the interaction history used for preference vector computation at 100–200 items. Very old interactions are less relevant and including them dilutes the signal from recent behavior.
- Log every recommendation impression in recommendation_logs. This lets you measure CTR, detect distribution shift, and identify items that are over-recommended (always appear but never get clicked).
- Add an embedding freshness check: if an item's description or tags change significantly, clear its embedding column so the hourly cron job regenerates it. Add a trigger that sets embedding = NULL on UPDATE of title, description, or tags.
- Test recommendation quality manually by creating test users with known preferences and verifying the top recommendations make semantic sense. Automated metrics like CTR can miss qualitative failures.
- Set a minimum similarity threshold (e.g. 0.65) on recommendations. If the user's preference vector has no good matches above the threshold, fall back to popular items in their most-interacted category rather than showing poor matches.
- Add a new_user cold start path: for users with fewer than 5 interactions, show popular items in each category and ask them to rate 3–5 items to bootstrap their preference vector.
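The minimum-similarity fallback described above can be sketched as a small pure function (filterWithFallback and the Scored shape are illustrative names):

```typescript
interface Scored { id: string; similarity: number }

// Keep only matches above the threshold; signal that the caller should
// fall back to popular items when nothing survives.
function filterWithFallback(
  recs: Scored[],
  minSimilarity = 0.65
): { recs: Scored[]; useFallback: boolean } {
  const good = recs.filter(r => r.similarity >= minSimilarity)
  return { recs: good, useFallback: good.length === 0 }
}
```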
AI prompts to try
Copy these prompts to build this project faster.
I'm building a recommendations engine using pgvector in Supabase. I have a user_interactions table with user_id, item_id, weight (float), and created_at. I want to compute a user's preference vector by averaging the item embeddings weighted by interaction weight and recency decay. Write a TypeScript function that takes an array of interaction records (each with weight, created_at, and embedding: number[]) and returns a weighted average vector. Use exponential decay with a half-life of 30 days for recency weighting.
Add a 'Taste Profile' page at /profile/taste. Fetch the current user's preference_vector from user_preferences. Call an Edge Function that finds the 5 items with the highest cosine similarity to this vector and returns them as 'defining items'. Show these 5 items in a prominent Card grid with the label 'Your taste is defined by'. Also show a radar chart using Recharts with 6 category axes, each showing the proportion of the user's interactions in that category. Derive category preferences from user_interactions joined to items.
In Supabase, write a SQL function get_recommendations_for_user(p_user_id uuid, p_limit int, p_min_similarity float DEFAULT 0.65) that returns item id, title, category, image_url, and (1 - (embedding <=> pv.preference_vector)) as similarity_score from items cross join (select preference_vector from user_preferences where id = p_user_id) as pv where (1 - (embedding <=> pv.preference_vector)) >= p_min_similarity and items.id not in (select item_id from user_interactions where user_id = p_user_id) order by similarity_score desc limit p_limit. Handle the case where the user has no preference vector by returning popular items.
Frequently asked questions
Do I need to know anything about machine learning to build this?
No. The Lovable prompts and Edge Function code provided handle all the vector math. pgvector's <=> operator does the cosine similarity computation inside PostgreSQL — you write SQL, not ML code. OpenAI generates the embeddings via a single API call per item. The concepts (averaging vectors, cosine distance) can be treated as black boxes for building purposes.
How much does it cost to generate embeddings for all my items?
OpenAI text-embedding-3-small costs $0.02 per million tokens. A typical item title + description is about 100 tokens. Generating embeddings for 10,000 items costs approximately $0.02 — essentially free. Regenerating them monthly for 10,000 items adds $0.24/year. Monitor your OpenAI usage dashboard to confirm actual costs for your content.
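The arithmetic can be sanity-checked with a tiny estimator (an illustrative helper using the $0.02 per million tokens price quoted above):

```typescript
// Rough embedding cost estimate for text-embedding-3-small.
function embeddingCostUsd(
  itemCount: number,
  avgTokensPerItem: number,
  pricePerMillionTokens = 0.02
): number {
  return (itemCount * avgTokensPerItem / 1_000_000) * pricePerMillionTokens
}
```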
What happens when a new user has no interaction history?
New users have no preference vector. The get_recommendations_for_user function returns empty results. Handle this cold-start problem by showing an onboarding screen where users select 3–5 categories or rate 5 sample items. Insert those ratings as 'onboarding' user_interactions with weight 0.5, then call the update-user-preferences Edge Function immediately to bootstrap the preference vector.
Can I use a different embedding model instead of OpenAI?
Yes. Any model that returns a fixed-length float array works. If you switch models, change the vector(1536) column to match the new dimension (e.g. vector(768) for smaller models). You must regenerate all item embeddings with the new model — mixing vectors from different models produces meaningless similarity scores. Set all embedding columns to NULL and run the embedding Edge Function again.
How many items can pgvector handle efficiently?
The HNSW index performs well up to millions of vectors on a reasonably sized Postgres instance. Supabase Pro (8GB RAM) handles approximately 500,000–1,000,000 1536-dimension vectors with sub-100ms query times. For larger catalogs, consider reducing vector dimensions using PCA, or switching to a dedicated vector database like Qdrant.
Can RapidDev help build a more sophisticated recommendation system?
Yes. RapidDev builds production recommendation engines including real-time preference updates, multi-objective ranking (balancing relevance, diversity, and freshness), and A/B testing frameworks for ranking algorithms. Reach out if you need recommendation quality that goes beyond the cosine similarity baseline.
How do I prevent the same items from appearing every time in the For You feed?
Exclude items the user has already interacted with in the get_recommendations_for_user query using a subquery: AND items.id NOT IN (SELECT item_id FROM user_interactions WHERE user_id = p_user_id). For the feed UI, track which items have been shown in this session using React state and filter them out before rendering. Add a 'Load more' button that fetches the next page of recommendations.
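The session-level filter is a few lines of TypeScript (excludeShown is an illustrative name; the shown-ID set would typically live in React state or a ref):

```typescript
// Drop recommendations already shown this session before rendering.
function excludeShown<T extends { id: string }>(recs: T[], shownIds: Set<string>): T[] {
  return recs.filter(rec => !shownIds.has(rec.id))
}
```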
How long does it take for new user interactions to affect recommendations?
With the nightly batch approach, new interactions take up to 24 hours to affect recommendations. For a more responsive system, call the update-user-preferences Edge Function via a Supabase Database Trigger whenever a new interaction is inserted. This updates the preference vector in near real-time at the cost of more Edge Function invocations. Use a queue pattern to batch rapid interactions.
Talk to an Expert
Our team has built 600+ apps. Get personalized help with your project.
Book a free consultation