Build a collaborative filtering recommendations engine in Replit using Express and PostgreSQL in 2-4 hours. You'll track user interactions, compute item similarity scores with a nightly Scheduled Deployment, and serve personalized recommendations via a fast cache layer — a lightweight Netflix-style system without any ML infrastructure.
What you're building
A recommendations engine adds personalization to any app with a catalog — a product store, a content library, a job board, or a recipe collection. Instead of showing the same items to every user, the engine learns from behavior: users who interacted with item A also tend to interact with item B, so when a new user engages with A, suggest B. This is collaborative filtering — the same technique behind Amazon's 'Customers also bought' and Netflix's recommendation rows.
Replit Agent builds the full backend with five PostgreSQL tables: items, users, interactions, similarity_scores, and a recommendation_cache. The architecture separates concerns clearly: the interactions API is fast (just an insert), the similarity calculation is slow but runs offline as a Scheduled Deployment, and recommendation lookups are instant because they hit the pre-built cache. This pattern handles thousands of items without specialized ML infrastructure — just PostgreSQL functions.
The most technically interesting piece is the cosine similarity formula implemented as a PostgreSQL function. For two items A and B, similarity = SUM(wa * wb) / (SQRT(SUM(wa^2)) * SQRT(SUM(wb^2))), where w is each user's interaction weight for that item. This is O(N^2) across items — expensive to compute live, but cheap to read from the pre-computed similarity_scores table. The nightly batch job recalculates only the pairs that changed based on new interactions.
Final result
A recommendations engine API with interaction tracking, nightly similarity recalculation, and per-user recommendation cache — deployable as a standalone microservice or embedded in your existing Replit app.
Tech stack
Prerequisites
- A Replit account (Free tier is sufficient for under 10K items)
- An existing app or catalog you want to add recommendations to (products, articles, videos, etc.)
- Basic understanding of what collaborative filtering means (users who liked X also liked Y)
- No external API keys needed
Build steps
Scaffold the project with Replit Agent
Create a new Replit App and paste this prompt. Agent builds the complete recommendations backend with all five tables and core API routes.
1// Build a recommendations engine with Express and PostgreSQL using Drizzle ORM.2//3// Tables:4// 1. items: id serial primary key, name text not null, category text,5// tags text[] (PostgreSQL array), metadata jsonb, created_at timestamp default now()6// 2. users: id serial primary key, user_id text not null unique,7// preferences jsonb, created_at timestamp default now()8// 3. interactions: id serial primary key, user_id integer references users not null,9// item_id integer references items not null, type text not null10// (enum: view/click/purchase/rating/wishlist), value numeric (1-5 for ratings),11// weight numeric default 1.0, created_at timestamp default now()12// 4. similarity_scores: id serial primary key, item_a_id integer references items not null,13// item_b_id integer references items not null, score numeric not null (0 to 1),14// algorithm text default 'cosine', calculated_at timestamp default now(),15// unique(item_a_id, item_b_id, algorithm)16// 5. recommendation_cache: id serial primary key, user_id integer references users not null unique,17// recommendations jsonb not null (array of {item_id, score, reason}),18// calculated_at timestamp default now()19//20// Routes:21// POST /api/interactions (record interaction — no auth required, just user_id in body)22// GET /api/recommendations/:userId (get personalized recommendations, uses cache)23// GET /api/items/:id/similar (get similar items from similarity_scores)24// GET /api/popular (globally popular items fallback)25// POST /api/admin/recalculate (trigger similarity matrix rebuild — requires ADMIN_KEY header)26//27// Use Replit Auth for admin routes. Bind to 0.0.0.0:3000.Pro tip: Seed the items table with at least 20 test items before testing recommendations. Without a catalog, there's nothing to recommend. Ask Agent: 'Insert 20 sample items into the items table with varied categories and tags.'
Expected result: Running Express app with all five tables. GET /api/popular returns an empty array until interactions are recorded.
Build the interaction recording API
The interactions endpoint is called every time a user does something with an item. Different interaction types have different weights — a purchase signals much stronger preference than a view.
1const express = require('express');2const { db } = require('../db');3const { users, interactions } = require('../schema');4const { eq } = require('drizzle-orm');5const { withDbRetry } = require('../lib/retryDb');67const router = express.Router();89// Interaction weights — higher = stronger signal10const WEIGHTS = {11 view: 0.5,12 click: 1.0,13 wishlist: 1.5,14 rating: 2.0,15 purchase: 3.0,16};1718router.post('/api/interactions', express.json(), async (req, res) => {19 const { userId, itemId, type, value } = req.body;2021 if (!userId || !itemId || !type) {22 return res.status(400).json({ error: 'userId, itemId, and type are required' });23 }24 if (!WEIGHTS[type]) {25 return res.status(400).json({ error: `Invalid type. Must be: ${Object.keys(WEIGHTS).join(', ')}` });26 }2728 // Upsert the user record (create if first interaction)29 await withDbRetry(() =>30 db.insert(users)31 .values({ userId: String(userId) })32 .onConflictDoNothing()33 );3435 // Get internal user ID36 const [user] = await db.select().from(users)37 .where(eq(users.userId, String(userId))).limit(1);3839 const weight = WEIGHTS[type];4041 await withDbRetry(() =>42 db.insert(interactions).values({43 userId: user.id,44 itemId: parseInt(itemId),45 type,46 value: value ? parseFloat(value) : null,47 weight,48 })49 );5051 return res.json({ recorded: true });52});5354module.exports = router;Pro tip: Call POST /api/interactions from your main app whenever a user views, clicks, or purchases an item. Send it as a fire-and-forget request so tracking never slows down the user experience. In the browser: fetch('/api/interactions', {method:'POST', ...}).catch(() => {})
Expected result: POST /api/interactions with {userId: 'u1', itemId: 5, type: 'view'} returns {recorded: true} and inserts into the interactions table with weight 0.5.
Build the similarity calculation batch job
The similarity matrix is pre-computed nightly by a Scheduled Deployment. This PostgreSQL function calculates cosine similarity between all item pairs based on their interaction vectors.
1// server/jobs/calculateSimilarity.js — run as a Scheduled Deployment nightly2const { db } = require('../db');3const { similarityScores } = require('../schema');4const { sql } = require('drizzle-orm');56async function calculateSimilarityMatrix() {7 console.log('[similarity] Starting similarity matrix calculation...');89 // Calculate cosine similarity between all item pairs10 // that share at least 3 common users (reduces noise)11 await db.execute(sql`12 INSERT INTO similarity_scores (item_a_id, item_b_id, score, algorithm, calculated_at)13 SELECT14 a.item_id AS item_a_id,15 b.item_id AS item_b_id,16 -- Cosine similarity: SUM(wa*wb) / (SQRT(SUM(wa^2)) * SQRT(SUM(wb^2)))17 SUM(a.weight * b.weight) /18 (SQRT(SUM(a.weight * a.weight)) * SQRT(SUM(b.weight * b.weight))) AS score,19 'cosine' AS algorithm,20 NOW() AS calculated_at21 FROM interactions a22 JOIN interactions b23 ON a.user_id = b.user_id24 AND a.item_id < b.item_id -- only compute each pair once25 GROUP BY a.item_id, b.item_id26 HAVING COUNT(DISTINCT a.user_id) >= 2 -- at least 2 shared users27 AND SUM(a.weight * a.weight) > 028 AND SUM(b.weight * b.weight) > 029 ON CONFLICT (item_a_id, item_b_id, algorithm)30 DO UPDATE SET31 score = EXCLUDED.score,32 calculated_at = EXCLUDED.calculated_at33 `);3435 // Update recommendation cache for all active users36 const usersResult = await db.execute(sql`SELECT id, user_id FROM users LIMIT 1000`);37 for (const user of usersResult.rows) {38 await buildUserRecommendations(user.id, user.user_id);39 }4041 console.log('[similarity] Done.');42}4344async function buildUserRecommendations(internalUserId, externalUserId) {45 // Get items the user has already interacted with46 const userItems = await db.execute({47 sql: 'SELECT DISTINCT item_id FROM interactions WHERE user_id = $1',48 params: [internalUserId],49 });50 const seen = new Set(userItems.rows.map(r => r.item_id));5152 if (seen.size === 0) return; // Skip cold-start users5354 // Find similar items the user hasn't seen yet55 const recs = await db.execute({56 sql: `57 SELECT58 CASE WHEN ss.item_a_id = ANY($1::int[]) THEN ss.item_b_id ELSE ss.item_a_id END AS rec_item_id,59 MAX(ss.score) AS score,60 MIN(CASE WHEN ss.item_a_id = ANY($1::int[]) THEN ss.item_a_id ELSE ss.item_b_id END) AS because_item_id61 FROM similarity_scores ss62 WHERE (ss.item_a_id = ANY($1::int[]) OR ss.item_b_id = ANY($1::int[]))63 AND NOT (ss.item_a_id = ANY($1::int[]) AND ss.item_b_id = ANY($1::int[]))64 GROUP BY rec_item_id65 ORDER BY score DESC66 LIMIT 2067 `,68 params: [Array.from(seen)],69 });7071 const recommendations = recs.rows.map(r => ({72 item_id: r.rec_item_id,73 score: parseFloat(r.score),74 because_item_id: r.because_item_id,75 }));7677 await db.execute({78 sql: `79 INSERT INTO recommendation_cache (user_id, recommendations, calculated_at)80 VALUES ($1, $2, NOW())81 ON CONFLICT (user_id) DO UPDATE SET82 recommendations = EXCLUDED.recommendations,83 calculated_at = EXCLUDED.calculated_at84 `,85 params: [internalUserId, JSON.stringify(recommendations)],86 });87}8889calculateSimilarityMatrix()90 .then(() => process.exit(0))91 .catch(err => { console.error('[similarity] Error:', err); process.exit(1); });Pro tip: In Replit's Publish pane, create a Scheduled Deployment for this job set to run at 2:00 AM daily. Set the entrypoint to server/jobs/calculateSimilarity.js. The first run will be slow if you have many items — subsequent runs are faster because of the ON CONFLICT DO UPDATE upsert.
Expected result: After the job runs, the similarity_scores table fills with cosine similarity values between item pairs. recommendation_cache shows pre-ranked recommendations per user.
Build the recommendations retrieval API
The recommendation endpoint serves from the pre-built cache for fast response. For cold-start users with no interactions, it falls back to the globally popular items.
1const express = require('express');2const { db } = require('../db');3const { users, recommendationCache, items, interactions } = require('../schema');4const { eq, sql, desc } = require('drizzle-orm');56const router = express.Router();78router.get('/api/recommendations/:userId', async (req, res) => {9 const { userId } = req.params;1011 // Find internal user record12 const [user] = await db.select().from(users)13 .where(eq(users.userId, userId)).limit(1);1415 if (!user) {16 // New user — return popular items (cold start)17 return servePopular(res);18 }1920 // Check cache freshness (accept if calculated in last 24 hours)21 const [cache] = await db.select().from(recommendationCache)22 .where(eq(recommendationCache.userId, user.id)).limit(1);2324 if (cache && new Date() - new Date(cache.calculatedAt) < 24 * 60 * 60 * 1000) {25 return res.json({26 userId,27 source: 'cache',28 recommendations: cache.recommendations,29 });30 }3132 // Cache stale or missing — return popular as fallback until next nightly run33 return servePopular(res);34});3536async function servePopular(res) {37 const popular = await db.execute(sql`38 SELECT item_id, COUNT(*) AS interaction_count, COUNT(DISTINCT user_id) AS user_count39 FROM interactions40 GROUP BY item_id41 ORDER BY user_count DESC, interaction_count DESC42 LIMIT 1043 `);4445 return res.json({46 source: 'popular',47 recommendations: popular.rows.map(r => ({48 item_id: r.item_id,49 score: null,50 reason: 'Popular with other users',51 })),52 });53}5455router.get('/api/items/:id/similar', async (req, res) => {56 const itemId = parseInt(req.params.id);57 const similar = await db.execute({58 sql: `59 SELECT60 CASE WHEN item_a_id = $1 THEN item_b_id ELSE item_a_id END AS similar_item_id,61 score62 FROM similarity_scores63 WHERE item_a_id = $1 OR item_b_id = $164 ORDER BY score DESC65 LIMIT 866 `,67 params: [itemId],68 });69 return res.json({ itemId, similar: similar.rows });70});7172module.exports = router;Pro tip: Enrich the recommendation response by joining item IDs against the items table before returning. The cache stores just IDs and scores for compactness — expand them in the route handler to return full item objects with name, image_url, and category.
Expected result: GET /api/recommendations/u1 returns personalized recommendations from cache. A new user gets popular items instead. GET /api/items/5/similar returns the 8 most similar items.
Build the React recommendations widget
Ask Agent to build the frontend recommendations widget and admin dashboard that shows the similarity matrix heatmap.
1// Ask Agent to build the React frontend with this prompt:2// Build a React recommendations UI with two components:3//4// 1. RecommendationsWidget component:5// - Accept userId prop6// - Fetch GET /api/recommendations/:userId on mount7// - Display a horizontal scrollable row of item cards8// - Each card shows: item image (from metadata.image_url), item name, category badge,9// match score as percentage (score * 100, rounded)10// - If recommendations have because_item_id, show a tooltip on hover:11// 'Because you interacted with [item name]'12// - Show 'Popular picks' heading if source='popular' (cold start state)13//14// 2. Admin Dashboard page at /admin/recommendations:15// - Require ADMIN_KEY header (store in localStorage)16// - Fetch GET /api/admin/recalculate stats: count of items, interactions, similarity pairs17// - Show three stat cards: Total Items, Total Interactions, Similarity Pairs Computed18// - Show a 'Recalculate Now' button that calls POST /api/admin/recalculate19// - Show a heatmap-style similarity matrix: sample 20 random items,20// show a 20x20 grid where each cell color = similarity score (white=0, blue=1)21// - Show recent calculation timestamp22//23// Also add a GET /api/items route that returns items with their recommendation stats24// (interaction_count, unique_users from joining interactions).Expected result: The RecommendationsWidget renders a horizontal card row with match scores. The admin dashboard shows system stats and a colored similarity matrix grid.
Complete code
1const express = require('express');2const { db } = require('../db');3const { users, interactions, items } = require('../schema');4const { eq } = require('drizzle-orm');5const { withDbRetry } = require('../lib/retryDb');67const router = express.Router();89const INTERACTION_WEIGHTS = {10 view: 0.5,11 click: 1.0,12 wishlist: 1.5,13 rating: 2.0,14 purchase: 3.0,15};1617router.post('/api/interactions', express.json(), async (req, res) => {18 const { userId, itemId, type, value } = req.body;19 if (!userId || !itemId || !type) {20 return res.status(400).json({ error: 'userId, itemId, and type are required' });21 }22 if (!INTERACTION_WEIGHTS[type]) {23 return res.status(400).json({24 error: `Invalid type. Valid: ${Object.keys(INTERACTION_WEIGHTS).join(', ')}`,25 });26 }27 // Verify item exists28 const [item] = await db.select({ id: items.id })29 .from(items).where(eq(items.id, parseInt(itemId))).limit(1);30 if (!item) return res.status(404).json({ error: 'Item not found' });3132 // Upsert user33 await withDbRetry(() =>34 db.insert(users).values({ userId: String(userId) }).onConflictDoNothing()35 );36 const [user] = await db.select().from(users)37 .where(eq(users.userId, String(userId))).limit(1);3839 await withDbRetry(() =>40 db.insert(interactions).values({41 userId: user.id,42 itemId: item.id,43 type,44 value: value != null ? parseFloat(value) : null,45 weight: INTERACTION_WEIGHTS[type],46 })47 );4849 return res.json({ recorded: true });50});5152router.get('/api/popular', async (req, res) => {53 const result = await db.execute({54 sql: `55 SELECT i.id, i.name, i.category, i.tags, i.metadata,56 COUNT(int.id) AS interaction_count,57 COUNT(DISTINCT int.user_id) AS user_count58 FROM items i59 LEFT JOIN interactions int ON i.id = int.item_id60 GROUP BY i.id61module.exports = router;Customization ideas
Content-based filtering fallback
For items with very few interactions, use content-based similarity instead: compare item tags arrays using the Jaccard index (intersection / union). This ensures new items get recommended before they accumulate behavioral data.
Recency decay on interaction weights
Multiply interaction weights by a recency factor: weight * EXP(-days_since_interaction / 30). Interactions from 30 days ago count half as much as today's. This keeps recommendations fresh as user tastes evolve.
Recommendation explanation API
Build GET /api/recommendations/:userId/explain that returns the top 3 interactions that drove each recommendation — 'Because you purchased Item A (score: 0.82) and viewed Item B (score: 0.71)'. Shows users why they're seeing suggestions.
Category diversity enforcement
Add a diversity step to the recommendation builder: after ranking by score, ensure at most 40% of recommendations are from the same category. This prevents a user who views one product category from getting recommendations exclusively from that category.
Common pitfalls
Pitfall: Calculating cosine similarity in real time on every recommendation request
How to avoid: Always pre-compute and store similarity_scores via the nightly Scheduled Deployment. Recommendation API reads from similarity_scores and recommendation_cache — both are indexed lookups.
Pitfall: Not handling the cold-start problem for new users
How to avoid: Always check if the user has interactions before querying the cache. Return popular items (GET /api/popular) as the fallback for users with fewer than 3 interactions.
Pitfall: Recommending items the user already interacted with
How to avoid: The recommendation builder queries exclude items already in the user's interaction history using WHERE NOT (item_id = ANY(seen_item_ids)).
Pitfall: Running the similarity job without a PostgreSQL retry wrapper
How to avoid: Wrap the first DB call in withDbRetry() to handle the cold-start connection failure. The retry with 250ms delay is enough for the database to wake up.
Best practices
- Pre-compute similarity_scores nightly via Scheduled Deployment — never calculate cosine similarity on the request path.
- Use the recommendation_cache table for instant recommendation serving — check if cache is less than 24 hours old before falling back to real-time computation.
- Weight different interaction types differently: purchases (3.0) >> ratings (2.0) >> wishlist (1.5) >> click (1.0) >> view (0.5). Purchase signals are 6x more valuable than views.
- Keep the item catalog under 10K items with Replit's free PostgreSQL. The O(N^2) similarity calculation becomes slow above this — move to a dedicated vector database for larger catalogs.
- Use Drizzle Studio (Database tab) to inspect similarity_scores after the first batch job run — verify that similar items are being grouped logically before wiring up the frontend.
- Add a minimum shared-users threshold (at least 2) to the similarity calculation to filter out coincidental single-user overlap that would produce artificially high similarity scores.
- Deploy on Autoscale for the main API (recommendations serve from cache = fast). The nightly batch job runs as a separate Scheduled Deployment on Reserved VM for reliability.
AI prompts to try
Copy these prompts to build this project faster.
I'm building a collaborative filtering recommendations engine with Express and PostgreSQL. I have an interactions table (user_id integer, item_id integer, weight numeric) and a similarity_scores table (item_a_id, item_b_id, score). Help me write a PostgreSQL query that calculates the cosine similarity between all item pairs using the interaction weight matrix. The query should: only consider pairs with at least 2 shared users, use the formula SUM(wa*wb)/(SQRT(SUM(wa^2))*SQRT(SUM(wb^2))), and insert results into similarity_scores using INSERT ... ON CONFLICT DO UPDATE.
Add real-time interaction analytics to the recommendations engine. Build a GET /api/analytics/interactions route that returns: total interactions by type (bar chart data), top 10 most-interacted items with interaction counts, most active users by interaction count, interactions over time (daily counts for last 30 days as line chart data), and coverage metric (% of items that have at least 1 interaction). Display this on a React analytics page with four recharts visualizations: bar chart for type breakdown, bar chart for top items, line chart for daily trend, and a stat card showing catalog coverage percentage.
Frequently asked questions
How many items can this handle before it gets too slow?
With Replit's free PostgreSQL and the pre-computed similarity approach, 10K items works well. The nightly similarity job computes up to 50M pairs (10K^2 / 2) — this takes 10-30 minutes but runs offline. The recommendation API always reads from the cache so users never see slowdowns.
What's the cold-start problem and how is it handled?
New users have no interaction history, so there's nothing to base personalized recommendations on. The engine handles this by returning globally popular items (sorted by unique_users from the interactions table) for users with zero or very few interactions. Once they have 3+ interactions, the nightly job will build their personalized cache.
Can I use this to recommend content (articles, videos) instead of products?
Yes — the engine is content-agnostic. The items table has a metadata JSONB column for any attributes. Track 'view' interactions when users read articles, 'rating' when they rate videos. The collaborative filtering algorithm works identically regardless of what type of item is being recommended.
What Replit plan do I need?
Free tier for the main API (Autoscale). The nightly Scheduled Deployment also runs on Free tier. For large catalogs (5K+ items), the similarity calculation benefits from Reserved VM to avoid the 30-second cold start of the job triggering at 2 AM.
How often should I run the similarity recalculation?
Nightly is a good default for most apps. If your catalog changes frequently (new items added hourly) or your users are highly active, run it every 6 hours. For very small or slowly-changing catalogs, weekly is sufficient.
Should I deploy on Autoscale or Reserved VM?
Autoscale for the main recommendation API (cache reads are fast and handle cold starts). Reserved VM for the nightly batch job — you want it to start reliably at 2 AM without cold-start delays causing timeouts on the similarity computation.
Can RapidDev help build a recommendations engine for my platform?
Yes. RapidDev has built recommendation systems for e-commerce and content platforms across 600+ client projects. We can extend this engine with content-based filtering, A/B testing of recommendation strategies, and real-time similarity updates. Free consultation available.
How is this different from just sorting by 'most popular'?
Popularity sorting shows the same items to everyone — the top sellers dominate. Collaborative filtering personalizes results based on what similar users liked. A user who buys niche products gets recommendations tailored to their taste, not just bestsellers. Personalization increases click-through rates by 2-5x over simple popularity sorting in typical e-commerce apps.
Talk to an Expert
Our team has built 600+ apps. Get personalized help with your project.
Book a free consultation