RapidDev - Software Development Agency

How to Build a Recommendations Engine with Replit

Build a collaborative filtering recommendations engine in Replit using Express and PostgreSQL in 2-4 hours. You'll track user interactions, compute item similarity scores with a nightly Scheduled Deployment, and serve personalized recommendations via a fast cache layer — a lightweight Netflix-style system without any ML infrastructure.

What you'll build

  • Interaction tracking API that records views, clicks, purchases, ratings, and wishlist events with configurable weights
  • Item-based collaborative filtering using cosine similarity calculated in SQL
  • Pre-computed similarity_scores table updated nightly by a Scheduled Deployment batch job
  • Recommendation cache per user that stores pre-ranked suggestions for instant retrieval
  • Cold start fallback that returns popularity-based recommendations for new users with no interactions
  • React recommendations widget with match score display and 'Because you liked X' explanation tooltip
Advanced · 15 min read · 2-4 hours · Replit Free · April 2026 · RapidDev Engineering Team

What you're building

A recommendations engine adds personalization to any app with a catalog — a product store, a content library, a job board, or a recipe collection. Instead of showing the same items to every user, the engine learns from behavior: users who interacted with item A also tend to interact with item B, so when a new user engages with A, suggest B. This is collaborative filtering — the same technique behind Amazon's 'Customers also bought' and Netflix's recommendation rows.

Replit Agent builds the full backend with five PostgreSQL tables: items, users, interactions, similarity_scores, and a recommendation_cache. The architecture separates concerns clearly: the interactions API is fast (just an insert), the similarity calculation is slow but runs offline as a Scheduled Deployment, and recommendation lookups are instant because they hit the pre-built cache. This pattern handles thousands of items without specialized ML infrastructure — just PostgreSQL functions.

The most technically interesting piece is the cosine similarity formula implemented directly in SQL. For two items A and B, similarity = SUM(wa * wb) / (SQRT(SUM(wa^2)) * SQRT(SUM(wb^2))), where w is each user's interaction weight for that item. This is O(N^2) across items — expensive to compute live, but cheap to read from the pre-computed similarity_scores table. The nightly batch job recomputes the matrix and upserts the results via ON CONFLICT DO UPDATE, so only changed pairs are rewritten.
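To build intuition, the same formula can be sketched in plain JavaScript (an illustrative helper, not part of the generated app; the SQL in step 3 is what actually runs). Each item is represented as a map of user ID to interaction weight, and users absent from a map contribute weight 0:

```javascript
// Cosine similarity over two items' per-user weight maps,
// mirroring SUM(wa*wb) / (SQRT(SUM(wa^2)) * SQRT(SUM(wb^2))).
function cosineSimilarity(weightsA, weightsB) {
  let dot = 0, normA = 0, normB = 0;
  for (const w of Object.values(weightsA)) normA += w * w;
  for (const w of Object.values(weightsB)) normB += w * w;
  for (const [userId, wa] of Object.entries(weightsA)) {
    if (userId in weightsB) dot += wa * weightsB[userId];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Two items with one shared user (u1 purchased both):
const itemA = { u1: 3.0, u2: 0.5 }; // u1 purchased, u2 viewed
const itemB = { u1: 3.0, u3: 1.0 }; // u1 purchased, u3 clicked
console.log(cosineSimilarity(itemA, itemB)); // ≈ 0.936 — strong overlap on u1
```

Items whose heavy interactions come from the same users score close to 1; items with no shared users score 0.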

Final result

A recommendations engine API with interaction tracking, nightly similarity recalculation, and per-user recommendation cache — deployable as a standalone microservice or embedded in your existing Replit app.

Tech stack

  • Replit — IDE & Hosting
  • Express — Backend Framework
  • PostgreSQL — Database
  • Drizzle ORM — Database ORM
  • Replit Auth — Auth

Prerequisites

  • A Replit account (Free tier is sufficient for under 10K items)
  • An existing app or catalog you want to add recommendations to (products, articles, videos, etc.)
  • Basic understanding of what collaborative filtering means (users who liked X also liked Y)
  • No external API keys needed

Build steps

1

Scaffold the project with Replit Agent

Create a new Replit App and paste this prompt. Agent builds the complete recommendations backend with all five tables and core API routes.

prompt.txt
// Build a recommendations engine with Express and PostgreSQL using Drizzle ORM.
//
// Tables:
// 1. items: id serial primary key, name text not null, category text,
//    tags text[] (PostgreSQL array), metadata jsonb, created_at timestamp default now()
// 2. users: id serial primary key, user_id text not null unique,
//    preferences jsonb, created_at timestamp default now()
// 3. interactions: id serial primary key, user_id integer references users not null,
//    item_id integer references items not null, type text not null
//    (enum: view/click/purchase/rating/wishlist), value numeric (1-5 for ratings),
//    weight numeric default 1.0, created_at timestamp default now()
// 4. similarity_scores: id serial primary key, item_a_id integer references items not null,
//    item_b_id integer references items not null, score numeric not null (0 to 1),
//    algorithm text default 'cosine', calculated_at timestamp default now(),
//    unique(item_a_id, item_b_id, algorithm)
// 5. recommendation_cache: id serial primary key, user_id integer references users not null unique,
//    recommendations jsonb not null (array of {item_id, score, reason}),
//    calculated_at timestamp default now()
//
// Routes:
// POST /api/interactions (record interaction — no auth required, just user_id in body)
// GET /api/recommendations/:userId (get personalized recommendations, uses cache)
// GET /api/items/:id/similar (get similar items from similarity_scores)
// GET /api/popular (globally popular items fallback)
// POST /api/admin/recalculate (trigger similarity matrix rebuild — requires ADMIN_KEY header)
//
// Use Replit Auth for admin routes. Bind to 0.0.0.0:3000.

Pro tip: Seed the items table with at least 20 test items before testing recommendations. Without a catalog, there's nothing to recommend. Ask Agent: 'Insert 20 sample items into the items table with varied categories and tags.'

Expected result: Running Express app with all five tables. GET /api/popular returns an empty array until interactions are recorded.

2

Build the interaction recording API

The interactions endpoint is called every time a user does something with an item. Different interaction types have different weights — a purchase signals much stronger preference than a view.

server/routes/interactions.js
const express = require('express');
const { db } = require('../db');
const { users, interactions } = require('../schema');
const { eq } = require('drizzle-orm');
const { withDbRetry } = require('../lib/retryDb');

const router = express.Router();

// Interaction weights — higher = stronger signal
const WEIGHTS = {
  view: 0.5,
  click: 1.0,
  wishlist: 1.5,
  rating: 2.0,
  purchase: 3.0,
};

router.post('/api/interactions', express.json(), async (req, res) => {
  const { userId, itemId, type, value } = req.body;

  if (!userId || !itemId || !type) {
    return res.status(400).json({ error: 'userId, itemId, and type are required' });
  }
  if (!WEIGHTS[type]) {
    return res.status(400).json({ error: `Invalid type. Must be: ${Object.keys(WEIGHTS).join(', ')}` });
  }

  // Upsert the user record (create if first interaction)
  await withDbRetry(() =>
    db.insert(users)
      .values({ userId: String(userId) })
      .onConflictDoNothing()
  );

  // Get internal user ID
  const [user] = await db.select().from(users)
    .where(eq(users.userId, String(userId))).limit(1);

  const weight = WEIGHTS[type];

  await withDbRetry(() =>
    db.insert(interactions).values({
      userId: user.id,
      itemId: parseInt(itemId, 10),
      type,
      value: value ? parseFloat(value) : null,
      weight,
    })
  );

  return res.json({ recorded: true });
});

module.exports = router;

Pro tip: Call POST /api/interactions from your main app whenever a user views, clicks, or purchases an item. Send it as a fire-and-forget request so tracking never slows down the user experience. In the browser: fetch('/api/interactions', {method:'POST', ...}).catch(() => {})

Expected result: POST /api/interactions with {userId: 'u1', itemId: 5, type: 'view'} returns {recorded: true} and inserts into the interactions table with weight 0.5.
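The fire-and-forget pattern from the pro tip can be wrapped in a small client helper. This is a hypothetical snippet for your own frontend, not part of the generated backend; `fetchImpl` is injectable only to make the helper easy to test:

```javascript
// Client-side tracking helper: a failed request never surfaces to the user.
async function trackInteraction(baseUrl, { userId, itemId, type, value }, fetchImpl = fetch) {
  try {
    await fetchImpl(`${baseUrl}/api/interactions`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ userId, itemId, type, value }),
    });
    return true;
  } catch {
    return false; // swallow errors: tracking must never break the page
  }
}
```

Call it without `await` from click handlers so rendering never waits on the tracking round-trip.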

3

Build the similarity calculation batch job

The similarity matrix is pre-computed nightly by a Scheduled Deployment. The batch job runs a single SQL statement that calculates cosine similarity between all item pairs based on their interaction vectors.

server/jobs/calculateSimilarity.js
// server/jobs/calculateSimilarity.js — run as a Scheduled Deployment nightly
const { db } = require('../db');
const { sql } = require('drizzle-orm');

async function calculateSimilarityMatrix() {
  console.log('[similarity] Starting similarity matrix calculation...');

  // Calculate cosine similarity between all item pairs
  // that share at least 2 common users (reduces noise)
  await db.execute(sql`
    INSERT INTO similarity_scores (item_a_id, item_b_id, score, algorithm, calculated_at)
    SELECT
      a.item_id AS item_a_id,
      b.item_id AS item_b_id,
      -- Cosine similarity: SUM(wa*wb) / (SQRT(SUM(wa^2)) * SQRT(SUM(wb^2)))
      SUM(a.weight * b.weight) /
        (SQRT(SUM(a.weight * a.weight)) * SQRT(SUM(b.weight * b.weight))) AS score,
      'cosine' AS algorithm,
      NOW() AS calculated_at
    FROM interactions a
    JOIN interactions b
      ON a.user_id = b.user_id
      AND a.item_id < b.item_id -- only compute each pair once
    GROUP BY a.item_id, b.item_id
    HAVING COUNT(DISTINCT a.user_id) >= 2 -- at least 2 shared users
      AND SUM(a.weight * a.weight) > 0
      AND SUM(b.weight * b.weight) > 0
    ON CONFLICT (item_a_id, item_b_id, algorithm)
    DO UPDATE SET
      score = EXCLUDED.score,
      calculated_at = EXCLUDED.calculated_at
  `);

  // Update recommendation cache for all active users
  const usersResult = await db.execute(sql`SELECT id, user_id FROM users LIMIT 1000`);
  for (const user of usersResult.rows) {
    await buildUserRecommendations(user.id, user.user_id);
  }

  console.log('[similarity] Done.');
}

async function buildUserRecommendations(internalUserId, externalUserId) {
  // Get items the user has already interacted with
  const userItems = await db.execute(
    sql`SELECT DISTINCT item_id FROM interactions WHERE user_id = ${internalUserId}`
  );
  const seen = new Set(userItems.rows.map(r => r.item_id));

  if (seen.size === 0) return; // Skip cold-start users

  const seenIds = Array.from(seen);

  // Find similar items the user hasn't seen yet
  const recs = await db.execute(sql`
    SELECT
      CASE WHEN ss.item_a_id = ANY(${seenIds}::int[]) THEN ss.item_b_id ELSE ss.item_a_id END AS rec_item_id,
      MAX(ss.score) AS score,
      MIN(CASE WHEN ss.item_a_id = ANY(${seenIds}::int[]) THEN ss.item_a_id ELSE ss.item_b_id END) AS because_item_id
    FROM similarity_scores ss
    WHERE (ss.item_a_id = ANY(${seenIds}::int[]) OR ss.item_b_id = ANY(${seenIds}::int[]))
      AND NOT (ss.item_a_id = ANY(${seenIds}::int[]) AND ss.item_b_id = ANY(${seenIds}::int[]))
    GROUP BY rec_item_id
    ORDER BY score DESC
    LIMIT 20
  `);

  const recommendations = recs.rows.map(r => ({
    item_id: r.rec_item_id,
    score: parseFloat(r.score),
    because_item_id: r.because_item_id,
  }));

  await db.execute(sql`
    INSERT INTO recommendation_cache (user_id, recommendations, calculated_at)
    VALUES (${internalUserId}, ${JSON.stringify(recommendations)}, NOW())
    ON CONFLICT (user_id) DO UPDATE SET
      recommendations = EXCLUDED.recommendations,
      calculated_at = EXCLUDED.calculated_at
  `);
}

calculateSimilarityMatrix()
  .then(() => process.exit(0))
  .catch(err => { console.error('[similarity] Error:', err); process.exit(1); });

Pro tip: In Replit's Publish pane, create a Scheduled Deployment for this job set to run at 2:00 AM daily. Set the entrypoint to server/jobs/calculateSimilarity.js. The first run will be slow if you have many items — subsequent runs are faster because of the ON CONFLICT DO UPDATE upsert.

Expected result: After the job runs, the similarity_scores table fills with cosine similarity values between item pairs. recommendation_cache shows pre-ranked recommendations per user.

4

Build the recommendations retrieval API

The recommendation endpoint serves from the pre-built cache for fast response. For cold-start users with no interactions, it falls back to the globally popular items.

server/routes/recommendations.js
const express = require('express');
const { db } = require('../db');
const { users, recommendationCache } = require('../schema');
const { eq, sql } = require('drizzle-orm');

const router = express.Router();

router.get('/api/recommendations/:userId', async (req, res) => {
  const { userId } = req.params;

  // Find internal user record
  const [user] = await db.select().from(users)
    .where(eq(users.userId, userId)).limit(1);

  if (!user) {
    // New user — return popular items (cold start)
    return servePopular(res);
  }

  // Check cache freshness (accept if calculated in last 24 hours)
  const [cache] = await db.select().from(recommendationCache)
    .where(eq(recommendationCache.userId, user.id)).limit(1);

  if (cache && Date.now() - new Date(cache.calculatedAt).getTime() < 24 * 60 * 60 * 1000) {
    return res.json({
      userId,
      source: 'cache',
      recommendations: cache.recommendations,
    });
  }

  // Cache stale or missing — return popular as fallback until next nightly run
  return servePopular(res);
});

async function servePopular(res) {
  const popular = await db.execute(sql`
    SELECT item_id, COUNT(*) AS interaction_count, COUNT(DISTINCT user_id) AS user_count
    FROM interactions
    GROUP BY item_id
    ORDER BY user_count DESC, interaction_count DESC
    LIMIT 10
  `);

  return res.json({
    source: 'popular',
    recommendations: popular.rows.map(r => ({
      item_id: r.item_id,
      score: null,
      reason: 'Popular with other users',
    })),
  });
}

router.get('/api/items/:id/similar', async (req, res) => {
  const itemId = parseInt(req.params.id, 10);
  const similar = await db.execute(sql`
    SELECT
      CASE WHEN item_a_id = ${itemId} THEN item_b_id ELSE item_a_id END AS similar_item_id,
      score
    FROM similarity_scores
    WHERE item_a_id = ${itemId} OR item_b_id = ${itemId}
    ORDER BY score DESC
    LIMIT 8
  `);
  return res.json({ itemId, similar: similar.rows });
});

module.exports = router;

Pro tip: Enrich the recommendation response by joining item IDs against the items table before returning. The cache stores just IDs and scores for compactness — expand them in the route handler to return full item objects with name, image_url, and category.
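The enrichment step from the pro tip reduces to a pure merge once you have the item rows. A sketch (names are illustrative; fetch `itemRows` with a single `WHERE id = ANY(...)` query over the cached IDs rather than one query per recommendation):

```javascript
// Expand cached {item_id, score, reason} entries into full item objects
// using a Map for O(1) lookup per recommendation.
function enrichRecommendations(cached, itemRows) {
  const byId = new Map(itemRows.map(item => [item.id, item]));
  return cached
    .filter(rec => byId.has(rec.item_id)) // drop items deleted since caching
    .map(rec => ({ ...rec, item: byId.get(rec.item_id) }));
}
```

Filtering out missing IDs also handles items removed from the catalog between nightly cache rebuilds.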

Expected result: GET /api/recommendations/u1 returns personalized recommendations from cache. A new user gets popular items instead. GET /api/items/5/similar returns the 8 most similar items.

5

Build the React recommendations widget

Ask Agent to build the frontend recommendations widget and admin dashboard that shows the similarity matrix heatmap.

prompt.txt
// Ask Agent to build the React frontend with this prompt:
// Build a React recommendations UI with two components:
//
// 1. RecommendationsWidget component:
//    - Accept userId prop
//    - Fetch GET /api/recommendations/:userId on mount
//    - Display a horizontal scrollable row of item cards
//    - Each card shows: item image (from metadata.image_url), item name, category badge,
//      match score as percentage (score * 100, rounded)
//    - If recommendations have because_item_id, show a tooltip on hover:
//      'Because you interacted with [item name]'
//    - Show 'Popular picks' heading if source='popular' (cold start state)
//
// 2. Admin Dashboard page at /admin/recommendations:
//    - Require ADMIN_KEY header (store in localStorage)
//    - Fetch GET /api/admin/recalculate stats: count of items, interactions, similarity pairs
//    - Show three stat cards: Total Items, Total Interactions, Similarity Pairs Computed
//    - Show a 'Recalculate Now' button that calls POST /api/admin/recalculate
//    - Show a heatmap-style similarity matrix: sample 20 random items,
//      show a 20x20 grid where each cell color = similarity score (white=0, blue=1)
//    - Show recent calculation timestamp
//
// Also add a GET /api/items route that returns items with their recommendation stats
// (interaction_count, unique_users from joining interactions).

Expected result: The RecommendationsWidget renders a horizontal card row with match scores. The admin dashboard shows system stats and a colored similarity matrix grid.

Complete code

server/routes/interactions.js
const express = require('express');
const { db } = require('../db');
const { users, interactions, items } = require('../schema');
const { eq, sql } = require('drizzle-orm');
const { withDbRetry } = require('../lib/retryDb');

const router = express.Router();

const INTERACTION_WEIGHTS = {
  view: 0.5,
  click: 1.0,
  wishlist: 1.5,
  rating: 2.0,
  purchase: 3.0,
};

router.post('/api/interactions', express.json(), async (req, res) => {
  const { userId, itemId, type, value } = req.body;
  if (!userId || !itemId || !type) {
    return res.status(400).json({ error: 'userId, itemId, and type are required' });
  }
  if (!INTERACTION_WEIGHTS[type]) {
    return res.status(400).json({
      error: `Invalid type. Valid: ${Object.keys(INTERACTION_WEIGHTS).join(', ')}`,
    });
  }
  // Verify item exists
  const [item] = await db.select({ id: items.id })
    .from(items).where(eq(items.id, parseInt(itemId, 10))).limit(1);
  if (!item) return res.status(404).json({ error: 'Item not found' });

  // Upsert user
  await withDbRetry(() =>
    db.insert(users).values({ userId: String(userId) }).onConflictDoNothing()
  );
  const [user] = await db.select().from(users)
    .where(eq(users.userId, String(userId))).limit(1);

  await withDbRetry(() =>
    db.insert(interactions).values({
      userId: user.id,
      itemId: item.id,
      type,
      value: value != null ? parseFloat(value) : null,
      weight: INTERACTION_WEIGHTS[type],
    })
  );

  return res.json({ recorded: true });
});

router.get('/api/popular', async (req, res) => {
  const result = await db.execute(sql`
    SELECT i.id, i.name, i.category, i.tags, i.metadata,
           COUNT(int.id) AS interaction_count,
           COUNT(DISTINCT int.user_id) AS user_count
    FROM items i
    LEFT JOIN interactions int ON i.id = int.item_id
    GROUP BY i.id
    ORDER BY user_count DESC, interaction_count DESC
    LIMIT 10
  `);
  return res.json({ popular: result.rows });
});

module.exports = router;

Customization ideas

Content-based filtering fallback

For items with very few interactions, use content-based similarity instead: compare item tags arrays using the Jaccard index (intersection / union). This ensures new items get recommended before they accumulate behavioral data.
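One way to sketch the tag-based fallback (illustrative helper, not from the tutorial's generated code):

```javascript
// Jaccard index over two items' tag arrays: |intersection| / |union|.
function jaccardSimilarity(tagsA, tagsB) {
  const setA = new Set(tagsA);
  const setB = new Set(tagsB);
  const intersection = [...setA].filter(t => setB.has(t)).length;
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 0 : intersection / union;
}

jaccardSimilarity(['vegan', 'quick', 'dinner'], ['vegan', 'dinner', 'budget']); // 2/4 = 0.5
```

You could write these scores into similarity_scores with algorithm = 'jaccard' so the same retrieval queries serve both behavioral and content-based pairs.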

Recency decay on interaction weights

Multiply interaction weights by a recency factor: weight * EXP(-days_since_interaction / 30). With this constant, interactions from 30 days ago count about 37% as much as today's (e^-1); use weight * EXP(-days_since_interaction * LN(2) / 30) if you want a true 30-day half-life. This keeps recommendations fresh as user tastes evolve.
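The decay formula as a helper, assuming the 30-day time constant from above (`tau` is an illustrative parameter name):

```javascript
// Exponential recency decay: older interactions contribute less weight.
function decayedWeight(baseWeight, daysSinceInteraction, tau = 30) {
  return baseWeight * Math.exp(-daysSinceInteraction / tau);
}

decayedWeight(3.0, 0);  // 3.0 — today's purchase at full strength
decayedWeight(3.0, 30); // ≈ 1.10 — after 30 days, about 37% of original
```

In practice you would apply the same expression inside the nightly similarity SQL rather than in JavaScript, so cached scores reflect the decay.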

Recommendation explanation API

Build GET /api/recommendations/:userId/explain that returns the top 3 interactions that drove each recommendation — 'Because you purchased Item A (score: 0.82) and viewed Item B (score: 0.71)'. Shows users why they're seeing suggestions.

Category diversity enforcement

Add a diversity step to the recommendation builder: after ranking by score, ensure at most 40% of recommendations are from the same category. This prevents a user who views one product category from getting recommendations exclusively from that category.
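A minimal sketch of that post-ranking pass (illustrative; score order is preserved and any single category is capped at `maxShare` of the final list):

```javascript
// Walk the ranked list, skipping items whose category is already at its cap.
// The result may be shorter than `limit` if categories run out.
function enforceCategoryDiversity(rankedRecs, limit, maxShare = 0.4) {
  const maxPerCategory = Math.max(1, Math.floor(limit * maxShare));
  const counts = {};
  const result = [];
  for (const rec of rankedRecs) {
    if (result.length >= limit) break;
    const n = counts[rec.category] || 0;
    if (n >= maxPerCategory) continue; // category over-represented, skip
    counts[rec.category] = n + 1;
    result.push(rec);
  }
  return result;
}
```

Run it in buildUserRecommendations after ranking but before writing to recommendation_cache, so the cap is baked into the cached list.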

Common pitfalls

Pitfall: Calculating cosine similarity in real time on every recommendation request

How to avoid: Always pre-compute and store similarity_scores via the nightly Scheduled Deployment. Recommendation API reads from similarity_scores and recommendation_cache — both are indexed lookups.

Pitfall: Not handling the cold-start problem for new users

How to avoid: Always check if the user has interactions before querying the cache. Return popular items (GET /api/popular) as the fallback for users with fewer than 3 interactions.

Pitfall: Recommending items the user already interacted with

How to avoid: The recommendation builder queries exclude items already in the user's interaction history using WHERE NOT (item_id = ANY(seen_item_ids)).

Pitfall: Running the similarity job without a PostgreSQL retry wrapper

How to avoid: Wrap the first DB call in withDbRetry() to handle the cold-start connection failure. The retry with 250ms delay is enough for the database to wake up.
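The tutorial imports withDbRetry from server/lib/retryDb without showing it; a minimal sketch under the stated assumption (one retry after a 250ms delay) might look like:

```javascript
// Retry a DB operation once after a short delay, to ride out the
// cold-start connection failure when the database is waking up.
async function withDbRetry(fn, retries = 1, delayMs = 250) {
  try {
    return await fn();
  } catch (err) {
    if (retries <= 0) throw err;
    await new Promise(resolve => setTimeout(resolve, delayMs));
    return withDbRetry(fn, retries - 1, delayMs);
  }
}

module.exports = { withDbRetry };
```

Your generated retryDb may differ (e.g. exponential backoff or error-type filtering); this is just the shape the routes above assume.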

Best practices

  • Pre-compute similarity_scores nightly via Scheduled Deployment — never calculate cosine similarity on the request path.
  • Use the recommendation_cache table for instant recommendation serving — check if cache is less than 24 hours old before falling back to real-time computation.
  • Weight different interaction types differently: purchases (3.0) >> ratings (2.0) >> wishlist (1.5) >> click (1.0) >> view (0.5). Purchase signals are 6x more valuable than views.
  • Keep the item catalog under 10K items with Replit's free PostgreSQL. The O(N^2) similarity calculation becomes slow above this — move to a dedicated vector database for larger catalogs.
  • Use Drizzle Studio (Database tab) to inspect similarity_scores after the first batch job run — verify that similar items are being grouped logically before wiring up the frontend.
  • Add a minimum shared-users threshold (at least 2) to the similarity calculation to filter out coincidental single-user overlap that would produce artificially high similarity scores.
  • Deploy on Autoscale for the main API (recommendations serve from cache = fast). The nightly batch job runs as a separate Scheduled Deployment on Reserved VM for reliability.

AI prompts to try

Copy these prompts to build this project faster.

ChatGPT Prompt

I'm building a collaborative filtering recommendations engine with Express and PostgreSQL. I have an interactions table (user_id integer, item_id integer, weight numeric) and a similarity_scores table (item_a_id, item_b_id, score). Help me write a PostgreSQL query that calculates the cosine similarity between all item pairs using the interaction weight matrix. The query should: only consider pairs with at least 2 shared users, use the formula SUM(wa*wb)/(SQRT(SUM(wa^2))*SQRT(SUM(wb^2))), and insert results into similarity_scores using INSERT ... ON CONFLICT DO UPDATE.

Build Prompt

Add real-time interaction analytics to the recommendations engine. Build a GET /api/analytics/interactions route that returns: total interactions by type (bar chart data), top 10 most-interacted items with interaction counts, most active users by interaction count, interactions over time (daily counts for last 30 days as line chart data), and coverage metric (% of items that have at least 1 interaction). Display this on a React analytics page with four recharts visualizations: bar chart for type breakdown, bar chart for top items, line chart for daily trend, and a stat card showing catalog coverage percentage.

Frequently asked questions

How many items can this handle before it gets too slow?

With Replit's free PostgreSQL and the pre-computed similarity approach, 10K items works well. The nightly similarity job computes up to 50M pairs (10K^2 / 2) — this takes 10-30 minutes but runs offline. The recommendation API always reads from the cache so users never see slowdowns.

What's the cold-start problem and how is it handled?

New users have no interaction history, so there's nothing to base personalized recommendations on. The engine handles this by returning globally popular items (sorted by unique_users from the interactions table) for users with zero or very few interactions. Once they have 3+ interactions, the nightly job will build their personalized cache.

Can I use this to recommend content (articles, videos) instead of products?

Yes — the engine is content-agnostic. The items table has a metadata JSONB column for any attributes. Track 'view' interactions when users read articles, 'rating' when they rate videos. The collaborative filtering algorithm works identically regardless of what type of item is being recommended.

What Replit plan do I need?

Free tier for the main API (Autoscale). The nightly Scheduled Deployment also runs on Free tier. For large catalogs (5K+ items), the similarity calculation benefits from Reserved VM to avoid the 30-second cold start of the job triggering at 2 AM.

How often should I run the similarity recalculation?

Nightly is a good default for most apps. If your catalog changes frequently (new items added hourly) or your users are highly active, run it every 6 hours. For very small or slowly-changing catalogs, weekly is sufficient.

Should I deploy on Autoscale or Reserved VM?

Autoscale for the main recommendation API (cache reads are fast and handle cold starts). Reserved VM for the nightly batch job — you want it to start reliably at 2 AM without cold-start delays causing timeouts on the similarity computation.

Can RapidDev help build a recommendations engine for my platform?

Yes. RapidDev has built recommendation systems for e-commerce and content platforms across 600+ client projects. We can extend this engine with content-based filtering, A/B testing of recommendation strategies, and real-time similarity updates. Free consultation available.

How is this different from just sorting by 'most popular'?

Popularity sorting shows the same items to everyone — the top sellers dominate. Collaborative filtering personalizes results based on what similar users liked. A user who buys niche products gets recommendations tailored to their taste, not just bestsellers. Personalization increases click-through rates by 2-5x over simple popularity sorting in typical e-commerce apps.
