To integrate Replit with Google Cloud AI Platform (Vertex AI), create a service account with Vertex AI User permissions, download the JSON key file, store it in Replit Secrets as GOOGLE_CREDENTIALS_JSON, and use the Google Cloud client library to call Vertex AI prediction endpoints. For online prediction, send feature inputs to a deployed model endpoint and receive real-time inference results. Use Reserved VM for persistent model serving and Autoscale for batch-triggered jobs.
Vertex AI Prediction Endpoints from Replit
Google Cloud AI Platform — now unified under the Vertex AI brand — is Google's enterprise ML operations platform. It provides managed infrastructure for training models, evaluating experiments, and deploying trained models to prediction endpoints that scale automatically. For developers who have already trained and deployed a model on Vertex AI, the challenge is often building the application layer that calls the prediction endpoint and serves results to users. Replit is well-suited for this: a lightweight Python or Node.js server that receives user inputs, transforms them into the feature format your model expects, calls the Vertex AI prediction endpoint, and returns the results.
Vertex AI prediction endpoints expose an HTTP interface that accepts instance data (the input features your model was trained on) and returns predictions (the model's output). For tabular models, instances are JSON objects with named feature fields. For image models, instances contain base64-encoded image data. For NLP models, instances contain text strings. The exact schema depends on how your model was deployed.
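As a quick sketch, every request wraps one or more instances in an 'instances' array. The feature names below are hypothetical; your model's serving signature (visible in the Vertex AI Console) defines the real schema:

```python
# Sketch: build the JSON body Vertex AI's :predict route expects.
# All feature names here are hypothetical -- your model's serving
# signature defines the real instance schema.

def build_prediction_body(instances):
    """Wrap one or more instances into {'instances': [...]}."""
    if not isinstance(instances, list):
        instances = [instances]
    return {'instances': instances}

# Tabular model: named feature fields
tabular = build_prediction_body({'age': 34, 'plan': 'premium', 'monthly_spend': 42.5})

# NLP model: a text string per instance
text = build_prediction_body({'content': 'This product exceeded my expectations.'})

print(tabular)  # → {'instances': [{'age': 34, 'plan': 'premium', 'monthly_spend': 42.5}]}
```

The same wrapper works whether the caller passes a single instance or a list, which mirrors the input handling in the server examples later in this guide.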
Authentication to Vertex AI requires a Google Cloud service account — a non-human identity that represents your application. The service account has a JSON key file that contains a private key used to generate short-lived access tokens. This key file must be treated as a highly sensitive secret: anyone who has it can call your Vertex AI endpoints and incur billing charges on your Google Cloud project. Store it in Replit Secrets as a JSON string, never in your source code or repository.
Vertex AI also provides access to Google's pre-built foundation models — Gemini for text, Imagen for images, and others — through the same prediction endpoint interface. If you need to call Gemini or other Google AI models from Replit, the setup process described here is identical.
Integration method
Google Cloud AI Platform integrates with Replit through the Vertex AI REST API and Python client library, authenticated by a service account JSON key stored in Replit Secrets. Your Replit server reads the service account credentials, initializes the Vertex AI client, and sends prediction requests to deployed model endpoints. The response contains model predictions that your application can use for real-time inference serving.
Prerequisites
- A Replit account with a Python or Node.js Repl ready
- A Google Cloud account with billing enabled and a project with Vertex AI API activated
- A trained model deployed to a Vertex AI prediction endpoint in your Google Cloud project
- Service account creation permissions in Google Cloud IAM (typically Project Editor or Owner)
- Python packages: google-cloud-aiplatform, flask; Node.js packages: axios, express, google-auth-library
Step-by-step guide
Create a Service Account and Download Credentials
Google Cloud uses service accounts for machine-to-machine authentication. You need to create a service account, grant it the minimum permissions required to call your Vertex AI endpoint, and download its JSON key file.

In the Google Cloud Console, navigate to IAM & Admin → Service Accounts. Click 'Create Service Account'. Give it a name like 'replit-vertex-ai' and a description, then click 'Create and continue'. On the permissions step, add the 'Vertex AI User' role (roles/aiplatform.user). This role grants permission to call prediction endpoints, list models, and read endpoint metadata. If you only need to call predictions (not train or manage models), 'Vertex AI User' is sufficient — do not use 'Vertex AI Admin' or 'Project Editor', which would grant unnecessarily broad access. Click 'Done' to create the service account.

Then click on the service account in the list, go to the 'Keys' tab, click 'Add Key' → 'Create new key', and select JSON format. Google Cloud downloads a JSON file to your computer. This file contains the private key for your service account. Also note your Google Cloud project ID (visible in the Console top bar) and your Vertex AI endpoint ID (found under Vertex AI → Endpoints in the Console). You will need both when calling the prediction API.
```json
// Example structure of a Google Cloud service account JSON key file
// (Never store this in code — paste the content into Replit Secrets)
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "key-id",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "replit-vertex-ai@your-project-id.iam.gserviceaccount.com",
  "client_id": "...",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```

Pro tip: The JSON key file is downloaded once. If you lose it, you must create a new key — you cannot re-download existing keys. Store a copy in a secure password manager immediately after download.
Expected result: A service account exists in Google Cloud IAM with the Vertex AI User role. A JSON key file is downloaded to your local machine. You have your project ID and endpoint ID noted.
Store Service Account Credentials in Replit Secrets
The service account JSON key file must be stored as a Replit Secret — never committed to version control or pasted into source code. The JSON key contains a private key that grants access to your Google Cloud project. Click the lock icon (🔒) in the Replit sidebar to open the Secrets panel, then add the following secrets:
- GOOGLE_CREDENTIALS_JSON: the entire contents of your service account JSON key file as a single string. The JSON contains newlines — ensure the secret value is the complete JSON text.
- GOOGLE_CLOUD_PROJECT: your Google Cloud project ID (e.g., 'my-project-12345').
- VERTEX_AI_ENDPOINT_ID: the numeric ID of your deployed Vertex AI endpoint (found under Vertex AI → Endpoints in the Google Cloud Console).
- VERTEX_AI_LOCATION: the Google Cloud region where your endpoint is deployed (e.g., 'us-central1').
In your server code, read GOOGLE_CREDENTIALS_JSON, parse it as JSON with JSON.parse(process.env.GOOGLE_CREDENTIALS_JSON) (or json.loads in Python), and pass it to the Google authentication library. This avoids the need for a credentials file on disk, which would not persist reliably across Replit container restarts. Replit's Secret Scanner monitors for private key patterns: if the key is accidentally included in source code, Replit will detect the BEGIN PRIVATE KEY marker and alert you.
```python
# Python: Parse credentials from Replit Secrets
import os
import json

credentials_json = os.environ.get('GOOGLE_CREDENTIALS_JSON')
if not credentials_json:
    raise ValueError('GOOGLE_CREDENTIALS_JSON not set. Add it in Replit Secrets (lock icon 🔒).')

credentials_dict = json.loads(credentials_json)
print('Service account:', credentials_dict.get('client_email'))
print('Project:', os.environ.get('GOOGLE_CLOUD_PROJECT'))
print('Endpoint:', os.environ.get('VERTEX_AI_ENDPOINT_ID'))
print('Location:', os.environ.get('VERTEX_AI_LOCATION'))
```

Pro tip: When pasting the JSON key into Replit Secrets, do not add extra quotes around the value. The secret value should be the raw JSON text starting with '{' — not a string containing JSON.
Expected result: GOOGLE_CREDENTIALS_JSON, GOOGLE_CLOUD_PROJECT, VERTEX_AI_ENDPOINT_ID, and VERTEX_AI_LOCATION are set in Replit Secrets. The verification script prints the service account email without errors.
Call Vertex AI Predictions from Python
The google-cloud-aiplatform Python library provides a high-level interface for calling Vertex AI prediction endpoints. Install it in the Replit Shell: pip install google-cloud-aiplatform flask. The library uses the google.oauth2.service_account.Credentials class to authenticate from a credentials dictionary. You parse GOOGLE_CREDENTIALS_JSON from environment variables and create credentials in memory — no temporary file needed. For online prediction, the aiplatform.Endpoint class provides a predict() method that accepts a list of instances (the input data for your model) and returns a Prediction object with the model's output. The exact structure of instances and predictions depends on your model's serving signature. The Flask server below creates a /predict endpoint that accepts POST requests with instance data, calls Vertex AI, and returns predictions. The Vertex AI client and Endpoint object are initialized once at startup using the service account credentials. IMPORTANT: The predict() call is synchronous and can take 100ms-2 seconds depending on model size and load. For high-traffic APIs, consider adding caching for repeated identical inputs or using Vertex AI's batch prediction for large volumes.
```python
# vertex_ai.py — Vertex AI prediction client for Replit (Python)
import os
import json
from flask import Flask, request, jsonify
from google.cloud import aiplatform
from google.oauth2 import service_account

# Load credentials from Replit Secrets
credentials_dict = json.loads(os.environ['GOOGLE_CREDENTIALS_JSON'])
credentials = service_account.Credentials.from_service_account_info(
    credentials_dict,
    scopes=['https://www.googleapis.com/auth/cloud-platform']
)

PROJECT_ID = os.environ['GOOGLE_CLOUD_PROJECT']
LOCATION = os.environ.get('VERTEX_AI_LOCATION', 'us-central1')
ENDPOINT_ID = os.environ['VERTEX_AI_ENDPOINT_ID']

# Initialize Vertex AI with service account credentials
aiplatform.init(
    project=PROJECT_ID,
    location=LOCATION,
    credentials=credentials
)

# Load the endpoint once at startup
endpoint = aiplatform.Endpoint(
    endpoint_name=f'projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}'
)

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    """Call Vertex AI online prediction endpoint."""
    body = request.get_json()
    instances = body.get('instances')

    if not instances:
        return jsonify({'error': 'instances field required in request body'}), 400

    if not isinstance(instances, list):
        instances = [instances]  # Wrap single instance in list

    try:
        prediction = endpoint.predict(instances=instances)
        return jsonify({
            'predictions': prediction.predictions,
            'deployed_model_id': prediction.deployed_model_id,
            'model_display_name': prediction.model_display_name
        })
    except Exception as e:
        return jsonify({'error': str(e)}), 500

@app.route('/health')
def health():
    return jsonify({'status': 'ok', 'project': PROJECT_ID, 'endpoint': ENDPOINT_ID})

if __name__ == '__main__':
    print(f'Vertex AI server ready. Endpoint: {ENDPOINT_ID}')
    app.run(host='0.0.0.0', port=3000)
```

Pro tip: For tabular models, instances are dictionaries with feature names as keys: [{'feature1': 1.5, 'feature2': 'category_A'}]. For text models: [{'content': 'text to classify'}]. Check your model's serving signature in Vertex AI Console for the exact instance schema.
Expected result: POST /predict with {"instances": [{"feature": "value"}]} returns model predictions from Vertex AI. The /health endpoint confirms the project and endpoint IDs are loaded correctly.
Call Vertex AI Predictions from Node.js
For Node.js Replit projects, call Vertex AI prediction endpoints using the REST API directly with a Google-issued access token. The google-auth-library package handles token generation from your service account credentials. Install packages in the Replit Shell: npm install google-auth-library axios express. The google-auth-library's GoogleAuth class reads credentials from a JSON object and generates short-lived Bearer tokens. These tokens are passed in the Authorization header of HTTP requests to the Vertex AI REST API. Token generation is handled by the library — it automatically refreshes tokens before they expire. The Vertex AI prediction REST endpoint URL follows this pattern: https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}:predict. The request body is a JSON object with an 'instances' array, and the response contains a 'predictions' array with one prediction per instance. The Node.js version below is equivalent to the Python version — choose based on your project's language. Both authenticate using the same service account credentials stored in GOOGLE_CREDENTIALS_JSON.
```javascript
// vertex_ai.js — Vertex AI prediction client for Replit (Node.js)
const { GoogleAuth } = require('google-auth-library');
const axios = require('axios');
const express = require('express');

const credentialsJson = process.env.GOOGLE_CREDENTIALS_JSON;
if (!credentialsJson) throw new Error('GOOGLE_CREDENTIALS_JSON not set. Add it in Replit Secrets (lock icon 🔒).');

const credentials = JSON.parse(credentialsJson);
const PROJECT_ID = process.env.GOOGLE_CLOUD_PROJECT;
const LOCATION = process.env.VERTEX_AI_LOCATION || 'us-central1';
const ENDPOINT_ID = process.env.VERTEX_AI_ENDPOINT_ID;

// Initialize Google Auth client
const auth = new GoogleAuth({
  credentials,
  scopes: 'https://www.googleapis.com/auth/cloud-platform'
});

const PREDICTION_URL = `https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}:predict`;

async function predictVertexAI(instances) {
  const client = await auth.getClient();
  const token = await client.getAccessToken();

  const response = await axios.post(PREDICTION_URL, { instances }, {
    headers: {
      Authorization: `Bearer ${token.token}`,
      'Content-Type': 'application/json'
    }
  });

  return response.data;
}

const app = express();
app.use(express.json());

app.post('/predict', async (req, res) => {
  let { instances } = req.body;
  if (!instances) return res.status(400).json({ error: 'instances required' });
  if (!Array.isArray(instances)) instances = [instances];

  try {
    const result = await predictVertexAI(instances);
    res.json(result);
  } catch (err) {
    const status = err.response?.status || 500;
    res.status(status).json({ error: err.response?.data?.error || err.message });
  }
});

app.get('/health', (req, res) => {
  res.json({ status: 'ok', project: PROJECT_ID, endpoint: ENDPOINT_ID, location: LOCATION });
});

app.listen(3000, '0.0.0.0', () => {
  console.log(`Vertex AI Node.js server running. Endpoint: ${ENDPOINT_ID}`);
});
```

Pro tip: The GoogleAuth client automatically caches and refreshes access tokens. You do not need to manage token expiry manually — each call to client.getAccessToken() returns a valid token, refreshing it transparently when needed.
Expected result: POST /predict with instances array returns Vertex AI predictions. The Node.js server authenticates with the service account credentials and calls the correct regional Vertex AI endpoint.
Common use cases
Real-Time Model Prediction API
Build an Express or Flask server on Replit that acts as a prediction API: receive feature data from your app's frontend, transform it into the format your Vertex AI model expects, call the prediction endpoint, and return the result. This pattern decouples your application from Vertex AI and allows you to add caching, feature engineering, and response transformation.
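One way to add the caching mentioned above is a small in-memory TTL cache keyed on the serialized instances. This is a minimal sketch under assumptions: the 60-second TTL, the cached_predict name, and the predict_fn callback are illustrative, with predict_fn standing in for your real Vertex AI call.

```python
import json
import time

_cache = {}       # key -> (expires_at, predictions)
TTL_SECONDS = 60  # illustrative TTL; tune for your traffic

def cached_predict(instances, predict_fn, now=time.time):
    """Return cached predictions for identical inputs seen within the TTL."""
    key = json.dumps(instances, sort_keys=True)  # stable key for identical inputs
    hit = _cache.get(key)
    if hit and hit[0] > now():
        return hit[1]  # cache hit: skip the Vertex AI call entirely
    predictions = predict_fn(instances)  # e.g. endpoint.predict(instances=...).predictions
    _cache[key] = (now() + TTL_SECONDS, predictions)
    return predictions
```

In the Flask server from the earlier step, predict_fn would wrap endpoint.predict, so repeated identical requests inside the TTL never reach Vertex AI or incur prediction charges.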
Build a Flask prediction API that accepts JSON feature data from POST requests, calls a Vertex AI endpoint with the features, and returns the model's predicted class and confidence score. Store the service account JSON in Replit Secrets.
Batch Inference Trigger
Create a Replit server that receives a list of items (product descriptions, customer records, images) and sends them to Vertex AI in batches, collecting predictions for each item. This is useful for triggering inference on new data uploads without requiring a continuously running batch job.
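The batching itself can be sketched as a small helper that splits the incoming list into fixed-size chunks before calling the endpoint. The batch size of 32 and the predict_fn callback are assumptions, not Vertex AI requirements, though online prediction requests do have payload size limits that make chunking necessary for large lists.

```python
def predict_in_batches(items, predict_fn, batch_size=32):
    """Send items to the endpoint in fixed-size batches and collect predictions."""
    predictions = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        predictions.extend(predict_fn(batch))  # one endpoint call per batch
    return predictions
```

Here predict_fn would wrap endpoint.predict (or the Node.js predictVertexAI function); the helper preserves input order, so the nth prediction corresponds to the nth item.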
Build an API endpoint that accepts a list of text descriptions, sends them to a Vertex AI text classification endpoint, and returns a list of predictions with confidence scores for each item.
ML Model Monitoring Dashboard
Build a dashboard that queries Vertex AI model monitoring metrics and prediction logs to track model performance over time. Your Replit server calls the Vertex AI API to fetch recent prediction requests, check for data drift alerts, and surface model health metrics in a custom UI.
Create a Node.js server that calls the Vertex AI API to list recent model evaluation metrics and prediction statistics for a deployed endpoint, then returns them as a JSON summary for a monitoring dashboard.
Troubleshooting
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials
Cause: The google-cloud-aiplatform library is falling back to Application Default Credentials (ADC), which expects the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to a key file on disk. On Replit no such file exists — the credentials live in the GOOGLE_CREDENTIALS_JSON secret and must be passed as a parsed dictionary, not a file path.
Solution: Explicitly pass credentials to aiplatform.init() and the Endpoint constructor using service_account.Credentials.from_service_account_info(). Do not set GOOGLE_APPLICATION_CREDENTIALS to a file path — instead parse GOOGLE_CREDENTIALS_JSON and pass the resulting credentials object directly.
```python
# Correct pattern: parse JSON from env, create credentials object
import os
import json
from google.oauth2 import service_account
from google.cloud import aiplatform

creds = service_account.Credentials.from_service_account_info(
    json.loads(os.environ['GOOGLE_CREDENTIALS_JSON']),
    scopes=['https://www.googleapis.com/auth/cloud-platform']
)
aiplatform.init(project=PROJECT_ID, location=LOCATION, credentials=creds)
```

403 PermissionDenied: Permission 'aiplatform.endpoints.predict' denied
Cause: The service account does not have the Vertex AI User role, or the role was granted at the wrong level (e.g., on the service account itself rather than on the project).
Solution: In Google Cloud Console, go to IAM & Admin → IAM. Find your service account's email. Click the pencil icon to edit permissions. Verify 'Vertex AI User' (roles/aiplatform.user) is listed under Roles for the service account at the project level. If missing, add it and wait 1-2 minutes for IAM changes to propagate.
JSON parse error or unexpected token when loading GOOGLE_CREDENTIALS_JSON
Cause: The service account JSON was copied into Replit Secrets with extra quotes, truncation, or encoding issues. The JSON must be valid and complete — including the private key with its newlines preserved.
Solution: In Replit Secrets, click the GOOGLE_CREDENTIALS_JSON entry to edit. Delete the current value and re-paste the entire contents of your downloaded .json key file. Verify the value starts with '{"type":"service_account"' and ends with '}'. Test parsing in Replit Shell with: node -e "JSON.parse(process.env.GOOGLE_CREDENTIALS_JSON); console.log('OK')"
```python
# Test JSON parsing in Python Shell
import os, json
try:
    creds = json.loads(os.environ['GOOGLE_CREDENTIALS_JSON'])
    print('JSON valid. Service account:', creds.get('client_email'))
except json.JSONDecodeError as e:
    print('JSON invalid:', e)
    print('First 100 chars:', os.environ.get('GOOGLE_CREDENTIALS_JSON', '')[:100])
```

Prediction returns 404 Not Found for the endpoint URL
Cause: The VERTEX_AI_ENDPOINT_ID or VERTEX_AI_LOCATION is incorrect. Vertex AI endpoint URLs are region-specific — an endpoint in us-central1 cannot be reached via the europe-west1 URL.
Solution: In Google Cloud Console, go to Vertex AI → Endpoints and click your endpoint. The detail page shows the full endpoint name including the correct project ID and location. Copy the location exactly (e.g., 'us-central1') and the numeric endpoint ID from the URL or the endpoint details panel.
```python
# Print the constructed URL to verify it is correct
PREDICTION_URL = f'https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}:predict'
print('Prediction URL:', PREDICTION_URL)
# Expected: https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/endpoints/123456789:predict
```

Best practices
- Store GOOGLE_CREDENTIALS_JSON in Replit Secrets (lock icon 🔒) as the complete JSON string — never write service account credentials to a file in your Replit project
- Grant the service account only the Vertex AI User role, not broader Project Editor or Owner roles — follow the principle of least privilege
- Parse credentials in memory using service_account.Credentials.from_service_account_info() rather than writing to a temp file, which may persist unexpectedly in Replit containers
- Initialize the Vertex AI client and endpoint object once at server startup rather than on each request to avoid repeated authentication overhead
- Add input validation before calling Vertex AI — verify instance schema, data types, and required fields match your model's serving signature to prevent prediction errors
- Cache prediction results for identical inputs using a short TTL (30-60 seconds) to reduce Vertex AI calls for repeated queries
- Deploy as Reserved VM for services that make frequent Vertex AI calls — Autoscale cold starts add 2-5 seconds of latency on the first request after idle periods
- Monitor Vertex AI prediction costs in Google Cloud Console — online prediction is billed per node hour for the deployed endpoint, even when idle
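The input-validation practice above can be sketched as a small schema check run before calling Vertex AI. The feature names and types below are hypothetical; mirror your own model's serving signature.

```python
# Hypothetical schema for a tabular model; replace with your model's
# actual serving signature from the Vertex AI Console.
SCHEMA = {'age': (int, float), 'plan': str, 'monthly_spend': (int, float)}

def validate_instance(instance, schema=SCHEMA):
    """Return a list of error strings; an empty list means the instance is valid."""
    errors = []
    for field, types in schema.items():
        if field not in instance:
            errors.append(f'missing field: {field}')
        elif not isinstance(instance[field], types):
            errors.append(f'wrong type for {field}: {type(instance[field]).__name__}')
    for field in instance:
        if field not in schema:
            errors.append(f'unexpected field: {field}')
    return errors
```

In the /predict handler, rejecting invalid instances with a 400 before calling endpoint.predict gives callers a precise error message instead of an opaque Vertex AI serving failure.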
Alternatives
OpenAI's API is simpler to integrate with key-only authentication and no service account setup, better for GPT-based text generation tasks versus custom-trained models on Vertex AI.
TensorFlow runs models locally in Replit without external API calls, better for smaller models where you want to avoid Vertex AI costs and network latency.
IBM Watson provides similar managed NLP and ML services with IAM API key authentication (simpler than service accounts), better for teams already in the IBM Cloud ecosystem.
Frequently asked questions
How do I connect Replit to Google Cloud Vertex AI?
Create a Google Cloud service account with the Vertex AI User role, download its JSON key file, and paste the entire JSON into Replit Secrets (lock icon 🔒) as GOOGLE_CREDENTIALS_JSON. In Python, use google-cloud-aiplatform and pass credentials via service_account.Credentials.from_service_account_info(). In Node.js, use google-auth-library to generate Bearer tokens for REST API calls.
How do I securely store Google service account credentials in Replit?
Click the lock icon (🔒) in the Replit sidebar, add a secret named GOOGLE_CREDENTIALS_JSON, and paste the complete JSON key file contents as the value. Access it in Python with json.loads(os.environ['GOOGLE_CREDENTIALS_JSON']) and in Node.js with JSON.parse(process.env.GOOGLE_CREDENTIALS_JSON). Never write the JSON to a file or hardcode it in source code.
Can I call Google Gemini or other foundation models from Replit using this method?
Yes. Google Gemini, Imagen, and other Vertex AI foundation models use the same service account authentication and prediction endpoint pattern. The main difference is the endpoint URL — foundation models use publisher model endpoints rather than custom-trained model endpoints. The google-cloud-aiplatform library has GenerativeModel and ImageGenerationModel classes that wrap this.
What deployment type should I use for a Vertex AI integration on Replit?
Use Reserved VM for services that handle frequent real-time predictions — Vertex AI cold starts add latency, and Autoscale's container warm-up adds additional delay on the first request after idle. If your prediction service is low-traffic and latency tolerance is high (several seconds is acceptable), Autoscale works and is more cost-effective.
Why is my Vertex AI endpoint returning 403 even though the service account has the right role?
IAM permission changes take 1-2 minutes to propagate. If you just added the Vertex AI User role, wait a moment and retry. Also verify the role was added at the project level (not just on the service account itself) — go to IAM & Admin → IAM, find the service account email, and confirm 'Vertex AI User' appears in the roles column.