PostgreSQL integration with AI (Vector Search in PostgreSQL)

Based on Ignite 20225 – LAB515 Build Advanced Ai Agents with PostgreSQL.

Source Code: https://github.com/MariuszFerdyn/ignite25-LAB515-build-advanced-ai-agents-with-postgresql.

Transcript:

Source code: https://github.com/MariuszFerdyn/ignite25-LAB515-build-advanced-ai-agents-with-postgresql

— ============================================================================
— PART 1: Initialize Database and Verify Data
— ============================================================================

— Load initial dataset from SQL script
\i ./Scripts/initialize_dataset.sql;

— Enable auto-formatting for better readability in psql
\x auto

— Verify that data was loaded correctly – show first 5 case names
SELECT name FROM cases LIMIT 5;

— ============================================================================
— PART 2: Install and Configure Azure AI Extension
— ============================================================================

— Check which Azure extensions are available in the server
SHOW azure.extensions;

— Install the azure_ai extension to enable Azure OpenAI integration
— This extension provides functions to interact with Azure OpenAI Service
CREATE EXTENSION IF NOT EXISTS azure_ai;

— Switch to PowerShell context to load environment variables
cd C:\Lab\Scripts\
.\get_env.ps1

— Configure Azure OpenAI endpoint URL
— Replace {AZURE_OPENAI_ENDPOINT} with your actual endpoint from Azure Portal
SELECT azure_ai.set_setting(‘azure_openai.endpoint’, ‘{AZURE_OPENAI_ENDPOINT}’);

— Configure Azure OpenAI API key for authentication
— Replace {AZURE_OPENAI_KEY} with your actual API key from Azure Portal
SELECT azure_ai.set_setting(‘azure_openai.subscription_key’, ‘{AZURE_OPENAI_KEY}’);

— Verify the endpoint configuration was saved correctly
SELECT azure_ai.get_setting(‘azure_openai.endpoint’);

— Verify the API key configuration was saved correctly
SELECT azure_ai.get_setting(‘azure_openai.subscription_key’);

— ============================================================================
— PART 3: Test Embeddings Generation
— ============================================================================

— Test embeddings creation with sample text
— text-embedding-3-small is the Azure OpenAI embedding model
— Show only first 100 characters of the resulting vector for preview
SELECT LEFT(azure_openai.create_embeddings(‘text-embedding-3-small’, ‘Sample text for PostgreSQL Lab’)::text, 100) AS vector_preview;

— ============================================================================
— PART 4: Traditional Keyword Search (for comparison)
— ============================================================================

— Perform traditional keyword-based search using ILIKE pattern matching
— This will miss semantically similar content that uses different words
SELECT id, name, opinion
FROM cases
WHERE opinion ILIKE ‘%Water leaking into the apartment from the floor above’;

— ============================================================================
— PART 5: Prepare Table for Vector Search
— ============================================================================

— Install pgvector extension to support vector data type and operations
CREATE EXTENSION IF NOT EXISTS vector;

— Add a new column to store embeddings (1536 dimensions for text-embedding-3-small)
ALTER TABLE cases ADD COLUMN opinions_vector vector(1536);

— Generate embeddings for all existing cases
— Concatenate name and first 8000 chars of opinion for better context
— max_attempts and retry_delay_ms handle API rate limiting gracefully
UPDATE cases
SET opinions_vector = azure_openai.create_embeddings(‘text-embedding-3-small’, name || LEFT(opinion, 8000), max_attempts => 5, retry_delay_ms => 500)::vector
WHERE opinions_vector IS NULL;

— ============================================================================
— PART 6: Create Vector Index for Efficient Similarity Search
— ============================================================================

— Install pg_diskann extension for high-performance vector indexing
CREATE EXTENSION IF NOT EXISTS pg_diskann;

— Create DiskANN index for fast cosine similarity searches
— This dramatically improves performance for large datasets
CREATE INDEX cases_cosine_diskann ON cases USING diskann(opinions_vector vector_cosine_ops);

— Verify embeddings were created – show one example vector
SELECT opinions_vector FROM cases LIMIT 1;

— ============================================================================
— PART 7: Semantic Search Using Vector Similarity
— ============================================================================

— Perform semantic search using cosine similarity (<=> operator)
— Find the 10 most semantically similar cases to the query
— This finds similar content even if exact words don’t match
SELECT
id, name
FROM
cases
ORDER BY opinions_vector <=> azure_openai.create_embeddings(‘text-embedding-3-small’, ‘Water leaking into the apartment from the floor above.’)::vector
LIMIT 10;

— Get the most similar case with full opinion text
— Shows the best semantic match to the query
SELECT
id, opinion
FROM cases
ORDER BY opinions_vector <=> azure_openai.create_embeddings(‘text-embedding-3-small’, ‘Water leaking into the apartment from the floor above.’)::vector
LIMIT 1;

— ============================================================================
— PART 8: Compare Semantic Search vs Keyword Search
— ============================================================================

— Re-run the traditional keyword search for comparison
— Notice how semantic search (above) may find more relevant results
— that don’t contain the exact phrase
SELECT id, name, opinion
FROM cases
WHERE opinion ILIKE ‘%Water leaking into the apartment from the floor above’;

Video:

PostgreSQL integration with AI (Vector Search in PostgreSQL)

Added to Cart