Exploring vector-based search in Databases for PostgreSQL using pgvector
pgVector is an extension for your IBM Cloud® Databases for PostgreSQL that adds support for vector similarity search. It allows you to store and query vectors (arrays of numbers) and to calculate their similarity to each other. This is particularly useful in machine learning applications where you might want to find vectors (for instance, image features or word embeddings) that are similar to a given vector.
pgVector is upported on PostgreSQL version 15 and above.
Defining and managing vector columns
To use pgVector to store and perform similarity searches on vector data in Databases for PostgreSQL, perform the following steps:
-
Use the query below to enable the extension:
CREATE EXTENSION pg_vector;
-
You can create a table with a vector column by specifying the dimension of the vectors to be stored, as in the following query:
eg: exampledb=# CREATE TABLE sample_image (id SERIAL PRIMARY KEY, embedded_vector vector(5)); CREATE TABLE
-
Insert data entries and fetch them using a SELECT statement.
eg: exampledb=# INSERT INTO sample_image (embedded_vector) values ('[1,2,3,4,5]'); INSERT 0 1 exampledb=# INSERT INTO sample_image (embedded_vector) values (array[0.1, 0.2, 0.3, 0.4, 0.5]); INSERT 0 1 exampledb=# select * from sample_image; id | embedded_vector ----+----------------------- 1 | [1,2,3,4,5] 2 | [0.1,0.2,0.3,0.4,0.5] (2 rows)
-
To improve search performance, create an index on the vector column.
Hierarchical Navigable Small World (HNSW) and Inverse Vertex File Flat (IVFFlat) are indexing methods used in the context of vector similarity search, commonly employed in applications involving machine learning, data mining, or information retrieval.
exampledb=# CREATE INDEX sample_image_embedded_vector_idx on sample_image using hnsw(embedded_vector vector_l2_ops); CREATE INDEX
-
There are various operators available in pgvector that can help you perform nearest neighbor searches efficiently. For more information on supported operators and usage examples, see the pgvector documentation {: external}.
exampledb=# select * from sample_image ORDER BY embedded_vector <=>'[0.1,0.2,0.3,0.4,0.5]'::vector ; id | embedded_vector ----+----------------------- 2 | [0.1,0.2,0.3,0.4,0.5] 1 | [1,2,3,4,5] (2 rows)