@hummusonrails

Sometimes, you really love your chair

I’m looking for my old chair, its leg part went up when I pressed a button but not too high! The leather was comfortable but not too expensive. It was brown, but like an earthy kind of brown, not brown brown.

Will Joey find his chair?

61% say that if they don’t find what they’re looking for within 5 seconds, they’ll go to another site.

• No brown chairs
• No earthy brown chairs
• No manual recliners
• No automatic recliners
… at least there were chairs?

But, this is not really about

Hello, my name is Ben. @hummusonrails

What we’re going to talk about
• What’s a vector? (i.e. Demystifying the jargon)
• (i.e. How vector search works. What’s it got to do with GenAI?)
• When would you use it?
• Ben, show me an example!

What’s a vector?

Warning: It involves NUMBERS

A vector is a list of numbers that represents words, phrases, images and more. Each number in the list holds some information about the meaning of the text.
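
As a rough illustration, a vector can be sketched as a plain list of numbers. The three features below (brownness, reclines, price level) and their values are made up for this example; a real embedding model learns many more dimensions, and they aren't individually labeled like this.

```python
# Hypothetical, hand-picked features: [brownness, reclines, price_level].
# Real embeddings are learned by a model, not labeled by hand.
chairs = {
    "earthy brown leather recliner": [0.8, 1.0, 0.5],
    "white plastic garden chair": [0.0, 0.0, 0.1],
}

for name, vector in chairs.items():
    print(f"{name}: {vector} ({len(vector)} dimensions)")
```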

A vector embedding turns words or sentences into short, dense (“lower-dimensional”) lists of numbers. Embeddings are compact and efficient, making tasks like searching and organizing text easier.

I’m looking for my old chair, its leg part went up when I pressed a button but not too high! The leather was comfortable but not too expensive. It was brown, but like an earthy kind of brown, not brown brown.

Moments later using an embeddings API…

Your application data transforms from…

Into vector representations, unlocking more accurate search, content recommendations and document comparisons

Is this really magic? It’s time to discuss EMBEDDING MODELS

Joey [ Needs A New Embedding Model ] 2.5, 1.1, 2.0, 0, 2.0, 2.1 … Chair

What are these embedding models?

Neural Networks

BERT: Bidirectional Encoder Representations from Transformers
• Bidirectional: reads what comes before and after a word for full context
• Trained on Wikipedia: pre-trained on a vast amount of data
• Top NLP benchmarks: achieved top scores for comprehension, common sense, sentiment analysis, and more
Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2024. All rights reserved.

GPT: Generative Pre-trained Transformer
• Autoregressive: predicts what word comes next based on previous words

How vector search works

LET’S TALK ABOUT
That’s a little too close for comfort

Closeness in vector search refers to how similar two vectors are, measured by the distance between them, with smaller distances indicating greater similarity.
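
Here is a minimal sketch of that idea: rank a few items by their distance to a query vector and keep the closest ones. The catalog, the three-number vectors and the `search` helper are all invented for illustration; a real system would get its vectors from an embedding model and would typically use an index rather than comparing against every item.

```python
import math

# Hypothetical 3-number vectors; real ones would come from an embedding model.
catalog = {
    "brown leather recliner": [0.9, 0.8, 0.1],
    "tan leather armchair": [0.7, 0.3, 0.2],
    "steel bar stool": [0.1, 0.0, 0.9],
}

def search(query_vector, items, top_k=2):
    """Return the top_k item names with the smallest distance to the query."""
    ranked = sorted(items, key=lambda name: math.dist(query_vector, items[name]))
    return ranked[:top_k]

query = [0.85, 0.75, 0.1]  # stand-in vector for Joey's description
print(search(query, catalog))
# ['brown leather recliner', 'tan leather armchair']
```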

The search query as a vector

How is closeness determined?

Euclidean
• Measures the straight-line distance between two points in Euclidean space.
• Often used in clustering (grouping by similarity) algorithms like K-means, where the actual distance between points is important.
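
The straight-line distance can be sketched in a few lines. The function name and sample points are just for illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```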

Cosine
• Measures the cosine of the angle between two vectors, indicating how similar their directions are.
• Commonly used in text analysis and information retrieval, where the orientation (not the magnitude) of the vectors matters: the meaning of the text, not the size of the text.
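
A small sketch of cosine similarity, with made-up vectors. The two "texts" below point in the same direction but differ in length, so their similarity comes out at (approximately) 1.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

short_text = [1.0, 2.0]    # hypothetical vector for a short text
long_text = [10.0, 20.0]   # same direction, ten times the magnitude

print(cosine_similarity(short_text, long_text))    # approximately 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0 (perpendicular)
```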

Manhattan Distance (L1)
• Measures the sum of the absolute differences between the coordinates of two vectors.
• Useful in high-dimensional spaces (lots of information) and in scenarios where the differences are linear and not squared (straightforward differences), such as certain machine learning algorithms and optimization problems.
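
In code, Manhattan distance is just adding up the absolute differences, coordinate by coordinate. The function name and numbers are illustrative.

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

# |1 - 4| + |2 - 0| + |3 - 3| = 3 + 2 + 0
print(manhattan_distance([1, 2, 3], [4, 0, 3]))  # 5
```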

Dot Product Similarity
• Measures the sum of the products of corresponding elements of two vectors.
• Higher sums equate to more similarity.
• Often used in information retrieval and recommendation systems, where the magnitude and direction of vectors are important.
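
A quick sketch of the dot product, using made-up vectors. Note how the second candidate points in the same direction as the first but is twice as long, and scores twice as high: magnitude counts here.

```python
def dot_product(a, b):
    """Sum of the products of corresponding elements."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

query = [1, 2]
print(dot_product(query, [1, 2]))  # 1*1 + 2*2 = 5
print(dot_product(query, [2, 4]))  # 1*2 + 2*4 = 10
```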

Keyword search starts

But, have you ever seen something like this?

When the query is more complex, vector search can be more performant

It’s even useful for more than just

Images, audio and video can be transformed into

What’s it got to do with GENAI?

Chatbots without context

HALLUCINATE

Retrieval Augmented Generation
1. Query Embedding & Retrieval: convert to vectors, find relevant documents
2. Contextual Augmentation: enhance with content, add background details
3. Text Generation: ensure relevance, generate coherent text
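
The retrieval and augmentation steps can be sketched roughly as follows. Everything here is a stand-in: `toy_embed` just counts words from a tiny made-up vocabulary (a real pipeline would call an embedding model), the documents are invented, and the final generation step is left out, since it would be a call to whichever language model you use.

```python
import math

VOCAB = ["chair", "recliner", "leather", "delivery", "refund"]  # toy vocabulary (assumption)

def toy_embed(text):
    """Stand-in for a real embedding model: counts vocabulary words."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def retrieve(question, documents, top_k=1):
    """Step 1: embed the query and find the most similar documents."""
    q = toy_embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, toy_embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_docs):
    """Step 2: augment the question with the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

documents = [
    "Every leather recliner ships with a two year warranty",
    "A refund is available within 30 days of delivery",
]
question = "How do I get a refund after delivery"
prompt = build_prompt(question, retrieve(question, documents))
print(prompt)  # Step 3 would send this text to a language model
```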

When would you use it?

Finding and retrieving similar items quickly and accurately in large datasets

1. Recommendation systems
2. Anomaly detection
3. Natural language processing
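
To make the anomaly detection case a bit more concrete, here is one simple way it could look: flag any vector that sits far from the average of the group. The transaction vectors, their two features and the threshold are all invented for this sketch, and production systems tend to use more robust methods.

```python
import math

# Hypothetical transaction vectors: [amount_scaled, hour_scaled]
transactions = {
    "t1": [0.10, 0.50],
    "t2": [0.12, 0.55],
    "t3": [0.11, 0.48],
    "t4": [0.95, 0.05],
}

def centroid(vectors):
    """Average position of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def find_anomalies(items, threshold):
    """Names of items whose distance from the centroid exceeds the threshold."""
    center = centroid(list(items.values()))
    return [name for name, vec in items.items() if math.dist(vec, center) > threshold]

print(find_anomalies(transactions, threshold=0.5))  # ['t4']
```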

1. Exact match retrieval
2. Real-time transaction processing
3. Small, simple datasets

Online fraud costs £27 billion annually in the UK

Ben, show me an example!

Where does every developer experiment with new technologies?

PERSONAL SITE

Mastering Vector Search for Developers
Est. Publish Date: February 2025, Published by: The Pragmatic Bookshelf
• Intro to vector search
• Building vector search in Node.js
• Implementing similarity measures with JavaScript
• Creating search functionality with Node.js examples
• Optimizing search performance in JavaScript
• Real-world Node.js applications for vector search
vectorsearchbook.com

Thank you!