Search (RAG)

The POST /v1/search endpoint searches for chunks within your collections using a query prompt. It is the retrieval step of RAG (Retrieval-Augmented Generation): the chunks most relevant to the query are returned so that they can be added to a language model's context, letting the model generate responses grounded in your own documents and knowledge base.

Figure: RAG search flow

Search Methods

OpenGateLLM supports multiple search methods:

| Method   | Description                                  |
|----------|----------------------------------------------|
| semantic | Vector similarity search using embeddings    |
| lexical  | Keyword-based search (BM25)                  |
| hybrid   | Combination of semantic and lexical search   |
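
For intuition on how a hybrid method can combine two rankings, here is a minimal sketch of Reciprocal Rank Fusion (RRF), the technique that the `rff_k` request parameter refers to. It is an illustration under assumptions, not OpenGateLLM's actual implementation: the function name `rrf_fuse` and the chunk IDs are hypothetical, and ranks are assumed to start at 1.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=20):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns (chunk_id, score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs, as each method might rank them for one query.
semantic_ranking = ["c3", "c1", "c7"]
lexical_ranking = ["c1", "c9", "c3"]

fused = rrf_fuse([semantic_ranking, lexical_ranking], k=20)
for chunk_id, score in fused:
    print(chunk_id, round(score, 4))
```

A larger `k` flattens the difference between top and lower ranks, so chunks that appear in both lists tend to win over chunks ranked first by only one method.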

Search query

  • prompt: Search query (required)
  • collections: List of collection IDs to search in (required)
  • method: Search method (default: semantic)
  • limit: Number of results to return (default: 10, max: 200)
  • offset: Pagination offset (default: 0)
  • rff_k: Reciprocal Rank Fusion (RRF) constant for hybrid search (default: 20)
  • score_threshold: Minimum similarity score (0.0 to 1.0; applies only to the semantic method)

curl -X POST http://localhost:8000/v1/search \
  -H "Authorization: Bearer <api_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is machine learning?",
    "collections": [1, 2],
    "method": "semantic",
    "limit": 10,
    "score_threshold": 0.7
  }'

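
The same request can be sketched in Python using only the standard library. The helper names (`build_search_payload`, `search`, `page_payloads`) are hypothetical, and two points are assumptions rather than documented behavior: that `offset` counts results (so page n starts at n × `limit`), and that `rff_k` can simply be omitted when unset. The response schema is not described in this section, so the sketch returns the decoded JSON untouched.

```python
import json
import urllib.request

MAX_LIMIT = 200  # documented upper bound for `limit`
METHODS = {"semantic", "lexical", "hybrid"}


def build_search_payload(prompt, collections, method="semantic", limit=10,
                         offset=0, rff_k=None, score_threshold=None):
    """Build the JSON body for POST /v1/search, checking documented limits."""
    if not prompt:
        raise ValueError("prompt is required")
    if not collections:
        raise ValueError("collections is required")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if limit > MAX_LIMIT:
        raise ValueError(f"limit must not exceed {MAX_LIMIT}")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    payload = {
        "prompt": prompt,
        "collections": list(collections),
        "method": method,
        "limit": limit,
        "offset": offset,
    }
    if score_threshold is not None:
        # Documented as applying to the semantic method only.
        if method != "semantic":
            raise ValueError("score_threshold only applies to semantic search")
        if not 0.0 <= score_threshold <= 1.0:
            raise ValueError("score_threshold must be between 0.0 and 1.0")
        payload["score_threshold"] = score_threshold
    if rff_k is not None:
        payload["rff_k"] = rff_k
    return payload


def search(base_url, api_key, payload):
    """POST the payload to /v1/search and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def page_payloads(prompt, collections, page_size, pages, **kwargs):
    """Yield one payload per page, advancing `offset` in steps of `limit`."""
    for page in range(pages):
        yield build_search_payload(prompt, collections, limit=page_size,
                                   offset=page * page_size, **kwargs)


payload = build_search_payload("What is machine learning?", [1, 2],
                               score_threshold=0.7)
# Requires a running server and a valid key:
# results = search("http://localhost:8000", "<api_key>", payload)
```

Validating on the client side is optional, but it surfaces mistakes such as an oversized `limit` before a request is sent.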
Info: See Configuration for more details.