Skip to main content

Overview

Re-ranking is an essential step in the RAG framework that refines search results by adjusting their order based on contextual relevance. Using a local re-ranking AI model, the RAG framework ensures that the most meaningful and relevant content is prioritized, leading to highly accurate responses tailored to user queries.

How It Works

In a hybrid search setup, initial search results are retrieved through a combination of vector similarity (semantic relevance) and text-based metrics (keyword relevance). While this initial approach is effective, the relevance of each result can vary, especially for complex queries requiring specific context. Re-ranking enhances accuracy by analyzing the content of top results and adjusting their order to prioritize those that most closely match the query’s intent. The re-ranking model is deployed locally within the Docker setup, providing fast, cost-effective processing without reliance on external APIs. This model re-evaluates the relevance of the retrieved documents based on the full query context, adding an extra layer of precision.

Re-ranking Model

The RAG framework uses a cross-encoder model (cross-encoder-ms-marco-MiniLM-L-6-v2) from Weaviate’s reranker-transformers module. This model compares the query with each retrieved result, scoring them based on contextual fit. Higher-scoring results are moved to the top of the list, ensuring that the response is generated from the most relevant documents.

Docker Setup

Docker compose example:
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.27.0
    ports:
      - 8080:8080
    environment:
      ENABLE_MODULES: "text2vec-openai,reranker-transformers"
      RERANKER_INFERENCE_API: "http://reranker-transformers:8080"
      # Additional Weaviate configuration

  reranker-transformers:
    image: cr.weaviate.io/semitechnologies/reranker-transformers:cross-encoder-ms-marco-MiniLM-L-6-v2
    environment:
      ENABLE_CUDA: "0" # Use CPU for inference
    ports:
      - 5000:8080
Configuration Details:
  • Weaviate Service: Configured to use the reranker-transformers module and communicate with the local re-ranking model at http://reranker-transformers:8080.
  • Re-ranking Model: Runs locally as a separate service, using the cross-encoder model to re-evaluate and reorder search results based on contextual alignment with the query.

Example Workflow

1

Initial Retrieval

The RAG framework retrieves an initial set of documents using hybrid search (or NearText), returning more documents than needed for comprehensive evaluation.
2

Re-ranking with Cross-encoder

The re-ranking model compares each document to the query, assigning scores that reflect contextual relevance, ensuring the best matches are identified.
3

Re-ordered Results

The results are re-ranked according to their scores. The top results (e.g., 5 or 10, based on configuration) are retained for further processing by an LLM, ensuring that only the most relevant information is used for the final response.