Re-Ranking - Peak RAG Framework

Overview

Re-ranking is an essential step in the RAG framework that refines search results by adjusting their order based on contextual relevance. Using a local re-ranking AI model, the RAG framework ensures that the most meaningful and relevant content is prioritized, leading to highly accurate responses tailored to user queries.

How It Works

In a hybrid search setup, initial search results are retrieved through a combination of vector similarity (semantic relevance) and text-based metrics (keyword relevance). While this initial approach is effective, the relevance of each result can vary, especially for complex queries requiring specific context. Re-ranking enhances accuracy by analyzing the content of top results and adjusting their order to prioritize those that most closely match the query’s intent. The re-ranking model is deployed locally within the Docker setup, providing fast, cost-effective processing without reliance on external APIs. This model re-evaluates the relevance of the retrieved documents based on the full query context, adding an extra layer of precision.

Re-ranking Model

The RAG framework uses a cross-encoder model (cross-encoder-ms-marco-MiniLM-L-6-v2) from Weaviate’s reranker-transformers module. This model compares the query with each retrieved result, scoring them based on contextual fit. Higher-scoring results are moved to the top of the list, ensuring that the response is generated from the most relevant documents.

Docker Setup

Docker compose example:

services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.27.0
    ports:
      - 8080:8080
    environment:
      ENABLE_MODULES: "text2vec-openai,reranker-transformers"
      RERANKER_INFERENCE_API: "http://reranker-transformers:8080"
      # Additional Weaviate configuration

  reranker-transformers:
    image: cr.weaviate.io/semitechnologies/reranker-transformers:cross-encoder-ms-marco-MiniLM-L-6-v2
    environment:
      ENABLE_CUDA: "0" # Use CPU for inference
    ports:
      - 5000:8080

Configuration Details:

Weaviate Service: Configured to use the reranker-transformers module and communicate with the local re-ranking model at http://reranker-transformers:8080.
Re-ranking Model: Runs locally as a separate service, using the cross-encoder model to re-evaluate and reorder search results based on contextual alignment with the query.

Example Workflow

Initial Retrieval

The RAG framework retrieves an initial set of documents using hybrid search (or NearText), returning more documents than needed for comprehensive evaluation.

Re-ranking with Cross-encoder

The re-ranking model compares each document to the query, assigning scores that reflect contextual relevance, ensuring the best matches are identified.

Re-ordered Results

The results are re-ranked according to their scores. The top results (e.g., 5 or 10, based on configuration) are retained for further processing by an LLM, ensuring that only the most relevant information is used for the final response.

Vectorization

​Overview

​How It Works

​Re-ranking Model

​Docker Setup

​Example Workflow

Overview

How It Works

Re-ranking Model

Docker Setup

Example Workflow