Overview
Re-ranking is an essential step in the RAG framework that refines search results by adjusting their order based on contextual relevance. Using a local re-ranking AI model, the RAG framework ensures that the most meaningful and relevant content is prioritized, leading to highly accurate responses tailored to user queries.How It Works
In a hybrid search setup, initial search results are retrieved through a combination of vector similarity (semantic relevance) and text-based metrics (keyword relevance). While this initial approach is effective, the relevance of each result can vary, especially for complex queries requiring specific context. Re-ranking enhances accuracy by analyzing the content of top results and adjusting their order to prioritize those that most closely match the query’s intent. The re-ranking model is deployed locally within the Docker setup, providing fast, cost-effective processing without reliance on external APIs. This model re-evaluates the relevance of the retrieved documents based on the full query context, adding an extra layer of precision.Re-ranking Model
The RAG framework uses a cross-encoder model (cross-encoder-ms-marco-MiniLM-L-6-v2) from Weaviate’s reranker-transformers module. This model compares the query with each retrieved result, scoring them based on contextual fit. Higher-scoring results are moved to the top of the list, ensuring that the response is generated from the most relevant documents.Docker Setup
Docker compose example:- Weaviate Service: Configured to use the reranker-transformers module and communicate with the local re-ranking model at http://reranker-transformers:8080.
- Re-ranking Model: Runs locally as a separate service, using the cross-encoder model to re-evaluate and reorder search results based on contextual alignment with the query.
Example Workflow
Initial Retrieval
The RAG framework retrieves an initial set of documents using hybrid search
(or NearText), returning more documents than needed for comprehensive
evaluation.
Re-ranking with Cross-encoder
The re-ranking model compares each document to the query, assigning scores
that reflect contextual relevance, ensuring the best matches are identified.