Overview
Re-ranking is a step in the RAG framework that refines search results by reordering them according to contextual relevance. The framework runs a local re-ranking model so that the most relevant content is prioritized before a response is generated, which improves the accuracy of answers to user queries.
How It Works
In a hybrid search setup, initial results are retrieved by combining vector similarity (semantic relevance) with text-based metrics (keyword relevance). This first pass is effective, but the relevance of individual results can vary, especially for complex queries that require specific context. Re-ranking improves accuracy by analyzing the content of the top results and reordering them so that those most closely matching the query’s intent come first. The re-ranking model is deployed locally within the Docker setup, which provides fast, cost-effective processing without relying on external APIs. It re-evaluates each retrieved document against the full query context, adding an extra layer of precision.
Re-ranking Model
The RAG framework uses a cross-encoder model (cross-encoder-ms-marco-MiniLM-L-6-v2) from Weaviate’s reranker-transformers module. This model compares the query with each retrieved result, scoring them based on contextual fit. Higher-scoring results are moved to the top of the list, ensuring that the response is generated from the most relevant documents.
Docker Setup
The Docker Compose setup runs two services:
- Weaviate Service: Configured to use the reranker-transformers module and to communicate with the local re-ranking model at http://reranker-transformers:8080.
- Re-ranking Model: Runs locally as a separate service, using the cross-encoder model to re-evaluate and reorder search results based on contextual alignment with the query.
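The two services described above can be sketched as a Compose-style mapping. This is a minimal illustration, not the framework's actual configuration: the registry paths, the published ports, the `ENABLE_CUDA` setting, and the `<version>` placeholder are assumptions based on Weaviate's public images, while `ENABLE_MODULES` and `RERANKER_INFERENCE_API` are the environment variables Weaviate reads to enable the module and locate the inference service. Vectorizer and persistence settings are omitted. The helper name `build_compose` is hypothetical.

```python
import json

# Assumed image for the reranker inference container; the tag mirrors the
# model name given in the text.
RERANKER_IMAGE = (
    "cr.weaviate.io/semitechnologies/reranker-transformers:"
    "cross-encoder-ms-marco-MiniLM-L-6-v2"
)


def build_compose() -> dict:
    """Return a Compose-style mapping for Weaviate plus the local reranker."""
    return {
        "services": {
            "weaviate": {
                # Placeholder: pin whichever Weaviate version the deployment uses.
                "image": "cr.weaviate.io/semitechnologies/weaviate:<version>",
                # Assumed port mapping (HTTP and gRPC).
                "ports": ["8080:8080", "50051:50051"],
                "environment": {
                    # Enable the reranker module inside Weaviate.
                    "ENABLE_MODULES": "reranker-transformers",
                    # Address of the local re-ranking service on the Compose network.
                    "RERANKER_INFERENCE_API": "http://reranker-transformers:8080",
                },
                "depends_on": ["reranker-transformers"],
            },
            "reranker-transformers": {
                "image": RERANKER_IMAGE,
                # Assumption: CPU-only inference.
                "environment": {"ENABLE_CUDA": "0"},
            },
        }
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the printed document can be saved as a Compose file.
    print(json.dumps(build_compose(), indent=2))
```

The service name `reranker-transformers` doubles as the hostname on the Compose network, which is why the Weaviate service can reach the model at http://reranker-transformers:8080.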
Example Workflow
1. Initial Retrieval
The RAG framework retrieves an initial set of documents using hybrid search (or NearText), returning more documents than needed for comprehensive evaluation.
2. Re-ranking with Cross-encoder
The re-ranking model compares each document to the query, assigning scores that reflect contextual relevance, ensuring the best matches are identified.
3. Re-ordered Results
The results are re-ranked according to their scores. The top results (e.g., 5 or 10, based on configuration) are retained for further processing by an LLM, ensuring that only the most relevant information is used for the final response.
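The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the framework's implementation: `candidates` stands for the over-fetched results of the initial retrieval, and `lexical_overlap_score` is a hypothetical stand-in for the cross-encoder so that the example runs without a model. In a real deployment the score would come from the re-ranking service.

```python
from typing import Callable, List, Tuple


def lexical_overlap_score(query: str, document: str) -> float:
    """Stand-in scorer: fraction of query tokens that appear in the document.

    A real deployment would score each (query, document) pair with the
    cross-encoder instead.
    """
    query_tokens = set(query.lower().split())
    if not query_tokens:
        return 0.0
    doc_tokens = set(document.lower().split())
    return len(query_tokens & doc_tokens) / len(query_tokens)


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = lexical_overlap_score,
    top_k: int = 5,
) -> List[Tuple[float, str]]:
    """Score every candidate against the query and keep the top_k results."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    # The sort is stable, so ties keep their original retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Step 1: over-fetched candidates from the initial retrieval (hard-coded here).
    candidates = [
        "Weaviate supports hybrid search over vectors and keywords.",
        "Docker Compose can run several services on one host.",
        "A cross-encoder scores a query and a document together.",
    ]
    # Steps 2 and 3: score each candidate, reorder, and keep the best two.
    for score, doc in rerank("cross-encoder score document", candidates, top_k=2):
        print(f"{score:.2f}  {doc}")
```

Passing the scorer as a parameter keeps the reordering logic independent of the model, so the stand-in can be replaced by a call to the re-ranking service without changing `rerank`.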