Most Retrieval-Augmented Generation (RAG) systems treat retrieval as static. The retriever fetches documents using fixed parameters (k, reranker, embedding model), and the generator produces an answer.
But retrieval quality varies from query to query: question complexity, domain coverage, and how much context the answer actually needs all differ between requests. A static pipeline cannot adapt to these variations.
The fix: treat retrieval as a multi-armed bandit problem, where each arm is a retrieval configuration (a particular k, reranker, or embedding model).
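As a concrete sketch, a UCB1 bandit can choose among retrieval configurations and learn which one pays off. The arm names and per-arm success rates below are illustrative assumptions, not measurements:

```python
import math
import random

class RetrievalBandit:
    """UCB1 bandit over retrieval configurations ("arms")."""

    def __init__(self, arms):
        self.arms = arms
        self.counts = {a: 0 for a in arms}    # times each arm was pulled
        self.values = {a: 0.0 for a in arms}  # running mean reward per arm

    def select(self):
        # Pull every arm once before applying the UCB rule.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        total = sum(self.counts.values())
        # UCB1: empirical mean plus an exploration bonus that shrinks
        # as an arm accumulates pulls.
        return max(
            self.arms,
            key=lambda a: self.values[a]
            + math.sqrt(2 * math.log(total) / self.counts[a]),
        )

    def update(self, arm, reward):
        # Incremental update of the running mean reward.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Hypothetical arms: three retrieval configurations.
arms = ["k=3", "k=10", "k=10+rerank"]
bandit = RetrievalBandit(arms)

# Simulated environment with assumed success rates, for illustration only.
random.seed(0)
true_rate = {"k=3": 0.4, "k=10": 0.55, "k=10+rerank": 0.7}
for _ in range(2000):
    arm = bandit.select()
    reward = 1.0 if random.random() < true_rate[arm] else 0.0
    bandit.update(arm, reward)
```

Over enough queries the pull counts concentrate on the best-performing configuration, while the exploration bonus keeps the others from being abandoned prematurely.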
Then optimize the generator using reinforcement learning (e.g., GRPO), with rewards shaped by correctness, helpfulness, citations, or cost.
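A shaped reward along these lines might fold the signals into a single scalar. The components, weights, and function signature below are illustrative assumptions, not a prescribed recipe:

```python
def shaped_reward(answer, gold, cited_ids, retrieved_ids, tokens_used,
                  w_correct=1.0, w_cite=0.3, w_cost=0.0005):
    """Composite reward for an RL objective such as GRPO (sketch)."""
    # Correctness: exact match here; in practice an LLM judge or EM/F1.
    correct = 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0
    # Citation coverage: fraction of cited passages actually retrieved.
    if cited_ids:
        cite = len(set(cited_ids) & set(retrieved_ids)) / len(cited_ids)
    else:
        cite = 0.0
    # Cost penalty: discourage needlessly long generations.
    return w_correct * correct + w_cite * cite - w_cost * tokens_used

# A correct, fully cited answer at 120 tokens outscores a wrong,
# uncited one at 50 tokens.
r_good = shaped_reward("Paris", "paris", ["d1", "d2"], ["d1", "d2", "d3"], 120)
r_bad = shaped_reward("Lyon", "paris", [], ["d1"], 50)
```

Keeping each component separately tunable makes it easy to trade answer quality against token cost without retraining the reward logic itself.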
This closes the loop: retrieval influences generation quality, and the generation reward in turn updates the retrieval strategy.
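A minimal sketch of that feedback loop, assuming a hypothetical epsilon-greedy chooser over two retrieval strategies and a random stand-in for the generator's scored output:

```python
import random

random.seed(1)

# Running [mean reward, pull count] per retrieval strategy.
# The strategy names and success rates are hypothetical stand-ins.
stats = {"sparse": [0.0, 0], "dense": [0.0, 0]}

def pick_strategy(eps=0.2):
    # Epsilon-greedy: mostly exploit the best-scoring strategy.
    if random.random() < eps:
        return random.choice(list(stats))
    return max(stats, key=lambda s: stats[s][0])

def generation_reward(strategy):
    # Stand-in for running the generator and scoring its answer.
    rate = {"sparse": 0.3, "dense": 0.6}[strategy]
    return 1.0 if random.random() < rate else 0.0

for _ in range(1000):
    strategy = pick_strategy()            # retrieval shapes generation
    reward = generation_reward(strategy)  # generation is scored
    mean, n = stats[strategy]
    stats[strategy] = [mean + (reward - mean) / (n + 1), n + 1]  # score updates retrieval
```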
The result: an adaptive RAG system that improves over time instead of degrading.