Architecting Enterprise AI Series: 7 Failure Points of RAG (Retrieval Augmented Generation) Systems

Artificial intelligence (AI) is becoming increasingly important in the enterprise as businesses seek ways to enhance decision-making, automate processes, and gain insights from data. AI systems, which generate outputs that can influence real or virtual environments, can help organizations achieve these objectives.

Enterprise AI is the application of AI technologies to optimize operations, enhance decision-making, and create new products and services. It covers several kinds of AI systems, such as classifiers, generative models, and recommenders.

Enterprise AI aims to leverage these AI capabilities to improve efficiency, reduce costs, gain a competitive advantage, and achieve strategic objectives.

RAG – Retrieval Augmented Generation and its Potential Failure Points

Large language models (LLMs) suffer from hallucinations and inaccurate context understanding. Since 2023, many RAG architectures have been proposed to reduce these problems. However, a RAG system itself also needs to be monitored for potential failure points (FP). Barnett et al. (2024) identify seven:

  • Missing content (FP1): A question is posed that cannot be answered with the available documents. In the ideal scenario, the RAG system responds with a message like “Sorry, I don’t know.” However, for questions related to content without clear answers, the system might be misled into providing a response.
  • Missed the top-ranked documents (FP2): The answer to the question is present in a document, but that document did not rank highly enough to be included in the results returned to the user. In theory all documents are ranked and used in subsequent steps; in practice only the top K documents are returned, where K is a value selected based on performance.
  • Not in context – consolidation strategy limitations (FP3): Documents containing the answer are retrieved from the database but fail to make it into the context used to generate a response. This occurs when a large number of documents are returned and the consolidation process that selects what goes into the context leaves out the relevant ones.
  • Not extracted (FP4): The answer is present in the context, but the model fails to extract the correct information. This typically happens when there is excessive noise or conflicting information in the context.
  • Wrong format (FP5): The question involves extracting information in a specific format, such as a table or list, and the model disregards the instruction.
  • Incorrect specificity (FP6): The response includes the answer but lacks the required specificity or is overly specific, failing to address the user’s needs. This occurs when RAG system designers have a predetermined outcome for a given question, such as teachers seeking educational content. In such cases, specific educational content should be provided along with answers. Incorrect specificity also arises when users are unsure how to phrase a question and are too general.
  • Incomplete (FP7): Incomplete answers are accurate but lack some information, even though that information was present in the context and available for extraction. For instance, a question like “What are the key points covered in documents A, B, and C?” would be better approached by asking these questions separately.
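
The first three failure points all occur before the model generates anything, so they can be observed in the retrieval step itself. The following is a minimal sketch, not a production retriever: the word-overlap score, the names `Doc`, `overlap_score`, and `retrieve`, and the parameters `top_k`, `min_score`, and `max_context_words` are all assumptions made for illustration. A real system would use an embedding or keyword index and a token-based budget.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str


def overlap_score(query: str, doc: Doc) -> float:
    """Toy relevance: fraction of query words that also appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def retrieve(query, docs, top_k=2, min_score=0.5, max_context_words=10):
    """Pick context documents and report where candidates were lost."""
    scored = [(overlap_score(query, d), d) for d in docs]
    # list.sort is stable, so tied documents keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)

    top, rest = scored[:top_k], scored[top_k:]
    diagnostics = {
        "abstain": False,
        # FP2: documents ranked below K never reach later steps.
        "cut_by_top_k": [d.doc_id for _, d in rest],
        # FP3: documents retrieved but dropped during consolidation.
        "cut_by_budget": [],
    }

    # FP1 guard (hypothetical threshold): if even the best match is weak,
    # signal the caller to answer "Sorry, I don't know" instead of guessing.
    if not top or top[0][0] < min_score:
        diagnostics["abstain"] = True
        return [], diagnostics

    # Consolidation: keep documents in rank order while they fit the budget.
    context, used_words = [], 0
    for _, doc in top:
        n_words = len(doc.text.split())
        if used_words + n_words <= max_context_words:
            context.append(doc)
            used_words += n_words
        else:
            diagnostics["cut_by_budget"].append(doc.doc_id)
    return context, diagnostics
```

Logging the returned diagnostics for each query makes it possible to tell a retrieval-side failure (FP1 to FP3) apart from a generation-side one (FP4 to FP7) when an answer turns out to be wrong.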

The table below contains the lessons Barnett et al. (2024) learned from solving each problem. These lessons need to be considered when architecting an enterprise RAG system.

Barnett, S., Kurniawan, S., Thudumu, S., Brannelly, Z. and Abdelrazek, M., 2024, April. Seven failure points when engineering a retrieval augmented generation system. In Proceedings of the IEEE/ACM 3rd International Conference on AI Engineering-Software Engineering for AI (pp. 194-199).