
AI Series Blog: What Marketers Need to Know About RAG Pipelines
Context Is King!
What Marketers Need to Know About Retrieval Augmented Generation (RAG) Pipelines
By Srinivasan Margabandhu, Executive Director, AI Research
and Chris Tratar, Sr. Director of Product Marketing
As marketers, we have lived by the phrase “content is king” when it comes to engaging with our customers and prospective buyers. Good content keeps people engaged, but only if they can find it. Websites have been the main channel for delivering content to our buyers. While content and the website still play very important roles, with the emergence of AI, context may be emerging as the new king: AI must understand the context of what the visitor wants to learn in order to deliver a personalized experience and accurate answers.
In the world of Generative AI, Retrieval-Augmented Generation (RAG) is a critical component for making sure AI answers are accurate and relevant in context. RAG Pipelines provide context in a sea of noise. If you’re unfamiliar, RAG enhances Large Language Models (LLMs) by equipping them with external knowledge, making them incredibly powerful for search and question-answering tasks. Imagine your agents pulling information from your website, campaigns, resource sheets, and blog pages to provide accurate and up-to-date responses!
But what exactly does all that mean and how does it work?
RAG Pipelines Supercharge Large Language Models (LLMs)
First, let’s refresh on what LLMs and RAG pipelines are and how they work together. Large Language Models (LLMs) are powerful AI engines trained on vast amounts of data. However, they have limitations: their knowledge is based on their training data and can quickly become outdated, they aren’t trained on specific proprietary data, and they can struggle with contextual understanding. This can lead to inaccurate or “hallucinated” responses. Retrieval-Augmented Generation (RAG) addresses these issues by providing LLMs with temporary, real-time access to specific, up-to-date content relevant to each query, essentially giving the LLM and your agents a “cheat sheet” for any given conversation.
Why RAG Matters
Secure Access to Your Data
Many businesses worry about AI security: Will my data be safe? Will it be used to train AI models? RAG offers a solution. It lets LLMs temporarily use your company’s specific data without using it for training. Your data stays secure because RAG only accesses it when needed to answer a specific question. This security makes AI usable in sensitive business environments.
Up-To-Date and Relevant Information
LLMs aren’t usually trained on your company’s specific details (like your team, products, or value proposition). Without RAG, they might struggle to understand what your customers need or answer their questions accurately. This can lead to a poor customer experience and potentially lost business.
Understanding Context
How does RAG help AI understand context? Consider the word “figure.” It can mean a historical person, a math calculation, the human body, or a figure of speech. While LLMs know all of these meanings, they need RAG to determine which one a specific query intends. Without RAG to pinpoint the right context and supply recent, specific information, the AI interaction can be frustrating for the user.
How RAG Works
Do you remember open-book exams? (If you do, congrats: you’re now the audience for classic rock radio stations. Also, do you remember radio?) Think of RAG as giving a student access to the right textbooks before an open-book test. This helps ensure they give the most accurate and complete answers.
In short, RAG is all about efficiently organizing information and providing the right context to LLMs so they can give accurate and helpful answers.
Let’s break it down even further and see how PathFactory uses RAG Pipelines. There are three key capabilities that make PathFactory RAG work.
#1 Chunking: Breaking Down Content
Content comes in many forms. To make it easier for LLMs to use and faster to get the right answers, we need to break it into smaller, manageable pieces called “chunks.” Good chunking involves:
- Chunk Size: Finding the right size, neither too small (losing information) nor too big (confusing the model with too much at once).
- Chunking Methods: Using techniques like sentence/paragraph-level, size-based, semantic, or agentic chunking.
- Purposeful Chunking: Choosing the method based on the specific goal, like information retrieval or recommendations.
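To make the idea concrete, here is a minimal sketch of size-bounded, sentence-level chunking. The splitting rule and the 200-character cap are illustrative assumptions, not PathFactory’s actual method:

```python
import re

def chunk_text(text, max_chars=200):
    """Split text into chunks at sentence boundaries, capping chunk size.

    A toy illustration of size-bounded, sentence-level chunking;
    production systems often use semantic or agentic strategies instead.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk if adding this sentence would exceed the cap.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Notice the trade-off the bullet list describes: a tiny `max_chars` would split related sentences apart, while a huge one would pack unrelated ideas into one chunk.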
#2 Data Preparation (Offline)
This is the initial setup. Content is processed, broken into chunks, and converted into numerical representations (vector embeddings). These embeddings are stored in a “vector store.”
Example: For the word “figure,” chunks might be created for “mathematical figure” (containing math terms), “historical figure” (containing history terms), and “human figure” (containing body terms).
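A toy version of this offline step might look like the following, where a simple bag-of-words count stands in for a real learned embedding model (an assumption for illustration only; real vector embeddings are dense vectors that capture meaning, not word overlap):

```python
from collections import Counter

def embed(text):
    # Toy "embedding": word counts as a sparse vector. A real system
    # would call an embedding model to get a dense semantic vector.
    return Counter(text.lower().split())

def build_vector_store(chunks):
    # Offline step: embed every chunk once and keep (vector, text) pairs.
    return [(embed(chunk), chunk) for chunk in chunks]

# Illustrative "figure" chunks, mirroring the example above.
chunks = [
    "figure: a number used in a mathematical calculation",
    "figure: a notable person in history",
    "figure: the shape or form of the human body",
]
store = build_vector_store(chunks)
```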
#3 Answering Questions (Online)
When someone asks a question, the system processes the query, finds the most relevant chunks from the vector store, and uses those chunks to provide context for the LLM to generate a response.
Example: If the question is “who are some famous figures from the 18th century?”, the system identifies “figure” chunks and understands from the context of the question that it refers to a “historical” figure. It then retrieves the right chunks and sends them to the LLM to help it form the answer. The LLM also uses its own knowledge and any company-specific data to give the best response.
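Continuing the toy example, the online step can be sketched as a similarity search over stored chunks, again using bag-of-words counts as a stand-in for real embeddings (the chunks and prompt format here are invented for illustration):

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

store = [
    "figure: a number used in a mathematical calculation",
    "figure: a notable person in history",
    "figure: the shape or form of the human body",
]

def retrieve(query, store, k=1):
    # Online step: rank stored chunks by similarity to the query.
    q = embed(query)
    ranked = sorted(store, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# The word "history" in the query pulls in the historical-figure chunk,
# which is then handed to the LLM as context.
context = retrieve("who are some famous figures from history", store)
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: ..."
```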
RAG Challenges and How PathFactory Solves Them
While RAG pipelines greatly improve AI, they do have limitations. Without careful development and fine-tuning, issues can arise that hurt response accuracy and customer experience. Some key challenges include:
- Limited Context Window: Large Language Models (LLMs) can only process a certain amount of text at a time.
- “Lost-in-the-Middle” Problem: LLMs sometimes have difficulty processing information located in the middle of very long text passages.
- Text-Based Focus: Traditional RAG primarily deals with text, which can be restrictive.
- Handling Structured Data: Processing tables and other structured information can be complex.
ChatFactory AI Agents address these RAG limitations through an “Ensemble RAG” approach: multiple RAG agents, acting as “research assistants,” collaborate to locate the right information depending on the specific query type.
ChatFactory’s Ensemble RAG: A Multi-Layered Approach
Most RAG systems do what is called a search and re-rank on the content they are trained on. PathFactory takes a different, question-based approach, combining ensemble and graph RAG techniques, and has invested significantly in refining the RAG pipeline in ChatFactory to suit large, complex enterprises. Rather than relying on a single RAG method, ChatFactory employs an “Ensemble RAG” strategy: multiple RAG pipelines working together to deliver the most accurate and helpful answer for any question. Our Ensemble RAG includes:
- Direct/Quick RAG: For simple, straightforward queries. It responds faster than the other methods.
- Query Intent RAG: For more complex queries, like follow-ups, that require deeper analysis.
- Graph RAG: For intricate queries that involve interconnected information and reasoning. It uses knowledge graphs to link entities and provide context.
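As an illustration only (the cues and thresholds below are invented, not ChatFactory’s actual routing logic), a router that picks among such an ensemble might look like:

```python
def route_query(query, is_followup=False):
    """Hypothetical router choosing among ensemble RAG pipelines.

    A simplified sketch of the routing idea; the keyword cues and the
    length threshold are illustrative assumptions.
    """
    relational_cues = ("compare", "relationship", "how does", "connect")
    if any(cue in query.lower() for cue in relational_cues):
        return "graph_rag"         # interconnected entities and reasoning
    if is_followup or len(query.split()) > 15:
        return "query_intent_rag"  # deeper analysis of complex queries
    return "direct_rag"            # quick path for simple questions
```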
Controlling Context and Content: Conversation Collections
ChatFactory uses “Conversation Collections” to define the specific scope of information used to answer a buyer’s question. This “ring-fencing” ensures responses are relevant and accurate.
Alongside collections, a query exception handler can operate in one of three modes:
- Strict Mode: Only answers questions related to the defined collection.
- Open Mode: Allows the LLM to answer any question, but with a higher chance of errors, unwanted data appearing in answers, or users gaming the algorithm.
- Conditional Mode: Checks if the question is relevant to the collection before allowing the LLM to answer.
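In rough pseudocode terms, the three modes could be sketched like this; the substring-based relevance check is a deliberately crude stand-in for whatever scoring a real system would use against a Conversation Collection:

```python
def handle_query(query, collection_topics, mode="conditional"):
    """Sketch of a query exception handler with the three modes above.

    collection_topics and the relevance check are illustrative stand-ins.
    """
    relevant = any(topic in query.lower() for topic in collection_topics)
    if mode == "strict":
        # Only respond to questions inside the defined collection.
        return "answer" if relevant else "decline"
    if mode == "open":
        # Answer anything: flexible, but riskier (errors, off-topic data).
        return "answer"
    # Conditional: verify relevance before handing the query to the LLM.
    # In this toy, strict and conditional both decline irrelevant queries;
    # in a real system they differ in what sources the LLM may draw on.
    return "answer" if relevant else "decline"
```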
The Future is RAG
It is important for us as marketers to understand RAG pipelines and to carefully evaluate vendors on how they deliver AI services in their agents and apps. AI agents don’t have to be a mysterious black box; as you evaluate AI vendors, be sure to ask these three questions:
- Does your AI keep my data safe and out of the training data for the underlying LLM? The answer should be yes.
- Can I train your AI on a limited set of data that is unique to my organization? Again, the answer should be yes.
- How does your AI determine the context of a user’s question and ensure it provides the highest-quality answer? You can use this blog to know what to look for.
Picking a vendor with strong RAG pipeline capabilities will ensure that your data stays secure and that your customers and prospects have high-quality, valuable conversations, educating themselves on your company, products, and services in a completely 1:1 personalized experience where context is king.