RAG in Detail — How It Works

Aruna Pattam
9 min read · Apr 28, 2024

Banking compliance and regulatory oversight, particularly in Anti-Money Laundering (AML), Counter-Terrorist Financing (CTF), and Credit Risk, are cornerstones of a robust financial sector. Every day, compliance officers must interpret and apply intricate regulations, monitor transactions for suspicious activity, assess customer risk profiles, and ensure that loans and financial activities align with both internal and external risk thresholds. The challenge lies in the sheer volume and complexity of these tasks, often under the pressure of dynamic regulatory updates.

Generative AI large language models (LLMs), guided through prompt engineering, have the potential to aid compliance officers by quickly generating insights or responses to regulatory inquiries. However, these generic LLMs typically fall short because they lack access to customized, institution-specific data, which can lead to generic, less precise guidance that fails to reflect the unique risk frameworks and compliance policies of individual banks.

Retrieval-Augmented Generation (RAG) offers a sophisticated solution to these challenges. By integrating specific, up-to-date data from a particular bank’s documents, RAG tailors responses to the nuanced requirements of AML, CTF, and Credit Risk compliance. It ensures that the output not only aligns with global regulations but also adheres to the bank’s internal policies and risk assessments, delivering custom, actionable guidance that supports compliance officers in their critical roles.

Let’s now see how it works.

Retrieval-Augmented Generation (RAG) hinges on two key processes: retrieval and generation.

Retrieval Section

The retrieval process is the first critical step.

Here, when a compliance officer queries the system about a complex regulation or a risk assessment issue, the RAG system scours an extensive database of internal and external compliance documents. Using advanced search algorithms, it identifies and retrieves the document chunks most relevant to the specific question. This step ensures that the subsequent generation process is informed by accurate and up-to-date information, reflecting the intricacies of AML/CTF and Credit Risk protocols.

The retrieval section involves the following steps:

A.1. Input Data

In the retrieval phase of a RAG system, ‘Input Data’ forms the foundational step.

This involves curating a comprehensive dataset of compliance-related materials such as regulations like the Bank Secrecy Act, FATF recommendations, internal bank policies, KYC requirements, and past audit findings from diverse sources including legal databases and regulatory bodies. The integrity and breadth of this dataset are crucial as they underpin the system’s response capabilities to specific and complex queries.

For compliance officers, this dataset acts as a pivotal resource for all retrieval activities, requiring meticulous compilation to reflect both global and local compliance landscapes accurately.

A.2. Data Chunking

The process of “data chunking” is a foundational step in preparing a RAG system, particularly one tailored for banking compliance. It involves the strategic segmentation of lengthy compliance documents into smaller, manageable pieces or “chunks” that can be efficiently processed and analyzed by the system.

There are various strategies for document chunking that cater to different needs.

Small vs Large chunks: Small chunks are used for detailed queries and contain specific information, while large chunks provide a broader context suitable for more general questions.

Sliding or tumbling window for chunking: The sliding window approach overlaps chunks to ensure continuity and prevent information loss, ideal for complex interpretations. Conversely, the tumbling window technique produces distinct, non-overlapping segments, preferred for queries requiring standalone information.

Retrieve parent or linked chunks: Additionally, systems may opt to retrieve parent or linked chunks for enhanced context or rely solely on the initially retrieved data to maintain focus and relevance.

Employing a mixed strategy can help. Combining these approaches allows for detailed and context-rich interpretation of compliance documents without losing crucial information. Selectively retrieving linked chunks further enhances the contextual depth of the answers, ensuring that outputs are not only accurate but also comprehensively address the query within the complex framework of banking regulations.
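The windowing strategies above can be sketched in a few lines of Python. This is a minimal illustration, assuming word-based splitting and illustrative chunk sizes; production systems often split on tokens, sentences, or section boundaries instead, and the file name is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into word-based chunks.

    overlap > 0 yields a sliding window (adjacent chunks share words);
    overlap == 0 yields a tumbling window (distinct, non-overlapping chunks).
    """
    assert 0 <= overlap < chunk_size
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

# Hypothetical compliance document loaded as plain text.
policy_text = open("aml_policy.txt").read()

sliding_chunks = chunk_text(policy_text, chunk_size=500, overlap=100)  # continuity preserved
tumbling_chunks = chunk_text(policy_text, chunk_size=500, overlap=0)   # standalone segments
```

Smaller `chunk_size` values suit detailed queries, while larger values preserve broader context, matching the small-vs-large trade-off described above.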

A.3. Vectorisation and Embeddings

In this step, document chunks are transformed into vector embeddings through a vectorisation process.

Vectorisation is the process by which text from document chunks is converted into vector embeddings, using techniques such as TF-IDF or transformer-based models like BERT.

Vector embeddings are numerical representations that capture the semantic meaning of the text, allowing systems to process and analyse the content effectively.

This crucial step ensures that each chunk’s relevance to a specific query can be accurately assessed.

In compliance settings, where precision is critical, the quality of these embeddings is paramount. They must accurately reflect the intricate legal and technical nuances of compliance documentation to ensure that queries are matched with the most relevant and precise information available.
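As a minimal sketch of vectorisation, the snippet below uses the sentence-transformers library; the model name is an illustrative assumption, and a TF-IDF vectoriser from scikit-learn would follow the same pattern.

```python
from sentence_transformers import SentenceTransformer

# Illustrative model choice; any sentence-embedding model works the same way.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Customer due diligence must be completed before account opening.",
    "Transactions above the reporting threshold require enhanced review.",
]

# Each chunk becomes a fixed-length numerical vector capturing its semantic meaning.
embeddings = model.encode(chunks, normalize_embeddings=True)
print(embeddings.shape)  # e.g. (2, 384) for this model
```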

A.4. Vector Database

The vector embeddings created during vectorisation are systematically catalogued and stored in a vector database, each tagged with a unique identifier for the corresponding document chunk. This database is a critical component of the RAG system.

Factors to consider for the vector database include the choice of database engine for efficient data handling, and hosting options ranging from local servers (for security) to cloud platforms (for scalability). It is also crucial to store detailed metadata and devise a strategic indexing plan to enhance retrieval, ensuring rapid and precise access to compliance information in a dynamic regulatory environment.
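One possible sketch of such a store uses FAISS as a local vector index with a side table holding chunk IDs and metadata. The field names and identifiers are assumptions for illustration, and `embeddings` continues the vectorisation sketch above.

```python
import faiss
import numpy as np

dim = embeddings.shape[1]       # embedding width from the previous sketch
index = faiss.IndexFlatIP(dim)  # inner product equals cosine on normalised vectors
index.add(np.asarray(embeddings, dtype="float32"))

# Side table mapping FAISS row positions to chunk IDs and metadata.
metadata = {
    0: {"chunk_id": "kyc-policy-001", "source": "internal_kyc_policy.pdf"},
    1: {"chunk_id": "aml-threshold-007", "source": "aml_monitoring_manual.pdf"},
}
```

A managed vector database would replace this in-process index, but the pattern of pairing each vector with a unique chunk identifier and its metadata stays the same.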

Generation Section

The generation process follows, where the retrieved information serves as a custom data source for a Generative AI Large Language Model (LLM). In this phase, the model uses the insights gained from the retrieval process to craft a nuanced, context-aware response. Prompt engineering techniques refine this process by guiding the LLM to consider the specifics of the compliance officer’s query, including the bank’s unique policies and the latest regulatory developments.

The generation process begins by receiving a specific query from a compliance officer, which targets a particular compliance or regulatory issue. This query sets the stage for the subsequent steps in the Retrieval-Augmented Generation (RAG) system.

The generation section involves the following steps:

B.1. Receive User Query

The first step involves capturing the exact question posed by the compliance officer. This query could range from seeking clarification on a particular regulation to asking about compliance procedures for a new product or understanding risk assessment strategies. This step is critical as it defines the scope and focus of the retrieval and generation processes that follow.

B.2. Vectorisation and Embeddings of Question

In this critical step, the compliance officer’s query is transformed into a vector format.

This process involves converting the textual input (the question) into a set of numerical values that represent the semantic essence of the query in a high-dimensional space. Crucially, the query must be encoded with the same embedding model used for the document chunks, so that it is aligned with the vector space of the chunks previously processed and stored in the system. This alignment is vital for the subsequent retrieval phase, as it ensures that the query can be accurately compared against the vector embeddings of the document chunks.

The quality of the vectorisation directly impacts the system’s ability to interpret and match the query with relevant documents from the vast compliance database, thereby determining the precision and relevance of the information retrieved.

Techniques such as TF-IDF or more sophisticated neural embeddings like BERT might be used to achieve a detailed and nuanced vector representation of the query.
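Continuing the running sketch, the query is encoded with the same model used for the document chunks so that both occupy one vector space:

```python
query = "What is the reporting threshold for suspicious transactions?"

# Same model as the document chunks, so the vectors are directly comparable.
query_vec = model.encode([query], normalize_embeddings=True)
```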

B.3. Retrieve Document Chunk ID

In this stage, the system employs advanced search algorithms to determine which document chunks most closely align with the query based on their semantic similarity. This is done by comparing the vector embedding of the compliance officer’s query to the embeddings of the document chunks stored in the database.

The search techniques may include using metadata to guide the search or employing methods like Approximate Nearest Neighbors (ANN) to efficiently find the most similar embeddings. Hybrid search strategies might also be utilized, which combine multiple methods to optimize both the accuracy and speed of the search.

This comparison process is crucial as it identifies the document chunks that are most likely to contain the relevant information needed to answer the query accurately.
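Continuing the FAISS sketch, a top-k similarity search returns the row positions of the closest embeddings, which the metadata side table maps back to chunk IDs; `k = 3` is an illustrative choice.

```python
k = 3
scores, rows = index.search(np.asarray(query_vec, dtype="float32"), k)

# Map FAISS row positions back to chunk IDs (a row of -1 means no match found).
top_chunk_ids = [metadata[r]["chunk_id"] for r in rows[0] if r != -1]
```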

B.4. Retrieve Document Chunks from Storage

This step involves retrieving the actual text from the document chunks deemed most relevant in the previous step. Once the system has identified which chunks have the highest semantic similarity to the query — using their unique Document Chunk IDs — it then accesses the storage system to pull out these specific chunks. These IDs act as direct references or pointers to the locations of the chunks within the database or storage system, allowing for quick and precise retrieval.

This efficient mechanism ensures that compliance officers are provided with exactly the right sections of text needed to comprehensively address their queries. The process is crucial for delivering relevant and targeted information quickly, enabling effective and informed decision-making in compliance contexts.
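A minimal sketch of this lookup, assuming a simple in-memory store keyed by chunk ID (in practice this might be a document database or object storage):

```python
# Hypothetical key-value store from chunk ID to the original chunk text.
chunk_store = {
    "kyc-policy-001": "Customer due diligence must be completed before account opening.",
    "aml-threshold-007": "Transactions above the reporting threshold require enhanced review.",
}

retrieved_chunks = [chunk_store[cid] for cid in top_chunk_ids]
```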

B.5. Send to LLM with Question + Document Chunks + Prompt

In this phase of the generation process, the system integrates the text from the relevant document chunks that were retrieved, the original query from the compliance officer, and specifically crafted prompts into a cohesive input for the Large Language Model (LLM). This synthesis is designed to optimize the context provided to the LLM for generating a response.

Integration of Inputs: The retrieved document chunks and the query are combined to form a detailed context that encapsulates all the necessary information. This ensures that the LLM has access to both the specific query and the relevant background or supporting data, which helps in understanding the scope and specifics of the compliance question.

Prompt Engineering: Alongside the raw data, prompts are engineered to direct the LLM’s processing and response generation. These prompts are not simple commands but are carefully formulated to guide the LLM’s focus towards producing the most relevant and accurate answers. They might highlight key aspects of the query, suggest a format or structure for the response, or emphasize critical pieces of information that must be included in the output.

LLM Response Generation: With this rich backdrop of detailed document data and strategic prompts, the LLM is then tasked with generating a coherent and contextually appropriate answer. The effectiveness of this step hinges on the LLM’s ability to parse and synthesize the inputs into a response that not only answers the query directly but also adheres to compliance standards and practices.

This combined approach ensures that the answers generated are not only based on a deep understanding of the provided documents but are also tailored to the specific nuances and requirements of the query, resulting in highly informed and relevant responses for compliance officers.
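A sketch of how these inputs might be assembled is shown below. The prompt wording is illustrative, and the commented-out `call_llm` is a placeholder for whichever LLM API the institution uses rather than a specific vendor call.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine engineered instructions, retrieved context, and the user query."""
    context = "\n\n".join(f"[Document {i + 1}]\n{c}" for i, c in enumerate(chunks))
    return (
        "You are a banking compliance assistant. Answer strictly from the "
        "documents below; if they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(query, retrieved_chunks)
# answer = call_llm(prompt)  # placeholder for the chosen LLM API
```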

B.6. LLM Generated Answer

In this final phase, the Large Language Model (LLM) utilizes integrated data from the document chunks and query specifics to produce a comprehensive natural language response.

The LLM employs advanced processing techniques and structured prompts to achieve precise relevance and alignment with the query’s needs.

This step highlights the effectiveness of the RAG system, showcasing how it combines detailed prompt engineering and sophisticated AI capabilities to provide precise, directly applicable compliance guidance, thereby aiding compliance officers in making informed decisions.

RAG helps by ensuring the generative process is continuously fed with tailored, relevant data from the retrieval phase, providing compliance officers with precise, actionable, and regulation-compliant answers to their pressing queries.

Performance Metrics and Evaluation Techniques

To ensure the Retrieval-Augmented Generation (RAG) system functions effectively and efficiently, we employ several strategies to measure its performance:

Accuracy:

We evaluate the relevance and correctness of responses from the Large Language Model (LLM) against predefined benchmarks using precision and recall metrics. Semantic analysis tools are also utilized to assess the depth and appropriateness of responses to compliance queries.
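As an illustrative sketch, retrieval precision and recall can be computed against a labelled set of relevant chunk IDs; the ground-truth label below is hypothetical.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Fraction of retrieved chunks that are relevant, and of relevant chunks retrieved."""
    hits = sum(1 for cid in retrieved if cid in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ground truth for one benchmark query.
p, r = precision_recall(top_chunk_ids, relevant={"aml-threshold-007"})
```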

Speed:

The system’s response times are monitored through automated timers and logging. Real-time performance monitoring tools are also implemented to promptly identify and address processing delays.

Scalability:

Regular load tests help determine the system’s capability to handle increased data volumes and simultaneous queries. Additionally, we analyze CPU, memory, and storage usage to ensure the system scales effectively without losing efficiency.

User Satisfaction:

Feedback from compliance officers is collected via surveys to refine the system continuously. This feedback is crucial for iterative improvements and ensuring the system meets user needs and preferences.

These methods collectively maintain the RAG system’s reliability, helping compliance officers efficiently navigate regulatory challenges.

Conclusion

In conclusion, the Retrieval-Augmented Generation (RAG) system represents a transformative approach to handling complex decision-making and information management across diverse industries.

By effectively combining advanced retrieval techniques with cutting-edge generative AI, RAG equips professionals with accurate, timely, and contextually relevant insights tailored to specific inquiries. This system not only enhances the precision of responses by leveraging detailed, up-to-date data but also improves operational efficiency by simplifying the processing of large volumes of information amidst ongoing changes.

Looking forward, the continued adoption and adaptation of technologies like RAG will be crucial for organizations aiming to maintain a competitive edge and uphold the highest standards of efficiency and reliability in an increasingly complex world.
