# HyDE (Hypothetical Document Embeddings)

Instead of generating additional queries from the original question, [HyDE](https://arxiv.org/pdf/2212.10496) generates hypothetical documents for a given query. The intuition is that the embedding vector of a hypothetical document identifies a neighborhood in the corpus embedding space, and the real documents in that neighborhood are then retrieved by vector similarity. A generated passage usually resembles the corpus documents more closely than a short question does, so RAG can retrieve more relevant documents with it and answer the user query more accurately.
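The neighborhood idea can be illustrated with a toy sketch that needs no API key. Everything in it is an assumption made for illustration: the bag-of-words `embed` function stands in for a real embedding model, and the three-sentence `corpus` and the hand-written `hypothetical_doc` are contrived so that the effect is visible.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Contrived corpus: only the first entry is about low-rank adaptation.
corpus = [
    "LoRA freezes the pre-trained weights and trains rank decomposition matrices",
    "Chroma is a vector store that persists embeddings on disk",
    "How to work remotely: tips for adapters and travel chargers",
]


def retrieve(text, k=1):
    """Return the k corpus entries most similar to `text`."""
    query_vec = embed(text)
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


query = "How do low rank adapters work?"

# Hand-written here; in HyDE this passage would come from the LLM.
hypothetical_doc = (
    "Low rank adapters keep the pre-trained weights frozen and train small "
    "rank decomposition matrices that are added to the dense layers."
)

print(retrieve(query))             # matched on surface words of the question
print(retrieve(hypothetical_doc))  # matched on the vocabulary of the answer
```

In this contrived corpus the short question shares more words with the unrelated travel sentence, while the hypothetical passage lands next to the LoRA sentence. Real embedding models are far less brittle than word counts, so the gap is usually smaller in practice.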

Let's try to use HyDE to answer questions through RAG!

In [1]:
%load_ext dotenv
%dotenv secrets/secrets.env

In [2]:
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

First, as in the previous notebooks, we create our vector store and initialize the retriever using `OpenAIEmbeddings` and `Chroma`.

In [3]:
# Load the PDF files in data/
loader = DirectoryLoader('data/', glob="*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
text_chunks = text_splitter.split_documents(documents)

# Embed the chunks and store them in a persistent Chroma collection
vectorstore = Chroma.from_documents(documents=text_chunks,
                                    embedding=OpenAIEmbeddings(),
                                    persist_directory="data/vectorstore")
vectorstore.persist()

# Return the 5 most similar chunks for each query
retriever = vectorstore.as_retriever(search_kwargs={'k': 5})

Then we build a chain that asks the LLM to write a "hypothetical" passage answering the question.

In [5]:
from langchain.prompts import ChatPromptTemplate

hyde_prompt = ChatPromptTemplate.from_template(
    """
    Please write a scientific passage of a paper to answer the following question:\n
    Question: {question}\n
    Passage: 
    """
)

generate_doc_chain = (
    {'question': RunnablePassthrough()}
    | hyde_prompt
    | ChatOpenAI(model='gpt-4',temperature=0)
    | StrOutputParser()
)

In [21]:
question = "How Low Rank Adapters work in LLMs?"
generate_doc_chain.invoke(question)

"Low Rank Adapters (LRAs) are a recent development in the field of Large Language Models (LLMs) that aim to reduce the computational and memory requirements of these models while maintaining their performance. The fundamental principle behind LRAs is the use of low-rank approximations to reduce the dimensionality of the model's parameters.\n\nIn the context of LLMs, an adapter is a small neural network that is inserted between the layers of a pre-trained model. The purpose of this adapter is to adapt the pre-trained model to a new task without modifying the original parameters of the model. This allows for efficient transfer learning, as the pre-trained model can be adapted to a wide range of tasks with minimal computational cost.\n\nLow Rank Adapters take this concept a step further by using low-rank approximations to reduce the number of parameters in the adapter. This is achieved by decomposing the weight matrix of the adapter into two low-rank matrices. The resulting model has fewe

We then pipe the generated passage into our retriever to fetch the real documents most similar to it.

In [22]:
retrieval_chain = generate_doc_chain | retriever
# Pass the raw question string: generate_doc_chain already wraps it as {'question': ...}
retireved_docs = retrieval_chain.invoke(question)
retireved_docs

[Document(page_content='over-parametrized models in fact reside on a low intrinsic dimension. We hypothesize that the\nchange in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed\nLow-RankAdaptation (LoRA) approach. LoRA allows us to train some dense layers in a neural\nnetwork indirectly by optimizing rank decomposition matrices of the dense layers’ change during\nadaptation instead, while keeping the pre-trained weights frozen, as shown in Figure 1. Using GPT-3', metadata={'page': 1, 'source': 'data/LoRA.pdf'}),
 Document(page_content='over-parametrized models in fact reside on a low intrinsic dimension. We hypothesize that the\nchange in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed\nLow-RankAdaptation (LoRA) approach. LoRA allows us to train some dense layers in a neural\nnetwork indirectly by optimizing rank decomposition matrices of the dense layers’ change during\nadaptation instead, while keeping 

Finally, the documents retrieved with the "hypothetical" passage are used as the context to answer our original question through the `final_rag_chain`.

In [23]:
template = """Answer the following question based on the provided context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | ChatOpenAI(model='gpt-4',temperature=0)
    | StrOutputParser()
)

final_rag_chain.invoke({"context":retireved_docs,"question":question})

'Low-Rank Adapters (LoRA) work in Large Language Models (LLMs) by allowing the training of some dense layers in a neural network indirectly. This is done by optimizing rank decomposition matrices of the dense layers\' change during adaptation, while keeping the pre-trained weights frozen. The hypothesis is that the change in weights during model adaptation has a low "intrinsic rank".'
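The retrieval output above contains the same chunk more than once, and the chain receives the raw list of `Document` objects as `{context}`. A minimal sketch of a formatting helper that de-duplicates chunks and passes only their text is shown below. The `Doc` dataclass is a hypothetical stand-in that mirrors the `page_content` and `metadata` attributes of a LangChain `Document`, so the example runs without LangChain installed; `format_docs` is an illustrative name, not part of the original notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (same two attributes)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the unique chunk texts into one context string, keeping their order."""
    seen = set()
    parts = []
    for doc in docs:
        text = doc.page_content.strip()
        if text not in seen:  # skip chunks that were already included
            seen.add(text)
            parts.append(text)
    return "\n\n".join(parts)


docs = [
    Doc("  LoRA keeps the pre-trained weights frozen. "),
    Doc("LoRA keeps the pre-trained weights frozen."),
    Doc("Rank decomposition matrices are trained instead."),
]
context = format_docs(docs)
print(context)
```

In the notebook, this would be used as `final_rag_chain.invoke({"context": format_docs(retireved_docs), "question": question})`, which keeps metadata and repeated chunks out of the prompt.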

Although this technique can help answer questions, the answer may still be wrong if the hypothetical passage is incorrect or hallucinated, because the documents are retrieved based on that passage.
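One way to soften the effect of a single bad passage is to average the embeddings of several sampled passages together with the embedding of the question itself. To my understanding the HyDE paper describes a similar averaging step. The sketch below shows only the arithmetic; `toy_embed` is a hypothetical two-dimensional embedding used so the example runs offline, and `hyde_query_vector` is an illustrative name.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hyde_query_vector(embed_fn, question, hypothetical_docs):
    """Average the question embedding with the hypothetical-passage embeddings."""
    vectors = [embed_fn(question)] + [embed_fn(doc) for doc in hypothetical_docs]
    return mean_vector(vectors)


def toy_embed(text):
    # Hypothetical embedding: [number of words, number of characters].
    return [float(len(text.split())), float(len(text))]


vec = hyde_query_vector(toy_embed, "a b", ["a b c d", "a b c"])
print(vec)
```

With the objects from this notebook, `embed_fn` could be `OpenAIEmbeddings().embed_query`, and the averaged vector could be passed to `vectorstore.similarity_search_by_vector(vec, k=5)`. Sampling several different passages would also require a temperature above 0 in `generate_doc_chain`.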

In the next section, we talk about "Routing" in RAG.