Integration: Pinecone Document Store
Use a Pinecone database with Haystack
Pinecone is a fast and scalable vector database that you can use in Haystack pipelines through the PineconeDocumentStore. For a detailed overview of all the available methods and settings for the PineconeDocumentStore, visit the Haystack API Reference.
Installation
pip install farm-haystack[pinecone]
Usage
To use Pinecone as the data store for your Haystack LLM pipelines, you need a Pinecone account and an API key. Once you have those, you can initialize a PineconeDocumentStore for Haystack:
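from haystack.document_stores import PineconeDocumentStore

# Connect to Pinecone. embedding_dim must match the output size of the embedding
# model you use (768 here). If your Pinecone project is not in the default
# environment, also pass environment="YOUR_ENVIRONMENT".
document_store = PineconeDocumentStore(api_key="YOUR_API_KEY",
                                       similarity="cosine",
                                       embedding_dim=768)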
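The `similarity` and `embedding_dim` settings control how stored vectors are compared and how long each vector must be. The sketch below is a library-independent illustration, not Haystack or Pinecone code. It computes cosine similarity between two vectors and rejects any vector whose length differs from the configured dimension. The helpers `check_dim` and `cosine_similarity` are hypothetical, and the dimension is a toy value of 4 instead of 768.

```python
import math

EMBEDDING_DIM = 4  # toy value for illustration; the store above uses 768


def check_dim(vector, dim=EMBEDDING_DIM):
    """Raise if a vector's length doesn't match the index dimension."""
    if len(vector) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")
    return vector


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths (norms)."""
    check_dim(a)
    check_dim(b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]))  # 0.0 (orthogonal)
```

Cosine similarity compares only the direction of two vectors, so scaling a vector does not change its score. A plain dot product does not normalize and is therefore sensitive to vector length.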
Writing Documents to PineconeDocumentStore
To write documents to your PineconeDocumentStore, create an indexing pipeline or call the document store's write_documents() method.
For this step, you can use the available FileConverters and PreProcessors, as well as other integrations that help you fetch data from other sources. Below is an example indexing pipeline that indexes your Markdown files into a Pinecone database.
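A PreProcessor can split long documents into smaller chunks before they are embedded and written. In Haystack 1.x this behavior is controlled by parameters such as `split_by`, `split_length`, and `split_overlap`. The sketch below is a simplified, library-independent illustration of word-based splitting with overlap. `split_by_words` is a hypothetical helper. The real PreProcessor does more, for example respecting sentence boundaries.

```python
def split_by_words(text, split_length=5, split_overlap=2):
    """Split text into chunks of split_length words; consecutive chunks share split_overlap words."""
    if split_overlap >= split_length:
        raise ValueError("split_overlap must be smaller than split_length")
    words = text.split()
    step = split_length - split_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):  # last chunk reached the end of the text
            break
    return chunks


text = "Pinecone stores vectors and Haystack retrieves the most similar documents for a query"
for chunk in split_by_words(text):
    print(chunk)
```

The overlap repeats a few words at each chunk boundary. This keeps a sentence that straddles two chunks partly visible in both, at the cost of storing some text twice.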
Indexing Pipeline
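from haystack import Pipeline
from haystack.document_stores import PineconeDocumentStore
from haystack.nodes import EmbeddingRetriever, MarkdownConverter, PreProcessor

document_store = PineconeDocumentStore(api_key="YOUR_API_KEY",
                                       similarity="cosine",
                                       embedding_dim=768)
converter = MarkdownConverter()  # turns Markdown files into Documents
preprocessor = PreProcessor()    # cleans Documents and splits them into smaller chunks

indexing_pipeline = Pipeline()
indexing_pipeline.add_node(component=converter, name="MarkdownConverter", inputs=["File"])
indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["MarkdownConverter"])
indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])
indexing_pipeline.run(file_paths=["filename.md"])

# The pipeline writes the documents without embeddings. Compute them with the same
# model the query pipeline uses so that an EmbeddingRetriever can find them.
retriever = EmbeddingRetriever(document_store=document_store,
                               embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
document_store.update_embeddings(retriever)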
Using Pinecone in a Query Pipeline
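Once you have documents in your PineconeDocumentStore, it's ready to be used in any Haystack pipeline. For example, the pipeline below uses an EmbeddingRetriever to fetch relevant documents, then uses a custom prompt that asks the model to answer the query based on those documents. The EmbeddingRetriever must use the same embedding model that was used to compute the document embeddings.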
from haystack import Pipeline
from haystack.document_stores import PineconeDocumentStore
from haystack.nodes import AnswerParser, EmbeddingRetriever, PromptNode, PromptTemplate

document_store = PineconeDocumentStore(api_key="YOUR_API_KEY",
                                       similarity="cosine",
                                       embedding_dim=768)

# Use the same embedding model that was used to embed the indexed documents.
retriever = EmbeddingRetriever(document_store=document_store,
                               embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")

# {query} is filled with the question; {join(documents)} with the retrieved documents' text.
prompt_template = PromptTemplate(prompt="""Answer the following query based on the provided context. If the context does
not include an answer, reply with 'I don't know'.\n
Query: {query}\n
Documents: {join(documents)}
Answer:
""",
                                 output_parser=AnswerParser())

# The PromptNode sends the filled-in prompt to GPT-4 and parses the reply into an Answer.
prompt_node = PromptNode(model_name_or_path="gpt-4",
                         api_key="YOUR_OPENAI_KEY",
                         default_prompt_template=prompt_template)

query_pipeline = Pipeline()
query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
query_pipeline.run(query="What is Pinecone?", params={"Retriever": {"top_k": 5}})
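At query time, the retriever compares the query's embedding with the stored document embeddings and passes the `top_k` best matches on to the PromptNode, which fills them into the prompt. The sketch below mimics that flow with hand-made 3-dimensional toy vectors instead of a real embedding model and a Pinecone index. `retrieve` and `render_prompt` are hypothetical helpers. Joining the document texts with a single space is only a simplification of the template's `join(documents)` function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_embedding, documents, top_k):
    """Return the top_k documents whose embeddings are most similar to the query embedding."""
    ranked = sorted(documents,
                    key=lambda doc: cosine_similarity(query_embedding, doc["embedding"]),
                    reverse=True)
    return ranked[:top_k]


def render_prompt(query, documents):
    """Fill a simplified copy of the prompt template with the query and document texts."""
    context = " ".join(doc["content"] for doc in documents)
    return ("Answer the following query based on the provided context. If the context does "
            "not include an answer, reply with 'I don't know'.\n"
            f"Query: {query}\n"
            f"Documents: {context}\n"
            "Answer:")


# Toy 3-dimensional "embeddings"; the model used above produces 768 dimensions.
documents = [
    {"content": "Pinecone is a managed vector database.", "embedding": [0.9, 0.1, 0.0]},
    {"content": "Haystack builds LLM pipelines.", "embedding": [0.1, 0.9, 0.0]},
    {"content": "Bananas are yellow.", "embedding": [0.0, 0.0, 1.0]},
]
query_embedding = [1.0, 0.2, 0.0]  # stand-in for the embedding of "What is Pinecone?"

top_docs = retrieve(query_embedding, documents, top_k=2)
print(render_prompt("What is Pinecone?", top_docs))
```

Raising `top_k` gives the model more context to draw on. The trade-off is a longer prompt, and with it a slower and more expensive LLM call.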