Integration: Elasticsearch Document Store
Use an Elasticsearch database with Haystack
The ElasticsearchDocumentStore is maintained within the core Haystack project. It lets you use Elasticsearch as data storage for your Haystack pipelines.
For details on the available methods, see the API Reference.
Installation
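To run an Elasticsearch instance locally, first follow the Elasticsearch installation and startup guides. Then install Haystack with Elasticsearch support:
```shell
pip install 'farm-haystack[elasticsearch]'
```
To use the Elasticsearch 7 Python client, run `pip install 'farm-haystack[elasticsearch7]'`. The quotes keep shells such as zsh from interpreting the square brackets.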
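Before initializing the document store, it can help to confirm that the local instance is actually up. The following is a minimal sketch using only the Python standard library. It assumes Elasticsearch listens on `http://localhost:9200` with security (TLS and authentication) disabled, as is common for local development. The function names `parse_es_version` and `es_version` are hypothetical:

```python
import json
import urllib.request
from typing import Optional


def parse_es_version(body: bytes) -> str:
    """Extract the version number from the JSON returned by Elasticsearch's root endpoint."""
    info = json.loads(body)
    return info["version"]["number"]


def es_version(url: str = "http://localhost:9200", timeout: float = 2.0) -> Optional[str]:
    """Return the Elasticsearch version served at `url`, or None if it can't be read.

    Assumes plain HTTP without authentication; a secured cluster answers with
    an HTTP error (for example 401), which this sketch also reports as None.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_es_version(response.read())
    # OSError covers URLError, HTTPError, refused connections, and timeouts;
    # ValueError and KeyError cover responses that aren't the expected JSON.
    except (OSError, ValueError, KeyError):
        return None
```

Calling `es_version()` returns a string such as the server's version number when the instance is reachable, and `None` otherwise. That is a quick signal to revisit the startup guide before going further.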
Usage
Once installed, you can start using your Elasticsearch database with Haystack by initializing it:
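```python
from haystack.document_stores import ElasticsearchDocumentStore

# Connect to a local Elasticsearch instance; embedding_dim must match the
# output size of the embedding model you use for retrieval
document_store = ElasticsearchDocumentStore(host="localhost", port=9200, embedding_dim=768)
```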
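The connection settings above are hard-coded. In practice you may prefer to read them from the environment so the same code runs locally and in deployment. This is a small sketch under that assumption. The variable names `ES_HOST`, `ES_PORT`, and `EMBEDDING_DIM` and the helper `store_kwargs_from_env` are hypothetical. Only `host`, `port`, and `embedding_dim` come from the example above. A value of 768 fits models such as `sentence-transformers/multi-qa-mpnet-base-dot-v1`, which produce 768-dimensional vectors.

```python
from typing import Mapping


def store_kwargs_from_env(env: Mapping[str, str]) -> dict:
    """Build ElasticsearchDocumentStore keyword arguments from environment variables.

    ES_HOST, ES_PORT and EMBEDDING_DIM are hypothetical names chosen for this
    sketch; missing variables fall back to the values used in the example above.
    """
    return {
        "host": env.get("ES_HOST", "localhost"),
        "port": int(env.get("ES_PORT", "9200")),
        # Must equal the vector size produced by your embedding model
        "embedding_dim": int(env.get("EMBEDDING_DIM", "768")),
    }
```

You would then pass the result as keyword arguments, for example `ElasticsearchDocumentStore(**store_kwargs_from_env(os.environ))` after `import os`.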
Writing Documents to ElasticsearchDocumentStore
To write documents to your ElasticsearchDocumentStore, create an indexing pipeline or use the write_documents() method.
For this step, you can use the available FileConverters and PreProcessors, as well as other Integrations that help you fetch data from other sources.
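write_documents() accepts either Haystack Document objects or plain dictionaries with a `"content"` key holding the text and an optional `"meta"` dictionary. The sketch below assumes the dictionary format and works with any object that has a `write_documents()` method, such as the store created above. The helper `write_texts` and the `"source"` metadata field are hypothetical names:

```python
from typing import Iterable


def write_texts(document_store, texts: Iterable[str], source: str) -> int:
    """Wrap raw strings as document dictionaries and write them to `document_store`.

    Each dictionary carries the text under "content" and a hypothetical
    "source" field under "meta". Returns the number of documents written.
    """
    documents = [
        {"content": text, "meta": {"source": source}}
        for text in texts
        if text.strip()  # skip empty or whitespace-only strings
    ]
    document_store.write_documents(documents)
    return len(documents)
```

For example, `write_texts(document_store, ["Istanbul is a city in Turkey."], source="notes.txt")` writes one document whose metadata records where it came from.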
Indexing Pipeline
```python
from haystack import Pipeline
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import TextConverter, PreProcessor

document_store = ElasticsearchDocumentStore(host="localhost", port=9200)
converter = TextConverter()      # turns .txt files into Documents
preprocessor = PreProcessor()    # cleans Documents and splits them into smaller chunks

indexing_pipeline = Pipeline()
# "File" is the root input of an indexing pipeline; it receives the file_paths passed to run()
indexing_pipeline.add_node(component=converter, name="TextConverter", inputs=["File"])
indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["TextConverter"])
# The document store comes last and writes the processed Documents to Elasticsearch
indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])
indexing_pipeline.run(file_paths=["filename.txt"])
```
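run() takes a list of file paths, so you can index a whole folder by collecting the paths first. This is a sketch using `pathlib`. It assumes your sources are plain-text `.txt` files, which is what TextConverter handles. The helper `collect_text_files` is hypothetical:

```python
from pathlib import Path
from typing import List


def collect_text_files(directory: str, pattern: str = "*.txt") -> List[str]:
    """Return the sorted paths of all files under `directory` that match `pattern`.

    rglob() also searches subdirectories; use glob() to stay at the top level.
    """
    return sorted(str(path) for path in Path(directory).rglob(pattern) if path.is_file())
```

With it, the last line of the pipeline becomes `indexing_pipeline.run(file_paths=collect_text_files("data"))`, where `data` is a placeholder for your own directory. Sorting makes the indexing order deterministic between runs.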
Using Elasticsearch in a Query Pipeline
Once you have documents in your ElasticsearchDocumentStore, it's ready to be used in any Haystack pipeline, such as a Retrieval Augmented Generation (RAG) pipeline. Learn more about Retrievers to make use of vector search within your LLM pipelines.
```python
from haystack import Pipeline
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import EmbeddingRetriever, PromptNode

document_store = ElasticsearchDocumentStore()  # defaults to localhost:9200
retriever = EmbeddingRetriever(document_store=document_store,
                               embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1")
# Compute embeddings for the stored documents so the EmbeddingRetriever can search them
document_store.update_embeddings(retriever)
prompt_node = PromptNode(model_name_or_path="google/flan-t5-xl", default_prompt_template="deepset/question-answering")

query_pipeline = Pipeline()
query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
query_pipeline.run(query="Where is Istanbul?")
```
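Conceptually, the EmbeddingRetriever turns the query into a vector with the same model used for the documents. The document store then returns the documents whose vectors score highest against it. The model in this example is trained for dot-product scoring, hence `dot` in its name. The toy sketch below shows that ranking step on tiny hand-made 3-dimensional vectors. It only illustrates the scoring idea and is not how Elasticsearch executes the search. The names `dot` and `top_k` are hypothetical:

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def top_k(
    query: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank (doc_id, embedding) pairs by dot-product score against `query`, highest first."""
    scored = [(doc_id, dot(query, embedding)) for doc_id, embedding in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Real embeddings have many more dimensions (768 for the model above), and the dimension check mirrors why `embedding_dim` on the document store has to match the model.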