Tutorial: Customizing PromptNode for NLP Tasks
Last Updated: July 3, 2023
- Level: Intermediate
- Time to complete: 20 minutes
- Nodes Used: PromptNode, PromptTemplate
- Goal: After completing this tutorial, you will have learned the basics of using PromptNode and PromptTemplates and you’ll have added titles to articles from The Guardian and categorized them.
Overview
Use large language models (LLMs) through PromptNode and PromptTemplate to summarize and categorize your documents, and find a suitable title for them. In this tutorial, we’ll use news from The Guardian as documents, but you can replace them with any text you want.
This tutorial introduces you to the basics of LLMs and PromptNode, showcases the pre-defined “deepset/summarization” template, and explains how to use PromptTemplate to generate titles for documents and categorize them with custom prompts.
Preparing the Colab Environment
Installing Haystack
To start, let’s install the latest release of Haystack with pip:
%%bash
pip install --upgrade pip
pip install farm-haystack[colab]
Enabling Telemetry
Knowing you’re using this tutorial helps us decide where to invest our efforts to build a better product, but you can always opt out by commenting out the following line. See Telemetry for more details.
from haystack.telemetry import tutorial_running
tutorial_running(21)
Trying Out PromptNode
The PromptNode is the central abstraction in Haystack’s large language model (LLM) support. It uses the google/flan-t5-base model by default, but you can replace it with a flan-t5 model of a different size, such as google/flan-t5-large, or with an OpenAI model, such as text-davinci-003.
Large language models are huge models trained on enormous amounts of data. That’s why they have broad general knowledge of the world: you can ask them about almost any topic, although their answers are not always correct.
As a warm-up, let’s initialize PromptNode and see what it can do when run stand-alone:
- Initialize a PromptNode instance with google/flan-t5-large:
from haystack.nodes import PromptNode
prompt_node = PromptNode(model_name_or_path="google/flan-t5-large")
Note: To use PromptNode with an OpenAI model, change the model name and provide an api_key:
prompt_node = PromptNode(model_name_or_path="text-davinci-003", api_key="<YOUR_API_KEY>")
- Ask any general question that comes to your mind, for example:
prompt_node("What is the capital of Germany?")
prompt_node("What is the highest mountain?")
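If you use the OpenAI option from the note above, you may prefer not to paste the key directly into the notebook. The following is a minimal sketch, assuming you store the key in an environment variable. The variable name OPENAI_API_KEY and the helper get_openai_api_key are hypothetical choices for illustration, not part of Haystack.

```python
import os


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read an API key from an environment variable instead of hard-coding it.

    Hypothetical helper: the default variable name is an assumption.
    """
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message rather than sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

You could then pass the result as the api_key argument, for example PromptNode(model_name_or_path="text-davinci-003", api_key=get_openai_api_key()).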
Because google/flan-t5-large was also trained on GSM8K, a dataset of grade-school math word problems, you can ask it some basic math questions too:
prompt_node("If Bob is 20 and Sara is 11, who is older?")
Now that you’ve initialized PromptNode and seen how it works, let’s see how to use it for more advanced tasks.
Summarizing Documents with PromptNode
PromptNode is integrated with PromptHub, which contains ready-made prompts for the most common NLP tasks, such as summarization, question answering, and question generation. To use a prompt template from PromptHub, just pass its name to PromptNode.
For this task, we’ll use the deepset/summarization template from PromptHub and news articles from The Guardian. Let’s see how to do it.
- Define the news articles to use as documents for PromptNode. We’ll use these documents throughout the tutorial:
from haystack.schema import Document
# https://www.theguardian.com/business/2023/feb/12/inflation-may-have-peaked-but-the-cost-of-living-pain-is-far-from-over
news_economics = Document(
"""At long last, Britain’s annual inflation rate is on the way down. After hitting the highest level since the 1980s, heaping pressure on millions of households as living costs soared, official figures this week could bring some rare good news.
City economists expect UK inflation to have cooled for a third month running in January – the exact number is announced on Wednesday – helped by falling petrol prices and a broader decline in the global price of oil and gas in recent months. The hope now is for a sustained decline in the months ahead, continuing a steady drop from the peak of 11.1% seen in October.
The message from the Bank of England has been clear. Inflation is on track for a “rapid” decline over the coming months, raising hopes that the worst of Britain’s cost of living crisis is now in the rearview mirror.
There are two good reasons for this. Energy costs are moving in the right direction, while the initial rise in wholesale oil and gas prices that followed Russia’s invasion of Ukraine in February last year will soon drop from the calculation of the annual inflation rate."""
)
# https://www.theguardian.com/science/2023/feb/13/starwatch-orions-belt-and-sirius-lead-way-to-hydras-head
news_science = Document(
"""On northern winter nights, it is so easy to be beguiled by the gloriously bright constellations of Orion, the hunter, and Taurus, the bull, that one can overlook the fainter constellations.
So this week, find the three stars of Orion’s belt, follow them down to Sirius, the brightest star in the night sky, and then look eastward until you find the faint ring of stars that makes up the head of Hydra, the water snake. The chart shows the view looking south-east from London at 8pm GMT on Monday, but the view will be similar every night this week.
Hydra is the largest of the 88 modern constellations covering an area of 1,303 square degrees. To compare, nearby Orion only covers 594 square degrees. Hydra accounts for most of its area by its length, crossing more than 100 degrees of the sky (the full moon spans half a degree).
As evening becomes night and into the early hours, the rotation of Earth causes Hydra to slither its way across the southern meridian until dawn washes it from the sky. From the southern hemisphere, the constellation is easily visible in the eastern sky by mid-evening."""
)
# https://www.theguardian.com/music/2023/jan/30/salisbury-cathedral-pipe-organ-new-life-holst-the-planets
news_culture = Document(
"""A unique performance of Gustav Holst’s masterwork The Planets – played on a magnificent pipe organ rather than by an orchestra and punctuated by poems inspired by children’s responses to the music – is to be staged in the suitably vast Salisbury Cathedral.
The idea of the community music project is to introduce more people, young and old, to the 140-year-old “Father” Willis organ, one of the treasures of the cathedral.
It is also intended to get the children who took part and the adults who will watch and listen thinking afresh about the themes Holst’s suite tackles – war, peace, joy and mysticism – which seem as relevant now as when he wrote the work a century ago.
John Challenger, the cathedral’s principal organist, said: “We have a fantastic pipe organ largely as it was when built. It’s a thrilling thing. I view it as my purpose in life to share it with as many people as possible.”
The Planets is written for a large orchestra. “Holst calls for huge instrumental forces and an unseen distant choir of sopranos and altos,” said Challenger. But he has transposed the suite for the organ, not copying the effect of the orchestral instruments but finding a new version of the suite."""
)
# https://www.theguardian.com/sport/blog/2023/feb/14/multi-million-dollar-wpl-auction-signals-huge-step-forward-for-womens-sport
news_sport = Document(
"""It was only a few days ago that members of the Australian women’s cricket team were contemplating how best to navigate the impending “distraction” of the inaugural Women’s Premier League auction, scheduled during the first week of the T20 World Cup. “It’s a little bit awkward,” captain Meg Lanning said in South Africa last week. “But it’s just trying to embrace that and understanding it’s actually a really exciting time and you actually don’t have a lot of control over most of it, so you’ve just got to wait and see.”
What a pleasant distraction it turned out to be. Lanning herself will be $192,000 richer for three weeks’ work with the Delhi Capitals. Her teammate, Ash Gardner, will earn three times that playing for the Gujarat Giants. The allrounder’s figure of $558,000 is more than Sam Kerr pockets in a season with Chelsea and more than the WNBA’s top earner, Jackie Young.
If that sounds like a watershed moment, it’s perhaps because it is. And it is not the only one this past week. The NRLW made its own wage-related headlines on Tuesday, to the effect that the next (agreed in principle) collective bargaining agreement will bring with it a $1.5m salary cap in 2027, at an average salary of $62,500. Women’s rugby, too, is making moves, with news on the weekend that Rugby Australia will begin contracting the Wallaroos."""
)
news = [news_economics, news_science, news_culture, news_sport]
The token limit for google/flan-t5-large is 512 tokens, so each news piece, together with the prompt text around it, should stay below that limit.
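Before sending the articles to the model, you might want a quick length check. The sketch below rests on an assumption worth stating plainly: T5-style SentencePiece tokenizers turn every whitespace-separated word into at least one token, so the word count is a lower bound on the token count. The helpers word_count and definitely_too_long are hypothetical; for an exact count, use the model’s own tokenizer instead.

```python
# Hypothetical pre-check, assuming each whitespace-separated word becomes at
# least one token. A word count above the limit means the text is certainly
# too long; a word count below it does NOT guarantee the text fits.
TOKEN_LIMIT = 512  # token limit of google/flan-t5-large, as stated above


def word_count(text: str) -> int:
    """Count whitespace-separated words in a piece of text."""
    return len(text.split())


def definitely_too_long(text: str, limit: int = TOKEN_LIMIT) -> bool:
    """Return True if the word count alone already exceeds the token limit."""
    return word_count(text) > limit


sample = "At long last, Britain’s annual inflation rate is on the way down."
print(word_count(sample))           # 12
print(definitely_too_long(sample))  # False
```

To check the tutorial’s documents, you could call definitely_too_long(doc.content) for each doc in news. Keep in mind that the prompt template adds tokens of its own, so a text that passes this check can still exceed the limit.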
- Use the deepset/summarization template to generate a summary for each piece of news:
prompt_node.prompt(prompt_template="deepset/summarization", documents=news)
Here you go! You have generated summaries of your news articles. But we’re missing titles for them. Let’s see how PromptNode can help us there.
Generating Titles for News Articles with a Custom Template
The biggest benefit of PromptNode is its versatility: by defining your own prompt templates, you can use it for practically any NLP task in Haystack, well beyond the ready-made prompts.
You can define a custom template for each NLP task and pass it to PromptNode. Let’s create a custom template to generate descriptive titles for news articles:
- Initialize a PromptTemplate instance by defining the prompt text in prompt. To define parameters for the prompt, add them to the prompt text wrapped in curly brackets. We need a template to generate titles for our news articles, and the only parameter we need is {documents}, which receives the documents you pass to PromptNode. Let’s create a PromptTemplate for it:
from haystack.nodes import PromptTemplate
title_generator = PromptTemplate(
prompt="Provide a short, descriptive title for the given piece of news. News: {documents}; Title:"
)
- To use the new template, pass title_generator as the prompt_template to the prompt() method:
prompt_node.prompt(prompt_template=title_generator, documents=news)
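To make the role of the curly-bracket parameter concrete, here is a simplified sketch of what filling a template means: the {documents} placeholder is replaced with the text of each document, giving one prompt per article, in line with getting one title per article. fill_template is a hypothetical helper built on Python’s str.format; it is not how Haystack implements PromptTemplate internally.

```python
# Simplified, hypothetical illustration of template filling (not Haystack's
# implementation): one filled-in prompt is produced per document text.
TITLE_PROMPT = (
    "Provide a short, descriptive title for the given piece of news. "
    "News: {documents}; Title:"
)


def fill_template(template: str, documents: list[str]) -> list[str]:
    """Replace the {documents} placeholder with each document's text."""
    return [template.format(documents=text) for text in documents]


prompts = fill_template(TITLE_PROMPT, ["Inflation is falling.", "Hydra is rising."])
for p in prompts:
    print(p)
```

One design note: str.format only interprets curly brackets in the template itself, so any brackets inside the article text are inserted unchanged.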
There you go! You should have the titles for your news articles ready. Let’s now categorize them.
Categorizing Documents with PromptNode
You can customize PromptTemplates as much as you need. Let’s try to create a template to categorize the news articles.
- Define the {documents} and {categories} parameters. Because categories is a list of strings, we join it into a single string with the ", ".join(categories) expression before injecting it into the prompt. See the documentation for more ways to customize prompt variables. Finally, in the prompt, ask the model not to categorize the news if it doesn’t fit any category in the provided list:
news_categorizer = PromptTemplate(
prompt="Given the categories: {', '.join(categories)}; classify the news: {documents}. Only pick a category from the list, otherwise say: no suitable category"
)
- Run the prompt() method with the news_categorizer template:
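prompt_node.prompt(
    prompt_template=news_categorizer, documents=news, categories=["sport", "economics", "culture"]
)  # Example output: ['economics', 'science', 'culture', 'sport']
Note that in this example output, the model labeled the astronomy article "science" even though science isn’t in the category list, so the model doesn’t always follow the instruction to pick only from the provided categories.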
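The two ideas in this step, joining the category list into the prompt and asking for a fallback answer, can be sketched in plain Python. build_categorizer_prompt and normalize_answer are hypothetical helpers: the first mirrors the {', '.join(categories)} expression from the template, and the second is an optional post-processing step, not a Haystack feature, that maps any answer outside the list (such as "science" in the answer shown in the comment above) to "no suitable category".

```python
# Hypothetical helpers illustrating the categorization step in plain Python.
NO_CATEGORY = "no suitable category"


def build_categorizer_prompt(categories: list[str], document: str) -> str:
    """Build the prompt text, joining the categories with ', '."""
    return (
        f"Given the categories: {', '.join(categories)}; classify the news: {document}. "
        f"Only pick a category from the list, otherwise say: {NO_CATEGORY}"
    )


def normalize_answer(answer: str, categories: list[str]) -> str:
    """Return the lowercased answer if it is an allowed category, else NO_CATEGORY."""
    cleaned = answer.strip().lower()
    allowed = {c.lower() for c in categories}
    return cleaned if cleaned in allowed else NO_CATEGORY


categories = ["sport", "economics", "culture"]
print(build_categorizer_prompt(categories, "Hydra is the largest constellation"))

answers = ["economics", "science", "culture", "sport"]  # the example answer above
print([normalize_answer(a, categories) for a in answers])
# ['economics', 'no suitable category', 'culture', 'sport']
```

Enforcing the list in code, rather than relying only on the prompt instruction, guarantees that downstream steps only ever see one of the allowed labels or the fallback.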
Congratulations! You’ve summarized your documents, generated titles for them, and put them into categories, using both a ready-made PromptHub template and your own custom prompt templates.