Text summarization is one of the best examples of natural language processing (NLP) being put to practical use.
With vast amounts of information produced every day, the ability to quickly understand, evaluate and act on that information can be extremely valuable, both in the commercial world and in other fields such as scientific research.
Summarization is the task of producing a shorter version of a document while preserving its important information. Fundamentally, it involves extracting text from the original input, then generating new text that captures the essence of the original. In some cases the two parts may be handled by different AI models.
In this blog, we will demonstrate how to run the entire summarization process using BART-Large on Graphcore IPUs.
What is BART and why is it good for text summarization?
When Google launched BERT (Bidirectional Encoder Representations from Transformers) in 2018, it was described as a model for "language understanding", covering a broad range of applications including sentiment analysis, text classification and question answering. Summarization was not explicitly called out as a use case at that time.
In the same year, OpenAI further advanced the field of natural language understanding by proposing the concept of Generative Pre-Training (GPT).
In late 2019, Facebook AI researchers proposed combining a bidirectional encoder (like BERT) with a left-to-right decoder (like GPT), naming the result BART: Bidirectional and Auto-Regressive Transformers.
According to the original paper, the novelty in pretraining lies in a new text in-filling scheme combined with randomly shuffling the order of the original sentences. The authors claimed that BART is particularly effective when fine-tuned for text generation and for comprehension tasks - both of which are needed for text summarization.
Text summarization on Graphcore IPUs with Hugging Face pipeline
BART is one of the many NLP models supported within Optimum Graphcore, which is an interface between Hugging Face and Graphcore IPUs.
Here we demonstrate a text summarization task running BART-Large inference on Graphcore IPUs.
For each code block below, you can simply click to run the block in Paperspace, making any modifications to code or parameters where relevant. At the end of this blog, we explain how to run the process in environments other than Paperspace Gradient Notebooks.
Install dependencies
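The exact install cell is not reproduced here, so the following is a minimal sketch; the package list (optimum-graphcore, plus wikipedia and datasets for the later examples) is our assumption:

```python
# Install Optimum Graphcore plus the libraries used later in this demo.
# Package list and versions are assumptions; pin versions to match your SDK.
%pip install optimum-graphcore wikipedia datasets
```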
Model preparation
We start by preparing the model. First, we define the configuration needed to run the model on the IPU. IPUConfig is a class that specifies attributes and configuration parameters to compile and put the model on the device:
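A minimal sketch of such a configuration; the attribute values below (the layer placement across two IPUs and the local executable cache) are illustrative assumptions, not tuned settings:

```python
from optimum.graphcore import IPUConfig

# Illustrative values (assumptions, not tuned settings):
# - layers_per_ipu places BART-Large's 12 encoder and 12 decoder layers
#   across two IPUs
# - executable_cache_dir caches the compiled executable so later sessions
#   can skip compilation
ipu_config = IPUConfig(
    layers_per_ipu=[12, 12],
    matmul_proportion=0.25,
    executable_cache_dir="./exe_cache",
)
```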
Next, let's import pipeline from optimum.graphcore and create our summarization pipeline:
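The checkpoint name below is our assumption; any BART-Large checkpoint fine-tuned for summarization on the Hugging Face Hub would work:

```python
from optimum.graphcore import pipeline

# Build a summarization pipeline that runs the model on IPUs.
# "facebook/bart-large-cnn" is a BART-Large checkpoint fine-tuned for
# summarization (our assumed choice of checkpoint).
summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    ipu_config=ipu_config,
)
```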
We define an input to test the model. Note that the first call triggers compilation of the model for the IPU (compilation time for the first run: ~2:30):
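The passage below is a stand-in of our own (any text of a few sentences works); the generation lengths are likewise illustrative:

```python
# A short test passage (a stand-in of our own; any few-sentence text works).
input_text = (
    "Once upon a time there lived a poor miller who had a beautiful daughter. "
    "One day he told the king that she could spin straw into gold. The king "
    "locked her in a room full of straw and ordered her to spin it all into "
    "gold by morning. A strange little man appeared and spun the straw into "
    "gold in exchange for her necklace."
)

# The first call compiles the model for the IPU, hence the wait noted above.
summarizer(input_text, max_length=60, min_length=10)
```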
Faster fairy tales
The first call to the pipeline was a bit slow, taking several seconds to provide the answer. This is due to compilation of the model, which happens only on the first call. On subsequent prompts it is much faster:
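Again with an illustrative input of our own:

```python
# Subsequent calls reuse the compiled executable and return almost immediately.
summarizer(
    "The next morning the king found the room full of gold. Delighted, he "
    "locked the girl in a larger room with even more straw and promised to "
    "marry her if she could repeat the feat.",
    max_length=40,
    min_length=5,
)
```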
Summarization of Wikipedia articles
Now let's use the Wikipedia API to search for some long text that can be summarized:
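A sketch using the open-source wikipedia package; the article title is an arbitrary example of ours:

```python
import wikipedia

# Fetch a long article (the title is an arbitrary example).
page = wikipedia.page("Artificial intelligence", auto_suggest=False)

# BART-Large accepts at most 1024 tokens, so ask the pipeline to truncate
# the article rather than failing on over-long input.
summarizer(page.content, truncation=True, max_length=130, min_length=30)
```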
Summarization of medical health records
The summarization task may also be useful for summarizing medical health records (MHR). Let's import an open-source dataset containing some medical samples.
We focus on the medical report stored in the field labeled "text", and select a random patient record from the training split:
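The dataset identifier below is a placeholder assumption; substitute any open dataset of clinical notes on the Hugging Face Hub that exposes a "text" column and a "train" split:

```python
import random
from datasets import load_dataset

# Placeholder dataset identifier (an assumption): swap in any open
# medical-notes dataset with a "text" column and a "train" split.
dataset = load_dataset("some-org/medical-health-records")

# Select a random patient record from the training split...
record = dataset["train"][random.randrange(len(dataset["train"]))]

# ...and summarize its "text" field, truncating to BART's 1024-token limit.
summarizer(record["text"], truncation=True, max_length=120, min_length=20)
```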
Running BART-Large on IPUs in non-Paperspace environments
To run the demo using other IPU hardware, you need to have the Poplar SDK enabled and the relevant PopTorch wheels installed. Refer to the getting started guide for your system for details on how to enable the Poplar SDK and install the PopTorch wheels.