Graphcore’s Intelligence Processor (IPU) was created to accelerate artificial intelligence.
In recent years - such is the power and versatility of the IPU’s architecture - we have seen it applied to a range of applications outside the category of AI – workloads that would traditionally have been the preserve of High Performance Computing (HPC).
Researchers putting the IPU to use in these ways have consistently reported performance far in excess of that possible with state-of-the-art GPUs and CPUs.
The latest example comes from a multi-national team of scientists from Berkeley, Cornell, Simula in Norway, and German’s Charité Universitätsmedizin, working on computational biology. Their findings are reported in the paper Space Efficient Sequence Alignment for SRAM-Based Computing: X-Drop on the Graphcore IPU (PDF download), to be presented at Supercomputing 2023.
For the computationally challenging task of DNA and protein sequence alignment, the IPU delivered a 10X speedup over leading GPUs and 4.65X acceleration compared to CPUs.
These impressive results led the paper’s authors to conclude that the IPU has “significant potential for accelerating irregular computations where low-level parallelism is difficult to exploit on highly instruction-parallel architectures such as GPUs.”
An irregular problem
Comparing and aligning genomic sequences is important for a wide range of reasons. While sequencing technology has progressed hugely in recent times, we still cannot sequence entire genomes in one go. Consequently, smaller sections need to be aligned then re-combined, like stitching together a panoramic photograph.
Sequence alignment is also used in applications such as protein structure prediction and searching for similar sequences within a database.
The X-Drop algorithm is a popular method of performing sequence alignment – usually run on CPUs, although work has recently been done to implement it on GPUs.
One of the characteristics of X-Drop is its irregular computation pattern which doesn’t lend itself to acceleration by GPUs.
The researchers set out to identify a processor that combined the CPU’s ability to manage non-uniform data access with the higher throughput of a GPU. Graphcore’s IPU, with its Multiple Instruction Multiple Data (MIMD) architecture and high-speed, on-chip SRAM was found to fit the bill.
Results
Using a number of optimisations to take advantage of the IPUs 1,472 parallel processing tiles and six threads per tile, the research group applied X-Drop running on IPUs to sequences from the model organisms E. coli and C. elegans.
They achieved 10-times faster performance compared to an nvidia A100 GPU and 4.65-times faster than a central processing unit (CPU) on a supercomputer.
Reflecting on the siginficance of the IPU's large on-chip SRAM, Dr Giulia Guidi - study lead and assistant professor of computer science in the Cornell Ann S. Bowers College of Computing and Information Science at Cornell University - said ”You can exploit the IPU high memory bandwidth, which allows you to make the whole processing faster."
The researchers also noted the ability to efficiently scale-up their IPU system, demonstrating “near-linear strong scaling properties on common IPU host configurations”.
Read the full technical results in Space Efficient Sequence Alignment for SRAM-Based Computing: X-Drop on the Graphcore IPU (PDF download).