Speeding-up the Verification Phase of Set Similarity Joins in the GPGPU paradigm
We investigate the problem of exact set similarity joins using a CPU-GPU co-processing scheme. State-of-the-art CPU solutions split the work into two main phases. First, filtering and index building reduce the candidate sets to be compared as much as possible; then the candidate pairs are compared to verify whether they belong in the result. We investigate in depth solutions for transferring the second, so-called verification phase to the GPU, addressing several challenges regarding data serialization and layout, thread management, and the techniques used to compare sets of tokens. Using real datasets, we provide concrete experimental evidence that our solutions reach their maximum potential, since verification fully overlaps with the CPU tasks, and that they yield significant speed-ups, up to 2.6X in our cases.
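To make the verification phase concrete, the following is a minimal CPU-side sketch in Python, not the GPU implementation studied in the paper. It assumes that each set is stored as a sorted list of integer token ids and that similarity is measured with the Jaccard coefficient; the abstract does not specify either choice, and the function names `overlap_sorted` and `verify` are hypothetical.

```python
def overlap_sorted(a, b):
    """Count common tokens of two sorted token lists with a merge-style scan."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def verify(candidates, sets, threshold):
    """Keep the candidate pairs whose Jaccard similarity reaches the threshold.

    candidates: iterable of (r, s) index pairs produced by a filtering phase
    sets:       list of sorted token-id lists (assumed layout)
    threshold:  Jaccard threshold in [0, 1] (assumed similarity function)
    """
    result = []
    for r, s in candidates:
        inter = overlap_sorted(sets[r], sets[s])
        union = len(sets[r]) + len(sets[s]) - inter
        if union > 0 and inter / union >= threshold:
            result.append((r, s))
    return result


if __name__ == "__main__":
    token_sets = [[1, 2, 3, 4], [2, 3, 4, 5], [7, 8]]
    pairs = [(0, 1), (0, 2)]
    print(verify(pairs, token_sets, 0.5))
```

Each candidate pair is checked independently of the others, which is what makes this phase a natural candidate for offloading to many GPU threads.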