NetMF+: Network Embedding Based on Fast and Effective Single-Pass Randomized Matrix Factorization
In this work, we propose NetMF+, a fast, memory-efficient, scalable, and effective network embedding algorithm designed for a single CPU-only machine. NetMF+ builds on the theoretically grounded embedding method NetMF and leverages results from randomized matrix factorization to learn embeddings efficiently. We first propose a fast randomized eigen-decomposition algorithm for the modified Laplacian matrix. Then, a sparse-sign randomized single-pass singular value decomposition (SVD) is used to avoid constructing a dense matrix and to generate a promising embedding. To enhance embedding quality, we apply spectral propagation in NetMF+. Finally, GBBS, a high-performance parallel graph processing stack, is used to achieve memory efficiency. Experimental results show that NetMF+ can learn a powerful embedding from a network with more than 10^11 edges within 1.5 hours, at a lower memory cost than state-of-the-art methods. On ClueWeb, with 0.9 billion vertices and 75 billion edges, NetMF+ uses less than half the memory and runtime of the state-of-the-art method while achieving better performance. The source code of NetMF+ will be made publicly available after the anonymous peer review.
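To make the single-pass idea concrete, here is a minimal NumPy sketch of a generic sketch-based randomized SVD with sparse-sign test matrices. It is not the NetMF+ implementation: the function names (`sparse_sign`, `single_pass_svd`), the sketch sizes `k` and `l`, the `nnz_per_row` sparsity, and the dense storage of the test matrices are all assumptions made for illustration. The point it shows is that the input matrix is touched only through the two products `A @ Omega` and `Psi @ A`, which can be accumulated in one pass over the data.

```python
import numpy as np

def sparse_sign(ambient_dim, sketch_dim, nnz_per_row, rng):
    """Hypothetical helper: a sparse-sign test matrix stored densely for
    simplicity. Each row holds nnz_per_row entries equal to +1 or -1."""
    S = np.zeros((ambient_dim, sketch_dim))
    for i in range(ambient_dim):
        cols = rng.choice(sketch_dim, size=nnz_per_row, replace=False)
        S[i, cols] = rng.choice([-1.0, 1.0], size=nnz_per_row)
    return S

def single_pass_svd(A, rank, k, l, nnz_per_row=4, seed=0):
    """Approximate rank-`rank` SVD of A from two sketches (assumes l >= k >= rank)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = sparse_sign(n, k, nnz_per_row, rng)    # n x k, captures the range of A
    Psi = sparse_sign(m, l, nnz_per_row, rng).T    # l x m, captures the co-range

    # The only two products that touch A; both can be formed in a single pass.
    Y = A @ Omega                                  # m x k
    W = Psi @ A                                    # l x n

    Q, _ = np.linalg.qr(Y)                         # orthonormal basis, m x k
    # Solve (Psi Q) X = W in the least-squares sense, so that A is approximated by Q X.
    X, *_ = np.linalg.lstsq(Psi @ Q, W, rcond=None)

    # SVD of the small k x n matrix, then lift the left factor back through Q.
    U_small, s, Vt = np.linalg.svd(X, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]
```

Given factors `U` and `s` from such a routine, a NetMF-style embedding would typically be formed as `U * np.sqrt(s)`; how NetMF+ combines this step with the eigen-decomposition and spectral propagation is described in the full paper, not here.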