Modern applications such as Internet traffic, telecommunication records, and large-scale social networks generate massive amounts of data with multiple aspects and high dimensionality. Tensors (i.e., multi-way arrays) provide a natural representation for such data, and tensor decompositions such as Tucker have consequently become important tools for summarization and analysis. One major challenge is dealing with high-dimensional, sparse data; in other words, computing decompositions of tensors in which most of the entries are zero. Specialized techniques are needed for computing Tucker decompositions of sparse tensors because standard algorithms do not account for the sparsity of the data. As a result, practitioners observe a surprising phenomenon: even when there is enough memory to store both the input tensor and the factorized output, memory overflows occur during the factorization itself. To address this intermediate blowup problem, we propose Memory-Efficient Tucker (MET), which adaptively selects the right execution strategy during the decomposition based on the available memory. We provide quantitative and qualitative evaluations of MET on real tensors. MET achieves over 1000X space reduction without sacrificing speed, and it allows us to work with much larger tensors that were previously too big to handle. Finally, we demonstrate a data mining case study using MET.
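To see where the intermediate blowup comes from, consider the core update in standard Tucker (HOOI), which multiplies the tensor by a factor matrix in each mode. The following sketch (a hypothetical Python/NumPy illustration, not the paper's MET implementation; all dimensions, ranks, and variable names are assumed for the example) shows how a single mode-1 product densifies a sparse tensor, producing an intermediate far larger than both the input nonzeros and the final core:

import numpy as np
import scipy.sparse as sp

I, J, K = 500, 500, 500   # tensor dimensions (assumed for illustration)
R = 10                    # Tucker rank per mode (assumed)
nnz = 10_000              # nonzeros in the sparse input tensor

rng = np.random.default_rng(0)
coords = rng.integers(0, [I, J, K], size=(nnz, 3))  # random (i, j, k) indices
vals = rng.standard_normal(nnz)

# Mode-1 unfolding X_(1): an I x (J*K) sparse matrix.
X1 = sp.coo_matrix((vals, (coords[:, 0], coords[:, 1] * K + coords[:, 2])),
                   shape=(I, J * K)).tocsr()

A1 = rng.standard_normal((I, R))  # dense mode-1 factor matrix

# Mode-1 product Y_(1) = A1^T X_(1): the result is R x (J*K) and
# generically dense, even though X is 99.99% sparse.
Y1 = (X1.T @ A1).T

print(f"input nonzeros:       {nnz:,}")        # 10,000
print(f"intermediate entries: {Y1.size:,}")    # 2,500,000 -- the blowup
print(f"final core entries:   {R ** 3:,}")     # 1,000

Here the dense intermediate holds 250 times as many entries as the sparse input, even though the final core is tiny; MET avoids materializing such intermediates all at once by computing them piecewise with a strategy chosen to fit the available memory.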
data mining, tensor decomposition, Tucker decomposition, sparse data
Best Theoretical/Algorithms Paper Award, IEEE International Conference on Data Mining (ICDM), Pisa, Italy, Dec. 2008
@inproceedings{KoSu08,
  author    = {Tamara G. Kolda and Jimeng Sun},
  title     = {Scalable Tensor Decompositions for Multi-aspect Data Mining},
  booktitle = {ICDM 2008: Proceedings of the 8th IEEE International Conference on Data Mining},
  venue     = {Pisa, Italy},
  eventdate = {2008-12-15/2008-12-19},
  pages     = {363--372},
  year      = {2008},
  doi       = {10.1109/ICDM.2008.89},
}