ABSTRACT
In this paper, we investigate the highly reliable subgraph problem, which arises in the context of uncertain graphs. This problem attempts to identify all induced subgraphs for which the probability of connectivity being maintained under uncertainty is higher than a given threshold. This problem arises in a wide range of network applications, such as protein-complex discovery, network routing, and social network analysis. Since exact discovery may be computationally intractable, we introduce a novel sampling scheme which enables approximate discovery of highly reliable subgraphs with high probability. Furthermore, we transform the core mining task into a new frequent cohesive set problem in deterministic graphs. Such transformation enables the development of an efficient two-stage approach which combines novel peeling techniques for maximal set discovery with depth-first search for further enumeration. We demonstrate the effectiveness and efficiency of the proposed algorithms on real and synthetic data sets.
- E. Adar and C. Re. Managing uncertainty in social networks. IEEE Data Eng. Bull., 30(2):15--22, 2007.Google Scholar
- C. C. Aggarwal, editor. Managing and Mining Uncertain Data. Advances in Database Systems. Springer, 2009. Google ScholarDigital Library
- S. Asthana, O. D. King, F. D. Gibbons, and F. P. Roth. Predicting protein complex membership using probabilistic network reliability. Genome Res, 14(6):1170--1175, June 2004.Google ScholarCross Ref
- J. S. Bader, A. Chaudhuri, J. M. Rothberg, and J. Chant. Gaining confidence in high-throughput protein interaction networks. Nature Biotechnology, 22(1):78--85, December 2003.Google ScholarCross Ref
- M. O. Ball. Computational complexity of network reliability analysis: An overview. IEEE Transactions on Reliability, 35:230--239, 1986.Google ScholarCross Ref
- Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1):289--300, 1995.Google ScholarCross Ref
- Y. Benjamini and D. Yekutieli. The Control of the False Discovery Rate in Multiple Testing under Dependency. The Annals of Statistics, 29(4):1165--1188, 2001.Google ScholarCross Ref
- C. Chen, X. Yan, F. Zhu, and J. Han. gapprox: Mining frequent approximate patterns from a massive network. In ICDM, pages 445--450, 2007. Google ScholarDigital Library
- H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4):493--507, 1952.Google ScholarCross Ref
- C. J. Colbourn. The Combinatorics of Network Reliability. Oxford University Press, Inc., 1987. Google ScholarDigital Library
- J. Ghosh, H. Q. Ngo, S. Yoon, and C. Qiao. On a Routing Problem Within Probabilistic Graphs and its Application to Intermittently Connected Networks. In INFOCOM'07, pages 1721--1729, 2007.Google ScholarDigital Library
- R. Guha, R. Kumar, P. Raghavan, and A. Tomkins. Propagation of trust and distrust. In WWW'04, pages 403--412, 2004. Google ScholarDigital Library
- J. Han, H. Cheng, D. Xin, and X. Yan. Frequent Pattern Mining: Current Status and Future Directions. Data Mining and Knowledge Discovery, 14(1), 2007. Google ScholarDigital Library
- J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques, Second Edition (The Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann, 2nd edition, 2006. Google ScholarDigital Library
- S. Hanhijarvi, K. Puolamaki, and G. C. Garriga. Multiple hypothesis testing in pattern discovery, 2009. arXiv:0906.5263v1 {stat.ML}.Google Scholar
- P. Hintsanen. The most reliable subgraph problem. In PKDD, pages 471--478, 2007. Google ScholarDigital Library
- P. Hintsanen and H. Toivonen. Finding reliable subgraphs from large probabilistic graphs. Data Min. Knowl. Discov., 17(1):3--23, 2008. Google ScholarDigital Library
- P. Hintsanen, H. Toivonen, and P. Sevon. Fast discovery of reliable subnetworks. In ASONAM, pages 104--111, 2010. Google ScholarDigital Library
- W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13--30, 1963.Google ScholarCross Ref
- J. Hopcroft and R. Tarjan. Algorithm 447: efficient algorithms for graph manipulation. Commun. ACM, 16:372--378, June 1973. Google ScholarDigital Library
- R. Jiang, Z. Tu, T. Chen, and F. Sun. Network motif identification in stochastic networks. PNAS, 103(25):9404--9409, June 2006.Google ScholarCross Ref
- R. Jin, L. Liu, B. Ding, and H. Wang. Distance-constraint reachability computation in uncertain graphs. In Proceedings of the VLDB Endowment, volume 4, 2011. Google ScholarDigital Library
- R. Jin, S. McCallen, and E. Almaas. Trend motif: A graph mining approach for analysis of dynamic complex networks. In ICDM, pages 541--546, 2007. Google ScholarDigital Library
- B. Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks. Phys. Rev. E, 83(1):016107, Jan 2011.Google ScholarCross Ref
- M. Kasari, H. Toivonen, and P. Hintsanen. Fast discovery of reliable k-terminal subgraphs. In PAKDD (2), pages 168--177, 2010. Google ScholarDigital Library
- D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137--146, 2003. Google ScholarDigital Library
- A. Kirsch, M. Mitzenmacher, A. Pietracaprina, G. Pucci, E. Upfal,and F. Vandin. An efficient rigorous approach for identifying statistically significant frequent itemsets. In PODS'09, pages 117--126, 2009. Google ScholarDigital Library
- V. E. Lee, N. Ruan, R. Jin, and C. Aggarwal. A survey of algorithms for dense subgraph discovery. In Charu C. Aggarwal and Haixun Wang, editors, Managing and Mining Graph Data, pages 303--336. Springer US, 2010.Google ScholarCross Ref
- V. Manfredi, R. Hancock, and J. Kurose. Robust routing in dynamic manets. In Annual Conference of the International Technology Alliance (ACITA), 2008.Google Scholar
- M. Potamias, F. Bonchi, A. Gionis, and G. Kollios. k-nearest neighbors in uncertain graphs. PVLDB, 3(1):997--1008, 2010. Google ScholarDigital Library
- G. Rubino. Network reliability evaluation. In Network performance modeling and simulation, pages 275--302. 1999. Google ScholarDigital Library
- M. Stoer and F. Wagner. A simple min-cut algorithm. J. ACM, 44:585--591, July 1997. Google ScholarDigital Library
- G. Swamynathan, C. Wilson, B. Boe, K. Almeroth, and B. Y. Zhao. Do social networks improve e-commerce?: a study on social marketplaces. In WOSP '08: Proceedings of the first workshop on Online social networks, pages 1--6, 2008. Google ScholarDigital Library
- L. G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410--421, 1979.Google ScholarDigital Library
- D. R. White and F. Harary. The cohesiveness of blocks in social networks: Node connectivity and conditional density. Sociological Methodology, 31:305--359, 2001.Google ScholarCross Ref
- X. Yan, X. J. Zhou, and J. Han. Mining closed relational graphs with connectivity constraints. In KDD '05, 2005. Google ScholarDigital Library
- D. M. Yellin. An algorithm for dynamic subset and intersection testing. Theoretical Computer Science, 129(2):397--406, 1994. Google ScholarDigital Library
- Y. Yuan, L. Chen, and G. Wang. Efficiently answering probability threshold-based shortest path queries over uncertain graphs. In DASFAA, pages 155--170, 2010. Google ScholarDigital Library
- Z. Zou, H. Gao, and J. Li. Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In KDD, pages 633--642, 2010. Google ScholarDigital Library
- Z. Zou, J. Li, H. Gao, and S. Zhang. Finding top-k maximal cliques in an uncertain graph. In ICDE, pages 649--652, 2010.Google ScholarCross Ref
- Z. Zou, J. Li, H. Gao, and S. Zhang. Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. on Knowl. and Data Eng., 22(9), 2010. Google ScholarDigital Library
Index Terms
- Discovering highly reliable subgraphs in uncertain graphs
Recommendations
Mining frequent subgraphs over uncertain graph databases under probabilistic semantics
Frequent subgraph mining has been extensively studied on certain graph data. However, uncertainty is intrinsic in graph data in practice, but there is very few work on mining uncertain graph data. This paper focuses on mining frequent subgraphs over ...
Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics
KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data miningFrequent subgraph mining has been extensively studied on certain graph data. However, uncertainties are inherently accompanied with graph data in practice, and there is very few work on mining uncertain graph data. This paper investigates frequent ...
Mining Frequent Subgraph Patterns from Uncertain Graph Data
In many real applications, graph data is subject to uncertainties due to incompleteness and imprecision of data. Mining such uncertain graph data is semantically different from and computationally more challenging than mining conventional exact graph ...
Comments