poster

Discovering highly reliable subgraphs in uncertain graphs

Authors:
Ruoming Jin

Kent State University, Kent, OH, USA

Kent State University, Kent, OH, USA
View Profile

,
Lin Liu

Kent State University, Kent, OH, USA

Kent State University, Kent, OH, USA
View Profile

,
Charu C. Aggarwal

IBM T. J. Watson Research Center, Hawthorne, NY, USA

IBM T. J. Watson Research Center, Hawthorne, NY, USA
View Profile

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data miningAugust 2011Pages 992–1000https://doi.org/10.1145/2020408.2020569

Published:21 August 2011Publication History

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 992–1000

ABSTRACT

In this paper, we investigate the highly reliable subgraph problem, which arises in the context of uncertain graphs. This problem attempts to identify all induced subgraphs for which the probability of connectivity being maintained under uncertainty is higher than a given threshold. This problem arises in a wide range of network applications, such as protein-complex discovery, network routing, and social network analysis. Since exact discovery may be computationally intractable, we introduce a novel sampling scheme which enables approximate discovery of highly reliable subgraphs with high probability. Furthermore, we transform the core mining task into a new frequent cohesive set problem in deterministic graphs. Such transformation enables the development of an efficient two-stage approach which combines novel peeling techniques for maximal set discovery with depth-first search for further enumeration. We demonstrate the effectiveness and efficiency of the proposed algorithms on real and synthetic data sets.

References

E. Adar and C. Re. Managing uncertainty in social networks. IEEE Data Eng. Bull., 30(2):15--22, 2007.Google Scholar
C. C. Aggarwal, editor. Managing and Mining Uncertain Data. Advances in Database Systems. Springer, 2009. Google ScholarDigital Library
S. Asthana, O. D. King, F. D. Gibbons, and F. P. Roth. Predicting protein complex membership using probabilistic network reliability. Genome Res, 14(6):1170--1175, June 2004.Google ScholarCross Ref
J. S. Bader, A. Chaudhuri, J. M. Rothberg, and J. Chant. Gaining confidence in high-throughput protein interaction networks. Nature Biotechnology, 22(1):78--85, December 2003.Google ScholarCross Ref
M. O. Ball. Computational complexity of network reliability analysis: An overview. IEEE Transactions on Reliability, 35:230--239, 1986.Google ScholarCross Ref
Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1):289--300, 1995.Google ScholarCross Ref
Y. Benjamini and D. Yekutieli. The Control of the False Discovery Rate in Multiple Testing under Dependency. The Annals of Statistics, 29(4):1165--1188, 2001.Google ScholarCross Ref
C. Chen, X. Yan, F. Zhu, and J. Han. gapprox: Mining frequent approximate patterns from a massive network. In ICDM, pages 445--450, 2007. Google ScholarDigital Library
H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4):493--507, 1952.Google ScholarCross Ref
C. J. Colbourn. The Combinatorics of Network Reliability. Oxford University Press, Inc., 1987. Google ScholarDigital Library
J. Ghosh, H. Q. Ngo, S. Yoon, and C. Qiao. On a Routing Problem Within Probabilistic Graphs and its Application to Intermittently Connected Networks. In INFOCOM'07, pages 1721--1729, 2007.Google ScholarDigital Library
R. Guha, R. Kumar, P. Raghavan, and A. Tomkins. Propagation of trust and distrust. In WWW'04, pages 403--412, 2004. Google ScholarDigital Library
J. Han, H. Cheng, D. Xin, and X. Yan. Frequent Pattern Mining: Current Status and Future Directions. Data Mining and Knowledge Discovery, 14(1), 2007. Google ScholarDigital Library
J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques, Second Edition (The Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann, 2nd edition, 2006. Google ScholarDigital Library
S. Hanhijarvi, K. Puolamaki, and G. C. Garriga. Multiple hypothesis testing in pattern discovery, 2009. arXiv:0906.5263v1 {stat.ML}.Google Scholar
P. Hintsanen. The most reliable subgraph problem. In PKDD, pages 471--478, 2007. Google ScholarDigital Library
P. Hintsanen and H. Toivonen. Finding reliable subgraphs from large probabilistic graphs. Data Min. Knowl. Discov., 17(1):3--23, 2008. Google ScholarDigital Library
P. Hintsanen, H. Toivonen, and P. Sevon. Fast discovery of reliable subnetworks. In ASONAM, pages 104--111, 2010. Google ScholarDigital Library
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13--30, 1963.Google ScholarCross Ref
J. Hopcroft and R. Tarjan. Algorithm 447: efficient algorithms for graph manipulation. Commun. ACM, 16:372--378, June 1973. Google ScholarDigital Library
R. Jiang, Z. Tu, T. Chen, and F. Sun. Network motif identification in stochastic networks. PNAS, 103(25):9404--9409, June 2006.Google ScholarCross Ref
R. Jin, L. Liu, B. Ding, and H. Wang. Distance-constraint reachability computation in uncertain graphs. In Proceedings of the VLDB Endowment, volume 4, 2011. Google ScholarDigital Library
R. Jin, S. McCallen, and E. Almaas. Trend motif: A graph mining approach for analysis of dynamic complex networks. In ICDM, pages 541--546, 2007. Google ScholarDigital Library
B. Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks. Phys. Rev. E, 83(1):016107, Jan 2011.Google ScholarCross Ref
M. Kasari, H. Toivonen, and P. Hintsanen. Fast discovery of reliable k-terminal subgraphs. In PAKDD (2), pages 168--177, 2010. Google ScholarDigital Library
D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137--146, 2003. Google ScholarDigital Library
A. Kirsch, M. Mitzenmacher, A. Pietracaprina, G. Pucci, E. Upfal,and F. Vandin. An efficient rigorous approach for identifying statistically significant frequent itemsets. In PODS'09, pages 117--126, 2009. Google ScholarDigital Library
V. E. Lee, N. Ruan, R. Jin, and C. Aggarwal. A survey of algorithms for dense subgraph discovery. In Charu C. Aggarwal and Haixun Wang, editors, Managing and Mining Graph Data, pages 303--336. Springer US, 2010.Google ScholarCross Ref
V. Manfredi, R. Hancock, and J. Kurose. Robust routing in dynamic manets. In Annual Conference of the International Technology Alliance (ACITA), 2008.Google Scholar
M. Potamias, F. Bonchi, A. Gionis, and G. Kollios. k-nearest neighbors in uncertain graphs. PVLDB, 3(1):997--1008, 2010. Google ScholarDigital Library
G. Rubino. Network reliability evaluation. In Network performance modeling and simulation, pages 275--302. 1999. Google ScholarDigital Library
M. Stoer and F. Wagner. A simple min-cut algorithm. J. ACM, 44:585--591, July 1997. Google ScholarDigital Library
G. Swamynathan, C. Wilson, B. Boe, K. Almeroth, and B. Y. Zhao. Do social networks improve e-commerce?: a study on social marketplaces. In WOSP '08: Proceedings of the first workshop on Online social networks, pages 1--6, 2008. Google ScholarDigital Library
L. G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410--421, 1979.Google ScholarDigital Library
D. R. White and F. Harary. The cohesiveness of blocks in social networks: Node connectivity and conditional density. Sociological Methodology, 31:305--359, 2001.Google ScholarCross Ref
X. Yan, X. J. Zhou, and J. Han. Mining closed relational graphs with connectivity constraints. In KDD '05, 2005. Google ScholarDigital Library
D. M. Yellin. An algorithm for dynamic subset and intersection testing. Theoretical Computer Science, 129(2):397--406, 1994. Google ScholarDigital Library
Y. Yuan, L. Chen, and G. Wang. Efficiently answering probability threshold-based shortest path queries over uncertain graphs. In DASFAA, pages 155--170, 2010. Google ScholarDigital Library
Z. Zou, H. Gao, and J. Li. Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In KDD, pages 633--642, 2010. Google ScholarDigital Library
Z. Zou, J. Li, H. Gao, and S. Zhang. Finding top-k maximal cliques in an uncertain graph. In ICDE, pages 649--652, 2010.Google ScholarCross Ref
Z. Zou, J. Li, H. Gao, and S. Zhang. Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. on Knowl. and Data Eng., 22(9), 2010. Google ScholarDigital Library

Index Terms

Discovering highly reliable subgraphs in uncertain graphs
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Mining frequent subgraphs over uncertain graph databases under probabilistic semantics

Frequent subgraph mining has been extensively studied on certain graph data. However, uncertainty is intrinsic in graph data in practice, but there is very few work on mining uncertain graph data. This paper focuses on mining frequent subgraphs over ...
Read More
Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics
KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Frequent subgraph mining has been extensively studied on certain graph data. However, uncertainties are inherently accompanied with graph data in practice, and there is very few work on mining uncertain graph data. This paper investigates frequent ...
Read More
Mining Frequent Subgraph Patterns from Uncertain Graph Data

In many real applications, graph data is subject to uncertainties due to incompleteness and imprecision of data. Mining such uncertain graph data is semantically different from and computationally more challenging than mining conventional exact graph ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2011
1446 pages
ISBN:9781450308137
DOI:10.1145/2020408
General Chair:
Chid Apte
IBM Research
,
Program Chairs:
Joydeep Ghosh
UT Austin
,
Padhraic Smyth
UC Irvine
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 August 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
frequent cohesive set
reliable subgraph
uncertain graph
Qualifiers
- poster
Conference

Acceptance Rates
Overall Acceptance Rate1,133of8,635submissions,13%
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 72
  Total Citations
  View Citations
- 777
  Total Downloads
- Downloads (Last 12 months)14
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Discovering highly reliable subgraphs in uncertain graphs

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Mining frequent subgraphs over uncertain graph databases under probabilistic semantics

Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics

Mining Frequent Subgraph Patterns from Uncertain Graph Data