Top

Data Mining and Knowledge Discovery

Published in:

02-09-2023

Network embedding based on high-degree penalty and adaptive negative sampling

Authors: Gang-Feng Ma, Xu-Hua Yang, Wei Ye, Xin-Li Xu, Lei Ye

Published in: Data Mining and Knowledge Discovery | Issue 2/2024

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Network embedding can effectively dig out potentially useful information and discover the relationships and rules which exist in the data, that has attracted increasing attention in many real-world applications. The goal of network embedding is to map high-dimensional and sparse networks into low-dimensional and dense vector representations. In this paper, we propose a network embedding method based on high-degree penalty and adaptive negative sampling (NEPS). First, we analyze the problem of imbalanced node training in random walk and propose an indicator base on high-degree penalty, which can control the random walk and avoid over-sampling high-degree neighbor node. Then, we propose a two-stage adaptive negative sampling strategy, which can dynamically obtain negative samples suitable for the current training according to the training stage to improve training effect. By comparing with seven well-known network embedding algorithms on eight real-world data sets, experiments show that the NEPS has good performance in node classification, network reconstruction and link prediction. The code is available at: https://github.com/Andrewsama/NEPS-master.

previous article Mondrian forest for data stream classification under memory constraints

next article A semi-supervised interactive algorithm for change point detection

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, et al. (2016) Tensorflow: a system for large-scale machine learning. In: 12th \(\{\)USENIX\(\}\) symposium on operating systems design and implementation (\(\{\)OSDI\(\}\) 16), pp 265–283

Adhikari B, Zhang Y, Ramakrishnan N, Prakash BA (2018) Sub2vec: Feature learning for subgraphs. In: Pacific-Asia conference on knowledge discovery and data mining, Springer, pp 170–182

Alanis-Lobato G, Mier P, Andrade-Navarro MA (2016) Efficient embedding of complex networks to hyperbolic space via their Laplacian. Sci Rep 6(1):1–10CrossRef

Armandpour M, Ding P, Huang J, Hu X (2019) Robust negative sampling for network embedding. In: Proceedings of the AAAI conference on artificial intelligence 33:3191–3198

Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6):1373–1396CrossRef

Bu D, Zhao Y, Cai L, Xue H, Zhu X, Lu H, Zhang J, Sun S, Ling L, Zhang N et al (2003) Topological structure analysis of the protein-protein interaction network in budding yeast. Nucleic Acids Res 31(9):2443–2450CrossRefPubMedPubMedCentral

Cao S, Lu W, Xu Q (2015) Grarep: Learning graph representations with global structural information. In: Proceedings of the 24th ACM international on conference on information and knowledge management, pp 891–900

Cao S, Lu W, Xu Q (2016) Deep neural networks for learning graph representations. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 30

Chang S, Han W, Tang J, Qi GJ, Aggarwal CC, Huang TS (2015) Heterogeneous network embedding via deep architectures. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 119–128

Chen H, Perozzi B, Hu Y, Skiena S (2018) Harp: Hierarchical representation learning for networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 32

Cox MA, Cox TF (2008) Multidimensional scaling. In: Handbook of data visualization, Springer, pp 315–347

Dai Q, Li Q, Tang J, Wang D (2018) Adversarial network embedding. In: Proceedings of the AAAI conference on artificial intelligence, vol 32

Donahue J, Simonyan K (2019) Large scale adversarial representation learning. arXiv preprint arXiv:1907.02544

Dong Y, Chawla NV, Swami A (2017) metapath2vec: Scalable representation learning for heterogeneous networks. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 135–144

Feng R, Yang Y, Hu W, Wu F, Zhang Y (2018) Representation learning for scale-free networks. In: Proceedings of the AAAI conference on artificial intelligence, vol 32

Gao H, Huang H (2018) Self-paced network embedding. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pp 1406–1415

Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 855–864

Hamilton WL, Ying R, Leskovec J (2017) Inductive representation learning on large graphs. In: Proceedings of the 31st international conference on neural information processing systems, pp 1025–1035

Hou Z, Cen Y, Dong Y, Zhang J, Tang J (2021) Automated unsupervised graph representation learning. IEEE Trans Knowl Data Eng 35:2285–2298

Hu B, Fang Y, Shi C (2019) Adversarial learning on heterogeneous information networks. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 120–129

Huang X, Li J, Hu X (2017) Label informed attributed network embedding. In: Proceedings of the tenth ACM international conference on web search and data mining, pp 731–739

Kendall MG (1938) A new measure of rank correlation. Biometrika 30(1/2):81–93CrossRef

Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980

Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907

Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: Densification and shrinking diameters. ACM transactions on Knowledge Discovery from Data (TKDD) 1(1):2–es

Li AQ, Ahmed A, Ravi S, Smola AJ (2014) Reducing the sampling complexity of topic models. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 891–900

Lü L, Zhou T (2011) Link prediction in complex networks: a survey. Phys A: Stat Mech Appl 390(6):1150–1170CrossRef

Mahoney M (2011) Large text compression benchmark

Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013b) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119

Narayanan A, Chandramohan M, Chen L, Liu Y, Saminathan S (2016) subgraph2vec: Learning distributed representations of rooted sub-graphs from large graphs. arXiv preprint arXiv:1606.08928

Newman ME (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104ADSMathSciNetCrossRef

Ou M, Cui P, Pei J, Zhang Z, Zhu W (2016) Asymmetric transitivity preserving graph embedding. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1105–1114

Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 701–710

Perozzi B, Kulkarni V, Chen H, Skiena S (2017) Don’t walk, skip! online learning of multi-scale network embeddings. In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining 2017, pp 258–265

Ribeiro LF, Saverese PH, Figueiredo DR (2017) struc2vec: Learning node representations from structural identity. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 385–394

Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326ADSCrossRefPubMed

Rozemberczki B, Allen C, Sarkar R (2021) Multi-scale attributed node embedding. J Complex Netw cnab9(2):014MathSciNet

Shao J (2006) Mathematical statistics: exercises and solutions. Springer Science & Business Media

Shaw B, Jebara T (2009) Structure preserving embedding. In: Proceedings of the 26th annual international conference on machine learning, pp 937–944

Spearman C (1987) The proof and measurement of association between two things. Am J Psychol 100(3/4):441–471CrossRefPubMed

Tang J, Qu M, Mei Q (2015a) Pte: Predictive text embedding through large-scale heterogeneous text networks. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1165–1174

Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015b) Line: large-scale information network embedding. In: Proceedings of the 24th international conference on world wide web, pp 1067–1077

Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323ADSCrossRefPubMed

Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. arXiv preprint arXiv:1710.10903

Wang D, Cui P, Zhu W (2016) Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1225–1234

Wang J, Yu L, Zhang W, Gong Y, Xu Y, Wang B, Zhang P, Zhang D (2017) Irgan: a minimax game for unifying generative and discriminative information retrieval models. In: Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval, pp 515–524

Wang J, Huang P, Zhao H, Zhang Z, Zhao B, Lee DL (2018) Billion-scale commodity embedding for e-commerce recommendation in alibaba. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pp 839–848

Wang X, Zhang Y, Shi C (2019) Hyperbolic heterogeneous information network embedding. In: Proceedings of the AAAI conference on artificial intelligence 33:5337–5344

Wang Z, Ye X, Wang C, Cui J, Yu P (2020) Network embedding with completely-imbalanced labels. IEEE Trans Knowl Data Eng 33:3634–3647CrossRef

Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? arXiv preprint arXiv:1810.00826

Yang C, Liu Z, Zhao D, Sun M, Chang E (2015) Network representation learning with rich text information. In: Twenty-fourth international joint conference on artificial intelligence

Yang Z, Ding M, Zhou C, Yang H, Zhou J, Tang J (2020) Understanding negative sampling in graph representation learning. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 1666–1676

Yin H, Benson AR, Leskovec J, Gleich DF (2017) Local higher-order graph clustering. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 555–564

Zhang J, Shi X, Xie J, Ma H, King I, Yeung DY (2018) Gaan: Gated attention networks for learning on large and spatiotemporal graphs. arXiv preprint arXiv:1803.07294

Zhang J, Dong Y, Wang Y, Tang J, Ding M (2019) Prone: Fast and scalable network representation learning. IJCAI 19:4278–4284

Zhang W, Chen T, Wang J, Yu Y (2013) Optimizing top-n collaborative filtering via dynamic negative item sampling. In: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, pp 785–788

Title: Network embedding based on high-degree penalty and adaptive negative sampling
Authors: Gang-Feng Ma
Xu-Hua Yang
Wei Ye
Xin-Li Xu
Lei Ye
Publication date: 02-09-2023
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery / Issue 2/2024
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-023-00973-1

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 2/2024

Fast, accurate and explainable time series classification through randomization

The art of centering without centering for robust principal component analysis

Correction to: AA-forecast: anomaly-aware forecast for extreme events

Representing ensembles of networks for fuzzy cluster analysis: a case study

Mondrian forest for data stream classification under memory constraints

Sky-signatures: detecting and characterizing recurrent behavior in sequential data

Premium Partner