Published in: World Wide Web 3/2020

29.02.2020

A framework for image dark data assessment

Authors: Ke Zhou, Yangtao Wang, Yu Liu, Yujuan Yang, Yifei Liu, Guoliang Li, Lianli Gao, Zhili Xiao

Abstract

Image dark data, whose content and value are unclear, continuously occupy storage space yet rarely produce much value. Blindly applying data mining techniques to such data is highly likely to yield disappointing results and waste substantial resources. It is therefore important to assess dark data before mining, so that the user can understand what the data contain. Dark data assessment, however, poses several challenges. First, the similarity between images must be measured objectively under a unified standard so that the user can interpret the evaluation values of the dark data. Second, it is important to capture semantic features with generalization ability. Third, it is challenging to design an assessment scheme efficient enough to support large-scale datasets. To overcome these challenges, we propose an assessment framework consisting of offline calculation and online assessment. In offline calculation, we first transform unlabeled images into hash codes using our Deep Self-taught Hashing (DSTH) algorithm, which extracts semantic features with generalization ability; we then construct a semantic graph using restricted Hamming distance; finally, we use our Semantic Hash Ranking (SHR) algorithm to calculate an overall importance score (rank) for each node (image), taking into account both the number of connected links and the weights on the edges. During online assessment, we first translate the user's query (semantic images) into hash codes using the DSTH model, then match the data contained in the dark data within a predefined Hamming distance query range, and finally return the weighted average value of the matched data to help the user understand the dark data. Results on a real-world dataset show that our framework applies to large-scale datasets, helps users evaluate dark data according to different requirements, and assists the user in conducting subsequent data mining work.
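
The offline and online stages described in the abstract can be illustrated with a minimal Python sketch. It is not the published DSTH or SHR implementation: hash codes are assumed to be given as plain integers, the ranking step is a generic weighted PageRank-style iteration used as a stand-in for SHR, and the edge weights, match weights, damping factor and all function names (`hamming`, `build_graph`, `rank_nodes`, `assess`) are assumptions made for illustration.

```python
# Illustrative sketch only. The weighting schemes, the damping factor and the
# function names are assumptions, not the SHR algorithm as published.

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def build_graph(codes, max_dist):
    """Link every pair of codes whose Hamming distance is <= max_dist.

    Assumed edge weight: max_dist + 1 - distance, so closer pairs weigh more.
    """
    graph = {i: {} for i in range(len(codes))}
    for i in range(len(codes)):
        for j in range(i + 1, len(codes)):
            d = hamming(codes[i], codes[j])
            if d <= max_dist:
                w = max_dist + 1 - d
                graph[i][j] = w
                graph[j][i] = w
    return graph


def rank_nodes(graph, damping=0.85, iterations=50):
    """Weighted PageRank-style iteration (a stand-in for SHR).

    A node's score grows with both the number of its links and their weights.
    Isolated nodes only receive the constant (1 - damping) / n term.
    """
    n = len(graph)
    rank = {i: 1.0 / n for i in graph}
    for _ in range(iterations):
        new_rank = {}
        for i in graph:
            incoming = 0.0
            for j, w in graph[i].items():
                # Neighbor j shares its score in proportion to edge weights.
                incoming += rank[j] * w / sum(graph[j].values())
            new_rank[i] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def assess(query_code, codes, rank, query_range):
    """Weighted average rank of all images within query_range of the query.

    Assumed match weight: query_range + 1 - distance.
    Returns 0.0 when no image matches.
    """
    num = den = 0.0
    for i, code in enumerate(codes):
        d = hamming(query_code, code)
        if d <= query_range:
            w = query_range + 1 - d
            num += w * rank[i]
            den += w
    return num / den if den else 0.0
```

For example, with `codes = [0b0000, 0b0001, 0b0011, 0b1111]` and `max_dist=1`, the second image is linked to the first and third, so it receives the highest score, while the fourth image has no links. The pairwise loop in `build_graph` is quadratic in the number of images and is meant only to show the idea; it is not a scheme for large-scale datasets.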

Metadata
Title
A framework for image dark data assessment
Authors
Ke Zhou
Yangtao Wang
Yu Liu
Yujuan Yang
Yifei Liu
Guoliang Li
Lianli Gao
Zhili Xiao
Publication date
29.02.2020
Publisher
Springer US
Published in
World Wide Web, Issue 3/2020
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-020-00779-x