Big Data Indexing: Taxonomy, Performance Evaluation, Challenges and Research Opportunities
(1) Federal University Lafia
(2) Abubakar Tafawa Balewa University
(3) Gombe State University
(4) Abubakar Tafawa Balewa University
(5) Abubakar Tafawa Balewa University
(6) Industry and Innovation Institute, Sheffield Hallam University
Abstract
In order to efficiently retrieve information from highly huge and complicated datasets with dispersed storage in cloud computing, indexing methods are continually used on big data. Big data has grown quickly due to the accessibility of internet connection, mobile devices like smartphones and tablets, body-sensor devices, and cloud applications. Big data indexing has a variety of problems as a result of the expansion of big data, which is seen in the healthcare industry, manufacturing, sciences, commerce, social networks, and agriculture. Due to their high storage and processing requirements, current indexing approaches fall short of meeting the needs of large data in cloud computing. To fulfil the indexing requirements for large data, an effective index strategy is necessary. This paper presents the state-of-the-art indexing techniques for big data currently being proposed, identifies the problems these techniques and big data are currently facing, and outlines some future directions for research on big data indexing in cloud computing. It also compares the performance taxonomy of these techniques based on mean average precision and precision-recall rate.
Keywords
Full Text:
PDFReferences
Thilkanathan, Danan, S. C., Surya, N., Rafael, C., & Leila, A. (2014). A platform for monitoring and sharing of generic health data in the cloud. Future generation computer system, 35, 102-113. Retrieved April 9, 2017. https://doi.org/10.1016/j.future.2013.09.011
Huang, Z., Heng, T. S., & Shao, J. (2010). Bounded Coordinate System Indexing for Real-time. ACM Transactions on Information Systems, 10(10), 1-32. https://doi.org/10.1145/1508850.1508855
Gartner M, Rauber, A., & Berger, H. (2013). Briging structured and unstructured data via hybrid semantic search and interactive ontology-enhanced query formulation. Knowledge information system, 1-32. https://doi.org/10.1007/s10115-013-0678-y
Armbrust, M., Fox , A., Griffith, R., Joseph, A. D., Katz, H. R., & Ko, A. (2009, 2). Above the clouds: Berkeley view of cloud computing, Technical Report UCB/EECS.Materials Genome initiative for Global Competitiveness. https://inst.eecs.berkeley.edu/~cs10/fa10/lec/20/2010-11-10-CS10-L20-AF-Cloud-Computing.pdf
Agrawal, D., Bernstein, P., Bertino, E., Davidson, S.,& Dayal, U. (2012). Challenges and Opportunities with Big Data. A white paper prepared for the Computing Community Consortium, 1-16. https://cra.org/ccc/wp-content/uploads/sites/2/2015/05/bigdatawhitepaper.pdf
Chang, c., Kayed, M., Girgis, M. R., & Shaalam, K. F. (2006). A survey of web information extraction system. IEEE Transaction on Knowledge and Data Engineering, 18(10), 1411-1428. https://doi.org/10.1109/TKDE.2006.152
Agrawa, C. C., & Wang, H. (2010). Managing and mining graph data. Springer publishing company. https://doi.org/10.1007/978-1-4419-6045-0
Hosagrahar, V. J., Beng, C., Kian-Lee, T., Cui, Y., & Rui, Z. (2005). iDistance: An adaptive B+-tree based indexing method for nearest neighbour search. ACM Transaction on Database Systems, 30(2), 364-397. https://doi.org/10.1145/1071610.1071612
Muja, M., & Lowe, D. G. (2009). Fast approximate nearest neighbours with automatic algorithm configuration. VISAPP, 331-340.
Jon, L. B. (1975). Multidimensional Binary Search Trees Used for Associative Searching. ACM, 18(9), 509-517. https://doi.org/10.1145/361002.361007
Chanop, S.-A., & Richard, H. (2008). Optimised KD-trees for fast image descriptor matching. IEEE Conference on Computer Vision and Pattern Recognition, 1-8. https://doi.org/10.1109/CVPR.2008.4587638
Shamshirband, S., Anuar, N., Kiah, M., & Patel, A. (2013). An appraisal and design of a multi-agent system based cooperative wireless intrusion detection computational intelligence technique. Engineering Application of Artificial Intelligent. , 26(8), 2105-2127. https://doi.org/10.1016/j.engappai.2013.04.010
Aguilera M, K., Golab, W., & Shah, M. A. (2008). A practical scalable distributed b-tree. Proceedings of the VLDB Endowment, 1(1), 598–609. https://doi.org/10.14778/1453856.1453922
Jaluta, I. (2014, April). Transaction management in b-tree-indexed database systems. In Information Science, Electronics and Electrical Engineering. International conference on, 3, 1968-1975. https://doi.org/10.1109/InfoSEEE.2014.6946267
Frome, A., Singer, Y., Sha, F., & Malik, J. (2008, June). Learning globally-consistent local. ICCV. https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/shape/papers/FromeSingerShaMalikICCV07.pdf
Friedman, J., Bentley, J. L., & Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transaction on Mathematical Software, 3(3), 209-226. https://doi.org/10.1145/355744.355745
Kai, Z. Y., Nicholas, J. Y., & Shuo, S. (2013). Discovering of gathering patterns from trajectories. ICDE. https://doi.org/10.1109/ICDE.2013.6544829
Hoyoung, J., Man, L. Y., Xiaofang, Z., Christian, S. J.,& Heng, T. S. (2008). Discovery of convoys in trajectory databases. VLDBJ. https://doi.org/10.14778/1453856.1453971
Yu, Y., Zhu, Y., Ng, W., & Samsudin, J. (2014, 12). An efficient multidimension metadata index and search system for cloud data,” in Cloud Computing technology and science. IEEE Transaction on, 499-504. https://doi.org/10.1109/CloudCom.2014.88
Dieter, P., Christian, S. J., & Yanmis, T. (2000). Novel approaches to the indexing of moving objects trajectories. VDLB. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.23.3875&rep=rep1&type=pdf
Lei, C., Tammer, M. O., & Vincent, O. (2005). Robust and fast similarity search for moving object trajectories. ICDE. https://doi.org/10.1145/1066157.1066213
Michail, V., George, K., & Dimitrios. (2002). Discovering similar multidimensional trajectories. ICDE. https://doi.org/10.1109/ICDE.2002.994784
Prateek Jain, B. K. (2008, 6). Fast image search for learned metrics. In proceeding of the IEEE conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2008.4587841
Xu, H., Wang, J., Li, Z., Zeng, G., Li, S., & Yu, N. (2011). Complementary hashing for approximate nearest neighbor search. In Proc. ICCV. https://doi.org/10.1109/ICCV.2011.6126424
Kulis, B., Jain, P., & Grauman, K. (2009). Fast similarity search for learned metrics. TPAMI, 31(12), 2143–2157. https://doi.org/10.1109/TPAMI.2009.151
Torralba, A., Fergus, R., & Freeman, W. T. (2008). 80 million tiny images: a large dataset for non-parametric object and scene recognition. TPAMI, 30(11), 1958–1970,. https://doi.org/10.1109/TPAMI.2008.128
Strecha, C, A. M., M, M. B., & P, F. (2012). Ldahash: Improved matching with smaller descriptors. TPAMI, 34(1), 66-76. https://doi.org/10.1109/TPAMI.2011.103
Zhu, X., Huang, Z., Cheng, H., Cui, J., & Shen, H. T. (2013). Sparse hashing for fast multimedia search. ACM Transaction on information system, 3(2), 1-24. https://doi.org/10.1145/2457465.2457469
Avidan, S., & Korman, S. (2011). Coherency sensitive hashing. In Proceedings of ICCV. https://doi.org/10.1109/ICCV.2011.6126421
Datar, M., Immorlica, N., Indyk, P., & Mirrokni, P. (2004). Locality sensitive hashing scheme based on p-stable distributions. In Proceedings of the Symposium on Computational Geometry, 253–262. https://doi.org/10.1145/997817.997857
Zhou, A. (2005). c^2: a new overlay network based on can and chord. international journal of high performance computing network, 3(4), 248-261. https://doi.org/10.1007/978-3-540-24679-4_15
W. Liu, J. Wang, R. J. Y-G. Jang, S-F Chang., (2012). Supervised hashing with kernels. In computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2012.6247912
A. Jolly & O. Buisson. (2011). Random maximum margin hashing. CVPR. https://doi.org/10.1109/CVPR.2011.5995709
H. Jae-Pil, L. Youngwoon, H. Junfeng, C. Shih-Fu, Y. Sung_Eui, (2015). Spherical Hashing: Binary Code Embedding with Hyperspheres. IEEE transaction on Pattern Analysis and Machine Intelligent, 1-14. https://doi.org/10.1109/TPAMI.2015.2408363
J. He, R. Rhadhakrishnan, S-F Chang and C. Bauer. (2011). Compact hashing with joint optimisation of search accuracy and time. CVPR. https://doi.org/10.1109/CVPR.2011.5995518
Y. Gong and S. Lazebnik. (2011). Itetrative Quantisation: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE transaction on pattern analysis and machine inteliigence. https://doi.org/10.1109/TPAMI.2012.193
A. Torralba, R. fergus, and Y. Weiss, (2008). Small codes and large image dtatabases for recognition. CVPR. https://doi.org/10.1109/CVPR.2008.4587633
Y. Weiss, A. Torralba, and R. fergus, . (2008). Spectral Hashing. in proceedings of NIPS.
O. Chum, J. Philbin, A. Zisseman, (2008). Near duplicate image detectionmin-hash and tf-idf weighting. BMVC.
M. Rangisky and S. Lazebnik. (2009). Locality sensitive binary codes from shift-invariant kernels. in proceedings od NIPS, 1509-1517.
R. Salakhutdinov, G. Hinton, (2009). Semantic Hashing. International Journal of Approximate reasoning. https://doi.org/10.1016/j.ijar.2008.11.006
Boukari Souley, A. U. Othman (2019). Geometric Similarity Preserving Embedding-Based Hashing for Bid Data in Clou Computing. International Journal of research and Scientific Innovation.
J. Wang, S. Kumar, S-F Chang,. (2010). Sequential projection learning for hashing with compact codes. ICML.
L. Pauleve, H. Jegou, L. Amsaleg, (2010). Locality sensitive Hashing: A comparison of hash function types and queryong mechanism. Pattern recognition Letters. https://doi.org/10.1016/j.patrec.2010.04.004
J. Wang, S. Kumar, and S-F Chang, (2010). Semi-supervised hashing for scalable image retrieval. CVPR. https://doi.org/10.1109/CVPR.2010.5539994
R-S Lin, D. Rose, J. Yangik, (2010). Spec Hashing: Similarity preserving algorithm for entropy-base coding. CVPR. https://doi.org/10.1109/CVPR.2010.5540129
R. Ye, Z. Li, (2016). Compact structure hashing via sparse and similarity preserving embedding. IEEE transaction on cybernatics, 46(3), 718-729. https://doi.org/10.1109/TCYB.2015.2414299
H. Zhang, L. Liu, Y. Yong, L. Shao, (2017). Unsupervised deep hashing with pseudo labels for scalable image retrieeval. https://doi.org/10.1109/TIP. 2017. 2781422
Y. Lv, W. Y. Ng Wing, Z. Zeng, S. D. Yeung,and P. K. Patrick (2015). Asymmetric Cyslical Hashing for Large Scale Image Retrieval. IEEE transaction on multimedia, 11(8), 1225-1235. https://doi.org/10.1109/TMM.2015.2437712
M. Norouzi and D. J. Fleet. (2011). Minimal Hashing for Compact binary codes. ICML.
Kadiyala S, S. N. (2008). A compact multi-resolution indedx for variable length queries in time series database. Knowledge information system, 15(2), 131-147. https://doi.org/10.1007/s10115-007-0097-z
Meshram, B. B., & Gaikwad, G. P. (2013, 4). Different indexing techniques. International Journal of Engineering Research and Application, 3(2), 1230-1235.
Chen, J., Yuegue, C., Lia, E., Cuiping, I. L., & Jiaheng, U. L. (2013). Big Data Challenges: A data Management Perspective. Higher education press and springer verlag Berlin Heidelberg, 7(2), 157-164. http://dx.doi.org/10.1007/s11704-013-3903-7
Kaisler, S.,Armour, F., Espinosa , J. A., & Money, W. (2013). Big data: issues and challenges moving forward. Hawii international conference on system sciences, 995-1004. http://dx.doi.org/10.1109/HICSS.2013.645
M. S. Charkar, (2002). Similarity estimation techniques from rounding algorithms. In Proceedings of Annual ACM Symposium on Theory of Computation, 380-388. https://doi.org/10.1145/509907.509965
B. Kulis, K. Grauman, (2009, September-October). Kernelised Locality-sensitive hashing for scalable image search. In Proceedings of IEEE conference on computer vision and pattern recognition, 2130-2137. http://dx.doi.org/10.1109/ICCV.2009.5459466
B. Souley, A. U. Othman, A. Y. Gital and I. M. Adamu, (2019). Performance evaluation of GSPEBH for big data in cloud computing. Global Scientific Journal.
C. Yan, H. Xie, D. Yang, J. Yin, Y. Zhang, (2018). Supervised hash coding with deep neural network for environment perception of intelligent vehicles. IEEE transaction on intelligent transportation systems, 19(1), 284-295. https://doi.org/10.1109/TITS.2017.2749965
M. He, Y. Yang, F. Shen, N. Xie, and H. T. Shen, (2017). Hashing with Angular Reconstruction Embeddings. IEEE Transactions on Image Processing. 27(5), 545-555. https://doi.org/10.1109/TIP.2017.2749147
Nussinov, R., & Wolfson, H. J. (1991, 12 01). Efficient Detection of three-Dimensional Structural Motifs in Biological Macromolecules by Computer Vision techniques. Peoceedings of the National Academy of Science America, 88(23), 10495-10499. https://doi.org/10.1073%2Fpnas.88.23.10495
Mehrotra, H., Majhi, B., & Gupta, P. (2010). Robust ris indexing scheme using geometric hashing of SIFT keypoints. Journal of Network and Computer Applications, 33, 300–313. https://doi.org/10.1016/j.jnca.2009.12.005
Lowe, D. (2004). Distinctive image features from scale-invariant key points. International Journal of Computer Vision 60, 91–110 (2004). https://doi.org/10.1023/B:VISI.0000029664.99615.94.
Jayaraman, U., Surya, P., & Phalguni, G. (2013). Use of geometric features of principal components for indexing a biometric database. Mathematical aand Computing Modelling, 58, 147-164. https://doi.org/10.1016/j.mcm.2012.06.005
Li, F., Yi, K., & Le, W. (2010). Top-k queries on temporal data. VLDB, 19(5), 715-733. http://dx.doi.org/10.1007/s00778-010-0186-6
Sandu P, I., Zeitouni, K., Oria, V., Barth , D., & Vial, S. (2011). Indexing in network trajectory flows. VLDBJ, 20(5), 643-669. https://doi.org/10.1007/s00778-011-0236-8
Dittrichter. (2011). MOVIES: Indexing moving objects by shooting index images. Geoinformatics, 15(4), 727-767. http://dx.doi.org/10.1007/s10707-011-0122-y
Xie, M., Wang, H., Yin, J., & Meng, X. (2007). Integrity auditing of outsourced data. In Proceedings of the International Conference on Very Large Databases, 782-793.
Vandana, D. K., Jayaraman, U., Amit, K., Aman, K. G., & Gupta, P. (2013). An efficient indexing scheme for face database using modified geometric hashing. Neurocomputing, 116, 208-221. https://doi.org/10.1016/j.neucom.2011.12.056
Umarani, J., Surya, P., & Phalguni, G. (2013). Use of geimetric features of principal components for indexing a biometric database. Mathematical and computing modelling, 58, 147-164. https://doi.org/10.1016/j.mcm.2012.06.005
Ling-Yin, Y.-T. H.-C.-C. (2013). Indexing spatial data in cloud data management. Pervasive mobile computing, 1-14. https://doi.org/10.1016/j.pmcj.2013.07.001
Xiaohui, Yu, K. Q., & Nick, K. (2005). Monitoring K-nearest neighbour queries over moving objects. In Proceedings of the 21st International Conference on Data Engineering. https://doi.org/10.1109/ICDE.2005.92
Wang, M., Viliam, H., John, M., & Patrick, O. (2013, 2). High volumes of event stream indexing and efficient multi-keyword searching for cloud monitoring. Future generation computer, 29, 1943-1962. https://doi.org/10.1016/j.future.2013.04.028
Wang, J., Wu, S., Gao, H., Li, J., & Ooi, B. C. (2010). Indexing Multidimensional Data in a Cloud System. ACM SIGMOD International conference on management of data, 591-602. https://doi.org/10.1145/1807167.1807232
Spek, P. V., & Steven, K. (2011). Applying a dynamic threshold to improve cluster detection LSI. Science of computer programming, 76, 1261-1274. https://doi.org/10.1016/j.scico.2010.12.004
James, C., Yiping, K., Ada, W.-C. F., & Jeffrey, X. Y. (2011). Fast grapbh query processing with low-cost index. The VLDB journal, 20(4), 521-539. https://doi.org/10.1007/s00778-010-0212-8
Giangreco, I., Kabary, I. A., & Schuldt, H. (2014, 06). Adam - a database and information retrieval system for big multimedia collections. IEEE International Conference on, 406-413. https://doi.org/10.1109/BigData.Congress.2014.66
Collins, E. (2014). Intersection of the cloud and big data. IEE Cloud Computing 1, 84-85. http://dx.doi.org/10.1109/MCC.2014.12
Cackett, D. (2013). Information Management and Big data: A reference Architecture. Oracle corperation.
Wook-shin, H., Jinsoo, L., & Minh-Duc, P. (2010). iGraph: A framework for comparing a disk-based grapph indexing techniques. Proceedings of the VCLD endowment, 3(1).
Jin Z, L. C., Lin, Y., & Cai, D. (2014, august). Density Sensitive Hashing. IEEE transactions on Cybernetics, 44(8), 1362-1371. https://doi.org/10.1109/TCYB.2013.2283497
Ferragina, P., & Rossano, V. (2007, 7). The ompressed permuterm index. In proceedings of SIGIR, 535-542. https://doi.org/10.1145/1868237.1868248
JJinbao, W., Wu, S., Gao, J., Li, J., & Ooi, C. B. (2010). Indexing multi-dimensional data in a cloud system. SIGMOD. https://doi.org/10.1145/1807167.1807232
Refbacks
- There are currently no refbacks.
Published by : ICSE (Institute of Computer Sciences and Engineering)
Website : http://icsejournal.com/index.php/JCSE/
Email: jcse@icsejournal.com
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.