MULTILEVEL CONTENT MINING MODEL FOR LARGE SCALE WEBSITES
Keywords:
Abstract
With the growing use of the WWW, the volume of data available on Websites is increasing rapidly. Efficient Web data extraction has therefore become a major challenge for large scale Websites. The main requirement of a user of such Websites is to extract accurate data in a reasonable amount of time. This research work presents a Web content extraction model for large scale Websites. The system model (MCMM-LSW) builds a link tree of the Website and extracts content based on seed pages extracted from different levels of the link tree. The results show higher recall, precision and overall accuracy (F-measure) than the 2-level approach used in the literature. The effect of changing the number of Website levels on MCMM-LSW is also shown in the results. Finally, a comparison of keyword based extraction and MCMM-LSW is presented.
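For readers unfamiliar with the evaluation measures named above, the following is a minimal sketch of how precision, recall and F-measure are commonly computed for an extraction run. It assumes the standard definition of F-measure as the harmonic mean of precision and recall. The function name and the page identifiers are hypothetical and used only for illustration; they are not taken from the MCMM-LSW implementation or its results.

```python
def precision_recall_f1(extracted, relevant):
    """Return (precision, recall, F-measure) for one extraction run.

    extracted: items the extractor returned (hypothetical identifiers)
    relevant:  items that should have been returned (ground truth)
    """
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0

    # F-measure as the harmonic mean of precision and recall (assumed F1).
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative data only, not results from the paper.
extracted_pages = ["p1", "p2", "p3", "p4"]
relevant_pages = ["p2", "p3", "p4", "p5", "p6"]
p, r, f = precision_recall_f1(extracted_pages, relevant_pages)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a method scores well on F-measure only if it is both precise and complete.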