A Review Paper: Categorization of Web Pages
Keywords:
Categorization, Web
Abstract
The contemporary web comprises trillions of pages, and every day a tremendous number of requests are made to add more pages to the WWW. It has become more difficult to manage the information present on the web than to create it. Web page categorization can be defined as an approach to assigning web pages to a set of predefined categories in order to manage large web content. Yahoo! and ODP are examples of web directories in which pages are categorized manually or semi-automatically, but this is a very time-consuming task. There are many ways of categorizing web pages using different techniques. An approach is described for categorizing web pages automatically on the basis of their characteristics, using a neural-network-based single discrete perceptron training algorithm that is extended by selecting web-page-specific features, so that pages of predefined categories are categorized with high accuracy. The idea is presented with the help of two specific and major categories of web pages chosen for categorization: newspaper and education.
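The training step of such an approach can be illustrated with a minimal Python sketch. It assumes details the abstract does not give: each page is reduced to binary keyword-presence features plus a constant bias input, newspaper pages are labelled +1 and education pages -1, and the keyword list `FEATURE_WORDS`, the learning constant `c` and the toy training pages are all hypothetical. The weight update follows the usual discrete perceptron rule w <- w + (c/2)(d - o)x, where o = sgn(w·x) is the actual output and d is the desired one.

```python
"""Sketch: a single discrete perceptron separating two web page categories.

Assumptions (not from the paper): pages are reduced to binary
keyword-presence features, newspaper = +1, education = -1, and the
keyword list, learning constant c and training pages are illustrative.
"""

# Hypothetical page-specific features: presence of category cue words.
FEATURE_WORDS = ["breaking", "editorial", "headlines",
                 "university", "admission", "syllabus"]


def extract_features(text):
    """Map page text to a binary keyword-presence vector plus a bias input."""
    words = set(text.lower().split())  # naive tokenization, no punctuation stripping
    features = [1.0 if w in words else 0.0 for w in FEATURE_WORDS]
    return features + [1.0]  # constant input acting as the bias term


def sgn(value):
    """Discrete bipolar activation: +1 or -1 (a net input of 0 maps to +1)."""
    return 1 if value >= 0 else -1


def train_discrete_perceptron(samples, c=1.0, max_epochs=100):
    """Train weights with the rule w <- w + (c/2)(d - o)x.

    samples: list of (feature_vector, desired_label), labels in {+1, -1}.
    Stops after an epoch with no misclassifications or after max_epochs.
    """
    n = len(samples[0][0])
    w = [0.0] * n
    for _ in range(max_epochs):
        errors = 0
        for x, d in samples:
            o = sgn(sum(wi * xi for wi, xi in zip(w, x)))
            if o != d:
                errors += 1
                # (d - o) is +2 or -2 here, so each correction adds or
                # subtracts c * x from the weights
                w = [wi + (c / 2) * (d - o) * xi for wi, xi in zip(w, x)]
        if errors == 0:
            break
    return w


def classify(w, text):
    """Return 'newspaper' or 'education' for a page's text."""
    x = extract_features(text)
    net = sum(wi * xi for wi, xi in zip(w, x))
    return "newspaper" if sgn(net) == 1 else "education"


# Tiny made-up training set, for illustration only.
TRAINING_PAGES = [
    ("breaking headlines from the city desk", 1),
    ("editorial opinion and today's headlines", 1),
    ("university admission open for the new session", -1),
    ("course syllabus and university exam schedule", -1),
]

samples = [(extract_features(t), d) for t, d in TRAINING_PAGES]
weights = train_discrete_perceptron(samples)
print(classify(weights, "latest breaking news headlines"))  # prints "newspaper"
```

Because a single perceptron converges only when the two categories are linearly separable in the chosen feature space, the selection of page-specific features matters as much as the training rule itself.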
References
Pierre J. M., “Practical Issues for Automated Categorization of Web Pages,” September 2000.
Qi X. and Davison B. D., “Web page classification: Features and algorithms,” ACM Computing Surveys, 41(2), 2009.
Yahoo!, http://www.yahoo.com, accessed 14 March 2012.
Open Directory Project, http://www.dmoz.org, accessed 15 March 2012.
Xu Z. et al., “A Web Page Classification Algorithm Based On Link Information,” in DCABES’11 Proceedings of the Tenth International Symposium on Distributed Computing and Applications to Business, Engineering and Science, pp. 82-86, 2011.
Bartik V., “Text-Based Web Page Classification with Use of Visual Information,” in ASONAM’10 Proceedings of the International Conference on Advances in Social Network Analysis and Mining, pp. 416-420, 2010.
He Z. and Liu Z., “A Novel Approach to Naïve Bayes Web Page Automatic Classification,” in FSKD’08 Proceedings of the Fifth International Conference on Fuzzy Systems and Knowledge Discovery, pp. 361-365, 2008.
Radovanović M. and Ivanović M., “Document Representation for Classification of Short Web Page Descriptions,” in Yugoslav Journal of Operations Research, Volume 18, Number 1, pp. 123-138, 2008.
Dai W. et al., “A Novel Web Page Categorization Algorithm Based on Block Propagation Using Query-Log Information,” in WAIM’06, LNCS 4016, pp. 435-446, 2006.
Materna J., “Automatic Web Page Classification,” in RASLAN’08 Proceedings of Recent Advances in Slavonic Natural Language Processing, pp. 84-93, 2008.
Kwon O. and Lee J., “Web page classification based Nearest Neighbor approach,” in IRAL’00 Proceedings of the fifth international workshop on Information retrieval with Asian languages, pp. 9-15, 2000.