Web Data Mining: A Systematic Approach to Keyword Based Web Research

Web data mining is systematic approach to keyword based and hyperlink based web research for gaining business intelligence. It requires analytical skills to understand hyperlink structure of given website. Hyperlinks possess enormous amount of hidden human annotations that can help automatically understand the authority. If the webmaster provides a hyperlink pointing to another website or web page, this action is perceived as an endorsement to that webpage. Search engines highly focus on such endorsements to define the importance of the page and place them higher in organic search results.

However every hyperlink does not refer to the endorsement since the webmaster may have used it for other purposes, such as navigation or to render paid advertisements. It is important to note that authoritative pages rarely provide informative descriptions. For an instant, Google's homepage may not provide explicit self-description as "Web search engine."

These features of hyperlink systems have forced researchers to evaluate another important webpage category called hubs. A hub is a unique, informative webpage that offers collections of links to authorities. It may have only a few links pointing to other web pages but it links to a collection of prominent sites on a single topic. A hub directly awards authority status on sites that focus on a single topic. Typically, a quality hub points to many quality authorities, and, conversely, a web page that many such hubs link to can be deemed as a superior authority.

Such approach of identifying authoritative pages has resulted in the development of various popularity algorithms such as Page Rank. Google uses Page Rank algorithm to define authority of each webpage for a relevant search query. By analyzing hyperlink structures and web page content, these search engines can render better-quality search results than term-index engines such as Ask and topic directories such as DMOZ.

ARTICLE SOURCE: This factual content has not been modified from the source. This content is syndicated news that can be used for your research, and we hope that it can help your productivity. This content is strictly for educational purposes and is not made for any kind of commercial purposes of this blog.