Today World Wide Web is flooded with billions of static and dynamic web pages created with programming languages such as HTML, PHP and ASP. Web is great source of information offering a lush playground for data mining. Since the data stored on web is in various formats and are dynamic in nature, it's a significant challenge to search, process and present the unstructured information available on the web.
Complexity of a Web page far exceeds the complexity of any conventional text document. Web pages on the internet lack uniformity and standardization while traditional books and text documents are much simpler in their consistency. Further, search engines with their limited capacity can not index all the web pages which makes data mining extremely inefficient.
Moreover, Internet is a highly dynamic knowledge resource and grows at a rapid pace. Sports, News, Finance and Corporate sites update their websites on hourly or daily basis. Today Web reaches to millions of users having different profiles, interests and usage purposes. Every one of these requires good information but don't know how to retrieve relevant data efficiently and with least efforts.
It is important to note that only a small section of the web possesses really useful information. There are three usual methods that a user adopts when accessing information stored on the internet:
• Random surfing i.e. following large numbers of hyperlinks available on the web page.
• Query based search on Search Engines - use Google or Yahoo to find relevant documents (entering specific keywords queries of interest in search box)
• Deep query searches i.e. fetching searchable database from eBay.com's product search engines or Business.com's service directory, etc.
To use the web as an effective resource and knowledge discovery researchers have developed efficient data mining techniques to extract relevant data easily, smoothly and cost-effectively.