The Spam Detection Project Using Web Browser makes browsing safer by detecting spam websites. The application detects spam sites and notifies the user through the web browser.


Spam Detection Project Using Web Browser Functional Requirements:

The system should be able to check whether the URL in the address bar is valid or invalid. It takes the URL from the address bar and checks whether it contains repetitive words. The system should also be able to tell which domain a link points to: when the user clicks a link, the system reports whether it belongs to the same server or domain, and shows an alert if it does not.

The user should be able to search for relevant information with a query. The user is assumed to be Internet literate, so that he or she can request relevant information for a given domain through a query or through the links in the web pages. The interface controller should be able to display pages with any specified links. When a query is submitted or a link is clicked, the search engine retrieves the relevant information together with any spam pages and displays the web pages relevant to the search.

The query analyzer analyzes the features of the different sources of information. Features such as the anchor text, the URL, and the text surrounding the anchor can be extracted and analyzed, which helps in detecting spam in the links of web pages. A KLD calculation is used to detect spam. It is computed from the features extracted for the given query and the page content of the relevant link.
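The KLD step can be illustrated with a minimal sketch, assuming that KLD here stands for Kullback-Leibler divergence between the word distribution of a source of information (for example the anchor text) and the word distribution of the linked page. The function names, the add-one smoothing, and the threshold value are assumptions made for illustration and are not taken from the project code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def kl_divergence(source_text, page_text, smoothing=1.0):
    """Return KL(source || page) in bits over the combined vocabulary.

    Additive smoothing (an assumption of this sketch) keeps every
    probability non-zero so that the logarithm is always defined.
    """
    source_counts = Counter(tokenize(source_text))
    page_counts = Counter(tokenize(page_text))
    vocabulary = set(source_counts) | set(page_counts)
    if not vocabulary:
        return 0.0

    source_total = sum(source_counts.values()) + smoothing * len(vocabulary)
    page_total = sum(page_counts.values()) + smoothing * len(vocabulary)

    divergence = 0.0
    for term in vocabulary:
        p = (source_counts[term] + smoothing) / source_total
        q = (page_counts[term] + smoothing) / page_total
        divergence += p * math.log2(p / q)
    return divergence


def looks_like_spam(source_text, page_text, threshold=0.2):
    """Flag a link when its description diverges strongly from the page.

    The threshold is a hypothetical value; a real system would tune it
    on labelled pages.
    """
    return kl_divergence(source_text, page_text) > threshold
```

A link whose anchor text says "cheap flights to paris" but whose target page talks about something unrelated yields a larger divergence than a page that repeats most of the same words, which is the signal this sketch relies on.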

NON-FUNCTIONAL REQUIREMENTS

 User Interface

The user interface consists of the browser and the search engine. The user types a URL in the browser or a keyword in the search engine, and the output is displayed as a web page.

Documentation

Each function performed in the modules is described in the code through comments. The user is given a clear understanding of the functionality of the software and is guided by an easy-to-understand input format.

Performance Characteristics

Performance depends on the number of websites used for training. Compared with a real search engine, the system takes a little more time to go through the calculation process, but the difference between the two is negligible.

Error Handling

If the requested query belongs to a different domain, the system shows a message and asks the user whether to proceed further or not.
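The cross-domain check could look roughly like the sketch below. It compares host names only and treats a leading "www." as optional, which is a simplification. The names `host_of`, `is_same_domain`, and `check_link`, and the `confirm` callback that stands in for the browser's prompt, are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches a leading scheme such as "http://" or "https://".
SCHEME_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9+.\-]*://")


def host_of(url):
    """Return the lowercase host name of a URL without a leading 'www.'."""
    url = url.strip()
    # urlsplit only recognises the host when a scheme or '//' is present,
    # so prepend '//' for bare inputs such as 'www.example.com/page'.
    if not SCHEME_PATTERN.match(url) and not url.startswith("//"):
        url = "//" + url
    try:
        host = (urlsplit(url).hostname or "").lower()
    except ValueError:
        # Malformed input (for example unbalanced brackets) has no host.
        return ""
    return host[4:] if host.startswith("www.") else host


def is_same_domain(current_url, link_url):
    """Return True when both URLs point to the same host."""
    return host_of(current_url) == host_of(link_url)


def check_link(current_url, link_url, confirm):
    """Return True if navigation to link_url should proceed.

    `confirm` is any callable that shows a message to the user and
    returns True (proceed) or False (cancel).
    """
    if is_same_domain(current_url, link_url):
        return True
    message = (
        "This link points to a different domain: "
        f"{host_of(link_url)}. Do you want to proceed?"
    )
    return confirm(message)
```

For a quick trial on the command line, `confirm` could be `lambda msg: input(msg + " [y/n] ").lower() == "y"`.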

Security Issues

If a site or link is spam, the user is not allowed to read it. Once a page is identified as spam, it is captured. Spam sites are banned.

CONSTRAINTS AND ASSUMPTIONS 

 1) In the browser, only URLs are allowed.

 2) In the search engine, keywords are allowed.

 3) The URL should begin with ‘www’.
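The URL constraints above, together with the repetitive-word check from the functional requirements, can be sketched as follows. This is only an illustration: the host name pattern, the function names, and the `min_count` default are assumptions, not the project's actual rules.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Dot-separated labels of letters, digits, and hyphens, ending in an
# alphabetic top-level domain. This pattern is an assumption.
HOST_PATTERN = re.compile(
    r"^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$"
)


def is_valid_url(url):
    """Return True if the URL starts with 'www.' and has a plausible host."""
    url = url.strip().lower()
    if not url.startswith("www."):
        return False
    try:
        # The leading '//' lets urlsplit treat the first part as the host.
        host = urlsplit("//" + url).hostname or ""
    except ValueError:
        return False
    return HOST_PATTERN.match(host) is not None


def repeated_words(url, min_count=2):
    """Return the words that occur at least min_count times in the URL."""
    words = re.findall(r"[a-z]+", url.lower())
    counts = Counter(word for word in words if word != "www")
    return sorted(word for word, n in counts.items() if n >= min_count)
```

A URL such as `www.free-free-prizes.com/free` passes the format check but is reported by `repeated_words` because "free" occurs three times, so the two checks are best used together.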
