Study on Website Phishing and their Countermeasures

Rao, Routhu Srinivasa

Please use this identifier to cite or link to this item: http://idr.nitk.ac.in/jspui/handle/123456789/16841

Title:	Study on Website Phishing and their Countermeasures
Authors:	Rao, Routhu Srinivasa
Supervisors:	Pais, Alwyn R.
Keywords:	Department of Computer Science & Engineering
Issue Date:	2020
Publisher:	National Institute of Technology Karnataka, Surathkal
Abstract:	Phishing is a manipulation technique that targets naive online users, tricking them into revealing sensitive information such as usernames, passwords, social security numbers, or credit card numbers. Attackers fool Internet users by disguising a webpage as a trustworthy or legitimate page in order to collect personal information. Many anti-phishing solutions have been proposed to date, including blacklist or whitelist, heuristic, and visual similarity-based methods, yet online users are still being tricked into revealing sensitive information on phishing websites. In this research work, we focus on designing new heuristic techniques that combine a comprehensive feature set with different machine learning algorithms for the classification of phishing sites. Many machine learning (ML) based techniques exist for detecting phishing sites, but their detection accuracy leaves room for improvement. To overcome the disadvantages of existing schemes, we present an efficient feature-based machine learning framework for the detection of phishing sites. The feature set is collected from different sources such as the URL, the source code, and third-party services, and is fed to the machine learning classifier. The model achieved an accuracy of 99.55% using an orthogonal Random Forest classifier, with a True Positive Rate (TPR) of 99.45% and a True Negative Rate (TNR) of 99.42%. Although this ML-based technique achieved high accuracy, its use of third-party services such as search engines or page ranking services means that it might fail when phishing sites hosted on compromised servers (PSHCS) are encountered. To counter PSHCS, we present two techniques, with and without third-party services. First, we present a novel heuristic technique using a twin support vector machine (TWSVM) to detect maliciously registered phishing sites as well as sites hosted on compromised servers. 
This technique achieved an accuracy of 98.4% in detecting phishing sites, with a TPR of 98.72% and a TNR of 98.08%. It relies on the home page of the suspicious site to calculate a similarity score between the home page and the suspicious site. This mechanism might fail when the correct home page of the suspicious site is not retrieved. Hence, we present an improved search engine based technique that uses a dynamic search query to identify the matching page for the suspicious site and then calculates the similarity score. This technique detects not only PSHCS but also newly registered legitimate sites. It achieved an accuracy of 98.61% with a TPR of 97.77% and a TNR of 99.36%. The techniques presented above rely on the source code of the website and on third-party services, which requires loading the page to determine the status of the website. As a result, the detection process may respond slowly at the client side. Moreover, because the webpage must be visited, there is a greater chance of accidentally downloading malware from it (drive-by downloads). Hence, we propose two lightweight techniques based on the inspection of URLs. These techniques are designed to be used as first-level filtering of phishing websites without visiting the suspicious site. The first technique is deployed as a web application and uses hand-crafted and Term Frequency-Inverse Document Frequency (TF-IDF) features for detection. It achieved an accuracy of 94.26% with a TPR of 93.31% and a TNR of 96.65%. The second technique is designed for mobile devices and uses a multi-model ensemble of Long Short-Term Memory (LSTM) and Support Vector Machine (SVM) for phishing detection. It achieved an accuracy of 97.30% with a TPR of 97.31% and a TNR of 97.28%. The techniques presented so far use either content or URLs for phishing detection, but they provide no information about the target website that the phishing site imitates. 
To provide this information, we present a lightweight visual similarity-based approach that maintains fingerprints of blacklisted phishing sites along with their target legitimate domains. The technique also includes heuristic features for detecting phishing sites that target non-whitelisted legitimate sites. It achieves an accuracy of 98.72% with a TPR of 98.51% and a TNR of 98.87%.
URI:	http://idr.nitk.ac.in/jspui/handle/123456789/16841
Appears in Collections:	1. Ph.D Theses

Files in This Item:

File	Description	Size	Format
158009CS15FV13 Routhu Srinvas Rao.pdf		8.78 MB	Adobe PDF
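
The abstract reports accuracy, TPR, and TNR for every technique. As a reminder of how these three figures are conventionally derived from confusion-matrix counts, here is a minimal sketch. The function name `detection_metrics` and the example counts are illustrative assumptions; they are not taken from the thesis or its datasets.

```python
def detection_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return accuracy, TPR and TNR as percentages.

    Phishing is treated as the positive class:
      tp = phishing sites flagged as phishing
      fn = phishing sites missed
      tn = legitimate sites passed as legitimate
      fp = legitimate sites wrongly flagged
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one phishing and one legitimate sample")
    return {
        "accuracy": 100.0 * (tp + tn) / (positives + negatives),
        "tpr": 100.0 * tp / positives,
        "tnr": 100.0 * tn / negatives,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = detection_metrics(tp=95, fn=5, tn=180, fp=20)
print(metrics)
```

Because accuracy is a weighted average of TPR and TNR (weighted by the share of phishing and legitimate samples), it is most informative when read alongside both rates.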

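
The abstract describes lightweight techniques that inspect only the URL, using hand-crafted features, so that the suspicious page never has to be loaded. The sketch below shows what extracting such features could look like. The specific features (`host_dots`, `has_at_symbol`, and so on) and the function names are assumptions made for illustration; the abstract does not list the feature set used in the thesis, and the TF-IDF and classifier stages are omitted here.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip(host: str) -> bool:
    """True if the host part is a literal IPv4/IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Compute a few lexical features from the URL string alone.

    Illustrative features only (assumed, not the thesis's feature set).
    No network request is made.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_dots": host.count("."),
        "host_hyphens": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip(host),
        "uses_https": parts.scheme == "https",
        "digit_count": sum(ch.isdigit() for ch in url),
    }


# Hypothetical URL using a documentation-reserved IP address (192.0.2.0/24).
example = url_features("http://192.0.2.10/secure-login/update.php?id=42")
print(example)
```

A feature dictionary like this would typically be turned into a numeric vector and passed to a classifier. Because everything is computed from the string, it suits the first-level filtering role the abstract describes.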