Please use this identifier to cite or link to this item: http://idr.nitk.ac.in/jspui/handle/123456789/16864
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Patil, Nagamma. | -
dc.contributor.author | Vanahalli, Manjunath K. | -
dc.date.accessioned | 2021-08-18T11:32:38Z | -
dc.date.available | 2021-08-18T11:32:38Z | -
dc.date.issued | 2020 | -
dc.identifier.uri | http://idr.nitk.ac.in/jspui/handle/123456789/16864 | -
dc.description.abstract | Itemset mining is the basic and major step of Association Rule Mining (ARM). ARM and itemset mining have a wide range of applications. Conventional feature-enumeration-based itemset mining algorithms focus on mining frequent itemsets, frequent closed itemsets, and frequent maximal itemsets from transactional datasets. Transactional datasets consist of a small number of attributes (features) and a large number of rows (samples). The abundance of data across a variety of domains, including bioinformatics, has led to a new form of dataset known as the high dimensional dataset, whose data characteristics differ from those of transactional datasets. High dimensional datasets consist of a large number of features and a small number of rows. The amount of information that can be extracted from high dimensional datasets is potentially huge, but extracting it is a non-trivial task. The results of Frequent Itemset Mining (FIM) and Frequent Closed Itemset Mining (FCIM) algorithms include small and mid-sized itemsets, which do not carry valuable and complete information for decision making. In applications dealing with high dimensional datasets, such as bioinformatics, ARM gives greater importance to large-sized itemsets known as colossal itemsets. Recent research has focused on mining frequent colossal itemsets and frequent colossal closed itemsets, which are more influential in decision making and are significant for many applications, especially in the field of bioinformatics. The preprocessing techniques of existing frequent colossal itemset mining and frequent colossal closed itemset mining algorithms fail to prune the complete set of insignificant features and rows. An Effective Improved Preprocessing (EIP) technique has been proposed to prune the complete set of insignificant features and rows, which limits the growth of the mining search space.
The existing frequent colossal itemset mining algorithms mine a limited set of frequent colossal itemsets, leading to the generation of an incomplete set of association rules, which consequently affects decision making. A frequent colossal itemset mining algorithm has been proposed that achieves better accuracy than existing algorithms in terms of the number of frequent colossal itemsets mined from the high dimensional dataset. The existing algorithms for mining Frequent Colossal Closed Itemsets (FCCI) from the high dimensional dataset do not include an efficient pruning strategy or closeness checking method. To overcome the drawbacks of the existing works, an algorithm with an efficient Rowset Cardinality Table (RCT) based closeness checking method and pruning strategy has been proposed to efficiently mine FCCI from the high dimensional dataset. The existing algorithms are inefficient in mining FCCI from datasets consisting of a large number of features and rows, as they cannot handle the changing characteristics of the data subset during the mining process. A combination of different enumeration methods is required to efficiently handle the different characteristics possessed by different datasets. A dynamic switching algorithm has been proposed to efficiently mine FCCI from datasets consisting of a large number of features and rows. The dynamic switching algorithm efficiently handles the changing characteristics of the data subset during the mining process and incorporates an Itemset Support Table (IST) based closeness checking method and pruning strategy. The existing algorithms for mining FCCI from high dimensional datasets are sequential and computationally expensive. Distributed and parallel computing is a good strategy to overcome the inefficiency of the existing sequential algorithms.
The inefficiency of the existing sequential algorithms has been overcome by proposing a parallel row enumerated algorithm to efficiently mine FCCI from the high dimensional dataset. Traversing the row enumerated tree is the best solution for mining FCCI from the high dimensional dataset. The row enumerated tree is typically unbalanced, as the number of nodes in each of its branches varies. A distributed and parallel algorithm with load balancing has been designed to address the inefficiency of existing works. | en_US
dc.language.iso | en | en_US
dc.publisher | National Institute of Technology Karnataka, Surathkal | en_US
dc.subject | Department of Information Technology | en_US
dc.subject | Bioinformatics | en_US
dc.subject | High Dimensional Dataset | en_US
dc.subject | Data Characteristics | en_US
dc.subject | Preprocessing | en_US
dc.subject | Frequent Colossal Item sets | en_US
dc.subject | Frequent Colossal Closed Item sets | en_US
dc.subject | Row set Cardinality Table | en_US
dc.subject | Item set Support Table | en_US
dc.subject | Dynamic Switching | en_US
dc.subject | Pruning Strategy | en_US
dc.subject | Closeness Checking | en_US
dc.subject | Parallel algorithm | en_US
dc.subject | Load Balancing | en_US
dc.title | Efficient Mining of Frequent Colossal Itemsets from High Dimensional Data | en_US
dc.type | Thesis | en_US
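
Illustrative note: the abstract refers to mining frequent colossal closed itemsets by enumerating rows instead of features. The Python sketch below is a brute-force toy illustration of that general idea only. It is not the thesis's RCT, IST, dynamic switching, or parallel algorithms, and it applies no pruning. The function name `closed_colossal_itemsets`, the parameters `min_support` and `min_length`, and the sample rows are all assumptions made for illustration. It relies on a standard fact: the intersection of any group of rows is a closed itemset, so enumerating row groups and keeping the large intersections yields the frequent closed itemsets above a chosen size.

```python
from itertools import combinations


def closed_colossal_itemsets(rows, min_support, min_length):
    """Brute-force row enumeration over a small dataset (illustrative only).

    rows: list of frozensets, one set of items per row (sample).
    min_support: minimum number of rows that must contain an itemset.
    min_length: minimum itemset size to count as "colossal" (assumed threshold).
    Returns a dict mapping each qualifying closed itemset to its support.
    """
    found = {}
    # Groups smaller than min_support cannot be needed: every frequent closed
    # itemset equals the intersection of all rows that contain it.
    for k in range(max(1, min_support), len(rows) + 1):
        for group in combinations(rows, k):
            # The intersection of a group of rows is always a closed itemset.
            itemset = frozenset.intersection(*group)
            if len(itemset) < min_length:
                continue
            # Support is counted over the whole dataset, not just the group.
            found[itemset] = sum(1 for row in rows if itemset <= row)
    return found


# Hypothetical toy data: four rows over single-letter items.
rows = [
    frozenset("abcde"),
    frozenset("abcdf"),
    frozenset("abcg"),
    frozenset("xyz"),
]
result = closed_colossal_itemsets(rows, min_support=2, min_length=3)
for itemset, support in sorted(result.items(), key=lambda kv: -len(kv[0])):
    print("".join(sorted(itemset)), support)
```

With few rows and many features, the number of row groups stays small even when the number of feature combinations is enormous, which is the usual motivation for row enumeration on high dimensional data. A practical miner would add pruning and an efficient closedness check instead of visiting every row group.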
Appears in Collections: 1. Ph.D Theses

Files in This Item:
File | Description | Size | Format
145063IT14F02.pdf | | 9.16 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.