Please use this identifier to cite or link to this item: http://dspace2020.uniten.edu.my:8080/handle/123456789/21270
Full metadata record
DC Field | Value | Language
dc.contributor.author | Laxmi Lydia E. | en_US
dc.contributor.author | Sharmili N. | en_US
dc.contributor.author | Nguyen P.T. | en_US
dc.contributor.author | Hashim W. | en_US
dc.contributor.author | Maseleno A. | en_US
dc.date.accessioned | 2021-09-08T06:19:10Z | -
dc.date.available | 2021-09-08T06:19:10Z | -
dc.date.issued | 2019 | -
dc.identifier.uri | http://dspace2020.uniten.edu.my:8080/handle/123456789/21270 | -
dc.description.abstract | The amount of unlabeled text data in documents has grown, and mining such datasets is a challenging task. The objective of Big Data is to store, retrieve and analyse multiple text documents. Problem Statement: The retrieval of identical data over large databases is of major concern. Existing Solution: This problem is addressed by Full-Text Search (FTS), a pattern-matching technique that allows multiple keywords to be searched at a specific time. Proposed Solution: In this paper, we take multiple text documents as input and process them with text-mining pre-processing algorithms such as key-phrase extraction, Porter's stemming for tokenizing, and TF-IDF to obtain values that are all non-negative. These values are further processed into matrix data through Non-negative Matrix Factorization (NMF). After NMF is performed, the K-means algorithm is enhanced with NMF to obtain good-quality clusters of the datasets. The performance of the algorithms is tested using the Newsgroup20 data in the open-source Hadoop software environment, which also analyses the performance of the MapReduce framework. The final outcome is to generate clusters and index them for the Newsgroup20 dataset. Later, Apache Lucene is presented for automatic document clustering, with a GUI developed for indexing. Thus, the proposed algorithm improves the performance of document clustering through the MapReduce framework in Hadoop. © 2019 Mattingley Publishing. All rights reserved. | en_US
dc.language.iso | en | en_US
dc.title | Automatic document clustering and indexing of multiple documents using KNMF for feature extraction through Hadoop and lucene on big data | en_US
dc.type | article | en_US
item.cerifentitytype | Publications | -
item.languageiso639-1 | en | -
item.fulltext | With Fulltext | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.grantfulltext | reserved | -
item.openairetype | article | -
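The abstract describes a pipeline of TF-IDF weighting, Non-negative Matrix Factorization, and K-means clustering. The block below is a minimal single-machine sketch of that general idea, written under stated assumptions; it is not the authors' KNMF implementation. Key-phrase extraction and Porter stemming are replaced with a plain lowercase tokenizer (an assumption), NMF uses the standard Lee-Seung multiplicative updates, and K-means is run on the rows of the document factor W. All function names (`tokenize`, `tfidf_matrix`, `nmf`, `kmeans`, `cluster_documents`) are hypothetical, and nothing here reproduces the Hadoop/MapReduce or Lucene parts of the paper.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokenizer. Hypothetical stand-in for the key-phrase
    # extraction and Porter stemming described in the abstract.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs):
    """Document-term matrix of TF-IDF weights; every entry is >= 0."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    col = {t: j for j, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    X = [[0.0] * len(vocab) for _ in docs]
    for i, toks in enumerate(tokenized):
        for t, c in Counter(toks).items():
            idf = math.log((1 + n) / (1 + df[t])) + 1.0  # smoothed idf, always >= 1
            X[i][col[t]] = (c / len(toks)) * idf
    return vocab, X


def transpose(A):
    return [list(r) for r in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, colv)) for colv in Bt] for row in A]


def nmf(X, k, iters=300, seed=0, eps=1e-9):
    """Factor X (n x m) into non-negative W (n x k) and H (k x m) using
    Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        # H <- H * (W^T X) / (W^T W H)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, ar, br)]
             for hr, ar, br in zip(H, num, den)]
        Ht = transpose(H)
        # W <- W * (X H^T) / (W H H^T)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, ar, br)]
             for wr, ar, br in zip(W, num, den)]
    return W, H


def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append([sum(v) / len(members) for v in zip(*members)]
                       if members else centers[j])
        if new == centers:  # assignments have stabilised
            break
        centers = new
    return labels


def cluster_documents(docs, k):
    # Hypothetical pipeline: TF-IDF -> NMF -> k-means on the rows of W.
    _, X = tfidf_matrix(docs)
    W, _ = nmf(X, k)
    return kmeans(W, k)
```

For example, calling `cluster_documents` with k = 2 on two short Hadoop-themed documents and two short Lucene-themed documents should place each pair in its own cluster; the actual label values depend on the initialisation. Clustering the low-dimensional rows of W rather than the raw TF-IDF rows is the usual motivation for pairing NMF with K-means.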
Appears in Collections: UNITEN Ebook and Article
Files in This Item:
File | Description | Size | Format
This document is not yet available.pdf (Restricted Access) | | 396.12 kB | Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.