Friday, March 29, 2019
Use and Application of Data Mining
Data mining is the process of extracting patterns from data. It is becoming an increasingly important tool for transforming data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery [1-3]. Data mining can be applied to a variety of data types. These include structured (relational) data, multimedia data, free text, and hypertext, as shown in Figure 1-1. Stripping the XML/XHTML tags from hypertext yields free text [4, 5].

Nowadays, text is the most common and accepted medium for information exchange. This is because much of the world's data is contained in text documents (newspaper articles, emails, literature, web pages, etc.). The importance of text has led many researchers to look for suitable methods of analyzing natural language texts in order to extract important and useful information. Unlike data stored in a structured format (databases), text stored in documents is unstructured, and dealing with such data requires a preprocessing step that transforms the text into a format suitable for automatic processing [6].

Text mining is a new and exciting area of computer science research that aims to solve the problem of information overload by combining techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management. Text mining, also known as text data mining [7] or knowledge discovery from textual databases [8], refers generally to the automatic process of extracting interesting and high-quality information or knowledge from unstructured text documents by using a suite of analysis tools [9].

Text mining takes much of its inspiration and direction from seminal research on data mining.
Therefore, text mining and data mining systems share many high-level architectural similarities. For example, both rely on preprocessing routines, pattern-discovery algorithms, and presentation-layer elements [1]. Furthermore, text mining adopts in its core knowledge discovery operations many of the specific types of patterns that were first introduced and vetted in data mining research [9].

The difference between data mining and text mining lies in the specific stages of data preparation and in the difficulty of finding the important patterns, which is due to the semi-structured or unstructured nature of the textual documents being processed.

Data mining systems assume that the data have already been stored in a structured format. Their preprocessing stage therefore focuses on two critical tasks: scrubbing and normalizing data, and creating extensive numbers of table joins. In contrast, preprocessing in text mining systems focuses on identifying and extracting representative features from natural language documents. These preprocessing tasks transform the unstructured, original-format content of document collections into a more explicitly structured intermediate format, a concern that is not relevant for most data mining systems. Text mining preprocessing draws on a variety of techniques culled and adapted from information retrieval, information extraction, and computational linguistics research, such as tokenization, stop word removal, normalization, and stemming [9].

Typical text mining tasks include text extraction and representation, information retrieval, document summarization, document clustering, and document classification.

Text representation is concerned with the problem of how to represent text data in an appropriate format for automatic processing.
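The preprocessing steps named above (tokenization, normalization, stop word removal, and stemming) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the stop word list is a tiny hand-picked sample, and `crude_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's algorithm.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "as", "in", "is", "of", "the", "to"}

# Suffixes removed by the toy stemmer below, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Normalize to lower case and split the text into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little content."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the full pipeline: tokenize, remove stop words, stem."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The miners are extracting patterns in the documents"))
```

The result is a list of normalized terms, which is the kind of structured intermediate format that later text mining steps operate on.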
In general, documents can be represented in two ways: as a bag of words, in which the context and the word order are ignored, or by finding common phrases in the text and treating them as single terms [10].

In information retrieval, the information to be retrieved is expressed as a query, and the task of the information retrieval system is to find and return the documents that contain the information most relevant to that query. To achieve this, text mining techniques are used to analyse the text data and to compare the extracted information with the given queries in order to find the documents that contain answers [10, 11].

The idea of text summarization is to automatically detect the most important phrases in a given text document and to create a condensed version of the input text for human use [10]. Text summarization can be done for a single document or for a document collection (multi-document summarization). Most approaches in this area focus on extracting informative sentences from texts and building summaries from the extracted information. Recently, many approaches have tried to create summaries based on semantic information extracted from the given text documents [10, 11].

Document clustering is a machine learning technique used to identify the similarity between text documents based on their content. Unlike document classification, document clustering is an unsupervised method in which there are no pre-defined categories. The idea of document clustering is to create links between similar documents in a document collection so that they can be retrieved together [10-12].

Document classification is the assignment of text documents to one or more pre-defined categories based on their content [10, 13]. It is a supervised learning problem in which the categories are known in advance [10].
For the document classification problem, many machine learning techniques, including decision trees, K-nearest neighbour, support vector machines (SVM), and the Naive Bayes algorithm, have been used to build document classification models. More details about document classification are given in the next section.
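As a rough illustration of how the bag-of-words representation feeds a supervised classifier, the following Python sketch trains a small multinomial Naive Bayes model with add-one smoothing. It is a minimal sketch under stated assumptions, not a reference implementation: the class name `NaiveBayes`, the helper `bag_of_words`, the two categories, and the four training sentences are all invented for this example.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Represent a document as word counts, ignoring word order and context."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        # Number of training documents per category (used for the prior).
        self.class_counts = Counter(labels)
        # Word counts per category (used for the likelihood).
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for word, n in bag_of_words(text).items():
                if word in self.vocabulary:  # skip words never seen in training
                    score += n * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set with two pre-defined categories.
train_docs = [
    "stock market prices fall",
    "bank raises interest rates",
    "team wins football match",
    "player scores winning goal",
]
train_labels = ["finance", "finance", "sport", "sport"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("football team scores goal"))
print(model.predict("market interest rates"))
```

Because the categories are fixed before training, this is a supervised setting. A clustering method would instead have to group the same documents without the `train_labels` list.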