Friday, March 29, 2019

Use and Application of Data Mining

Data mining is the process of extracting patterns from data. It is becoming an increasingly important tool for transforming data into information, and it is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery [1-3]. Data mining can be applied to a variety of data types, including structured (relational) data, multimedia data, free text, and hypertext, as shown in Figure 1-1. Free text can be obtained from hypertext by stripping its XML/XHTML tags [4, 5].

Nowadays, text is the most common and accepted medium for information exchange, because much of the world's data is contained in text documents (newspaper articles, emails, literature, web pages, etc.). This has led many researchers to look for suitable methods of analysing natural language texts in order to extract the important and useful information. In contrast to data stored in a structured format (databases), text stored in documents is unstructured, and a preprocessing step is required to transform it into a format suitable for automatic processing [6].

Text mining is a new and exciting area of computer science research that tries to solve the problem of information overload by combining techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management. Text mining, also known as text data mining [7] or knowledge discovery from textual databases [8], refers primarily to the automatic process of extracting interesting and high-quality information or knowledge from unstructured text documents by using a suite of analysis tools [9].

Text mining takes much of its inspiration and direction from seminal research on data mining.
Therefore, text mining and data mining systems share many high-level architectural similarities. For example, both rely on preprocessing routines, pattern-discovery algorithms, and presentation-layer elements [1]. Furthermore, text mining adopts in its core knowledge discovery operations many of the specific types of patterns that were first introduced and vetted in data mining research [9].

The difference between data mining and text mining lies in the specific stages of data preparation and in the difficulty of finding the important patterns, which is due to the semi-structured or unstructured nature of the textual documents being processed.

Data mining systems assume that the data have already been stored in a structured format. Their preprocessing stage therefore focuses on two critical tasks: scrubbing and normalizing the data, and creating extensive numbers of table joins. In contrast, preprocessing in text mining systems focuses on the identification and extraction of representative features for natural language documents. These preprocessing tasks are responsible for transforming the unstructured, original-format content of document collections into a more explicitly structured intermediate format, a concern that is not relevant for most data mining systems. Text mining preprocessing draws on a variety of techniques culled and adapted from information retrieval, information extraction, and computational linguistics research, such as tokenization, stop word removal, normalization, and stemming [9].

Typical text mining tasks include text extraction and representation, information retrieval, document summarization, document clustering, and document classification.

Text representation is concerned with the problem of how to represent text data in an appropriate format for automatic processing.
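
The preprocessing techniques named above (tokenization, normalization, stop word removal, and stemming) produce the tokens from which any such representation is built. The following is a minimal sketch of that pipeline, not a reference implementation: the function names, the tiny stop word list, and the suffix-stripping rule are all assumptions made for illustration, and the crude stemmer merely stands in for a real stemming algorithm.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Tokenization plus case normalization: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little content."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the full pipeline: tokenize, remove stop words, stem."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The miners are extracting patterns in the documents"))
```

The output is a list of normalized terms, which is the kind of structured intermediate format that later text mining steps operate on.
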
In general, documents can be represented in two ways: as a bag of words, in which the context and the word order are neglected, or by finding common phrases in the text and treating them as single terms [10].

In information retrieval, the information to be retrieved is expressed as a query, and the task of the information retrieval system is to find and return the documents that contain the information most relevant to that query. To achieve this, text mining techniques are used to analyse the text data and to compare the extracted information with the given query in order to find the documents that contain answers [10, 11].

The idea of text summarization is to automatically detect the most important phrases in a given text document and to create a condensed version of the input text for human use [10]. Text summarization can be done for a single document or for a document collection (multi-document summarization). Most approaches in this area focus on extracting informative sentences from texts and building summaries from the extracted information. Recently, many approaches have tried to create summaries based on semantic information extracted from the given text documents [10, 11].

Document clustering is a machine learning technique used to group text documents according to the similarity of their content. Unlike document classification, document clustering is an unsupervised method in which there are no pre-defined categories. The idea is to create links between similar documents in a document collection so that they can be retrieved together [10-12].

Document classification is the assignment of text documents to one or more pre-defined categories based on their content [10, 13]. It is a supervised learning problem in which the categories are known in advance [10].
Many machine learning techniques, including decision trees, K-nearest neighbour, support vector machines (SVM) and the Naive Bayes algorithm, have been used to build document classification models. More details about document classification are given in the next section.
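
As a rough illustration of how a bag-of-words representation feeds a supervised classifier, the sketch below implements a small multinomial Naive Bayes model with add-one smoothing. It is only one possible formulation under stated assumptions: the class and function names are hypothetical, tokenization is a plain whitespace split, and the four training documents and their labels are invented for the example.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Represent a document as word counts, ignoring word order and context."""
    return Counter(text.lower().split())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_doc_counts = Counter()              # documents per category
        self.class_word_counts = defaultdict(Counter)  # word counts per category
        self.vocabulary = set()

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            bag = bag_of_words(text)
            self.class_doc_counts[label] += 1
            self.class_word_counts[label].update(bag)
            self.vocabulary.update(bag)
        return self

    def predict(self, text):
        bag = bag_of_words(text)
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            word_counts = self.class_word_counts[label]
            total_words = sum(word_counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for word, count in bag.items():
                if word not in self.vocabulary:
                    continue  # words never seen in training are ignored
                likelihood = (word_counts[word] + 1) / (total_words + vocab_size)
                score += count * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented purely for illustration.
train_docs = [
    "stock market prices fall",
    "team wins football match",
    "market shares rise",
    "football player scores goal",
]
train_labels = ["finance", "sports", "finance", "sports"]

model = NaiveBayesClassifier().fit(train_docs, train_labels)
print(model.predict("market prices rise"))
print(model.predict("football match goal"))
```

The categories are fixed by the labelled training set, which is what makes this a supervised method; a clustering approach would instead group the same word-count vectors without any labels.
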
