Text mining is a set of technologies and methods for extracting information from text. Its main goal is to let an analyst work with large volumes of source data by automating the extraction of the relevant information.
Part 1: Introduction and Data Preparation
Overview of text mining
Tokenization
Dictionary creation
Vector generation for prediction
Feature generation and selection
Parsing
Part 2: Predictive Models for Text
Document classification
Document similarity and nearest-neighbor
Decision rules
Probabilistic models
Linear models
Performance evaluation
Applications
Part 3: Retrieval and Clustering of Documents
Measuring similarity for retrieval
Web-based document search and link analysis
Document matching
Clustering by similarity
k-means clustering
Hierarchical clustering
The EM algorithm for clustering
Evaluation of clustering
Part 4: Information Extraction
Goals of information extraction
Finding patterns and entities
Entity Extraction: The Maximum Entropy method
Template filling
Applications