Carlos Santos's Library tagged DataMining
Powell's Books - Principles and Theory for Data Mining and Machine Learning (Springer Series in Statistics) by Bertrand Clarke
Intended primarily as a graduate level textbook for statistics, computer science, and electrical engineering students, this book assumes only a strong foundation in undergraduate statistics and mathematics, and facility with using R packages.
UK PubMed Central Blog: Nature Publishing Group allows data- and text-mining on self-archived manuscripts
Under NPG's terms of reuse, users may view, print, copy, download, and text- and data-mine the content for the purposes of academic research. Re-use should only be for academic purposes; commercial reuse is not permitted. Full conditions are available on nature.com.
Mike Darga's Game Design Blog: Designing a Black Box (Part 1)
Designing a game so you can gather information and predict if a player is likely to leave.
Data Mining: Text Mining, Visualization and Social Media: Influence not as simple as Gladwell would have you believe!
Diffusion of information may ‘long circuit’ the small worlds of social networks. In his presentation on the study of the largest internet chain mail (a petition), Kleinberg described the role of the threshold model of diffusion: because we require multiple receipts of a stimulus (e.g. a chain mail letter) before passing it on, we are more sensitive to our immediate community, our strong links, than to the weak links that build small worlds. This seems related to Watts’s work on Challenging the Influentials Hypothesis, both his criticism of the disease analogy and his focus on the importance of network structure rather than some magical power of the ‘influential’.
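The threshold idea described above can be illustrated with a small sketch. This is not code from Kleinberg's study; the function name `threshold_cascade`, the toy graph, and the fixed threshold of two receipts are all assumptions chosen for illustration.

```python
from collections import deque


def threshold_cascade(graph, seeds, threshold=2):
    """Return the set of nodes that end up forwarding the message.

    graph: dict mapping each node to a list of its neighbours
           (undirected edges are listed in both directions).
    seeds: nodes that start the chain and forward unconditionally.
    threshold: number of receipts a node needs before it forwards
               (an illustrative assumption, not a measured value).
    """
    receipts = {node: 0 for node in graph}
    active = set(seeds)
    queue = deque(seeds)
    while queue:
        sender = queue.popleft()
        for neighbour in graph[sender]:
            if neighbour in active:
                continue  # already forwarding; extra copies change nothing
            receipts[neighbour] += 1
            if receipts[neighbour] >= threshold:
                active.add(neighbour)
                queue.append(neighbour)
    return active


# Hypothetical toy network: a tight cluster {a, b, c, d}, a second
# cluster {e, f, g}, and a single weak link d-e joining them.
example_graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["a", "b", "c", "e"],
    "e": ["d", "f", "g"],
    "f": ["e", "g"],
    "g": ["e", "f"],
}

reached = threshold_cascade(example_graph, ["a", "b"], threshold=2)
print(sorted(reached))
```

With a threshold of two, the message saturates the starting cluster but stalls at the single weak link, since node e only ever receives one copy. Setting `threshold=1` (the simple disease-style contagion) lets the same seeds reach every node.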
[0809.4530v2] Mining Meaning from Wikipedia
The article focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval; using it for information extraction; and using it as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced.
OpenPipeline
OpenPipeline is new open source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing. OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. It includes a job scheduler and a full UI with a point-and-click interface.
Text Analysis - MAWUI
The Mayo Weka/UIMA Integration (MAWUI) library is an interface that lets UIMA components (e.g. TAEs, CasConsumers, etc.) use the Weka machine learning environment within a UIMA context. This overview assumes a high-level familiarity with both UIMA and Weka.
UIMA Java Framework
The Unstructured Information Management Architecture (UIMA) framework is an open, industrial-strength, scalable and extensible platform for building analytic applications or search solutions that process text or other unstructured information to find the latent meaning, relationships and relevant facts buried within. It enables developers to build analytic modules and to compose analytic applications from multiple analytic providers, encouraging collaboration and facilitating value extraction for unstructured information.
