Roger Chen's Library
Trust network datasets - TrustLet
A free, collaborative project for collecting and analyzing information about trust metrics.
PMML - AnalyticBridge
PMML (Predictive Model Markup Language) provides a standard way to represent data mining models so that these can be shared between different statistical applications.
rpy2 - redesign of rpy
rpy2 is a redesign and rewrite of rpy. It provides a low-level interface to R and a proposed high-level interface, including wrappers to graphical libraries as well as R-like structures and functions.
MOA - Massive Online Analysis
MOA is a framework for data stream mining. It includes tools for evaluation and a collection of machine learning algorithms. It is related to the WEKA project and is also written in Java, but scales to more demanding problems.
The 13th Machine Learning Summer School
The 13th Machine Learning Summer School was held in Cambridge, UK. This year's edition was organized by the University of Cambridge, Microsoft Research and PASCAL. The school offered an overview of basic and advanced topics in machine learning through theoretical and practical lectures given by leading researchers in the field. It aimed to attract international students, young researchers and industry practitioners with a keen interest in machine learning and a strong mathematical background.
Cascading
Cascading is a feature-rich API for defining and executing complex, scale-free, and fault-tolerant data processing workflows on a Hadoop cluster.
The processing API lets the developer quickly assemble complex distributed processes without having to "think" in MapReduce, and efficiently schedule them based on their dependencies and other available metadata. Simple data processing applications are supported as well, since complex jobs tend to start simple.
Singular Value Decomposition Tutorial
SVD is extraordinarily useful and has many applications such as data analysis, signal processing, pattern recognition, image compression, weather prediction, and Latent Semantic Analysis or LSA (also referred to as Latent Semantic Indexing or LSI).
ECT 584 - Data Mining in Weka
This guide/tutorial uses a detailed example to illustrate some of the basic data preprocessing and mining operations that can be performed using WEKA. It is based on WEKA version 3.4.1. Some of the interface elements and modules may have changed in the most current version of WEKA. You can download the most current version of WEKA from the WEKA Web site. The current version includes a few additional features in the GUI and has a more organized packaging structure for the Java components. You should pay attention to these differences as you go through the tutorial. The differences in packaging structure are particularly important when you are running WEKA from the command line.
SVM-python
SVMpython is a Python embedded version of SVMstruct. One applies SVMstruct by modifying the svm_struct_api.c file and recompiling. SVMpython allows one to write these functions in Python instead: one applies SVMpython by creating a Python module (commonly just a .py file) with the appropriate methods. This module is loaded and specific methods called at runtime to support the structural learning algorithm.
Google Tech Talk Review: Statistical Aspects of Data Mining | A Beautiful WWW
This is a talk series being given at Google by David Mease, based on a Master’s level stats course he is teaching this summer at Stanford. It's easy listening if you already have some data mining or stats background.
Statistical Data Mining Tutorials by Andrew Moore
Andrew Moore
Datasets for Data Mining
This page contains a list of datasets that were selected for the projects for Data Mining and Exploration. Students can choose one of these datasets to work on, or can propose data of their own choice. At the bottom of this page, you will find some examples of datasets which we judged as inappropriate for the projects.