Carlos Santos's Library
[0908.4425] Geometry of the restricted Boltzmann machine
"The restricted Boltzmann machine is a graphical model for binary random variables. Based on a complete bipartite graph separating hidden and observed variables, it is the binary analog to the factor analysis model. We study this graphical model from the perspectives of algebraic statistics and tropical geometry, starting with the observation that its Zariski closure is a Hadamard power of the first secant variety of the Segre variety of projective lines. We derive a dimension formula for the tropicalized model, and we use it to show that the restricted Boltzmann machine is identifiable in many cases. Our methods include coding theory and geometry of linear threshold functions."
How FlightCaster Squeezes Predictions from Flight Data » Data Wrangling Blog
FlightCaster strikes me as a great example of the next generation of web applications that will leverage that data: bootstrapped startups that apply machine learning and data processing at scale to solve a focused problem people actually care about.
Hadoop Studio
Hadoop Studio is a map-reduce development environment (IDE) based on Netbeans. It makes it easy to create, understand and debug map-reduce applications based on Hadoop, without requiring development-time access to a map-reduce cluster.
Wikipedia page counters « domas mituzas: vaporware, inc.
Source of the dataset used in trendingtopics.org
Amazon Web Services Developer Community : Wikipedia Page Traffic Statistics
This dataset contains a 320 GB sample of the data used to power trendingtopics.org. It includes 7 months of hourly page traffic statistics for over 2.5 million Wikipedia articles (~ 1 TB uncompressed) along with the associated Wikipedia content, linkgraph, & metadata.
Data Analysis Examples
The pages below contain examples (often hypothetical) illustrating the application of different statistical analysis techniques using different statistical packages. Each page provides a handful of examples of when the analysis might be used along with sample data, an example analysis, explanation of the output, a short sample write-up, followed by references for more information. These pages merely introduce the essence of the technique and do not provide a comprehensive description of how to use it.
Tutorial: Scientific and parallel computing using IPython | Python for Scientific and Large Scale Computing
"This series introduces scientific and parallel computing using IPython with emphasis on IPython on a Windows PC. We discuss best practices for effectively using IPython with numpy, scipy, and matplotlib, as well as using IPython for interactive parallel computation." By J. Unpingco, who created the parallel ipython+vision (visual programming environment) demo.
Cloudera Hadoop & Big Data Blog » Blog Archive » Building a distributed concurrent queue with Apache ZooKeeper
ZooKeeper is a system for coordinating distributed processes. In a distributed environment, getting processes to act in any kind of synchrony is an extremely hard problem. For example, simply having a set of processes wait until they’ve all reached the same point in their execution - a kind of distributed barrier - is surprisingly difficult to do correctly. ZooKeeper offers an API to facilitate this sort of distributed coordination.
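To make the barrier idea concrete, here is a minimal single-process sketch using Python's standard `threading.Barrier`. It is only a local analogue of the behaviour described above, not ZooKeeper's API or the linked post's implementation; the helper name `run_with_barrier` and the event log are hypothetical, chosen for illustration.

```python
import threading

def run_with_barrier(num_workers):
    """Start num_workers threads that all wait at one barrier before phase 2.

    Local analogue only: a distributed barrier must also cope with
    process crashes and network partitions, which this sketch ignores.
    """
    barrier = threading.Barrier(num_workers)
    events = []               # ordered log of (phase, worker_id)
    lock = threading.Lock()   # protects the shared event log

    def worker(worker_id):
        with lock:
            events.append(("phase1", worker_id))
        barrier.wait()        # blocks until all num_workers threads arrive
        with lock:
            events.append(("phase2", worker_id))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events

events = run_with_barrier(4)
```

No worker records "phase2" until every worker has recorded "phase1". Within one process the standard library guarantees this; across machines the same guarantee needs a coordination service, which is the gap the description says ZooKeeper aims to fill.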
Recent codes - RefactorMyCode.com
What interesting things could you do with a dataset of refactorings?
