Carlos Santos's Library tagged hadoop
Ivory: A Hadoop toolkit for web-scale information retrieval
"Ivory is a Hadoop toolkit for web-scale information retrieval research that features a retrieval engine based on Markov Random Fields, appropriately named SMRF (Searching with Markov Random Fields). This open-source project began in Spring 2009 and represents a collaboration between the University of Maryland and Yahoo! Research. Ivory takes full advantage of the Hadoop distributed environment (the MapReduce programming model and the underlying distributed file system) for both indexing and retrieval. The current release of Ivory (release 0.2) works with Hadoop 0.20.1 (and requires certain features only found in that release). Ivory also uses Cloud9, a MapReduce library for Hadoop developed at the University of Maryland (currently also at release 0.2)."
MapReduce Online! (and some gimmes) « Data Beta
"Introducing HOP: the Hadoop Online Prototype. With modest changes to the structure of Hadoop, we were able to convert it from a batch-processing system to an interactive, online system that can provide features like “early returns” from big jobs, and continuous data stream processing, while preserving the simple MapReduce programming and fault tolerance models popularized by Google and Hadoop. And by the way, it exposes pipeline parallelism that can even make batch jobs finish faster. This is a project led by Tyson Condie, in collaboration with folks at Berkeley and Yahoo! Research."
Building a Data Intensive Web Application with Cloudera, Hadoop, Hive, Pig, and EC2 | Cloudera
YDN Theater: Tom White - Running Hadoop in the Cloud
He opens with a discussion of the Berkeley RAD Lab paper on cloud computing and walks us through a set of definitions to a discussion of the public cloud. He sees a realm of interesting possibilities: an apparently infinite resource; the elimination of user commitment; and the pay-as-you-go model, which enables elasticity. Tom describes the implementation of Hadoop in this landscape.
Hadoop Studio
Hadoop Studio is a map-reduce development environment (IDE) based on NetBeans. It makes it easy to create, understand and debug map-reduce applications based on Hadoop, without requiring development-time access to a map-reduce cluster.
gist: 130483 - GitHub
Python Hadoop Streaming script called by Hive in daily run - calculates simple baseline monthly trend for "Biggest Movers"
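A rough sketch of what a reducer of this kind might look like, not the code from the gist. The input layout (tab-separated item, month, count), the assumption that rows arrive grouped by item and ordered by month, and the names `parse` and `movers` are all hypothetical. The "baseline" here is simply the mean of the earlier months, compared against the latest month.

```python
import sys
from itertools import groupby


def parse(lines):
    """Yield (item, month, count) from tab-separated lines (assumed layout)."""
    for line in lines:
        item, month, count = line.rstrip("\n").split("\t")
        yield item, month, float(count)


def movers(lines):
    """Yield (item, baseline, latest, change) for each item.

    Assumes rows are grouped by item and ordered by month within an item.
    Items with fewer than two months are skipped (no history to compare).
    """
    for item, rows in groupby(parse(lines), key=lambda r: r[0]):
        counts = [count for _, _, count in rows]
        if len(counts) < 2:
            continue
        history, latest = counts[:-1], counts[-1]
        baseline = sum(history) / len(history)  # mean of earlier months
        yield item, baseline, latest, latest - baseline


# Small in-memory demo; a streaming job would pass sys.stdin instead.
sample = [
    "widgets\t2009-01\t10\n",
    "widgets\t2009-02\t20\n",
    "widgets\t2009-03\t30\n",
    "gadgets\t2009-01\t5\n",
]
for item, baseline, latest, change in movers(sample):
    sys.stdout.write(f"{item}\t{baseline:.2f}\t{latest:.2f}\t{change:+.2f}\n")
```

Sorting the output by the absolute value of the last column would give a "Biggest Movers" ranking under these assumptions.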
'Grid computing Red Hat' out-Amazons Amazon • The Register
This means you can run ongoing Hadoop jobs - starting them and stopping them whenever you like - without moving data back and forth between the local EC2 disks and Amazon's Simple Storage Service (S3).
Cloudera Hadoop & Big Data Blog » Blog Archive » Introducing Sqoop
Sqoop (“SQL-to-Hadoop”) is a straightforward command-line tool with the following capabilities:
* Imports individual tables or entire databases to files in HDFS
* Generates Java classes to allow you to interact with your imported data
* Provides the ability to import from SQL databases straight into your Hive data warehouse
Cloudera Hadoop & Big Data Blog » Blog Archive » Building a distributed concurrent queue with Apache ZooKeeper
ZooKeeper is a system for coordinating distributed processes. In a distributed environment, getting processes to act in any kind of synchrony is an extremely hard problem. For example, simply having a set of processes wait until they’ve all reached the same point in their execution - a kind of distributed barrier - is surprisingly difficult to do correctly. ZooKeeper offers an API to facilitate this sort of distributed coordination.
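To make the barrier idea concrete, here is a single-process analogue using Python's standard `threading.Barrier`. It is not ZooKeeper's API and it sidesteps everything that makes the distributed case hard (process crashes, network partitions); it only illustrates the contract that no participant proceeds until all have arrived. The function name `run_workers` is made up for this example.

```python
import threading


def run_workers(n):
    """Run n threads that all wait at a barrier; return the event log."""
    barrier = threading.Barrier(n)
    lock = threading.Lock()
    log = []

    def worker(i):
        with lock:
            log.append(("before", i))  # work done before the sync point
        barrier.wait()                 # blocks until all n threads arrive
        with lock:
            log.append(("after", i))   # only runs once everyone has arrived

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


for phase, worker_id in run_workers(3):
    print(phase, worker_id)
```

Every "before" entry appears in the log ahead of every "after" entry, whatever order the threads are scheduled in.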
Amazon Web Services Developer Community : Any AWS MapReduce examples using the R ...
discussion of R for big data processing
Hadoop User Group UK: HUGUK #2 - Wrap up
Practical MapReduce - (Tom White, Cloudera) video, slides
Introducing Apache Mahout - (Isabel Drost, ASF) video, slides
Hadoop Training: Virtual Machine | Cloudera
In order to make it easy for you to get started with Hadoop and complete our various training exercises, we have created a virtual machine with everything you need. The VM includes Cloudera's Distribution for Hadoop, all of our example code, as well as Eclipse and other standard tools.
Amazon Elastic MapReduce
Using Hadoop on Amazon's "cloud"
Amazon Elastic MapReduce is a web service
It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3)
