Carlos Santos's Library tagged hadoop

Ivory: A Hadoop toolkit for web-scale information retrieval

"Ivory is a Hadoop toolkit for web-scale information retrieval research that features a retrieval engine based on Markov Random Fields, appropriately named SMRF (Searching with Markov Random Fields). This open-source project began in Spring 2009 and represents a collaboration between the University of Maryland and Yahoo! Research. Ivory takes full advantage of the Hadoop distributed environment (the MapReduce programming model and the underlying distributed file system) for both indexing and retrieval. The current release of Ivory (release 0.2) works with Hadoop 0.20.1 (and requires certain features only found in that release). Ivory also uses Cloud9, a MapReduce library for Hadoop developed at the University of Maryland (currently also at release 0.2)."

www.umiacs.umd.edu/...index.html - Preview

hadoop MapReduce InformationRetrieval

MapReduce Online! (and some gimmes) « Data Beta

"Introducing HOP: the Hadoop Online Prototype. With modest changes to the structure of Hadoop, we were able to convert it from a batch-processing system to an interactive, online system that can provide features like “early returns” from big jobs, and continuous data stream processing, while preserving the simple MapReduce programming and fault tolerance models popularized by Google and Hadoop. And by the way, it exposes pipeline parallelism that can even make batch jobs finish faster. This is a project led by Tyson Condie, in collaboration with folks at Berkeley and Yahoo! Research."

databeta.wordpress.com/...mapreduce-online - Preview

MapReduce hadoop streaming

14 Jul 09

YDN Theater: Tom White - Running Hadoop in the Cloud

He opens with a discussion of the Berkeley RAD Lab paper on cloud computing and walks us through a set of definitions to a discussion of the public cloud. He sees a realm of interesting possibilities: an apparently infinite resource; the elimination of user commitment; and the pay-as-you-go model, which enables elasticity. Tom describes the implementation of Hadoop in this landscape.

developer.yahoo.net/...hadoop_in_the_cloud.html - Preview

Hadoop HadoopSummit by:TomWhite videolecture

10 Jul 09

Hadoop Studio

Hadoop Studio is a map-reduce development environment (IDE) based on NetBeans. It makes it easy to create, understand and debug map-reduce applications based on Hadoop, without requiring development-time access to a map-reduce cluster.

www.hadoopstudio.org - Preview

hadoop mapreduce tools development netbeans HadoopStudio IDE via:pskomoroch

16 Jun 09

gist: 130483 - GitHub

Python Hadoop Streaming script called by Hive
in daily run - calculates simple baseline
monthly trend for "Biggest Movers"

gist.github.com/130483 - Preview

python hadoop hive trendingtopics.org by:PeterSkomoroch
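
As a rough illustration of the kind of Hadoop Streaming script the gist above describes, here is a minimal reducer sketch. It is not the gist's code: the tab-separated `key<TAB>day<TAB>count` input layout, the function names, and the trend formula (latest count minus the mean of the earlier counts) are all assumptions made for the example.

```python
from itertools import groupby


def parse(lines):
    """Yield (key, day, count) from tab-separated 'key<TAB>day<TAB>count' lines.

    The three-column layout is an assumption for this sketch.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, day, count = line.split("\t")
        yield key, day, int(count)


def monthly_trend(counts):
    """Hypothetical baseline: latest count minus the mean of the earlier counts."""
    if len(counts) < 2:
        return 0.0
    *earlier, latest = counts
    return latest - sum(earlier) / len(earlier)


def reduce_stream(lines):
    """Group consecutive records by key and emit 'key<TAB>trend' strings.

    Hadoop Streaming hands the reducer its input sorted by key, so
    consecutive grouping is enough; records are re-sorted by day per key.
    """
    for key, group in groupby(parse(lines), key=lambda rec: rec[0]):
        records = sorted(group, key=lambda rec: rec[1])
        counts = [count for _, _, count in records]
        yield f"{key}\t{monthly_trend(counts):.2f}"
```

In a streaming job, a script like this would typically iterate `reduce_stream(sys.stdin)` and print each result line to standard output.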

11 Jun 09

'Grid computing Red Hat' out-Amazons Amazon • The Register

This means you can run ongoing Hadoop jobs - starting them and stopping them whenever you like - without moving data back and forth between the local EC2 disks and Amazon's Simple Storage Service (S3).

www.theregister.co.uk/...cloudera_does_aws_ebs - Preview

cloudera hadoop amazon

01 Jun 09

Cloudera Hadoop & Big Data Blog » Blog Archive » Introducing Sqoop

Sqoop (“SQL-to-Hadoop”) is a straightforward command-line tool with the following capabilities:

* Imports individual tables or entire databases to files in HDFS
* Generates Java classes to allow you to interact with your imported data
* Provides the ability to import from SQL databases straight into your Hive data warehouse

www.cloudera.com/...introducing-sqoop - Preview

cloudera sqoop hadoop mapreduce

29 May 09

Cloudera Hadoop & Big Data Blog » Blog Archive » Building a distributed concurrent queue with Apache ZooKeeper

ZooKeeper is a system for coordinating distributed processes. In a distributed environment, getting processes to act in any kind of synchrony is an extremely hard problem. For example, simply having a set of processes wait until they’ve all reached the same point in their execution - a kind of distributed barrier - is surprisingly difficult to do correctly. ZooKeeper offers an API to facilitate this sort of distributed coordination.

www.cloudera.com/...nt-queue-with-apache-zookeeper - Preview

cloudera hadoop zookeeper python distributed queue via:pskomoroch
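
To make the barrier idea in the ZooKeeper excerpt concrete, here is a single-process analogy using Python's standard `threading.Barrier`. It only shows what a barrier guarantees (nobody proceeds until everyone has arrived); it does not use ZooKeeper and does not address the cross-machine failures that make the distributed version hard. The `run_workers` helper is a name invented for this sketch.

```python
import threading


def run_workers(n):
    """Run n threads that each log 'before', wait at a barrier, then log 'after'."""
    barrier = threading.Barrier(n)
    events = []
    lock = threading.Lock()

    def worker(i):
        with lock:
            events.append(("before", i))
        barrier.wait()  # blocks until all n workers have reached this point
        with lock:
            events.append(("after", i))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

Because every worker records "before" ahead of calling `wait()`, and no worker is released until all have called it, every "before" entry in the returned list precedes every "after" entry.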

Hadoop User Group UK: HUGUK #2 - Wrap up

Practical MapReduce - (Tom White, Cloudera) video, slides
Introducing Apache Mahout - (Isabel Drost, ASF) video, slides

huguk.org/...huguk-2-wrap-up.html - Preview

apache mapreduce hadoop cloudera towatch mahout

13 Apr 09

Hadoop Training: Virtual Machine | Cloudera

  • In order to make it easy for you to get started with Hadoop and complete our various training exercises, we have created a virtual machine with everything you need. The VM includes Cloudera's Distribution for Hadoop, all of our example code, as well as Eclipse and other standard tools.

02 Apr 09

Amazon Elastic MapReduce

  • Amazon Elastic MapReduce is a web service
  • It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3)