This link has been bookmarked by 24 people. It was first bookmarked on 17 Aug 2006, by Simon.
-
10 Nov 09
-
MapReduce is a software framework introduced by Google to support distributed computing on large data sets on clusters of computers.
-
-
02 Nov 09
-
05 May 09
-
03 May 09
-
30 Apr 09
-
-
defined with respect to data structured in (key, value) pairs.
-
-
Map takes one pair of data with a type in one data domain, and returns a list of pairs in a different domain:
Map(k1,v1) -> list(k2,v2)
-
-
the MapReduce framework collects all pairs with the same key from all lists and groups them together, thus creating one group for each one of the different generated keys.
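
The grouping step described here can be sketched in a few lines of Python. This is only an in-memory illustration, not how any real framework implements it; the helper name `group_by_key` and the sample pairs are assumptions made for the example.

```python
from collections import defaultdict

def group_by_key(mapped_lists):
    """Collect (key, value) pairs from every Map output list into one group per key."""
    groups = defaultdict(list)
    for pairs in mapped_lists:        # one list per Map call
        for key, value in pairs:
            groups[key].append(value)
    return dict(groups)

# Two hypothetical Map outputs that share the key "a".
map_outputs = [
    [("a", 1), ("b", 1)],
    [("a", 1), ("c", 1)],
]
grouped = group_by_key(map_outputs)
```

Each distinct key ends up with exactly one list of values, which is what a later Reduce call would receive.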
-
The Reduce function is then applied in parallel to each group, which in turn produces a collection of values in the same domain:
Reduce(k2, list (v2)) -> list(v2)
-
This behavior is different from the functional programming map and reduce combination, which accepts a list of arbitrary values and returns one single value that combines all the values returned by map.
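
To make the contrast concrete, here is a small Python sketch. The first half uses the standard `map` and `functools.reduce` to fold a list into one value; the second half imitates the MapReduce shape, where Reduce runs once per key. The function `parity_map` and the sample data are assumptions chosen for illustration.

```python
from collections import defaultdict
from functools import reduce

values = [1, 2, 3, 4]

# Functional style: map over a plain list, then fold everything into one value.
squares = map(lambda x: x * x, values)
single_value = reduce(lambda acc, x: acc + x, squares, 0)

# MapReduce style: Map emits (key, value) pairs and Reduce runs once per key.
def parity_map(_key, x):
    return [("even" if x % 2 == 0 else "odd", x * x)]

groups = defaultdict(list)
for i, x in enumerate(values):
    for k2, v2 in parity_map(i, x):
        groups[k2].append(v2)

per_key = {k2: sum(v2s) for k2, v2s in groups.items()}
```

Here `single_value` is one number, while `per_key` holds one result for each generated key.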
-
count the appearances of each different word in a set of documents
-
Each document is split into words, and the Map function initially counts each word with a value of "1", using the word as the result key. The framework groups all the pairs with the same key and feeds them to the same call to Reduce, so that function only needs to sum its input values to find the total number of appearances of that word.
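
A minimal single-process sketch of this word count in Python follows. It assumes whitespace tokenization and keeps everything in memory; the names `map_words`, `reduce_counts`, and `word_count` are hypothetical and stand in for the framework's Map, Reduce, and grouping machinery.

```python
from collections import defaultdict

def map_words(name, document):
    """Map: emit (word, 1) for every word in the document."""
    return [(word, 1) for word in document.split()]

def reduce_counts(word, partial_counts):
    """Reduce: sum the partial counts collected for one word."""
    return sum(partial_counts)

def word_count(documents):
    groups = defaultdict(list)
    for name, text in documents.items():
        for word, one in map_words(name, text):
            groups[word].append(one)      # stand-in for the framework's grouping
    return {word: reduce_counts(word, counts) for word, counts in groups.items()}

docs = {"d1": "the quick fox", "d2": "the lazy dog the end"}
counts = word_count(docs)
```

With these two sample documents, "the" appears three times and every other word once.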
-
-
-
26 Apr 09
-
31 Mar 09
-
master node
-
The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. (A wor
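
The part of this step that the highlight does state (take the input, chop it into sub-problems, hand them to workers) might look roughly like the Python sketch below. Threads stand in for worker nodes, and `split_input`, `worker`, and `master` are hypothetical names; the per-chunk work (summing) is an arbitrary placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

def split_input(items, n_chunks):
    """Master step: chop the input into at most n_chunks contiguous sub-problems."""
    size = max(1, -(-len(items) // n_chunks))   # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def worker(chunk):
    """Worker step (placeholder): process one sub-problem, here by summing it."""
    return sum(chunk)

def master(items, n_workers=3):
    chunks = split_input(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, chunks))   # results come back in chunk order
    return chunks, partials

chunks, partials = master(list(range(10)))
```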
-
-
-
-
01 Mar 09
-
12 Jan 09
-
The framework is inspired by the map and reduce functions commonly used in functional programming,[2] although their purpose in the MapReduce framework is not the same as in their original forms.
-
MapReduce is a framework for computing certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a cluster.
-
-
The Map and Reduce functions of MapReduce are both defined with respect to data structured in (key, value) pairs.
-
Map(k1,v1) -> list(k2,v2)
-
Reduce(k2, list (v2)) -> list(v2)
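
The two signatures can be written down as Python type aliases, which may help when reading them. This is just one way to spell the shapes; the aliases `MapFn` and `ReduceFn` and the two sample functions are assumptions for illustration.

```python
from typing import Callable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")

# Map(k1, v1) -> list(k2, v2)
MapFn = Callable[[K1, V1], List[Tuple[K2, V2]]]
# Reduce(k2, list(v2)) -> list(v2)
ReduceFn = Callable[[K2, List[V2]], List[V2]]

def first_letter_lengths(line_no: int, line: str) -> List[Tuple[str, int]]:
    """A Map instance: (int, str) in, a list of (str, int) pairs out."""
    return [(word[0], len(word)) for word in line.split()]

def longest(letter: str, lengths: List[int]) -> List[int]:
    """A Reduce instance: values stay in the same domain (int)."""
    return [max(lengths)]

mapped = first_letter_lengths(0, "map reduce master")
reduced = longest("m", [3, 6])
```

Note how the input key domain (line numbers) differs from the output key domain (letters), while Reduce returns values of the same type it received.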
-
Thus the MapReduce framework transforms a list of (key, value) pairs into a list of values. This behavior is different from the functional programming map and reduce combination, which accepts a list of arbitrary values and returns one single value that combines all the values returned by map.
-
The canonical example application of MapReduce is a process to count the appearances of each different word in a set of documents.
-
map(String name, String document):
  // key: document name
  // value: document contents
  for each word w in document:
    EmitIntermediate(w, 1);

reduce(String word, Iterator partialCounts):
  // key: a word
  // values: a list of aggregated partial counts
  int result = 0;
  for each v in partialCounts:
    result += ParseInt(v);
  Emit(result);
-
The hot spots, which the application defines, are:
- an input reader
- a Map function
- a partition function
- a compare function
- a Reduce function
- an output writer
-
The frozen part of the MapReduce framework is a large distributed sort.
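
One way to picture the split between hot spots and the frozen part is a driver that accepts the six application-defined pieces as parameters and does the sorting and grouping itself. The Python sketch below is a single-process stand-in (nothing here is distributed), and `run_job` with all of its parameter names is hypothetical.

```python
from functools import cmp_to_key
from itertools import groupby

def run_job(source, input_reader, map_fn, partition_fn, compare_fn,
            reduce_fn, output_writer, n_partitions=2):
    # Hot spots: the input reader yields (k1, v1) records, Map emits (k2, v2)
    # pairs, and the partition function picks a bucket for each key.
    partitions = [[] for _ in range(n_partitions)]
    for k1, v1 in input_reader(source):
        for k2, v2 in map_fn(k1, v1):
            partitions[partition_fn(k2, n_partitions)].append((k2, v2))

    # Frozen part (local stand-in for the distributed sort): order each bucket
    # with the application's compare function so equal keys become adjacent.
    key_of = cmp_to_key(compare_fn)
    for bucket in partitions:
        bucket.sort(key=lambda kv: key_of(kv[0]))
        for k2, pairs in groupby(bucket, key=lambda kv: kv[0]):
            # Hot spots: Reduce, then the output writer.
            output_writer(k2, reduce_fn(k2, [v for _, v in pairs]))

results = {}
run_job(
    source=["b a", "a c a"],
    input_reader=lambda lines: enumerate(lines),
    map_fn=lambda _n, line: [(w, 1) for w in line.split()],
    partition_fn=lambda key, n: sum(key.encode()) % n,   # deterministic bucket choice
    compare_fn=lambda x, y: (x > y) - (x < y),
    reduce_fn=lambda _k, vs: sum(vs),
    output_writer=results.__setitem__,
)
```

Swapping any of the six arguments changes the job without touching the sort-and-group loop, which is the sense in which that loop is "frozen".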
-
MapReduce is useful in a wide range of applications, including: "distributed grep, distributed sort, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, statistical machine translation..."
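
As an illustration of the first item in that list, a distributed grep can be phrased as a Map that emits a line only when it matches a pattern and a Reduce that passes its input through unchanged. The Python sketch below runs in one process; the function names and the sample log contents are assumptions.

```python
import re

def grep_map(location, line, pattern):
    """Map: emit the line, keyed by its location, only if it matches."""
    return [(location, line)] if re.search(pattern, line) else []

def grep_reduce(location, lines):
    """Reduce: identity, matched lines pass through unchanged."""
    return lines

def distributed_grep(files, pattern):
    matches = {}
    for name, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for key, value in grep_map((name, line_no), line, pattern):
                matches.setdefault(key, []).append(value)
    return {key: grep_reduce(key, lines) for key, lines in matches.items()}

logs = {"a.log": "ok\nerror: disk\nok", "b.log": "error: net"}
hits = distributed_grep(logs, r"^error")
```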
-
-
-
30 Dec 08
-
21 Dec 08
Todd Suomela: includes links to current projects in a variety of languages: Java, Python, Erlang, etc.
-
24 Sep 08
-
04 Jun 08
-
24 Apr 08
-
07 Aug 07
-
03 Aug 07
-
25 Jul 07
Matti Narkia: MapReduce is a software framework originally developed by Google to support parallel computations over large (greater than 100 terabyte) data sets on unreliable clusters of computers. The name is derived from the map and reduce functions commonly used in
MapReduce Wikipedia software framework parallel computations computing processing google map reduce functional programming popular
-
30 Jun 07
-
25 Feb 07
-
28 Jan 07
-
17 Aug 06