The way I imagined building an incremental mapreduce mechanism, without storing
the intermediate data and just recomputing the chunks that are out of date (which
would be lame), is to add one extra concept to the system: call it "demap". It
creates "negative entries" that cancel out the old data. This is essentially
what Damien did by providing both the old and new data map calls, all the time,
just said differently, and I think my way might make the average call a lot
simpler. I also don't see any reason why my version wouldn't be parallelizable,
chainable, and generally yummy.
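
To make the "demap" idea concrete, here is a minimal sketch in Python using a word count. The names map_doc, demap_doc, reduce_into and update are hypothetical, made up for illustration, and are not taken from any existing system. The sketch also assumes the reduce step is a plain sum, so that a negative entry really does cancel an earlier positive one; a reduce like max or min would not work this way.

```python
from collections import Counter

def map_doc(doc):
    """Ordinary map: emit (word, 1) for every word in the document."""
    return [(word, 1) for word in doc.split()]

def demap_doc(doc):
    """Hypothetical 'demap': emit a negative entry for each entry map_doc produced."""
    return [(key, -value) for key, value in map_doc(doc)]

def reduce_into(totals, pairs):
    """Fold emitted pairs into the running totals (assumes reduce is a sum)."""
    for key, value in pairs:
        totals[key] += value
        if totals[key] == 0:
            del totals[key]  # drop keys that were fully cancelled
    return totals

def update(totals, old_doc, new_doc):
    """Incremental update: demap the old version, then map the new one."""
    if old_doc is not None:
        reduce_into(totals, demap_doc(old_doc))
    if new_doc is not None:
        reduce_into(totals, map_doc(new_doc))
    return totals

totals = Counter()
update(totals, None, "a b a")    # document created
update(totals, "a b a", "a c")   # document edited

# Recomputing from scratch over the current document gives the same totals.
recomputed = reduce_into(Counter(), map_doc("a c"))
```

Since the negative entries are ordinary emitted pairs, they can go through the same reduce as everything else, which is why the approach looks like it should parallelize and chain like a normal map.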