marcell mars's Library tagged compute

05 Sep 09

Converting 11 million articles from TIFF to PDFs on Amazon EC2 & S3: Self-service, Prorated Super Computing Fun!

"I was ready to deploy Hadoop and my code on a cluster of EC2 machines. For deployment, I created a custom AMI (Amazon Machine Image) for EC2 that was based on a Xen image from my desktop machine. Using some simple Python scripts and the boto library, I booted four EC2 instances of my custom AMI. [..] thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3."

open.blogs.nytimes.com/...e-prorated-super-computing-fun

amazon distribution compute conversion programming linux java python storage

  • I was ready to deploy Hadoop and my code on a cluster of EC2 machines. For deployment, I created a custom AMI (Amazon Machine Image) for EC2 that was based on a Xen image from my desktop machine. Using some simple Python scripts and the boto library, I booted four EC2 instances of my custom AMI. I logged in, started Hadoop and submitted a test job to generate a couple thousand articles, and to my surprise it just worked.


    I then began some rough calculations and determined that if I used only four machines, it could take some time to generate all 11 million article PDFs. But thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3.
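
The rough calculation mentioned in the excerpt can be reproduced from its own figures (11 million articles, 100 instances, just under 24 hours). This is only a sketch: it rounds the run up to a full 24 hours and assumes the work spreads evenly and scales linearly, none of which the excerpt states.

```python
ARTICLES = 11_000_000   # from the excerpt
INSTANCES = 100         # from the excerpt
HOURS = 24              # "just under 24 hours", rounded up (assumption)

def per_instance_rate(articles, instances, hours):
    """Articles per instance per hour, assuming evenly spread work."""
    return articles / (instances * hours)

def hours_needed(articles, instances, rate):
    """Wall-clock hours for a cluster, assuming linear scaling."""
    return articles / (instances * rate)

rate = per_instance_rate(ARTICLES, INSTANCES, HOURS)
four_machine_days = hours_needed(ARTICLES, 4, rate) / 24

print(f"{rate:.0f} articles per instance-hour")
print(f"{four_machine_days:.0f} days on four instances")
```

Under these assumptions the original four instances would have needed roughly 25 times as long as the 100-instance run.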

30 Jul 09

A short history of btrfs [LWN.net]

"we'll take a behind-the-scenes look at the design and development of btrfs on many levels - technical, political, personal - and trace it from its origins at a workshop to its current position as Linus's root file system. Knowing the background and motivation for each step will help you understand why btrfs was started, how it works, and where it's going in the future. By the end, you should be able to hand-wave your way through a description of btrfs's on-disk format."

lwn.net/342892

compute programming storage linux software

26 Jul 09

RANDOM.ORG - Introduction to Randomness and Random Numbers

"RANDOM.ORG is a true random number service that generates randomness via atmospheric noise. "

random.org/randomness

programming webservice science compute

08 May 09

A Neighborhood of Infinity: The Three Projections of Doctor Futamura

"The Three Projections of Futamura are a sequence of applications of a programming technique called 'partial evaluation' or 'specialisation', each one more mind-bending than the previous one. But it shouldn't be programmers who have all the fun. So I'm going to try to explain the three projections in a way that non-programmers can maybe understand too."

blog.sigfpe.com/...ctions-of-doctor-futamura.html

programming theory compute

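
As a small illustration of the "partial evaluation" or "specialisation" that the excerpt names, the sketch below specialises a general power function to a fixed exponent. It is a toy, not the linked article's construction: `specialise_power` is a hypothetical helper written only for this one function, whereas the Futamura projections apply a general specialiser to an interpreter and, in the later projections, to the specialiser itself.

```python
def power(x, n):
    """General program: both inputs are known only at run time."""
    result = 1
    for _ in range(n):
        result *= x
    return result

def specialise_power(n):
    """Toy specialiser: with n known in advance, unroll the loop and
    emit a residual program that only needs x."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"lambda x: {body}"
    return source, eval(source)

source, cube = specialise_power(3)
print(source)    # lambda x: x * x * x
print(cube(5))   # 125
```

The residual program computes the same result as `power(x, 3)` but no longer contains the loop or the parameter `n`.
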
06 Apr 09

erikfrey's bashreduce at master - GitHub

"bashreduce lets you apply your favorite unix tools in a mapreduce fashion across multiple machines/cores. There’s no installation, administration, or distributed filesystem."

github.com/...master

compute programming text ui distribution network floss software

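
The "mapreduce fashion" described in the excerpt can be sketched in a few lines. This is not bashreduce's code or interface: the function names are hypothetical, and threads stand in for the separate machines or cores. It only shows the general shape of partitioning input by key, processing each partition independently, and merging the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(lines, n_workers):
    """Route each line to a worker by its key, so equal keys meet on one worker."""
    buckets = [[] for _ in range(n_workers)]
    for line in lines:
        buckets[hash(line) % n_workers].append(line)
    return buckets

def count_bucket(bucket):
    """Per-worker step, comparable to running `sort | uniq -c` on one machine."""
    return Counter(bucket)

def map_reduce_count(lines, n_workers=4):
    """Count identical lines by processing partitions independently, then merging."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_bucket, partition(lines, n_workers)))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

words = "to be or not to be".split()
print(map_reduce_count(words))
```
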
30 Mar 09

Unladen-swallow - Plans for optimizing Python

"Produce a version of Python at least 5x faster than CPython. "

code.google.com/...ProjectPlan

python compute programming google

26 Mar 09

http://www.ics.uci.edu/~eppstein/PADS/README.txt

"This is PADS, a library of Python Algorithms and Data Structures implemented by David Eppstein of the University of California, Irvine."

www.ics.uci.edu/...README.txt

python diy programming collection compute science

25 Feb 09

Home - MongoDB - 10gen Confluence

"MongoDB is a high-performance, open source, schema-free document database designed for cloud computing. The project's goal is a cloud-scale data store that's easy to deploy, manage and use."

www.mongodb.org/Home

storage compute distribution programming diy floss network software

29 Jan 09

AsmXml - A Fast XML Parser

AsmXml is a very fast XML parser and decoder for x86 platforms. It achieves high speed through three features: it is written in pure assembler, it optimizes memory access, and it parses and decodes at the same time. To give an idea of the relative speed of AsmXml, the fastest open source XML parsers process between 10 and 30 MB of XML per second, while AsmXml processes around 200 MB per second (on an Athlon XP 1800+).

mkerbiquet.free.fr/...index.html

programming diy database compute

21 Jan 09

The Wireworld computer

the first ever computer implemented as a cellular automaton that you might reasonably want to write a program for. The design was done by David Moore and Mark Owen, with the help of many others, between 1990 and 1992.

www.quinapalus.com/wi-index.html

programming diy compute vintage foxy visual design

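
For readers unfamiliar with the cellular automaton underneath this computer, here is a minimal sketch of one Wireworld generation. The excerpt does not state the rules; the code assumes the standard ones (an electron head becomes a tail, a tail becomes wire, and wire becomes a head when exactly one or two of its eight neighbours are heads). The state names and the `step` function are illustrative choices.

```python
EMPTY, HEAD, TAIL, WIRE = 0, 1, 2, 3

def step(grid):
    """Return the next generation of a Wireworld grid (a list of rows)."""
    rows, cols = len(grid), len(grid[0])

    def heads_around(r, c):
        # Count electron heads among the (up to) eight neighbours.
        return sum(
            grid[r + dr][c + dc] == HEAD
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
        )

    new = []
    for r in range(rows):
        row = []
        for c in range(cols):
            cell = grid[r][c]
            if cell == HEAD:
                row.append(TAIL)
            elif cell == TAIL:
                row.append(WIRE)
            elif cell == WIRE and heads_around(r, c) in (1, 2):
                row.append(HEAD)
            else:
                row.append(cell)
        new.append(row)
    return new

wire = [[TAIL, HEAD, WIRE, WIRE]]
print(step(wire))  # [[3, 2, 1, 3]]
```

The electron (head followed by tail) moves one cell to the right along the wire per generation, which is the signal that Wireworld circuits are built from.
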
17 Sep 08

montylingua :: a free, commonsense-enriched natural language understander

MontyLingua is a free*, commonsense-enriched, end-to-end natural language understander for English. Feed raw English text into MontyLingua, and the output will be a semantic interpretation of that text. Perfect for information retrieval and extraction, request processing, and question answering. From English sentences, it extracts subject/verb/object tuples, extracts adjectives, noun phrases and verb phrases, and extracts people's names, places, events, dates and times, and other semantic information.

web.media.mit.edu/...montylingua

language compute programming python java diy text

07 Sep 08

Goodbye MapReduce, Hello Cascading

Cascading abstracts away MapReduce into a more natural logical model and provides a workflow management layer to handle things like intermediate data and data staleness. Cascading’s logical model abstracts away MapReduce into a convenient tuples, pipes, and taps model.

blog.rapleaf.com/dev

programming distribution compute java diy

13 Aug 08

nodal - generative software application for composing music

Nodal is a generative software application for composing music. It uses a novel method for the notation and playing of MIDI based music. This method is based around the concept of a user-defined graph. The graph consists of nodes (musical events) and edges (connections between events). You interactively define the graph, which is then traversed by any number of players who play the musical events as they encounter them on the graph. The time taken to travel from one node to another is based on the length of the edges that connect the nodes.

www.csse.monash.edu.au/...nodal

music software apple compute science ui design

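
The graph model in the description (nodes as musical events, edge lengths as travel times, any number of players) can be sketched as follows. This is not Nodal's code or file format: the note names, the `GRAPH` layout, and the rule that a player always follows the first outgoing edge are assumptions made for illustration.

```python
import heapq

# node -> list of (next_node, edge_length); lengths act as travel times in beats
GRAPH = {
    "C4": [("E4", 1.0)],
    "E4": [("G4", 0.5)],
    "G4": [("C4", 2.0)],
}

def play(graph, starts, until):
    """Return (time, note) events for players released from `starts` at time 0."""
    events = []
    queue = [(0.0, node) for node in starts]
    heapq.heapify(queue)
    while queue:
        time, node = heapq.heappop(queue)
        if time > until:
            break
        events.append((time, node))
        # Assumption: a player always takes the first outgoing edge.
        if graph.get(node):
            nxt, length = graph[node][0]
            heapq.heappush(queue, (time + length, nxt))
    return events

print(play(GRAPH, ["C4"], until=4.0))
# [(0.0, 'C4'), (1.0, 'E4'), (1.5, 'G4'), (3.5, 'C4')]
```

Passing several start nodes releases several players at once; the priority queue keeps their events in time order.
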
02 Aug 08

Slashdot | Which Open Source Video Apps Use SMP Effectively?

Which open source video conversion apps take full native advantage of SMP? (And before you ask, no, I don't want to pick up the code and add SMP support myself, thanks.)

tech.slashdot.org/article.pl

linux visual programming compute collection

05 May 08

Twitter Can Be Liberated - Here’s How

Distributing Twitter can’t be done efficiently via RSS alone, because rapid and excessive polling would bring servers to a halt. Instead, Saad thinks the answer is wrapping RSS in XMPP, an open-standards-based instant messaging protocol that was originally created for Jabber and is now used in various applications, including Google Talk. XMPP allows messages to be pushed to subscribers, which removes the need for constant polling. For more of Saad’s thinking, see his site on their product SyncStream; they’ve already written code that does this, based on their proposed standard called “GetPingd.” Twitter already uses XMPP in its API, and third-party applications like Google Talk already integrate with Twitter via XMPP.

www.techcrunch.com/...ter-can-be-liberated-heres-how

social software network distribution compute business

  • Twitter can be decentralized effectively.
20 Jan 08

UNIX® Load Average Part 1: How It Works

  • Have you ever wondered how those three little numbers that appear in the UNIX®
    load average (LA) report are calculated?
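
The linked article is not excerpted here, so the following is only a sketch of one common answer to that question. It assumes the scheme used by Linux: the number of runnable processes is sampled every 5 seconds and folded into three exponentially damped moving averages with time constants of 1, 5 and 15 minutes. The constant and function names are illustrative.

```python
import math

SAMPLE_INTERVAL = 5.0            # seconds between samples (assumed)
WINDOWS = (60.0, 300.0, 900.0)   # 1, 5 and 15 minutes, in seconds

def update(loads, runnable):
    """Fold one sample of the run-queue length into the three averages."""
    new = []
    for load, window in zip(loads, WINDOWS):
        decay = math.exp(-SAMPLE_INTERVAL / window)
        new.append(load * decay + runnable * (1.0 - decay))
    return tuple(new)

loads = (0.0, 0.0, 0.0)
for _ in range(12):              # one minute with 2 runnable processes
    loads = update(loads, runnable=2)
print(tuple(round(x, 2) for x in loads))
```

On a real Unix system the current values can be read with `os.getloadavg()`; the sketch only shows why the three numbers react to the same load at different speeds.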