marcell mars's Library tagged storage

05 Sep 09

Converting 11 million articles from TIFF to PDFs on Amazon EC2 & S3: Self-service, Prorated Super Computing Fun!

"I was ready to deploy Hadoop and my code on a cluster of EC2 machines. For deployment, I created a custom AMI (Amazon Machine Image) for EC2 that was based on a Xen image from my desktop machine. Using some simple Python scripts and the boto library, I booted four EC2 instances of my custom AMI. [..] thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3."

open.blogs.nytimes.com/...e-prorated-super-computing-fun - Preview

amazon distribution compute conversion programming linux java python storage

  • I was ready to deploy Hadoop and my code on a cluster of EC2 machines. For deployment, I created a custom AMI (Amazon Machine Image) for EC2 that was based on a Xen image from my desktop machine. Using some simple Python scripts and the boto library, I booted four EC2 instances of my custom AMI. I logged in, started Hadoop and submitted a test job to generate a couple thousands articles — and to my surprise it just worked.


    I then began some rough calculations and determined that if I used only four machines, it could take some time to generate all 11 million article PDFs. But thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3.
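The "rough calculations" mentioned above can be approximated from the only figures the excerpt gives (11 million articles, 100 instances, just under 24 hours). The sketch below is a back-of-the-envelope estimate, not the author's actual calculation: it assumes throughput scales linearly with the number of instances, and it rounds "just under 24 hours" up to 24.

```python
# Back-of-the-envelope scaling estimate for the TIFF-to-PDF job.
# Assumption (not from the article): work divides evenly, so total
# instance-hours stay constant regardless of cluster size.

ARTICLES = 11_000_000
OBSERVED_INSTANCES = 100
OBSERVED_HOURS = 24  # "just under 24 hours", rounded up


def estimated_hours(instances: int) -> float:
    """Estimate wall-clock hours for the whole job on `instances` machines."""
    total_instance_hours = OBSERVED_INSTANCES * OBSERVED_HOURS
    return total_instance_hours / instances


def articles_per_instance_hour() -> float:
    """Average number of articles one instance converts per hour."""
    return ARTICLES / (OBSERVED_INSTANCES * OBSERVED_HOURS)


if __name__ == "__main__":
    for n in (4, 100):
        hours = estimated_hours(n)
        print(f"{n:>3} instances: ~{hours:.0f} hours (~{hours / 24:.0f} days)")
    print(f"~{articles_per_instance_hour():.0f} articles per instance-hour")
```

Under that linear-scaling assumption, the four-instance test cluster would have needed roughly 600 hours, or about 25 days, which is consistent with the remark that four machines "could take some time".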

30 Jul 09

A short history of btrfs [LWN.net]

"we'll take a behind-the-scenes look at the design and development of btrfs on many levels - technical, political, personal - and trace it from its origins at a workshop to its current position as Linus's root file system. Knowing the background and motivation for each step will help you understand why btrfs was started, how it works, and where it's going in the future. By the end, you should be able to hand-wave your way through a description of btrfs's on-disk format."

lwn.net/342892 - Preview

compute programming storage linux software

19 Mar 09

ProtoBlogr.net Blog: USB finger, more details

"As many of you might have already heard from me or various places on the net after my motorcycle accident last year I got myself a prosthetic USB finger. After last night when my friend Bergie blogged about this many have contacted me through email and asking if it is true or how can I work while my finger is in the USB or Is it attached straight to my bone, etc. So I think it is best I'll try to explain some of these here now."

protoblogr.net/...usb_finger-more_details.html - Preview

body usb storage posthuman medicine hardware

15 Mar 09

Kadoo, Ka-done for now: bizjournals.com Business News - MSN Money

“We’re just trying to see what’s happening in the general economy and then we’ll make some decisions down the road,” he said. Until then, he said the Web site continues to operate in “hibernation-mode.”

news.moneycentral.msn.com/...providerarticle.aspx - Preview

business webservice software distribution storage social network

25 Feb 09

Home - MongoDB - 10gen Confluence

"MongoDB is a high-performance, open source, schema-free document database designed for cloud computing. The project's goal is a cloud-scale data store that's easy to deploy, manage and use."

www.mongodb.org/Home - Preview

storage compute distribution programming diy floss network software

05 Feb 09

Dead Swap

deadSwap is a social experiment exploring the possibilities of creating an entirely off-line file-sharing and communications platform where people pass a USB memory stick from one to another.

deadswap.net/DeadSwap - Preview

social storage distribution gsm text network art

27 Jan 09

Dabble DB - Create an Online Database - Collect, report, and share your data

Dabble DB helps you create online databases on the web. It’s easy to use yet extremely flexible and powerful.

dabbledb.com - Preview

webservice programming storage software ui

24 Jan 09

Tim Anderson

Inspired by the ease with which he could build sugary solids, Anderson decided to perform an experiment in precision sculpting. He borrowed an old ink-jet printer that printed by shooting ink straight down, put some sugar on an index card, and put the card inside the printer. When the printer had finished spraying ink onto the pile of sugar, Anderson took the index card out of the printer and lifted up the newly-formed 3-dimensional letters. Just as he had suspected, the act of applying ink to sugar crystals transformed the crystals into precisely sculpted solid objects. Pleased with these results, Anderson and Bredt set out to build a machine to take advantage of this breakthrough in "3-D printing." The first machine included parts from an old ink-jet printer and an abandoned wafer-transfer machine.

www.progressiveengineer.com/...Anders.htm - Preview

diy 3d print hardware vintage storage social


19 Oct 08

Data Scraping Wikipedia with Google Spreadsheets « OUseful.Info, the blog…

We have scraped some data from a wikipedia page into a Google spreadsheet using the =importHTML formula, published a handful of rows from the table as CSV, consumed the CSV in a Yahoo pipe and created a geocoded KML feed from it, and then displayed it in a YahooGoogle map.
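The last step of that pipeline, turning published CSV rows into a KML feed, can be sketched with the Python standard library. This is an illustration only, not the Yahoo Pipes setup the post describes: the column names `name`, `lat` and `lon` are hypothetical, and the sketch assumes the rows are already geocoded, whereas in the original the geocoding happened inside the pipe.

```python
import csv
import io
from xml.sax.saxutils import escape

KML_NAMESPACE = "http://www.opengis.net/kml/2.2"


def csv_to_kml(csv_text: str) -> str:
    """Convert CSV rows with name/lat/lon columns into a KML document.

    The column names are assumptions made for this sketch.
    """
    placemarks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row["lat"])
        lon = float(row["lon"])
        # KML writes coordinates as longitude,latitude (in that order).
        placemarks.append(
            "    <Placemark>\n"
            f"      <name>{escape(row['name'])}</name>\n"
            f"      <Point><coordinates>{lon},{lat}</coordinates></Point>\n"
            "    </Placemark>"
        )
    body = "\n".join(placemarks)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<kml xmlns="{KML_NAMESPACE}">\n'
        "  <Document>\n"
        f"{body}\n"
        "  </Document>\n"
        "</kml>\n"
    )


if __name__ == "__main__":
    sample = "name,lat,lon\nLondon,51.5072,-0.1276\n"
    print(csv_to_kml(sample))
```

The CSV text could come from any published spreadsheet export; the sample row here is made up for the demonstration.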

ouseful.wordpress.com/...pedia-with-google-spreadsheets - Preview

google yahoo semantic software map storage programming social design

25 Aug 08

Opentape

Opentape is a free, open-source package that lets you make and host your own mixtapes on the web.

opentape.fm - Preview

music semantic storage webservice distribution software social business floss

10 Aug 08

Google Has All My Data – How Do I Back It Up?

Kill somebody. Then the gov't will back it all up for you! Easy.

ask.slashdot.org/article.pl - Preview

discussion google storage social software webservice security privacy

  • Then the gov't will back it all up for you! Easy.
15 Jul 08

ext3undel

ext3undel is a collection of scripts to help you recover files from ext2/ext3 file systems, where you (accidentally) deleted them from

projects.izzysoft.de/...ext3undel - Preview

storage floss software linux

29 May 08

Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS)

Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data. Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.
Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS) (see figure below.) MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located. Hadoop has been demonstrated on clusters with 2000 nodes. The current design target is 10,000 node clusters.
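The MapReduce model described above can be illustrated in a few lines of plain Python. This is a single-process toy, not Hadoop's API: the function names are hypothetical, the "blocks" are just strings, and there is no distribution, replication or failure handling. It only shows the shape of the computation: a map step applied to each block independently, a grouping of intermediate pairs by key, and a reduce step per key.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(block: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one block of input."""
    for word in block.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(
    blocks: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run mapper over each block, group by key, then reduce each group."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for block in blocks:  # in a real cluster, each block is processed where it is stored
        for key, value in mapper(block):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    blocks = ["hadoop stores data", "hadoop processes data in parallel"]
    print(run_mapreduce(blocks, map_words, reduce_counts))
```

Because each block is mapped independently, the map calls could run on different machines; that independence is what lets a framework move the computation to the node holding the data.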

hadoop.apache.org/core - Preview

programming storage compare measure diy floss software java

09 May 08

Tahoe: A Secure Distributed Filesystem

The "Tahoe" project is a distributed filesystem, which safely stores files on multiple machines to protect against hardware failures. Cryptographic tools are used to ensure integrity and confidentiality, and a decentralized architecture minimizes single points of failure. Files can be accessed through a web interface or native system calls (via FUSE). Fine-grained sharing allows individual files or directories to be delegated by passing short URI-like strings through email. Tahoe grids are easy to set up, and can be used by a handful of friends or by a large company for thousands of customers.

allmydata.org/...pycon-tahoe.html - Preview

storage distribution floss software python network

10 Mar 08

pytagsfs

  • pytagsfs takes a set of tagged media files and presents them in a different directory structure based on the tag content.
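The idea of deriving a directory layout from tag content can be sketched as a pure path-mapping function. This is not pytagsfs's code or its configuration syntax: the `{artist}/{album}/{title}.{ext}` pattern, the tag names and the helper names are all hypothetical, and the sketch only computes path strings without mounting or reading anything.

```python
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical layout pattern; the placeholders are assumed tag names.
DEFAULT_PATTERN = "{artist}/{album}/{title}.{ext}"


def virtual_path(tags: Dict[str, str], pattern: str = DEFAULT_PATTERN) -> PurePosixPath:
    """Build the path a file would appear under, based only on its tags."""
    # A slash inside a tag value would create an unintended directory level.
    safe = {key: value.replace("/", "_") for key, value in tags.items()}
    return PurePosixPath(pattern.format(**safe))


def build_view(files: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map each virtual path to the real file that backs it."""
    return {str(virtual_path(tags)): real for real, tags in files.items()}


if __name__ == "__main__":
    library = {
        "/music/track01.mp3": {
            "artist": "AC/DC",
            "album": "Back in Black",
            "title": "Hells Bells",
            "ext": "mp3",
        },
    }
    for shown, real in build_view(library).items():
        print(f"{shown}  <-  {real}")
```

Changing the pattern, for example to `{album}/{title}.{ext}`, regroups the same files without touching them, which is the appeal of a tag-driven view.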