marcell mars's Library tagged → View Popular
Converting 11 million articles from TIFF to PDF-s on amazon EC2 & S3: Self-service, Prorated Super Computing Fun!
"I was ready to deploy Hadoop and my code on a cluster of EC2 machines. For deployment, I created a custom AMI (Amazon Machine Image) for EC2 that was based on a Xen image from my desktop machine. Using some simple Python scripts and the boto library, I booted four EC2 instances of my custom AMI. [..] thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3."
-
I was ready to deploy Hadoop and my code on a cluster of EC2 machines. For deployment, I created a custom AMI (Amazon Machine Image) for EC2 that was based on a Xen image from my desktop machine. Using some simple Python scripts and the boto library, I booted four EC2 instances of my custom AMI. I logged in, started Hadoop and submitted a test job to generate a couple thousand articles, and to my surprise it just worked.
I then began some rough calculations and determined that if I used only four machines, it could take some time to generate all 11 million article PDFs. But thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3.
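The "rough calculations" mentioned above can be reproduced from the quoted figures alone. The sketch below is only an estimate: it assumes throughput scales linearly with the number of instances, which the source does not state, and it treats "just under 24 hours" as exactly 24. The function names are illustrative.

```python
# Back-of-the-envelope estimate using only the figures quoted above.
# Assumption (not from the source): throughput scales linearly with instance count.

ARTICLES = 11_000_000
INSTANCES = 100
HOURS = 24  # "just under 24 hours", so the derived rate is a slight underestimate


def per_instance_rate(articles: int, instances: int, hours: float) -> float:
    """Articles converted per instance per hour."""
    return articles / (instances * hours)


def hours_needed(articles: int, instances: int, rate: float) -> float:
    """Wall-clock hours for `instances` machines, each working at `rate`."""
    return articles / (instances * rate)


rate = per_instance_rate(ARTICLES, INSTANCES, HOURS)
four_machine_days = hours_needed(ARTICLES, 4, rate) / 24

print(f"{rate:.0f} articles per instance-hour")
print(f"{four_machine_days:.0f} days on four instances")
```

Under these assumptions, four instances would need roughly 25 days for the same job, which is consistent with the author's remark that four machines "could take some time".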
A short history of btrfs [LWN.net]
"we'll take a behind-the-scenes look at the design and development of btrfs on many levels - technical, political, personal - and trace it from its origins at a workshop to its current position as Linus's root file system. Knowing the background and motivation for each step will help you understand why btrfs was started, how it works, and where it's going in the future. By the end, you should be able to hand-wave your way through a description of btrfs's on-disk format."
ProtoBlogr.net Blog: USB finger, more details
"As many of you might have already heard from me or various places on the net, after my motorcycle accident last year I got myself a prosthetic USB finger. After last night, when my friend Bergie blogged about this, many have contacted me through email asking if it is true, how I can work while my finger is in the USB, whether it is attached straight to my bone, etc. So I think it is best that I try to explain some of these here now."
Kadoo, Ka-done for now: bizjournals.com Business News - MSN Money
“We’re just trying to see what’s happening in the general economy and then we’ll make some decisions down the road,” he said. Until then, he said the Web site continues to operate in “hibernation-mode.”
Home - MongoDB - 10gen Confluence
"MongoDB is a high-performance, open source, schema-free document database designed for cloud computing. The project's goal is a cloud-scale data store that's easy to deploy, manage and use."
Dead Swap
deadSwap is a social experiment exploring the possibilities of creating an entirely off-line file-sharing and communications platform where people pass a USB memory stick from one to another.
Dabble DB - Create an Online Database - Collect, report, and share your data
Dabble DB helps you create online databases on the web. It’s easy to use yet extremely flexible and powerful.
Tim Anderson
Inspired by the ease with which he could build sugary solids, Anderson decided to perform an experiment in precision sculpting. He borrowed an old ink-jet printer that printed by shooting ink straight down, put some sugar on an index card, and put the card inside the printer. When the printer had finished spraying ink onto the pile of sugar, Anderson took the index card out of the printer and lifted up the newly-formed 3-dimensional letters. Just as he had suspected, the act of applying ink to sugar crystals transformed the crystals into precisely sculpted solid objects. Pleased with these results, Anderson and Bredt set out to build a machine to take advantage of this breakthrough in "3-D printing." The first machine included parts from an old ink-jet printer and an abandoned wafer-transfer machine.
Data Scraping Wikipedia with Google Spreadsheets « OUseful.Info, the blog…
We have scraped some data from a Wikipedia page into a Google spreadsheet using the =importHTML formula, published a handful of rows from the table as CSV, consumed the CSV in a Yahoo pipe and created a geocoded KML feed from it, and then displayed it in a YahooGoogle map.
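The CSV-to-KML step of that pipeline can be sketched in plain Python. This is not the Yahoo pipe from the post, just a minimal stand-in: it assumes the published CSV already carries coordinates (the original geocoded place names inside the pipe), and the column names `name`, `lat` and `lon` as well as the sample rows are hypothetical placeholders.

```python
import csv
import io
from xml.sax.saxutils import escape

# Placeholder rows standing in for the CSV published from the spreadsheet.
# Column names and values are hypothetical, not taken from the post.
SAMPLE_CSV = """name,lat,lon
Place A,51.5,-0.12
Place B & Co,48.85,2.35
"""


def csv_to_kml(csv_text: str) -> str:
    """Turn CSV rows with name/lat/lon columns into a minimal KML document."""
    placemarks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon, lat = float(row["lon"]), float(row["lat"])
        # KML writes coordinates as longitude,latitude (in that order).
        placemarks.append(
            "<Placemark>"
            f"<name>{escape(row['name'])}</name>"
            f"<Point><coordinates>{lon},{lat}</coordinates></Point>"
            "</Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + "".join(placemarks)
        + "</Document></kml>"
    )


kml = csv_to_kml(SAMPLE_CSV)
print(kml)
```

The resulting string can be saved as a `.kml` file and loaded into any map viewer that accepts KML.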
Opentape
Opentape is a free, open-source package that lets you make and host your own mixtapes on the web.
Google Has All My Data – How Do I Back It Up?
Kill somebody. Then the gov't will back it all up for you! Easy.
ext3undel
ext3undel is a collection of scripts to help you recover files from ext2/ext3 file systems, where you (accidentally) deleted them from
Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS)
Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data. Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.
Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS) (see figure below.) MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located. Hadoop has been demonstrated on clusters with 2000 nodes. The current design target is 10,000 node clusters.
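The MapReduce idea described above (split the work into small blocks, process each block independently, then combine the results) can be illustrated with a single-process word count. This is a toy sketch in plain Python, not Hadoop's API: there is no HDFS, no replication and no cluster, and the function names and input strings are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(block: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one block of input."""
    for word in block.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(blocks: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over all blocks."""
    pairs = (pair for block in blocks for pair in map_phase(block))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Each string stands in for one block of data on a different node.
blocks = ["hadoop stores data", "hadoop processes data"]
counts = run_job(blocks)
print(counts)
```

In a real cluster the map calls would run in parallel on the nodes that hold each block, which is the "process the data where it is located" point made in the description.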

Tahoe: A Secure Distributed Filesystem
The "Tahoe" project is a distributed filesystem, which safely stores files on multiple machines to protect against hardware failures. Cryptographic tools are used to ensure integrity and confidentiality, and a decentralized architecture minimizes single points of failure. Files can be accessed through a web interface or native system calls (via FUSE). Fine-grained sharing allows individual files or directories to be delegated by passing short URI-like strings through email. Tahoe grids are easy to set up, and can be used by a handful of friends or by a large company for thousands of customers.
pytagsfs
-
pytagsfs takes a set of tagged media files and presents them in a different directory structure based on the tag content.
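The general idea of deriving a directory layout from tag content can be sketched as follows. This does not use pytagsfs or its format syntax; the `virtual_path` function, the tag keys and the `/artist/album/title.ext` layout are all assumptions chosen for illustration.

```python
from pathlib import PurePosixPath
from typing import Dict


def _safe(component: str) -> str:
    """Replace path separators so a tag value stays a single path component."""
    return component.replace("/", "_")


def virtual_path(tags: Dict[str, str], extension: str) -> PurePosixPath:
    """Build a hypothetical /artist/album/title.ext path from a file's tags."""
    artist = _safe(tags.get("artist", "Unknown Artist"))
    album = _safe(tags.get("album", "Unknown Album"))
    title = _safe(tags.get("title", "Untitled"))
    return PurePosixPath("/") / artist / album / f"{title}.{extension}"


# Placeholder tag values, not real metadata.
example = virtual_path({"artist": "A", "album": "B", "title": "C"}, "ogg")
print(example)
```

A tag-based filesystem would compute such a path for every source file and expose the result as a virtual directory tree, leaving the files themselves where they are.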
