marcell mars's Library
Converting 11 million articles from TIFF to PDFs on Amazon EC2 & S3: Self-service, Prorated Super Computing Fun!
I was ready to deploy Hadoop and my code on a cluster of EC2 machines. For deployment, I created a custom AMI (Amazon Machine Image) for EC2 that was based on a Xen image from my desktop machine. Using some simple Python scripts and the boto library, I booted four EC2 instances of my custom AMI. I logged in, started Hadoop and submitted a test job to generate a couple thousand articles, and to my surprise it just worked.
I then began some rough calculations and determined that if I used only four machines, it could take some time to generate all 11 million article PDFs. But thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3.
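The rough calculation mentioned above can be reconstructed in a few lines of Python. This is only an illustrative sketch, not the author's actual arithmetic: it assumes throughput scales linearly with the number of instances and rounds "just under 24 hours" up to exactly 24, neither of which the post states.

```python
# Back-of-the-envelope estimate using the figures quoted in the post.
TOTAL_ARTICLES = 11_000_000
INSTANCES_USED = 100
HOURS_TAKEN = 24  # "just under 24 hours", rounded up (assumption)

# Articles handled by one instance in one hour, assuming linear scaling.
articles_per_instance_hour = TOTAL_ARTICLES / (INSTANCES_USED * HOURS_TAKEN)


def estimated_hours(instances: int) -> float:
    """Estimated wall-clock hours to convert every article on `instances` machines."""
    return TOTAL_ARTICLES / (instances * articles_per_instance_hour)


if __name__ == "__main__":
    print(f"{articles_per_instance_hour:.0f} articles per instance-hour")
    print(f"{estimated_hours(4) / 24:.0f} days on four instances")
```

Under those assumptions, four instances would need roughly 25 days for the same job, which is consistent with the remark that four machines "could take some time".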
Circuit Simulator Applet
"This Java applet is an electronic circuit simulator. When the applet starts up you will see an animated schematic of a simple LRC circuit. The green color indicates positive voltage. The gray color indicates ground. A red color indicates negative voltage. The moving yellow dots indicate current."
montylingua :: a free, commonsense-enriched natural language understander
MontyLingua is a free*, commonsense-enriched, end-to-end natural language understander for English. Feed raw English text into MontyLingua, and the output will be a semantic interpretation of that text. Perfect for information retrieval and extraction, request processing, and question answering. From English sentences, it extracts subject/verb/object tuples, extracts adjectives, noun phrases and verb phrases, and extracts people's names, places, events, dates and times, and other semantic information.
Clojure
Clojure is a dynamic programming language that targets the Java Virtual Machine. It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming. Clojure is a compiled language - it compiles directly to JVM bytecode, yet remains completely dynamic. Every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection. Clojure is a dialect of Lisp, and shares with Lisp the code-as-data philosophy and a powerful macro system. Clojure is predominantly a functional programming language, and features a rich set of immutable, persistent data structures. When mutable state is needed, Clojure offers a software transactional memory system and reactive Agent system that ensure clean, correct, multithreaded designs.
Goodbye MapReduce, Hello Cascading
Cascading abstracts away MapReduce into a more natural logical model and provides a workflow management layer to handle things like intermediate data and data staleness. Cascading’s logical model abstracts away MapReduce into a convenient tuples, pipes, and taps model.
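To make the tuples, pipes, and taps vocabulary concrete, here is a toy, single-process sketch in Python. It is not Cascading's API (Cascading is a Java library); the class and method names `ListTap`, `Pipe`, `keep`, `each`, and `run` are hypothetical and chosen only for illustration. Tuples are records, taps are where records come from or go to, and a pipe is a chain of operations applied to the stream in between.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple  # a "tuple": one record flowing through the pipeline


class ListTap:
    """Hypothetical tap: a source or sink backed by an in-memory list."""

    def __init__(self, rows: Iterable[Row] = ()) -> None:
        self.rows: List[Row] = list(rows)

    def read(self) -> Iterator[Row]:
        return iter(self.rows)

    def write(self, rows: Iterable[Row]) -> None:
        self.rows = list(rows)


class Pipe:
    """Hypothetical pipe: an ordered chain of operations on a tuple stream."""

    def __init__(self) -> None:
        self.steps: List[Callable[[Iterator[Row]], Iterator[Row]]] = []

    def keep(self, predicate: Callable[[Row], bool]) -> "Pipe":
        # Filter step: pass through only the tuples that satisfy the predicate.
        self.steps.append(lambda rows: (r for r in rows if predicate(r)))
        return self

    def each(self, function: Callable[[Row], Row]) -> "Pipe":
        # Transform step: apply the function to every tuple.
        self.steps.append(lambda rows: (function(r) for r in rows))
        return self

    def run(self, source: ListTap, sink: ListTap) -> None:
        # Connect source tap -> steps -> sink tap and drain the stream.
        rows = source.read()
        for step in self.steps:
            rows = step(rows)
        sink.write(rows)


if __name__ == "__main__":
    source = ListTap([("a.tiff", 120), ("b.tiff", 0), ("c.tiff", 64)])
    sink = ListTap()
    pipe = Pipe().keep(lambda r: r[1] > 0).each(
        lambda r: (r[0].replace(".tiff", ".pdf"), r[1])
    )
    pipe.run(source, sink)
    print(sink.rows)  # [('a.pdf', 120), ('c.pdf', 64)]
```

The sketch leaves out everything that makes the real system useful, such as planning the pipe assembly onto MapReduce jobs and managing intermediate data; it only shows how the three terms relate.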
The Shoebox — Ruby-Processing
This is no Shoes app. This is a Ruby wrapper that lets you harness Processing’s awesome power. It makes Processing act in a slightly more Shoes-like way, and replaces the ol’ crusty faux-Java-1.4-syntax sandals that Processing usually wears with some new Ruby slippers.
Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS)
Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data. Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.
Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS) (see figure below.) MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located. Hadoop has been demonstrated on clusters with 2000 nodes. The current design target is 10,000 node clusters.
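The map, group, and reduce flow described above can be illustrated with a word count that runs in a single Python process. This is a sketch of the programming model only: it does not use Hadoop's API, and nothing here is distributed or replicated. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and each input string stands in for one block of data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(block: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one block of input."""
    for word in block.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(blocks: Iterable[str]) -> Dict[str, int]:
    """Run map over every block, group by key, then reduce each group."""
    mapped = (pair for block in blocks for pair in map_phase(block))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    blocks = ["hadoop stores data", "hadoop processes data"]
    print(word_count(blocks))
```

In a real cluster the map calls would run on the nodes that hold each block and the grouping step would move data across the network; here both are ordinary function calls.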

Hecl - The Mobile Scripting Language
The Hecl Programming Language is a high-level, open source scripting language implemented in Java. It is intended to be small, extensible, extremely flexible, and easy to learn and use. In fact, it's small enough that it runs on J2ME-enabled cell phones!
