"Because data science is growing so rapidly, we now have a massive ecosystem of useful tools. I've spent the past month or so trying to organize this ecosystem into a coherent portrait and, over the next few days, I'm going to roll it out and explain what I think it all means."
"Spark is an Apache project advertised as “lightning fast cluster computing”. It has a thriving open-source community and is the most active Apache project at the moment.
Spark provides a faster and more general data processing platform. Spark lets you run programs up to 100x faster in memory, or 10x faster on disk, than Hadoop. Last year, Spark took over Hadoop by completing the 100 TB Daytona GraySort contest 3x faster on one tenth the number of machines and it also became the fastest open source engine for sorting a petabyte."
"It is time for us to formally define what a category is, to see a wealth of examples. In our next post we’ll see how the definitions laid out here translate to programming constructs. As we’ve said in our soft motivational post on categories, the point of category theory is to organize mathematical structures across various disciplines into a unified language. As such, most of this post will be devote to laying down the definition of a category and the associated notation. We will be as clear as possible to avoid a notational barrier for newcomers, so if anything is unclear we will clarify it in the comments."
An introduction to Category Theory for
"In one of my posts on type level meta programming in Scala the question of Turing completeness came up already. The question is whether Scala’s type system can be used to force the Scala compiler to carry out any calculation which a Turing machine is capable of. Various of my older posts show how Scala’s type system can be used to encode addition and multiplication on natural numbers and how to encode conditions and bounded loops.
"An HList is an arbitrary-length tuple. The advantages of an HList over the TupleX classes in the Scala standard library are that we are (in theory) not restricted to a fixed length and we do not need to duplicate methods for each tuple arity."
"Introduction to Machine Learning with Python and Scikit-Learn"
"Scala as a platform for statistical computing and data science"
Actor-based Concurrency in Newspeak 4
"There has been a scurry of reaction on twitter and reddit to Robert Fischer’s post saying that Scala is Not a Functional Programming Language. James Iry responded by saying that analogous reasoning leads to the conclusion that Erlang is Not Functional"
"Robert Fischer has a post on his blog: Scala is not a functional language. It’s not the first time this idea has come up. It’s not an idea entirely without merit. Scala’s functional features are certainly not as seamless to use as one might hope."
"Designed for the new command line user, this 537-page volume covers the same material as LinuxCommand.org but in much greater detail. In addition to the basics of command line use and shell scripting, The Linux Command Line includes chapters on many common programs used on the command line, as well as more advanced topics."
"This tutorial shows off much of GNU Parallel's functionality. The tutorial is meant to learn the options in GNU Parallel. The tutorial is not to show realistic examples from the real world.
Spend an hour walking through the tutorial. Your command line will love you for it."