
Rhm2ktmi's List: Big Data

    • Oracle recently packed four big data-related products into a single announcement, headlined by the launch of Oracle Big Data Discovery (BDD), a new Hadoop data discovery tool. For the other three offerings – Oracle GoldenGate for Big Data, Oracle Big Data SQL and Oracle NoSQL Database 3.2.5 – Oracle hit the refresh button.
    • Oracle BDD is a new data-discovery, preparation and analysis tool that runs as a native application on the Oracle Big Data Appliance (BDA). Cloudera's Distribution Including Apache Hadoop (CDH) comes packaged with BDA, so CDH is currently the distribution compatible with Oracle BDD; a Hortonworks-compatible version is in development. Oracle BDD is not tied to BDA, however, and can also be installed on a stand-alone Cloudera cluster.
    • DataStax, the company that delivers Apache Cassandra™ to the enterprise, today announced the acquisition of Aurelius LLC, the innovators behind the open source graph database, Titan.
    • As the leading experts in graph database technology, the Aurelius team will join DataStax to build DataStax Enterprise (DSE) Graph, adding graph database capabilities into DSE alongside Apache Cassandra, DSE Search and Analytics. The addition of graph technology to DSE will empower enterprises with true ‘multi-model’ capabilities that deliver new levels of power and flexibility to transactional applications.
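To make the 'multi-model' idea concrete, here is a minimal sketch of the vertex/edge data model a graph layer adds alongside a tabular or key-value store. This is an invented, in-memory illustration in plain Python, not the DataStax or Titan API; all class and method names are hypothetical.

```python
# Hypothetical sketch of a property graph: vertices carry properties,
# edges carry labels, and traversal follows labeled edges. This is the
# query shape that is awkward to express over plain tables.

class Graph:
    def __init__(self):
        self.vertices = {}   # vertex id -> properties dict
        self.edges = {}      # vertex id -> list of (label, neighbor id)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        self.edges.setdefault(vid, [])

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))

    def out(self, vid, label):
        """Traverse outgoing edges with a given label."""
        return [dst for (lbl, dst) in self.edges.get(vid, []) if lbl == label]

g = Graph()
g.add_vertex("alice", kind="user")
g.add_vertex("bob", kind="user")
g.add_vertex("cassandra", kind="product")
g.add_edge("alice", "follows", "bob")
g.add_edge("alice", "bought", "cassandra")

print(g.out("alice", "follows"))   # ['bob']
```

A real graph layer adds indexing, distribution and a traversal language on top, but the data model is essentially this adjacency structure.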
    • Druid, an open source database designed for real-time analysis, is moving to the Apache 2 software license in the hope of spurring more use of, and innovation around, the project. It was open sourced in late 2012 under the GPL, which is generally considered more restrictive than the Apache license in terms of how software can be reused.
    • Information technology -- Security techniques -- Code of practice for protection of personally identifiable information (PII) in public clouds acting as PII processors (ISO/IEC 27018)
    • In this blog, we will focus on one of those data processing engines—Apache Storm—and its relationship with Apache Kafka. I will describe how Storm and Kafka form a multi-stage event processing pipeline, discuss some use cases, and explain Storm topologies.
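The Kafka-to-Storm shape described above can be sketched, very loosely, as a buffered stream of events flowing through successive processing stages. The toy below is plain Python standing in for that pipeline; the queue plays the role of a Kafka topic and the two functions play the role of Storm bolts. All names and data are invented for illustration; this is not the Storm or Kafka API.

```python
# Hedged sketch of a multi-stage event pipeline: buffer -> parse -> aggregate.
from collections import Counter, deque

# Stage 0: a queue standing in for a Kafka topic of raw log lines.
topic = deque(["GET /home", "GET /cart", "POST /cart", "GET /home"])

def parse(raw):
    """Stage 1 (a 'bolt'): split a raw log line into (method, path)."""
    method, path = raw.split(" ", 1)
    return method, path

def count_paths(events):
    """Stage 2 (a second 'bolt'): aggregate a count per path."""
    counts = Counter()
    for _, path in events:
        counts[path] += 1
    return counts

parsed = [parse(topic.popleft()) for _ in range(len(topic))]
counts = count_paths(parsed)
print(counts["/home"])   # 2
```

In a real Storm topology each stage runs as distributed tasks and the "queue" is a durable, partitioned Kafka topic, but the staged dataflow is the same idea.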
    • Hortonworks has launched an initiative to improve data governance in Apache Hadoop by providing a single data-governance foundation for the Hadoop stack. Data governance was not initially a major concern for Hadoop, but it has become increasingly relevant as large enterprises want to use Hadoop in more mission-critical production deployments and deploy multipurpose Hadoop clusters (so-called 'data lakes').
    • The result will be a new Apache incubator project (or projects; the fine details are still being worked on) to create new knowledge-store, policy-engine and audit-store functionality that will plug into Apache Falcon and Apache Ranger (the evolution of the security policy engine that Hortonworks acquired with XA Secure).
  • Jan 13, 15

    "Data protection law – the bundle of statutory duties on those who handle personal data about individuals and the corresponding rights for the individuals concerned – sits plumb in the centre of data law, an increasingly broad and complex amalgam of contract law, intellectual property and regulation.

    An important area of looming challenge for data protection lawyers at the moment is Big Data, the aggregation and analysis of datasets of great volume, variety and velocity for the purpose of competitive advantage, where the business world is just at the start of a period of rapid adoption."


  • Jan 11, 15

    • With YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it for batch, interactive and real-time streaming use cases. As more data flows into and through a Hadoop cluster to feed these engines, Apache Falcon is a crucial framework for simplifying data management and pipeline processing.
      • Among these many bug fixes, improvements and new features, four stand out as particularly important:

         
           
        • Authorization with ACLs for entities
        • Enhancements to lineage metadata
        • Cloud archival
        • Falcon recipes

    • The StreamFlow™ software project is designed to make working with Apache Storm, a free and open source distributed real-time computation system, easier and more productive. A Storm application ingests large amounts of data through topologies – directed graphs of spouts (data sources) and bolts (processing steps) that define how streams flow through the computation. These topologies organize the data streams into understandable pipelines.

        

    • One key feature of Kafka is its functional simplicity. While there is a lot of sophisticated engineering under the covers, Kafka’s general functionality is relatively straightforward. Part of this simplicity comes from its independence from any other applications (excepting Apache ZooKeeper). As a consequence however, the responsibility is on the developer to write code to either produce or consume messages from Kafka. While there are a number of Kafka clients that support this process, for the most part custom coding is required.
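The "functional simplicity" point can be illustrated with a toy model of Kafka's core abstraction: a topic is an append-only log, producers append messages, and each consumer tracks only its own read offset. The sketch below is pure Python with no broker, and the class and method names are invented; it shows the idea, not the Kafka client API.

```python
# Hypothetical in-memory model of a single-partition Kafka topic.

class TopicLog:
    def __init__(self):
        self.log = []                 # append-only sequence of messages

    def produce(self, msg):
        self.log.append(msg)
        return len(self.log) - 1      # offset of the new message

    def consume(self, offset, max_msgs=10):
        """Read from a caller-supplied offset. The 'broker' keeps no
        per-consumer state -- tracking position is the consumer's job,
        which is exactly the responsibility the text describes."""
        batch = self.log[offset:offset + max_msgs]
        return batch, offset + len(batch)

t = TopicLog()
for m in ("a", "b", "c"):
    t.produce(m)

batch, next_offset = t.consume(0, max_msgs=2)
print(batch, next_offset)   # ['a', 'b'] 2
```

Because the log is immutable and position lives with the consumer, many independent consumers can read the same topic at their own pace, which is much of what makes Kafka simple to reason about.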
    • Cloudera engineers and other open source community members have recently committed code for Kafka-Flume integration, informally called “Flafka,” to the Flume project. Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of data from many different sources to a centralized data store. Flume provides a tested, production-hardened framework for implementing ingest and real-time processing pipelines. Using the new Flafka source and sink, now available in CDH 5.2, Flume can both read and write messages with Kafka.
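A Flume agent using Kafka on both ends, in the Flafka style, might look like the fragment below. The agent, channel, topic and host names are placeholders, and the exact property keys should be verified against the Flume documentation for your release; this is a sketch, not a tested configuration.

```properties
# Hypothetical Flume agent wiring Kafka as both source and sink ("Flafka").
# All names (tier1, kafka-in, topics, hosts) are invented placeholders.
tier1.sources  = kafka-in
tier1.channels = mem
tier1.sinks    = kafka-out

tier1.sources.kafka-in.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.kafka-in.zookeeperConnect = zk1:2181
tier1.sources.kafka-in.topic = raw-events
tier1.sources.kafka-in.channels = mem

tier1.channels.mem.type = memory

tier1.sinks.kafka-out.type = org.apache.flume.sink.kafka.KafkaSink
tier1.sinks.kafka-out.brokerList = broker1:9092
tier1.sinks.kafka-out.topic = enriched-events
tier1.sinks.kafka-out.channel = mem
```

Events read from the `raw-events` topic flow through the Flume channel (where interceptors could transform them) and are written back to a second topic, which is the ingest-plus-light-processing pipeline the excerpt describes.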
    • DeepDive is a new type of system that enables developers to analyze data on a deeper level than ever before. DeepDive is a trained system: it uses machine learning techniques to leverage domain-specific knowledge, and it incorporates user feedback to improve the quality of its analysis.

        

    • Glassbeam says that with the latest version of its SCALAR data processing engine, it is prepared for the IoT, which will require real-time data analytics able to handle billions of sensor readings. It already had a fast analytics platform, but integration with Apache Spark enables real-time analytics, as well as predictive analytics and machine learning.
    • The company started with Glassbeam Analytics for standard and custom analytics on machine-generated data, and then introduced Glassbeam Explorer for search and exploratory analysis in 2013.

    6 more annotations...

  • Cask 3

    Nov 16, 14

    • Deliver the Cask Data Application Platform (CDAP), an open source application development platform for the Hadoop ecosystem that provides developers with data and application virtualization to accelerate application development, address a broader range of real-time and batch use cases, and deploy applications into production while satisfying enterprise requirements.
    • Data Virtualization

       

      Logical representations of data.

    1 more annotation...

    • It has now gone one step further, because the core technology has been proposed as an Apache Software Foundation project.
    • GridGain announced the release of its in-memory data processing technology using the Apache License in March.

    3 more annotations...

    • Paxata has released the Fall 2014 version of its cloud service, which is designed to provide a single metadata layer as well as process and execution models for data preparation tasks. The latest release marks one of the first major makeovers of the startup's multi-tenant service for data integration, quality, enrichment, governance and collaboration.
    • Paxata has a fresh cut of its data preparation service on the market following the general availability of Fall 2014 in October. The latest version includes numerous enhancements to bolster performance, flexibility and connectivity. It also contains improvements to the front-end application used by business analysts to wrangle data in a self-service fashion.

    9 more annotations...

    • Netflix has long been a proponent of the microservices model. This model offers higher availability, resiliency to failure and loose coupling. The downside of such an architecture is the potential for a slow, latency-bound user experience.
    • Most of these microservices use some kind of stateful system to store and serve data. A few milliseconds here and there can add up quickly and result in a multi-second response time.
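The arithmetic behind "a few milliseconds here and there can add up" is easy to make concrete. The sketch below uses invented latency numbers purely for illustration: when a request chains through many sequential microservice calls, per-hop latencies sum.

```python
# Back-of-the-envelope sketch: sequential microservice hops sum their
# latencies. (Parallel fan-out would instead be bounded by the slowest
# hop, which is why services parallelize calls where they can.)

def total_latency_ms(hop_latencies):
    """Total latency of a strictly sequential call chain."""
    return sum(hop_latencies)

hops = [8, 12, 40, 25, 15] * 10   # 50 hypothetical sequential service calls
print(total_latency_ms(hops))     # 1000 ms -- a full second before rendering
```

Fifty calls averaging 20 ms each already costs a second, which is how "a few milliseconds here and there" becomes a multi-second response time.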