Creating Hadoop and Impala friendly partitioned data with Kite SDK

Feb 09

When working with Hadoop and SQL-on-Hadoop systems like Impala, we have to think about a couple of important factors: how to serialize data for storage and processing, and how to partition the data. The majority of Hadoop practitioners now agree that the most flexible and performant choice is a combination of the Avro and Parquet formats. So let’s dive into some...

Using Docker containers to perform quick prototypes

Jan 19

Docker containers, alongside cloud computing, are big enablers of innovation. One of the use cases we constantly face as developers is the need to quickly install and run software packages that we are not familiar with but need to explore, whether to create a proof of concept or to work on a spike that validates a design decision. We often need to create...

Docker Revolution

Jan 01

2014 was definitely a year of Big Data. But with all the new developments in Big Data, I almost missed one very important new trend in the industry. I think it will change the way we build, deploy, and package software, be it Big Data frameworks, websites, Java apps, or Python libraries. I believe in 2015 this technology will...

Probabilistic Data Structures – Bloom filter and HyperLogLog for Big Data

Oct 11

When working with large volumes of data, memory and storage requirements can be very high. This in turn affects scalability, when suddenly your job or process either takes too long or requires more resources. Probabilistic data structures allow you to trade some accuracy for an immense decrease in memory usage. For example,...

Streaming Platforms: Storm vs Spark Streaming

Sep 30

Our latest meetup, the Storm vs Spark face-off, was a big hit among Big Data engineers in New York. Slides from our meetup and from the Hadoop User Group in Chicago are presented on this page. I hope both presentations will help you make a better choice for your use case and environment. Apache Storm vs. Spark Streaming P. Taylor...

Book Review – Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives

Sep 28

“Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives” is a new book by Vijay Agneeswaran on the topic of Big Data. The author lays out why Hadoop, and especially the MapReduce computational model, is not well suited to a number of cases. The author divides those cases...

Seven micro-services architecture problems and solutions

Jun 04

Micro-service architecture presents a set of challenges that need to be addressed. Those include: 1. Operational overhead. Now, instead of a single monolithic application, you have to: deploy many small micro-services, monitor many small micro-services, and provision hardware for many more services. 2. Complexities in networking calls,...
