Walking tour in Brooklyn’s Fort Greene

May 03

Yesterday I visited historic Fort Greene, a very interesting part of Brooklyn. It was a walking tour organized by the Big Onion Walking Tours company. With a knowledgeable guide, who turned out to be a PhD in Art History from Columbia, we explored one of Brooklyn’s most diverse neighborhoods, an area remarkably historic and cutting-edge contemporary all at the same time....

Spark as a catalyst to learn Scala

Apr 12

There is no doubt that the Apache Spark project has a lot of momentum in the Big Data world right now. Short of curing cancer, Spark appears to be able to solve all the data problems people have. Map/Reduce batch workflow – check, real-time streaming – check, working with graphs – check, machine learning – check. So I...

Creating Hadoop and Impala friendly partitioned data with Kite SDK

Feb 09

When working with Hadoop and SQL-on-Hadoop systems like Impala, we have to think about a couple of important factors: how to serialize data for storage and processing, and how to partition the data. The majority of Hadoop practitioners now agree that the most flexible and performant choice is a combination of the Avro and Parquet formats. So let’s...

Using Docker containers to perform quick prototypes

Jan 19

Docker containers, alongside cloud computing, are big enablers of innovation. One of the use cases we constantly face as developers is the need to quickly install and run software packages that we are not familiar with but need to explore, whether to create proofs of concept or to work on a spike that validates our design decisions. We often need...

Docker Revolution

Jan 01

2014 was definitely a year of Big Data. But with all the new developments in Big Data, I almost missed one very important new trend in the industry. I think it will change the way we build, deploy and package software, be it Big Data frameworks, web sites, Java apps or Python libraries. I believe in 2015 this...

Probabilistic data structures – Bloom filter and HyperLogLog for Big Data

Oct 11

When working with large volumes of data, memory and storage requirements can be very high. This in turn affects scalability, when suddenly your job or process either takes too long or requires more resources. Probabilistic data structures allow you to trade some accuracy for an immense decrease in memory usage. For...

Streaming Platforms: Storm vs Spark Streaming

Sep 30

Our latest meetup, the Storm vs Spark face-off, was a big hit among Big Data engineers in New York. Slides from our meetup and from the Hadoop User Group in Chicago are presented on this page. I hope both of those presentations will help you make a better choice for your use case and environment. Apache Storm vs. Spark...
