by edvorkin | Apr 12, 2015 | Architecture, BigData, Blog, Spark
There is no doubt that the Apache Spark project has a lot of momentum in the Big Data world right now. Short of curing cancer, Spark appears to be able to solve all the data problems people have. Map/Reduce batch workflow – check, real-time streaming – check,...
by edvorkin | Feb 9, 2015 | Architecture, BigData, Blog
When working with Hadoop and SQL-on-Hadoop systems like Impala, we have to think about a couple of important factors – how to serialize data for storage and processing, and how to partition the data. The majority of Hadoop practitioners now agree that the most flexible and...
by edvorkin | Oct 11, 2014 | Architecture, BigData, Blog, Java
When working with large volumes of data, memory and space requirements can be very high. This in turn affects scalability, when suddenly your job or process either takes too long or requires more resources. Probabilistic data structures allow you to trade some...