by edvorkin | Jul 20, 2015 | BigData, Blog, Java
In this post we will deploy a simple Java/Scala microservice application to Mesos and learn how to scale it up and down. If you read my previous post, you know that Mesos fully supports Docker, today's most popular container technology. We will take advantage of Docker...
by edvorkin | Jul 13, 2015 | BigData, Blog
Mesos is an interesting project that cuts across different use cases: big data analytics, web development, continuous integration and so on. I learned about Mesos from a big data perspective, running Spark jobs on it. But in reality it has much broader utility...
by edvorkin | Apr 12, 2015 | Architecture, BigData, Blog, Spark
There is no doubt that the Apache Spark project has a lot of momentum in the Big Data world right now. Short of curing cancer, Spark appears to be able to solve all the data problems people have. Map/Reduce batch workflow – check, real-time streaming – check,...
by edvorkin | Feb 9, 2015 | Architecture, BigData, Blog
When working with Hadoop and SQL-on-Hadoop systems like Impala, we have to think about a couple of important factors: how to serialize data for storage and processing, and how to partition the data. The majority of Hadoop practitioners now agree that the most flexible and...
by edvorkin | Jan 19, 2015 | Agile ALM, BigData, Blog
Docker containers, alongside cloud computing, are a big enabler of innovation. One of the use cases we constantly face as developers is the ability to quickly install and run software packages that we are not familiar with but need to explore and create...
by edvorkin | Jan 1, 2015 | BigData, Blog
2014 was definitely a year of Big Data. But with all the new developments in Big Data, I almost missed one very important new trend in the industry. I think it will change the way we build, deploy and package software, be it Big Data frameworks, web...
by edvorkin | Oct 11, 2014 | Architecture, BigData, Blog, Java
When working with large volumes of data, memory and space requirements can be very high. This in turn affects scalability, when suddenly your job or process either takes too long or requires more resources. Probabilistic data structures allow you to trade some...
by edvorkin | Sep 30, 2014 | BigData, Blog, Storm
Our latest meetup, Storm vs Spark face-off, was a big hit among Big Data engineers in New York. Slides from our meetup and from the Hadoop User Group in Chicago are presented on this page. I hope both of those presentations will help you make a better choice for your use case...