Seven micro-services architecture problems and solutions

Micro-service architecture presents a set of challenges that need to be addressed.
These include:
1. Operational overhead. Instead of a single monolithic application, you now have to:

  • Deploy many small micro-services.
  • Monitor many small micro-services.
  • Provision hardware for many more services.

2. Complexity in network calls and interactions.

Things get even more complicated when something goes wrong. Instead of looking into one application, you have to chase interactions, network calls, requests, and responses between different services. If you have a performance problem, you have to untangle the web of micro-services.

So how do we make this operational overhead as small as possible and take full advantage of micro-service architecture?
I think the answer is to create and enforce standards in software development practices and to use modern tools.

1. Problem: Deploying a much greater number of applications.
Solution: Deployment automation. Tools like Jenkins, uDeploy, Capistrano, Chef, Puppet, or custom scripts should completely automate the deployment of micro-services.
One-button deployment should be implemented and put to use for every service.
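
To make "one button" a bit more concrete, here is a minimal sketch of a custom script that deploys every service in one go and reports which ones failed. The service names and the `./deploy.sh` command are hypothetical placeholders, not part of any of the tools named above. In practice the per-service step would call whatever your deployment tool provides.

```python
import subprocess
from typing import Callable, Iterable, List

# Hypothetical service names; replace with your own inventory.
SERVICES = ["orders", "billing", "search"]


def deploy_command(service: str) -> List[str]:
    # Assumption: every service can be deployed by one script call.
    return ["./deploy.sh", service]


def deploy_all(services: Iterable[str],
               run: Callable = subprocess.run) -> List[str]:
    """Deploy each service in turn and return the names that failed."""
    failed = []
    for service in services:
        result = run(deploy_command(service))
        if result.returncode != 0:
            failed.append(service)
    return failed
```

Calling `deploy_all(SERVICES)` from a CI job is then the "one button". The `run` parameter exists only so the loop can be exercised without touching real servers.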

2. Problem: Monitoring many services for performance degradation or issues.

Solution: Performance monitoring.

Because each service talks to multiple others, it will be hard to pinpoint the source of a performance problem. Apache logs provide some performance information, but the solution is performance monitoring of each individual micro-service. Graphite, StatsD, or cloud-based services like New Relic provide rich, visual dashboards where problems can be discovered.
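
As an illustration of per-service monitoring, here is a small sketch that times a block of code and sends the result as a StatsD timer over UDP. The metric name is made up, and the host and port (localhost, 8125) are assumptions about where a StatsD daemon is listening. A real service would more likely use an existing StatsD client library.

```python
import socket
import time
from contextlib import contextmanager


def statsd_timing(metric: str, millis: int) -> bytes:
    """Format a StatsD timer packet: '<metric>:<value>|ms'."""
    return f"{metric}:{millis}|ms".encode("ascii")


@contextmanager
def timed(metric, host="127.0.0.1", port=8125, send=None):
    """Time the enclosed block and report it as a StatsD timer.

    `send` is a hypothetical hook for tests; by default the packet
    goes out over UDP to the assumed StatsD address.
    """
    start = time.monotonic()
    try:
        yield
    finally:
        millis = int((time.monotonic() - start) * 1000)
        packet = statsd_timing(metric, millis)
        if send is not None:
            send(packet)
        else:
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            try:
                sock.sendto(packet, (host, port))
            finally:
                sock.close()
```

Wrapping a handler body in `with timed("orders.lookup"):` is enough to get a latency series per service that a Graphite dashboard can plot.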

3. Problem: Alerting when integration between micro-services has issues.

Solution: Alerting with synthetic monitoring. What if one of the micro-services goes down? What if communication between services stops working? How do we know before our users notice?

“Synthetic monitoring (also known as active monitoring) is website monitoring that is done using a web browser emulation or scripted recordings of web transactions. Behavioral scripts (or paths) are created to simulate an action or path that a customer or end-user would take on a site. Those paths are then continuously monitored at specified intervals for performance, such as: functionality, availability, and response time measures.” – Wikipedia

For example, we have a complex process involving the Twitter API (a web service), our Storm topology (multiple micro-services), and consumer-facing REST API micro-services that interact with each other.
We run a script that posts a tweet on Twitter and expects the result to show up in the correct format in the consumer web application. If this does not happen, our test script alerts us to the problem.
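
The shape of such a test script can be sketched as follows. This is not our actual script: `trigger`, `probe`, and `alert` are hypothetical callables standing in for "post the tweet", "look for it in the consumer application", and "notify someone", and the timeout and polling interval are arbitrary.

```python
import time


def synthetic_check(trigger, probe, alert,
                    timeout_s=60.0, interval_s=5.0,
                    sleep=time.sleep, clock=time.monotonic) -> bool:
    """Run one end-to-end check and alert if the result never appears.

    trigger() starts the transaction and returns a marker to look for.
    probe(marker) returns True once the marker is visible downstream.
    alert(message) is called when the deadline passes without success.
    """
    marker = trigger()
    deadline = clock() + timeout_s
    while True:
        if probe(marker):
            return True
        if clock() >= deadline:
            break
        sleep(interval_s)
    alert(f"synthetic check failed: {marker!r} not visible after {timeout_s}s")
    return False
```

Run on a schedule, a check like this covers the whole chain at once, so a failure tells you that something between the entry point and the consumer application is broken, even if every individual service looks healthy.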

4. Problem: Looking into log files from many micro-services. You want to dig in, but there are so many different servers that just looking through the logs to find the problem becomes a big pain.

Solution: Use log aggregation. Logstash and Kibana are open source tools that any operations team without a similar solution should adopt.
Logstash aggregates logs into one location, and Kibana lets you search them.
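
Aggregated logs are easier to search when every service writes them in the same structured shape. The sketch below is one possible way to do that with Python's standard `logging` module, emitting one JSON object per line. The field names and the `service` attribute are my own assumptions, and the aggregator would need to be configured to parse JSON lines.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            # "service" is a hypothetical extra field set by the caller.
            "service": getattr(record, "service", "unknown"),
            "message": record.getMessage(),
        })
```

A handler configured with `handler.setFormatter(JsonFormatter())` and a call such as `logger.error("payment failed", extra={"service": "billing"})` then produces lines that can be filtered by service or level once they are in one place.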

5. Problem: Monitoring distributed systems in real-time.

Solution: Use the tools outlined below. By monitoring operational and business events in a distributed application, you can gain insight into how the application performs. Errors and other events can be aggregated, and an alert can be triggered if something exceeds a certain threshold. For example, the system can send an email for every exception raised by your code. You can track the latency distribution of your micro-services, or see the top processes on any host by memory and CPU in one convenient dashboard. There are several commercial and open source tools that let you do so. You can also build one yourself if nothing fits your needs. I suggest investigating the following frameworks and tools first:

  • Netflix Suro
  • Riemann.io
  • Sensu
  • Circonus

If you want to roll your own solution, look into Kafka, Storm, and Drools. The main idea here is to have a generic, common monitoring framework for business and operational events together. Then you can answer questions like “Did this particular technical issue affect my sales numbers?” and many more.
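
The "alert when something exceeds a threshold" idea can be sketched in a few lines. This is a toy, single-process version with made-up names. The tools listed above do the same kind of aggregation across many hosts and event streams.

```python
from collections import deque


class ThresholdAlert:
    """Alert when more than `threshold` events fall inside a time window."""

    def __init__(self, threshold, window_s, alert):
        self.threshold = threshold
        self.window_s = window_s
        self.alert = alert          # hypothetical callback, e.g. send an email
        self.events = deque()       # timestamps of recent events

    def record(self, timestamp) -> bool:
        """Register one event; return True if an alert was raised."""
        self.events.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while self.events and self.events[0] <= timestamp - self.window_s:
            self.events.popleft()
        if len(self.events) > self.threshold:
            self.alert(f"{len(self.events)} errors in the last {self.window_s}s")
            self.events.clear()     # start counting afresh after alerting
            return True
        return False
```

Feeding it both error events and business events (for example, failed checkouts) is what lets the same mechanism answer operational and business questions together.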

6. Problem: Tracing the route of an individual call across different micro-services for troubleshooting.

Solution: Create a Correlation ID for each service call.

One service issues a REST API call to another service, that one to yet another, and so on. How can you figure out how one particular request was transformed and what did or did not get called? A Correlation ID that is passed across calls lets you track each request and its route much more easily. A Correlation ID is assigned per request, and if you are debugging an issue, it will be the best starting point for finding what went wrong along that request.

Granted, it requires some up-front development cost, but it will pay off in the long run. When a request travels between different micro-services, you will be able to see all interactions and which service has problems, for example one that did not emit any calls at all. Many commercial software packages, for example Microsoft SharePoint, use this pattern to help with problem solving.
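
A minimal sketch of the propagation step might look like this. The header name `X-Correlation-ID` is an assumption (pick any name, as long as every service agrees on it), and the function works on a plain dictionary of headers rather than any particular web framework.

```python
import uuid

# Assumed header name; all services must use the same one.
CORRELATION_HEADER = "X-Correlation-ID"


def with_correlation_id(incoming_headers):
    """Reuse the caller's Correlation ID, or start a new one.

    Returns the ID (to include in every log line) and the headers to
    attach to every outgoing call made while handling this request.
    Note: this sketch matches the header name exactly; real HTTP
    headers are case-insensitive.
    """
    correlation_id = incoming_headers.get(CORRELATION_HEADER) or uuid.uuid4().hex
    return correlation_id, {CORRELATION_HEADER: correlation_id}
```

The first service in the chain generates the ID, and every service after it passes the same value along, so searching the aggregated logs for that one value shows the whole route of the request.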

7. Problem: Scalability of micro-services and of the whole system.

Solution: Deploy to the cloud. Let's say one of your micro-services suddenly does not perform well because of an unexpected spike in traffic. By using the “elasticity” provided by major cloud vendors, you can set up simple rules that automatically provision additional hardware when specific thresholds are met. For example, if service response time increases to 500ms, deploy several more servers.
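
Such a rule boils down to very little logic. The sketch below is not any cloud vendor's API, just the decision itself. The step size and the upper limit are assumed values, and only the 500ms threshold comes from the example above.

```python
def desired_servers(current, response_ms,
                    threshold_ms=500, step=2, max_servers=20):
    """Return how many servers the service should have.

    If response time has reached the threshold, add `step` servers,
    but never go above `max_servers` (both are assumed values).
    """
    if response_ms >= threshold_ms:
        return min(current + step, max_servers)
    return current
```

With real cloud auto-scaling you declare the threshold and the step in the vendor's configuration instead of writing this yourself, and you would normally add a matching rule to scale back down.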

I am sure that by following these rules, and with the right infrastructure and attitude, we can get all the positive properties of micro-services architecture and eliminate or minimize many of the negatives.

1 Comment

  1. What about the fact that microservices have communication overhead (communication over an internal network vs. direct calls), which you don’t have with a well-designed, decoupled monolith (for example, OSGi jars developed by different teams)?
