First, an Introduction to Microservices

1. What Are Microservices?

To introduce microservices, we must first understand what they are. As the name suggests, the term has two parts: "micro" and "service". "Micro" means small, and the well-known "two-pizza team" is a good illustration of it (the idea was first proposed by Amazon CEO Jeff Bezos: everyone involved in a single service, from design and development through testing to operation and maintenance, should be few enough to be fed with two pizzas). A "service" is distinct from a system: it is one small, relatively independent functional unit, or a set of them, and it is the smallest set of functions that a user can perceive.

2. Origin of Microservices

The microservice architectural style, described by Martin Fowler and James Lewis in 2014, is an approach to developing a single application as a set of small services, each running in its own process and communicating through lightweight mechanisms, usually HTTP APIs. The services are built around business capabilities and can be deployed independently through automated deployment mechanisms. They can be implemented in different programming languages and use different data storage technologies, and they need only minimal centralized management.

3. Why do you need microservices?

In the traditional IT industry, most software is a pile of independent systems. The problems of these systems can be summed up as poor scalability, low reliability, and high maintenance costs. SOA (service-oriented architecture) was introduced to address these problems. However, early SOA relied on a bus model that was tightly bound to a particular technology stack, such as J2EE. As a result, many enterprises found their legacy systems hard to connect, the migration took too long and cost too much, and the new system needed time to stabilize. In the end, SOA looked beautiful, but it became an enterprise-class luxury that small and medium-sized companies shied away from.

3.1 Problems caused by the monolithic architecture

A monolithic architecture works well when the system is relatively small, but as the system grows it exposes more and more problems, mainly the following:

1. Complexity keeps growing

Some projects have hundreds of thousands of lines of code. The boundaries between modules are blurred and the logic is tangled. The more complex the code, the harder it is to solve the problems that come up.

2. Technical debt accumulates

Staff turnover is normal. Some employees neglect code quality before they leave and leave many bugs behind. Because the codebase is so large, a single bug is hard to find, which causes great trouble for new employees. The higher the turnover, the more bugs are left behind. This is how technical debt piles up.

3. Deployment gradually slows down

This is easy to understand. A monolith has large modules and a great deal of code, so deploying the project takes longer and longer. Some projects take 10 minutes just to start. Restart the project a few times and much of the day is gone, leaving developers very little time to develop.

4. Technological innovation is hindered

For example, suppose an older project was written with Struts2. Because the modules are inextricably linked, the codebase is large, and the logic is unclear, refactoring the project to use Spring MVC would be very difficult and very costly. More often than not, the company has to bite the bullet and keep using the old Struts architecture, which hinders technological innovation.

5. Cannot scale on demand

For example, suppose the movie module is CPU-intensive and the order module is I/O-intensive. To improve the performance of the order module we might add memory or disk. But because all modules live in one architecture, we have to take the other modules into account when scaling the order module; we cannot improve one module at the expense of the others. As a result, we cannot scale on demand.

3.2 Differences between Microservices and Monolithic Architectures

Each microservice module is equivalent to a separate project. The amount of code per module is significantly smaller, so problems are relatively easy to solve.

In a monolithic architecture, all modules share one database and a single storage model. In a microservice architecture, each module can use a different storage technology (for example, some use Redis and some use MySQL), and each module has its own database.

In a monolithic architecture, all modules are developed with the same technology. In a microservice architecture, each module can use a different technology, so development is more flexible.

3.3 Differences between Microservices and SOA

Microservices are, in essence, an SOA architecture. A microservice system can contain services written in Java alongside services written in Python, unified into one system through the RESTful architectural style. Microservices therefore do not depend on any specific implementation technology and are highly scalable.

4. The Nature of Microservices

The key to microservices is not just the services themselves. The system must also provide a basic infrastructure that allows microservices to be deployed, run, and upgraded independently. Beyond that, the architecture should keep microservices structurally "loosely coupled" while functionally presenting them as a unified whole. This "unified whole" means a unified interface style, unified permission management, a unified security policy, a unified release process, unified logging and auditing, unified scheduling, a unified access entry point, and so on.

The purpose of microservices is to effectively split applications for agile development and deployment.

Microservices promote the idea that teams should inter-operate, not integrate. To inter-operate is to define the boundaries and interfaces of each system. Within those boundaries, a full-stack team is autonomous. When teams are organized this way, communication costs stay inside each system, each subsystem becomes more cohesive, dependencies between subsystems weaken, and cross-system communication costs fall.

5. What kind of project is suitable for microservices?

Microservices can be divided according to the independence of the business functions themselves. If a system provides very low-level services, such as an operating system kernel, a storage system, a network system, or a database system, its functions are closely interrelated. Forcing such a system into smaller service units sharply increases the integration workload, and the artificial split brings no real isolation at the business level, so the pieces cannot be deployed and run independently. Such systems are not suitable for microservices.

Whether you can make a microservice depends on four elements:

  • Small: a microservice is small in size and owned by a two-pizza team.
  • Independent: it can be deployed and run independently.
  • Light: it uses lightweight communication mechanisms and architectures.
  • Loose: services are loosely coupled.

6. Microservice Splitting and Design

When moving from a monolith to a microservice architecture, you will keep running into the problem of where to draw service boundaries. For example, suppose a user service provides basic user information. Should the user's avatar and pictures be split into a new service, or merged into the user service? If services are too coarse-grained, you are back to the old monolith; if they are too fine-grained, the overhead of inter-service calls is no longer negligible and management becomes much harder. So far there is no standard for dividing service boundaries; they can only be adjusted to fit each business system.

The general principle for splitting is this: when a piece of business functionality depends on other services rarely or not at all, has independent business semantics, and provides data to more than two other services or clients, it should be split into a separate service module.


6.1 Microservice Design Principles

Principle of Single Responsibility

Each microservice only needs to implement its own business logic. The order management module, for example, only needs to handle order logic and need not consider anything else.

Principle of Service Autonomy

Each microservice is independent throughout development, testing, operation, and maintenance, and its database is independent too. It has a complete lifecycle of its own, so we can treat it as a separate project that does not have to rely on other modules.

Principle of Lightweight Communication

First, the communication protocol itself should be lightweight. Second, the communication mechanism must be cross-language and cross-platform, so that each microservice is independent enough not to be constrained by any one technology.

Principle of Clear Interfaces

Microservices may call one another. To avoid having to adjust callers later because one microservice's interface changed, consider as many situations as possible at design time and make the interface as general and flexible as possible, so that other modules do not have to change as well.

7. Microservices Advantages and Disadvantages

7.1 Characteristics

Each microservice runs independently in its own process;

A set of independently running microservices works together to form the entire system;

Each service is developed around a separate piece of business, and a microservice generally implements one specific function, such as order management or user management;

Microservices communicate through lightweight mechanisms, such as REST APIs or RPC calls.

7.2 Advantages

Easy to develop and maintain

Since a single microservice module is equivalent to a project, when developing it we only need to care about that module's logic. The amount of code and the logical complexity are both lower, so the module is easy to develop and maintain.

Faster start-up

This applies to a single microservice: starting one module's service is obviously much faster than starting an entire monolithic project.

Local modifications are easy to deploy

Suppose we find a problem during development. With a monolith, we must re-release and restart the whole project, which is very time-consuming. With microservices, we only fix the bug in the affected module and restart that module's service. Deployment is relatively simple, the entire project does not have to be restarted, and time is saved.

The technology stack is not limited

For example, suppose the order and movie microservices were originally written in Java, and we now want to rewrite the movie microservice in Node.js. This is entirely possible, and because only the movie logic is involved, the cost of switching technologies is much lower.

Scaling on demand

As noted above, to scale one module of a monolith you have to consider whether the performance of the other modules will be affected. With microservices this is not a problem at all.

7.3 Disadvantages

High operation and maintenance requirements

With a monolith, we only need to maintain one project. A microservice project is composed of many services, and a problem in any module can make the whole project misbehave. It is often hard to tell which module caused the problem, because we cannot trace it step by step in a debugger. This places high demands on operation and maintenance staff.

Distributed Complexity

A monolith does not have to be distributed, but for a microservice architecture distribution is almost mandatory. Distributed systems are inherently complex, which makes the microservice architecture complex as well.

High cost of interface adjustment

For example, suppose the user microservice is called by the order and movie microservices. If the user microservice's interface changes significantly, every microservice that depends on it must be adjusted. Since there may be a very large number of microservices, the cost of changing an interface rises significantly.

Repetitive work

In a monolith, if a piece of functionality is used by several modules, we can abstract it into a utility class that all modules call directly. Microservices cannot do this, because one microservice's utility class cannot be called directly by another. We have to build the same utility class in each microservice, which duplicates code.

8. Microservices Development Framework

At present, the most commonly used microservice development frameworks and components include the following:

Spring Cloud (a currently very popular microservice framework)

Dropwizard (focuses on the development of individual microservices)

Consul, etcd, etc. (components for microservices)

9. The Difference between Spring Cloud and Spring Boot

Spring Boot:

Designed to simplify the creation of production-grade Spring applications and services. It simplifies configuration, uses embedded web servers, includes many out-of-the-box capabilities useful for microservices, and can be deployed together with Spring Cloud.

Spring Cloud:

A microservice toolkit that provides developers with components for distributed configuration management, service discovery, circuit breakers, intelligent routing, micro-proxy, control bus, and so on.

Second, Microservices in Practice

1. How do clients access these microservices? (API Gateway)

In traditional development, all services are local and the UI can call them directly. Now the application is split by function into independent services, each usually running in its own Java process on a separate virtual machine. How does the client UI access them? With N services in the back end, the front end has to track and manage all N of them; whenever one service goes offline, is updated, or is upgraded, the front end must be redeployed. This clearly defeats the purpose of splitting, especially when the front end is a mobile application, where the business usually changes faster. In addition, calling N small services carries a significant network overhead. Finally, the microservices inside the system are usually stateless, so user login information and permission management are best maintained in one place (for example, with OAuth).

For these reasons, there is usually a proxy, called an API Gateway, between the N back-end services and the UI. Its roles include:

Providing a unified service entry point so that the microservices are transparent to the front end

Aggregating back-end services to save traffic and improve performance

Providing security, filtering, flow control, and other API management functions

As I understand it, an API Gateway can be implemented in many ways: as a combined hardware and software appliance, as a simple MVC framework, or even as a Node.js server. Its most important role is to aggregate back-end services for the front end (usually a mobile application), provide a single point of access, and decouple the two sides. However, an API Gateway can also become a single point of failure or a performance bottleneck.
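
The three gateway roles listed above (a single entry point, aggregation, and unified security checks) can be illustrated with a minimal in-process sketch. Everything here is an assumption made for illustration: the `ApiGateway` class, the two stand-in services, the paths, and the token value are hypothetical, and a real gateway would forward network requests rather than call local functions.

```python
from typing import Callable, Dict


# Hypothetical back-end services, modeled as plain functions for illustration.
def user_service(user_id: str) -> dict:
    return {"id": user_id, "name": "demo-user"}


def order_service(user_id: str) -> dict:
    return {"user_id": user_id, "orders": ["o-1", "o-2"]}


class ApiGateway:
    """Single entry point that routes and aggregates calls to back-end services."""

    def __init__(self) -> None:
        self._routes: Dict[str, Callable[[str], dict]] = {}

    def register(self, path: str, handler: Callable[[str], dict]) -> None:
        self._routes[path] = handler

    def handle(self, path: str, user_id: str, token: str) -> dict:
        # The security check happens once, at the gateway (assumed token value).
        if token != "valid-token":
            return {"status": 401, "body": None}
        handler = self._routes.get(path)
        if handler is None:
            return {"status": 404, "body": None}
        return {"status": 200, "body": handler(user_id)}

    def profile_page(self, user_id: str, token: str) -> dict:
        # Aggregation: one client request fans out to two back-end calls.
        user = self.handle("/users", user_id, token)
        orders = self.handle("/orders", user_id, token)
        if user["status"] != 200 or orders["status"] != 200:
            return {"status": max(user["status"], orders["status"]), "body": None}
        body = {**user["body"], "orders": orders["body"]["orders"]}
        return {"status": 200, "body": body}


gateway = ApiGateway()
gateway.register("/users", user_service)
gateway.register("/orders", order_service)
page = gateway.profile_page("u-1", "valid-token")
```

The client makes one call to `profile_page` and never needs to know how many services sit behind the gateway.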

2. How do microservices communicate? (Service Calls)

Because every microservice is an independent Java process running on its own virtual machine, communication between services is IPC (inter-process communication), for which many mature solutions exist. The most common approaches today are listed below. A whole book could be written on each of them, and the details are generally familiar, so we will not expand on them here.

REST (JAX-RS, Spring Boot)

RPC (Thrift, Dubbo)

Asynchronous messaging (Kafka, Notify)

Synchronous calls are relatively simple and give strong consistency, but they are prone to call failures and performance suffers, especially when the call chain is deep. The comparison between REST and RPC is also an interesting topic. REST is generally based on HTTP: it is easier to implement and more widely accepted, the server-side technology is flexible, every language supports it, and it works across clients with no special requirements beyond an HTTP SDK, so it is used more widely. RPC has its own advantages: the transport protocol is more efficient, and it is more secure and controllable. Within a single company that has a unified development specification and a unified service framework, its development efficiency advantage is even more obvious. Choose according to your own technical background and actual conditions.

Asynchronous messaging is used especially widely in distributed systems. It reduces coupling between caller and callee and acts as a buffer between calls, so that a backlog of messages does not overwhelm the callee, while the caller can carry on with its own work without being slowed down by back-end performance. The price is weaker consistency: you must accept eventual consistency of the data. Back-end services generally also have to be idempotent, because for performance reasons messages may be delivered more than once (guaranteeing that a message is received once and only once is a severe test of performance). Finally, an independent broker has to be introduced; if the company has no prior experience with one, managing a distributed broker is also a great challenge.
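
Because messages may be delivered more than once, the consumer has to be idempotent. One common way to get there, sketched below under stated assumptions, is to remember the IDs of messages already processed and skip repeats. The `PaymentConsumer` class, the message shape (`id`, `amount`), and the in-memory set are hypothetical; a real service would keep the processed IDs in durable storage.

```python
class PaymentConsumer:
    """Applies each message at most once, keyed by an assumed unique message id."""

    def __init__(self) -> None:
        self._seen_ids = set()
        self.balance = 0

    def handle(self, message: dict) -> bool:
        msg_id = message["id"]
        if msg_id in self._seen_ids:
            # Duplicate delivery: acknowledge it but do not apply it again.
            return False
        self._seen_ids.add(msg_id)
        self.balance += message["amount"]
        return True


consumer = PaymentConsumer()
# The broker redelivers message m-1, as can happen with at-least-once delivery.
deliveries = [
    {"id": "m-1", "amount": 10},
    {"id": "m-2", "amount": 5},
    {"id": "m-1", "amount": 10},
]
results = [consumer.handle(m) for m in deliveries]
```

After the three deliveries the balance reflects only the two distinct messages.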

3. How do you find so many services? (Service Discovery)

In a microservice architecture, each service generally has multiple replicas for load balancing. A service may go offline at any time, or new nodes may be added to cope with a temporary spike in traffic. How do services become aware of each other? How are they managed? This is the problem of service discovery. There are generally two approaches, each with advantages and disadvantages. Both basically rely on ZooKeeper or a similar technology to manage service registration information in a distributed way. When a service goes live, the provider registers its information with ZK (or a similar framework) and keeps a long-lived connection alive with heartbeats, so the registration stays up to date. Callers look up a service through ZK, select an instance with a customizable algorithm, and may cache the service information locally to improve performance. When a service goes offline, ZK notifies its clients.

Client-side discovery: the advantages are a simple architecture, flexible extension, and a dependency only on the service registry. The disadvantage is that each client has to maintain the addresses of all the services it calls, which is technically demanding. Large companies generally have mature internal frameworks, such as Dubbo, to support this.

Server-side discovery: the advantage is simplicity, since all services are transparent to the front-end caller. It is more commonly used by small companies whose applications are deployed on cloud services.
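
The register, heartbeat, and lookup cycle described above can be sketched with a small in-memory registry. This is only an illustration, not the API of ZooKeeper or any other product: the `ServiceRegistry` class, the 30-second TTL, and the addresses are assumptions, and the current time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Dict, List


class ServiceRegistry:
    """In-memory registry; an instance is live while its heartbeat is fresher than the TTL."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        # service name -> {instance address -> time of last heartbeat}
        self._instances: Dict[str, Dict[str, float]] = {}

    def register(self, service: str, address: str, now: float) -> None:
        self._instances.setdefault(service, {})[address] = now

    def heartbeat(self, service: str, address: str, now: float) -> None:
        # Only refresh instances that are still registered.
        if address in self._instances.get(service, {}):
            self._instances[service][address] = now

    def deregister(self, service: str, address: str) -> None:
        self._instances.get(service, {}).pop(address, None)

    def lookup(self, service: str, now: float) -> List[str]:
        entries = self._instances.get(service, {})
        return sorted(a for a, seen in entries.items() if now - seen <= self._ttl)


registry = ServiceRegistry(ttl_seconds=30.0)
registry.register("orders", "10.0.0.1:8080", now=0.0)
registry.register("orders", "10.0.0.2:8080", now=0.0)
registry.heartbeat("orders", "10.0.0.1:8080", now=20.0)

live_at_10 = registry.lookup("orders", now=10.0)  # both instances are fresh
live_at_35 = registry.lookup("orders", now=35.0)  # the second one missed its heartbeat
```

In client-side discovery the caller would pick one address from the returned list itself; in server-side discovery a load balancer in front of the services would do the same lookup on the caller's behalf.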

4. What if a service goes down?

The defining characteristic of a distributed system is that the network is unreliable. Splitting into microservices can reduce this risk, but without special safeguards the outcome is bound to be a nightmare. We recently had a production incident caused by a very humble SQL counting function: as traffic increased, the database load rose, the application's performance degraded, and every front-end application that called this service was affected. So when a system is composed of chains of service calls, we must make sure that a problem in any one link does not bring down the whole chain.

There are many countermeasures:

  1. Retry mechanism
  2. Rate limiting
  3. Circuit breaking
  4. Load balancing
  5. Degradation (local caching)

These methods are basically clear and generic, so they are not detailed here.

Netflix's Hystrix is one example.

5. Issues to consider for Microservices

Here is a good summary of the issues to consider in a microservice architecture, including:

API Gateway

Inter-service calls

Service Discovery

Service Fault Tolerance

Service Deployment

Data calls

Third, Important Microservice Components

1. Microservices Basic Capabilities

2. Service Registry

Services need a discovery mechanism that lets them become aware of each other. When a service starts, it registers its own information with the registry and subscribes to the services it needs to consume.

The service registry is the core of service discovery. It holds the network addresses (IP address and port) of every available service instance, and it must be highly available and updated in real time. Netflix Eureka is one such service registry. It provides a REST API for registering services and querying service information. A service registers its IP address and port with a POST request, refreshes its registration with a PUT request every 30 seconds, and deregisters with a DELETE request. Clients obtain information about available service instances with a GET request.

Netflix achieves high availability by running multiple Eureka instances on Amazon EC2, each with an Elastic IP address. When a Eureka server starts, its address is allocated dynamically with the help of DNS. Eureka clients find the network address (IP address and port) of a Eureka server by querying DNS. In general, the address returned is that of a Eureka server in the same availability zone as the client.

Others that can act as a service registry are:

etcd: a highly available, distributed, strongly consistent key-value store. Kubernetes and Cloud Foundry both use etcd.

Consul: a tool for service discovery and configuration. It provides an API that lets clients register and discover services, and it can run health checks to determine service availability.

ZooKeeper: a high-performance coordination service widely used in distributed applications. Apache ZooKeeper was originally a subproject of Hadoop but is now a top-level project.

2.1 ZooKeeper service registration and discovery

In simple terms, ZooKeeper can act as a service registry: multiple service providers form a cluster, and service consumers obtain concrete access addresses (IP and port) from the registry in order to reach a specific provider. As shown in the following figure:

Specifically, ZooKeeper exposes a hierarchical namespace similar to a distributed file system. Whenever a service provider is deployed, it registers its service under a path of the form /{service}/{version}/{ip:port}. For example, if our HelloWorldService is deployed on two machines, ZooKeeper will hold two entries under /HelloWorldService/1.0.0/, one for each machine's ip:port.

ZooKeeper provides a "heartbeat detection" function: it periodically checks each service provider (in practice over a long-lived socket connection). If a provider does not respond for a long time, the registry considers it "down" and removes it. For example, if one of the two machines goes down, only the other machine's entry remains under /HelloWorldService/1.0.0/.

Service consumers watch the corresponding path (/HelloWorldService/1.0.0). Whenever the data under that path changes (an entry is added or removed), ZooKeeper notifies the consumers that the provider address list has changed so that they can update it.

More importantly, ZooKeeper's built-in fault tolerance and disaster recovery capabilities (such as leader election) ensure high availability of the service registry.
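
The path layout and change notifications described in this section can be simulated in memory. The sketch below is not the ZooKeeper client API; `PathRegistry`, its methods, and the `host-a:8080` / `host-b:8080` addresses are hypothetical placeholders chosen only to show how a consumer's address list follows the entries under /{service}/{version}.

```python
from typing import Callable, Dict, List, Set

Watcher = Callable[[List[str]], None]


class PathRegistry:
    """In-memory stand-in for the /{service}/{version}/{ip:port} layout."""

    def __init__(self) -> None:
        self._children: Dict[str, Set[str]] = {}
        self._watchers: Dict[str, List[Watcher]] = {}

    @staticmethod
    def parent_path(service: str, version: str) -> str:
        return f"/{service}/{version}"

    def watch(self, parent: str, callback: Watcher) -> None:
        self._watchers.setdefault(parent, []).append(callback)

    def add(self, service: str, version: str, address: str) -> str:
        parent = self.parent_path(service, version)
        self._children.setdefault(parent, set()).add(address)
        self._notify(parent)
        return f"{parent}/{address}"

    def remove(self, service: str, version: str, address: str) -> None:
        parent = self.parent_path(service, version)
        self._children.get(parent, set()).discard(address)
        self._notify(parent)

    def _notify(self, parent: str) -> None:
        # Every watcher receives the full, sorted provider list after a change.
        snapshot = sorted(self._children.get(parent, set()))
        for callback in self._watchers.get(parent, []):
            callback(snapshot)


registry = PathRegistry()
updates: List[List[str]] = []
registry.watch("/HelloWorldService/1.0.0", updates.append)

path = registry.add("HelloWorldService", "1.0.0", "host-a:8080")
registry.add("HelloWorldService", "1.0.0", "host-b:8080")
# Simulate the registry culling a provider that stopped responding.
registry.remove("HelloWorldService", "1.0.0", "host-a:8080")
```

Each entry in `updates` is the provider list a consumer would see after one change, ending with only the surviving provider.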

3. Load Balancing

To ensure high availability, each microservice needs to deploy multiple service instances. The client then load-balances its requests across these instances.

3.1 Common Strategies for Load Balancing

3.1.1 Random

Requests from the network are distributed at random across the internal servers.

3.1.2 Round Robin

Requests from the network are assigned to the internal servers in turn, from 1 to N, and then the cycle starts over. This algorithm suits server groups in which every server has the same configuration and the request load is fairly even.
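
A minimal sketch of the "1 to N, then start over" rotation, assuming three hypothetical servers named `s1` to `s3`:

```python
from itertools import cycle

servers = ["s1", "s2", "s3"]  # hypothetical server names

# cycle() yields the servers in order and restarts after the last one.
rotation = cycle(servers)
picks = [next(rotation) for _ in range(7)]
```

Seven consecutive requests go to `s1`, `s2`, `s3`, `s1`, `s2`, `s3`, `s1`.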

3.1.3 Weighted Round Robin

Each server is assigned a weight according to its processing power and receives a share of requests proportional to that weight. For example, if server A has weight 1, B has weight 3, and C has weight 6, then A, B, and C receive 10%, 30%, and 60% of the requests respectively. This algorithm gives high-performance servers more work and keeps low-performance servers from being overloaded.
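
The 1 : 3 : 6 example can be checked with a deliberately simple scheduler that repeats each server in the rotation as many times as its weight. This is one possible implementation, chosen for brevity; production balancers often interleave the servers more smoothly. The function name is an assumption.

```python
from collections import Counter
from itertools import cycle, islice

weights = {"A": 1, "B": 3, "C": 6}


def weighted_schedule(server_weights: dict):
    """Endless rotation in which each server appears `weight` times per cycle."""
    slots = [name for name, weight in server_weights.items() for _ in range(weight)]
    return cycle(slots)


# 100 requests are exactly 10 full cycles of the 10-slot rotation.
counts = Counter(islice(weighted_schedule(weights), 100))
```

Out of 100 requests, A handles 10, B handles 30, and C handles 60, matching the 10%, 30%, and 60% shares in the text.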

3.1.4 IP Hash

This strategy hashes the source IP of the request and uses the hash value to pick the real server. Requests from the same host therefore always go to the same server, and no per-client state needs to be stored. Note, however, that this approach may leave the server load unbalanced.
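
A minimal sketch of the idea, assuming SHA-256 as the hash function and a fixed list of hypothetical server names; the text does not prescribe a particular hash.

```python
import hashlib
from typing import List


def pick_server(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a server deterministically, with no stored state."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce modulo N.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["s1", "s2", "s3"]  # hypothetical server names
first = pick_server("203.0.113.7", servers)
second = pick_server("203.0.113.7", servers)
```

Both calls return the same server because the result depends only on the IP and the server list. Note that with plain modulo hashing, changing the number of servers remaps most clients.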

3.1.5 Least Connections

The time each client request spends on a server can vary greatly. Over time, with simple round robin or random balancing, the number of connections in progress on each server can differ widely, and true load balancing is not achieved. The least-connections algorithm keeps a record for each internal server of the number of connections it is currently handling. A new connection request is assigned to the server with the fewest connections, so the balance better reflects reality and the load is more even. This algorithm suits services with long-running requests, such as FTP.
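
The per-server connection record can be sketched as a small class. The class name and the `acquire` / `release` methods are assumptions for illustration; ties are broken by the order in which servers were listed.

```python
from typing import Dict, List


class LeastConnectionsBalancer:
    """Tracks active connections and sends new ones to the least busy server."""

    def __init__(self, servers: List[str]) -> None:
        self.active: Dict[str, int] = {server: 0 for server in servers}

    def acquire(self) -> str:
        # min() returns the first server with the smallest active count.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


balancer = LeastConnectionsBalancer(["s1", "s2"])
long_running = balancer.acquire()   # goes to s1 and stays open
short_lived = balancer.acquire()    # goes to s2
balancer.release(short_lived)       # s2 finishes quickly
next_pick = balancer.acquire()      # s2 again, because s1 is still busy
```

A round-robin (polling) balancer would have sent the third request to `s1`, even though it is still occupied by the long-running connection.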

4. Fault tolerance

Fault tolerance, as the word suggests, means tolerating errors: not letting an error spread, and keeping its impact within fixed bounds. As the proverb says, "a thousand-mile dike can be destroyed by an ant nest"; fault tolerance is how we keep the ant nest from growing. The familiar techniques of degradation, rate limiting, circuit breaking, and timeout with retry are all fault-tolerance methods.

When a call to a service cluster fails, for example with a timeout, a connection error, or a network error, the failure is handled according to the configured fault-tolerance policy. Supported policies include fail-fast and failover. If calls fail many times in a row, the circuit breaker trips and no further calls are made. This prevents one failing service from dragging down every service that depends on it.

4.1 Fault Tolerance Policy

4.1.1 Fail-fast

The service makes only a single call and reports an error immediately on failure. Typically used for write operations that are not idempotent.

4.1.2 Failover

The service makes a call and, when it fails, retries on another server. Usually used for read operations, but retries add latency. The number of retries can usually be configured.
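
Fail-fast and failover differ only in how many further servers are tried after the first failure, which the sketch below makes explicit with a `max_retries` parameter. The function, the exception class, and the two stand-in providers are hypothetical; only `ConnectionError` is treated as retryable here, which is an assumption.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every attempted provider has failed."""


def call_with_failover(providers: Sequence[Callable[[], str]], max_retries: int) -> str:
    errors = []
    # One initial attempt plus up to max_retries retries, each on the next provider.
    for provider in providers[: max_retries + 1]:
        try:
            return provider()
        except ConnectionError as exc:
            errors.append(exc)
    raise AllProvidersFailed(f"{len(errors)} attempt(s) failed")


def flaky() -> str:
    raise ConnectionError("node down")


def healthy() -> str:
    return "ok"


# Failover: the first server fails, the retry on the second one succeeds.
failover_result = call_with_failover([flaky, healthy], max_retries=1)

# Fail-fast: with no retries, the first failure is reported immediately.
try:
    call_with_failover([flaky, healthy], max_retries=0)
    fail_fast_raised = False
except AllProvidersFailed:
    fail_fast_raised = True
```

Setting `max_retries=0` gives fail-fast behavior, which is why it suits non-idempotent writes: the operation is never attempted twice.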

4.1.3 Failsafe

When a service call raises an exception, the exception is simply ignored. Typically used for operations such as writing logs.

4.1.4 Failback

When a service call raises an exception, the failed request is recorded and resent periodically. Typically used for message notifications.

4.1.5 Forking

Multiple servers are called in parallel, and the call returns as soon as one of them succeeds. Usually used for read operations with strict real-time requirements. The maximum degree of parallelism can be set with forks=n.
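
A minimal sketch of the forking policy using threads from the Python standard library. The `forking_call` function and the stand-in providers are hypothetical, and the sketch assumes at least one provider and `forks >= 1`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence


def forking_call(providers: Sequence[Callable[[], str]], forks: int) -> str:
    """Call up to `forks` providers in parallel and return the first success."""
    chosen = providers[:forks]
    errors = []
    with ThreadPoolExecutor(max_workers=len(chosen)) as pool:
        futures = [pool.submit(provider) for provider in chosen]
        for future in as_completed(futures):
            try:
                # Leaving the with-block still waits for the other calls to
                # finish; a production version would cancel or abandon them.
                return future.result()
            except Exception as exc:
                errors.append(exc)
    raise RuntimeError(f"all {len(errors)} forked calls failed")


def broken() -> str:
    raise ConnectionError("node down")


def healthy() -> str:
    return "payload"


result = forking_call([broken, healthy], forks=2)
```

The trade-off is visible in the code: every request costs `forks` back-end calls, which is why the policy is reserved for reads where latency matters more than resource usage.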

4.1.6 Broadcast

All providers are called one by one, and if any call fails the whole operation fails. Typically used to tell all providers to update local resources such as caches or logs.

5. Circuit Breaking

Circuit breaking can be described as a kind of "intelligent fault tolerance". When the number or ratio of failed calls reaches a threshold, the circuit breaker opens and the program automatically cuts off the current RPC call to keep the error from spreading. Implementing a circuit breaker mainly involves three states: closed, open, and half-open. The transitions between the states are shown below.

When handling exceptions, we have to decide what to do based on the specific business situation. For example, suppose we call a product interface and the other side has temporarily degraded it; as the calling gateway, we should then switch to an alternative service or fall back to default data, and show the user a friendly message. We also need to distinguish between types of exception: the dependent service may have crashed, which can take a long time to fix, or the server may only be temporarily overloaded, causing timeouts. A circuit breaker should be able to identify the type of exception and adjust its strategy accordingly. A manual override should also be provided, so that when the recovery time of a failed service is uncertain, an administrator can force the breaker into a given state. Finally, circuit breakers are meant for calls to remote services or shared resources that may fail. For local, private resources such as a local cache, a circuit breaker only adds overhead. Note also that a circuit breaker must not be used as a substitute for exception handling in your application's business logic.

Some failures are stubborn, sudden, unpredictable, and hard to recover from, and can lead to cascading failures (for example, if a heavily loaded service cluster loses some of its nodes, and those nodes carried a large share of the load, the entire cluster may go down). If we keep retrying at such a time, most retries will fail. The application should instead enter the failure state immediately (fail-fast) and take appropriate steps to recover.

We can implement a circuit breaker as a state machine with the following three states:

Closed: The circuit breaker is closed by default and allows operations to execute. It records the number of recent failures internally; each time an operation fails, the count is incremented by one. If the number of failures (or the failure rate) reaches a threshold within a certain period of time, the circuit breaker transitions to the Open state.

Open: In this state, operations fail immediately and an exception is thrown at once. On entering the Open state, the circuit breaker starts a timeout timer that gives the cluster time to recover from the failure. When the timer expires, the circuit breaker switches to the Half-Open state.

Half-Open: In this state, the circuit breaker allows a limited number of operations to execute. If all of them succeed, it assumes the fault has been fixed, transitions to the Closed state, and resets the failure count. If any of them fails, it assumes the fault persists, switches back to the Open state, and restarts the timer (giving the system more time to recover).
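
The three-state machine can be sketched in a few dozen lines. This is a simplified illustration rather than the implementation of Hystrix or any other library: the class and parameter names are assumptions, the current time is passed in explicitly, and failures are counted since the breaker last closed instead of within a sliding time window.

```python
class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int, recovery_timeout: float,
                 half_open_trials: int = 1) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.half_open_trials = half_open_trials
        self.state = "closed"
        self._failures = 0
        self._trial_successes = 0
        self._opened_at = 0.0

    def call(self, operation, now: float):
        if self.state == "open":
            if now - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is open, failing fast")
            # Timer expired: let trial calls through in the half-open state.
            self.state = "half_open"
            self._trial_successes = 0
        try:
            result = operation()
        except Exception:
            self._record_failure(now)
            raise
        self._record_success()
        return result

    def _record_failure(self, now: float) -> None:
        self._failures += 1
        if self.state == "half_open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = now  # (re)start the recovery timer

    def _record_success(self) -> None:
        if self.state == "half_open":
            self._trial_successes += 1
            if self._trial_successes >= self.half_open_trials:
                self.state = "closed"
                self._failures = 0


def failing():
    raise ValueError("backend down")


breaker = CircuitBreaker(failure_threshold=2, recovery_timeout=10.0)
for t in (0.0, 1.0):
    try:
        breaker.call(failing, now=t)
    except ValueError:
        pass
state_after_failures = breaker.state  # two failures trip the breaker

try:
    breaker.call(lambda: "ok", now=5.0)
    rejected = False
except CircuitOpenError:
    rejected = True  # still inside the recovery timeout

recovered = breaker.call(lambda: "ok", now=12.0)  # trial call succeeds
```

While the breaker is open, the operation is never invoked, which is what protects a struggling back end from further load.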

6. Rate limiting and downgrading

The goal is to keep core services stable. As traffic grows, set a threshold for the number of requests the system can handle and reject requests above that threshold outright. At the same time, to keep core services available, some non-core services can be downgraded: limit the maximum traffic they receive, or manually downgrade an individual microservice through the management console.
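As a rough sketch of the two ideas above, the example below rejects requests beyond a per-window threshold and skips services that have been switched off manually. The fixed-window counter is just one simple way to enforce a threshold, and the names `FixedWindowLimiter`, `handle_request` and `downgraded_services` are hypothetical, not part of any particular framework.

```python
import time


class FixedWindowLimiter:
    """Allow at most max_requests per window; reject the rest."""

    def __init__(self, max_requests, window_seconds=1.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # A new window begins: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count >= self.max_requests:
            return False
        self.count += 1
        return True


def handle_request(service, limiter, downgraded_services):
    # downgraded_services stands in for switches set in a management console.
    if service in downgraded_services:
        return "degraded"   # non-core service switched off manually
    if not limiter.allow():
        return "rejected"   # over the threshold: reject directly
    return "ok"
```

In practice the set of downgraded services would be read from wherever the console stores its switches, and the limiter would need a lock if it is shared between threads.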

7. SLA in Microservices

SLA is short for Service-Level Agreement: a contract between a network service provider and a customer that defines terms such as service type, quality of service, and customer payment.

A typical SLA includes the following items:

  1. Minimum bandwidth allocated to customers;
  2. Customer bandwidth limits;
  3. Number of customers that can be served at the same time;
  4. Advance notification of scheduled network changes that may affect users;
  5. Dial-in access availability;
  6. Usage statistics;
  7. Minimum network performance supported by the service provider, such as 99.9% uptime or at most 1 minute of downtime per day;
  8. Traffic priority for each type of customer;
  9. Customer technical support and services;
  10. Penalties for service providers that fail to meet SLA requirements.
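To make the availability figures in item 7 concrete, the small helper below converts between an uptime percentage and the downtime it allows. The function names are made up for this illustration. It shows that 99.9% uptime allows roughly 86 seconds of downtime per day, while one minute of downtime per day corresponds to about 99.93% uptime, so the two examples are close but not identical targets.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def allowed_downtime_seconds(availability_percent, period_seconds=SECONDS_PER_DAY):
    """Downtime budget for a given availability target over a period."""
    return period_seconds * (1 - availability_percent / 100)


def availability_percent(downtime_seconds, period_seconds=SECONDS_PER_DAY):
    """Availability achieved with a given amount of downtime over a period."""
    return 100 * (1 - downtime_seconds / period_seconds)


print(round(allowed_downtime_seconds(99.9), 1))  # about 86.4 seconds per day
print(round(availability_percent(60), 2))        # about 99.93 percent
```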

8. API Gateways

The gateway here is the API gateway: all API calls go through the API gateway layer, which provides a unified entry and exit point. The basic functions of a gateway are unified access, security protection, protocol adaptation, traffic control, support for long-lived and short-lived connections, and fault tolerance. With a gateway in place, each API provider team can focus on its own business logic, while the API gateway concentrates on security, traffic, routing, and similar concerns.

9. Multi-level caching

The simplest form of caching is to query the database once, write the result to a cache such as Redis, and set an expiration time. Because entries expire, we should watch the penetration rate of the cache. For example, if the query method queryOrder is called 1000 times per second and the nested database method queryProductFromDb is called 300 times per second, the penetration rate of Redis is 300/1000 = 30%. A high penetration rate means the cache is not doing much good.

Another way to use the cache is to make it persistent, that is, not to set an expiration time, which raises the problem of keeping the data up to date. In general there are two approaches. The first is to use a timestamp: queries go to Redis by default, every write stores a timestamp alongside the data, and every read compares the current system time with that timestamp. If the data is older than, say, 5 minutes, the database is queried again. This ensures there is always data in Redis and is generally used as a fault-tolerance measure for the DB. The second is to really use Redis as a DB: subscribe to the database binlog, push the changes into the cache through a heterogeneous data synchronization system, and make the cache multi-level. You can use an in-JVM cache (jvmcache) as the first-level cache inside the application, which suits small, frequently accessed data; a set of Redis instances as the second-level remote cache; and an outermost third-level Redis as the persistent cache.
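The penetration rate and the timestamp approach can be sketched as follows. This is an illustration under assumptions: a plain dictionary stands in for Redis, `load_from_db` is a hypothetical loader supplied by the caller, and serving stale data when the DB call fails is one possible reading of using the cache as a fault-tolerance measure for the DB.

```python
import time


def penetration_rate(db_calls_per_sec, cache_calls_per_sec):
    """Share of cache lookups that fall through to the database."""
    return db_calls_per_sec / cache_calls_per_sec


class TimestampedCache:
    """Non-expiring cache whose entries are refreshed once they are too old."""

    def __init__(self, load_from_db, max_age_seconds=300, clock=time.time):
        self.load_from_db = load_from_db        # hypothetical DB loader
        self.max_age_seconds = max_age_seconds  # e.g. 5 minutes
        self.clock = clock
        self.store = {}  # stand-in for Redis: key -> (value, written_at)

    def get(self, key):
        entry = self.store.get(key)
        now = self.clock()
        if entry is not None and now - entry[1] <= self.max_age_seconds:
            return entry[0]                     # fresh enough, no DB access
        try:
            value = self.load_from_db(key)
        except Exception:
            if entry is not None:
                return entry[0]                 # DB unavailable: serve stale data
            raise
        self.store[key] = (value, now)          # store value with its timestamp
        return value


print(penetration_rate(300, 1000))  # the 300/1000 example from the text
```

A multi-level setup would chain lookups in the same way: try the in-process cache first, then the remote cache, and only then the persistent cache or the database.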

10. Timeouts and retries

Timeouts and retries are also a fault-tolerance mechanism. Wherever an RPC call is made, such as reading Redis, the DB, or an MQ, a network failure or a failure of the dependent service can keep the call from returning for a long time. This causes threads to pile up, raises CPU load, and can even trigger an avalanche. So set a timeout for every RPC call. Where the caller depends strongly on the RPC resource, there must also be a retry mechanism, but the recommended number of retries is 1 to 2. In addition, if retries are enabled, the timeout should be reduced accordingly. For example, with 1 retry, a total of 2 calls can occur; if the timeout is configured as 2s, the client may have to wait up to 4s for a response. Therefore, when combining retries with timeouts, reduce the timeout.

It is also worth looking at where the time of an RPC call is spent. The time measured for a normal call includes: ① RPC framework execution time on the caller side + ② network transmission time + ③ RPC framework execution time on the server side + ④ business code time on the server side. The caller and the server each have their own performance monitoring. Suppose the caller's TP99 is 500ms while the server's TP99 is 100ms, and colleagues in the network team confirm that the network is fine. So where is the time spent? There are two likely causes: one lies in the client (the caller) itself, and the other is TCP retransmission on the network. Pay attention to both.
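The arithmetic behind reducing the timeout when retries are enabled can be written down directly. The helper names below are invented for this sketch, and `operation` is assumed to be any callable that accepts a timeout in seconds and raises `TimeoutError` when it runs out of time.

```python
def worst_case_wait(timeout_seconds, retries):
    """Longest time the caller may wait: one initial call plus the retries."""
    return timeout_seconds * (1 + retries)


def per_attempt_timeout(total_budget_seconds, retries):
    """Timeout per attempt so that all attempts fit in the total budget."""
    return total_budget_seconds / (1 + retries)


def call_with_retry(operation, timeout_seconds, retries=1):
    """Call operation(timeout_seconds), retrying on timeout up to `retries` times."""
    last_error = None
    for _ in range(1 + retries):
        try:
            return operation(timeout_seconds)
        except TimeoutError as exc:
            last_error = exc
    raise last_error


print(worst_case_wait(2.0, 1))      # 2s timeout with 1 retry: up to 4.0s
print(per_attempt_timeout(2.0, 1))  # to stay within 2s overall: 1.0s per attempt
```

Retrying only on timeouts is a deliberate choice for this sketch; whether other errors are safe to retry depends on whether the remote operation is idempotent.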

11. Thread pool Isolation

On the subject of handling heavy traffic, thread isolation was already mentioned in the discussion of Servlet 3 asynchronous processing. The benefit of isolating threads from one another is that it prevents cascading failures and even avalanches. When the gateway calls N interface services, we need thread isolation for each interface. For example, suppose we have to call the order, goods, and user services. The order business must not affect the handling of goods and user requests. Without thread isolation, when a network failure slows down calls to the order service, threads back up and eventually the CPU of the whole gateway service is saturated. In other words, the entire service becomes unavailable, and no matter how many machines there are, they will all be swamped by requests at that moment. With thread isolation, the gateway can ensure that a local problem does not affect the whole.
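One way to picture thread pool isolation is to give each downstream service its own small pool, so that a slow order service can only exhaust its own threads. The sketch below uses Python's standard `concurrent.futures`; the pool sizes, the `POOLS` mapping, and the `call_isolated` helper are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# One bounded pool per downstream service (sizes are illustrative).
POOLS = {
    "orders": ThreadPoolExecutor(max_workers=2, thread_name_prefix="orders"),
    "goods": ThreadPoolExecutor(max_workers=2, thread_name_prefix="goods"),
    "users": ThreadPoolExecutor(max_workers=2, thread_name_prefix="users"),
}


def call_isolated(service, func, timeout_seconds, fallback=None):
    """Run func on the pool that belongs to `service` and wait with a timeout."""
    future = POOLS[service].submit(func)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # cancel() only has an effect if the task has not started running yet.
        future.cancel()
        return fallback


def slow_order_lookup():
    time.sleep(0.3)  # simulates a delayed order service
    return "order"


# The order call times out, but the goods call is unaffected.
print(call_isolated("orders", slow_order_lookup, 0.05, fallback="order-fallback"))
print(call_isolated("goods", lambda: "goods", 1.0))
```

Even if every order thread is stuck, requests for goods and users still find free threads in their own pools.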

12. Downgrading and rate limiting

The industry already has very mature approaches to downgrading and rate limiting, such as failback mechanisms and rate-limiting methods like the token bucket, the leaky bucket, and semaphores. Here is some of our own experience. Downgrades are generally implemented with a downgrade switch in a unified configuration center. When many interfaces come from the same provider, and that provider's system or the network of its machine room has a problem, we need a single switch that downgrades them all; otherwise we would have to downgrade the interfaces one by one. In other words, there should be one master switch per business type. Also, avoid brute-force downgrades. What is a brute-force downgrade? For example, the forum feature is downgraded and the user ends up looking at a blank page. Instead, cache some data in advance so that there is fallback data to show. To implement distributed rate limiting, a shared back-end store such as Redis is required, with the nginx nodes reading the configuration from Redis using Lua. Our current rate limiting is per machine; we have not implemented distributed rate limiting.
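Of the rate-limiting methods mentioned above, the token bucket is easy to sketch on a single machine. The class below is a minimal, non-thread-safe illustration; the parameter names and the injectable `clock` are assumptions for the example and are not taken from any specific library.

```python
import time


class TokenBucket:
    """Single-machine token bucket: refills at a fixed rate up to a capacity."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second    # tokens added per second
        self.capacity = capacity       # maximum burst size
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated_at = clock()

    def allow(self, cost=1):
        now = self.clock()
        elapsed = now - self.updated_at
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                   # not enough tokens: limit this request
```

The bucket lets short bursts through up to `capacity` while holding the long-run rate to `rate_per_second`. A distributed version would keep the token count in a shared store instead of in process memory.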

13. Gateway Monitoring and Statistics

The API gateway makes its calls serially, so an exception at any step should be recorded and stored in one place, such as Elasticsearch, to make later analysis of failed calls easier. The company's Docker instances are all allocated in a unified way, and three agents are already present on each one before allocation, so adding more is not allowed. We implemented our own agent program that collects the log output from the server and sends it to a Kafka cluster, from which it is consumed into Elasticsearch and queried through the web. The tracing function is still fairly simple at the moment and needs to be enriched further.
