Microservice Practice (5): Event-Driven Data Management for Microservices

The list of the seven articles in this series is as follows:

Microservice Practice (1): Advantages and Disadvantages of the Microservice Architecture

Microservice Practice (2): Using the API Gateway

Microservice Practice (3): Inter-Process Communication in the Microservice Architecture

Microservice Practice (4): Feasible Solutions and Practical Cases for Service Discovery

Microservice Practice (5): Event-Driven Data Management for Microservices

Microservice Practice (6): Choosing a Microservice Deployment Strategy

Microservice Practice (7): Migrating from a Monolithic Architecture to a Microservice Architecture

[Editor's Note] This article is the fifth in a series about building applications with microservices. The first article introduces the microservice architecture pattern and discusses its pros and cons. The second and third articles describe different aspects of communication within the microservice architecture. The fourth article examines the problem of service discovery. In this article, we look from a different perspective at the distributed data management problems that the microservice architecture brings.

1.1 Microservices and problems with managing distributed data

Monolithic applications typically have a single relational database. A key benefit of this is that the application can use ACID transactions, which provide some important guarantees:

  • Atomicity - changes are made atomically
  • Consistency - the database is always in a consistent state
  • Isolation - even when transactions run concurrently, they appear to execute serially
  • Durability - once a transaction has been committed, its changes are permanent

These guarantees simplify the application: it can begin a transaction, change multiple rows (insert, update, delete), and then commit the transaction.
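As a concrete illustration, the following sketch uses Python's built-in `sqlite3` module to show how a local ACID transaction applies either all of its changes or none of them. The `account` table and the amounts are invented for this example:

```python
import sqlite3

# In-memory database standing in for the application's relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically: either both rows change or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            # Consistency check: disallow overdrafts.
            (balance,) = conn.execute("SELECT balance FROM account WHERE id = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
    except ValueError:
        pass  # the transaction was rolled back; no row was changed

transfer(conn, 1, 2, 30)   # succeeds
transfer(conn, 1, 2, 999)  # rolls back; balances unchanged
print(dict(conn.execute("SELECT id, balance FROM account")))
```

Using the connection as a context manager (`with conn:`) is what gives the all-or-nothing behavior: an exception anywhere in the block rolls back every statement issued inside it.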

Another benefit of using a relational database is SQL, a rich, declarative, and standardized query language. A user can easily write a query that combines data from multiple tables; the RDBMS query planner then determines the optimal way to execute it, so users do not have to worry about low-level details such as how to access the database. And because all of the application's data is in one database, it is easy to query.

However, data access becomes much more complicated in a microservice architecture, because the data owned by a microservice is private to that service and can only be accessed through its API. Encapsulating the data in this way keeps the microservices loosely coupled and independent of one another. If multiple services accessed the same data, schema updates would require time-consuming, coordinated changes across all of those services.

In addition, different microservices often use different kinds of databases. Applications generate a wide variety of data, and a relational database is not always the best choice. In some scenarios, a NoSQL database may offer a more suitable data model with better performance and scalability. For example, a service that manipulates and searches text might use a text search engine such as Elasticsearch. Similarly, a service that stores social graph data could use a graph database such as Neo4j. As a result, microservice-based applications often use a mixture of SQL and NoSQL databases, an approach known as polyglot persistence.

A partitioned, polyglot-persistent architecture for storing data has many advantages, including loosely coupled services and better performance and scalability. However, it also introduces the challenge of distributed data management.

The first challenge is how to complete a business transaction while maintaining data consistency across multiple services. To see why this is a problem, consider an online store as an example. The customer service maintains information about customers, such as their credit lines. The order service manages orders and must verify that a new order does not exceed the customer's credit limit. In a monolithic application, the order service can simply use an ACID transaction to check the available credit and create the order.

In contrast, in a microservice architecture the ORDER and CUSTOMER tables are private to their respective services, as shown in the following figure:

The order service cannot access the CUSTOMER table directly; it can only use the API published by the customer service. The order service could use distributed transactions, also known as two-phase commit (2PC). However, in modern applications 2PC is usually not a viable option. According to the CAP theorem, one must choose between availability and ACID-style consistency, and availability is usually the better choice. Moreover, many modern technologies, such as most NoSQL databases, do not support 2PC. Since maintaining data consistency between services and databases is an essential requirement, we need another solution.

The second challenge is how to implement queries that retrieve data from multiple services. For example, imagine the application needs to display a customer and their recent orders. If the order service provides an API for retrieving a customer's orders, the application can fetch this data using an application-side join: it retrieves the customer from the customer service and the customer's orders from the order service. Suppose, however, that the order service only supports looking up orders by primary key (perhaps because it uses a NoSQL database that only supports primary-key-based retrieval). In that case, there is no straightforward way to retrieve the required data.
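The application-side join just described can be sketched as follows. The in-process dictionaries and the function names standing in for the two services' APIs are purely hypothetical; a real application would make HTTP calls instead:

```python
# Hypothetical in-process stand-ins for the two services' data.
CUSTOMERS = {"c1": {"id": "c1", "name": "Alice", "credit_limit": 500}}
ORDERS = {"o1": {"id": "o1", "customer_id": "c1", "total": 120}}

def get_customer(customer_id):
    """Stand-in for GET /customers/{id} on the customer service."""
    return CUSTOMERS[customer_id]

def find_orders_by_customer(customer_id):
    """Stand-in for GET /orders?customerId=... on the order service.
    This endpoint can only exist if the order service supports such a
    query; with a primary-key-only store it could not."""
    return [o for o in ORDERS.values() if o["customer_id"] == customer_id]

def customer_with_orders(customer_id):
    """Application-side join: call each service and merge the results."""
    customer = dict(get_customer(customer_id))
    customer["orders"] = find_orders_by_customer(customer_id)
    return customer

print(customer_with_orders("c1"))
```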

1.2 Event-Driven Architecture

For many applications, the solution is to use an event-driven architecture. In this architecture, a microservice publishes an event when something notable happens, such as when it updates a business entity. Other microservices subscribe to those events; when a microservice receives an event, it can update its own business entities, which may in turn cause further events to be published.

You can use events to implement business transactions that span multiple services. Such a transaction consists of a series of steps; in each step, a microservice updates a business entity and publishes an event that triggers the next step. The following steps show how to use an event-driven approach to check credit availability when creating an order. The microservices exchange events via a message broker.

  1. The order service creates an order with the status NEW and publishes an "Order Created" event.

  2. The customer service consumes the "Order Created" event, reserves credit for the order, and publishes a "Credit Reserved" event.

  3. The order service consumes the "Credit Reserved" event and changes the status of the order to OPEN.

More complicated scenarios can involve additional steps, such as reserving inventory at the same time as checking the customer's credit.

Provided that (a) each service atomically updates its database and publishes its event, and (b) the message broker guarantees that each event is delivered at least once, a business transaction can be completed across multiple services. Note that this is not an ACID transaction; it offers weaker guarantees such as eventual consistency. This transaction model is known as the BASE model.
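The credit-check sequence above can be sketched with a minimal in-memory publish/subscribe broker. Everything here (the `Broker` class, the topic names, the in-memory `orders` and `credit` stores) is a hypothetical stand-in for real infrastructure such as RabbitMQ or Kafka:

```python
from collections import defaultdict

class Broker:
    """Minimal in-memory message broker: synchronous publish/subscribe."""
    def __init__(self):
        self.handlers = defaultdict(list)
    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)
    def publish(self, topic, event):
        for handler in self.handlers[topic]:
            handler(event)

broker = Broker()
orders = {}            # order service's private data
credit = {"c1": 500}   # customer service's private data

def create_order(order_id, customer_id, total):
    # Step 1: order service creates a NEW order and publishes an event.
    orders[order_id] = {"customer_id": customer_id, "total": total, "status": "NEW"}
    broker.publish("OrderCreated",
                   {"order_id": order_id, "customer_id": customer_id, "total": total})

def on_order_created(event):
    # Step 2: customer service reserves credit and publishes an event.
    if credit[event["customer_id"]] >= event["total"]:
        credit[event["customer_id"]] -= event["total"]
        broker.publish("CreditReserved", {"order_id": event["order_id"]})

def on_credit_reserved(event):
    # Step 3: order service marks the order OPEN.
    orders[event["order_id"]]["status"] = "OPEN"

broker.subscribe("OrderCreated", on_order_created)
broker.subscribe("CreditReserved", on_credit_reserved)

create_order("o1", "c1", 120)
print(orders["o1"]["status"], credit["c1"])  # OPEN 380
```

In a real system each handler would run in a separate service process, and the broker would deliver events asynchronously and at least once, so handlers would also need to be idempotent.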

You can also use events to maintain materialized views that pre-join data owned by multiple microservices. The service that maintains such a view subscribes to the relevant events and updates the view accordingly. For example, a customer order view update service (which maintains the customer order view) subscribes to the events published by the customer service and the order service.

When the customer order view update service receives a customer or order event, it updates the customer order view's data store. You could implement the view using a document database such as MongoDB, storing one document per customer. A customer order view query service then answers requests for a customer and their recent orders by querying the view's data store.
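A minimal sketch of such a view-update service, using a plain dictionary in place of a document database; the event shapes are invented for this example:

```python
# One "document" per customer, as it might be stored in a document database.
order_view = {}

def _doc(customer_id):
    """Get or create the customer's view document."""
    return order_view.setdefault(customer_id, {"customer": None, "orders": {}})

def on_customer_event(event):
    # Triggered by events from the customer service.
    _doc(event["customer_id"])["customer"] = {"name": event["name"]}

def on_order_event(event):
    # Triggered by events from the order service.
    _doc(event["customer_id"])["orders"][event["order_id"]] = {"total": event["total"]}

# Simulate the update service consuming one event from each source service.
on_customer_event({"customer_id": "c1", "name": "Alice"})
on_order_event({"customer_id": "c1", "order_id": "o1", "total": 120})
print(order_view["c1"])
```

The query service then serves a customer and their orders with a single lookup of this pre-joined document, instead of calling two services at request time.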

The event-driven architecture has both advantages and disadvantages. It makes it possible to implement transactions that span multiple services and provide eventual consistency, and it lets an application maintain materialized views. One disadvantage is that the programming model is more complicated than the ACID transaction model: to recover from application-level failures you must implement compensating transactions, for example cancelling the order if the credit check fails. Applications must also cope with inconsistent data, both because changes made by in-flight transactions are visible and because reads from a not-yet-updated materialized view can return stale data. Another disadvantage is that subscribers must detect and ignore duplicate events.

1.3 Achieve atomicity

An event-driven architecture also faces the challenge of atomically updating the database and publishing an event. For example, the order service must insert a row into the ORDER table and publish an "Order Created" event, and these two operations must happen atomically. If the service crashes after updating the database but before publishing the event, the system becomes inconsistent. The standard way to guarantee atomicity is a distributed transaction involving the database and the message broker, but for the reasons described above (the CAP theorem), this is exactly what we want to avoid.

1.3.1 Use local transactions to publish events

One way to achieve atomicity is for the application to publish events using a multi-step process involving only local transactions. The trick is an EVENT table, which functions as a message queue, in the database that stores the business entities. The application begins a (local) database transaction, updates the state of the business entities, inserts an event into the EVENT table, and commits the transaction. A separate application process or thread queries the EVENT table, publishes the events to the message broker, and then uses a local transaction to mark the events as published, as shown in the following figure:

The order service inserts a row into the ORDER table and inserts an "Order Created" event into the EVENT table. The event publishing thread or process queries the EVENT table for unpublished events, publishes them, and then updates the EVENT table to mark those events as published.
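This EVENT-table technique (often called a transactional outbox) can be sketched with `sqlite3`. The schema and the in-memory `published` list standing in for the message broker are assumptions of this example:

```python
import sqlite3
import json

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("""CREATE TABLE event (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT,
    published INTEGER DEFAULT 0)""")
conn.commit()

def create_order(order_id):
    # ONE local transaction: the order row and the event row are inserted
    # together, so either both exist or neither does.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'NEW')", (order_id,))
        conn.execute("INSERT INTO event (payload) VALUES (?)",
                     (json.dumps({"type": "OrderCreated", "order_id": order_id}),))

published = []  # stands in for the message broker

def publish_pending():
    """Separate thread/process: drain unpublished events and mark them."""
    rows = conn.execute("SELECT id, payload FROM event WHERE published = 0").fetchall()
    for event_id, payload in rows:
        published.append(json.loads(payload))  # deliver to the broker
        with conn:
            conn.execute("UPDATE event SET published = 1 WHERE id = ?", (event_id,))

create_order("o1")
publish_pending()
print(published)
```

If the process crashes between the commit and the publish, the event is still in the EVENT table with `published = 0`, so the poller will deliver it on its next run; the broker's at-least-once delivery then covers the rest.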

This method also has pros and cons. One advantage is that it guarantees events are published without relying on 2PC, and the application publishes business-level events without having to infer what happened from low-level changes. A disadvantage is that the approach is error-prone, because developers must remember to publish the events. It is also challenging for applications that use certain NoSQL databases, because NoSQL databases often have limited transaction and query capabilities.

This method achieves atomicity without 2PC by having the application use local transactions to update state and publish events. Now let's look at an approach in which the application simply updates state and atomicity is achieved by other means.

1.3.2 Mining Database Transaction Logs

Another way to achieve atomicity without 2PC is to have the events published by a thread or process that mines the database's transaction or commit log. The application updates the database, and those changes are recorded in the database's transaction log. A transaction log mining process or thread reads the log and publishes the changes as events to the message broker, as shown in the following figure:

An example of this approach is the LinkedIn Databus project. Databus mines the Oracle transaction log and publishes events corresponding to the changes. LinkedIn uses Databus to keep its derived data stores consistent with the system of record.

Another example is the streams mechanism in AWS DynamoDB, a managed NoSQL database. A DynamoDB stream is a time-ordered sequence of the changes (create, update, and delete operations) made to the items of a DynamoDB table over the last 24 hours. An application can read those changes from the stream and publish them as events.

Transaction log mining also has pros and cons. One advantage is that it guarantees an event is published for every update without relying on 2PC. It can also simplify the application by separating event publishing from the business logic. The main drawback is that the transaction log format is specific to each database, and can even change between database versions, and it is difficult to reverse-engineer high-level business events from the low-level update records in the log.
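A rough sketch of what such a translation layer does, with invented record and event shapes; real transaction logs are binary, database-specific formats and considerably harder to parse:

```python
# Hypothetical low-level change records, as a log miner might decode them
# from a database's commit log.
transaction_log = [
    {"op": "INSERT", "table": "orders", "row": {"id": "o1", "status": "NEW"}},
    {"op": "UPDATE", "table": "orders", "row": {"id": "o1", "status": "OPEN"}},
]

def to_business_event(record):
    """Translate a low-level change record into a higher-level business event.
    Returns None for records that don't map to any event of interest."""
    if record["table"] == "orders" and record["op"] == "INSERT":
        return {"type": "OrderCreated", "order_id": record["row"]["id"]}
    if record["table"] == "orders" and record["op"] == "UPDATE":
        return {"type": "OrderUpdated",
                "order_id": record["row"]["id"],
                "status": record["row"]["status"]}
    return None

# The miner would publish each resulting event to the message broker.
events = [e for e in map(to_business_event, transaction_log) if e]
print(events)
```

Even in this toy form, the mapping encodes knowledge of the schema; in practice a single business action may span several tables, which is why reconstructing business events from log records is hard.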

With transaction log mining, the application updates the database directly and no 2PC is involved. Now let's look at a radically different approach that eliminates updates entirely and relies solely on events: event sourcing.

1.3.3 Using Event Sourcing

Event sourcing achieves atomicity without 2PC through a radically different, event-centric approach to persisting business entities. Rather than storing an entity's current state, the application stores the sequence of state-changing events for that entity. The application reconstructs the entity's current state by replaying its events. Whenever a business entity changes, a new event is appended to the sequence. Since saving an event is a single operation, it is inherently atomic.

To see how event sourcing works, consider the Order entity as an example. Traditionally, each order maps to a row in the ORDER table, along with rows in, for example, the ORDER_LINE_ITEM table. With event sourcing, the order service instead stores an order as its sequence of state-changing events: Created, Approved, Shipped, Cancelled. Each event contains enough data to reconstruct the order's state.
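A minimal sketch of reconstructing an order's current state by replaying its events; the event shapes and the `apply`/`replay` helpers are hypothetical:

```python
# Append-only sequence of state-changing events for one order, as it
# might be read back from an event store.
events = [
    {"type": "OrderCreated", "order_id": "o1", "total": 120},
    {"type": "OrderApproved", "order_id": "o1"},
    {"type": "OrderShipped", "order_id": "o1"},
]

def apply(order, event):
    """Fold a single event into the order's state."""
    if event["type"] == "OrderCreated":
        return {"order_id": event["order_id"], "total": event["total"],
                "status": "CREATED"}
    if event["type"] == "OrderApproved":
        return {**order, "status": "APPROVED"}
    if event["type"] == "OrderShipped":
        return {**order, "status": "SHIPPED"}
    return order  # ignore unknown event types

def replay(events):
    """Reconstruct the current state by replaying the event history."""
    order = None
    for event in events:
        order = apply(order, event)
    return order

print(replay(events))  # status SHIPPED
```

Appending one more event (say, a cancellation) is a single atomic write, yet the full current state always remains derivable from the history.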

Events are stored durably in an event store, a database of events that provides an API for adding and retrieving an entity's events. The event store also behaves like the message broker described earlier: it provides an API for subscribing to events and delivers each event to all interested subscribers. The event store is the backbone of an event-sourced, event-driven microservice architecture.

Event sourcing has several benefits. It solves the key problem of an event-driven architecture by reliably publishing an event whenever state changes, which in turn addresses the data consistency problem in a microservice architecture. Also, because it persists events rather than domain objects, it largely avoids the object-relational impedance mismatch.

Event sourcing also provides a complete, reliable audit log of the changes made to a business entity and makes it possible to determine an entity's state at any point in time. In addition, it allows the business logic to consist of loosely coupled business entities that exchange events. These advantages make it significantly easier to migrate a monolithic application to a microservice architecture.

Event sourcing also has drawbacks. It is a different and unfamiliar programming style, so there is a learning curve. The event store directly supports only primary-key lookups of business entities, so you must use Command Query Responsibility Segregation (CQRS) to implement queries. As a result, applications must handle eventually consistent data.

1.4 Summary

In a microservice architecture, each microservice has its own private datastore, and different microservices may use different SQL or NoSQL databases. While this database architecture has significant benefits, it creates the challenge of distributed data management. The first challenge is how to maintain consistency for business transactions that span multiple services. The second challenge is how to implement queries that retrieve data from multiple services.

For many applications, the solution is to use an event-driven architecture. One challenge with this approach is how to atomically update state and publish events. There are several ways to do this, including using the database as a message queue, mining the transaction log, and event sourcing.

Other aspects of microservices will be discussed in depth in future blogs.

Reprinted at: https://my.oschina.net/CraneHe/blog/703169