[MySQL compatible] About the distributed SQL database "TiDB" [OSS]

Jon
Jan 22, 2022
Updated: Apr 29, 2022

Hello everyone!

This is Jon from Beyond GTA. Today we are going to learn about "TiDB", an open source (OSS) distributed SQL database.

■ TiDB is developed as an open source project by PingCAP. PingCAP is a member of the Cloud Native Computing Foundation (CNCF), which hosts TiKV, TiDB's storage engine, as one of its projects.

■ TiDB is an open source NewSQL database that supports HTAP (Hybrid Transactional and Analytical Processing) workloads.

■ It is MySQL compatible and offers horizontal scalability, strong consistency, and high availability. It covers OLTP (online transaction processing), OLAP (online analytical processing), and HTAP services, and suits use cases that require high availability and strong consistency on large-scale data.

■ As an example of adoption by a Japanese company, TiDB is used in the infrastructure of PayPay, which provides a QR code payment service.

◇ Source: Platform Engineer – The Backbone behind PayPay Transaction

TiDB features

● Horizontal scale-out / scale-in

· The TiDB architecture separates compute from storage, so you can scale compute capacity and storage capacity out or in independently, online, as needed.

● Multi-replica and high availability

· Data is stored in multiple replicas, and the replicas use the Multi-Raft protocol to obtain the transaction log.

· A transaction commits only if the data was successfully written to a majority of the replicas, which guarantees strong consistency and keeps the data available when a minority of replicas go down.

· You can configure the geographic location and number of replicas as needed to meet the requirements of different disaster tolerance levels.
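The majority rule described above can be illustrated with a little arithmetic. This is only a minimal sketch of the quorum idea, not TiDB's actual implementation; the function names are made up for this example.

```python
def majority(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def can_commit(acks: int, replica_count: int) -> bool:
    """A write commits only when a majority of replicas acknowledged it."""
    return acks >= majority(replica_count)


def tolerated_failures(replica_count: int) -> int:
    """How many replicas may be down while a majority is still reachable."""
    return replica_count - majority(replica_count)


if __name__ == "__main__":
    for replicas in (3, 5):
        print(replicas, majority(replicas), tolerated_failures(replicas))
```

With 3 replicas, a write needs 2 acknowledgements and 1 replica may fail; with 5 replicas, it needs 3 and 2 may fail.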

● Real-time HTAP

· TiDB provides two storage engines: TiKV, a row-based storage engine, and TiFlash, a columnar storage engine.

· TiFlash uses the Multi-Raft Learner protocol to replicate data from TiKV in real time, ensuring data consistency between the TiKV row-based storage engine and the TiFlash columnar storage engine.

· TiKV and TiFlash can be deployed on different machines as needed to solve the HTAP resource isolation issue.

● Cloud-native distributed database

· TiDB is a distributed database designed for the cloud, providing flexible scalability, reliability, and security on cloud platforms. Users can scale TiDB elastically to meet their changing workload requirements.

· TiDB keeps at least three replicas of each piece of data, and the replicas can be scheduled in different cloud availability zones so that the cluster tolerates the outage of an entire data center.

· TiDB Operator helps you manage TiDB on Kubernetes and automates tasks related to operating a TiDB cluster, so you can easily deploy TiDB on any cloud that provides managed Kubernetes.

· With "TiDB Cloud", a fully managed TiDB service, you can deploy and run a TiDB cluster in the cloud with just a few clicks.

* TiDB Cloud is a paid managed service that runs on cloud platforms such as AWS, Azure, and GCP.

● Compatible with the MySQL 5.7 protocol and the MySQL ecosystem

· TiDB is compatible with the MySQL 5.7 protocol, common MySQL features, and the MySQL ecosystem, so you do not have to rewrite large parts of an existing application when migrating it to TiDB. Changing a small amount of code is enough.

· TiDB also has a set of data migration tools to help you migrate your application data to TiDB.
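To show what protocol compatibility means in practice, here is a minimal sketch that talks to TiDB through PyMySQL, an ordinary third-party MySQL driver for Python. The host, user, password, and database values are placeholders assumed for this example, and port 4000 is assumed to be the port your TiDB server listens on; adjust them to your environment.

```python
def tidb_connection_params(host: str = "127.0.0.1", port: int = 4000,
                           user: str = "root", password: str = "",
                           database: str = "test") -> dict:
    """Build keyword arguments for a MySQL driver (all values are placeholders)."""
    return {"host": host, "port": port, "user": user,
            "password": password, "database": database}


def fetch_server_version(params: dict) -> str:
    """Run SELECT VERSION() through PyMySQL, the same way as against MySQL."""
    import pymysql  # third-party MySQL driver, imported only when needed

    connection = pymysql.connect(**params)
    try:
        with connection.cursor() as cursor:
            cursor.execute("SELECT VERSION()")
            (version,) = cursor.fetchone()
            return version
    finally:
        connection.close()
```

Calling `fetch_server_version(tidb_connection_params())` against a running TiDB server returns its version string. Nothing in the code is TiDB-specific except the port, which is the point: an application that already uses a MySQL driver mostly needs a different connection target.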

TiDB architecture

◇ Source: TiDB Architecture

As a distributed database, TiDB is designed to consist of multiple components. These components communicate with each other to form a complete TiDB system.

● TiDB server

The TiDB server is a stateless SQL layer that exposes the connection endpoints of the MySQL protocol to the outside world. The TiDB server receives the SQL request, parses and optimizes the SQL, and finally generates a distributed execution plan.

· The TiDB server is horizontally scalable and provides a unified interface to the outside world through load balancing components such as Linux Virtual Server (LVS), HAProxy, or F5. It stores no data and is dedicated to computing and SQL analysis; it forwards the actual data read requests to TiKV nodes (or TiFlash nodes).

● PD (Placement Driver) server

· The PD server is responsible for managing the metadata of the entire cluster and consists of at least three nodes.

· It stores the real-time data distribution metadata of every TiKV node and the topology of the entire TiDB cluster, provides the TiDB Dashboard management UI, and allocates transaction IDs to distributed transactions.

· The PD server not only stores the cluster metadata, but also sends data scheduling commands to specific TiKV nodes according to the data distribution status that the TiKV nodes report in real time.
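Handing out transaction IDs from one central component only works if the IDs never repeat and always increase. The sketch below shows one common way to build such an allocator, combining a millisecond clock with a logical counter. The class name, the hybrid layout, and the 18-bit counter width are assumptions chosen for illustration, not details taken from this article.

```python
LOGICAL_BITS = 18  # assumed width of the logical counter (illustrative)


class TimestampAllocator:
    """Hands out strictly increasing integer timestamps."""

    def __init__(self) -> None:
        self._physical_ms = 0
        self._logical = 0

    def next_timestamp(self, now_ms: int) -> int:
        if now_ms > self._physical_ms:
            # Clock moved forward: adopt it and restart the logical counter.
            self._physical_ms = now_ms
            self._logical = 0
        else:
            # Same millisecond, or the clock went backwards: bump the counter.
            self._logical += 1
            if self._logical >= (1 << LOGICAL_BITS):
                # Counter exhausted: borrow the next millisecond.
                self._physical_ms += 1
                self._logical = 0
        return (self._physical_ms << LOGICAL_BITS) | self._logical


if __name__ == "__main__":
    allocator = TimestampAllocator()
    print([allocator.next_timestamp(ms) for ms in (1000, 1000, 999, 1001)])
```

Even when the clock input stalls or steps backwards (the third call above), every returned value is larger than the previous one, which is what lets a database order distributed transactions.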

● Storage server

◇ TiKV server

· TiKV is a distributed transactional key-value storage engine, and the TiKV server is responsible for storing data.

· The basic unit of data storage is the Region. Each Region stores the data of a specific key range, the left-closed, right-open interval [StartKey, EndKey), and each TiKV node holds multiple Regions. The TiKV API provides native support for distributed transactions at the key-value pair level and provides snapshot isolation by default.

· After processing an SQL statement, the TiDB server translates the SQL execution plan into actual calls to the TiKV API, so the data is stored in TiKV. All TiKV data is automatically maintained in multiple replicas (3 replicas by default), which gives TiKV native high availability and support for automatic failover.
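The left-closed, right-open key ranges can be made concrete with a small lookup sketch. The Region boundaries and IDs below are invented for this example, and a binary search over sorted start keys is just one straightforward way to find the owning Region, not necessarily how TiKV does it.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (start_key, end_key, region_id); an end_key of None means "no upper bound".
Region = Tuple[bytes, Optional[bytes], int]

# Hypothetical Regions, sorted by start key and covering the whole key space.
REGIONS: List[Region] = [
    (b"", b"g", 1),
    (b"g", b"p", 2),
    (b"p", None, 3),
]


def find_region(regions: List[Region], key: bytes) -> int:
    """Return the id of the Region whose [start_key, end_key) contains key."""
    start_keys = [start for start, _end, _region_id in regions]
    # Index of the last Region whose start key is <= key.
    index = bisect_right(start_keys, key) - 1
    _start, _end, region_id = regions[index]
    return region_id


if __name__ == "__main__":
    for key in (b"apple", b"g", b"orange", b"p", b"zebra"):
        print(key, find_region(REGIONS, key))
```

Note how the boundary key `b"g"` belongs to Region 2 and not Region 1: the start key is included in a Region, the end key is not.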

◇ TiFlash server

· The TiFlash server is a special type of storage server. Unlike ordinary TiKV nodes, TiFlash stores data column by column and is designed mainly to speed up analytical processing.
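Why a column layout helps analytics can be seen in a toy comparison. The sample orders and function names below are invented for this sketch; it only contrasts the two layouts in plain Python and says nothing about how TiKV or TiFlash store data internally.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, int]  # (order_id, user, amount)

# Hypothetical order records in row layout.
ROWS: List[Row] = [
    (1, "alice", 120),
    (2, "bob", 80),
    (3, "alice", 50),
]


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Pivot row tuples into one list per column."""
    order_ids, users, amounts = zip(*rows)
    return {"order_id": list(order_ids), "user": list(users),
            "amount": list(amounts)}


def get_row(rows: List[Row], order_id: int) -> Row:
    """Point lookup: the row layout returns the whole record at once."""
    for row in rows:
        if row[0] == order_id:
            return row
    raise KeyError(order_id)


def total_amount(columns: Dict[str, list]) -> int:
    """Aggregate: the column layout touches only the one column it needs."""
    return sum(columns["amount"])


if __name__ == "__main__":
    print(get_row(ROWS, 2))
    print(total_amount(to_columns(ROWS)))
```

Fetching one complete order is natural in the row layout, which matches OLTP-style access, while summing a single field over all orders only reads one list in the column layout, which matches OLAP-style access.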

Summary

TiDB is open source, horizontally scalable, highly available, and MySQL compatible, so it could be an interesting candidate when you select a database for web services such as social games and e-commerce sites.

This blog post was originally written by Ohara Yuya from our HQ company Beyond Co.
