SpringBoot 2 種方式快速實現分庫分表，輕鬆拿捏！

大家好，我是小富～

本文是《分庫分表ShardingSphere5.x原理與實戰》系列的第三篇文章，本文將為您介紹 ShardingSphere 的一些基礎特性和架構組成，以及在 Springboot 環境下通過 JAVA編碼 和 Yml設定 兩種方式快速實現分庫分表。

本文案例demo地址

一、什麼是 ShardingSphere？

shardingsphere 是一款開源的分散式關係型資料庫中介軟體，為 Apache 的頂級專案。其前身是 sharding-jdbc 和 sharding-proxy 的兩個獨立專案，後來在 2018 年合併成了一個專案，並正式更名為 ShardingSphere。

其中 sharding-jdbc 為整個生態中最為經典和成熟的框架，最早接觸分庫分表的人應該都知道它，是學習分庫分表的最佳入門工具。

如今的 ShardingSphere 已經不再是單純代指某個框架，而是一個完整的技術生態圈，由三款開源的分散式資料庫中介軟體 sharding-jdbc、sharding-proxy 和 sharding-sidecar 所構成。前兩者問世較早，功能較為成熟，是目前廣泛應用的兩個分散式資料庫中介軟體，因此在後續的文章中，我們將重點介紹它們的特點和使用方法。

二、為什麼選 ShardingSphere？

為了回答這個問題，我整理了市面上常見的分庫分表工具，包括 ShardingSphere、Cobar、Mycat、TDDL、MySQL Fabric 等，並從多個角度對它們進行了簡單的比較。

Cobar

Cobar 是阿里巴巴開源的一款基於MySQL的分散式資料庫中介軟體，提供了分庫分表、讀寫分離和事務管理等功能。它採用輪詢演演算法和雜湊演演算法來進行資料分片，支援分散式分表，但是不支援單庫分多表。

它以 Proxy 方式提供服務，在阿里內部被廣泛使用已開源，設定比較容易，無需依賴其他東西，只需要有Java環境即可。相容市面上幾乎所有的 ORM 框架，僅支援 MySQL 資料庫，且事務支援方面比較麻煩。

MyCAT

Mycat 是社群愛好者在阿里 Cobar 基礎上進行二次開發的，也是一款比較經典的分庫分表工具。它以 Proxy 方式提供服務，支援分庫分表、讀寫分離、SQL路由、資料分片等功能。

相容市面上幾乎所有的 ORM 框架，包括 Hibernate、MyBatis和 JPA等都相容，不過，美中不足的是它僅支援 MySQL資料庫，目前社群的活躍度相對較低。

TDDL

TDDL 是阿里巴巴集團開源的一款分庫分表解決方案，可以自動將SQL路由到相應的庫表上。它採用了垂直切分和水平切分兩種方式來進行分表分庫，並且支援多資料來源和讀寫分離功能。

TDDL 是基於 Java 開發的，支援 MySQL、Oracle 和 SQL Server 資料庫，並且可以與市面上 Hibernate、MyBatis等 ORM 框架整合。

不過，TDDL僅支援一些阿里巴巴內部的工具和框架的整合，對於外部公司來說可能相對有些侷限性。同時，其檔案和社群活躍度相比 ShardingSphere 來說稍顯不足。

Mysql Fabric

MySQL Fabric是 MySQL 官方提供的一款分庫分表解決方案，同時也支援 MySQL其他功能，如高可用、負載均衡等。它採用了管理節點和代理節點的架構，其中管理節點負責實時管理分片資訊，代理節點則負責接收並處理使用者端的讀寫請求。

它僅支援 MySQL 資料庫，並且可以與市面上 Hibernate、MyBatis 等 ORM 框架整合。MySQL Fabric 的檔案相對來說比較簡略，而且由於是官方提供的解決方案，其社群活躍度也相對較低。

ShardingSphere

ShardingSphere 成員中的 sharding-jdbc 以 JAR 包的形式下提供分庫分表、讀寫分離、分散式事務等功能，但僅支援 Java 應用，在應用擴充套件上存在侷限性。

因此，ShardingSphere 推出了獨立的中介軟體 sharding-proxy，它基於 MySQL協定實現了透明的分片和多資料來源功能，支援各種語言和框架的應用程式使用，對接的應用程式幾乎無需更改程式碼，分庫分表設定可在代理服務中進行管理。

除了支援 MySQL，ShardingSphere還可以支援 PostgreSQL、SQLServer、Oracle等多種主流資料庫，並且可以很好地與 Hibernate、MyBatis、JPA等 ORM 框架整合。重要的是，ShardingSphere的開源社群非常活躍。

如果在使用中出現問題，使用者可以在 GitHub 上提交PR並得到快速響應和解決，這為使用者提供了足夠的安全感。

產品比較

通過對上述的 5 個分庫分表工具進行比較，我們不難發現，就整體效能、功能豐富度以及社群支援等方面來看，ShardingSphere 在眾多產品中優勢還是比較突出的。下邊用各個產品的主要指標整理了一個表格，看著更加直觀一點。

三、ShardingSphere 成員

ShardingSphere 的主要組成成員為sharding-jdbc、sharding-proxy，它們是實現分庫分表的兩種不同模式：

sharding-jdbc

它是一款輕量級Java框架，提供了基於 JDBC 的分庫分表功能，為使用者端直連模式。使用sharding-jdbc，開發者可以通過簡單的設定實現資料的分片，同時無需修改原有的SQL語句。支援多種分片策略和演演算法，並且可以與各種主流的ORM框架無縫整合。

sharding-proxy

它是基於 MySQL 協定的代理服務，提供了透明的分庫分表功能。使用 sharding-proxy 開發者可以將分片邏輯從應用程式中解耦出來，無需修改應用程式碼就能實現分片功能，還支援多資料來源和讀寫分離等高階特性，並且可以作為獨立的服務執行。

四、快速實現

我們先使用sharding-jdbc來快速實現分庫分表。相比於 sharding-proxy，sharding-jdbc 適用於簡單的應用場景，不需要額外的環境搭建等。下邊主要基於 SpringBoot 的兩種方式來實現分庫分表，一種是通過YML設定方式，另一種則是通過純Java編碼方式（不可並存）。在後續章節中，我們會單獨詳細介紹如何使用sharding-proxy以及其它高階特性。

ShardingSphere 官網地址：https://shardingsphere.apache.org/

準備工作

在開始實現之前，需要對資料庫和表的拆分規則進行明確。以對t_order表進行分庫分表拆分為例，具體地，我們將 t_order 表拆分到兩個資料庫中，分別為db1和db2，每個資料庫又將該表拆分為三張表，分別為t_order_1、t_order_2和t_order_3。

db0
├── t_order_0
├── t_order_1
└── t_order_2
db1
├── t_order_0
├── t_order_1
└── t_order_2

JAR包引入

引入必要的 JAR 包，其中最重要的是shardingsphere-jdbc-core-spring-boot-starter和mysql-connector-java這兩個。為了保證功能的全面性和相容性，以及避免因低版本包導致的不必要錯誤和偵錯工作，我選擇的包版本都較高。

shardingsphere-jdbc-core-spring-boot-starter 是 ShardingSphere 框架的核心元件，提供了對 JDBC 的分庫分表支援；而 mysql-connector-java 則是 MySQL JDBC 驅動程式的實現，用於連線MySQL資料庫。除此之外，我使用了JPA作為持久化工具還引入了相應的依賴包。

<!-- jpa持久化工具 -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-jpa</artifactId>
    <version>2.7.6</version>
</dependency>
<!-- 必須引入的包 mysql -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.31</version>
</dependency>
<!-- 必須引入的包 ShardingSphere -->
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>shardingsphere-jdbc-core-spring-boot-starter</artifactId>
    <version>5.2.0</version>
</dependency>

YML設定

我個人是比較推薦使用YML設定方式來實現 sharding-jdbc 分庫分表的，使用YML設定方式不僅可以讓分庫分表的實現更加簡單、高效、可維護，也更符合 SpringBoot的開發規範。

在 src/main/resources/application.yml 路徑檔案下新增以下完整的設定，即可實現對t_order表的分庫分表，接下來拆解看看每個設定模組都做了些什麼。


spring:
  shardingsphere:
    # 資料來源設定
    datasource:
      # 資料來源名稱，多資料來源以逗號分隔
      names: db0,db1
      db0:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://127.0.0.1:3306/shardingsphere-db1?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
        username: root
        password: 123456
      db1:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://127.0.0.1:3306/shardingsphere-db0?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
        username: root
        password: 123456
    # 分片規則設定
    rules:
      sharding:
        # 分片演演算法設定
        sharding-algorithms:
          database-inline:
            # 分片演演算法型別
            type: INLINE
            props:
              # 分片演演算法的行表示式（演演算法自行定義，此處為方便演示效果）
              algorithm-expression: db$->{order_id > 4?1:0}
          table-inline:
            # 分片演演算法型別
            type: INLINE
            props:
              # 分片演演算法的行表示式
              algorithm-expression: t_order_$->{order_id % 4}
        tables:
          # 邏輯表名稱
          t_order:
            # 行表示式識別符號可以使用 ${...} 或 $->{...}，但前者與 Spring 本身的屬性檔案預留位置衝突，因此在 Spring 環境中使用行表示式識別符號建議使用 $->{...}
            actual-data-nodes: db${0..1}.t_order_${0..3}
            # 分庫策略
            database-strategy:
              standard:
                # 分片列名稱
                sharding-column: order_id
                # 分片演演算法名稱
                sharding-algorithm-name: database-inline
            # 分表策略
            table-strategy:
              standard:
                # 分片列名稱
                sharding-column: order_id
                # 分片演演算法名稱
                sharding-algorithm-name: table-inline
    # 屬性設定
    props:
      # 展示修改以後的sql語句
      sql-show: true

以下是 shardingsphere 多資料來源資訊的設定，其中的 names 表示需要連線的資料庫別名列表，每新增一個資料庫名就需要新增一份對應的資料庫連線設定。

spring:
  shardingsphere:
    # 資料來源設定
    datasource:
      # 資料來源名稱，多資料來源以逗號分隔
      names: db0,db1
      db0:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://127.0.0.1:3306/shardingsphere-db1?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
        username: root
        password: 123456
      db1:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        jdbc-url: jdbc:mysql://127.0.0.1:3306/shardingsphere-db0?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true
        username: root
        password: 123456

rules節點下為分片規則的設定，sharding-algorithms 節點為自定義的分片演演算法模組，分片演演算法可以在後邊設定表的分片規則時被參照，其中：

database-inline：自定義的分片演演算法名稱；
type：該分片演演算法的型別，這裡先以 inline 為例，後續會有詳細章節介紹；
props：指定該分片演演算法的具體內容，其中 algorithm-expression 是該分片演演算法的表示式，即根據分片鍵值計算出要存取的真實資料庫名或表名，。

db$->{order_id % 2} 這種為 Groovy 語言表示式，表示對分片鍵 order_id 進行取模，根據取模結果計算出db0、db1，分表的表示式同理。

spring:
  shardingsphere:
    # 規則設定
    rules:
      sharding:
        # 分片演演算法設定
        sharding-algorithms:
          database-inline:
            # 分片演演算法型別
            type: INLINE
            props:
              # 分片演演算法的行表示式（演演算法自行定義，此處為方便演示效果）
              algorithm-expression: db$->{order_id % 2}
          table-inline:
            # 分片演演算法型別
            type: INLINE
            props:
              # 分片演演算法的行表示式
              algorithm-expression: t_order_$->{order_id % 3}

tables節點定義了邏輯表名t_order的分庫分表規則。actual-data-nodes 用於設定物理資料節點的數量。

db${0..1}.t_order_${0..3} 表示式意思此邏輯表在不同資料庫範例中的分佈情況，如果只想單純的分庫或者分表，可以調整表示式，分庫db${0..1}、分表t_order_${0..3}。

db0
├── t_order_0
├── t_order_1
└── t_order_2
db1
├── t_order_0
├── t_order_1
└── t_order_2

spring:
  shardingsphere:
    # 規則設定
    rules:
      sharding:
        tables:
          # 邏輯表名稱
          t_order:
            # 行表示式識別符號可以使用 ${...} 或 $->{...}，但前者與 Spring 本身的屬性檔案預留位置衝突，因此在 Spring 環境中使用行表示式識別符號建議使用 $->{...}
            actual-data-nodes: db${0..1}.t_order_${0..3}
            # 分庫策略
            database-strategy:
              standard:
                # 分片列名稱
                sharding-column: order_id
                # 分片演演算法名稱
                sharding-algorithm-name: database-inline
            # 分表策略
            table-strategy:
              standard:
                # 分片列名稱
                sharding-column: order_id
                # 分片演演算法名稱
                sharding-algorithm-name: table-inline

database-strategy 和 table-strategy分別設定了分庫和分表策略；

sharding-column表示根據表的哪個列（分片鍵）進行計算分片路由到哪個庫、表中；

sharding-algorithm-name 表示使用哪種分片演演算法對分片鍵進行運算處理，這裡可以參照剛才自定義的分片演演算法名稱使用。

props節點用於設定其他的屬性設定，比如：sql-show表示是否在控制檯輸出解析改造後真實執行的 SQL語句以便進行偵錯。

spring:
  shardingsphere:
    # 屬性設定
    props:
      # 展示修改以後的sql語句
      sql-show: true

跑個單測在向資料庫中插入 10 條資料時，發現資料已經相對均勻地插入到了各個分片中。

JAVA 編碼

如果您不想通過 yml 組態檔實現自動裝配，也可以使用 ShardingSphere 的 API 實現相同的功能。使用 API 完成分片規則和資料來源的設定，優勢在於更加靈活、可客製化性強的特點，方便進行二次開發和擴充套件。

下邊是純JAVA編碼方式實現分庫分表的完整程式碼。

@Configuration
public class ShardingConfiguration {

    /**
     * 設定分片資料來源
     * 公眾號：程式設計師小富
     */
    @Bean
    public DataSource getShardingDataSource() throws SQLException {
        Map<String, DataSource> dataSourceMap = new HashMap<>();
        dataSourceMap.put("db0", dataSource1());
        dataSourceMap.put("db1", dataSource2());

        // 分片rules規則設定
        ShardingRuleConfiguration shardingRuleConfig = new ShardingRuleConfiguration();
        shardingRuleConfig.setShardingAlgorithms(getShardingAlgorithms());

        // 設定 t_order 表分片規則
        ShardingTableRuleConfiguration orderTableRuleConfig = new ShardingTableRuleConfiguration("t_order", "db${0..1}.t_order_${0..2}");
        orderTableRuleConfig.setTableShardingStrategy(new StandardShardingStrategyConfiguration("order_id", "table-inline"));
        orderTableRuleConfig.setDatabaseShardingStrategy(new StandardShardingStrategyConfiguration("order_id", "database-inline"));
        shardingRuleConfig.getTables().add(orderTableRuleConfig);

        // 是否在控制檯輸出解析改造後真實執行的 SQL
        Properties properties = new Properties();
        properties.setProperty("sql-show", "true");
        // 建立 ShardingSphere 資料來源
        return ShardingSphereDataSourceFactory.createDataSource(dataSourceMap, Collections.singleton(shardingRuleConfig), properties);
    }

    /**
     * 設定資料來源1
     * 公眾號：程式設計師小富
     */
    public DataSource dataSource1() {
        HikariDataSource dataSource = new HikariDataSource();
        dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver");
        dataSource.setJdbcUrl("jdbc:mysql://127.0.0.1:3306/shardingsphere-db1?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true");
        dataSource.setUsername("root");
        dataSource.setPassword("123456");
        return dataSource;
    }

    /**
     * 設定資料來源2
     * 公眾號：程式設計師小富
     */
    public DataSource dataSource2() {
        HikariDataSource dataSource = new HikariDataSource();
        dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver");
        dataSource.setJdbcUrl("jdbc:mysql://127.0.0.1:3306/shardingsphere-db0?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true");
        dataSource.setUsername("root");
        dataSource.setPassword("123456");
        return dataSource;
    }

    /**
     * 設定分片演演算法
     * 公眾號：程式設計師小富
     */
    private Map<String, AlgorithmConfiguration> getShardingAlgorithms() {
        Map<String, AlgorithmConfiguration> shardingAlgorithms = new LinkedHashMap<>();

        // 自定義分庫演演算法
        Properties databaseAlgorithms = new Properties();
        databaseAlgorithms.setProperty("algorithm-expression", "db$->{order_id % 2}");
        shardingAlgorithms.put("database-inline", new AlgorithmConfiguration("INLINE", databaseAlgorithms));

        // 自定義分表演演算法
        Properties tableAlgorithms = new Properties();
        tableAlgorithms.setProperty("algorithm-expression", "t_order_$->{order_id % 3}");
        shardingAlgorithms.put("table-inline", new AlgorithmConfiguration("INLINE", tableAlgorithms));

        return shardingAlgorithms;
    }
}

ShardingSphere 的分片核心設定類 ShardingRuleConfiguration，它主要用來載入分片規則、分片演演算法、主鍵生成規則、繫結表、廣播表等核心設定。我們將相關的設定資訊 set到設定類，並通過createDataSource建立並覆蓋 DataSource，最後注入Bean。

使用Java編碼方式只是將 ShardingSphere 預知的載入設定邏輯自己手動實現了一遍，兩種實現方式比較下來，還是推薦使用YML設定方式來實現 ShardingSphere的分庫分表功能，相比於Java編碼，YML設定更加直觀和易於理解，開發者可以更加專注於業務邏輯的實現，而不需要過多關注底層技術細節。

@Getter
@Setter
public final class ShardingRuleConfiguration implements DatabaseRuleConfiguration, DistributedRuleConfiguration {
    
    // 分表設定設定
    private Collection<ShardingTableRuleConfiguration> tables = new LinkedList<>();
    // 自動分片規則設定
    private Collection<ShardingAutoTableRuleConfiguration> autoTables = new LinkedList<>();
    // 繫結表設定
    private Collection<String> bindingTableGroups = new LinkedList<>();
    // 廣播表設定
    private Collection<String> broadcastTables = new LinkedList<>();
    // 預設的分庫策略設定
    private ShardingStrategyConfiguration defaultDatabaseShardingStrategy;
    // 預設的分表策略設定
    private ShardingStrategyConfiguration defaultTableShardingStrategy;
    // 主鍵生成策略設定
    private KeyGenerateStrategyConfiguration defaultKeyGenerateStrategy;
    
    private ShardingAuditStrategyConfiguration defaultAuditStrategy;
    // 預設的分片鍵
    private String defaultShardingColumn;
    // 自定義的分片演演算法
    private Map<String, AlgorithmConfiguration> shardingAlgorithms = new LinkedHashMap<>();
    // 主鍵生成演演算法
    private Map<String, AlgorithmConfiguration> keyGenerators = new LinkedHashMap<>();
    
    private Map<String, AlgorithmConfiguration> auditors = new LinkedHashMap<>();
}

經過檢視控制檯列印的真實 SQL紀錄檔，發現在使用 ShardingSphere 進行資料插入時，其內部實現會先根據分片鍵 order_id 查詢記錄是否存在。如果記錄不存在，則執行插入操作；如果記錄已存在，則進行更新操作。看似只會執行10條插入SQL，但實際上需要執行20條SQL語句，多少會對資料庫的效能產生一定的影響。

功能挺簡單的，但由於不同版本的 ShardingSphere 的 API 變化較大，網上類似的資料太不靠譜，本來想著藉助 GPT 快點實現這段程式碼，結果差點和它幹起來，最後還是扒了扒看了原始碼完成的。

預設資料來源

可能有些小夥伴會有疑問，對於已經設定了分片規則的t_order表可以正常運算元據，如果我們的t_user表沒有設定分庫分表規則，那麼在執行插入操作時會發生什麼呢？

仔細看了下官方的技術檔案，其實已經回答了小夥伴這個問題，如果只有部分資料庫分庫分表，是否需要將不分庫分表的表也設定在分片規則中？官方回答：不需要。

我們建立一張t_user表，並且不對其進行任何分片規則的設定。在我的印象中沒有通過設定 default-data-source-name 預設的資料來源，操作未分片的表應該會報錯的！

我們向t_user嘗試插入一條資料，結果居然成功了？翻了翻庫表發現資料只被插在了 db1 庫裡，說明沒有走廣播路由。

shardingsphere-jdbc 5.x版本移除了原本的預設資料來源設定，自動使用了預設資料來源的規則，為驗證我多增加了資料來源，嘗試性的調整了db2、db0、db1的順序，再次插入資料，這回記錄被插在了 db2 庫，反覆試驗初步得出結論。

未分片的表預設會使用第一個資料來源作為預設資料來源，也就是 datasource.names 第一個。

spring:
  shardingsphere:
    # 資料來源設定
    datasource:
      # 資料來源名稱，多資料來源以逗號分隔
      names: db2 , db1 , db0

總結

本期我們對 shardingsphere 做了簡單的介紹，並使用 yml 和 Java編碼的方式快速實現了分庫分表功能，接下來會按照文首的思維導圖的功能逐一實現。

下期文章將是《分庫分表ShardingSphere5.x原理與實戰》系列的第四篇，《分庫分表預設分片策略、廣播表、繫結表一網打盡》。

本文案例demo地址

我是小富，下期見～