PXC Series Study Notes (6): Importing Ten Million Rows with Java and Optimizing Queries

Author: F嘉阳
Published: December 30, 2019
Category: Study

I. Introduction

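General import approach: for large data volumes, import with LOAD DATA rather than running a SQL file with source. LOAD DATA bulk-loads rows from a delimited file instead of parsing and executing the file's INSERT statements one by one.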

To evaluate the performance of the PXC cluster, we need to generate ten million (10,000,000) rows for SQL performance testing.
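The LOAD DATA route recommended above can be sketched in plain Java. First write the rows to a tab-separated file, which is LOAD DATA's default field format, and then load the file with one statement. This is a minimal sketch under several assumptions:

- The file name `tb_test.tsv`, the row contents and the helper names are illustrative only.
- `LOAD DATA LOCAL` only works if the server's `local_infile` variable is enabled and the client allows it. For Connector/J that means `allowLoadLocalInfile=true`.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: produce a file that LOAD DATA can ingest with its default
// terminators (fields separated by '\t', lines ended by '\n').
// The file name and row contents are illustrative assumptions.
final class LoadDataFileWriter {

    // One row of tb_test(id, name) in LOAD DATA's default text format
    static String toLine(int id, String name) {
        return id + "\t" + name;
    }

    // Writes rows with ids firstId .. firstId + count - 1; returns how many were written
    static long writeRows(Path file, int firstId, int count) throws IOException {
        long written = 0;
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (int id = firstId; id < firstId + count; id++) {
                out.write(toLine(id, id + ", test entity"));
                out.write('\n');
                written++;
            }
        }
        return written;
    }

    // Matching statement; LOCAL makes the client read the file and send it to the server.
    // Forward slashes keep Windows paths from being read as escape sequences.
    static String loadDataSql(Path file) {
        String path = file.toAbsolutePath().toString().replace('\\', '/');
        return "LOAD DATA LOCAL INFILE '" + path + "' INTO TABLE tb_test"
                + " CHARACTER SET utf8mb4 (id, name)";
    }

    public static void main(String[] args) throws IOException {
        // Small demo file by default; pass 10000000 for the full data set
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 1_000;
        Path file = Paths.get("tb_test.tsv");
        long rows = writeRows(file, 1, count);
        System.out.println("Wrote " + rows + " rows; load them with:");
        System.out.println(loadDataSql(file));
    }
}
```

The printed statement can then be run from a MySQL client started with `--local-infile=1`.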

II. Data Generation

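If the content of the data does not matter, a general-purpose tool can generate random data. This test targets a specific business scenario, so the data is generated with a Java application.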

1. Create a Spring Boot Project

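Use Spring Boot to quickly set up the data import project. Add the JPA starter, which also pulls in JDBC support including JdbcTemplate, plus the MySQL driver: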

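...
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.2.2.RELEASE</version>
    <relativePath/> <!-- lookup parent from repository -->
</parent>
...
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>

    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <scope>runtime</scope>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
        <exclusions>
            <exclusion>
                <groupId>org.junit.vintage</groupId>
                <artifactId>junit-vintage-engine</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>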

2. Write the Entity Class

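package top.fjy8018.jdbcbench;

import lombok.Data;

import javax.persistence.Entity;
import javax.persistence.Id;

/**
 * @author F嘉阳
 * @date 2019-12-28 13:08
 */
// With no @Table annotation, the entity name "tb_test" is also used as the table name
@Entity(name = "tb_test")
@Data
public class DBTest {

    @Id
    private Integer id;

    private String name;
}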

Define the JPA repository interface:

package top.fjy8018.jdbcbench;

import org.springframework.data.jpa.repository.JpaRepository;

/**
 * @author F嘉阳
 * @date 2019-12-28 13:09
 */
public interface DBTestRepository extends JpaRepository<DBTest, Integer> {}

3. Configure the JDBC Connection

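One important setting here is rewriteBatchedStatements=true. With it, MySQL Connector/J rewrites a batch of single-row INSERTs into multi-row INSERT statements instead of sending them one at a time. This can make batch writes hundreds of times faster.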

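Reference: "jdbcTemplate.batchUpdate performs poorly and has no effect during batch execution: here is how it was solved" (https://blog.csdn.net/shushugood/article/details/81005718)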


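spring:
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    username: admin
    password: password
    # rewriteBatchedStatements=true lets Connector/J send JDBC batches as multi-row INSERTs
    url: jdbc:mysql://192.168.1.9:13306/test?useSSL=false&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true&serverTimezone=Asia/Shanghai
  jpa:
    properties:
      hibernate:
        jdbc:
          # Up to 10000 JPA inserts per JDBC batch (applies to repository saves, not JdbcTemplate)
          batch_size: 10000
        # Order inserts by entity so more of them can share a batch
        order_inserts: true
        # Log Hibernate session statistics, such as time spent executing JDBC batches
        generate_statistics: true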
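The sketch below makes the effect of rewriteBatchedStatements concrete. It illustrates only the shape of the SQL and is not Connector/J's implementation.

- Without the flag, a batch of N rows is executed as N single-row INSERTs.
- With the flag, the rows are packed into a few multi-row INSERTs.

The real driver sizes each rewritten statement by `max_allowed_packet`, so `rowsPerStatement` is a hypothetical stand-in for that limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Illustration only (not Connector/J's code): how rewriteBatchedStatements=true
// changes the statements sent for a JDBC batch of single-row INSERTs into tb_test.
final class MultiRowInsertSketch {

    // One INSERT carrying rowCount value tuples
    static String multiRowInsert(int rowCount) {
        StringJoiner sql = new StringJoiner(",", "insert into tb_test(id,name) values", "");
        for (int i = 0; i < rowCount; i++) {
            sql.add("(?,?)");
        }
        return sql.toString();
    }

    // Without rewriting: one statement (and one round trip) per row
    static int statementsWithoutRewrite(int batchSize) {
        return batchSize;
    }

    // With rewriting: rows packed into multi-row INSERTs. The driver actually splits by
    // max_allowed_packet; rowsPerStatement is a hypothetical stand-in for that limit.
    static List<String> rewrite(int batchSize, int rowsPerStatement) {
        List<String> statements = new ArrayList<>();
        for (int done = 0; done < batchSize; done += rowsPerStatement) {
            statements.add(multiRowInsert(Math.min(rowsPerStatement, batchSize - done)));
        }
        return statements;
    }

    public static void main(String[] args) {
        int batch = 50_000; // batch size used by insertData in the unit test
        System.out.println("without rewrite: " + statementsWithoutRewrite(batch) + " statements");
        System.out.println("with rewrite:    " + rewrite(batch, 10_000).size() + " statements");
        System.out.println(multiRowInsert(3));
    }
}
```

The fewer, larger statements are where the speedup comes from: far fewer network round trips and statement executions for the same rows.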

4. Prepare the Database

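Start only one node in each PXC shard and import into that node alone. With no other nodes to replicate to, Galera flow control will not throttle the writes. After the import, copy the data files to the other nodes and start them.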

Edit the PXC node's configuration file as follows (temporary settings for the import), then restart the PXC service:

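## Do not flush the redo log at every commit: InnoDB writes and flushes it about once per second
## (faster, but roughly the last second of transactions can be lost on a crash)
innodb_flush_log_at_trx_commit = 0
## Open data files with O_DIRECT so they bypass the OS page cache (avoids double buffering with the buffer pool)
innodb_flush_method = O_DIRECT
## A larger buffer pool generally helps; size it to the memory available on the host
innodb_buffer_pool_size = 200M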

Create the tb_test table:

CREATE TABLE tb_test(
    id INT UNSIGNED PRIMARY KEY,
    name VARCHAR(200) NOT NULL
);

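Configure MyCat. Register tb_test in schema.xml as a logical table sharded across dn1 and dn2 with the mod-long rule:

<table name="tb_test" dataNode="dn1,dn2" rule="mod-long" />

The complete schema.xml used here:

<?xml version="1.0"?>
<!DOCTYPE mycat:schema SYSTEM "schema.dtd">
<mycat:schema xmlns:mycat="http://io.mycat/">
        <!-- Logical (virtual) tables -->
        <schema name="test" checkSQLschema="false" sqlMaxLimit="100">
                <table name="t_user" dataNode="dn1,dn2" rule="mod-long" />
                <table name="tb_test" dataNode="dn1,dn2" rule="mod-long" />
                <table name="t_customer" dataNode="dn1,dn2" rule="sharding-customer">
                     <childTable name="t_orders" primaryKey="ID" joinKey="customer_id" parentKey="id"/>
                </table>
        </schema>
        <!-- Shard mapping: each data node points to a PXC cluster and database -->
        <dataNode name="dn1" dataHost="cluster1" database="test" />
        <dataNode name="dn2" dataHost="cluster2" database="test" />
        <!-- Connection settings -->
        <dataHost name="cluster1" maxCon="1000" minCon="10" balance="0"
                writeType="0" dbType="mysql" dbDriver="native" switchType="1"
                slaveThreshold="100">
                <heartbeat>select user()</heartbeat>
                <!-- In most cases reads outnumber writes -->
                <writeHost host="W1" url="pxc4:3306" user="admin"
                     password="admin">
                </writeHost>
        </dataHost>
        <dataHost name="cluster2" maxCon="1000" minCon="10" balance="0"
                writeType="0" dbType="mysql" dbDriver="native" switchType="1"
                slaveThreshold="100">
                <heartbeat>select user()</heartbeat>
                <!-- In most cases reads outnumber writes -->
                <writeHost host="W1" url="pxc7:3306" user="admin"
                     password="admin">
                </writeHost>
        </dataHost>
</mycat:schema>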

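5. Write the Unit Tests

package top.fjy8018.jdbcbench;

import lombok.extern.slf4j.Slf4j;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.util.Assert;

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Assumes JdbcBenchApplicationTests is the project's @SpringBootTest class
@Slf4j
class JdbcBenchTest extends JdbcBenchApplicationTests {

    private static final int TOTAL_ROWS = 10_000_000;
    private static final int BATCH_SIZE = 50_000;

    @Autowired
    private DBTestRepository repository;

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Test
    public void insertTest() {
        DBTest t = new DBTest();
        t.setId(1);
        t.setName(1 + ", test entity");
        DBTest save = repository.save(t);
        Assert.notNull(save, "Save test failed");
    }

    @Test
    public void insertData() {
        // Ids run from 2 to TOTAL_ROWS + 1, so they do not collide with id 1 from insertTest
        for (int i = 1; i <= TOTAL_ROWS; i += BATCH_SIZE) {
            // ArrayList, not LinkedList: batchInsert calls get(index), which is O(n) on a LinkedList
            List<DBTest> tests = new ArrayList<>(BATCH_SIZE);
            for (int j = 1; j <= BATCH_SIZE; j++) {
                DBTest t = new DBTest();
                t.setId(j + i);
                t.setName((j + i) + ", test entity");
                tests.add(t);
            }
            batchInsert(tests);
            log.info("Inserted {} rows", i - 1 + BATCH_SIZE);
        }
    }

    // Sends the list as one JDBC batch; with rewriteBatchedStatements=true the driver
    // turns it into multi-row INSERTs. No @Transactional: a method called on "this"
    // is not proxied by Spring, so the annotation would have no effect here.
    private void batchInsert(List<DBTest> dbTestList) {
        String sql = "insert into tb_test(id,name) values(?,?)";
        log.info("Batch size: {}", dbTestList.size());
        jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                DBTest row = dbTestList.get(i);
                ps.setInt(1, row.getId());
                ps.setString(2, row.getName());
            }

            @Override
            public int getBatchSize() {
                return dbTestList.size();
            }
        });
    }
}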

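Running the test wrote all the rows in about an hour. Write performance differs noticeably between server configurations, so treat this figure as indicative and measure on your own hardware.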
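For reference, about 10,000,000 rows in one hour works out to roughly 2,800 rows per second. The rate depends on the hardware, so it can help to measure it during the import. The following hypothetical helper is not part of the original project. It computes the rate and a linear projection of the total time from the batches finished so far.

```java
import java.time.Duration;

// Hypothetical helper for reporting import throughput (not part of the original project)
final class InsertThroughput {

    // Rows per second achieved over an elapsed wall-clock time
    static double rowsPerSecond(long rows, Duration elapsed) {
        if (elapsed.isZero() || elapsed.isNegative()) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return rows / (elapsed.toNanos() / 1_000_000_000.0);
    }

    // Linear extrapolation of the total import time from the rows inserted so far
    static Duration estimateTotal(long totalRows, long rowsDone, Duration elapsedSoFar) {
        if (rowsDone <= 0) {
            throw new IllegalArgumentException("no rows inserted yet");
        }
        double scale = (double) totalRows / rowsDone;
        return Duration.ofNanos(Math.round(elapsedSoFar.toNanos() * scale));
    }

    public static void main(String[] args) {
        // The figure reported above: about 10,000,000 rows in one hour
        System.out.printf("%.0f rows/s%n", rowsPerSecond(10_000_000L, Duration.ofHours(1)));
        // Hypothetical timing: suppose the first 50,000-row batch took 18 s
        Duration total = estimateTotal(10_000_000L, 50_000L, Duration.ofSeconds(18));
        System.out.println("projected total: " + total.toMinutes() + " min");
    }
}
```

In insertData(), one could record System.nanoTime() before the loop and call these helpers after each batch with Duration.ofNanos(System.nanoTime() - start).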
III. Steps After the Import

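1. Shut down the PXC node (and restore its configuration file)
2. Copy the data files to the other PXC nodes
3. Shut down MyCat (and restore its configuration file)
4. Start PXC and MyCat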

IV. Performance Testing on Large Data: the LIMIT Keyword
SELECT id, name FROM tb_test LIMIT 100, 100;
SELECT id, name FROM tb_test LIMIT 10000, 100;
SELECT id, name FROM tb_test LIMIT 1000000, 100;
SELECT id, name FROM tb_test LIMIT 5000000, 100;

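Observation: the last query runs slowly and drives CPU and memory usage very high. In particular, memory usage on the MyCat node stays high.

Checking index usage with EXPLAIN shows type ALL, i.e. a full table scan. The rows column estimates that 4746302 rows are examined:

id  select_type  table    partitions  type  possible_keys  key   key_len  ref   rows     filtered  Extra
1   SIMPLE       tb_test  NULL        ALL   NULL           NULL  NULL     NULL  4746302  100       NULL
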
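Causes:

- The query does a full table scan, which is very slow.
- The cost of LIMIT offset, count grows roughly in proportion to the offset: the server reads offset + count rows and discards the first offset rows.
- Through MyCat, a paged query on a sharded table likely has each shard return offset + count rows, which MyCat then merges in its own memory. That would explain the high memory usage on the MyCat node.
- MySQL's LIMIT is convenient, but it is not suitable to use directly with large offsets on tables with many rows.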
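A common way around large offsets is keyset (seek) pagination. Remember the last id of the previous page and query with WHERE id > ? ORDER BY id LIMIT ?. The primary-key index then jumps straight to the start of the page instead of reading and discarding offset rows.

The queries above have no ORDER BY, so their row order is not guaranteed; keyset pagination needs one. The sketch below simulates the idea in memory, with a TreeMap standing in for the primary-key index. It is an illustration under that assumption, not a measurement on this cluster, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of keyset ("seek") pagination versus LIMIT offset, count. A TreeMap stands in
// for the primary-key index on tb_test.id; names are illustrative assumptions.
final class KeysetPaginationSketch {

    // Statement an application would run; the last seen id and page size are bound to "?"
    static final String KEYSET_SQL = "SELECT id, name FROM tb_test WHERE id > ? ORDER BY id LIMIT ?";

    // LIMIT offset, count reads every row before the offset and throws it away
    static long rowsReadWithOffset(long offset, long count) {
        return offset + count;
    }

    // Keyset page: seek to the first id greater than lastSeenId, then read up to count ids
    static List<Integer> nextPage(NavigableMap<Integer, String> index, int lastSeenId, int count) {
        List<Integer> ids = new ArrayList<>(count);
        for (Integer id : index.tailMap(lastSeenId, false).keySet()) {
            if (ids.size() == count) {
                break;
            }
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> index = new TreeMap<>();
        for (int id = 1; id <= 1_000; id++) {
            index.put(id, id + ", test entity");
        }
        List<Integer> page = nextPage(index, 500, 100);
        System.out.println("page starts at id " + page.get(0) + ", size " + page.size());
        System.out.println("LIMIT 5000000, 100 reads " + rowsReadWithOffset(5_000_000, 100) + " rows");
        System.out.println(KEYSET_SQL);
    }
}
```

Through MyCat, such a query should also keep the merge small, since each shard only needs to return up to one page of rows.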

Last modified: December 30, 2019
© Reprinting permitted with proper attribution
Tags: Java SpringBoot JPA MySQL Percona PXC