
ClickHouse case: inconsistent query results

2020-12-07 19:20:56 Yannic

Problem background

A user reported that running the same query repeatedly against a ClickHouse cluster returned different results each time, and asked whether ClickHouse itself cannot guarantee data consistency or whether something was wrong with the cluster.

For a database, the correctness of query results is critical. If a user queries data they have already confirmed and gets back inconsistent results, those results are essentially unusable, so the importance of this issue to the user is self-evident.

After receiving the user's feedback, we immediately confirmed with the user how the cluster was being used:

  1. On each node in the cluster, create a local table with the Kafka table engine, together with a corresponding view that consumes the data from Kafka (a sketch of this setup follows the list);
  2. Create a distributed table with the Distributed table engine that aggregates those views;
  3. Executing the same query against the distributed table multiple times returned inconsistent results.
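
The user's original DDL was not provided; the following is a minimal sketch of such a setup, in which the column names, broker address, topic, consumer group, and table names are all illustrative:

CREATE TABLE test.kafka_queue ON CLUSTER default_cluster (
  `event_time` DateTime,
  `message` String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker1:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse_events',
         kafka_format = 'JSONEachRow';

-- The materialized view consumes rows from the Kafka table and stores them locally.
CREATE MATERIALIZED VIEW test.kafka_view ON CLUSTER default_cluster
ENGINE = MergeTree ORDER BY event_time
AS SELECT event_time, message FROM test.kafka_queue;

-- The distributed table fans queries out to test.kafka_view on every node.
CREATE TABLE test.kafka_all ON CLUSTER default_cluster (
  `event_time` DateTime,
  `message` String
) ENGINE = Distributed(default_cluster, test, kafka_view, rand());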

Queries are issued through the distributed table, so to understand why each query returned different data we first need to understand how distributed tables work.

Distributed table

A table with the Distributed engine does not store any data itself; instead, it allows queries to be processed in a distributed manner across multiple nodes. Reads are parallelized automatically, with no configuration or manual intervention required. For each shard touched by a query, one replica is selected at random to read from. If the underlying table has an index, the index is used preferentially.

Distributed engine parameters: the cluster name from the server configuration file, the remote database name, the remote table name, and an optional sharding key.

Distributed(logs, default, hits[, sharding_key])

Queries against this table read data from the default.hits table on every server in the logs cluster.
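
Mapped onto a CREATE TABLE statement, that example would look roughly as follows (a sketch only: it assumes a cluster named logs and a local table default.hits already exist on each node, and the distributed table name hits_all is illustrative):

CREATE TABLE default.hits_all AS default.hits
ENGINE = Distributed(logs, default, hits, rand());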

The cluster configuration used in this article is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<yandex>
    <clickhouse_remote_servers>
        <default_cluster>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>10.0.3.27</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>10.0.3.41</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>10.0.3.46</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>10.0.3.26</host>
                    <port>9000</port>
                </replica>
            </shard>
        </default_cluster>
    </clickhouse_remote_servers>
    <zookeeper-servers>
        <node>
            <host>10.0.3.12</host>
            <port>2181</port>
        </node>
        <node>
            <host>10.0.3.3</host>
            <port>2181</port>
        </node>
        <node>
            <host>10.0.3.23</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
</yandex>

Cluster name: default_cluster. It contains two shards, each with two replicas.

Shard: each shard (server) holds a different portion of the data (to read all of the data, every shard must be accessed).

Replica: redundant servers that hold the same data (a read can be served from any replica of a shard).

When replicas are configured, a read operation selects one available replica for each shard. The replica-selection algorithm for load balancing can also be configured (load_balancing = random / nearest_hostname / first_or_random / round_robin); see the load_balancing setting in the official documentation for details.
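
For example, the policy can be overridden per query from the client. A minimal sketch, using one of this article's nodes and the distributed table created later in the reproduction section:

# clickhouse-client -h 10.0.3.27 --load_balancing=first_or_random <<< "select count(1) from test.ddl_all"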

Reproducing the problem

The cluster:

┌─cluster─────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ default_cluster │         1 │            1 │           1 │ 10.0.3.27 │ 10.0.3.27    │ 9000 │        1 │ default │                  │            0 │                       0 │
│ default_cluster │         1 │            1 │           2 │ 10.0.3.41 │ 10.0.3.41    │ 9000 │        0 │ default │                  │            0 │                       0 │
│ default_cluster │         2 │            1 │           1 │ 10.0.3.46 │ 10.0.3.46    │ 9000 │        0 │ default │                  │            0 │                       0 │
│ default_cluster │         2 │            1 │           2 │ 10.0.3.26 │ 10.0.3.26    │ 9000 │        0 │ default │                  │            0 │                       0 │
└─────────────────┴───────────┴──────────────┴─────────────┴───────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

As shown above, the cluster has two shards, each with two replicas:

shard_num 1: 10.0.3.27 10.0.3.41

shard_num 2: 10.0.3.46 10.0.3.26
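
For reference, the topology listing above can be obtained from the built-in system.clusters table, for example:

# clickhouse-client -h 10.0.3.27 <<< "select * from system.clusters where cluster = 'default_cluster'"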

Next, create non-replicated local tables, create a distributed table, insert data through the distributed table, and query the distributed table:

CREATE TABLE test.ddl_test ON CLUSTER default_cluster (
  `Year` UInt16,
  `Quarter` UInt8,
  `Month` UInt8,
  `DayofMonth` UInt8,
  `DayOfWeek` UInt8,
  ...
) ENGINE = MergeTree
PARTITION BY Year
ORDER BY (Carrier, FlightDate)
SETTINGS index_granularity = 8192;

CREATE TABLE test.ddl_all ON CLUSTER default_cluster AS test.ddl_test
ENGINE = Distributed(default_cluster, test, ddl_test, rand());
10.0.3.27 :) INSERT INTO ddl_all SELECT * FROM ontime;

INSERT INTO ddl_all SELECT *
FROM ontime

 Progress: 185.13 million rows, 134.51 GB (413.80 thousand rows/s., 300.65 MB/s.) ██████████ 99%Ok.

0 rows in set. Elapsed: 447.398 sec. Processed 185.13 million rows, 134.51 GB (413.80 thousand rows/s., 300.65 MB/s.)
# clickhouse-client -h 10.0.3.27 <<< "select count(1) from test.ddl_all"
142975770
# clickhouse-client -h 10.0.3.27 <<< "select count(1) from test.ddl_all"
134720581

You can see that the same query returns different results .

## 10.0.3.41 and 10.0.3.27 belong to the same shard, which has internal_replication set to false
# clickhouse-client -h 10.0.3.41  <<< "select count(1) from test.ddl_test" 
92562599
# clickhouse-client -h 10.0.3.27  <<< "select count(1) from test.ddl_test" 
92562599
## 10.0.3.46 and 10.0.3.26 belong to the same shard, which has internal_replication set to true
# clickhouse-client -h 10.0.3.46  <<< "select count(1) from test.ddl_test" 
50413171
# clickhouse-client -h 10.0.3.26  <<< "select count(1) from test.ddl_test" 
42157982

# clickhouse-client -h 10.0.3.41 <<< "select count(1) from test.ddl_all"
142975770
# clickhouse-client -h 10.0.3.27 <<< "select count(1) from test.ddl_all"
142975770
# clickhouse-client -h 10.0.3.46 <<< "select count(1) from test.ddl_all"
142975770
# clickhouse-client -h 10.0.3.26 <<< "select count(1) from test.ddl_all"
134720581

# clickhouse-client -h 10.0.3.27 --max_parallel_replicas=2 <<< "select count(1) from test.ddl_all"
185133752
# clickhouse-client -h 10.0.3.26 --max_parallel_replicas=2 <<< "select count(1) from test.ddl_all"
227283180
# clickhouse-client -h 10.0.3.41 --max_parallel_replicas=2 <<< "select count(1) from test.ddl_all"
185133752
# clickhouse-client -h 10.0.3.46 --max_parallel_replicas=2 <<< "select count(1) from test.ddl_all"
235538369

Analysis

  1. When a Distributed table is queried, the query runs against the local table on the current node first. Replicas in the same shard as the current node do not receive the query; for every other shard, one replica is chosen at random to receive the sub-query. The data returned by each shard is then aggregated into the final result.
  2. If --max_parallel_replicas is set to a value greater than 1, the query is sent in parallel to that many replicas of each shard other than the current node's own shard (whose replicas still do not receive the request), and the results returned by all of these replicas are aggregated without deduplication. That is why, in the previous section, running with --max_parallel_replicas=2 on 10.0.3.46 and 10.0.3.26 returned counts larger than the correct 185133752 (see the breakdown after this list).
  3. A distributed table is tightly coupled to its local tables (conceptually similar to a view). If a shard is configured with <internal_replication>false</internal_replication>, an insert through the distributed table writes the data to every replica of that shard at the same time, so each replica ends up with the complete shard data and queries through the Distributed engine return correct results. However, this approach can leave the replicas in inconsistent states in the end (since ZooKeeper is not used to coordinate the writes, any interruption of a single node leads to divergent data).
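
The counts observed in the reproduction line up with this explanation. A quick breakdown using the per-replica numbers above:

## Shard 1 (internal_replication=false): the distributed insert wrote every block to both replicas,
##   so 10.0.3.27 and 10.0.3.41 each hold the full shard: 92562599 rows.
## Shard 2 (internal_replication=true, but the local table is not replicated): each block was written
##   to only one replica, so the data is split: 50413171 (10.0.3.46) + 42157982 (10.0.3.26) = 92571153.
## Correct total: 92562599 + 92571153 = 185133752.
## A query from 10.0.3.27 reads shard 1 locally and picks one shard-2 replica at random:
##   92562599 + 50413171 = 142975770   or   92562599 + 42157982 = 134720581
## With --max_parallel_replicas=2, shard 1 is read from both replicas (duplicated) by remote queries:
##   from 10.0.3.26: 42157982 + 2 * 92562599 = 227283180
##   from 10.0.3.46: 50413171 + 2 * 92562599 = 235538369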

Finally, we set <internal_replication>false</internal_replication> for every shard in the cluster.

Repeating the tests above (the local table was rebuilt as test.ddl_local and the data reloaded) gives the following results:

# clickhouse-client -h 10.0.3.41  <<< "select count(1) from test.ddl_local"
92567953
# clickhouse-client -h 10.0.3.27  <<< "select count(1) from test.ddl_local"
92567953
# clickhouse-client -h 10.0.3.46  <<< "select count(1) from test.ddl_local"
92565799
# clickhouse-client -h 10.0.3.26  <<< "select count(1) from test.ddl_local"
92565799
# clickhouse-client -h 10.0.3.41  <<< "select count(1) from test.ddl_all"
185133752
# clickhouse-client -h 10.0.3.27  <<< "select count(1) from test.ddl_all"
185133752
# clickhouse-client -h 10.0.3.46  <<< "select count(1) from test.ddl_all"
185133752
# clickhouse-client -h 10.0.3.26  <<< "select count(1) from test.ddl_all"
185133752
# clickhouse-client -h 10.0.3.26 --max_parallel_replicas=2  <<< "select count(1) from test.ddl_all"
277701705

This matches the analysis above: a normal query against the distributed table now returns the correct result, while --max_parallel_replicas=2, which queries two replicas of each remote shard in parallel, returns a result with duplicates, 277701705 (92565799 + 2 × 92567953), rather than the correct 185133752.

Solution

  1. Use a table engine from the Replicated MergeTree family for the local tables.
  2. Avoid inserting data through the Distributed table (it is best suited for queries); write to the local tables directly instead.
  3. When the number of replicas is two or more, the distributed table must be built on top of Replicated-engine local tables; this avoids many kinds of anomalies. A sketch of this setup follows the list.
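
As a minimal sketch of the recommended setup (the ZooKeeper path and the {shard} and {replica} macros are assumptions about the server configuration, not taken from the original article; the column list is elided as in the earlier DDL), the local tables would use ReplicatedMergeTree, and every shard would have internal_replication set to true so that the Distributed table writes each block to a single replica and replication is handled through ZooKeeper:

CREATE TABLE test.ddl_test ON CLUSTER default_cluster (
  `Year` UInt16,
  `Quarter` UInt8,
  ...
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test/ddl_test', '{replica}')
PARTITION BY Year
ORDER BY (Carrier, FlightDate)
SETTINGS index_granularity = 8192;

CREATE TABLE test.ddl_all ON CLUSTER default_cluster AS test.ddl_test
ENGINE = Distributed(default_cluster, test, ddl_test, rand());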

References

1 https://clickhouse.tech/docs/en/engines/table-engines/special/distributed/

2 https://github.com/ClickHouse/ClickHouse/issues/5835

3 https://github.com/ClickHouse/ClickHouse/issues/1443

4 https://clickhouse.tech/docs/zh/engines/table-engines/special/distributed/

5 https://clickhouse.tech/docs/en/operations/settings/settings/#settings-load_balancing

6 https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/replication/

7 https://github.com/ClickHouse/ClickHouse/issues/1854
