
Comparison Test Report of DolphinDB and InfluxDB

2020-12-08 09:53:41 DolphinDB

InfluxDB is an open-source time-series database written in Go. Its core is a custom storage engine, the TSM Tree, which is optimized for time-series data and prioritizes the performance of inserting and querying data. InfluxDB uses an SQL-like query language, InfluxQL, and provides out-of-the-box time-series math and statistical functions. It targets monitoring, real-time analytics, IoT, sensor data, and similar scenarios, and is currently the most popular time-series database.

DolphinDB is an analytical, distributed time-series database with a built-in stream-processing engine, built-in parallel and distributed computing capabilities, and a distributed file system, and it supports cluster scaling. DolphinDB is written in C++ and responds very quickly. It offers a Python-like scripting language for data manipulation and supports standard SQL-like syntax. It also provides APIs for other common programming languages, making it easy to integrate with existing applications. It is well suited to historical data analysis and modeling and real-time stream processing in finance, as well as the processing and real-time analysis of massive sensor data in IoT.

This article compares the performance of DolphinDB and InfluxDB.

The hardware configuration used in this test is as follows:

Device: Dell XPS 8920 (07DC)

CPU: Intel Core i7-7700 CPU @ 3.60GHz, 4 cores, 8 threads

Memory: 16GB

Hard disk: 512GB SSD

Operating system: Ubuntu 16.04 x64

Because the InfluxDB cluster version is closed source, this test uses the standalone versions of both DolphinDB and InfluxDB, with all configuration items left at their default values.

1. Data generation

In this test, we used the October 24, 2016 stock trading data of the New York Stock Exchange, downloaded from the NYSE website, to generate the Quotes_Big table, and extracted a subset of it to form the Quotes_Small table. The table structure is shown in Table 1. The Quotes_Big table contains 78,721,394 rows and the Quotes_Small table contains 18,314,172 rows. The data download link and preprocessing script are given in Appendix 1.

Because DolphinDB and InfluxDB store data differently, we used the following design: the Time column is specified as the timestamp in InfluxDB; the Exchange and Symbol columns are specified as tag columns in InfluxDB (similar to indexed columns); and the Bid_Price, Bid_Size, Offer_Price and Offer_Size columns are specified as field columns in InfluxDB (similar to unindexed columns). In DolphinDB, the table is RANGE-partitioned on disk by the Symbol column into 8 partitions.
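
As a condensed sketch of the DolphinDB side of this layout (the full preprocessing script is in Appendix 1; DATA_DIR and PTNDB_DIR are placeholder paths):

// 9 range boundaries on Symbol define the 8 partitions
sym = `A`D`G`J`M`P`S`V`Z
db = database(PTNDB_DIR, RANGE, symbol(sym))
Quotes = loadTextEx(db, `Quotes, `Symbol, DATA_DIR + "/Quotes_Big.csv")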

Table 1. Data type mapping

2. Database query performance test

We compared the two databases on 11 commonly used SQL queries. The results for the Quotes_Small table are shown in Table 2 and the results for the Quotes_Big table in Table 3. For the DolphinDB tests we used the official DolphinDB GUI; for InfluxDB we used the official Python API. Execution time is in milliseconds and includes only the execution of the query itself, not the time to display the results. To reduce the effect of outliers, each query was executed 10 times, and the numbers in the tables are the total time of the 10 executions. The test scripts are given in Appendix 2.
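
The actual 11 benchmark queries are listed in Appendix 2. Purely as an illustration (this is not necessarily one of the benchmarked queries, and the symbol IBM is a placeholder), a typical filter-plus-aggregation query looks like this in the two systems:

// DolphinDB SQL
select max(Bid_Price), min(Offer_Price) from Quotes where Symbol = `IBM group by Exchange

// Roughly equivalent InfluxQL, with Exchange and Symbol stored as tags and the prices as fields:
// SELECT MAX("Bid_Price"), MIN("Offer_Price") FROM "Quotes" WHERE "Symbol" = 'IBM' GROUP BY "Exchange"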

In almost all tests, DolphinDB was many times faster than InfluxDB, and in some cases the gap exceeded 2 orders of magnitude (100x). The only test in which InfluxDB led DolphinDB was the 4th query on the Quotes_Small table, because with a small amount of data the advantage of DolphinDB's parallel partition scanning is not pronounced. With the larger dataset, however, DolphinDB outperformed InfluxDB in every test. DolphinDB's design of scanning partitions in parallel makes it more efficient than InfluxDB for filtering queries.

Table 2. Quotes_Small table query performance test results (data volume: 18,314,172 rows)

Table 3. Quotes_Big table query performance test results (data volume: 78,721,394 rows)

3. Built-in function performance test

We tested 8 built-in functions that both DolphinDB and InfluxDB provide. In all tests, DolphinDB outperformed InfluxDB by 1-2 orders of magnitude (10-100x). Because DolphinDB is heavily optimized for functions with sliding windows, it led InfluxDB by more than 3 orders of magnitude (>1000x) in the moving average (moving_average) test.
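
The test cases themselves are listed in Appendix 3. A minimal sketch of what such a call looks like in DolphinDB (the 20-row window and the per-symbol grouping are illustrative choices, not necessarily the benchmarked parameters):

// sliding-window moving average and moving standard deviation within each symbol
select Symbol, Time, mavg(Bid_Price, 20) as ma20, mstd(Bid_Price, 20) as sd20 from Quotes context by Symbol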

When computing the standard deviation and the moving average on the Quotes_Big table, InfluxDB stalled for a long time and then ran out of memory, which shows that InfluxDB's support for complex analysis of large-scale data is limited. DolphinDB, by contrast, performed well in these cases.

The results for the Quotes_Small table are shown in Table 4 and the results for the Quotes_Big table in Table 5. The numbers in the tables are in milliseconds. To reduce the effect of outliers, each calculation was executed 10 times. The test scripts are given in Appendix 3.

Table 4. Quotes_Small table calculation performance test results (data volume: 18,314,172 rows)

Table 5. Quotes_Big table calculation performance test results (data volume: 78,721,394 rows)

4. I/O performance test

We used the DolphinDB and InfluxDB Java APIs to compare the time required to write data of the same size in the same environment. Because InfluxDB's default batch size is 5000, both writers insert 5000 rows per batch. We also compared the time required to export a CSV file of the same size. The result is that DolphinDB outperforms InfluxDB in both write and export performance. For the same 18,314,172 rows of data, DolphinDB's export speed is about 2 times that of InfluxDB, and its write speed is about 12 times.

The test results are shown in Table 6. The numbers in the table are in milliseconds. The test code is given in Appendix 4.
Table 6. I/O performance test results

5. Disk footprint test

We compared the disk space occupied by the Quotes_Small and Quotes_Big tables after they were stored in DolphinDB and InfluxDB. The results show that both databases compress stored data, with compression ratios of the same order of magnitude, between 20% and 30%, but DolphinDB compresses somewhat better.

The test results are shown in Table 7. The numbers in the table are in MB.
Table 7. Disk footprint test results

6. Other aspects of comparison

Besides its superior performance in the benchmarks above, DolphinDB has the following advantages:

(1) InfluxDB is operated through InfluxQL, an SQL-like language. DolphinDB has a built-in, full-fledged scripting language that supports not only SQL but also imperative, vectorized, functional, metaprogramming, RPC and other programming paradigms, so more functionality can be implemented conveniently.
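
A minimal sketch of mixing these paradigms in one DolphinDB script (the table, the column values and the spread function are made up for illustration):

// user-defined function, applied element-wise to whole columns (vectorized)
def spread(bid, offer) { return offer - bid }

// a small in-memory table with random data
t = table(take(`IBM`MSFT, 100) as Symbol, rand(100.0, 100) as Bid_Price, rand(100.0, 100) as Offer_Price)

// SQL combined with the user-defined function in a single statement
select avg(spread(Bid_Price, Offer_Price)) as avgSpread from t group by Symbol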

(2) InfluxDB has no official support for bulk-importing data in specific file formats such as CSV. Users have to read the files with open-source third-party tools or by themselves, reformat the data into the input format InfluxDB requires, and then bulk-import it through the API, 5000 rows at a time. This is not only cumbersome but also very inefficient. By contrast, DolphinDB's scripting language provides the loadText and loadTextEx functions, with which users can import txt or csv files directly; this is both more efficient and more user-friendly.
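
A minimal sketch of the two import paths (the file path is a placeholder; db is a partitioned database handle such as the one created in Appendix 1):

// load a csv file directly into an in-memory table
t = loadText("/path/to/Quotes_Small.csv")

// load a csv file into a partitioned table of database db, partitioned on the Symbol column
Quotes = loadTextEx(db, `Quotes, `Symbol, "/path/to/Quotes_Small.csv")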

(3) DolphinDB provides more than 400 built-in functions, which cover historical data modeling and real-time stream processing in finance, as well as real-time monitoring, analysis and processing of data in IoT. For time-series processing it provides leading, lagging, cumulative-window and sliding-window indicator functions, all with optimized, excellent performance. Compared with InfluxDB, DolphinDB therefore has a wider range of applicable scenarios.
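
A minimal sketch of these leading, lagging and cumulative-window functions in one query (the column choices are illustrative):

// prev/next give lagging/leading values, deltas the row-to-row change, cumsum a cumulative window
select Symbol, Time, prev(Bid_Price) as lagBid, next(Bid_Price) as leadBid, deltas(Bid_Price) as change, cumsum(Bid_Size) as cumSize from Quotes context by Symbol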

(4) InfluxDB does not support table joins. DolphinDB not only supports table joins, but also optimizes joins on non-aligned timestamps such as asof join and window join.
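
A minimal sketch of both joins on toy data (the table contents and the 2-second window are made up for illustration):

// trade and quote tables with non-aligned timestamps
t = table(take(`IBM, 3) as Symbol, 09:30:01 09:30:03 09:30:05 as Time, 101.0 102.0 103.0 as TradePrice)
q = table(take(`IBM, 3) as Symbol, 09:30:00 09:30:02 09:30:04 as Time, 100.5 101.5 102.5 as Bid_Price)

// asof join: for each trade, the most recent quote at or before its timestamp
aj(t, q, `Symbol`Time)

// window join: aggregate the quotes in the 2-second window ending at each trade
wj(t, q, -2:0, <avg(Bid_Price)>, `Symbol`Time)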

(5) In InfluxDB, the largest unit for grouping a time series (GROUP BY time()) is the week; DolphinDB supports grouping by all of its built-in time types, with the largest unit being the month. DolphinDB therefore also has better support for time-series features.
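
A minimal sketch of monthly grouping in DolphinDB (the aggregation is illustrative):

// group quotes by calendar month and symbol
select avg(Bid_Price) from Quotes group by month(Time), Symbol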

(6) DolphinDB supports transactions when using the distributed table engine, and guarantees strong consistency when multiple replicas of a partition are written.

Appendix

Appendix 1. Data download and preprocessing script

Data download link: nyxdata

The data is preprocessed in DolphinDB. The preprocessing script is as follows:

DATA_DIR = "…"
PTNDB_DIR = DATA_DIR + "/NYSETAQSeq"
db = database(PTNDB_DIR, SEQ, 16)

// convert the raw time value into a timestamp string on 2016.10.24 (truncated to microsecond precision)
def convertTime(x) {
    return string(nanotimestamp(2016.10.24).temporalAdd(x/1000, "us")).substr(0, 26)
}

// produce Quotes_Big.csv
Quotes = loadTextEx(db, `Quotes, , DATA_DIR + "/EQY_US_ALL_NBBO_20161024", '|')
Quotes_Big = select convertTime(time) as Time, Exchange, Symbol, Bid_Price, Bid_Size, Offer_Price, Offer_Size from Quotes
saveText(Quotes_Big, DATA_DIR + "/Quotes_Big.csv")

// produce Quotes_Small.csv: keep every 4th symbol
symbols = exec distinct Symbol from Quotes
symbols = symbols[0..(symbols.size() - 1) % 4 == 3]
Quotes_Small = select * from Quotes_Big where Symbol in symbols
saveText(Quotes_Small, DATA_DIR + "/Quotes_Small.csv")

// build the RANGE-partitioned database and tables
if (existsDatabase(PTNDB_DIR)) {
    dropDatabase(PTNDB_DIR)
}

sym = `A`D`G`J`M`P`S`V`Z
db = database(PTNDB_DIR, RANGE, symbol(sym))

// Quotes_Big table
Quotes = loadTextEx(db, `Quotes, `Symbol, DATA_DIR + "/Quotes_Big.csv")

// Quotes_Small table
Quotes = loadTextEx(db, `Quotes, `Symbol, DATA_DIR + "/Quotes_Small.csv")

Appendix 2. Database query performance test scripts

Table 8. InfluxDB query performance test cases

Table 9. DolphinDB query performance test cases

Appendix 3. Built-in function performance test cases
Table 10. InfluxDB built-in function performance test cases

Table 11. DolphinDB built-in function performance test cases

Appendix 4. I/O performance test code

InfluxDBInsert.java:

package test;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.BatchPoints;
import org.influxdb.dto.Point;

import java.util.concurrent.TimeUnit;

public class InfluxDBInsert {
    public static void main(String[] args) {
        InfluxDB influxDB = InfluxDBFactory.connect("http://192.168.1.30:8086");
        int size = 18314172;
        long timer = 0;
        // write the data in batches of 5000 points (InfluxDB's default batch-size);
        // i is advanced by the inner loop only, so no record is skipped between batches
        for (int i = 0; i < size; ) {
            BatchPoints.Builder batchBuilder = BatchPoints.database("TEST");
            for (int j = 0; j < 5000 && i < size; i++, j++) {
                Point.Builder builder = Point.measurement("TEST")
                        .time(System.nanoTime(), TimeUnit.NANOSECONDS)
                        .addField("Bid_Price", 2.5)
                        .addField("Bid_Size", 5)
                        .addField("Offer_Price", 12.5)
                        .addField("Offer_Size", 6)
                        .tag("Exchange", "B")
                        .tag("Sym", "ALV");
                batchBuilder.point(builder.build());
            }
            // only the write call itself is timed
            long start = System.currentTimeMillis();
            influxDB.write(batchBuilder.build());
            timer += (System.currentTimeMillis() - start);
        }
        System.out.println(timer);
    }
}

DolphinDBInsert.java:

package test;

import com.xxdb.DBConnection;
import com.xxdb.data.*;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class DolphinDBInsert {
    public static void main(String[] args) throws IOException {
        int size = 18314172;
        int range = 5000;
        BasicNanoTimestampVector Time = new BasicNanoTimestampVector(range);
        BasicDoubleVector Bid_Price = new BasicDoubleVector(range);
        BasicIntVector Bid_Size = new BasicIntVector(range);
        BasicDoubleVector Offer_Price = new BasicDoubleVector(range);
        BasicIntVector Offer_Size = new BasicIntVector(range);
        BasicStringVector Exchange = new BasicStringVector(range);
        BasicStringVector Symbol = new BasicStringVector(range);

        DBConnection conn = new DBConnection();
        conn.connect("192.168.1.30", 8888);
        // create the partitioned database and table on the server
        conn.run("if(existsDatabase('/home/psui/Desktop/workspace/data/NYSETAQ')){dropDatabase('/home/psui/Desktop/workspace/data/NYSETAQ')}");
        conn.run("db = database('/home/psui/Desktop/workspace/data/NYSETAQ', RANGE, 0 5 10)");
        conn.run("tbl = table(200:0,`Time`Bid_Price`Bid_Size`Offer_Price`Offer_Size`Exchange`Symbol,[NANOTIMESTAMP,DOUBLE,INT,DOUBLE,INT,STRING,SYMBOL])");
        conn.run("Quotes = db.createPartitionedTable(tbl,'Quotes','Bid_Size')");

        long timer = 0;
        // write the data in batches of 5000 rows; i is advanced by the inner loops only
        for (int i = 0; i < size; ) {
            if (i + 5000 >= size) {
                // last (partial) batch: allocate vectors of the remaining length
                Time = new BasicNanoTimestampVector(size - i);
                Bid_Price = new BasicDoubleVector(size - i);
                Bid_Size = new BasicIntVector(size - i);
                Offer_Price = new BasicDoubleVector(size - i);
                Offer_Size = new BasicIntVector(size - i);
                Exchange = new BasicStringVector(size - i);
                Symbol = new BasicStringVector(size - i);

                for (int j = 0; i < size; i++, j++) {
                    Time.setLong(j, System.nanoTime());
                    Bid_Price.setDouble(j, 2.4);
                    Bid_Size.setInt(j, 6);
                    Offer_Price.setDouble(j, 3.5);
                    Offer_Size.setInt(j, 2);
                    Exchange.setString(j, "A");
                    Symbol.setString(j, "ALV");
                }
            } else {
                for (int j = 0; j < 5000 && i < size; i++, j++) {
                    Time.setLong(j, System.nanoTime());
                    Bid_Price.setDouble(j, 2.4);
                    Bid_Size.setInt(j, 6);
                    Offer_Price.setDouble(j, 3.5);
                    Offer_Size.setInt(j, 2);
                    Exchange.setString(j, "A");
                    Symbol.setString(j, "ALV");
                }
            }
            Map<String, Entity> map = new HashMap<>();
            map.put("Time", Time);
            map.put("Bid_Price", Bid_Price);
            map.put("Bid_Size", Bid_Size);
            map.put("Offer_Price", Offer_Price);
            map.put("Offer_Size", Offer_Size);
            map.put("Exchange", Exchange);
            map.put("Symbol", Symbol);
            // only the upload and append are timed
            long start = System.currentTimeMillis();
            conn.upload(map);
            conn.run("tbl = table(Time,Bid_Price,Bid_Size,Offer_Price,Offer_Size,Exchange,Symbol)");
            conn.run("Quotes.append!(tbl)");
            timer += (System.currentTimeMillis() - start);
        }
        // verify the row count and report the accumulated write time
        conn.run("Quotee = loadTable(db, 'Quotes')");
        Entity entity = conn.run("select count(*) from Quotee");
        System.out.println(entity.getString());
        System.out.println(timer);
    }
}

Exporting data:

InfluxDB:

$ influx -host localhost -database Quotes -format csv -execute "" > Quotes.csv

DolphinDB:

saveText(Quotes, DATA_DIR + "/Quotes.csv")

Official website: www.dolphindb.com

Email: info@dolphindb.com
