Calculating the spread series for an arbitrage strategy is not as simple as you might think
2020-11-06 01:18:00 【itread01】
For more content, follow the WeChat public account Quantity technology house. To get the complete strategy code shared in this issue, add the WeChat ID sljsz01.
Common misconceptions about spread calculation
When we backtest strategy signals generated from the interaction of two or more financial assets, we inevitably have to align several price time series on a common time axis, and arbitrage is one such case. Yet most articles that introduce arbitrage or statistical arbitrage strategies treat the construction of the spread series very simply: they just subtract one time series from the other. For low-frequency signals this is not a big problem, but for medium- and high-frequency signals direct subtraction causes trouble.
The reason is that quotes for different assets are pushed by the exchange, and arrive at our end, at different times. Even when two ticks in the backtest data carry exactly the same timestamp, the live trading server still receives them one after the other. In live trading we found, for example, that the Shanghai Futures Exchange does not push the snapshot data for the different delivery-month contracts of one product simultaneously; it pushes them in order of delivery month, for example RB2010, then RB2101, then RB2015. The same holds for other products. Yet the ticks for RB2010, RB2101 and RB2015 received within the same 500 ms snapshot interval all carry the same timestamp.
Another example is cross-exchange arbitrage in digital currencies. Even if two exchanges send their tick data at the same instant, their servers sit in different physical locations, so transmission times differ and the ticks will most likely reach our signal-computation server at different times.
A typical example of different quote arrival frequencies
If market data arrives in a definite order, a spread computed by direct subtraction suffers to some degree from lag or from look-ahead bias (the "future function" problem). And if the two price series arrive at different frequencies, they cannot be subtracted directly at all. In short, we need a spread calculation method that is closer to reality.
Let's look at an example in which quotes arrive at different frequencies, that is, the two instruments are pushed at different rates. Suppose we want to design an arbitrage strategy between stock index futures and a stock ETF, taking IC and the CSI 500 ETF (500ETF) as an example, and compute the spread for futures-spot arbitrage.
Our IC stock index futures tick data comes from Wind. IC trades on the China Financial Futures Exchange (CFFEX), whose quotes are pushed at 2 ticks per second. The free Level-1 feed carries one level of the order book, that is, only bid 1 and ask 1. The data covers the stock index futures trading hours, 9:29-15:00. [Figure: sample IC tick data]
The CSI 500 ETF data also comes from Wind. Its push frequency is much lower than IC's: 1 tick every 3 seconds. The free Level-1 feed carries five levels of the order book, bid 1 to bid 5 and ask 1 to ask 5. The data is pushed from 9:15 to 15:00, which includes the stock call auction period. [Figure: sample 500ETF tick data]
Making good use of the pandas merge function
To compute a spread from data with different push frequencies and different time axes, we need to combine the series along the time axis. The merge function in Python's pandas library is exactly what we need. Here is a brief introduction to merge.
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
When combining data, the parameters used most often are the first four:
left: the DataFrame object on the left side of the join
right: the DataFrame object on the right side of the join
on: the name of the column or index level to join on. It must be found in both the left and the right DataFrame objects; for financial time series this is usually the time axis
how: one of 'left', 'right', 'outer', 'inner'; the default is 'inner'. 'inner' takes the intersection and 'outer' takes the union. For example, take left keys ['A', 'B', 'C'] and right keys ['A', 'C', 'D']. With 'inner', the A in left is matched and joined with the A in right, while B has no match in right and is dropped. With 'outer', A is matched in the same way, and keys that appear on only one side are kept, with the missing part filled with missing values.
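The A/B/C versus A/C/D example can be sketched on two toy frames. The column names key, left_val and right_val and the values are made up for illustration.

```python
import pandas as pd

# Toy frames mirroring the example keys; names and values are illustrative.
left = pd.DataFrame({"key": ["A", "B", "C"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["A", "C", "D"], "right_val": [10, 30, 40]})

# Intersection of keys: only A and C survive.
inner = pd.merge(left, right, how="inner", on="key")

# Union of keys: A, B, C and D all appear; the side without data gets NaN.
outer = pd.merge(left, right, how="outer", on="key")

print(inner)
print(outer)
```

In the outer result, B has no right_val and D has no left_val, which is the same situation that produces NaN values when two tick series are merged on time.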
Of these four parameters, how matters most when preprocessing data for the arbitrage spread calculation. Using actual data, let's see how different values of how affect the final spread.
First, how='inner' takes the intersection of the time axes: only timestamps whose DATETIME appears in both tables show up in the merged table. We build the merged table, compute the spread series from it, and plot the result.
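A minimal sketch of the inner merge on synthetic ticks. DATETIME follows the column name used in the text; the column names IC_MID and ETF_MID, the prices, and the assumption that both legs are already scaled to comparable units are illustrative and not taken from the original data.

```python
import pandas as pd

# Synthetic ticks: IC every 500 ms, ETF every 3 s (made-up mid prices,
# assumed to be already scaled to comparable units).
ic = pd.DataFrame({
    "DATETIME": pd.date_range("2020-11-05 09:30:00", periods=7, freq="500ms"),
    "IC_MID": [6400.0, 6400.5, 6401.0, 6400.5, 6401.0, 6401.5, 6402.0],
})
etf = pd.DataFrame({
    "DATETIME": pd.date_range("2020-11-05 09:29:57", periods=3, freq="3s"),
    "ETF_MID": [6397.0, 6398.0, 6399.0],
})

# Keep only timestamps present in BOTH tables.
inner = pd.merge(ic, etf, how="inner", on="DATETIME")
inner["SPREAD"] = inner["IC_MID"] - inner["ETF_MID"]

print(inner)  # only 09:30:00 and 09:30:03 remain
```

Of the seven futures ticks, only the two that share a timestamp with an ETF tick produce a spread value, which is why this variant gives the sparsest series.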
Second, how='outer' takes the union of the time axes: any timestamp whose DATETIME appears in either table shows up in the merged table, and where the other table has no data at that time, the value is filled with NaN.
Because the outer merge produces many NaN values, we cannot compute the spread directly. The usual treatment is to forward-fill the missing data, that is, to replace each NaN with the most recent non-null value, and then compute the spread (on mid prices) and plot it.
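The outer merge followed by a forward fill might look like the sketch below. The synthetic ticks, the column names IC_MID and ETF_MID, and the comparable-units assumption are illustrative, not taken from the original data.

```python
import pandas as pd

# Synthetic ticks: IC every 500 ms, ETF every 3 s (made-up mid prices,
# assumed to be already scaled to comparable units).
ic = pd.DataFrame({
    "DATETIME": pd.date_range("2020-11-05 09:30:00", periods=7, freq="500ms"),
    "IC_MID": [6400.0, 6400.5, 6401.0, 6400.5, 6401.0, 6401.5, 6402.0],
})
etf = pd.DataFrame({
    "DATETIME": pd.date_range("2020-11-05 09:29:57", periods=3, freq="3s"),
    "ETF_MID": [6397.0, 6398.0, 6399.0],
})

# Union of the two time axes, put back into time order.
outer = (
    pd.merge(ic, etf, how="outer", on="DATETIME")
    .sort_values("DATETIME")
    .reset_index(drop=True)
)

# Carry the last known quote of each leg forward into the gaps.
outer[["IC_MID", "ETF_MID"]] = outer[["IC_MID", "ETF_MID"]].ffill()
outer["SPREAD"] = outer["IC_MID"] - outer["ETF_MID"]

print(outer)
```

The first row (09:29:57) still has no IC quote to carry forward, so its spread stays NaN. Forward fill only looks backward in time, so it does not leak later quotes into earlier rows.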
Third, how='left' merges on the time axis of the left table. The left table (IC) is matched row by row against the right table and its time axis is kept: timestamps that also exist in the right table bring in the right table's data, and timestamps missing from the right table are filled with NaN.
Here too we forward-fill the missing data before computing the spread (on mid prices).
Finally, how='right' merges on the time axis of the right table. The right table (500ETF) is matched row by row against the left table and its time axis is kept: timestamps that also exist in the left table bring in the left table's data, and timestamps missing from the left table are filled with NaN.
Because futures quotes arrive more frequently than stock ETF quotes, the NaN values appear mainly in the stock call auction period, which starts earlier than futures trading. These NaN rows can be dropped as appropriate.
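The left and right variants can be sketched side by side. As before, the synthetic ticks, the column names IC_MID and ETF_MID, and the comparable-units assumption are illustrative only.

```python
import pandas as pd

# Synthetic ticks: IC every 500 ms, ETF every 3 s (made-up mid prices,
# assumed to be already scaled to comparable units).
ic = pd.DataFrame({
    "DATETIME": pd.date_range("2020-11-05 09:30:00", periods=7, freq="500ms"),
    "IC_MID": [6400.0, 6400.5, 6401.0, 6400.5, 6401.0, 6401.5, 6402.0],
})
etf = pd.DataFrame({
    "DATETIME": pd.date_range("2020-11-05 09:29:57", periods=3, freq="3s"),
    "ETF_MID": [6397.0, 6398.0, 6399.0],
})

# how='left': keep the IC time axis, forward-fill the sparser ETF quotes.
left = pd.merge(ic, etf, how="left", on="DATETIME")
left["ETF_MID"] = left["ETF_MID"].ffill()
left["SPREAD"] = left["IC_MID"] - left["ETF_MID"]

# how='right': keep the ETF time axis; rows before the first IC tick
# cannot be filled and are dropped.
right = pd.merge(ic, etf, how="right", on="DATETIME").sort_values("DATETIME")
right["IC_MID"] = right["IC_MID"].ffill()
right = right.dropna(subset=["IC_MID"]).reset_index(drop=True)
right["SPREAD"] = right["IC_MID"] - right["ETF_MID"]

print(left)
print(right)
```

One caveat of this sketch: the forward fill only sees values that survived the merge, so a quote whose timestamp is absent from the kept time axis is never carried forward. On real data with irregular timestamps, doing the outer merge and forward fill first and then selecting the driver's rows may be closer to "the latest stored quote".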
Putting together the charts drawn with the different spread calculation methods, we can see that the top-left chart (how='inner') has the sparsest points, because a spread is computed only when both prices are available at the same timestamp. The top-right chart (how='outer') has the densest points: whenever either price changes, a new spread is computed. The two bottom charts (how='left' and how='right') fall in between.
Different spread calculations imply different ways of driving the strategy
On the surface, the different spread calculations merely pass different values of the how parameter to merge and so produce different spread series. But behind each choice of how lies a different driving mode and a different strategy logic.
Whether in backtesting or in live trading, market data should be handled in an event-driven way; this is the kind of backtest closest to live trading. We assume that historical data behaves like live data: each time a new data point is generated it is pushed to us once, and each new data point we receive is a new event. That event drives the subsequent signal calculation and the check of the entry and exit conditions tied to the signal.
Coming back to the different ways of calculating the spread, each one in fact corresponds to a different driving mode.
how='outer': driven concurrently by both the futures and the stock quotes. Whenever either the stock or the futures quote updates, the program updates the spread and checks whether a trading signal is triggered. Signals are computed and triggered most frequently in this mode.
how='left': driven by the futures quotes alone. We do not care whether a stock quote has arrived; whenever the futures quote updates, the spread is computed against the latest stored stock quote, and we check whether a trading signal is triggered.
how='right': driven by the stock quotes alone. We do not care whether a futures quote has arrived; whenever the stock quote updates, the spread is computed against the latest stored futures quote, and we check whether a trading signal is triggered. With left and right, signals are triggered less frequently than with outer.
how='inner': driven only when the futures and stock quotes arrive together. We generally do not use this mode in either backtesting or live trading. As explained in the first section of this article, quotes essentially never arrive at exactly the same time, so this mode is too idealized and also gives up many trading opportunities.
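The driving modes above can be sketched as a small event replay that stores the latest quote of each leg and recomputes the spread only when a driving leg updates. The function replay_spread, its driver argument, the column names and the synthetic ticks are all hypothetical. Processing the futures tick first when two ticks share a timestamp is an assumption about arrival order, not something the data guarantees.

```python
import pandas as pd

def replay_spread(ic, etf, driver="both"):
    """Replay ticks in time order; recompute the spread on driver events.

    driver: "both" (either leg), "ic" (futures only) or "etf" (ETF only).
    """
    events = pd.concat([
        ic.rename(columns={"IC_MID": "MID"}).assign(LEG="ic"),
        etf.rename(columns={"ETF_MID": "MID"}).assign(LEG="etf"),
    ]).sort_values("DATETIME", kind="mergesort")  # stable: IC first on ties

    latest = {"ic": None, "etf": None}  # last stored quote per leg
    rows = []
    for ev in events.itertuples(index=False):
        latest[ev.LEG] = ev.MID
        fires = driver == "both" or driver == ev.LEG
        if fires and all(v is not None for v in latest.values()):
            rows.append((ev.DATETIME, latest["ic"] - latest["etf"]))
    return pd.DataFrame(rows, columns=["DATETIME", "SPREAD"])

# Synthetic ticks (made-up mid prices in comparable units).
ic = pd.DataFrame({
    "DATETIME": pd.date_range("2020-11-05 09:30:00", periods=7, freq="500ms"),
    "IC_MID": [6400.0, 6400.5, 6401.0, 6400.5, 6401.0, 6401.5, 6402.0],
})
etf = pd.DataFrame({
    "DATETIME": pd.date_range("2020-11-05 09:29:57", periods=3, freq="3s"),
    "ETF_MID": [6397.0, 6398.0, 6399.0],
})

both = replay_spread(ic, etf, "both")       # corresponds to how='outer'
ic_driven = replay_spread(ic, etf, "ic")    # corresponds to how='left'
etf_driven = replay_spread(ic, etf, "etf")  # corresponds to how='right'
print(len(both), len(ic_driven), len(etf_driven))
```

Unlike a merged table, the replay fires once per message, so two ticks with the same timestamp produce two spread updates. A futures-driven update at such a timestamp still sees the previous ETF quote, because the ETF tick has not yet "arrived". This is the arrival-order effect described in the first section.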
Which driving mode to choose for live trading
To sum up, the modes available for backtesting and live trading fall into two categories: concurrent driving by both markets, and driving by a single market. So how do we choose between them?
Based on live trading experience with statistical arbitrage strategies, the author offers some suggestions:
When the two assets in the spread differ clearly in activity or have a leader-follower relationship: for example, near-month versus far-month futures contracts (near-month contracts usually trade more actively than far-month ones), or futures-spot arbitrage between stocks and stock index futures (stock index futures have a price-discovery effect on the spot stocks). In this case the more actively traded, leading instrument should be the main driver, and the strategy should be driven by that single market.
When the two assets in the spread have no clear difference in activity and no leader-follower relationship: for example, cross-exchange arbitrage in digital currencies (arbitrage between OKEX and Huobi, both of which are quite active and on an equal footing). Here we can use concurrent driving by both markets to capture more trading opportunities.
Once the driving mode is chosen, the same mode must be used in data merging, in backtesting, and in the development of the live trading system, so that backtest results match live trading as closely as possible.