Python quantitative factor measurement and plotting: a detailed walkthrough with code

Quantitative factors are usually measured (backtested) by simulating trades and calculating various metrics. The following libraries are used:

Third-party libraries to be used for measurement: numpy, pandas, talib
Third-party libraries needed for plotting: matplotlib, seaborn

Other libraries can be added as the strategy requires.

Factor measurement framework

Here I share the process I usually follow when measuring factors, in the hope that we can improve together!

The whole process from factor to return is measured as follows: strategy (factor portfolio) -> buy and sell signals -> buy and sell points -> return

We therefore measure this for each individual stock:

1. Pre-processing of stock data

First, here are the commonly used imports: the libraries for measurement, the libraries for plotting, and the settings that fix Chinese characters showing up as blank boxes in figures.

# For measurements
import numpy as np
import pandas as pd
from copy import deepcopy
from tqdm import tqdm
from datetime import datetime
import talib
# For drawing
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Display Chinese characters in plots
sns.set()
plt.rcParams["figure.figsize"] = (20,10)
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']  # This font supports Chinese
plt.rcParams['axes.unicode_minus'] = False  # Fix the minus sign '-' being rendered as a square in saved images
# Other
import warnings
warnings.filterwarnings("ignore")

Next, loop over the data files and read each stock:

import os
def readfile(path, limit=None):
    files = os.listdir(path)
    file_list = []
    for file in files:  # Traverse the folder
        if not os.path.isdir(os.path.join(path, file)):  # Skip subfolders
            file_list.append(path + '/' + file)
    if limit:
        return file_list[:limit]
    return file_list
stock_dict = {}
for _file in tqdm(readfile("../data/stock_data")):
    if not _file.endswith(".pkl"):
        continue
    # TODO: add a filter here if only some stocks should enter the measured stock pool
    file_df = pd.read_pickle(_file)
    file_df.set_index(["Date"], inplace=True)
    file_df.index.name = ""
    file_df.index = pd.to_datetime(file_df.index)
    # The keys below must match the column names in your own data files
    file_df.rename(columns={'Opening':'open',"Closing.":"close","Highest.":"high","Minimum.":"low","Volume.":"volume"},inplace=True)
    stock_code = _file.split("/")[-1].replace(".pkl", '')
    # TODO: slice the data by date here if only part of the history is needed
    stock_dict[stock_code] = file_df

The above section processes the stock data, and the processed data is stored in the stock_dict variable, where the key is the stock code and the value is the stock data.
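
The TODO comments in the loading loop leave date slicing to the reader. As a minimal sketch, assuming each DataFrame is indexed by dates as built above, a helper such as `slice_by_date` could trim a stock to a date window. The helper name and the synthetic data are made up for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd


def slice_by_date(df, start=None, end=None):
    """Return a copy of the rows whose index lies in [start, end] (hypothetical helper)."""
    # Sorting first makes label-based slicing on the DatetimeIndex reliable.
    return df.sort_index().loc[start:end].copy()


# Synthetic daily data standing in for one entry of stock_dict (illustration only).
dates = pd.date_range("2015-01-01", periods=10, freq="D")
demo_df = pd.DataFrame({"close": np.arange(10, 20, dtype=float)}, index=dates)

trimmed = slice_by_date(demo_df, start="2015-01-03", end="2015-01-06")
print(trimmed)
```

Both ends of the window are inclusive, and leaving `start` or `end` as None keeps that side of the history open.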

2. Indicator measurement

When measuring metrics, let's take a stock as an example:

for _index,_stock_df in tqdm(stock_dict.items()):
    measure_df = deepcopy(_stock_df)

In the code:

measure_df is the DataFrame to be measured.
deepcopy is used so that the measurement process does not modify the original data.
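
To see why the copy matters, here is a small sketch with made-up data: changes applied to the copy during measurement (an extra column, an overwritten price) do not leak back into the original frame.

```python
from copy import deepcopy

import pandas as pd

# Made-up prices standing in for one stock's data.
original_df = pd.DataFrame({"close": [10.0, 11.0, 12.0]})
measure_df = deepcopy(original_df)

# Modifications made while measuring stay in the copy.
measure_df["signal"] = [0, 1, 0]
measure_df.loc[0, "close"] = 99.0

print(original_df.columns.tolist())
print(original_df.loc[0, "close"])
```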

We can then loop over each row of this stock (each row is one trading day) and apply the following trading rules:

Buy rule: a buy signal is issued and there is no current position, so buy
Sell rule: a sell signal is issued and there is a current position, so sell
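
These rules presuppose that every day already carries a buy or sell signal, which comes from the strategy itself. As a rough sketch only, assuming a simple moving-average crossover as the strategy, the signals could be prepared as below. The function name, the `buy_signal` and `sell_signal` column names, and the window lengths are all hypothetical, and plain pandas rolling means are used here instead of talib to keep the example self-contained.

```python
import pandas as pd


def add_ma_cross_signals(df, fast=3, slow=5):
    """Add hypothetical boolean 'buy_signal' / 'sell_signal' columns from an MA crossover."""
    out = df.copy()
    fast_ma = out["close"].rolling(fast).mean()
    slow_ma = out["close"].rolling(slow).mean()
    above = fast_ma > slow_ma  # False while either average is still NaN
    prev_above = above.shift(1, fill_value=False)
    # Buy on the day the fast average crosses above the slow one, sell on the reverse cross.
    out["buy_signal"] = above & ~prev_above
    out["sell_signal"] = ~above & prev_above
    return out


# Synthetic prices: flat, then rising, then falling (illustration only).
closes = [10, 10, 10, 10, 10, 11, 12, 13, 14, 13, 12, 11, 10, 9]
demo_df = pd.DataFrame(
    {"close": closes},
    index=pd.date_range("2020-01-01", periods=len(closes), freq="D"),
)
signals = add_ma_cross_signals(demo_df)
print(signals[["close", "buy_signal", "sell_signal"]])
```

Storing the signals as boolean columns keeps the daily loop simple: it only has to read the two flags for the current row.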

# Begin measuring
trade_record_list = []
this_trade = None  # The open trade, or None when there is no position
for _mea_i, _mea_series in measure_df.iterrows():  # Loop over every day
    # 'buy_signal' and 'sell_signal' are placeholder boolean columns; substitute your strategy's own conditions
    if _mea_series['buy_signal'] and this_trade is None:  # Buy signal and no current position: buy
        this_trade = {
            "buy_date": _mea_i,
            "close_record": [_mea_series['close']],
        }
    elif _mea_series['sell_signal'] and this_trade is not None:  # Sell signal while holding: sell
        this_trade['sell_date'] = _mea_i
        this_trade['close_record'].append(_mea_series['close'])
        trade_record_list.append(this_trade)
        this_trade = None
    elif this_trade is not None:  # Still holding: record the day's close
        this_trade['close_record'].append(_mea_series['close'])

In the code above, every complete trade (buy -> hold -> sell) is saved in the trade_record_list variable. Each complete trade is recorded as:

{
    'buy_date': Timestamp('2015-08-31 00:00:00'), # Buy date
    'close_record': [41.1,42.0,40.15,40.65,36.6,32.97], # Closing prices recorded while holding
    'sell_date': Timestamp('2015-10-12 00:00:00'), # Sell date
    # TODO: custom metrics can also be recorded here
}

3. Organizing the measurement results

Use pd.DataFrame(trade_record_list) directly to view the overall trade results.

Organizing the results is also fairly simple: loop over the trades and compute the desired metrics. For example, the annualized return of a single trade can be computed as:

trade_record_df = pd.DataFrame(trade_record_list)
for _i,_trade_series in trade_record_df.iterrows():
    trade_record_df.loc[_i,'Annualized rate of return'] = (_trade_series['close_record'][-1] - _trade_series['close_record'][0])/_trade_series['close_record'][0]/(_trade_series['sell_date'] - _trade_series['buy_date']).days * 365 # Annualized return
    # TODO: add more metrics here based on the results you want
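
Beyond the per-trade annualized return, aggregate metrics such as the win rate are often wanted. The sketch below assumes trade records in the format shown earlier; `summarize_trades` is a hypothetical helper, and the second record is invented for illustration.

```python
import pandas as pd


def summarize_trades(trade_record_list):
    """Per-trade return and holding days, plus win rate and mean return (hypothetical helper)."""
    trades = pd.DataFrame(trade_record_list)
    first_close = trades["close_record"].apply(lambda closes: closes[0])
    last_close = trades["close_record"].apply(lambda closes: closes[-1])
    trades["return"] = (last_close - first_close) / first_close
    trades["holding_days"] = (trades["sell_date"] - trades["buy_date"]).dt.days
    summary = {
        "trade_count": len(trades),
        "win_rate": float((trades["return"] > 0).mean()),  # Share of trades with a positive return
        "mean_return": float(trades["return"].mean()),
    }
    return trades, summary


demo_records = [
    {   # The example record from the article
        "buy_date": pd.Timestamp("2015-08-31"),
        "close_record": [41.1, 42.0, 40.15, 40.65, 36.6, 32.97],
        "sell_date": pd.Timestamp("2015-10-12"),
    },
    {   # An invented winning trade
        "buy_date": pd.Timestamp("2016-01-04"),
        "close_record": [10.0, 11.0, 12.0],
        "sell_date": pd.Timestamp("2016-01-14"),
    },
]
trades, summary = summarize_trades(demo_records)
print(trades[["buy_date", "sell_date", "return", "holding_days"]])
print(summary)
```

Computing the columns in one pass avoids the row-by-row loop, although the loop form remains convenient when a metric needs custom logic per trade.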

4. Plotting the results

The plotting code is usually fairly fixed; for example, a win-rate chart:

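# Clear the drawing cache
plt.cla()
plt.clf()
# Start plotting
plt.figure(figsize=(10, 14), dpi=100)
# Use seaborn to draw the win-rate heatmap
# total_measure_record is assumed to hold the win rates collected during measurement; it is not defined in this article
fig = sns.heatmap(pd.DataFrame(total_measure_record).round(2), annot=True, cmap="RdBu_r",center=0.5)
plt.title("Winning percentage chart")
scatter_fig = fig.get_figure()
# Save locally
scatter_fig.savefig("Winning percentage chart")
scatter_fig.show() # Finally, display it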
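
The article does not show how total_measure_record is built. One possible shape, offered purely as an assumption, is a table of win rates with one row per stock and one column per year of the buy date. The sketch below builds such a table from per-stock trade lists; `win_rate_table`, the stock codes, and the trades are all invented for illustration.

```python
import pandas as pd


def win_rate_table(trades_by_stock):
    """Build a stock-by-year win-rate table from per-stock trade lists (assumed layout)."""
    rows = []
    for stock_code, trade_list in trades_by_stock.items():
        for trade in trade_list:
            closes = trade["close_record"]
            rows.append({
                "stock": stock_code,
                "year": trade["buy_date"].year,
                "win": float(closes[-1] > closes[0]),  # 1.0 for a winning trade, else 0.0
            })
    flat = pd.DataFrame(rows)
    # The mean of the 0/1 column is the share of winning trades in each cell.
    return flat.pivot_table(index="stock", columns="year", values="win", aggfunc="mean")


# Invented trades for two made-up stock codes.
trades_by_stock = {
    "000001": [
        {"buy_date": pd.Timestamp("2015-03-02"), "close_record": [10.0, 11.0]},
        {"buy_date": pd.Timestamp("2015-09-01"), "close_record": [12.0, 11.0]},
        {"buy_date": pd.Timestamp("2016-02-01"), "close_record": [9.0, 10.0]},
    ],
    "600000": [
        {"buy_date": pd.Timestamp("2015-05-04"), "close_record": [20.0, 22.0]},
        {"buy_date": pd.Timestamp("2016-06-01"), "close_record": [21.0, 20.0]},
    ],
}
table = win_rate_table(trades_by_stock)
print(table)
```

A table like this can be rounded and passed to a seaborn heatmap in the same way as in the plotting code above; with center=0.5, cells above and below a 50% win rate fall on opposite sides of the color scale.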
