zinchse/ts-is-fresh

🍹 TS IS FRESH

Time Series Importance based Selection and FeatuRe Extraction on basis of Scalable Hypothesis tests.

The algorithm combines feature generation with the [tsfresh] library and feature selection based on importance values. On real exchange data, this approach lowered the prediction error (RMSE) by 12% and gave a 24% speedup compared with using only the target series.

🗺️ Overview

In the first step, the algorithm determines which features of a single time series are useful. It generates a large number of statistical features with the tsfresh library, then filters them using statistical hypothesis tests and feature importance values. All values are computed under a blocked cross-validation scheme.
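
As a rough illustration of the cross-validation part, here is a minimal sketch of blocked splits for time-ordered data. It assumes that "block" means contiguous, unshuffled folds; the function name `blocked_cv_splits` and its parameters are hypothetical and not taken from the repository, whose exact scheme may differ.

```python
from typing import Iterator, List, Tuple


def blocked_cv_splits(
    n_samples: int, n_blocks: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each test fold is one contiguous block."""
    if not 2 <= n_blocks <= n_samples:
        raise ValueError("need 2 <= n_blocks <= n_samples")
    # Block boundaries, as equal in size as integer division allows.
    bounds = [n_samples * k // n_blocks for k in range(n_blocks + 1)]
    for start, stop in zip(bounds[:-1], bounds[1:]):
        test = list(range(start, stop))
        # Assumption: every sample outside the test block is used for training.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test


splits = list(blocked_cv_splits(n_samples=10, n_blocks=5))
for train, test in splits:
    print("test block:", test)
```

Feature importances or test statistics would then be computed once per fold and aggregated, so that no single time window decides which features survive.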

In the second step, the algorithm reuses the set of features selected in the first step. The same features are computed for every available correlated time series (other currencies on the exchange). The resulting features then pass through the same two selection stages: statistical tests, followed by selection on importance values.
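
The second step can be sketched in plain Python as follows. Everything here is an assumption made for illustration: the small `FEATURES` registry stands in for the tsfresh feature calculators, the series names and values are toy data, and the importance scores would in practice come from a fitted model. The statistical filtering stage is omitted.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence

# Hypothetical registry; in the real pipeline tsfresh supplies the calculators.
FEATURES: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "std": pstdev,
    "abs_change": lambda x: sum(abs(b - a) for a, b in zip(x, x[1:])),
    "maximum": max,
}


def compute_selected(
    series_by_name: Dict[str, Sequence[float]], selected: List[str]
) -> Dict[str, float]:
    """Compute only the step-one survivors for every additional series."""
    return {
        f"{name}__{feat}": float(FEATURES[feat](values))
        for name, values in series_by_name.items()
        for feat in selected
    }


def keep_important(importances: Dict[str, float], threshold: float) -> List[str]:
    """Keep the columns whose importance is at or above the threshold."""
    return sorted(col for col, imp in importances.items() if imp >= threshold)


selected_in_step_one = ["mean", "abs_change"]  # assumed output of step one
other_series = {"EUR": [1.0, 2.0, 4.0], "GBP": [3.0, 3.0, 3.0]}  # toy data
row = compute_selected(other_series, selected_in_step_one)

# Toy importance scores, made up for the example.
toy_importances = {"EUR__mean": 0.40, "EUR__abs_change": 0.02, "GBP__mean": 0.15}
kept = keep_important(toy_importances, threshold=0.1)
print(row)
print(kept)
```

Restricting the extraction to features that already proved useful on the target series is what keeps the feature count, and therefore the run time, small.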

📊 Results

Compared with using only the target currency's data, adding the selected features of other tables gives a 24% speedup and a 12% lower RMSE!

| Features used | Time (s) | RMSE (mean) |
| --- | --- | --- |
| only target table | 1.3 | 0.118 |
| with the features of other tables | 10.04 | 0.096 |
| with selected features of other tables | 1.0 | 0.104 |
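
The relative improvements can be recomputed from the table. This is only a quick check on the rounded values shown above: it gives roughly 23% less time and 12% lower RMSE, and the small gap to the quoted 24% is presumably due to rounding of the timings in the table (an assumption, since the unrounded measurements are not shown).

```python
# Values copied from the results table.
time_target, rmse_target = 1.3, 0.118
time_selected, rmse_selected = 1.0, 0.104

# Relative reductions with respect to the target-only baseline.
time_saving = (time_target - time_selected) / time_target
rmse_reduction = (rmse_target - rmse_selected) / rmse_target

print(f"time saving: {time_saving:.1%}")
print(f"RMSE reduction: {rmse_reduction:.1%}")
```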

🚀 Quick Start

The [dataset] contains on the order of several hundred million records. To reproduce my results, extract it into the data/raw folder and run the .ipynb notebooks from /notebooks.

About

*toy* feature selection pipeline for HFT
