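In the data preprocessing code provided with Temporal Relational Ranking for Stock Prediction, the features are normalized with the global max and min, which also cover the test set. Doesn't this cause information leakage?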
    pri_min = np.min(selected_EOD[begin_date_row:, 4])
    price_max = np.max(selected_EOD[begin_date_row:, 4])
    print(self.tickers[stock_index], 'minimum:', pri_min,
          'maximum:', price_max, 'ratio:', price_max / pri_min)
    if price_max / pri_min > 10:
        print('!!!!!!!!!')
    mov_aver_features = mov_aver_features / price_max
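For comparison, here is a minimal sketch of what a leakage-free variant might look like, assuming the scaling constant is taken from the training rows only. The function `scale_by_train_max` and the index `train_end_row` are hypothetical names for illustration and do not appear in the original code.

```python
import numpy as np


def scale_by_train_max(features, close, train_end_row):
    """Divide features by the max close price seen in the training rows only.

    train_end_row is a hypothetical index of the first non-training row.
    """
    train_max = np.max(close[:train_end_row])
    return features / train_max, train_max


# Toy data: the close price peaks in the last (test) rows.
close = np.array([10.0, 12.0, 11.0, 20.0, 40.0])
features = close.reshape(-1, 1)  # stand-in for mov_aver_features

scaled, train_max = scale_by_train_max(features, close, train_end_row=3)
global_max = np.max(close)

print('train-only max:', train_max)
print('global max:', global_max)
```

With the train-only constant, scaled test values can exceed 1, which is expected: the scaler no longer knows the future price range.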
It also seems to split the training and test sets by offset?
    def get_batch(self, offset=None):
        if offset is None:
            offset = random.randrange(0, self.valid_index)
        seq_len = self.parameters['seq']
        mask_batch = self.mask_data[:, offset: offset + seq_len + self.steps]
        mask_batch = np.min(mask_batch, axis=1)
        return self.eod_data[:, offset:offset + seq_len, :], \
               np.expand_dims(mask_batch, axis=1), \
               np.expand_dims(
                   self.price_data[:, offset + seq_len - 1], axis=1
               ), \
               np.expand_dims(
                   self.gt_data[:, offset + seq_len + self.steps - 1], axis=1
               )
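To make the offset question concrete, here is a rough sketch of how offsets could be partitioned so that each label day falls in exactly one of the three periods. It relies only on the indexing visible in get_batch (inputs cover days offset to offset + seq_len - 1, and the label is day offset + seq_len + steps - 1). The function `split_offsets` and the arguments `test_index` and `trade_dates` are assumptions for illustration, not names confirmed by the code above.

```python
def split_offsets(valid_index, test_index, trade_dates, seq_len, steps):
    """Return (train, valid, test) offset ranges keyed on the label day.

    A window starting at `offset` reads days [offset, offset + seq_len) and
    is labelled by day offset + seq_len + steps - 1, as in get_batch.
    """
    shift = seq_len + steps - 1  # distance from offset to the label day
    train = range(0, valid_index - shift)
    valid = range(valid_index - shift, test_index - shift)
    test = range(test_index - shift, trade_dates - shift)
    return train, valid, test


train, valid, test = split_offsets(
    valid_index=100, test_index=120, trade_dates=150, seq_len=16, steps=1)


def label_day(offset, seq_len=16, steps=1):
    """Day index of the ground-truth label for a window at `offset`."""
    return offset + seq_len + steps - 1


print(label_day(train[-1]), label_day(valid[0]), label_day(test[0]))
# prints: 99 100 120
```

Under this scheme the validation and test input windows still read some earlier days, which is not leakage in itself because only past prices are used as inputs. The labels, however, never cross a period boundary.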