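[Keywords]
[Abstract]
Based on the 1970-2021 earthquake catalog of the Chinese mainland, earthquake sequence catalogs, and focal mechanism data of historical earthquakes, we construct training and test feature-sample datasets for determining earthquake sequence type. Sequence labels are divided into three classes: multiple-mainshock type, mainshock-aftershock type, and isolated type. A recursive feature elimination-random forest (RFE-RF) machine learning algorithm is used to establish a nonlinear mapping between seismic feature parameters and sequence type. The model predicts the sequence type early, at three different time points after the earthquake, and the importance of the features is examined. The results show that data preprocessing has a significant effect on classification performance: filling missing features with the median of same-class samples and then applying random resampling yields high classification accuracy. Cross-validation of the classification results shows that at 1 day after the earthquake the overall accuracy for the three classes reaches 0.93. Tracking how the model's optimal feature subset changes over time shows the following. Immediately after the earthquake (i.e., when sequence data are still lacking), parameters related to the mainshock focal mechanism contribute more to classification than the traditional analogy with historical earthquake sequences. These parameters include the deviation of the mainshock P-axis azimuth from the regional stress field. As time after the earthquake increases, sequence-related features become the main factors in determining sequence type. At 3 days after the earthquake, in the dataset of sequences in which the largest aftershock may already have occurred, the magnitude difference between the mainshock and the largest aftershock becomes the key factor. Compared with a standalone random forest (RF) model, the RFE-RF model raises accuracy on the 1-day test set by 0.41 and distinguishes sequence types more effectively.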
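The preprocessing reported as most effective has two steps. First, each missing feature is filled with the median of samples of the same class. Second, the data are randomly resampled. Both steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. Samples are assumed to be dicts of numeric features, with `None` marking a missing value. "Random resampling" is assumed to mean random oversampling of smaller classes up to the size of the largest class. The function names `impute_classwise_median` and `random_oversample` and the feature names in the toy data are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def impute_classwise_median(samples, labels):
    """Replace each missing value (None) with the median of that feature
    over the samples that share the same class label (assumed reading)."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Per-class, per-feature medians computed from observed values only.
    medians = {}
    for label, group in by_class.items():
        medians[label] = {}
        for feature in {f for s in group for f in s}:
            observed = [s[feature] for s in group if s.get(feature) is not None]
            if observed:
                medians[label][feature] = statistics.median(observed)

    # A feature with no observed value anywhere in its class stays None.
    return [
        {f: (medians[label].get(f) if v is None else v) for f, v in sample.items()}
        for sample, label in zip(samples, labels)
    ]


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class (with
    replacement) until every class matches the largest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy example with hypothetical feature names.
X = [
    {"mag": 5.0, "p_dev": None},
    {"mag": 6.0, "p_dev": 20.0},
    {"mag": 5.5, "p_dev": 40.0},
    {"mag": 4.8, "p_dev": 10.0},
]
y = ["isolated", "isolated", "isolated", "mainshock_aftershock"]

X_filled = impute_classwise_median(X, y)
print(X_filled[0])  # {'mag': 5.0, 'p_dev': 30.0}
X_bal, y_bal = random_oversample(X_filled, y)
print(y_bal.count("isolated"), y_bal.count("mainshock_aftershock"))  # 3 3
```

Because the imputation reads the class label, this sketch can only fill gaps in labelled samples. When the model is evaluated by cross-validation, oversampling is normally applied to the training folds only. This keeps duplicated samples out of the evaluation fold.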
[Keywords]
[Abstract]
Using the earthquake catalog of the Chinese mainland from 1970 to 2021, together with earthquake sequence catalogs and focal mechanism data of historical earthquakes, this study constructs training and test datasets for determining earthquake sequence type. Following prior research, each sequence is assigned one of three labels: multiple-mainshock type, mainshock-aftershock type, or isolated type. A Recursive Feature Elimination-Random Forest (RFE-RF) machine learning algorithm is employed to establish a nonlinear mapping between seismic feature parameters and sequence type. This approach enables early prediction of the sequence type at three time points after the earthquake, and the importance of the individual features is discussed. The findings show that the data preprocessing method strongly affects the model's classification performance. Filling missing features with the median value of samples of the same class, and then applying random resampling, yields high classification accuracy. Cross-validation of the classification results gives an overall accuracy of 0.93 for the three classes one day after the earthquake. Immediately after an earthquake, i.e., in the absence of sequence data, parameters related to the mainshock focal mechanism contribute more to classification than the traditional analogy with historical earthquake sequences. These parameters include the deviation of the mainshock P-axis azimuth from the regional stress field. As time after the earthquake progresses, sequence-related features become the primary determinants of sequence type. Three days after the earthquake, the dataset includes sequences in which the largest aftershock may already have occurred. In this dataset, the magnitude difference between the mainshock and the largest aftershock becomes the key factor in sequence type determination. Compared with a standalone Random Forest (RF) model, the RFE-RF model improves accuracy on the one-day test set by 0.41, indicating that it distinguishes sequence types more effectively.
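Recursive feature elimination is the feature-selection half of RFE-RF. It repeatedly refits the model and ranks the features by importance. It then drops the least important feature and finally keeps the subset with the best validation score. The sketch below shows only that loop, in plain Python, and is not the authors' implementation. The model is abstracted as two hypothetical callbacks. The first, `importances(subset)`, stands in for something like a random forest's feature importances. The second, `score(subset)`, stands in for something like cross-validated accuracy. The function name, the one-feature-per-round schedule, and the toy features and numbers are assumptions. In practice, scikit-learn's `RFECV` with a `RandomForestClassifier` estimator performs a comparable procedure.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def recursive_feature_elimination(
    features: Sequence[str],
    importances: Callable[[List[str]], Dict[str, float]],
    score: Callable[[List[str]], float],
    min_features: int = 1,
) -> Tuple[List[str], float]:
    """Backward elimination, one feature per round.

    importances(subset): refit the model on `subset` and return an
        importance value per feature (e.g. random-forest importances).
    score(subset): validation score of a model using `subset`
        (e.g. cross-validated accuracy).
    Returns the best-scoring subset seen; on ties the earlier, larger
    subset is kept because the comparison is strict.
    """
    current = list(features)
    best_subset, best_score = list(current), score(current)
    while len(current) > min_features:
        ranking = importances(current)
        weakest = min(current, key=lambda f: ranking[f])
        current.remove(weakest)  # drop the least important feature
        s = score(current)
        if s > best_score:
            best_subset, best_score = list(current), s
    return best_subset, best_score


# Toy stand-ins for a fitted random forest (hypothetical features and numbers;
# "noise" plays the role of an irrelevant feature).
WEIGHTS = {"mag_diff": 0.5, "p_axis_dev": 0.3, "rake": 0.15, "noise": 0.05}
GAIN = {"mag_diff": 0.5, "p_axis_dev": 0.3, "rake": 0.05, "noise": -0.1}


def toy_importances(subset: List[str]) -> Dict[str, float]:
    return {f: WEIGHTS[f] for f in subset}


def toy_score(subset: List[str]) -> float:
    return sum(GAIN[f] for f in subset)


subset, best = recursive_feature_elimination(list(WEIGHTS), toy_importances, toy_score)
print(subset, round(best, 2))  # ['mag_diff', 'p_axis_dev', 'rake'] 0.85
```

Removing one feature per round costs one refit per eliminated feature. Removing several features per round is a common way to trade selection resolution for speed.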
[CLC number]
P315
[Funding]
Supported by the Open Fund of the State Key Laboratory of Earthquake Dynamics (LED2022B05)