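The typeset final version of this article has been published online first on CNKI (China National Knowledge Infrastructure). To read the full text, open the CNKI home page and search for the article title.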

Construction of Machine Learning Models for Biomass of Chinese Fir Plantations Based on Boruta Feature Selection and Optuna Hyperparameter Optimization

  • Abstract: This study used Chinese fir (Cunninghamia lanceolata) plantations in the permanent sample plots of the 8th National Forest Inventory (NFI-8) in Guangdong Province as its research subject. The Boruta algorithm was used to screen stand factors and climate factors, and hyperparameters were optimized with Optuna and, for comparison, Random Search, to construct Random Forest (RF), Support Vector Regression (SVR), and Artificial Neural Network (ANN) models. Model performance was evaluated with R², RMSE, and MAE. The results show that when models were built using only the Boruta-selected stand factors N, D, H, P, and Age, R² exceeded 0.76 for every model. Boruta retained 11 climate factors, of which 8 were temperature-related and 3 were precipitation-related. Adding all climate factors to the selected stand factors further improved model performance, with the RF model improving the most. Models built on the selected stand factors plus the selected climate factors outperformed those built on the selected stand factors plus all climate factors, with the ANN model showing the largest gain. When the models built on the selected stand and climate factors were tuned, Optuna gave better results than Random Search for every model. After Boruta selection and Optuna optimization, the ANN model achieved the greatest relative improvement from tuning, while the RF model had the best predictive performance (R² = 0.9271). Ranked from highest to lowest performance, the models were RF, ANN, SVR.
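
The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions of R², RMSE, and MAE, since this page does not give the exact formulas used in the paper. The function name `regression_metrics` and the biomass values are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observed and predicted values.

    Assumes the standard definitions; the paper's exact formulas are
    not shown on this page.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares and total sum of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observed values are constant")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


# Hypothetical observed and predicted biomass values (illustration only)
observed = [50.0, 80.0, 120.0, 150.0]
predicted = [55.0, 75.0, 125.0, 145.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.4f}, RMSE={rmse:.2f}, MAE={mae:.2f}")
```

A higher R² together with lower RMSE and MAE indicates a better fit, which is the sense in which the abstract ranks the RF, ANN, and SVR models.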

     
