ZENG Ni, CHEN Jun-hao, FU Qing-shuang
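Abstract: To address how a single-target player who knows only the current day's weather should plan the best course of action, a dynamic programming strategy based on a greedy algorithm is proposed. By analyzing the player's state-transition process, the method uses the Floyd algorithm to obtain shortest paths and a greedy algorithm to evaluate the expected value of the optimal follow-up decision. The expected final payoff is then analyzed to select the best action strategy. The weather is randomly simulated by Monte Carlo simulation, and the route that occurs with the highest probability is taken as the best route for comparison and verification. The results show that the strategy enables the player to choose the best route under typical combinations of unknown weather, maximizing the final funds.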
Keywords: dynamic programming model; Monte Carlo simulation; greedy algorithm; Floyd algorithm; decision model
CLC number: TP391.9   Document code: A
Article ID: 1009-3044(2021)20-0141-03
Dynamic Programming Strategy Based on Greedy Algorithm
ZENG Ni¹, CHEN Jun-hao², FU Qing-shuang³
(1. School of Science, Jiangxi University of Science and Technology, Ganzhou 341000, China; 2. School of Civil and Surveying Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China; 3. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China)
Abstract: To address how a single-target player who knows only the current day's weather should plan the best course of action, this paper proposes a dynamic programming strategy based on a greedy algorithm. By analyzing the single-target player's state-transition process, a resource-state function model and an optimal funds-decision model are established. Shortest paths are obtained with the Floyd algorithm, and the expected value of the optimal follow-up decision is computed with a greedy algorithm; the expected final earnings of the subsequent decisions are analyzed to choose the best action strategy. The weather is then simulated stochastically by Monte Carlo simulation, and the route with the highest probability of occurrence is taken as the best route for comparison and testing. The analysis results show that this strategy enables the player to choose the best course of action under typical combinations of unknown weather, maximizing the final funds.
Key words: dynamic programming model; Monte Carlo simulation; greedy algorithm; Floyd algorithm; decision-making model
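The method above obtains shortest paths with the Floyd algorithm. A minimal sketch of the Floyd-Warshall all-pairs shortest-path computation follows, assuming the desert map is reduced to an undirected graph of regions in which each edge costs one day of walking. The region numbering, the edges, and the helper names `floyd_warshall` and `path` are hypothetical and are not the map or code used in this paper.

```python
INF = float("inf")


def floyd_warshall(n, edges):
    """All-pairs shortest paths on an undirected graph with unit-weight edges.

    n: number of regions, labelled 0..n-1 (hypothetical numbering).
    edges: iterable of (u, v) pairs of adjacent regions; each edge = one day of walking.
    Returns (dist, nxt): dist[i][j] is the shortest number of days from i to j,
    and nxt[i][j] is the next region on one shortest i -> j path (None if unreachable).
    """
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    nxt = [[j if i == j else None for j in range(n)] for i in range(n)]
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
        nxt[u][v], nxt[v][u] = v, u
    # Allow each region k in turn as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def path(nxt, i, j):
    """Reconstruct one shortest path from i to j using the next-hop table."""
    if nxt[i][j] is None:
        return []
    route = [i]
    while i != j:
        i = nxt[i][j]
        route.append(i)
    return route


# Hypothetical map: regions 0-1-2-3-4 in a chain, plus a shortcut between 1 and 3.
dist, nxt = floyd_warshall(5, [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)])
print(dist[0][4], path(nxt, 0, 4))
```

In the game setting, the start point, mine, villages and end point would be specific regions, and `dist` gives the minimum number of walking days between any pair of them.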
1 Introduction
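In recent years, more and more explorers have chosen to cross deserts on foot, both to take in the spectacular scenery and to test their own perseverance. To make it easier to study how explorers travel, this process is modeled as a small desert-crossing game. The journey from the starting point of the desert to the planned destination is constrained by many factors, and the explorer wants to reach the destination within the expected time at the lowest possible cost, so deciding what to do along the way is a real challenge.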
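Cheng Kai et al. [1] digitized the map and traversed all paths to the mine and the villages, obtaining the optimal solutions to levels 1 and 2 when the weather is known. For unknown weather, they used maximum likelihood estimation to obtain a distribution function of future weather and used it to predict the weather, but they did not derive an exact optimal action strategy. Zang Yang et al. [2], building on the Bellman-Ford algorithm and the idea of shortest paths, defined an objective function and constraints to construct a linear programming model and obtained the optimal strategy for each case when the weather is known, but they gave no specific analysis for a single player under unknown weather.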
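Reference [1] estimates the distribution of future weather by maximum likelihood. As a minimal sketch, assuming daily weather is an independent, identically distributed categorical variable (an assumption not stated in the source), the maximum-likelihood estimate of each weather type's probability is simply its relative frequency in the observed record. The function name `mle_weather_distribution` and the weather record below are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def mle_weather_distribution(history):
    """Maximum-likelihood estimate of a categorical weather distribution.

    Assumes each day's weather is an i.i.d. draw from a fixed categorical
    distribution; under that assumption the MLE of each category's probability
    is its count divided by the number of observed days.
    """
    counts = Counter(history)
    total = len(history)
    return {weather: Fraction(c, total) for weather, c in counts.items()}


# Hypothetical 10-day weather record.
record = ["sunny"] * 6 + ["hot"] * 3 + ["sandstorm"]
print(mle_weather_distribution(record))
```

Exact fractions are used only so that the estimated probabilities visibly sum to 1; floating-point values would serve equally well in a simulation.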
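Based on a greedy algorithm, this paper derives a selection strategy for a single-target player who knows only the current day's weather by comparing the expected values of the optimal follow-up decisions. The weather is then randomly simulated by Monte Carlo simulation, and the route with the highest probability of occurrence is taken as the best route for comparison, verifying the feasibility of this selection strategy.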
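The verification step described above, in which weather is simulated repeatedly and the route that comes out best most often is taken as the best route, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's model: the weather types, their probabilities, the two candidate routes, their lengths in days and their per-day costs are all hypothetical, and each simulated weather sequence is scored only by total route cost.

```python
import random
from collections import Counter

# Hypothetical weather types and daily probabilities (illustrative only).
WEATHER_PROBS = {"sunny": 0.5, "hot": 0.4, "sandstorm": 0.1}

# Hypothetical candidate routes: walking days needed and cost per day by weather.
ROUTES = {
    "direct": {"days": 3, "cost": {"sunny": 10, "hot": 20, "sandstorm": 30}},
    "via_village": {"days": 4, "cost": {"sunny": 8, "hot": 12, "sandstorm": 20}},
}


def sample_weather(days, rng):
    """Draw one weather sequence of the given length, day by day."""
    kinds = list(WEATHER_PROBS)
    weights = list(WEATHER_PROBS.values())
    return rng.choices(kinds, weights=weights, k=days)


def route_cost(route, weather):
    """Total cost of a route over the first `days` entries of a weather sequence."""
    spec = ROUTES[route]
    return sum(spec["cost"][w] for w in weather[: spec["days"]])


def best_route(weather):
    """Cheapest route for one realised weather sequence (ties go to the first route)."""
    return min(ROUTES, key=lambda r: route_cost(r, weather))


def monte_carlo(n_trials=10_000, seed=0):
    """Count how often each route is the cheapest across simulated weather."""
    rng = random.Random(seed)
    horizon = max(spec["days"] for spec in ROUTES.values())
    return Counter(best_route(sample_weather(horizon, rng)) for _ in range(n_trials))


wins = monte_carlo()
most_likely_best, count = wins.most_common(1)[0]
print(most_likely_best, count / sum(wins.values()))
```

In the paper's setting, `route_cost` would presumably be replaced by the resource and funds model and `best_route` by the greedy follow-up decision rule; the counting step that selects the most frequent best route stays the same.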
2 Model Construction
2.1 Problem Statement