Speaker: Prof. Chen Zhiping (陈志平), Xi'an Jiaotong University
Time: April 10, 2025, 15:00-
Venue: Science Building (理科楼), Room LA103
Abstract: We consider a Markov decision process (MDP) problem in which the risks arising from epistemic and aleatoric uncertainties are assessed by Bayesian composite risk (BCR) measures; we call the resulting model the BCR-MDP model. The time dependence of the risk measures allows one to capture the decision maker’s (DM’s) dynamic risk preferences in a timely manner as more information about both uncertainties is obtained, which makes the BCR-MDP model more flexible than traditional MDP models. Unlike the traditional MDP model, in which the control/action at each episode is based purely on the current state, the new model allows the control to depend also on the probability distribution of the epistemic uncertainty. This reflects the fact that, in many practical instances, cumulative information about epistemic uncertainty affects the DM’s belief about future aleatoric uncertainty and hence his/her action. The new modeling paradigm subsumes a number of existing MDP models, including distributionally robust MDP models and Bayes-adaptive MDP models, and generates so-called preference robust MDP models. Moreover, we demonstrate that the finite-horizon BCR-MDP model can be solved by dynamic programming, and we extend the discussion to the infinite-horizon case. Using Bellman equations, we show that, under some standard conditions, the optimal values converge asymptotically as the episodic variable goes to infinity. Finally, we carry out numerical tests on an infinite-horizon inventory control problem and show the effectiveness of the proposed model and numerical schemes.
Host: Jiang Jie (蒋杰)
All faculty and students are welcome to attend!