Finite-sample analysis for SARSA with linear function approximation. S. Zou, T. Xu, Y. Liang. Advances in Neural Information Processing Systems 32, 2019. Cited by 170.
Improving sample complexity bounds for (natural) actor-critic algorithms. T. Xu, Z. Wang, Y. Liang. Advances in Neural Information Processing Systems 33, 4358-4369, 2020. Cited by 115*.
CRPO: A new approach for safe reinforcement learning with convergence guarantee. T. Xu, Y. Liang, G. Lan. International Conference on Machine Learning, 11480-11491, 2021. Cited by 100*.
Two time-scale off-policy TD learning: Non-asymptotic analysis over Markovian samples. T. Xu, S. Zou, Y. Liang. Advances in Neural Information Processing Systems 32, 2019. Cited by 80.
Reanalysis of variance reduced temporal difference learning. T. Xu, Z. Wang, Y. Zhou, Y. Liang. arXiv preprint arXiv:2001.01898, 2020. Cited by 45.
Enhanced first and zeroth order variance reduced algorithms for min-max optimization. T. Xu, Z. Wang, Y. Liang, H. V. Poor. 2020. Cited by 39*.
Algorithms for the estimation of transient surface heat flux during ultra-fast surface cooling. Z. F. Zhou, T. Y. Xu, B. Chen. International Journal of Heat and Mass Transfer 100, 1-10, 2016. Cited by 34.
Non-asymptotic convergence of Adam-type reinforcement learning algorithms under Markovian sampling. H. Xiong, T. Xu, Y. Liang, W. Zhang. Proceedings of the AAAI Conference on Artificial Intelligence 35 (12), 10460 …, 2021. Cited by 29.
Proximal gradient descent-ascent: Variable convergence under KŁ geometry. Z. Chen, Y. Zhou, T. Xu, Y. Liang. arXiv preprint arXiv:2102.04653, 2021. Cited by 26.
When will gradient methods converge to max-margin classifier under ReLU models? T. Xu, Y. Zhou, K. Ji, Y. Liang. arXiv preprint arXiv:1806.04339, 2018. Cited by 23*.
Doubly robust off-policy actor-critic: Convergence and optimality. T. Xu, Z. Yang, Z. Wang, Y. Liang. International Conference on Machine Learning, 11581-11591, 2021. Cited by 22.
Sample complexity bounds for two timescale value-based reinforcement learning algorithms. T. Xu, Y. Liang. International Conference on Artificial Intelligence and Statistics, 811-819, 2021. Cited by 22.
Faster algorithm and sharper analysis for constrained Markov decision process. T. Li, Z. Guan, S. Zou, T. Xu, Y. Liang, G. Lan. arXiv preprint arXiv:2110.10351, 2021. Cited by 19.
When will generative adversarial imitation learning algorithms attain global convergence. Z. Guan, T. Xu, Y. Liang. International Conference on Artificial Intelligence and Statistics, 1117-1125, 2021. Cited by 15.
Model-based offline meta-reinforcement learning with regularization. S. Lin, J. Wan, T. Xu, Y. Liang, J. Zhang. arXiv preprint arXiv:2202.02929, 2022. Cited by 11.
Provably efficient offline reinforcement learning with trajectory-wise reward. T. Xu, Y. Wang, S. Zou, Y. Liang. arXiv preprint arXiv:2206.06426, 2022. Cited by 9.
PER-ETD: A polynomially efficient emphatic temporal difference learning method. Z. Guan, T. Xu, Y. Liang. arXiv preprint arXiv:2110.06906, 2021. Cited by 7.
Deterministic policy gradient: Convergence analysis. H. Xiong, T. Xu, L. Zhao, Y. Liang, W. Zhang. Uncertainty in Artificial Intelligence, 2159-2169, 2022. Cited by 5.
A unifying framework of off-policy general value function evaluation. T. Xu, Z. Yang, Z. Wang, Y. Liang. Advances in Neural Information Processing Systems 35, 13570-13583, 2022. Cited by 2*.
Constraint-based multi-agent reinforcement learning for collaborative tasks. X. Shang, T. Xu, I. Karamouzas, M. Kallmann. Computer Animation and Virtual Worlds, e2182, 2023. Cited by 1.