|
QuarkJets posted: "I peer review AI/ML papers for several journals, AMA"
What AI/ML research do you publish, personally?
|
# ¿ Feb 10, 2022 13:33 |
|
I just ran across "Algorithms for Decision Making" (https://algorithmsbook.com/files/dm.pdf) via https://news.ycombinator.com/item?id=31123683. The PDF will always be free, and it looks like a nice survey. Does anyone else have free (not pirated) PDF algorithm books to recommend? I like to collect them for my little PDF library, covering a range of topics: HPC/scientific computing, AI/ML, basic comp sci, computational geometry, etc.
|
# ¿ Apr 26, 2022 15:00 |
|
bob dobbs is dead posted: "mediocre book you've actually read and did problems from beats great book you hoard in your pdf trove any day of the week"
While I don't disagree, I did my homework years ago and actively work in the field. Some of us still like to keep a personal library up to date.
|
# ¿ Apr 26, 2022 16:29 |
|
bob dobbs is dead posted: "so have i and so do i, altho this 'relevance eng' job just involves normal software dev nowadays. still applies imo"
That's great. Care to share any resources you've found useful?
|
# ¿ Apr 26, 2022 16:38 |
|
I'm pondering the following sort of ML problem: a game/simulation with many independent, non-interacting agents, each acting according to the exact same model; continuous input space; continuous output space; dynamic environment; and a continuous (real-valued) reward evaluated ONLY at the end of the game/simulation, not per step. The reward function cannot be used to compute gradients of the model parameters (i.e., no backprop through the reward). Assume the solution is, say, a PyTorch implementation of whatever flavor of NN you desire. What training strategies might you consider other than neuroevolution?
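To make the setup concrete, here's a minimal REINFORCE-style sketch of the episode structure; score-function gradients only need the log-probs of sampled actions, never a gradient through the reward itself. Everything named here (DummyEnv, obs_dim, act_dim, terminal_reward()) is a placeholder, not code from my actual project:

```python
import torch
import torch.nn as nn

class DummyEnv:
    """Stand-in for the real simulation: random dynamics, terminal-only reward."""
    def reset(self):
        self.t, self.obs = 0, torch.randn(8)
        return self.obs

    def step(self, action):
        self.t += 1
        self.obs = torch.randn(8)          # fake dynamics, ignores the action
        return self.obs, 0.0, self.t >= 50, {}

    def terminal_reward(self):
        return float(-self.obs.abs().sum())  # arbitrary stand-in score

# Gaussian policy over continuous actions: mean from a small MLP,
# log-std as a free learned parameter.
class Policy(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.mean = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        return torch.distributions.Normal(self.mean(obs), self.log_std.exp())

env = DummyEnv()
policy = Policy(obs_dim=8, act_dim=2)      # dims are placeholders
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

for episode in range(1000):
    obs = env.reset()
    log_probs, done = [], False
    while not done:
        d = policy.dist(torch.as_tensor(obs, dtype=torch.float32))
        action = d.sample()
        log_probs.append(d.log_prob(action).sum())
        obs, _, done, _ = env.step(action.numpy())
    R = env.terminal_reward()              # scalar, only known at the end
    # REINFORCE: gradient flows through log pi(a|s); R is just a scalar
    # weight, so the reward function never needs to be differentiable.
    loss = -R * torch.stack(log_probs).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Since the agents are independent and share one model, rollouts batch naturally across agents, which should help with the variance of getting only one scalar per episode.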
|
# ¿ Apr 10, 2024 20:30 |
|
mightygerm posted: "Sounds like a Q-learning or PPO problem to me. They should be able to learn a policy even when the reward function is null until the end of an episode."
Yeah, I was considering NAF Q-learning for the continuous spaces, but I thought that required a reward at each step rather than only at the end of an episode (see Algorithm 1 in Gu et al.). Guess I'll poke at other variants.
ultrafilter posted: "You might look into Bayesian optimization as an alternative to any RL-based approach."
Already solved things that way. Anyone have a favorite actor/critic approach for delayed rewards?
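In the meantime, here's how I currently read the terminal-only reward fitting a TD-style target: the per-step reward is simply zero until the final transition, and the Bellman backup propagates the end-of-episode reward backward through the critic over repeated updates. A sketch with placeholder names only; ValueNet stands in for NAF's state-value term V(s), which in NAF equals max_a Q(s, a) because Q is quadratic in the action:

```python
import torch
import torch.nn as nn

# Placeholder value head standing in for NAF's V(s); since the max over
# actions is closed-form in NAF, the bootstrap term is just V(s').
class ValueNet(nn.Module):
    def __init__(self, obs_dim):
        super().__init__()
        self.v = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, obs):
        return self.v(obs).squeeze(-1)

def td_target(rew, next_obs, done, target_v, gamma=0.99):
    # rew is 0.0 for every transition except the terminal one, where it
    # carries the episode's final reward R; done masks the bootstrap there,
    # so the target at the last step is exactly R and earlier steps pick
    # it up through repeated backups.
    with torch.no_grad():
        return rew + gamma * (1.0 - done) * target_v(next_obs)

# smoke test with random tensors (batch of 32 transitions, obs_dim=8)
target_v = ValueNet(obs_dim=8)
y = td_target(torch.zeros(32), torch.randn(32, 8), torch.zeros(32), target_v)
```

If that reading of Algorithm 1 is right, terminal-only reward isn't a blocker for NAF, just a slower credit-assignment path.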
|
# ¿ Apr 10, 2024 22:18 |