|
pmchem posted:
> I'm pondering the following sort of ML problem: a game/simulation with many independent non-interacting agents, each acting according to the same exact model; continuous input space; continuous output space; dynamic environment; and a continuous (real-valued) reward function evaluated ONLY at the end of the game/simulation, not per step. The reward function cannot be used to calculate the gradient of the model parameters (e.g., no backprop through it). Assume the solution is, say, a PyTorch implementation of whatever flavor of NN you desire.

Sounds like a Q-learning or PPO problem to me. Either should be able to learn a policy even when the reward signal is zero until the end of an episode.
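To make that concrete, here's a minimal policy-gradient sketch (plain REINFORCE with a batch-mean baseline, which is simpler than PPO but handles the same terminal-only-reward setup). The environment here is a toy I made up for illustration: each agent integrates its continuous actions for a fixed horizon and is rewarded only on its final position. All the agents share one Gaussian policy network, matching the "many independent agents, same exact model" framing.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class Policy(nn.Module):
    """Shared Gaussian policy: one network drives every agent."""
    def __init__(self, obs_dim=2, act_dim=1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(),
                                 nn.Linear(32, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        return torch.distributions.Normal(self.net(obs), self.log_std.exp())

def rollout(policy, n_agents=64, horizon=10):
    """Run all agents in parallel; reward exists ONLY at the final step."""
    obs = torch.zeros(n_agents, 2)  # [position, time feature]
    logps = []
    for _ in range(horizon):
        d = policy.dist(obs)
        act = d.sample()                      # no grad flows through env
        logps.append(d.log_prob(act).sum(-1))
        obs = obs + torch.cat([act, torch.ones(n_agents, 1)], dim=-1) * 0.1
    # terminal-only reward: end as close to position 1.0 as possible
    reward = -(obs[:, 0] - 1.0) ** 2
    return torch.stack(logps).sum(0), reward

policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=3e-2)
for _ in range(200):
    logp, reward = rollout(policy)
    adv = reward - reward.mean()           # batch-mean baseline
    loss = -(logp * adv.detach()).mean()   # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()

final = rollout(policy)[1].mean().item()
print(final)
```

The key point is that the reward is never differentiated: the gradient comes entirely from the log-probabilities of the sampled actions, weighted by the (detached) terminal return, which is exactly why this family of methods works when backprop through the reward is impossible.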
|
# ¿ Apr 10, 2024 20:58 |
|
Cyril Sneer posted:
> Are there any LLM/NLP gurus in here? Asking before I make a big effort post. I'm familiar with the use and deployment of LLMs; training one from scratch, not so much.
|
# ¿ Apr 17, 2024 16:38 |
|
Yeah, you're looking for a vector database. You can manually attach the timestamp/video ID as metadata so it gets retrieved alongside your sentence. There are a couple of good open-source implementations, like Weaviate and LanceDB.
|
# ¿ Apr 17, 2024 20:03 |