|
pmchem posted:
> I'm pondering the following sort of ML problem: a game/simulation with many independent non-interacting agents, each acting according to the same exact model; continuous input space; continuous output space; dynamic environment; and a continuous (real-valued) reward function evaluated ONLY at the end of the game/simulation, not per step. The reward function cannot be used to calculate the gradient of the model parameters (e.g., no backprop through it). Assume the solution is, say, a PyTorch implementation of whatever flavor of NN you desire.

Sounds like a Q-learning or PPO problem to me. Either should be able to learn a policy even when the reward signal is zero until the end of an episode.
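To make that concrete, here's a minimal policy-gradient sketch (plain REINFORCE with a batch-mean baseline, which is simpler than PPO but handles the same terminal-only-reward setup). The environment here is a toy I made up for illustration: each agent integrates its continuous actions for a fixed horizon and is rewarded only on its final position. All the agents share one Gaussian policy network, matching the "many independent agents, same exact model" framing.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class Policy(nn.Module):
    """Shared Gaussian policy: one network drives every agent."""
    def __init__(self, obs_dim=2, act_dim=1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(),
                                 nn.Linear(32, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        return torch.distributions.Normal(self.net(obs), self.log_std.exp())

def rollout(policy, n_agents=64, horizon=10):
    """Run all agents in parallel; reward exists ONLY at the final step."""
    obs = torch.zeros(n_agents, 2)  # [position, time feature]
    logps = []
    for _ in range(horizon):
        d = policy.dist(obs)
        act = d.sample()                      # no grad flows through env
        logps.append(d.log_prob(act).sum(-1))
        obs = obs + torch.cat([act, torch.ones(n_agents, 1)], dim=-1) * 0.1
    # terminal-only reward: end as close to position 1.0 as possible
    reward = -(obs[:, 0] - 1.0) ** 2
    return torch.stack(logps).sum(0), reward

policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=3e-2)
for _ in range(200):
    logp, reward = rollout(policy)
    adv = reward - reward.mean()           # batch-mean baseline
    loss = -(logp * adv.detach()).mean()   # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()

final = rollout(policy)[1].mean().item()
print(final)
```

The key point is that the reward is never differentiated: the gradient comes entirely from the log-probabilities of the sampled actions, weighted by the (detached) terminal return, which is exactly why this family of methods works when backprop through the reward is impossible.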
|
# ¿ Apr 10, 2024 20:58 |
|
Cyril Sneer posted:
> Are there any LLM/NLP gurus in here? Asking before I make a big effort post. I'm familiar with the use and deployment of LLMs; training one from scratch, not so much.
|
# ¿ Apr 17, 2024 16:38 |
|
Yeah, you're looking for a vector database. You can manually attach the timestamp/video ID as metadata so it gets retrieved alongside your sentence. There are a couple of good open-source implementations, like Weaviate and LanceDB.
|
# ¿ Apr 17, 2024 20:03 |