mightygerm
Jun 29, 2002



pmchem posted:

I'm pondering the following sort of ML problem: a game/simulation with many independent non-interacting agents each acting according to the same exact model, continuous input space, continuous output space, dynamic environment, continuous (real) reward function evaluated ONLY at the end of the game/simulation (not per step in the game/simulation). Reward function cannot be used to calculate gradient of model parameters (e.g., no backprop). Assume solution is, say, a pytorch implementation of whatever flavor NN you desire.

What training strategies might you consider other than neuroevolution?

Sounds like a Q-learning or PPO problem to me. They should be able to learn a policy even when the reward is sparse, arriving only at the end of an episode.
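The key mechanism, regardless of which algorithm you pick: with a terminal-only reward, discounted returns propagate credit backwards to every step, so policy-gradient/Q-learning targets are still well defined. A minimal pure-Python sketch of that return computation (illustrative only, not a full PPO implementation):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute per-step discounted returns from a reward sequence.

    With a terminal-only reward, every element of `rewards` is 0.0
    except the last; each step still gets a nonzero return, discounted
    by its distance from the end of the episode.
    """
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

# Terminal-only reward: zeros until the final step.
rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_returns(rewards))
```

These returns are what you'd plug in as targets (or advantages, after baseline subtraction) in a PyTorch policy-gradient loss; the gradient flows through the policy's log-probabilities, so the reward function itself never needs to be differentiable.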


mightygerm
Jun 29, 2002



Cyril Sneer posted:

Are there any LLM/NLP gurus in here? Asking before I make a big effort post.

I’m familiar with the use and deployment of LLMs; training one from scratch, not so much.

mightygerm
Jun 29, 2002



Yeah, you’re looking for a vector database. You can attach the timestamp/video id to each entry as metadata, and it comes back alongside the matching sentence at query time.

There are a couple of good open-source implementations, like Weaviate and LanceDB.
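To make the metadata point concrete, here's a toy in-memory version of what a vector database does for you (pure-Python illustration; the field names `video_id` and `timestamp` are placeholders, and a real DB would use approximate nearest-neighbor indexes instead of a linear scan):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Each record stores the sentence embedding plus arbitrary metadata.
records = [
    {"vector": [1.0, 0.0], "text": "intro",
     "video_id": "vid42", "timestamp": 12.5},
    {"vector": [0.0, 1.0], "text": "outro",
     "video_id": "vid42", "timestamp": 310.0},
]

def search(query, k=1):
    """Return the k records most similar to the query embedding."""
    ranked = sorted(records, key=lambda r: cosine(query, r["vector"]),
                    reverse=True)
    return ranked[:k]

hit = search([0.9, 0.1])[0]
print(hit["video_id"], hit["timestamp"])  # metadata rides along with the match
```

Weaviate and LanceDB both expose the same shape of API: insert vectors with a metadata payload, query by embedding, get the payload back with each hit.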
