ultrafilter
Aug 23, 2007

It's okay if you have any questions.


I'm a practicing data scientist looking to improve the state of the art both in industry and academia. I can't speak to frameworks or systems too much, but I can say a lot about how to manage AI/ML products, how this discipline fits into data science as a whole, and how data science fits into an organization's strategy. I also do research in areas related to AI/ML, although I don't publish in any of the big conferences (and I can tell you all about the horrible problems with that system).

I'm glad to see this thread up, and I'm going to toss in links to the SAL data science thread as well as a collection of papers I've been curating during the pandemic.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


Forgot to mention there's also a scientific programming thread in SAL that may have some overlap with topics discussed here.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


The key is not to use information from the test set in your feature engineering. Do your split first, and then do feature engineering.
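As a rough illustration of "split first, then engineer features", here's a minimal sketch using only the Python standard library. The feature values are made up for the example, and standardization stands in for whatever feature engineering you're actually doing; the point is only that the mean and standard deviation come from the training rows and are then reused on the test rows.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical raw feature values; any numeric column would do.
values = [random.gauss(50.0, 10.0) for _ in range(100)]

# 1. Split first.
random.shuffle(values)
train, test = values[:80], values[80:]

# 2. Fit the transformation on the training rows only.
train_mean = statistics.mean(train)
train_std = statistics.stdev(train)


def standardize(x):
    """Scale a value with statistics learned from the training set only."""
    return (x - train_mean) / train_std


# 3. Apply the same fitted transformation to both splits.
train_scaled = [standardize(x) for x in train]
test_scaled = [standardize(x) for x in test]
```

Computing the mean and standard deviation over all 100 values before splitting would let test-set information into the features, which is exactly the leak to avoid.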

If your data has a hierarchical structure, it's best to split at the highest level of the hierarchy. For example, if you have an ad data set where each record represents an ad but there are multiple ads for a given campaign, you run the risk of leakage if ads from the same campaign end up in both the training and test sets.
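Here's one way the campaign-level split could look, as a sketch in plain Python. The record layout (`ad_id`, `campaign_id`) and the helper name `group_split` are made up for illustration; note that `test_fraction` here is a fraction of campaigns, not of ads.

```python
import random


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that every group lands entirely in train or test."""
    groups = sorted({r[group_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Hold out whole groups, at least one.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test


# Hypothetical ad records: 20 ads spread over 5 campaigns.
ads = [{"ad_id": i, "campaign_id": i % 5} for i in range(20)]
train_ads, test_ads = group_split(ads, "campaign_id")

train_campaigns = {r["campaign_id"] for r in train_ads}
test_campaigns = {r["campaign_id"] for r in test_ads}
```

If you're already using scikit-learn, `GroupShuffleSplit` and `GroupKFold` in `sklearn.model_selection` do the same kind of group-aware splitting.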

Beyond that, the authors of Applied Predictive Modeling wrote a book on feature engineering.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


I've never seen a systematic treatment of dealing with hierarchical data in ML. There are papers on how to convert individual classifiers to deal with it, but not much on general theory.

For clustering, I'd try to create a record for each city with features generated from the data based on some meaningful domain knowledge and try to cluster those. But clustering is a trickier problem than people generally realize, so I'd also spend some time thinking about what I want out of the data.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


If you're not worried about dataset shift (but you should be).

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


I know what you're talking about but I think it's better described as spurious correlations rather than data leakage. Data leakage is something you did wrong, but a spurious correlation is just there in the data set regardless of what you do. There's some very recent research on how to handle that (search for "invariant risk minimization") but it's still a big issue.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


Boris Galerkin posted:

I'm just not really sure how this is any different from Newton's method or like basically any numerical method that minimizes an objective/loss function.

The goal in traditional optimization is to fit the points you have. The goal here is to fit the points you don't have. There are a lot of methods in common but the problems are fundamentally different.
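A toy sketch of the difference, in plain Python with made-up data (noisy samples from a line I picked arbitrarily): a model that memorizes the training points drives the training loss to zero, which is a perfect result if the objective is only the points you have, while a plain least-squares line leaves some training error. The numbers that matter for ML are the ones on the held-out points.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def make_data(n):
    """Noisy samples from the hypothetical line y = 2x + 1."""
    xs = [random.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + random.gauss(0.0, 1.0)) for x in xs]


train, test = make_data(200), make_data(200)


def memorizer(x):
    """Return the y of the nearest training x (1-nearest neighbour)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


# Ordinary least-squares line fitted to the training points.
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x


def line(x):
    """Predict with the fitted least-squares line."""
    return slope * x + intercept


def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


memorizer_train, memorizer_test = mse(memorizer, train), mse(memorizer, test)
line_train, line_test = mse(line, train), mse(line, test)

print("memorizer train/test MSE:", memorizer_train, memorizer_test)
print("line      train/test MSE:", line_train, line_test)
```

The memorizer wins on the training set by construction, so ranking the two models on training loss alone would pick it; the held-out error is what tells you how each one does on points it hasn't seen.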

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


That's entirely plausible. ChatGPT is trying to predict what a human would say in response to the prompt and it probably has a lot of examples of that response in its training data.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


https://twitter.com/stokel/status/1726502623967392060

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


Nektu posted:

I read somewhere that generative AI is apparently used to recognize diseases in very early stages, and I realized that I have no idea at all how AI can be applied to problems, or why it returns results.

Can you link us to what you read?

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


If a generative AI produces images that are similar to a target image, how strongly does that suggest that the target image is AI-generated?

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


Replication crisis in AI/ML when?

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


You might look into Bayesian optimization as an alternative to any RL-based approach.
