overeager overeater
Oct 16, 2011

"The cosmonauts were transfixed with wonderment as the sun set - over the Earth - there lucklessly, untethered Comrade Todd on fire."



Insurrectionist posted:

- What kinds of network designs are favored for generative NNs? Is there a focus on width (nodes per layer) over depth (# of layers), or vice versa? Do they utilise input-reducing layers like CNNs (pooling etc.)?

It depends on the application, but for language models, BAD AT STUFF has it right that most have settled on decoder-only transformers as the architecture of choice (e.g. Mistral, Llama, and Gemma).
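
If it helps to picture what that means, here's a rough PyTorch sketch of a single pre-norm decoder block - purely illustrative on my part (the real models add things like RMSNorm, rotary embeddings, and grouped-query attention), but the causal mask is the part that makes it "decoder-only":

```python
import torch
import torch.nn as nn

# Toy decoder-only transformer block; all dimensions are made up.
class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):  # x: (batch, seq_len, d_model)
        T = x.size(1)
        # causal mask: each token may only attend to itself and earlier tokens
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))
```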

For generative tasks with images, diffusion models are where it's at (though people are still working with GANs). The denoising is typically done with the venerable U-Net, which does have the CNN pooling-layer structure you're probably familiar with.
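
For reference, the U-Net shape boils down to convolutions and pooling on the way down, then upsampling plus skip connections on the way up. A toy sketch of my own (channel counts are illustrative, and real diffusion U-Nets also condition on the timestep):

```python
import torch
import torch.nn as nn

# Minimal U-Net skeleton: one down level, one up level.
class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)                # the CNN pooling layer you remember
        self.mid = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.out = nn.Conv2d(64, 3, 3, padding=1)  # 64 = 32 upsampled + 32 from the skip

    def forward(self, x):                          # x: (batch, 3, H, W), H and W even
        skip = self.down(x)
        h = self.mid(self.pool(skip))
        h = torch.cat([self.up(h), skip], dim=1)   # skip connection across the "U"
        return self.out(h)
```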

Insurrectionist posted:

- What kind of activation functions are modern generative AIs using? Is it still plain old sigmoid and the like? I remember we used ReLU a decent amount, but I can't imagine that would be useful for anything more complex than the data analysis we were working with, since it's hardly a sophisticated function.

The language models linked above generally pick one of the GLU variants. However, there's at least one ICLR 2024 paper which argues that plain ReLU is actually completely fine.
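
As a concrete example, here's roughly what a SwiGLU feed-forward layer (the GLU variant Llama uses) looks like - my own sketch, with made-up dimensions:

```python
import torch.nn as nn
import torch.nn.functional as F

# Illustrative SwiGLU feed-forward block; sizes are placeholders,
# not taken from any particular model's config.
class SwiGLU(nn.Module):
    def __init__(self, d_model=512, d_hidden=1408):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        # silu(gate(x)) acts as a learned, input-dependent gate on up(x)
        return self.down(F.silu(self.gate(x)) * self.up(x))
```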

Insurrectionist posted:

- I remember most of the hardware bottleneck for neural networks was in training the NN, not in running the completed model. How advanced are the various free, locally runnable generative NNs nowadays? What kind of hardware do you need to run them?

The local LLMs are surprisingly good, is what I would say - they generate coherent answers and are pretty nifty for their size - but emphatically, don't expect a full ChatGPT replacement.

Loading 7B language models as-is requires a GPU with 16 GB of VRAM, but with quantization you can get away with 8 GB.
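
If you're using Hugging Face transformers (with bitsandbytes installed), 4-bit loading is about this much code - the model id here is just an example:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Back-of-the-envelope: fp16 is 2 bytes/param (7B -> ~14 GB of weights);
# 4-bit is ~0.5 bytes/param (~3.5 GB), plus activations and overhead on top.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # example model id
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```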

(Comedy option: OnnxStream is set up to use the least memory necessary, letting you run Stable Diffusion XL on a Raspberry Pi Zero 2, albeit at the cost of requiring hours to generate an image)

overeager overeater
Oct 16, 2011

"The cosmonauts were transfixed with wonderment as the sun set - over the Earth - there lucklessly, untethered Comrade Todd on fire."



(e: posted before I saw the reply, whoops)

I would also start by looking at the audio end - beyond the interface itself, Windows has several audio backends (MME, DirectSound, WASAPI), and if you happen to be using PyAudio, it's worth checking which one is actually being used.
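
If it is PyAudio, something like this will show which host APIs it sees and which one is the default:

```python
import pyaudio

p = pyaudio.PyAudio()
# enumerate the host APIs (MME, DirectSound, WASAPI, ...) PortAudio was built with
for i in range(p.get_host_api_count()):
    info = p.get_host_api_info_by_index(i)
    print(f"{info['index']}: {info['name']} ({info['deviceCount']} devices)")
# the default host API is what you get if you never ask for one explicitly
print("default:", p.get_default_host_api_info()["name"])
p.terminate()
```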

Alternatively, to rule out anything happening in the model itself, try setting up the PyTorch profiler and comparing the time spent in inference.
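
A minimal setup looks something like this (stand-in model and input; add ProfilerActivity.CUDA if you're on GPU), then compare the tables from the two machines:

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(512, 512)   # stand-ins for your actual model and input
x = torch.randn(32, 512)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    with torch.no_grad():
        model(x)

# the same table on Windows vs. Linux should show where the extra time goes
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```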

e2: Is the Windows version running natively or under WSL? My other thought was that the Linux version might be taking advantage of Triton or something similar, with Windows falling back to a slower implementation.

overeager overeater fucked around with this message at 03:58 on Mar 19, 2024
