|
Insurrectionist posted:
    What kinds of network designs are favored for generative NNs? Is there a focus on width (nodes per layer) over depth (# of layers) or vice versa? Do they utilise input-reducing layers like CNNs (pooling etc)?

It depends on the application, but for language models, BAD AT STUFF has it right that most have settled on decoder-only transformers as the architecture of choice (e.g. Mistral, Llama, and Gemma). For generative tasks with images, diffusion models are where it's at (though people are still working with GANs). The diffusion denoising is typically done with the venerable U-Net, which does have the CNN pooling-layer structure you're probably familiar with.

Insurrectionist posted:
    What kind of activation functions are modern generative AIs using? Is it still plain old Sigmoid and the like? I remember we used ReLU a decent amount, but I can't imagine that would be useful for anything more complex than the data analysis we were working with, since it is hardly a sophisticated function.

The language models linked above generally pick one of the GLU variants. However, there's at least one ICLR 2024 paper which argues that plain ReLU is actually completely fine.

Insurrectionist posted:
    I remember most of the hardware bottleneck for neural networks was for training the NN, not running the completed model. How advanced are the various free, locally runnable generative NNs nowadays? What kind of hardware do you need to run them?

The local LLMs are surprisingly good, is what I would say - they generate coherent answers and are pretty nifty for their size, but emphatically: don't expect a full ChatGPT replacement. Loading 7B language models as-is requires a GPU with 16 GB of VRAM, but with quantization you can get away with 8 GB. (Comedy option: OnnxStream is set up to use the least memory necessary, letting you run Stable Diffusion XL on a Raspberry Pi Zero 2, albeit at the cost of requiring hours to generate an image.)
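The memory math behind that last bit: 7B parameters at fp16 is ~14 GB of weights, which is why you need the 16 GB card, while 8-bit quantization roughly halves that. Here's a minimal sketch (plain Python, hypothetical helper names, not any particular library's API) of the core idea behind symmetric int8 weight quantization - store rounded 8-bit integers plus a per-tensor scale:

```python
def quantize_int8(weights):
    """Map floats to int8 values plus a scale factor (symmetric quantization)."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid div-by-zero on all-zero tensors
    q = [round(w / scale) for w in weights]            # each q fits in [-127, 127]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats; error is at most half a quantization step."""
    return [x * scale for x in q]

weights = [0.8, -1.27, 0.02, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# every restored value is within one quantization step of the original
assert all(abs(a - b) <= scale for a, b in zip(weights, restored))
```

Real schemes (GGUF's 4-bit variants, bitsandbytes, etc.) quantize per-block rather than per-tensor and use fancier rounding, but the storage-vs-precision trade is the same.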
|
# ¿ Mar 14, 2024 17:21 |
|
(e: posted before I saw the reply, whoops)

I would also start looking at the audio end first - in addition to the interface itself, there are different audio backends in Windows (MME, DirectSound, WASAPI); if you happen to be using PyAudio, it's worth checking which one is actually being used. Alternatively, to rule out anything happening in the model itself, try setting up the PyTorch profiler and compare the time spent in inference.

e2: Is the Windows version running natively or under WSL? My other thought was the Linux version possibly taking advantage of Triton or similar and falling back to a slower implementation on Windows.

overeager overeater fucked around with this message at 03:58 on Mar 19, 2024
# ¿ Mar 19, 2024 03:52 |