skybolt_1
Oct 21, 2010
Fun Shoe
Posting in here in the hope that some of you might be able to help me out with some issues I've been running into with both BOINC and FAH. I have been flipping back and forth between BOINC and FAH trying to figure out the best way to use my new-to-me GeForce RTX 2060 when I am not using my computer. This seems to be a lot to ask, because what I have found with both FAH and BOINC is that neither platform seems to understand the concept of "run only when I am not using my computer, and don't permanently prevent the machine from entering sleep mode after 4 hours of inactivity." I am unable to get BOINC to honor the first requirement: it runs GPUGRID tasks regardless of whether the machine is in use, even though "Suspend GPU computing when the computer is in use" is selected. I was unable to get FAH to honor the second requirement: the machine would never, EVER enter sleep mode.

Most of the forums for things like GPUgrid are deader than this forum so I figured that I would try here first. Any ideas?


skybolt_1
Oct 21, 2010
Fun Shoe

unpronounceable posted:

Did you set the client to always run on the gpu vs. running based on preferences? That's the only thing I can think of offhand, if you've already set the "in use" option and timeframe.

Nope, it's set to run based on preferences.

I have heard that for a while it wasn't recommended to use a cc_config.xml to control BOINC's behavior, but that might have changed? I don't use one today, just the GUI configuration tools.
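For comparison, the settings made through the BOINC Manager GUI are stored locally in a separate file, global_prefs_override.xml, in the BOINC data directory (not in cc_config.xml). The fragment below is a sketch of the entries that should correspond to the "suspend when the computer is in use" options and the idle threshold. It assumes the standard preference tag names, and the values are examples only:

```xml
<global_preferences>
  <!-- 0 = do not run GPU tasks while the computer is in use -->
  <run_gpu_if_user_active>0</run_gpu_if_user_active>
  <!-- 0 = do not run CPU tasks while the computer is in use -->
  <run_if_user_active>0</run_if_user_active>
  <!-- minutes without keyboard/mouse input before the computer counts as idle (example value) -->
  <idle_time_to_run>3</idle_time_to_run>
</global_preferences>
```

Checking this file is a quick way to confirm what the client actually received from the GUI, separately from any web-based preferences.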

skybolt_1
Oct 21, 2010
Fun Shoe
I added the suspend_debug flag to my cc_config.xml and got a bit more data. Of particular interest, this is what popped up in my log:

code:
28-Nov-2022 11:07:50 [---] Windows is suspending operations
28-Nov-2022 11:07:50 [---] Suspending computation - requested by operating system
28-Nov-2022 11:07:50 [---] [suspend] net_susp: yes; file_xfer_susp: no; reason: requested by operating system
28-Nov-2022 11:07:50 [---] Suspending network activity - requested by operating system
28-Nov-2022 11:07:51 [---] [suspend] net_susp: yes; file_xfer_susp: no; reason: requested by operating system
28-Nov-2022 11:07:51 [---] [suspend] net_susp: yes; file_xfer_susp: no; reason: requested by operating system
28-Nov-2022 11:07:52 [---] [suspend] net_susp: yes; file_xfer_susp: no; reason: requested by operating system
28-Nov-2022 11:33:55 [---] Resuming after OS suspension
28-Nov-2022 11:33:55 [---] Resuming computation
28-Nov-2022 11:33:55 [---] [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
28-Nov-2022 11:33:55 [---] Resuming network activity
28-Nov-2022 11:33:59 [GPUGRID] Sending scheduler request: Requested by project.
28-Nov-2022 11:33:59 [GPUGRID] Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: job cache full)
28-Nov-2022 11:33:59 [---] Windows is resuming operations
28-Nov-2022 11:33:59 [---] Suspending computation - computer is in use
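For reference, a minimal cc_config.xml that turns on the suspend_debug log flag looks roughly like the fragment below. This is a sketch that assumes the file lives in the default BOINC data directory. Any options already in the file would sit alongside it, inside the same `<cc_config>` element.

```xml
<cc_config>
  <log_flags>
    <!-- log extra detail about why computation and network activity are suspended or resumed -->
    <suspend_debug>1</suspend_debug>
  </log_flags>
</cc_config>
```

After saving, "Read config files" in the BOINC Manager's Options menu (or restarting the client) should pick up the change.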
What is interesting is that in the BOINC Manager, GPUGRID remains shown as running even after I suspend it through the manager, but its progress does not increment, i.e. it does not appear to be completing any work and the elapsed time doesn't increase as would be expected. Also, the usual Python processes are still running in Task Manager, consuming CPU and GPU. Once I kill the primary Python process, GPUGRID shows as "suspended" as expected.

Sounds like this is a GPUGRID-specific problem, so I should probably go over to those forums and see what their thoughts are...

skybolt_1
Oct 21, 2010
Fun Shoe

mdxi posted:

Also, be sure to check your global compute prefs and the project-specific preferences, both on the GPUGrid site. I think that compute prefs pulled as part of an update (i.e. runtime) override prefs set through a manager app (which happen at startup).

Thank you! I think that this was the root of my problem, in a way. Under the preferences in GPUGrid, I discovered the following:

code:
Run only the selected applications:
ACEMD 3: yes
ACEMD 4: yes
Quantum Chemistry (CPU): yes
Quantum Chemistry (CPU, beta): yes
Python Runtime (CPU, beta): yes
Python Runtime (GPU, beta): yes
I set all of the "beta" options to "no". Will see if that makes a difference.

skybolt_1
Oct 21, 2010
Fun Shoe

mdxi posted:

I have retired from WCG. Been crunching for them since 6 Dec 2017. Final totals: 420y 302d 17:17:44 CPU time over 1,532,109 WUs.

I'm winding down all other BOINC projects as well (just finishing in-flight WUs). I had DENIS and Einstein as my "backup" projects, but truth be told I've done more work for them than for WCG since The Troubles began.

No plans to stop contributing to FAH.

Why is this, if you don't mind me asking?


skybolt_1
Oct 21, 2010
Fun Shoe

mdxi posted:

I really loved WCG's blend of biomedical and climate projects. But since IBM unceremoniously ditched the project ~2 years ago, it hasn't been stable for more than a month at a time, and the only half-way active project has been MCM. ARP has not returned to finish up its first run. SCC has not been able to get new WUs out the door. HSTB has departed to find a new home. OPNG has, according to the forums, sometimes been available, but only on GPU -- and I had long since switched my GPUs to doing work for FAH. Even when things are "stable", WUs run dry every weekend as soon as there's no staff around to kick the various processes which distribute and ingest them.

Most problematic for me is that the communication went from being rapid and fantastic (under IBM!) to sluggish and evasive (under a university research group!). No one really knows what's causing these problems, and the "answers" which are given -- when they are given -- are frequently at odds with observable behavior of the system. Everything just feels sad and bad. I blame IBM for this state of affairs, but that hardly matters and does not change the fact that the project no longer appears to be competently administered. The outcome is unfortunate, regardless of who caused it.

As for the other projects, it has nothing to do with the quality of what they're doing, or how things are being run. It has to do with the fact that I built up my volunteer computing infrastructure (hardware and software) largely to support WCG; it is what I felt was worth devoting the time (in software tooling) and cost (in hardware, power, and cooling) to. Over the past two years I slowly spun down from six dedicated machines, to three, to one, to none. DENIS is a fantastic project, but by its nature it is bursty, and its WUs are very quick to crunch; it didn't need all that hardware. Einstein is an interesting project, but the LIGO/VIRGO teams have real funding from elsewhere, and while I love astrophysics/cosmology it doesn't feel as impactful as biomedical research does. It's basically the same reason that I left Primegrid -- the project which got me into volunteer grid computing -- years and years ago. So it's easier to just turn off BOINC.

FAH is a well-behaved and self-contained piece of software, and now it just runs on my desktop machine. It has it 95%+ to itself, as I work almost exclusively on laptops.

As for the machines, one effectively became an upgrade for my desktop machine. One turned into an upgrade of my wife's gaming PC. I sent the mobo + CPU from one to my brother to upgrade one of his systems. The other three are powered off and collecting dust in the garage. I'm torn on whether to try to find good homes for them, or just donate them to Goodwill and let people be confused about why someone would give away a 16-core machine, and why that CPU would be paired with a GPU that's unsuitable for anything other than displaying a desktop (because I only ever had two "good" crunching GPUs; the other machines just had old spares to let me do diagnostics and upgrades).

Makes sense. The bulk of my computing is being done on an ESXi box with an Ubuntu VM running the BOINC package on 40 vCPUs. I have done a little GPU computing; the VM has a Quadro P400 passed through to it. Your comment on astrophysics/cosmology makes sense; I am of the same mindset.

Not sure whether it makes sense to move my current setup to FAH. My understanding is that FAH really doesn't benefit much from CPU processing (which I have in spades) but more so from GPU.
