peepsalot
Apr 24, 2007

        PEEP THIS...
           BITCH!

Question about atomics. I guess I don't really understand if/when they are necessary.

If I have a simple type like uint32_t being read/written by multiple threads, wouldn't a read/write already be atomic in terms of the assembly instructions? (assuming x86_64). Is there any possible way that a read could pick up some partially written value?

Is it only a concern when not assuming architecture, say some 8-bit microcontroller or something that maybe can't write the whole 32bits at once?

Bruegels Fuckbooks
Sep 14, 2004

Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

peepsalot posted:

Question about atomics. I guess I don't really understand if/when they are necessary.

If I have a simple type like uint32_t being read/written by multiple threads, wouldn't a read/write already be atomic in terms of the assembly instructions? (assuming x86_64). Is there any possible way that a read could pick up some partially written value?

Is it only a concern when not assuming architecture, say some 8-bit microcontroller or something that maybe can't write the whole 32bits at once?

inc tells the cpu to do multiple operations - read the value, then set the value to that plus one. if two threads are updating, they can both read at the same time (getting the same value) and then both set it to the value incremented by one - this causes the final result to just be value+1 instead of value+2. in asm you can make this atomic by specifying a lock prefix when issuing the instruction (and this lock prefix is still present in x86-64 asm.) you generally can't assume variables are atomic unless it's explicitly stated somewhere.
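A minimal sketch of the lost-update race described above, with the fix on the C++ side rather than raw asm (the function and thread counts here are just for illustration): `counter++` on a plain uint32_t is a separate load and store, so two threads can both read N and both write N+1, while `fetch_add` compiles to a `lock`-prefixed instruction on x86 and never loses an update.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// fetch_add is an atomic read-modify-write; a plain `counter++` here
// could lose increments when two threads interleave.
std::atomic<uint32_t> counter{0};

void bump(int n) {
    for (int i = 0; i < n; ++i)
        counter.fetch_add(1, std::memory_order_relaxed);
}

uint32_t run_two_threads(int per_thread) {
    std::thread a(bump, per_thread), b(bump, per_thread);
    a.join();
    b.join();
    return counter.load();
}
```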

Bruegels Fuckbooks fucked around with this message at 19:50 on Oct 4, 2019

feedmegin
Jul 30, 2008

peepsalot posted:

Question about atomics. I guess I don't really understand if/when they are necessary.

If I have a simple type like uint32_t being read/written by multiple threads, wouldn't a read/write already be atomic in terms of the assembly instructions? (assuming x86_64). Is there any possible way that a read could pick up some partially written value?

Is it only a concern when not assuming architecture, say some 8-bit microcontroller or something that maybe can't write the whole 32bits at once?

As well as the above, consider a CPU with a 32 bit path to memory that's doing an unaligned 32 bit read/write that will therefore straddle two 32 bit words.

pseudorandom name
May 6, 2007

The other thing you have to worry about is what sort of memory operations can be reordered around your atomic loads or stores -- e.g. if you're implementing a lock, you don't want any accesses to be moved before lock acquisition or after lock release.
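As a sketch of that ordering requirement, here's a minimal spinlock (names are mine, not from the thread): acquire semantics on lock and release semantics on unlock keep the protected accesses from being hoisted above the acquisition or sunk below the release.

```cpp
#include <atomic>

// Minimal spinlock: test_and_set(acquire) on lock, clear(release) on unlock.
// The acquire/release pair is what stops the CPU and compiler from moving
// critical-section accesses outside the lock.
class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (flag.test_and_set(std::memory_order_acquire)) {
            // spin until the holder clears the flag
        }
    }
    void unlock() { flag.clear(std::memory_order_release); }
};
```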

peepsalot
Apr 24, 2007

OK, well a little more info about the code in question. I'm using a vector to cache some calculation. I initialize the vector to all 0xFFFFFFFF, which I know cannot be a valid result from the calculation.

I'm not doing any increment operations, only assignment.

AIUI on x86 everything is at least 16-byte aligned (unless I had some weird struct layout, union, or something?), so I don't think the alignment aspect is relevant either?

code:
class ClassesInfo {
    static constexpr uint32_t NO_INDEX = ~uint32_t(0);

    const uint32_t classmod;
    mutable vector<uint32_t> offsets;
    ...    

    ClassesInfo(uint32_t cm) : classmod(cm), offsets(classmod, NO_INDEX) {
        ...
    }

    // called by many threads, each of which has a const reference to a single ClassesInfo object 
    uint32_t get_offset(uint64_t p) const {
        uint32_t r = p % classmod;
        if (offsets[r] == NO_INDEX) {
            uint32_t index = /* somewhat expensive calculations */;
            offsets[r] = index;
            return index;
        } else {
            return offsets[r];
        }
    }
};
BTW, the code calls things "Classes" in the sense of modular arithmetic, not OOP.

So in this specific case would not using atomics potentially cause any problems?

Also, I know it's possible that, for example, two threads end up doing the same calculation by chance, but I don't particularly care since they will arrive at the same result. I'm trying to avoid as much overhead as possible, so I'm not using mutexes/locks etc. here.

peepsalot fucked around with this message at 21:13 on Oct 4, 2019

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today
Not using atomics (or other protection) is UB. If the target environment is such that no special measures are necessary for the operations you perform, then that's what the operations will compile down to. Don't try to second-guess language semantics; the compiler will bite you.

Jeffrey of YOSPOS
Dec 22, 2005

GET LOSE, YOU CAN'T COMPARE WITH MY POWERS

pseudorandom name posted:

The other thing you have to worry about is what sort of memory operations can be reordered around your atomic loads or stores -- e.g. if you're implementing a lock, you don't want any accesses to be moved before lock acquisition or after lock release.
This is multi-level also - you have to make sure the compiler doesn't reorder accesses around your lock, and also that your cpu doesn't reorder accesses around your lock.

This:

Ralith posted:

If the target environment is such that no special measures are necessary for the operations you perform, then that's what the operations will compile down to.
is true but it's also a trust but verify situation - you can actually just go look at what the compiler generates there if it's important enough and you want to be sure.

feedmegin
Jul 30, 2008

peepsalot posted:

AIUI on x86 everything is at least 16byte aligned

No? Though j. random variable on the stack is probably naturally aligned, of course; unaligned accesses tend to show up more in e.g. wire-protocol stuff on networks, tbf.

Also 'this happens to work without atomics for me on this specific ISA, with this specific sized variable, allocated in this specific way' is uh not a good reason to be like 'why bother with atomics at all', kinda thing.

feedmegin fucked around with this message at 22:13 on Oct 4, 2019

peepsalot
Apr 24, 2007

OK, rather than "everything", what I meant was stuff on the heap, i.e. a base address returned by malloc.

Ralith
Jan 12, 2011

peepsalot posted:

OK, rather than "everything", what I meant was stuff on the heap, i.e. a base address returned by malloc.
x86(_64) is not a specific heap allocator

Volguus
Mar 3, 2009

peepsalot posted:

Also, I know its possible that for example two threads end up doing the same calculation by chance, but I don't particularly care since they will arrive at the same result. I'm trying to avoid as much overhead as possible, so not using mutex/locks etc here.

Atomics really don't have much overhead, especially when compared to mutexes. If there's code that will modify primitives from multiple threads, atomics are the least you can do. Don't sweat it, just go with them, unless you know for a fact that getting rid of atomics will significantly improve the speed of your program and the gain is worth it for you (and it works on your platform and the stars align so nothing explodes in your face).

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
malloc is required to return memory that is sufficiently aligned for all fundamental types, but it’s never been clear whether to count e.g. vector types as fundamental; among other things, nobody wants to align malloc to 64 bytes just because the system supports AVX512. Linux and Windows align malloc to 8 or 16 depending on whether the target is 64-bit, which is fairly common; Darwin is something of an exception because it always aligns malloc to 16.

Regardless, the addresses of individual fields or array elements on the heap may be less aligned than the allocation they’re a part of.
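A small sketch of that last point (struct and function names are mine): even when an allocation starts on a 16-byte boundary, a field inside it can be far less aligned — here `value` sits at offset 4 and is only 4-byte aligned.

```cpp
#include <cstdint>

// Even if a Header object is 16-byte aligned, `value` lives at offset 4
// (after 3 bytes of padding), so its address is 4-byte aligned at best.
struct Header {
    char tag;       // offset 0
    uint32_t value; // offset 4
};

bool value_is_16_byte_aligned(const Header* h) {
    return reinterpret_cast<uintptr_t>(&h->value) % 16 == 0;
}
```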

Foxfire_
Nov 8, 2010

gcc appears to believe that a store to an aligned int on x64 is inherently atomic. It's also smart enough to believe this about a .store() on a std::atomic&lt;int&gt;, so while it is probably safe to use a bare uint32_t in your code, you get no performance improvement from doing so.

code:
#include <atomic>
#include <cstdint>

extern std::atomic<uint32_t> global;

void Foo(uint32_t value)
{
    global.store(value, std::memory_order_relaxed);
}
compiles into:

code:
Foo(unsigned int):
        mov     DWORD PTR global[rip], edi
        ret
(and somewhere later after all your work is done, you need a memory fence)

peepsalot
Apr 24, 2007

OK, I'll do it the proper way and use atomic operations. One more question though, I'm using OpenMP for the parallel aspect of the code. Does anyone know if there's practically any difference between using "#pragma omp atomic" vs "std::atomic" ?

I guess the OpenMP version seems a lot more convenient to implement, I haven't fully grokked the std library way of doing it.
I found this comparison of different variations and there's like 10 ways to use std::atomic? https://www.arangodb.com/2015/02/comparing-atomic-mutex-rwlocks/
I'm compiling on clang-9 if it matters.

baby puzzle
Jun 3, 2011

I'll Sequence your Storm.
I think I've brought this up many times already but I still have no solution. How do I debug crashes that people are getting on their machines? All I have is a log file with a callstack that is all just addresses of functions. I've tried shipping the pdb file, but for some reason that doesn't work on users' machines, and I still just get function addresses in the callstack and no function names. I've tried generating crash "minidump" files, but none that I have ever received has ever been useful or even valid.

There has to be a way to simply take the address of a function and just figure out which function that is supposed to be, right??

feedmegin
Jul 30, 2008

baby puzzle posted:

I think I've brought this up many times already but I still have no solution. How do I debug crashes that people are getting on their machines? All I have is a log file with a callstack that is all just addresses of functions. I've tried shipping the pdb file, but for some reason that doesn't work on users' machines, and I still just get function addresses in the callstack and no function names. I've tried generating crash "minidump" files, but none that I have ever received has ever been useful or even valid.

There has to be a way to simply take the address of a function and just figure out which function that is supposed to be, right??

ASLR might gently caress you there...

Beef
Jul 26, 2004
The #pragma omp atomic is not just an atomic move, it also puts in the correct fences so that the other threads don't work with stale data.
In your case, you might not even need the atomic.
- Simple stores are indeed atomic on x86. I can double check, but I remember that split cache line writes are handled correctly in the backend; you do pay for it in performance terms, though. Definitely do an aligned malloc.
- Worst case, it will recalculate the index if a thread is working on a stale read value. If this worries you, definitely use std::atomic with memory_order_release, or just the #pragma omp atomic. These atomics are not just an atomic update, they also insert the correct memory fences to make sure that no thread works on stale values.


peepsalot posted:


code:
class ClassesInfo {

    mutable vector<uint32_t> offsets;
    ...    


    // called by many threads, each of which has a const reference to a single ClassesInfo object 
    uint32_t get_offset(uint64_t p) const {
        uint32_t r = p % classmod;
        if (offsets[r] == NO_INDEX) {
            uint32_t index = /* somewhat expensive calculations */;
            offsets[r] = index;
            return index;
        } else {
            return offsets[r];
        }
    }
};


Consider changing how and where the offsets are stored and updated here. The abuse of a mutable member to update values of a const instance is a real coding horror, especially in the context of multithreading, where people read const as 'safe, read-only'. Can you initialize all offsets once, in a separate #pragma omp parallel for loop? I guarantee it will be faster (avoids cache thrashing etc.), unless your offsets[] is super sparse in practice.
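For comparison, here's one hypothetical rework of the cache using std::atomic, under the assumption (stated upthread) that racing threads always compute the same value for a slot — relaxed ordering then suffices, since the worst case is redundant work, never a torn read. The `p / classmod` expression stands in for the poster's unspecified "expensive calculation".

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Relaxed atomics make the lazy cache well-defined: every load/store of a
// slot is atomic, and any thread that races on an uninitialized slot just
// recomputes the same value.
class ClassesInfo {
    static constexpr uint32_t NO_INDEX = ~uint32_t(0);
    const uint32_t classmod;
    mutable std::vector<std::atomic<uint32_t>> offsets;

public:
    explicit ClassesInfo(uint32_t cm) : classmod(cm), offsets(cm) {
        for (auto& o : offsets)
            o.store(NO_INDEX, std::memory_order_relaxed);
    }

    uint32_t get_offset(uint64_t p) const {
        uint32_t r = p % classmod;
        uint32_t cached = offsets[r].load(std::memory_order_relaxed);
        if (cached != NO_INDEX)
            return cached;
        // stand-in for the expensive calculation in the original post
        uint32_t index = static_cast<uint32_t>(p / classmod);
        offsets[r].store(index, std::memory_order_relaxed);
        return index;
    }
};
```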

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Dren posted:

i know about address sanitizer and just found out about undefined behavior sanitizer, also about sanitize stack protector and stack canaries. Anyone got any other things for finding a corruption in a single threaded program? (it’s not actually single threaded but there’s only two threads and i trust the i/o library much more than the other code)

Looks like you got it, but in the future you might be interested in https://rr-project.org/ too.

Star War Sex Parrot
Oct 2, 2003

Beef posted:

I remember that split cache line writes are handled correctly in the backend
It depends on how you got the pointer. You can cast it to a std::atomic all you want but if it straddles a cache line you’re hosed. We had a nasty bug to track down about a year ago for that exact reason: torn read of an atomic pointer that in pretty rare scenarios would straddle a cache line. Basically we had buffers coming out of an allocator that in turn were divvied up and cast to other types, and someone goofed and didn’t think about alignment requirements for the atomic types.

Your later comment about using aligned alloc is probably right though. :)
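One defense against that straddle bug, sketched under the common assumption of a 64-byte x86 cache line (C++17's std::hardware_destructive_interference_size names it portably, though library support varies): force the hot atomic onto its own line boundary so it can never span two lines.

```cpp
#include <atomic>
#include <cstdint>

// alignas(64) pins the struct (and thus its atomic member at offset 0) to a
// cache-line boundary, so the 8-byte atomic can never straddle two lines.
struct alignas(64) Slot {
    std::atomic<uint64_t> value{0};
};

static_assert(alignof(Slot) == 64, "Slot always starts on a cache-line boundary");
```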

Star War Sex Parrot fucked around with this message at 12:56 on Oct 8, 2019

Beef
Jul 26, 2004
Nasty. Was that for x86?

Subjunctive
Sep 12, 2006

Star War Sex Parrot posted:

It depends on how you got the pointer. You can cast it to a std::atomic all you want but if it straddles a cache line you’re hosed. We had a nasty bug to track down about a year ago for that exact reason: torn read of an atomic pointer that in pretty rare scenarios would straddle a cache line. Basically we had buffers coming out of an allocator that in turn were divvied up and cast to other types, and someone goofed and didn’t think about alignment requirements for the atomic types.

What a nightmare.

Star War Sex Parrot
Oct 2, 2003

Beef posted:

Nasty. Was that for x86?
Yep.

Subjunctive posted:

What a nightmare.
I actually found it pretty fun to debug, but yeah it was tricky. We had a hypothesis for the root cause pretty quickly, and then we wrote a contrived Google Test that emulated the corner case that we thought we were hitting and sure enough it reproduced the torn access pretty quickly.

Star War Sex Parrot fucked around with this message at 15:52 on Oct 8, 2019

Foxfire_
Nov 8, 2010

I don't think omp atomic inserts any barriers; gcc doesn't compile it that way.

If you want fences, you need to tack seq_cst onto the end of the pragma (the reverse of std::atomic, where fencing is the default)
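The two flavors side by side, as a sketch (assumes OpenMP 5.0+ for the seq_cst clause and a compiler invoked with something like -fopenmp; without that flag the pragmas are ignored and these become plain stores):

```cpp
#include <cstdint>

uint32_t slot = 0;

// Atomic store only -- no memory fence is implied.
void store_relaxed(uint32_t v) {
    #pragma omp atomic write
    slot = v;
}

// Atomic store plus sequentially consistent ordering (OpenMP 5.0 clause).
void store_fenced(uint32_t v) {
    #pragma omp atomic write seq_cst
    slot = v;
}
```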

Dren
Jan 5, 2001

Pillbug

Subjunctive posted:

Looks like you got it, but in the future you might be interested in https://rr-project.org/ too.

Thanks that looks amazing. I’ve run across a paid tool that does the same thing, didn’t know there is an open source one. My primary dev environment is in virtualbox but we could switch to vmware for something like this.

peepsalot
Apr 24, 2007

Beef posted:

The abuse of a mutable member to update values of a const instance is real a coding horror...
Isn't that the whole point of mutable members, though? What else would they possibly be for? Are you saying that *any* use of mutable is "abuse" and a coding horror, in your opinion? I mean, I was just using it for memoization of values that are essentially constant once initialized.

Anyways, I briefly implemented it with omp atomic, and it worked fine with no noticeable performance impact.
I also tried doing the calculations all beforehand, in parallel, like you said, and it wasn't noticeably faster. Maybe marginally slower, but the measurement differences were in the noise.

After more profiling, the whole "expensive calculation" I was trying to memoize ended up being negligible enough that I just reverted to calculating it on the fly, so I ditched the whole vector cache deal entirely; no more mutable. But it's still been an interesting topic to learn more about.

Ralith
Jan 12, 2011

C++ has many features it probably shouldn't, or that are almost never justified in practice. The mutable keyword on data members is one of them.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
Without mutable you would not be able to have a const member function that locks a mutex. Obviously it could easily be misused for awful things, but I'm not sure that I have ever actually seen that happen. Updating a cached value is only an awful thing if you have an incorrect mental model of what const means in C++ (notably, it does not mean immutable or pure).
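That mutex case made concrete (class and member names are mine): the mutex must be mutable so the const accessor can still lock it — const here means "logically read-only", not "bitwise immutable".

```cpp
#include <mutex>

// The canonical legitimate use of `mutable`: a const accessor that locks.
// Without `mutable` on m, the lock_guard in get() would not compile.
class Counter {
    mutable std::mutex m;
    int value = 0;
public:
    void bump() {
        std::lock_guard<std::mutex> g(m);
        ++value;
    }
    int get() const {
        std::lock_guard<std::mutex> g(m); // ok only because m is mutable
        return value;
    }
};
```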

Ralith
Jan 12, 2011

Plorkyeran posted:

Without mutable you would not be able to have a const member function that locks a mutex. Obviously it could easily be misused for awful things, but I'm not sure that I ever actually have seen that happen.

he says, immediately following a series of posts discussing its UBful use on an unguarded std::vector.

Plorkyeran
Mar 22, 2007

It's not like the code would have been fine and dandy if the function hadn't been marked const. The use of mutable isn't the problem with that code.

Beef
Jul 26, 2004

Plorkyeran posted:

... incorrect mental model of what const means in C++ (notably, it does not mean immutable or pure).

No, but the point is that it is very often used as a proxy, since C/C++ lack proper immutable semantics.


Tangentially, the fact that const can be cast away at any point in C/C++ is one of those things you pay for, even if you are not using it. It frustrated Walter Bright enough to add an immutable modifier to D.
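A tiny sketch of that cast-away point (names are mine): const_cast is legal whenever the referent isn't actually a const object, so a const reference is no promise that the value won't change underneath the caller. Writing through such a cast to a truly const object would be UB.

```cpp
// `backing` is a non-const object, so stripping const from a reference to it
// and writing through is well-defined -- which is exactly why the compiler
// can't treat a `const int&` parameter as immutable.
int backing = 1;

int poke(const int& view) {
    const_cast<int&>(view) = 2; // legal only because `backing` is non-const
    return view;
}
```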

edit: :actually:
http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#con-constants-and-immutability

quote:

You can’t have a race condition on a constant.

Beef fucked around with this message at 16:59 on Oct 10, 2019

baby puzzle
Jun 3, 2011

So now that I know how not to do it, what's the best way to get crash reports from users? I'm sure there's a library or something that just works? I'm on Windows.

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

Beef posted:

No, but the point is that it is very often used as a proxy, with C/C++ lacking those proper immutable semantics.
constexpr!

Subjunctive
Sep 12, 2006

baby puzzle posted:

So now that I know how not to do it, What’s the best way to get crash reports from users? I’m sure there’s a library or something that just works? I’m on Windows.

I’ve had good success with breakpad.

Zopotantor
Feb 24, 2013

...und ist er drin dann lassen wir ihn niemals wieder raus...

That has its own problems. Which is why we'll get things like "constinit" and "consteval" in C++20.
https://www.youtube.com/watch?v=Xb6u8BrfHjw&t=2210s

Jeffrey of YOSPOS
Dec 22, 2005

I like how consteval does the thing you'd imagine constexpr doing when put on a function.

Xarn
Jun 26, 2015
C++11 constexpr was an intentionally crippled prototype and we've been fixing it ever since...

It still sucks.

Zopotantor
Feb 24, 2013

Xarn posted:

C++11 constexpr was an intentionally crippled prototype and we've been fixing it ever since...

It still sucks.

Basically all of C++ is like that.
- exception specifications
- auto_ptr
- volatile
- Unicode literals
- iterators (they were a really good idea, but iterator+sentinel is a better one)
- ...

taqueso
Mar 8, 2004


:911:
:wookie: :thermidor: :wookie:
:dehumanize:

:pirate::hf::tinfoil:

Someone needs to bite the bullet and make C+++ that drops backwards compatibility for a bunch of things. It will never get used though

peepsalot
Apr 24, 2007

I got a cmake / toolchain question. I have a project that uses cmake and I want to have it build with all clang/llvm tools.

I found that I have to initialize cmake with:
CXX="clang++" LDFLAGS="-fuse-ld=lld" cmake .

I also found these instructions to verify the correct linker is used:

https://lld.llvm.org/#using-lld posted:

If you are in doubt whether you are successfully using LLD or not, run readelf --string-dump .comment <output-file> and examine the output.

And when I run that on my binary, I see lld and clang being used, but also GCC?

quote:

String dump of section '.comment':
[ 0] GCC: (Ubuntu 8.3.0-6ubuntu1~18.04.1) 8.3.0
[ 2b] Linker: LLD 9.0.0
[ 3d] clang version 9.0.0-svn374193-1~exp1~20191009183852.57 (branches/release_90)
I don't understand where that's still coming from. I do have some *dynamically* linked libraries, which I believe were built with GCC, but would dynamic libraries cause GCC to leave a comment in the binary?

Or is there possibly some other part of the toolchain I'm not setting correctly to make it 100% clang/llvm?

Ralith
Jan 12, 2011

peepsalot posted:

I got a cmake / toolchain question. I have a project that uses cmake and I want to have it build with all clang/llvm tools.

I found that I have to initialize cmake with:
CXX="clang++" LDFLAGS="-fuse-ld=lld" cmake .

I also found these instructions to verify the correct linker is used:


And when I run that on my binary, I see lld and clang being used, but also GCC?

I don't understand where that's still coming from. I do have some *dynamically* linked libraries, which I believe were built with GCC, but would dynamic libraries cause GCC to leave a comment in the binary?

Or is there possibly some other part of the toolchain I'm not setting correctly to make it 100% clang/llvm?

At a guess, cmake is probably invoking gcc to drive the linker by default, and you haven't overridden that; you're just telling gcc to in turn delegate to lld. That should work just fine, but you could configure cmake to use clang for that too if you prefer.
