|
Question about atomics. I guess I don't really understand if/when they are necessary. If I have a simple type like uint32_t being read/written by multiple threads, wouldn't a read/write already be atomic in terms of the assembly instructions? (assuming x86_64). Is there any possible way that a read could pick up some partially written value? Is it only a concern when not assuming architecture, say some 8-bit microcontroller or something that maybe can't write the whole 32 bits at once?
|
# ? Oct 4, 2019 19:04 |
|
|
|
peepsalot posted:Question about atomics. I guess I don't really understand if/when they are necessary. inc tells the cpu to do multiple operations - read the value, then set it to that plus one. if two threads are updating, they can both read at the same time (getting the same value) and then each set the value incremented by one - this causes the final result to be value+1 instead of value+2. in asm you can make this atomic by specifying a lock prefix when issuing the instruction (and this lock prefix is still present in x86-64 asm.) you generally can't assume variables are atomic unless it's explicitly stated somewhere. Bruegels Fuckbooks fucked around with this message at 19:50 on Oct 4, 2019 |
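To make the lost-update problem concrete, here's a minimal sketch (function names are made up for illustration): the plain `uint32_t` version compiles `++counter` into separate load/add/store steps, so concurrent increments can overwrite each other (and it's a data race, i.e. UB), while the `std::atomic` version uses a locked read-modify-write so no increments are lost.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Data race on purpose: increments from different threads can interleave
// between the load and the store, so updates get lost (and it's UB).
uint32_t racy_count(int threads, int iters) {
    uint32_t counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] { for (int i = 0; i < iters; ++i) ++counter; });
    for (auto& th : pool) th.join();
    return counter;  // typically less than threads * iters
}

// fetch_add is a single atomic read-modify-write (lock xadd on x86-64),
// so every increment is counted.
uint32_t atomic_count(int threads, int iters) {
    std::atomic<uint32_t> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Only the atomic version has a deterministic result; the racy one is anybody's guess.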
# ? Oct 4, 2019 19:47 |
|
peepsalot posted:Question about atomics. I guess I don't really understand if/when they are necessary. As well as the above, consider a CPU with a 32-bit path to memory that's doing an unaligned 32-bit read/write that will therefore straddle two 32-bit words.
|
# ? Oct 4, 2019 20:17 |
|
The other thing you have to worry about is what sort of memory operations can be reordered around your atomic loads or stores -- e.g. if you're implementing a lock, you don't want any accesses to be moved before lock acquisition or after lock release.
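A sketch of what that means in practice, assuming a hand-rolled spinlock (illustrative, not production code): the acquire ordering on lock() stops later accesses from being hoisted above it, and the release ordering on unlock() stops earlier accesses from sinking below it, so the critical section stays between them.

```cpp
#include <atomic>

// Minimal spinlock sketch. The memory orderings are the point here:
// acquire on lock(), release on unlock() keep the guarded accesses
// from being reordered out of the critical section.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // spin; a real implementation would back off or yield
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
};
```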
|
# ? Oct 4, 2019 20:17 |
|
OK, well a little more info about the code in question. I'm using a vector to cache some calculation. I initialize the vector to all 0xFFFFFFFF, which I know cannot be a valid result from the calculation. I'm not doing any increment operations, only assignment. AIUI on x86 everything is at least 16-byte aligned (unless I had some weird layout struct union or something?), so I don't think the alignment aspect is relevant either? code:
So in this specific case, would not using atomics potentially cause any problems? Also, I know it's possible that, for example, two threads end up doing the same calculation by chance, but I don't particularly care since they will arrive at the same result. I'm trying to avoid as much overhead as possible, so not using mutex/locks etc. here. peepsalot fucked around with this message at 21:13 on Oct 4, 2019 |
# ? Oct 4, 2019 21:05 |
|
Not using atomics (or other protection) is UB. If the target environment is such that no special measures are necessary for the operations you perform, then that's what the operations will compile down to. Don't try to second-guess language semantics; the compiler will bite you.
|
# ? Oct 4, 2019 21:23 |
|
pseudorandom name posted:The other thing you have to worry about is what sort of memory operations can be reordered around your atomic loads or stores -- e.g. if you're implementing a lock, you don't want any accesses to be moved before lock acquisition or after lock release. This: Ralith posted:If the target environment is such that no special measures are necessary for the operations you perform, then that's what the operations will compile down to.
|
# ? Oct 4, 2019 22:01 |
|
peepsalot posted:AIUI on x86 everything is at least 16-byte aligned No? Though j random variable on the stack is probably naturally aligned, of course; unaligned accesses tend to be more eg wire protocol stuff on networks tbf. Also 'this happens to work without atomics for me on this specific ISA, with this specific sized variable, allocated in this specific way' is uh not a good reason to be like 'why bother with atomics at all', kinda thing. feedmegin fucked around with this message at 22:13 on Oct 4, 2019 |
# ? Oct 4, 2019 22:09 |
|
OK, rather than "everything", what I meant was stuff on the heap, i.e. a base address returned by malloc.
|
# ? Oct 4, 2019 23:39 |
|
peepsalot posted:OK, rather than "everything", what I meant was stuff on the heap, i.e. a base address returned by malloc.
|
# ? Oct 4, 2019 23:51 |
|
peepsalot posted:Also, I know it's possible that for example two threads end up doing the same calculation by chance, but I don't particularly care since they will arrive at the same result. I'm trying to avoid as much overhead as possible, so not using mutex/locks etc here. Atomics really don't have much overhead, especially compared to mutexes. If there's code that will modify primitives from multiple threads, atomics are the least you can do. Don't sweat it, just go with them, unless you know for a fact that getting rid of atomics will significantly improve the speed of your program and the gain is worth it for you (and it works on your platform and the stars align so nothing explodes in your face).
|
# ? Oct 5, 2019 00:41 |
|
malloc is required to return memory that is sufficiently aligned for all fundamental types, but it’s never been clear whether to count e.g. vector types as fundamental; among other things, nobody wants to align malloc to 64 bytes just because the system supports AVX512. Linux and Windows align malloc to 8 or 16 depending on whether the target is 64-bit, which is fairly common; Darwin is something of an exception because it always aligns malloc to 16. Regardless, the addresses of individual fields or array elements on the heap may be less aligned than the allocation they’re a part of.
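A small helper makes the check concrete (sketch; the stronger-alignment facilities named in the comment are standard, but pick whichever your platform actually provides):

```cpp
#include <cstddef>
#include <cstdint>

// malloc's guarantee only covers ordinary fundamental types (typically 8 or
// 16 bytes), and interior fields or array elements can be less aligned than
// the allocation itself. If you need cache-line or SIMD alignment, request
// it explicitly: std::aligned_alloc (C++17), posix_memalign, or operator
// new on an alignas-annotated type.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```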
|
# ? Oct 5, 2019 01:54 |
|
gcc appears to believe that a store to an aligned int on x64 is inherently atomic. It is also smart enough to believe this about a .store() on a std::atomic<int>, so while it is probably safe to use bare uint32_t in your code, you get no performance improvement from avoiding the atomic.code:
code:
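As a hedged reconstruction of the kind of comparison being described (not the poster's actual code): on x86-64, gcc and clang typically compile both of these stores to the same single `mov`. A relaxed atomic store of an aligned 32-bit value costs nothing extra; what the atomic buys you is defined behavior under concurrent access.

```cpp
#include <atomic>
#include <cstdint>

// Plain store: a single mov on x86-64, but a data race (UB) if another
// thread accesses *p concurrently.
void plain_store(uint32_t* p, uint32_t v) { *p = v; }

// Relaxed atomic store: typically the exact same mov on x86-64, but with
// defined behavior under concurrent access.
void atomic_store_relaxed(std::atomic<uint32_t>* p, uint32_t v) {
    p->store(v, std::memory_order_relaxed);
}
```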
|
# ? Oct 5, 2019 04:30 |
|
OK, I'll do it the proper way and use atomic operations. One more question though, I'm using OpenMP for the parallel aspect of the code. Does anyone know if there's practically any difference between using "#pragma omp atomic" vs "std::atomic" ? I guess the OpenMP version seems a lot more convenient to implement, I haven't fully grokked the std library way of doing it. I found this comparison of different variations and there's like 10 ways to use std::atomic? https://www.arangodb.com/2015/02/comparing-atomic-mutex-rwlocks/ I'm compiling on clang-9 if it matters.
|
# ? Oct 5, 2019 04:31 |
|
I think I've brought this up many times already but I still have no solution. How do I debug crashes that people are getting on their machines? All I have is a log file with a callstack that is all just addresses of functions. I've tried shipping the pdb file, but for some reason that doesn't work on users' machines, and I still just get function addresses in the callstack and no function names. I've tried generating crash "minidump" files, but none that I have ever received has ever been useful or even valid. There has to be a way to simply take the address of a function and just figure out which function that is supposed to be, right??
|
# ? Oct 7, 2019 17:13 |
|
baby puzzle posted:I think I've brought this up many times already but I still have no solution. How do I debug crashes that people are getting on their machines? All I have is a log file with a callstack that is all just addresses of functions. I've tried shipping the pdb file, but for some reason that doesn't work on users' machines, and I still just get function addresses in the callstack and no function names. I've tried generating crash "minidump" files, but none that I have ever received has ever been useful or even valid. ASLR might gently caress you there...
|
# ? Oct 7, 2019 18:38 |
|
The #pragma omp atomic is not just an atomic move, it also puts in the correct fences so that the other threads don't work with stale data. In your case, you might not even need the atomic. - Simple stores are indeed atomic on x86. I can double check, but I remember that split cache line writes are handled correctly in the backend; you do pay for it in performance terms, though. Definitely do an aligned malloc. - Worst case, it will recalculate the index if a thread is working on a stale read value. If this worries you, definitely use std::atomic with memory_order_release or just the #pragma omp atomic; these insert not just the atomic update but also the memory fences needed to make sure no thread works on stale values. peepsalot posted:
Consider changing how and where the offsets are stored and updated here. The abuse of a mutable member to update values of a const instance is a real coding horror, especially in the context of multithreading, where someone sees const as 'safe, read only'. Can you initialize all offsets once, in a separate #omp parallel for loop? I guarantee it will be faster (avoids cache thrashing etc.), unless your offsets[] is super sparse in practice.
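A hypothetical sketch of that suggestion, since the original code wasn't shown (compute_offset, build_offsets, and the stand-in calculation are all made up): fill the whole cache up front in its own parallel loop instead of lazily memoizing through a mutable member from a const method.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the "expensive calculation" (hypothetical).
uint32_t compute_offset(std::size_t i) {
    return static_cast<uint32_t>(i * 2654435761u);
}

// Eagerly fill the whole cache in one parallel pass. The pragma is ignored
// (the loop runs serially) if OpenMP isn't enabled at compile time.
std::vector<uint32_t> build_offsets(std::size_t n) {
    std::vector<uint32_t> offsets(n);
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(n); ++i)
        offsets[i] = compute_offset(static_cast<std::size_t>(i));
    return offsets;
}
```

After this, readers of the table need no synchronization at all, because nothing mutates.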
|
# ? Oct 8, 2019 11:43 |
|
Dren posted:i know about address sanitizer and just found out about undefined behavior sanitizer, also about sanitize stack protector and stack canaries. Anyone got any other things for finding a corruption in a single threaded program? (it’s not actually single threaded but there’s only two threads and i trust the i/o library much more than the other code) Looks like you got it, but in the future you might be interested in https://rr-project.org/ too.
|
# ? Oct 8, 2019 11:56 |
|
Beef posted:I remember that split cache line writes are handled correctly in the backend Your later comment about using aligned alloc is probably right though. Star War Sex Parrot fucked around with this message at 12:56 on Oct 8, 2019 |
# ? Oct 8, 2019 12:53 |
|
Nasty. Was that for x86?
|
# ? Oct 8, 2019 13:48 |
|
Star War Sex Parrot posted:It depends on how you got the pointer. You can cast it to a std::atomic all you want but if it straddles a cache line you’re hosed. We had a nasty bug to track down about a year ago for that exact reason: torn read of an atomic pointer that in pretty rare scenarios would straddle a cache line. Basically we had buffers coming out of an allocator that in turn were divvied up and cast to other types, and someone goofed and didn’t think about alignment requirements for the atomic types. What a nightmare.
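A cheap guard against that class of bug (sketch; the helper name is made up): before treating a raw buffer pointer as a `std::atomic<T>`, verify it meets the atomic's alignment requirement, so a pointer that could straddle a cache line fails loudly instead of tearing.

```cpp
#include <atomic>
#include <cstdint>

// Returns the pointer reinterpreted as an atomic only if it satisfies the
// atomic's alignment requirement; nullptr otherwise. (The usual caveats
// about reinterpret_cast and object lifetime still apply -- this only
// catches the alignment goof described above.)
template <typename T>
std::atomic<T>* as_atomic_checked(void* p) {
    static_assert(std::atomic<T>::is_always_lock_free,
                  "only sensible for lock-free atomics");
    if (reinterpret_cast<std::uintptr_t>(p) % alignof(std::atomic<T>) != 0)
        return nullptr;  // caller must handle misalignment
    return reinterpret_cast<std::atomic<T>*>(p);
}
```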
|
# ? Oct 8, 2019 13:59 |
|
Beef posted:Nasty. Was that for x86? Subjunctive posted:What a nightmare. Star War Sex Parrot fucked around with this message at 15:52 on Oct 8, 2019 |
# ? Oct 8, 2019 14:48 |
|
I don't think omp atomic inserts any barriers; gcc doesn't compile it that way. If you want fences, you need to tack seq_cst onto the end of the pragma (the reverse of std::atomic, where it defaults to fencing).
|
# ? Oct 8, 2019 20:58 |
|
Subjunctive posted:Looks like you got it, but in the future you might be interested in https://rr-project.org/ too. Thanks, that looks amazing. I've run across a paid tool that does the same thing, but didn't know there was an open source one. My primary dev environment is in virtualbox but we could switch to vmware for something like this.
|
# ? Oct 9, 2019 03:25 |
|
Beef posted:The abuse of a mutable member to update values of a const instance is a real coding horror... Anyways, I briefly implemented it with omp atomic, and it worked fine with no noticeable performance impact. I also tried doing the calculations all beforehand, in parallel, like you said, and it wasn't noticeably faster. Maybe marginally slower, but the measurement differences were in the noise. After more profiling, the whole "expensive calculation" I was trying to memoize ended up being negligible enough that I just reverted to calculating it on the fly, so I ditched the whole vector cache deal entirely; no more mutable. But it's still been an interesting topic to learn more about.
|
# ? Oct 9, 2019 18:55 |
|
C++ has many features it probably shouldn't have, or that are almost never justified in practice. The mutable keyword on data members is one of them.
|
# ? Oct 9, 2019 19:09 |
|
Without mutable you would not be able to have a const member function that locks a mutex. Obviously it could easily be misused for awful things, but I'm not sure that I've ever actually seen that happen. Updating a cached value is only an awful thing if you have an incorrect mental model for what const means in C++ (notably, it does not mean immutable or pure).
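The mutex case is the canonical legitimate use of mutable: a const getter that still needs to lock. The object is logically const while physically mutating the mutex.

```cpp
#include <mutex>

class Stats {
    mutable std::mutex mu_;  // mutable: lockable even from const members
    int value_ = 0;
public:
    void set(int v) {
        std::lock_guard<std::mutex> lk(mu_);
        value_ = v;
    }
    int get() const {
        std::lock_guard<std::mutex> lk(mu_);  // ill-formed without mutable
        return value_;
    }
};
```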
|
# ? Oct 9, 2019 20:24 |
|
Plorkyeran posted:Without mutable you would not be able to have a const member function that locks a mutex. Obviously it could easily be misused for awful things, but I'm not sure that I ever actually have seen that happen. he says, immediately following a series of posts discussing its UBful use on an unguarded std::vector.
|
# ? Oct 9, 2019 20:48 |
|
It's not like the code would have been fine and dandy if the function hadn't been marked const. The use of mutable isn't the problem with that code.
|
# ? Oct 10, 2019 00:22 |
|
Plorkyeran posted:... incorrect mental model for what const means in C++ (notable it does not mean immutable or pure). No, but the point is that it is very often used as a proxy, with C/C++ lacking those proper immutable semantics. Tangentially, the fact that const can be cast away at any point in C/C++ is one of those things you pay for, even if you are not using it. It frustrated Walter Bright enough to add an immutable modifier to D. edit: http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#con-constants-and-immutability quote:You can’t have a race condition on a constant. Beef fucked around with this message at 16:59 on Oct 10, 2019 |
# ? Oct 10, 2019 16:49 |
|
So now that I know how not to do it, What’s the best way to get crash reports from users? I’m sure there’s a library or something that just works? I’m on Windows.
|
# ? Oct 10, 2019 23:06 |
|
Beef posted:No, but the point is that it is very often used as a proxy, with C/C++ lacking those proper immutable semantics. constexpr!
|
# ? Oct 11, 2019 01:16 |
|
baby puzzle posted:So now that I know how not to do it, What’s the best way to get crash reports from users? I’m sure there’s a library or something that just works? I’m on Windows. I’ve had good success with breakpad.
|
# ? Oct 11, 2019 02:50 |
|
roomforthetuna posted:constexpr! That has its own problems. Which is why we'll get things like "constinit" and "consteval" in C++20. https://www.youtube.com/watch?v=Xb6u8BrfHjw&t=2210s
|
# ? Oct 11, 2019 06:00 |
|
I like how consteval does the thing you'd imagine constexpr doing when put on a function.
|
# ? Oct 11, 2019 06:23 |
|
C++11 constexpr was an intentionally crippled prototype and we've been fixing it ever since... It still sucks.
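For instance, a quick sketch of how crippled the C++11 version was: a constexpr function body was limited to a single return statement, so recursion had to stand in for loops. C++14 relaxed this to allow locals, loops, and branches.

```cpp
// C++11 style: one return statement, so loops become recursion.
constexpr unsigned fact11(unsigned n) {
    return n <= 1 ? 1 : n * fact11(n - 1);
}

// C++14 and later: ordinary imperative code is allowed.
constexpr unsigned fact14(unsigned n) {
    unsigned r = 1;
    for (unsigned i = 2; i <= n; ++i) r *= i;
    return r;
}

static_assert(fact11(5) == 120, "evaluated at compile time");
static_assert(fact14(5) == 120, "evaluated at compile time");
```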
|
# ? Oct 11, 2019 07:18 |
|
Xarn posted:C++11 constexpr was an intentionally crippled prototype and we've been fixing it ever since... Basically all of C++ is like that.
- exception specifications
- auto_ptr
- volatile
- Unicode literals
- iterators (they were a really good idea, but iterator+sentinel is a better one)
- ...
|
# ? Oct 11, 2019 20:26 |
|
Someone needs to bite the bullet and make C+++ that drops backwards compatibility for a bunch of things. it will never get used though
|
# ? Oct 11, 2019 21:34 |
|
I got a cmake / toolchain question. I have a project that uses cmake and I want to have it build with all clang/llvm tools. I found that I have to initialize cmake with: CXX="clang++" LDFLAGS="-fuse-ld=lld" cmake . I also found these instructions to verify the correct linker is used: https://lld.llvm.org/#using-lld posted:If you are in doubt whether you are successfully using LLD or not, run readelf --string-dump .comment <output-file> and examine the output. And when I run that on my binary, I see lld and clang being used, but also GCC? quote:String dump of section '.comment': Or is there possibly some other part of the toolchain I'm not setting correctly to make it 100% clang/llvm?
|
# ? Oct 11, 2019 21:54 |
|
|
|
peepsalot posted:I got a cmake / toolchain question. I have a project that uses cmake and I want to have it build with all clang/llvm tools. At a guess, cmake is probably invoking gcc to drive the linker by default, and you haven't overridden that; you're just telling gcc to in turn delegate to lld. Which should work just fine, but you could configure cmake to use clang for that too if you prefer.
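If you want to be explicit about it, something along these lines sets the compiler and linker through standard CMake cache variables (a sketch; adjust for your generator and project):

```shell
# Configure CMake to use clang/clang++ and lld throughout.
# CMAKE_<LANG>_COMPILER and CMAKE_*_LINKER_FLAGS are standard CMake
# variables; shared/module linker flags may also need -fuse-ld=lld if the
# project builds libraries.
cmake -DCMAKE_C_COMPILER=clang \
      -DCMAKE_CXX_COMPILER=clang++ \
      -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld" \
      -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=lld" \
      .
```

Even then, a GCC entry can still show up in the .comment section, since it is often contributed by system startup objects (crt*.o) that were built with gcc, not by your own code.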
|
# ? Oct 11, 2019 22:16 |