|
Question about atomics. I guess I don't really understand if/when they are necessary. If I have a simple type like uint32_t being read/written by multiple threads, wouldn't a read/write already be atomic in terms of the assembly instructions? (assuming x86_64). Is there any possible way that a read could pick up some partially written value? Is it only a concern when not assuming architecture, say some 8-bit microcontroller or something that maybe can't write the whole 32 bits at once?
|
# ? Oct 4, 2019 19:04 |
|
|
|
peepsalot posted:Question about atomics. I guess I don't really understand if/when they are necessary. inc tells the cpu to do multiple operations - read the value, then set it to that plus one. if two threads are updating, they can both read at the same time (getting the same value) and then each set the value incremented by one - this causes the final result to be value+1 instead of value+2. in asm you can make this atomic by specifying a lock prefix when issuing the instruction (and this lock prefix is still present in x86-64 asm.) you generally can't assume variables are atomic unless it's explicitly stated somewhere. Bruegels Fuckbooks fucked around with this message at 19:50 on Oct 4, 2019 |
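To make the lost-update problem concrete, here's a minimal sketch (function names are made up for illustration): the plain `uint32_t` version compiles `++counter` into separate load/add/store steps, so concurrent increments can overwrite each other (and it's a data race, i.e. UB), while the `std::atomic` version uses a locked read-modify-write so no increments are lost.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Data race on purpose: increments from different threads can interleave
// between the load and the store, so updates get lost (and it's UB).
uint32_t racy_count(int threads, int iters) {
    uint32_t counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] { for (int i = 0; i < iters; ++i) ++counter; });
    for (auto& th : pool) th.join();
    return counter;  // typically less than threads * iters
}

// fetch_add is a single atomic read-modify-write (lock xadd on x86-64),
// so every increment is counted.
uint32_t atomic_count(int threads, int iters) {
    std::atomic<uint32_t> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Only the atomic version has a deterministic result; the racy one is anybody's guess.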
# ? Oct 4, 2019 19:47 |
|
peepsalot posted:Question about atomics. I guess I don't really understand if/when they are necessary. As well as the above, consider a CPU with a 32-bit path to memory that's doing an unaligned 32-bit read/write that will therefore straddle two 32-bit words.
|
# ? Oct 4, 2019 20:17 |
|
The other thing you have to worry about is what sort of memory operations can be reordered around your atomic loads or stores -- e.g. if you're implementing a lock, you don't want any accesses to be moved before lock acquisition or after lock release.
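A sketch of what that means in practice, assuming a hand-rolled spinlock (illustrative, not production code): the acquire ordering on lock() stops later accesses from being hoisted above it, and the release ordering on unlock() stops earlier accesses from sinking below it, so the critical section stays between them.

```cpp
#include <atomic>

// Minimal spinlock sketch. The memory orderings are the point here:
// acquire on lock(), release on unlock() keep the guarded accesses
// from being reordered out of the critical section.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // spin; a real implementation would back off or yield
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
};
```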
|
# ? Oct 4, 2019 20:17 |
|
OK, well a little more info about the code in question. I'm using a vector to cache some calculation. I initialize the vector to all 0xFFFFFFFF, which I know cannot be a valid result from the calculation. I'm not doing any increment operations, only assignment. AIUI on x86 everything is at least 16-byte aligned (unless I had some weird layout struct union or something?), so I don't think the alignment aspect is relevant either? code:
So in this specific case, would not using atomics potentially cause any problems? Also, I know it's possible that, for example, two threads end up doing the same calculation by chance, but I don't particularly care since they will arrive at the same result. I'm trying to avoid as much overhead as possible, so not using mutex/locks etc. here. peepsalot fucked around with this message at 21:13 on Oct 4, 2019 |
# ? Oct 4, 2019 21:05 |
|
Not using atomics (or other protection) is UB. If the target environment is such that no special measures are necessary for the operations you perform, then that's what the operations will compile down to. Don't try to second-guess language semantics; the compiler will bite you.
|
# ? Oct 4, 2019 21:23 |
|
pseudorandom name posted:The other thing you have to worry about is what sort of memory operations can be reordered around your atomic loads or stores -- e.g. if you're implementing a lock, you don't want any accesses to be moved before lock acquisition or after lock release. This: Ralith posted:If the target environment is such that no special measures are necessary for the operations you perform, then that's what the operations will compile down to.
|
# ? Oct 4, 2019 22:01 |
|
peepsalot posted:AIUI on x86 everything is at least 16-byte aligned No? Though j random variable on the stack is probably naturally aligned, of course; unaligned accesses tend to be more eg wire protocol stuff on networks tbf. Also 'this happens to work without atomics for me on this specific ISA, with this specific sized variable, allocated in this specific way' is uh not a good reason to be like 'why bother with atomics at all', kinda thing. feedmegin fucked around with this message at 22:13 on Oct 4, 2019 |
# ? Oct 4, 2019 22:09 |
|
OK, rather than "everything", what I meant was stuff on the heap, i.e. a base address returned by malloc.
|
# ? Oct 4, 2019 23:39 |
|
peepsalot posted:OK, rather than "everything", what I meant was stuff on the heap, i.e. a base address returned by malloc.
|
# ? Oct 4, 2019 23:51 |
|
peepsalot posted:Also, I know it's possible that for example two threads end up doing the same calculation by chance, but I don't particularly care since they will arrive at the same result. I'm trying to avoid as much overhead as possible, so not using mutex/locks etc here. Atomics really don't have much overhead, especially compared to mutexes. If there's code that will modify primitives from multiple threads, atomics are the least you can do. Don't sweat it, just go with them, unless you know for a fact that getting rid of atomics will significantly improve the speed of your program and the gain is worth it for you (and it works on your platform and the stars align so nothing explodes in your face).
|
# ? Oct 5, 2019 00:41 |
|
malloc is required to return memory that is sufficiently aligned for all fundamental types, but it’s never been clear whether to count e.g. vector types as fundamental; among other things, nobody wants to align malloc to 64 bytes just because the system supports AVX512. Linux and Windows align malloc to 8 or 16 depending on whether the target is 64-bit, which is fairly common; Darwin is something of an exception because it always aligns malloc to 16. Regardless, the addresses of individual fields or array elements on the heap may be less aligned than the allocation they’re a part of.
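A small helper makes the check concrete (sketch; the stronger-alignment facilities named in the comment are standard, but pick whichever your platform actually provides):

```cpp
#include <cstddef>
#include <cstdint>

// malloc's guarantee only covers ordinary fundamental types (typically 8 or
// 16 bytes), and interior fields or array elements can be less aligned than
// the allocation itself. If you need cache-line or SIMD alignment, request
// it explicitly: std::aligned_alloc (C++17), posix_memalign, or operator
// new on an alignas-annotated type.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```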
|
# ? Oct 5, 2019 01:54 |
|
gcc appears to believe that a store to an aligned int on x64 is inherently atomic. It is also smart enough to believe this about a .store() on a std::atomic<int>, so while it is probably safe to use bare uint32_t in your code, you get no performance improvement from avoiding the atomic.code:
code:
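As a hedged reconstruction of the kind of comparison being described (not the poster's actual code): on x86-64, gcc and clang typically compile both of these stores to the same single `mov`. A relaxed atomic store of an aligned 32-bit value costs nothing extra; what the atomic buys you is defined behavior under concurrent access.

```cpp
#include <atomic>
#include <cstdint>

// Plain store: a single mov on x86-64, but a data race (UB) if another
// thread accesses *p concurrently.
void plain_store(uint32_t* p, uint32_t v) { *p = v; }

// Relaxed atomic store: typically the exact same mov on x86-64, but with
// defined behavior under concurrent access.
void atomic_store_relaxed(std::atomic<uint32_t>* p, uint32_t v) {
    p->store(v, std::memory_order_relaxed);
}
```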
|
# ? Oct 5, 2019 04:30 |
|
OK, I'll do it the proper way and use atomic operations. One more question though, I'm using OpenMP for the parallel aspect of the code. Does anyone know if there's practically any difference between using "#pragma omp atomic" vs "std::atomic" ? I guess the OpenMP version seems a lot more convenient to implement, I haven't fully grokked the std library way of doing it. I found this comparison of different variations and there's like 10 ways to use std::atomic? https://www.arangodb.com/2015/02/comparing-atomic-mutex-rwlocks/ I'm compiling on clang-9 if it matters.
|
# ? Oct 5, 2019 04:31 |
|
I think I've brought this up many times already but I still have no solution. How do I debug crashes that people are getting on their machines? All I have is a log file with a callstack that is all just addresses of functions. I've tried shipping the pdb file, but for some reason that doesn't work on users' machines, and I still just get function addresses in the callstack and no function names. I've tried generating crash "minidump" files, but none that I have ever received has ever been useful or even valid. There has to be a way to simply take the address of a function and just figure out which function that is supposed to be, right??
|
# ? Oct 7, 2019 17:13 |
|
baby puzzle posted:I think I've brought this up many times already but I still have no solution. How do I debug crashes that people are getting on their machines? All I have is a log file with a callstack that is all just addresses of functions. I've tried shipping the pdb file, but for some reason that doesn't work on users' machines, and I still just get function addresses in the callstack and no function names. I've tried generating crash "minidump" files, but none that I have ever received has ever been useful or even valid. ASLR might gently caress you there...
|
# ? Oct 7, 2019 18:38 |
|
The #pragma omp atomic is not just an atomic move, it also puts in the correct fences so that the other threads don't work with stale data. In your case, you might not even need the atomic. - Simple stores are indeed atomic on x86. I can double check, but I remember that split cache line writes are handled correctly in the backend; you do pay for it in performance terms, though. Definitely do an aligned malloc. - Worst case, it will recalculate the index if a thread is working on a stale read value. If this worries you, definitely use std::atomic with memory_order_release or just the #pragma omp atomic; these insert not just the atomic update but also the memory fences needed to make sure no thread works on stale values. peepsalot posted:
Consider changing how and where the offsets are stored and updated here. The abuse of a mutable member to update values of a const instance is a real coding horror, especially in the context of multithreading, where someone sees const as 'safe, read only'. Can you initialize all offsets once, in a separate #omp parallel for loop? I guarantee it will be faster (avoids cache thrashing etc.), unless your offsets[] is super sparse in practice.
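A hypothetical sketch of that suggestion, since the original code wasn't shown (compute_offset, build_offsets, and the stand-in calculation are all made up): fill the whole cache up front in its own parallel loop instead of lazily memoizing through a mutable member from a const method.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the "expensive calculation" (hypothetical).
uint32_t compute_offset(std::size_t i) {
    return static_cast<uint32_t>(i * 2654435761u);
}

// Eagerly fill the whole cache in one parallel pass. The pragma is ignored
// (the loop runs serially) if OpenMP isn't enabled at compile time.
std::vector<uint32_t> build_offsets(std::size_t n) {
    std::vector<uint32_t> offsets(n);
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(n); ++i)
        offsets[i] = compute_offset(static_cast<std::size_t>(i));
    return offsets;
}
```

After this, readers of the table need no synchronization at all, because nothing mutates.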
|
# ? Oct 8, 2019 11:43 |
|
Dren posted:i know about address sanitizer and just found out about undefined behavior sanitizer, also about sanitize stack protector and stack canaries. Anyone got any other things for finding a corruption in a single threaded program? (it’s not actually single threaded but there’s only two threads and i trust the i/o library much more than the other code) Looks like you got it, but in the future you might be interested in https://rr-project.org/ too.
|
# ? Oct 8, 2019 11:56 |
|
Beef posted:I remember that split cache line writes are handled correctly in the backend Your later comment about using aligned alloc is probably right though. Star War Sex Parrot fucked around with this message at 12:56 on Oct 8, 2019 |
# ? Oct 8, 2019 12:53 |
|
Nasty. Was that for x86?
|
# ? Oct 8, 2019 13:48 |
|
Star War Sex Parrot posted:It depends on how you got the pointer. You can cast it to a std::atomic all you want but if it straddles a cache line you’re hosed. We had a nasty bug to track down about a year ago for that exact reason: torn read of an atomic pointer that in pretty rare scenarios would straddle a cache line. Basically we had buffers coming out of an allocator that in turn were divvied up and cast to other types, and someone goofed and didn’t think about alignment requirements for the atomic types. What a nightmare.
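A cheap guard against that class of bug (sketch; the helper name is made up): before treating a raw buffer pointer as a `std::atomic<T>`, verify it meets the atomic's alignment requirement, so a pointer that could straddle a cache line fails loudly instead of tearing.

```cpp
#include <atomic>
#include <cstdint>

// Returns the pointer reinterpreted as an atomic only if it satisfies the
// atomic's alignment requirement; nullptr otherwise. (The usual caveats
// about reinterpret_cast and object lifetime still apply -- this only
// catches the alignment goof described above.)
template <typename T>
std::atomic<T>* as_atomic_checked(void* p) {
    static_assert(std::atomic<T>::is_always_lock_free,
                  "only sensible for lock-free atomics");
    if (reinterpret_cast<std::uintptr_t>(p) % alignof(std::atomic<T>) != 0)
        return nullptr;  // caller must handle misalignment
    return reinterpret_cast<std::atomic<T>*>(p);
}
```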
|
# ? Oct 8, 2019 13:59 |
|
Beef posted:Nasty. Was that for x86? Subjunctive posted:What a nightmare. Star War Sex Parrot fucked around with this message at 15:52 on Oct 8, 2019 |
# ? Oct 8, 2019 14:48 |
|
I don't think omp atomic inserts any barriers; gcc doesn't compile it that way. If you want fences, you need to tack seq_cst onto the end of the pragma (the reverse of std::atomic, where it defaults to fencing).
|
# ? Oct 8, 2019 20:58 |
|
Subjunctive posted:Looks like you got it, but in the future you might be interested in https://rr-project.org/ too. Thanks, that looks amazing. I've run across a paid tool that does the same thing, but didn't know there was an open source one. My primary dev environment is in virtualbox but we could switch to vmware for something like this.
|
# ? Oct 9, 2019 03:25 |
|
Beef posted:The abuse of a mutable member to update values of a const instance is a real coding horror... Anyways, I briefly implemented it with omp atomic, and it worked fine with no noticeable performance impact. I also tried doing the calculations all beforehand, in parallel, like you said, and it wasn't noticeably faster. Maybe marginally slower, but the measurement differences were in the noise. After more profiling, the whole "expensive calculation" I was trying to memoize ended up being negligible enough that I just reverted to calculating it on the fly, so I ditched the whole vector cache deal entirely; no more mutable. But it's still been an interesting topic to learn more about.
|
# ? Oct 9, 2019 18:55 |
|
C++ has many features it probably shouldn't have, or that are almost never justified in practice. The mutable keyword on data members is one of them.
|
# ? Oct 9, 2019 19:09 |
|
Without mutable you would not be able to have a const member function that locks a mutex. Obviously it could easily be misused for awful things, but I'm not sure that I've ever actually seen that happen. Updating a cached value is only an awful thing if you have an incorrect mental model for what const means in C++ (notably, it does not mean immutable or pure).
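The mutex case is the canonical legitimate use of mutable: a const getter that still needs to lock. The object is logically const while physically mutating the mutex.

```cpp
#include <mutex>

class Stats {
    mutable std::mutex mu_;  // mutable: lockable even from const members
    int value_ = 0;
public:
    void set(int v) {
        std::lock_guard<std::mutex> lk(mu_);
        value_ = v;
    }
    int get() const {
        std::lock_guard<std::mutex> lk(mu_);  // ill-formed without mutable
        return value_;
    }
};
```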
|
# ? Oct 9, 2019 20:24 |
|
Plorkyeran posted:Without mutable you would not be able to have a const member function that locks a mutex. Obviously it could easily be misused for awful things, but I'm not sure that I ever actually have seen that happen. he says, immediately following a series of posts discussing its UBful use on an unguarded std::vector.
|
# ? Oct 9, 2019 20:48 |
|
It's not like the code would have been fine and dandy if the function hadn't been marked const. The use of mutable isn't the problem with that code.
|
# ? Oct 10, 2019 00:22 |
|
Plorkyeran posted:... incorrect mental model for what const means in C++ (notable it does not mean immutable or pure). No, but the point is that it is very often used as a proxy, with C/C++ lacking those proper immutable semantics. Tangentially, the fact that const can be cast away at any point in C/C++ is one of those things you pay for, even if you are not using it. It frustrated Walter Bright enough to add an immutable modifier to D. edit: http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#con-constants-and-immutability quote:You can’t have a race condition on a constant. Beef fucked around with this message at 16:59 on Oct 10, 2019 |
# ? Oct 10, 2019 16:49 |
|
So now that I know how not to do it, What’s the best way to get crash reports from users? I’m sure there’s a library or something that just works? I’m on Windows.
|
# ? Oct 10, 2019 23:06 |
|
Beef posted:No, but the point is that it is very often used as a proxy, with C/C++ lacking those proper immutable semantics. constexpr!
|
# ? Oct 11, 2019 01:16 |
|
baby puzzle posted:So now that I know how not to do it, What’s the best way to get crash reports from users? I’m sure there’s a library or something that just works? I’m on Windows. I’ve had good success with breakpad.
|
# ? Oct 11, 2019 02:50 |
|
roomforthetuna posted:constexpr! That has its own problems. Which is why we'll get things like "constinit" and "consteval" in C++20. https://www.youtube.com/watch?v=Xb6u8BrfHjw&t=2210s
|
# ? Oct 11, 2019 06:00 |
|
I like how consteval does the thing you'd imagine constexpr doing when put on a function.
|
# ? Oct 11, 2019 06:23 |
|
C++11 constexpr was an intentionally crippled prototype and we've been fixing it ever since... It still sucks.
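For instance, a quick sketch of how crippled the C++11 version was: a constexpr function body was limited to a single return statement, so recursion had to stand in for loops. C++14 relaxed this to allow locals, loops, and branches.

```cpp
// C++11 style: one return statement, so loops become recursion.
constexpr unsigned fact11(unsigned n) {
    return n <= 1 ? 1 : n * fact11(n - 1);
}

// C++14 and later: ordinary imperative code is allowed.
constexpr unsigned fact14(unsigned n) {
    unsigned r = 1;
    for (unsigned i = 2; i <= n; ++i) r *= i;
    return r;
}

static_assert(fact11(5) == 120, "evaluated at compile time");
static_assert(fact14(5) == 120, "evaluated at compile time");
```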
|
# ? Oct 11, 2019 07:18 |
|
Xarn posted:C++11 constexpr was an intentionally crippled prototype and we've been fixing it ever since... Basically all of C++ is like that.
- exception specifications
- auto_ptr
- volatile
- Unicode literals
- iterators (they were a really good idea, but iterator+sentinel is a better one)
- ...
|
# ? Oct 11, 2019 20:26 |
|
Someone needs to bite the bullet and make C+++ that drops backwards compatibility for a bunch of things. it will never get used though
|
# ? Oct 11, 2019 21:34 |
|
I got a cmake / toolchain question. I have a project that uses cmake and I want to have it build with all clang/llvm tools. I found that I have to initialize cmake with: CXX="clang++" LDFLAGS="-fuse-ld=lld" cmake . I also found these instructions to verify the correct linker is used: https://lld.llvm.org/#using-lld posted:If you are in doubt whether you are successfully using LLD or not, run readelf --string-dump .comment <output-file> and examine the output. And when I run that on my binary, I see lld and clang being used, but also GCC? quote:String dump of section '.comment': Or is there possibly some other part of the toolchain I'm not setting correctly to make it 100% clang/llvm?
|
# ? Oct 11, 2019 21:54 |
|
|
|
peepsalot posted:I got a cmake / toolchain question. I have a project that uses cmake and I want to have it build with all clang/llvm tools. At a guess, cmake is probably invoking gcc to drive the linker by default, and you haven't overridden that; you're just telling gcc to in turn delegate to lld. Which should work just fine, but you could configure cmake to use clang for that too if you prefer.
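If you want to be explicit about it, something along these lines sets the compiler and linker through standard CMake cache variables (a sketch; adjust for your generator and project):

```shell
# Configure CMake to use clang/clang++ and lld throughout.
# CMAKE_<LANG>_COMPILER and CMAKE_*_LINKER_FLAGS are standard CMake
# variables; shared/module linker flags may also need -fuse-ld=lld if the
# project builds libraries.
cmake -DCMAKE_C_COMPILER=clang \
      -DCMAKE_CXX_COMPILER=clang++ \
      -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld" \
      -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=lld" \
      .
```

Even then, a GCC entry can still show up in the .comment section, since it is often contributed by system startup objects (crt*.o) that were built with gcc, not by your own code.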
|
# ? Oct 11, 2019 22:16 |