Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today

Adhemar posted:

This is routinely done in the HFT space, but it’s possible because the exchange protocol spec makes strong guarantees about how the wire data is laid out. It’s just a matter of writing/generating your structs to match.

C++ does *not* make sufficiently strong guarantees about how types are laid out.


Jeffrey of YOSPOS
Dec 22, 2005

GET LOSE, YOU CAN'T COMPARE WITH MY POWERS

Ralith posted:

C++ does *not* make sufficiently strong guarantees about how types are laid out.
Right, but it's still what you do if you don't want to copy the data. You do have to actually know what your compiler does. It's not like attribute((packed)) doesn't exist, spec or no spec.

Ralith
Jan 12, 2011

Jeffrey of YOSPOS posted:

Right, but it's still what you do if you don't want to copy the data. You do have to actually know what your compiler does. It's not like attribute((packed)) doesn't exist, spec or no spec.
It is absolutely not the only way to avoid copying data; it's just a particularly lazy one. See e.g. the flyweight pattern I discussed a few posts ago for an alternative. Unpredictable memory layout isn't the only danger here either; you're most likely violating strict aliasing.

Qwertycoatl
Dec 31, 2008

Ralith posted:

It is absolutely not the only way to avoid copying data; it's just a particularly lazy one. See e.g. the flyweight pattern I discussed a few posts ago for an alternative. Unpredictable memory layout isn't the only danger here either; you're most likely violating strict aliasing.

I don't know what HFT codebases look like, but in my field (embedded) -fno-strict-aliasing is routine

Ralith
Jan 12, 2011

Qwertycoatl posted:

I don't know what HFT codebases look like, but in my field (embedded) -fno-strict-aliasing is routine
There's a cost to that in reduced room for the optimizer to maneuver, and it's a shame to rely on nonstandard behavior when the problem can be solved within the bounds of the spec.

Adhemar
Jan 21, 2004

Kellner, da ist ein scheussliches Biest in meiner Suppe.
I overlaid structs for years using reinterpret_cast when I worked in HFT, never had issues. I remember having to use #pragma pack. Things are simpler when you have 100% control over your compiler, OS, and hardware. You’re not writing general purpose code that has to run on any permutation of those. Speed is the only thing that matters.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
speed is the only thing that matters, which is why i turn off optimizations instead of writing stuff that's just as fast but also works with the optimizations turned on

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Ralith posted:

It is absolutely not the only way to avoid copying data; it's just a particularly lazy one. See e.g. the flyweight pattern I discussed a few posts ago for an alternative. Unpredictable memory layout isn't the only danger here either; you're most likely violating strict aliasing.

Jabor posted:

speed is the only thing that matters, which is why i turn off optimizations instead of writing stuff that's just as fast but also works with the optimizations turned on

Flyweight seems great if you’re in C++, but pretty cumbersome if you are working in C without member functions, and where I’m not sure the compiler can/will boil away the extra indirection. Have you seen a version of it work for free in plain-if-modern C?

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

Ralith posted:

If you want safe binary deserialization without copying your entire object, you can use the flyweight pattern, i.e. an object that wraps a pointer to your buffer and has accessors to pull out individual fields. That's how most encoding scheme tooling like flatbuffers or capnp work. Downside is lots of boilerplate if you're not using tools to generate it for you, upside is it's defined behavior and you can lay things out on the wire differently than a C++ struct.
Do you have a reference for what you mean by flyweight? I found this which seems to be talking about something quite different involving caches and such. I think you've probably explained what you meant sufficiently inline in the sentence, I'm more just questioning whether the term "flyweight" actually refers to what you described.

Over here is a discussion that said most of what has been said in this thread, where even the commenters mostly seem to agree that using memcpy is the least disastrous option. :/

Adhemar
Jan 21, 2004

Jabor posted:

speed is the only thing that matters, which is why i turn off optimizations instead of writing stuff that's just as fast but also works with the optimizations turned on

You’re conflating posters, I never said anything about turning off optimizations.

Anyway, I was only talking about a very specific use case. In general I wouldn’t recommend doing this as the performance gain is rarely worth the complexity.

Ralith
Jan 12, 2011

Subjunctive posted:

Flyweight seems great if you’re in C++, but pretty cumbersome if you are working in C without member functions, and where I’m not sure the compiler can/will boil away the extra indirection. Have you seen a version of it work for free in plain-if-modern C?
It should still optimize just as well (you're generally running through very similar optimization passes) but you'd have a bunch of inline functions like "int64_t read_foo_bar(foo_buf foo)". Definitely more verbose, and you might want to be extra sure that a naive memcpy of the whole thing isn't sufficient, but I'd be surprised if it didn't work.


roomforthetuna posted:

Do you have a reference for what you mean by flyweight? I found this which seems to be talking about something quite different involving caches and such. I think you've probably explained what you meant sufficiently inline in the sentence, I'm more just questioning whether the term "flyweight" actually refers to what you described.
It might be an unconventional use of the term, but I think it's consistent with the definition given on Wikipedia. In particular, you're sharing the underlying buffer between arbitrarily many flyweight objects in the case of e.g. nested structures. If anyone can think of a clearer term, I'm all ears.


Adhemar posted:

You’re conflating posters, I never said anything about turning off optimizations.
Violating strict aliasing safely necessarily requires disabling the optimizations that rely on it.

roomforthetuna
Mar 22, 2005

Ralith posted:

It might be an unconventional use of the term, but I think it's consistent with the definition given on Wikipedia. In particular, you're sharing the underlying buffer between arbitrarily many flyweight objects in the case of e.g. nested structures. If anyone can think of a clearer term, I'm all ears.
Maybe we're thinking of different contexts. One of the things the Wikipedia article says about the flyweight design pattern is that it involves immutable objects and is used to avoid repetition, neither of which relates to what I thought the context here was. The two contexts I'm thinking about are stuff sent over the wire in a binary encoding, and embedded stuff where you have some piece of memory you can access directly that contains hardware-related values encoded in some kind of structure. The underlying data in either case is broadly not immutable (though I suppose you could consider a received data packet to be immutable for the lifespan of its accessor object).

The clearer term for me would have been if you had just not said "flyweight pattern" and just said the sentence you already said that described what you meant that you said anyway. :)

I couldn't find any term for the "accessors into buffer" pattern though. I did find one person saying "I'm not a comp-sci person but I think cap'n proto is flyweight", while searching for what the cap'n proto guy himself would call this pattern. :D

Edit: Thinking about it a bit more, I think this is kind of the converse of the flyweight pattern as described by wikipedia. That's about having many small objects referring to a set of static-ish immutable larger objects (eg. thousands of 7-bit ascii characters being used to refer to <128 typescript glyphs), whereas this accessor thing is more like a small number of static immutable accessor objects (or sets of functions) that can each be used to operate upon any number of buffers. Anyway, it doesn't seem to be something that has a name.

roomforthetuna fucked around with this message at 22:50 on Jul 14, 2019

Adhemar
Jan 21, 2004

Ralith posted:

Violating strict aliasing safely necessarily requires disabling the optimizations that rely on it.

I never violated it because I’m only aliasing between char* (the socket buffer) and integers of various sizes (the message fields). To my knowledge this is allowed. At least it was never an issue using the compilers I used at the time.
:shrug:

Ralith
Jan 12, 2011

It's nice to have a shorthand, because it's a widely useful pattern (and indeed applies to encoding messages as well as decoding them). We're trying to avoid repetition in the sense of not making copies of the data in the buffer, but instead exposing zero cost wrappers that kinda act like a copy. But it's definitely a bit of a stretch, I agree. Maybe a dedicated term needs to be coined.

roomforthetuna posted:


Edit: Thinking about it a bit more, I think this is kind of the converse of the flyweight pattern as described by wikipedia. That's about having many small objects referring to a set of static-ish immutable larger objects (eg. thousands of 7-bit ascii characters being used to refer to <128 typescript glyphs), whereas this accessor thing is more like a small number of static immutable accessor objects (or sets of functions) that can each be used to operate upon any number of buffers. Anyway, it doesn't seem to be something that has a name.

To clarify, I'm talking about objects containing a single pointer data member and with many accessor methods. You could have large numbers of these referencing the same buffer, since they're trivially copyable and might refer to different parts of the buffer.

Ralith fucked around with this message at 22:54 on Jul 14, 2019

Ralith
Jan 12, 2011

Adhemar posted:

I never violated it because I’m only aliasing between char* (the socket buffer) and integers of various sizes (the message fields). To my knowledge this is allowed. At least it was never an issue using the compilers I used at the time.
:shrug:

It's (in certain cases) okay to cast a valid T* to a char*, but not vice versa. UB is often difficult to detect; that's why it's so insidious.

Adhemar
Jan 21, 2004

Ralith posted:

It's (in certain cases) okay to cast a valid T* to a char*, but not vis versa. UB is often difficult to detect; that's why it's so insidious.

How is code using the Flyweight pattern implemented then? Seems you would have the same problem there.

Spatial
Nov 15, 2007

hmm, sounds like the C standard should be updated to solve this extremely serious flaw in the language design. after all it's a tool that is supposed to make machines easier to program, not harder, so not supporting memory overlays is a pretty big problem.

Spatial
Nov 15, 2007

my god. considering the ubiquity of embedded devices it would be shocking if the primary language used to program them was so grossly deficient

Ralith
Jan 12, 2011

Adhemar posted:

How is code using the Flyweight pattern implemented then? Seems you would have the same problem there.
To access structured data, you return another flyweight object. To access indivisible fields, you memcpy, and if your compiler was released this decade and alignment concerns are accounted for it gets trivially optimized out.

roomforthetuna
Mar 22, 2005

So if you're doing the not-quite-flyweight thing of having a buffer and a bunch of accessors and using memcpy in the accessors to avoid problems with strict aliasing, is there a recommended way to make sure the buffer is aligned to allow for maximum optimization? Will things just naturally fall on well aligned boundaries if you don't actively go out of your way to ruin it? I mean assuming your data structure itself is designed with padding so values are internally aligned.

For example, is there a difference in layout between
code:
char nonsense[2];
char buffer[1024];
and
code:
char nonsense[2];
union {
  int64_t aligner;
  char buffer[1024];
};
And will heap allocations like vectors always be aligned?

roomforthetuna fucked around with this message at 15:17 on Jul 16, 2019

Ralith
Jan 12, 2011

roomforthetuna posted:

So if you're doing the not-quite-flyweight thing of having a buffer and a bunch of accessors and using memcpy in the accessors to avoid problems with strict aliasing, is there a recommended way to make sure the buffer is aligned to allow for maximum optimization? Will things just naturally fall on well aligned boundaries if you don't actively go out of your way to ruin it? I mean assuming your data structure itself is designed with padding so values are internally aligned.

For example, is there a difference in layout between
code:
char nonsense[2];
char buffer[1024];
and
code:
char nonsense[2];
union {
  int64_t aligner;
  char buffer[1024];
};
And will heap allocations like vectors always be aligned?

Aligning the buffer itself is easy; C++ has had the alignas specifier since C++11. Without it, char arrays inside larger structures can easily end up unaligned, since stricter alignment would give no benefit if they were only ever used as char arrays.

There may be cases where you additionally want to specify the alignment of the pointer into the buffer to the compiler, e.g. perhaps by saying "using aligned alignas(8) = char;" then using "aligned*" where appropriate. This doesn't seem to have any effect on x86 where unaligned loads Just Work, though.

IIRC operator new always returns memory aligned to 8 bytes or better on most platforms, but to be sure you can use the C++17 version which has an alignment argument.

Xarn
Jun 26, 2015
std::vector<bool> was a mistake.

Subjunctive
Sep 12, 2006

Ralith posted:

This doesn't seem to have any effect on x86 where unaligned loads Just Work, though.

That’s not universally true, is it? I thought that some of the SIMD instructions would fault on unaligned addresses.

Absurd Alhazred
Mar 27, 2010

by Athanatos

Subjunctive posted:

That’s not universally true, is it? I thought that some of the SIMD instructions would fault on unaligned addresses.

Yeah, for example, the aligned XMM loads/stores (MOVAPS and friends) require 16-byte (four 32-bit floats) alignment.

What's nice is that libraries like GLM will choose the SSE-based implementation of matrix operations for properly aligned types.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
Misalignment will also generally have some performance impact, although recent microarchitectures are a lot better at hiding this.

Jeffrey of YOSPOS
Dec 22, 2005

rjmccall posted:

Misalignment will also generally have some performance impact, although recent microarchitectures are a lot better at hiding this.
Yeah I gotta imagine spanning a cache line still has performance issues even if other unaligned accesses are unnoticeable.

pseudorandom name
May 6, 2007

I think the current recommendation with SSE/AVX is to always use unaligned loads, but I don’t remember why I believe this.

rjmccall
Sep 7, 2007

If the processor supports both unaligned and alignment-enforced variants of an operation, then the latter is almost certainly just the former but with an extra check which triggers a trap. So there's no penalty for using the unaligned instruction. But that doesn't mean the operation won't be faster if the pointer is actually aligned.

roomforthetuna
Mar 22, 2005

rjmccall posted:

If the processor supports both unaligned and alignment-enforced variants of an operation, then the latter is almost certainly just the former but with an extra check which triggers a trap. So there's no penalty for using the unaligned instruction. But that doesn't mean the operation won't be faster if the pointer is actually aligned.
I always appreciate your knowledge (and easy to follow explanations) of the very low level stuff. Well, very low level from my middleware-person perspective, I'm sure there are lower levels!

Absurd Alhazred
Mar 27, 2010

rjmccall posted:

If the processor supports both unaligned and alignment-enforced variants of an operation, then the latter is almost certainly just the former but with an extra check which triggers a trap. So there's no penalty for using the unaligned instruction. But that doesn't mean the operation won't be faster if the pointer is actually aligned.

That is making me very angry.

rjmccall
Sep 7, 2007

Well, if you’re going to trap on alignment violations, you have to check those bits, and if you check those bits and you have the circuitry to make the operation work when they’re nonzero then :shrug:

I guess you could have an operation which just ignores those bits completely, like ARM64 TBI but low bits, but hoo boy does that sound like nothing but a source of crazy bugs. It’s not like it would be faster.

rjmccall
Sep 7, 2007

Oh, you also generally get weaker memory-ordering / atomicity guarantees with unaligned accesses, of course.

Ralith
Jan 12, 2011

Jeffrey of YOSPOS posted:

Yeah I gotta imagine spanning a cache line still has performance issues even if other unaligned accesses are unnoticeable.

The point is that the same code is generated regardless of whether you promise at compile time that the pointer will always be aligned. The dynamic alignment still matters, of course.

big black turnout
Jan 13, 2009



Fallen Rib
I have a lovely library that returns raw pointers managed by the library through reference counting methods AddRef and RemRef. I want to wrap them in a smart pointer, preferably std::shared_ptr, and call AddRef/RemRef when copies are made/destructed. The latter seems easy enough with a custom deleter, but I'm struggling with making the right call to AddRef when it's copied.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
boost::intrusive_ptr or something like it is what you actually want. It's also like 10 lines of code to implement yourself.

big black turnout
Jan 13, 2009



Yes it is. Thanks!

Absurd Alhazred
Mar 27, 2010

I will not have people besmirching the good name of COM!

Slurps Mad Rips
Jan 25, 2009

Bwaltow!

Plorkyeran posted:

boost::intrusive_ptr or something like it is what you actually want. It's also like 10 lines of code to implement yourself.

I might suggest looking into my proposed (and currently on track for C++23) retain_ptr instead, since you don't have to worry about the "retain by default" semantic and having to pass "false" everywhere you want to adopt a pointer :v:

https://github.com/slurps-mad-rips/retain-ptr/

There's a bug with the builtin mixin type, but just specialize a retain_traits<YourType> and you're good to go

Lime
Jul 20, 2004

You could use shared_ptr in this situation though, right? Custom deleters aren't called until the last shared_ptr destructs, so it doesn't matter that the shared_ptrs can't call AddRef when copied, just so long as the ref count is 1 when it first goes into the shared_ptr. You'd basically be nesting shared_ptr's ref count under the lovely library's ref count, which would just stay 1 the whole time.

Slurps Mad Rips
Jan 25, 2009

Lime posted:

You could use shared_ptr in this situation though, right? Custom deleters aren't called until the last shared_ptr destructs, so it doesn't matter that the shared_ptrs can't call AddRef when copied, just so long as the ref count is 1 when it first goes into the shared_ptr. You'd basically be nesting shared_ptr's ref count under the lovely library's ref count, which would just stay 1 the whole time.

Bit of a waste, since shared_ptr allocates a control block. Now you've got two ref counts, *and* you're wasting a lot of extra space for just one pointer. 🤷‍♀️
