Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today

Adhemar posted:

This is routinely done in the HFT space, but it’s possible because the exchange protocol spec makes strong guarantees about how the wire data is laid out. It’s just a matter of writing/generating your structs to match.

C++ does *not* make sufficiently strong guarantees about how types are laid out.


Jeffrey of YOSPOS
Dec 22, 2005

GET LOSE, YOU CAN'T COMPARE WITH MY POWERS

Ralith posted:

C++ does *not* make sufficiently strong guarantees about how types are laid out.
Right, but it's still what you do if you don't want to copy the data. You do have to actually know what your compiler does. It's not like attribute((packed)) doesn't exist, spec or no spec.

Ralith
Jan 12, 2011

Jeffrey of YOSPOS posted:

Right, but it's still what you do if you don't want to copy the data. You do have to actually know what your compiler does. It's not like attribute((packed)) doesn't exist, spec or no spec.
It is absolutely not the only way to avoid copying data; it's just a particularly lazy one. See e.g. the flyweight pattern I discussed a few posts ago for an alternative. Unpredictable memory layout isn't the only danger here either; you're most likely violating strict aliasing.

Qwertycoatl
Dec 31, 2008

Ralith posted:

It is absolutely not the only way to avoid copying data; it's just a particularly lazy one. See e.g. the flyweight pattern I discussed a few posts ago for an alternative. Unpredictable memory layout isn't the only danger here either; you're most likely violating strict aliasing.

I don't know what HFT codebases look like, but in my field (embedded) -fno-strict-aliasing is routine

Ralith
Jan 12, 2011

Qwertycoatl posted:

I don't know what HFT codebases look like, but in my field (embedded) -fno-strict-aliasing is routine
There's a cost to that in reduced room for the optimizer to maneuver, and it's a shame to rely on nonstandard behavior when the problem can be solved within the bounds of the spec.

Adhemar
Jan 21, 2004

Kellner, da ist ein scheussliches Biest in meiner Suppe.
I overlaid structs for years using reinterpret_cast when I worked in HFT, never had issues. I remember having to use #pragma pack. Things are simpler when you have 100% control over your compiler, OS, and hardware. You’re not writing general purpose code that has to run on any permutation of those. Speed is the only thing that matters.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
speed is the only thing that matters, which is why i turn off optimizations instead of writing stuff that's just as fast but also works with the optimizations turned on

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Ralith posted:

It is absolutely not the only way to avoid copying data; it's just a particularly lazy one. See e.g. the flyweight pattern I discussed a few posts ago for an alternative. Unpredictable memory layout isn't the only danger here either; you're most likely violating strict aliasing.

Jabor posted:

speed is the only thing that matters, which is why i turn off optimizations instead of writing stuff that's just as fast but also works with the optimizations turned on

Flyweight seems great if you’re in C++, but pretty cumbersome if you are working in C without member functions, and where I’m not sure the compiler can/will boil away the extra indirection. Have you seen a version of it work for free in plain-if-modern C?

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

Ralith posted:

If you want safe binary deserialization without copying your entire object, you can use the flyweight pattern, i.e. an object that wraps a pointer to your buffer and has accessors to pull out individual fields. That's how most encoding scheme tooling like flatbuffers or capnp work. Downside is lots of boilerplate if you're not using tools to generate it for you, upside is it's defined behavior and you can lay things out on the wire differently than a C++ struct.
Do you have a reference for what you mean by flyweight? I found this which seems to be talking about something quite different involving caches and such. I think you've probably explained what you meant sufficiently inline in the sentence, I'm more just questioning whether the term "flyweight" actually refers to what you described.

Over here is a discussion that said most of what has been said in this thread, where even the commenters mostly seem to agree that using memcpy is the least disastrous option. :/

Adhemar
Jan 21, 2004

Jabor posted:

speed is the only thing that matters, which is why i turn off optimizations instead of writing stuff that's just as fast but also works with the optimizations turned on

You’re conflating posters, I never said anything about turning off optimizations.

Anyway, I was only talking about a very specific use case. In general I wouldn’t recommend doing this as the performance gain is rarely worth the complexity.

Ralith
Jan 12, 2011

Subjunctive posted:

Flyweight seems great if you’re in C++, but pretty cumbersome if you are working in C without member functions, and where I’m not sure the compiler can/will boil away the extra indirection. Have you seen a version of it work for free in plain-if-modern C?
It should still optimize just as well (you're generally running through very similar optimization passes) but you'd have a bunch of inline functions like "int64_t read_foo_bar(foo_buf foo)". Definitely more verbose, and you might want to be extra sure that a naive memcpy of the whole thing isn't sufficient, but I'd be surprised if it didn't work.


roomforthetuna posted:

Do you have a reference for what you mean by flyweight? I found this which seems to be talking about something quite different involving caches and such. I think you've probably explained what you meant sufficiently inline in the sentence, I'm more just questioning whether the term "flyweight" actually refers to what you described.
It might be an unconventional use of the term, but I think it's consistent with the definition given on Wikipedia. In particular, you're sharing the underlying buffer between arbitrarily many flyweight objects in the case of e.g. nested structures. If anyone can think of a clearer term, I'm all ears.


Adhemar posted:

You’re conflating posters, I never said anything about turning off optimizations.
Violating strict aliasing safely necessarily requires disabling the optimizations that rely on it.

roomforthetuna
Mar 22, 2005

Ralith posted:

It might be an unconventional use of the term, but I think it's consistent with the definition given on Wikipedia. In particular, you're sharing the underlying buffer between arbitrarily many flyweight objects in the case of e.g. nested structures. If anyone can think of a clearer term, I'm all ears.
Maybe we're thinking of different contexts. One of the things the Wikipedia article says about the flyweight design pattern is that it involves immutable objects and is used to avoid repetition, neither of which relates to what I thought the context here was. The two contexts I'm thinking about are stuff sent over the wire in a binary encoding, and embedded stuff where you have some piece of memory you can access directly that contains hardware-related values encoded in some kind of structure. The underlying data in either case is broadly not immutable (though I suppose you could consider a received data packet to be immutable for the lifespan of its accessor object).

The clearer term for me would have been if you had just not said "flyweight pattern" and just said the sentence you already said that described what you meant that you said anyway. :)

I couldn't find any term for the "accessors into buffer" pattern though. I did find one person saying "I'm not a comp-sci person but I think cap'n proto is flyweight", while searching for what the cap'n proto guy himself would call this pattern. :D

Edit: Thinking about it a bit more, I think this is kind of the converse of the flyweight pattern as described by wikipedia. That's about having many small objects referring to a set of static-ish immutable larger objects (eg. thousands of 7-bit ascii characters being used to refer to <128 typescript glyphs), whereas this accessor thing is more like a small number of static immutable accessor objects (or sets of functions) that can each be used to operate upon any number of buffers. Anyway, it doesn't seem to be something that has a name.

roomforthetuna fucked around with this message at 22:50 on Jul 14, 2019

Adhemar
Jan 21, 2004

Ralith posted:

Violating strict aliasing safely necessarily requires disabling the optimizations that rely on it.

I never violated it because I’m only aliasing between char* (the socket buffer) and integers of various sizes (the message fields). To my knowledge this is allowed. At least it was never an issue using the compilers I used at the time.
:shrug:

Ralith
Jan 12, 2011

It's nice to have a shorthand, because it's a widely useful pattern (and indeed applies to encoding messages as well as decoding them). We're trying to avoid repetition in the sense of not making copies of the data in the buffer, but instead exposing zero cost wrappers that kinda act like a copy. But it's definitely a bit of a stretch, I agree. Maybe a dedicated term needs to be coined.

roomforthetuna posted:


Edit: Thinking about it a bit more, I think this is kind of the converse of the flyweight pattern as described by wikipedia. That's about having many small objects referring to a set of static-ish immutable larger objects (eg. thousands of 7-bit ascii characters being used to refer to <128 typescript glyphs), whereas this accessor thing is more like a small number of static immutable accessor objects (or sets of functions) that can each be used to operate upon any number of buffers. Anyway, it doesn't seem to be something that has a name.

To clarify, I'm talking about objects containing a single pointer data member and with many accessor methods. You could have large numbers of these referencing the same buffer, since they're trivially copyable and might refer to different parts of the buffer.

Ralith fucked around with this message at 22:54 on Jul 14, 2019

Ralith
Jan 12, 2011

Adhemar posted:

I never violated it because I’m only aliasing between char* (the socket buffer) and integers of various sizes (the message fields). To my knowledge this is allowed. At least it was never an issue using the compilers I used at the time.
:shrug:

It's (in certain cases) okay to cast a valid T* to a char*, but not vice versa. UB is often difficult to detect; that's why it's so insidious.

Adhemar
Jan 21, 2004

Ralith posted:

It's (in certain cases) okay to cast a valid T* to a char*, but not vis versa. UB is often difficult to detect; that's why it's so insidious.

How is code using the Flyweight pattern implemented then? Seems you would have the same problem there.

Spatial
Nov 15, 2007

hmm, sounds like the C standard should be updated to solve this extremely serious flaw in the language design. after all it's a tool that is supposed to make machines easier to program, not harder, so not supporting memory overlays is a pretty big problem.

Spatial
Nov 15, 2007

my god. considering the ubiquity of embedded devices it would be shocking if the primary language used to program them was so grossly deficient

Ralith
Jan 12, 2011

Adhemar posted:

How is code using the Flyweight pattern implemented then? Seems you would have the same problem there.
To access structured data, you return another flyweight object. To access indivisible fields, you memcpy, and if your compiler was released this decade and alignment concerns are accounted for it gets trivially optimized out.

roomforthetuna
Mar 22, 2005

So if you're doing the not-quite-flyweight thing of having a buffer and a bunch of accessors and using memcpy in the accessors to avoid problems with strict aliasing, is there a recommended way to make sure the buffer is aligned to allow for maximum optimization? Will things just naturally fall on well aligned boundaries if you don't actively go out of your way to ruin it? I mean assuming your data structure itself is designed with padding so values are internally aligned.

For example, is there a difference in layout between
code:
char nonsense[2];
char buffer[1024];
and
code:
char nonsense[2];
union {
  int64_t aligner;
  char buffer[1024];
};
And will heap allocations like vectors always be aligned?

roomforthetuna fucked around with this message at 15:17 on Jul 16, 2019

Ralith
Jan 12, 2011

roomforthetuna posted:

So if you're doing the not-quite-flyweight thing of having a buffer and a bunch of accessors and using memcpy in the accessors to avoid problems with strict aliasing, is there a recommended way to make sure the buffer is aligned to allow for maximum optimization? Will things just naturally fall on well aligned boundaries if you don't actively go out of your way to ruin it? I mean assuming your data structure itself is designed with padding so values are internally aligned.

For example, is there a difference in layout between
code:
char nonsense[2];
char buffer[1024];
and
code:
char nonsense[2];
union {
  int64_t aligner;
  char buffer[1024];
};
And will heap allocations like vectors always be aligned?

Aligning the buffer itself is easy; C++ has had the alignas specifier since C++11. Without it, char arrays inside larger structures can easily end up unaligned, since stricter alignment would give no benefit if they were only ever used as char arrays.

There may be cases where you additionally want to specify the alignment of the pointer into the buffer to the compiler, e.g. perhaps by saying "using aligned alignas(8) = char;" then using "aligned*" where appropriate. This doesn't seem to have any effect on x86 where unaligned loads Just Work, though.

IIRC operator new always returns memory aligned to 8 bytes or better on most platforms, but to be sure you can use the C++17 version which has an alignment argument.

Xarn
Jun 26, 2015
std::vector<bool> was a mistake.

Subjunctive
Sep 12, 2006

Ralith posted:

This doesn't seem to have any effect on x86 where unaligned loads Just Work, though.

That’s not universally true, is it? I thought that some of the SIMD instructions would fault on unaligned addresses.

Absurd Alhazred
Mar 27, 2010

by Athanatos

Subjunctive posted:

That’s not universally true, is it? I thought that some of the SIMD instructions would fault on unaligned addresses.

Yeah, for example, the aligned XMM loads/stores (MOVAPS and friends) require 16-byte (four 32-bit floats) alignment.

What's nice is that libraries like GLM will choose the SSE-based implementation of matrix operations for properly aligned types.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
Misalignment will also generally have some performance impact, although recent microarchitectures are a lot better at hiding this.

Jeffrey of YOSPOS
Dec 22, 2005

rjmccall posted:

Misalignment will also generally have some performance impact, although recent microarchitectures are a lot better at hiding this.
Yeah I gotta imagine spanning a cache line still has performance issues even if other unaligned accesses are unnoticeable.

pseudorandom name
May 6, 2007

I think the current recommendation with SSE/AVX is to always use unaligned loads, but I don’t remember why I believe this.

rjmccall
Sep 7, 2007

If the processor supports both unaligned and alignment-enforced variants of an operation, then the latter is almost certainly just the former but with an extra check which triggers a trap. So there's no penalty for using the unaligned instruction. But that doesn't mean the operation won't be faster if the pointer is actually aligned.

roomforthetuna
Mar 22, 2005

rjmccall posted:

If the processor supports both unaligned and alignment-enforced variants of an operation, then the latter is almost certainly just the former but with an extra check which triggers a trap. So there's no penalty for using the unaligned instruction. But that doesn't mean the operation won't be faster if the pointer is actually aligned.
I always appreciate your knowledge (and easy to follow explanations) of the very low level stuff. Well, very low level from my middleware-person perspective, I'm sure there are lower levels!

Absurd Alhazred
Mar 27, 2010

rjmccall posted:

If the processor supports both unaligned and alignment-enforced variants of an operation, then the latter is almost certainly just the former but with an extra check which triggers a trap. So there's no penalty for using the unaligned instruction. But that doesn't mean the operation won't be faster if the pointer is actually aligned.

That is making me very angry.

rjmccall
Sep 7, 2007

Well, if you’re going to trap on alignment violations, you have to check those bits, and if you check those bits and you have the circuitry to make the operation work when they’re nonzero then :shrug:

I guess you could have an operation which just ignores those bits completely, like ARM64 TBI but low bits, but hoo boy does that sound like nothing but a source of crazy bugs. It’s not like it would be faster.

rjmccall
Sep 7, 2007

Oh, you also generally get weaker memory-ordering / atomicity guarantees with unaligned accesses, of course.

Ralith
Jan 12, 2011

Jeffrey of YOSPOS posted:

Yeah I gotta imagine spanning a cache line still has performance issues even if other unaligned accesses are unnoticeable.

The point is that the same code is generated regardless of whether you promise at compile time that the pointer will always be aligned. The dynamic alignment still matters, of course.

big black turnout
Jan 13, 2009



Fallen Rib
I have a lovely library that returns raw pointers managed by the library through reference counting methods AddRef and RemRef. I want to wrap them in a smart pointer, preferably std::shared_ptr, and call AddRef/RemRef when copies are made/destructed. The latter seems easy enough with a custom deleter, but I'm struggling with making the right call to AddRef when it's copied.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
boost::intrusive_ptr or something like it is what you actually want. It's also like 10 lines of code to implement yourself.

big black turnout
Jan 13, 2009



Yes it is. Thanks!

Absurd Alhazred
Mar 27, 2010

I will not have people besmirching the good name of COM!

Slurps Mad Rips
Jan 25, 2009

Bwaltow!

Plorkyeran posted:

boost::intrusive_ptr or something like it is what you actually want. It's also like 10 lines of code to implement yourself.

I might suggest looking into my proposed (and currently on track for C++23) retain_ptr instead, since you don't have to worry about the "retain by default" semantic and having to pass "false" everywhere you want to adopt a pointer :v:

https://github.com/slurps-mad-rips/retain-ptr/

There's a bug with the builtin mixin type, but just specialize a retain_traits<YourType> and you're good to go

Lime
Jul 20, 2004

You could use shared_ptr in this situation though, right? Custom deleters aren't called until the last shared_ptr destructs, so it doesn't matter that the shared_ptrs can't call AddRef when copied, just so long as the ref count is 1 when it first goes into the shared_ptr. You'd basically be nesting shared_ptr's ref count under the lovely library's ref count, which would just stay 1 the whole time.

Slurps Mad Rips
Jan 25, 2009

Lime posted:

You could use shared_ptr in this situation though, right? Custom deleters aren't called until the last shared_ptr destructs, so it doesn't matter that the shared_ptrs can't call AddRef when copied, just so long as the ref count is 1 when it first goes into the shared_ptr. You'd basically be nesting shared_ptr's ref count under the lovely library's ref count, which would just stay 1 the whole time.

Bit of a waste, since shared_ptr allocates a control block. Now you've got two ref counts, *and* you're wasting a lot of extra space for just one pointer. 🤷‍♀️
