Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Xerophyte
Mar 17, 2008

This space intentionally left blank

Twerk from Home posted:

Hot loop problem

Caveat: I'm not an expert on vectorization, especially not avx512.

I think your current code can be reasonably vectorized by using _mm512_i32gather_epi32 and _mm512_cvtepi32_epi8 to gather chunks of left-pairs and right-pairs to vectorize the arithmetic over, assuming it is safe to read 3 bytes past the end of the phenotype arrays. You have the requisite address offset array already in pairs, albeit in AoS. Unsure if it will be worth using the downconverts to pack 64 int8s instead of just vectorizing over 16 int32s, both are options.

You will need some extra gathers to go from your existing array of pairs to local vectors of pair elements. If you can make pairs natively be std::pair<std::vector<int32_t>, std::vector<int32_t>> or some other SoA type then that step would go away.

How much vectorizing what you have will gain you compared to just packing the data depends heavily on what the pair list happens to look like and how much the cache hates it. From what you describe I would guess that packing the input arrays harder will be more impactful but, y'know, test.

I doubt the exceptional case handling is that expensive in and of itself, it should be handled by branch prediction. Check perf stat or local equivalent. Removing exceptional cases means you can turn the core phenotype arrays into bitsets which I think will be worth it at some size of pairs, as you mentioned.

I wouldn't worry about avx clock speed. Yes, the cpu may clock down (due to a bios setting, or just because you hit the power limit). The cache stalls will take fewer cycles but the same wall time, the arithmetic will take much fewer cycles and much less wall time. It's possibly a concern if you're mixing in a little vectorized code in a majority non-vectorized context, not so much in a loop full of intrinsics.

Xerophyte fucked around with this message at 16:37 on Nov 7, 2022

Adbot
ADBOT LOVES YOU

nielsm
Jun 1, 2009



Is there a good reason the exceptional cases can't be 2 and 3 instead of 2 and -1? Then you only need 2 bits for each value.
If you make the exceptional cases that, you can test for them by: (a|b)&2 != 0
That only needs one compare and conditional jump, and you can extend it to test a packed set of values for any of them being exceptional, with a single conditional. Then if one of the values in the packed set is exceptional you can go check the values one at a time.

cheetah7071
Oct 20, 2010

honk honk
College Slice
you could eliminate a few instructions per loop by not calculating cscn, since it will always be equal to pair.size()-cscs-cncn-(# of exceptional cases). The exceptional case function will have to track the number of times its been called but that should be fairly cheap if it's as rare as you say.

Zopotantor
Feb 24, 2013

...und ist er drin dann lassen wir ihn niemals wieder raus...

Twerk from Home posted:

Pairs is sorted in an attempt to improve locality of phenotype accesses, like [{0,3}, {0, 200}, {2,4}, {2, 19}] and so on.

How are you sorting? If it's lexicographically, maybe try z-order or tiling to try further improving locality.

Beef
Jul 26, 2004
Most compilers have an option to spit out a vectorization report, an annotated version of your code. I had better results on HPC and genomics codes by making simple loops guided by that auto vectorization feedback. The Intel compiler used to be better for that, but I haven't touched it in a while.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
You all had some incredible advice, and if you want to see the non-vectorized progression in quick-bench: https://quick-bench.com/q/X4b-hgBsM6HUUK8DBInJadWJ0lY. Quick-bench started running out of memory, so that doesn't even include the final, ~15% faster than best in there solution that I landed on. It's not news to anyone here, but vector<bool> is insanely slow and even for very large datasets that spill out of L2, vector<bool> was slower than vector<int8_t> with only 1s and 0s in it, consistently.

I couldn't get quick-bench to do AVX-512, but the AVX-512 implementation was able to do the counting and accumulating faster than scalar accesses alone! However on the real world complete application analysis, the down-clocking from AVX instructions was so significant that a specific scalar implementation ended up faster for the whole program, even though the hot loop itself was a solid 25% faster than the fastest scalar solution. This was a very surprising result to me!!

Xerophyte posted:

Caveat: I'm not an expert on vectorization, especially not avx512.

I think your current code can be reasonably vectorized by using _mm512_i32gather_epi32 and _mm512_cvtepi32_epi8 to gather chunks of left-pairs and right-pairs to vectorize the arithmetic over, assuming it is safe to read 3 bytes past the end of the phenotype arrays. You have the requisite address offset array already in pairs, albeit in AoS. Unsure if it will be worth using the downconverts to pack 64 int8s instead of just vectorizing over 16 int32s, both are options.

This was a great vectorized solution that was fastest on this loop by a considerable amount. Gather 16 int32s, pack into 16 int8s, AND and XOR the matching pairs, and then use POPCNT on them: https://stackoverflow.com/questions/17354971/fast-counting-the-number-of-set-bits-in-m128i-register. I was surprised that vectorized popcnt doesn't exist until Ice Lake, it's not there on Skylake-X so I couldn't use it.

cheetah7071 posted:

you could eliminate a few instructions per loop by not calculating cscn, since it will always be equal to pair.size()-cscs-cncn-(# of exceptional cases). The exceptional case function will have to track the number of times its been called but that should be fairly cheap if it's as rare as you say.

This was another big win that I had overlooked. With all of the exceptional cases handled before this loop, only 0/1 values meant that I only needed to accumulate 2 of the 3 sets. I went with cscs and cscn because they were & and ^, respectively.


Zopotantor posted:

How are you sorting? If it's lexicographically, maybe try z-order or tiling to try further improving locality.

Z order sorting helped!

Beef posted:

Most compilers have an option to spit out a vectorization report, an annotated version of your code. I had better results on HPC and genomics codes by making simple loops guided by that auto vectorization feedback. The Intel compiler used to be better for that, but I haven't touched it in a while.

I think that this is a great plan in general and usually happens with sequential data access, but these scatter/gather instructions are so finicky that I think it's unclear to the compiler if they'd help what you're doing. These instructions, and AVX-512 in general are getting much better in newer Intel chips, and I wouldn't be surprised if future compilers start using them more often as gather latencies get lower, and downclocking becomes less of a big deal. We have some Ice Lake machines coming in, and I'm going to test on one just to see.

This was the microbenchmark winner, by a mile (25%!!!)
C++ code:
    if  (phenotypes_.capacity() < phenotypes_.size() + 3) {
        phenotypes_.resize(phenotypes_.size() + 3);
    }

    for (; i + 15 < pairs.first.size(); i+= 16) {
        auto left_addresses = _mm512_loadu_si512(&pairs.first[i]);
        auto right_addresses = _mm512_loadu_si512(&pairs.second[i]);

        auto lefts = _mm512_i32gather_epi32(left_addresses, phenotypes_.data(), 1);
        auto rights = _mm512_i32gather_epi32(right_addresses, phenotypes_.data(), 1);
        
        auto left_packed = _mm512_cvtepi32_epi8(lefts);
        auto right_packed = _mm512_cvtepi32_epi8(rights);

        auto cscs_batch = _mm_and_si128(left_packed, right_packed);
        auto cscn_batch = _mm_xor_si128(left_packed, right_packed);

        cscs += popcnt128(cscs_batch);
        cscn += popcnt128(cscn_batch);
    }
But push come to shove, in the context of the whole application, higher clocks from scalar instructions ended up winning on Xeon 6130s. Winning by a hair, less than 5%, but also more portable:
C++ code:
        for (size_t i = 0; i + 1 < pairs.size(); i += 2) {

            const auto [p1_a, p1_b] = pairs[i];
            const auto x1 = phenotypes_[p1_a];
            const auto y1 = phenotypes_[p1_b];

            cscs += x1 & y1;
            cscn += x1 ^ y1;

            const auto [p2_a, p2_b] = pairs[i + 1];
            auto x2 = phenotypes_[p2_a];
            auto y2 = phenotypes_[p2_b];

            cscs2 += x2 & y2;
            cscn2 += x2 ^ y2;
        }
I was pretty surprised that unrolling the loop and adding a second pair of accumulators was more than a 15% win. CPUs are really wide! Also, side note because I couldn't resist trying this on my Macbook Air: Apple CPUs are the widest of all. With two accumulators like that, my Macbook is able to do all all of that counting and summing in identical time to data access alone, just pulling x, y and not doing anything with them. Also, clearly these all created some extra work, where I have separate loops to handle the leftover stuff that the unrolled loop doesn't hit.

Twerk from Home fucked around with this message at 18:25 on Nov 8, 2022

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum
Do you know why you were downclocked? Was it auto from an avx instruction you can avoid (try smaller width instructions)? I don’t remember how to figure out how intel categorizes things, but there are heavy and light instructions which the cpu treats differently.

Also maybe look into your host setup and try to prevent the os scheduler from getting in the way with managing core affinity yourself (one thread per core, move all os crap off your cores, isolcpus is a hammer, cpusets can be useful). Not sure how much this matters for your use case…

Also if you are pushing a lot of data around consider keeping things one a single cpu, crossing the cpu boundary gets caught up in the cross-cpu transfer times which can be really expensive, especially if you are overwhelming it

Beef
Jul 26, 2004

Twerk from Home posted:

I think that this is a great plan in general and usually happens with sequential data access, but these scatter/gather instructions are so finicky that I think it's unclear to the compiler if they'd help what you're doing.

All big three compilers are really good these days at adding gather/scatters. Especially if you have an explicit indirect access " a[b[i]]" in the loop body. Do make use of the optimization report of your particular compiler, it will tell you where it failed to vectorize loops and why. There are also vector advisor tools to help guide you.

GCC12 is doing it at -O2 already.
https://godbolt.org/z/jn1b556Mf

And here is icx's output, which is also inserting the gather.
https://godbolt.org/z/15EnYq9xW

For clang I had to add -ffast-math
https://godbolt.org/z/KfKarq1bE


What they are good at, compared to a human, is to know when to /not/ use expensive avx512 ops. I had several bouts of manually adding avx512 intrinsics because gcc/icc was sticking to AVX512, only to find out that the compiler's autovec version was faster in practice.

Beef fucked around with this message at 21:19 on Nov 8, 2022

mobby_6kl
Aug 9, 2009

by Fluffdaddy
I just created a new programming project for myself by being an idiot. I basically bricked the touchscreen on my laptop by trying to update it, because the response to the pen wasn't good and that was supposed to help. Unfortunately a) it turned out to be the wrong update, and it didn't check for compatibility, b) it actually flashed some settings into the controller so no amount of driver reinstalls will help.

I opened a support ticket but I'm not too optimistic and assume there's a good chance I'll have to solve it myself.

The good news is that it's probably fixable. The update was done with a basic console app in Windows, and the touch controller is I2C and I even found a datasheet for a slightly different model already. So if it could write the wrong configuration, it should also be possible to read it back from a known good device (I'd have to find a volunteer for that). Pretty simple, right?



https://www.crystalfontz.com/controllers/GOODIX/GT911ProgrammingGuide/478/

The problem is I've no idea where to even start on PC. I've actually done some I2C programming on an ESP32 but it had a pretty easy interface available and that's pretty much the entire code (other than some #defines) to initialize and read from a register:
C++ code:
static esp_err_t i2c_master_init(void)
{
    int i2c_master_port = I2C_MASTER_NUM;

    i2c_config_t conf = {
        .mode = I2C_MODE_MASTER,
        .sda_io_num = I2C_MASTER_SDA_IO,
        .scl_io_num = I2C_MASTER_SCL_IO,
        .sda_pullup_en = GPIO_PULLUP_ENABLE,
        .scl_pullup_en = GPIO_PULLUP_ENABLE,
        .master.clk_speed = I2C_MASTER_FREQ_HZ,
    };

    i2c_param_config(i2c_master_port, &conf);

    return i2c_driver_install(i2c_master_port, conf.mode, I2C_MASTER_RX_BUF_DISABLE, I2C_MASTER_TX_BUF_DISABLE, 0);
}

static esp_err_t mpu9250_register_read(uint8_t reg_addr, uint8_t *data, size_t len)
{
    return i2c_master_write_read_device(I2C_MASTER_NUM, MPU9250_SENSOR_ADDR, &reg_addr, 1, data, len, I2C_MASTER_TIMEOUT_MS / portTICK_PERIOD_MS);
}
Would anyone know what would be the equivalent approach to do this in Windows? Or Linux, in the worst case? While I could figure Linux out, I'd have to rely on strangers to run this for me and ideally step 1 wouldn't be "install gentoo".

E: I guess I need "HID-over-I2C": https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/hid-over-i2c-guide and maybe this user-mode application example: https://learn.microsoft.com/en-us/samples/microsoft/windows-driver-samples/hclient-sample-application/

E2: Going to park this for now, I found exactly one copy of the debug software for these controllers on some sketchy russian site that should be able to read/write the configs for me. That probably has a better chance of success :D

mobby_6kl fucked around with this message at 19:36 on Nov 13, 2022

Ihmemies
Oct 6, 2012

gently caress this C++ course has loving pointers. Of course. I'm supposed to figure out the largest int of an array, and I only have access to the start and end pointer of the array. And I really don't understand anything. I barely managed to do a previous task where I had at least the size of the array provided. Now I don't have even that anymore. Just 2 pointers. WTF, what kind of sadist invented this poo poo

cheetah7071
Oct 20, 2010

honk honk
College Slice
Raw pointers are the kind of thing that you will almost never work directly with in real C++ core but understanding them is crucial because the systems that replace raw pointers only make sense if you understand what they're doing under the hood

While I sympathize with the trouble you're having, being able to solve that kind of problem is important for understanding C++

Ihmemies
Oct 6, 2012

I just wanted to vent, and I did knot know about any other thread. I have roughly 2 hours left to figure this out (it won't happen).

I'll probably have to :filez: a book about C, maybe it explains this in a way I can understand

Ihmemies fucked around with this message at 19:57 on Nov 15, 2022

cheetah7071
Oct 20, 2010

honk honk
College Slice
well, I don't want to do your homework for you, but I recommend reviewing pointer arithmetic, which I assume has been covered by this point. Doing some quick searches, there's a bunch of tutorials on it on youtube which might work better for your brain than the course has?

Ihmemies
Oct 6, 2012

I understand. Maybe I can catch those segfaults when iterating through the array... so the program won't crash from segfault, instead it returns the largest value I've found so far.

Edit: can't feasibly catch segfaults, drat.

Ihmemies fucked around with this message at 20:09 on Nov 15, 2022

cheetah7071
Oct 20, 2010

honk honk
College Slice
catching a segfault wouldn't necessarily help, either. Segfaults only happen if you attempt to access unmapped memory. The position immediately after the array might be mapped by some other object, and you might blithely walk into something else without realizing.

My second hint is to consider that you were given the pointer to (I assume, since that's how this usually works) the first location past the end of the array; any solution must use that or you wouldn't have been given it

Ihmemies
Oct 6, 2012

It's supposed to print something like this:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 3 6 9
9
9
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 3 6 9

but I manage to get

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 3 6 9
9
0
-1887031104 32521 6 0 6 16 1240251648 592132058 -1889350432 32521 -1887035560 32521 -1887035768 32521 -1887034912 32521 -1887035200 32521 -1887915361 32521 -1887035560 32521 -1888119286 32521 -1887035192 32521 -1887035768 32521 -1887037120 32521

So it's not going too hot yet.

code:
int greatest_v1(int* itemptr, int size) {
    int i = 0;
    int largest = *(itemptr + i);
    while (i < size) {
        if (*(itemptr + i) > largest) {
            largest = *(itemptr + i);
        }
        i++;
    }

    return largest;
}

int greatest_v2(int* itemptr, int* endptr) {
    int i = 0;
    int largest = *(itemptr + i);

    while (*(itemptr + i) < *(endptr)) {
        if (*(itemptr + i) > largest) {
            largest = *(itemptr + i);
        }
        i++;
    }

    return largest;
}

void copy(int* itemptr, int* endptr, int* targetptr) {
    unsigned long int n = 0;
    int largest = *(itemptr + n);

    // calculate size of array
    while (*(itemptr + n) < *(endptr)) {
        if (*(itemptr + n) > largest) {
            largest = *(itemptr + n);
        }
        n++;
    }

    // copy array
    if (targetptr != nullptr)
    {
        for (size_t i = 0; i < n; i++) {
            *(targetptr + i) = *(itemptr + i);
        }
    }
}

more falafel please
Feb 26, 2005

forums poster

Even if you are never going to use C++ again, understanding pointers means understanding how data is laid out in memory, how computers allocate memory, etc.

A couple of hints/tips:

1. A pointer is a memory address, which is just an integer. If you think of memory as a very large array of bytes, a pointer is an index into that array. (memory is more complicated than that, but that's effectively the model that's presented to you, the programmer):

code:
char* MEMORY[VERY_LARGE_NUMBER];
int pointer = 42;

printf("%c", MEMORY[pointer]);
is equivalent to:

code:
char* pointer = 42;
printf("%c", *pointer);
This means you can do arithmetic on them, because they're integers. You can do ==, <, >, etc. comparisons on them, because they're integers.

2. Use some kind of debugging: this can be spamming your code with printfs, using a debugger that attaches to your program and lets you inspect values and step through code, or both. It's hard to reason about what code is doing without seeing what it does, especially if you're new to this. Whenever possible, don't guess what the program is doing, make the program tell you what it's doing. You're writing the program, it can do anything you want!

3. Pointers in C and C++ are typed, so they increment and decrement by the sizeof their pointed-to-type.

code:
char* c_ptr = 42;
int* i_ptr = 42;

++c_ptr;
++i_ptr;

printf("%d %d", (int)c_ptr, (int)i_ptr);
This will print (on most machines) "43 46", because a char is 1 byte, and an int is 4 bytes.

edit: vvv yeah, this.

more falafel please fucked around with this message at 20:46 on Nov 15, 2022

cheetah7071
Oct 20, 2010

honk honk
College Slice
looking at your example, I would actually review the core syntax of how one uses pointers and de-references them, rather than looking at pointer arithmetic, which you seem to understand well enough

you're gonna absolutely kick yourself when you find the bug

Vanadium
Jan 8, 2005

Also don't just put a * anywhere you're dealing with a pointer, think about whether you want to actually * the pointer at that moment.

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

Ihmemies posted:

It's supposed to print something like this:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 3 6 9
9
9
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 3 6 9

but I manage to get

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 3 6 9
9
0
-1887031104 32521 6 0 6 16 1240251648 592132058 -1889350432 32521 -1887035560 32521 -1887035768 32521 -1887034912 32521 -1887035200 32521 -1887915361 32521 -1887035560 32521 -1888119286 32521 -1887035192 32521 -1887035768 32521 -1887037120 32521

So it's not going too hot yet.

code:
int greatest_v1(int* itemptr, int size) {
    int i = 0;
    int largest = *(itemptr + i);
    while (i < size) {
        if (*(itemptr + i) > largest) {
            largest = *(itemptr + i);
        }
        i++;
    }

    return largest;
}

int greatest_v2(int* itemptr, int* endptr) {
    int i = 0;
    int largest = *(itemptr + i);

    while (*(itemptr + i) < *(endptr)) {
        if (*(itemptr + i) > largest) {
            largest = *(itemptr + i);
        }
        i++;
    }

    return largest;
}

void copy(int* itemptr, int* endptr, int* targetptr) {
    unsigned long int n = 0;
    int largest = *(itemptr + n);

    // calculate size of array
    while (*(itemptr + n) < *(endptr)) {
        if (*(itemptr + n) > largest) {
            largest = *(itemptr + n);
        }
        n++;
    }

    // copy array
    if (targetptr != nullptr)
    {
        for (size_t i = 0; i < n; i++) {
            *(targetptr + i) = *(itemptr + i);
        }
    }
}

A hint here is that pointers are comparable, they are values and also point to other values. Think about what a pointer is, what the value of a pointer represents. You can print pointers using stdout (do not deref it) and see how the two you have compare.

You aren’t far off, just a minor issue in your code which can be seen if you add some printing to understand what the values are.

Ihmemies
Oct 6, 2012

Well Actually I still have no idea but this seems to work better, while I was randomly trying to add more special characters to the mayhem:

code:
while (&(*(itemptr + i)) < endptr) {

cheetah7071
Oct 20, 2010

honk honk
College Slice
yeah that would work, but once the test is over I would absolutely review what & and * actually do

I'm happy to explain it in a few hours once it isn't doing your homework for you

Ihmemies
Oct 6, 2012

cheetah7071 posted:

yeah that would work, but once the test is over I would absolutely review what & and * actually do

I'm happy to explain it in a few hours once it isn't doing your homework for you

I'd rather write python at this point. Luckily there's only 2 weeks left in this most horrible course.

leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.

Ihmemies posted:

I'd rather write python at this point. Luckily there's only 2 weeks left in this most horrible course.

Python doesn't alleviate the need to understand what the computer is doing. You'll be better at programming in all languages if you understand how memory gets laid out and how your environment interacts with it.

Foxfire_
Nov 8, 2010

Ihmemies posted:

I'd rather write python at this point. Luckily there's only 2 weeks left in this most horrible course.
Python has pointers. It just makes (almost) everything a pointer and calls them reference types instead. You still have to deal with the core "thing itself vs name referring to a thing" distinction that confuses people

Ihmemies
Oct 6, 2012

Don’t worry. Next task is to implement a Book class with 16 mandatory methods, using pointers and recursion. The only choice is that I can use smart pointers instead of regular pointers if I want to. I have one week. :shepicide:

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
C++ ergonomics question: How can I return an instance of a derived class from a function that returns a type of a base class without using pointers, and without copying?

C++ code:
struct Base {
    virtual ~Base() = default;
    virtual void foo() = 0;
};

struct Derived : Base {
    ~Derived() = default;
    void foo() override;
};

Base buildABase() {
    return Derived();
}
This doesn't work, because Base is an abstract class, and I can't return an instance of an abstract class.

C++ code:
Base& buildABase() {
    return Derived();
}
Doesn't work because I'm returning a reference to a temporary, or worse I'm returning a reference to something on the stack if I make a non-temporary.

How I can hack around this with what I know:

C++ code:
std::unique_ptr<Base> buildABase() {
    return std::make_unique<Derived>();
}
Also, because I'm a novice at this, if I can get by with default destructors then I want a virtual default destructor on Base and Derived, right? I was expecting to have a pure virtual destructor on the Base type, because Base doesn't have any members of its own, but it looks like pure virtual destructors need a body, and my types can all get away with default destructors.

robostac
Sep 23, 2009
You can't. The calling function needs to know exactly what is being returned to allocate space for it, so if you are returning a rvalue it has to be of that exact class. Returning pointers is fine as they have a fixed size no matter what they point too.

Yes, your destructor needs an implementation because the default destructor on the child will call it's parents class destructor. The destructor has to be virtual otherwise deleting a pointer to a base class wouldn't destruct any derived parts.

robostac fucked around with this message at 23:32 on Nov 15, 2022

cheetah7071
Oct 20, 2010

honk honk
College Slice
I'm pretty sure you can't. In C++, polymorphism only works on pointers. As I understand it, the reasons for this are pretty deep in the weeds on how polymorphism actually works, on a compiler level.

OddObserver
Apr 3, 2009
Don't need to go that deep. With value semantics, the subclass and parent class having a different size pose a problem of how you would allocate space, for one.

leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.

OddObserver posted:

Don't need to go that deep. With value semantics, the subclass and parent class having a different size pose a problem of how you would allocate space, for one.

There's not necessarily a size difference between a parent and child class. But that's doesn't remove the issues with destructors etc

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

Twerk from Home posted:

Also, because I'm a novice at this, if I can get by with default destructors then I want a virtual default destructor on Base and Derived, right? I was expecting to have a pure virtual destructor on the Base type, because Base doesn't have any members of its own, but it looks like pure virtual destructors need a body, and my types can all get away with default destructors.
unique_ptr is the common and appropriate way to do returning a subclass as an instance of the base class.
You can't have a pure virtual destructor because destructors execute up the chain of destructors (otherwise any members of the base class wouldn't be destroyed). So =default is also the common and appropriate way to do that.

Xerophyte
Mar 17, 2008

This space intentionally left blank
Note that if your base class wasn't abstract and you wrote
C++ code:
Base buildABase() {
    return Derived();
}
then you would probably be unhappy with the results. The returned type would be an instance of Base constructed using Base's copy constructor, which has to throw away whatever Derived adds on. Value semantics mean your declared type is the actual most derived type, so code like this (or just Base x = Derived()) will not result in objects that behave like Derived:s even when legal.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
I have so many Java habits to unlearn. Building on the above, some more things surprised me:

C++ code:
std::unique_ptr<Base> buildABase() {
    return std::make_unique<Derived>(Derived());
}

struct Consumer {
    Consumer(std::unique_ptr<Base> &&inputBase) : myMember{std::move(inputBase)} {};
    std::unique_ptr<Base> myMember;
};
I have a Consumer that wants a working instance of Base, which is actually a Derived class. I feel like I'm having to repeat myself in the initializer list, but even in the copy constructor, it doesn't compile if I just say myMember{inputBase}, I have to specifically tell it to move.

This makes me realize that there's absolutely some extraneous copies elsewhere in these applications, and I've been assuming the compiler is better at figuring out when things can be moved than it really is.

Twerk from Home fucked around with this message at 05:11 on Nov 16, 2022

Private Speech
Mar 30, 2011

I HAVE EVEN MORE WORTHLESS BEANIE BABIES IN MY COLLECTION THAN I HAVE WORTHLESS POSTS IN THE BEANIE BABY THREAD YET I STILL HAVE THE TEMERITY TO CRITICIZE OTHERS' COLLECTIONS

IF YOU SEE ME TALKING ABOUT BEANIE BABIES, PLEASE TELL ME TO

EAT. SHIT.


Ihmemies posted:

Don’t worry. Next task is to implement a Book class with 16 mandatory methods, using pointers and recursion. The only choice is that I can use smart pointers instead of regular pointers if I want to. I have one week. :shepicide:

Once you understand how pointers work that's really not that much work at all, TBH. Even if you're new and having trouble you should be able to finish it in 2 afternoons or so.

more falafel please
Feb 26, 2005

forums poster

Twerk from Home posted:

I have so many Java habits to unlearn. Building on the above, some more things surprised me:

C++ code:
std::unique_ptr<Base> buildABase() {
    return std::make_unique<Derived>(Derived());
}

struct Consumer {
    Consumer(std::unique_ptr<Base> &&inputBase) : myMember{std::move(inputBase)} {};
    std::unique_ptr<Base> myMember;
};
I have a Consumer that wants a working instance of Base, which is actually a Derived class. I feel like I'm having to repeat myself in the initializer list, but even in the copy constructor, it doesn't compile if I just say myMember{inputBase}, I have to specifically tell it to move.

This makes me realize that there's absolutely some extraneous copies, and I've been assuming the compiler is better at figuring out when things can be moved than it really is.

The Java model (to be fair, it's been a long time since I've written Java) would be to return new Derived() from the factory method, with the consumer taking implicit ownership of it. This means not only relying on GC to clean up, but the possibility of circular references, or a dangling reference to an object sitting somewhere long-lived and causing a leak.

unique_ptr ensures that only one pointer refers to that object (assuming you're following best practices and not grabbing a raw pointer and shoving it somewhere), and that the object will be destroyed when its unique_ptr goes out of scope. That's why it has to be moved, because it can't be copied.

cheetah7071
Oct 20, 2010

honk honk
College Slice

Twerk from Home posted:

C++ code:
std::unique_ptr<Base> buildABase() {
    return std::make_unique<Derived>(Derived());
}

this is probably not doing what you want. It's calling the constructor for Derived which takes the output of Derived() as an argument: in other works, the copy constructor. If you want to construct in place with the default constructor, just do std::make_unique<Derived>()

The syntax for what you're trying to do in Consumer is a little weird because unique_ptr is specifically for cases where only one portion of the code ever needs access to the object. If the calling function constructs the unique_ptr and then passes it to Consumer's constructor, then two parts of the code have access to it simultaneously, which breaks the core guarantee of the class, so it doesn't compile. The two options are to use a shared_ptr instead (if you really want to allow both the calling function and the Consumer object to access it) or to have the constructor for Consumer instead take the arguments you need for it to construct its own unique_ptr

If you really need to do complicated manipulations on the object in the calling function, so passing an actual pointer to a Base object is really needed, and also you really want the funcitonality of unique_ptr, you do have an option, but it's kind of ugly. You can do something like:

C++ code:
struct Consumer {
	Consumer(Base* po_base) : myMember(po_base) {};
	std::unique_ptr<Base> myMember;
}
In this version, the constructor takes a raw pointer, builds a unique pointer around it, and then assumes responsibility for it. If you wanted to get really fancy you could do something like:

C++ code:
struct Consumer {
	Consumer(Base*& po_base) : myMember(po_base) {po_base = nullptr;}
	std::unique_ptr<Base> myMember;
}
where you take control of the pointer and delete the caller's copy of it to really emphasize that it's yours now

this is still kind of dangerous because if the pointer refers to a stack-allocated object this doesn't do at all what you want and in 99% of cases it'll just be easier to avoid wanting to do this than to try to make it work

cheetah7071
Oct 20, 2010

honk honk
College Slice
to maybe emphasize a bit more clearly why the code snippet you posted is something that's kind of awkward, you're envisioning a scenario where the Consumer object is responsible for deleting the Base object, but where the Base object already exists in the calling function. Like, you can make it work with careful coding, but it's awkward

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

cheetah7071 posted:

this is probably not doing what you want. It's calling the constructor for Derived which takes the output of Derived() as an argument: in other works, the copy constructor. If you want to construct in place with the default constructor, just do std::make_unique<Derived>()

The syntax for what you're trying to do in Consumer is a little weird because unique_ptr is specifically for cases where only one portion of the code ever needs access to the object. If the calling function constructs the unique_ptr and then passes it to Consumer's constructor, then two parts of the code have access to it simultaneously, which breaks the core guarantee of the class, so it doesn't compile. The two options are to use a shared_ptr instead (if you really want to allow both the calling function and the Consumer object to access it) or to have the constructor for Consumer instead take the arguments you need for it to construct its own unique_ptr

If you really need to do complicated manipulations on the object in the calling function, so passing an actual pointer to a Base object is really needed, and also you really want the funcitonality of unique_ptr, you do have an option, but it's kind of ugly. You can do something like:

C++ code:
struct Consumer {
	Consumer(Base* po_base) : myMember(po_base) {};
	std::unique_ptr<Base> myMember;
}
In this version, the constructor takes a raw pointer, builds a unique pointer around it, and then assumes responsibility for it. If you wanted to get really fancy you could do something like:

C++ code:
struct Consumer {
	Consumer(Base*& po_base) : myMember(po_base) {po_base = nullptr;}
	std::unique_ptr<Base> myMember;
}
where you take control of the pointer and delete the caller's copy of it to really emphasize that it's yours now

this is still kind of dangerous because if the pointer refers to a stack-allocated object this doesn't do at all what you want and in 99% of cases it'll just be easier to avoid wanting to do this than to try to make it work

Thank you! Don't these two do the same thing? I thought that by calling std::move on the unique_ptr passed in, this is basically what I was doing:

C++ code:
    Consumer(std::unique_ptr<Base> &&inputBase) : myMember{std::move(inputBase)} {};
    Consumer(Base*& po_base) : myMember(po_base) {po_base = nullptr;}
Also, I thought that compilers were smart enough to automatically use the move constructor if you pass a temporary into the constructor and a move constructor exists? I'll be more careful about make_unique for sure.

In my specific case, to get concrete, I have something that's processing streaming data, and I don't want to put the whole "is it compressed? We support multiple types of compression. Is it even coming from a file?" logic into the Consumer. Instead I want to pass in a ready-to-go std::istream, and I do want ownership of it to move to the consumer. This way we can have the logic to turn paths or sockets or whatever into working istreams in one place, and the different ways to actually consume that data in another.

cheetah7071 posted:

to maybe emphasize a bit more clearly why the code snippet you posted is something that's kind of awkward, you're envisioning a scenario where the Consumer object is responsible for deleting the Base object, but where the Base object already exists in the calling function. Like, you can make it work with careful coding, but it's awkward


Yes, I'm setting up istreams and passing them into things that will read through all of the istreams and then they can go away once the Consumer is done with it. The consumer internally is breaking the data into chunks and then passing a unique_ptr to each chunk onto a synchronized queue, that lots of worker threads are pulling from, so we're already doing something somewhat like this elsewhere in the application. Things get std::move-ed into the queue, and std::move-d out.

Twerk from Home fucked around with this message at 05:32 on Nov 16, 2022

Adbot
ADBOT LOVES YOU

cheetah7071
Oct 20, 2010

honk honk
College Slice
The main difference between those two snippets is that rvalues don't have a very long lifespan. If you need to manipulate the object over multiple lines before passing it to the constructor, it'll be very difficult or impossible to preserve its rvalue status.

It occurs to me that the swap function is probably the actual best thing in this case:

C++ code:
struct Consumer {
	Consumer(std::unique_ptr& input) {
		myMember.swap(input);
	}
	std::unique_ptr<Base> myMember;
};
which basically does the same thing as my second example, but with smart pointers instead of raw pointers. I'm at best an intermediate C++ coder myself so I didn't know that a function that did exactly what you wanted already existed off the top of my head.

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply