Raenir Salazar
Nov 5, 2010

College Slice

rjmccall posted:

If it's a vector of pointers (std::vector<MyBaseClass*>), yes. The original object is still there, and the vector just happens to hold a pointer to the part of it that represents the base class.

If it's a vector of objects (std::vector<MyBaseClass>), no. The vector holds a copy of the base-class portion of the object; the original object is still around somewhere, but it's completely different from the object in the vector, which is dynamically just an object of the base type.

Excellent! Thanks. Polymorphism sometimes seems like magic to me; this video points out why.
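For the curious, the slicing behavior rjmccall describes is easy to see in a few lines (class names are made up for illustration):

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// A vector of pointers preserves the dynamic type of the object.
std::string via_pointer() {
    Derived d;
    std::vector<Base*> v{&d};
    return v[0]->name();   // virtual dispatch sees the Derived
}

// A vector of values copies only the Base subobject: the copy is sliced.
std::string via_value() {
    Derived d;
    std::vector<Base> v;
    v.push_back(d);        // copies just the Base part of d
    return v[0].name();    // the stored object really is a Base
}
```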


Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today

JawKnee posted:

I'm getting a memory leak but I'm unsure what's causing it.

It's also worth noting that, illegal malloc shenanigans aside, you'll run into far fewer memory issues in general if you use C++11 smart pointers like std::unique_ptr appropriately, as this will result in intended behavior automatically being produced in the destructor/move constructor/etc without any further boilerplate. In modern C++ you should think very hard before using a raw pointer for anything that has any kind of ownership.

Vanadium
Jan 8, 2005

The_Franz posted:

You can use malloc and free with classes too. Just use placement new on the allocated pointer to run the constructor and call the destructor manually with obj->~MyObject() before freeing it.

Fine, I'll change my position to "you have to use the new operator instead of or in addition to malloc". :colbert:

hackbunny
Jul 22, 2007

I haven't been on SA for years but the person who gave me my previous av as a joke felt guilty for doing so and decided to get me a non-shitty av
Holy poo poo don't use placement new, ever. You need a drat good reason to use placement new

JawKnee
Mar 24, 2007

You'll take the ride to leave this town along that yellow line

Ralith posted:

It's also worth noting that, illegal malloc shenanigans aside, you'll run into far fewer memory issues in general if you use C++11 smart pointers like std::unique_ptr appropriately, as this will result in intended behavior automatically being produced in the destructor/move constructor/etc without any further boilerplate. In modern C++ you should think very hard before using a raw pointer for anything that has any kind of ownership.

I'm wrapping my head around these currently, or attempting to anyhow. Should I only be using smart pointers when something needs to be dynamically allocated?

nielsm
Jun 1, 2009



JawKnee posted:

I'm wrapping my head around these currently, or attempting to anyhow. Should I only be using smart pointers when something needs to be dynamically allocated?

It's a very good idea to do so.

What's even more important is that you establish a logical place of ownership of any allocation. That is, determine what part of the program is responsible for that object and controls the "master pointer".
For instance, in a 3rd person 3D game, is the player character's geometry owned by the player character object, or by the game world object? And what would that affect? Ownership becomes most important when it's time to tear down data structures again.

In C++11 STL, ownership is usually indicated by a std::unique_ptr object. A unique_ptr can get a value either by constructing an object directly into it, or by claiming ownership from another unique_ptr object, which becomes empty afterwards. And when a unique_ptr object goes out of scope, it destroys the wrapped object.

By clearly establishing logical ownership, and enforcing it using correct smart pointers, you can avoid memory leaks from forgotten deletes, use-after-delete bugs, and multiple deletion bugs.
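nielsm's ownership-transfer description, as a minimal sketch (Mesh is a made-up stand-in for something like that player geometry; note std::make_unique is C++14, in plain C++11 you'd write std::unique_ptr&lt;Mesh&gt;(new Mesh)):

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Mesh { int vertices = 0; };

// Whoever holds the unique_ptr owns the Mesh; transferring it moves the
// "master pointer" and leaves the old holder empty.
bool transfer_demo() {
    std::unique_ptr<Mesh> owner = std::make_unique<Mesh>(); // construct directly into it
    std::unique_ptr<Mesh> new_owner = std::move(owner);     // claim ownership; `owner` is now empty
    bool ok = (owner == nullptr) && (new_owner != nullptr);
    return ok;  // new_owner destroys the Mesh when it goes out of scope
}
```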

sarehu
Apr 20, 2007

(call/cc call/cc)

JawKnee posted:

I'm wrapping my head around these currently, or attempting to anyhow. Should I only be using smart pointers when something needs to be dynamically allocated?

Do things such that raw pointers are "non-owning" pointers.
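In code, that convention looks like this (a sketch, not from the thread; Texture is illustrative): the unique_ptr is the owner, and any raw pointer handed out is purely an observer that must never be deleted.

```cpp
#include <cassert>
#include <memory>

struct Texture { int id = 7; };

// The unique_ptr owns the Texture; the raw pointer just observes it.
int observe() {
    std::unique_ptr<Texture> owner = std::make_unique<Texture>();
    Texture* view = owner.get();   // non-owning: never delete this
    return view->id;               // safe while `owner` is alive
}
```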

Beef
Jul 26, 2004
Address sanitizer is awesome, but holy poo poo did I suffer trying to get it working in the spaghetti-build makefile of the project they shoved me on.

Did anyone here try both thread sanitizer and Intel parallel inspector? How do they compare?

Ralith
Jan 12, 2011


JawKnee posted:

I'm wrapping my head around these currently, or attempting to anyhow. Should I only be using smart pointers when something needs to be dynamically allocated?

If something does not otherwise need to be dynamically allocated, then of course don't add gratuitous smart pointers into the mix. They're there to manage ownership, and ownership is generally not an issue for values that are not dynamically allocated (i.e. are on the stack, are a regular member of another object, or are global).

Sauer
Sep 13, 2005

Socialize Everything!
Are there any times when using a naked pointer to pass around an object is preferable to just using a reference?

sarehu
Apr 20, 2007

(call/cc call/cc)
Yes, when it's not a const reference. Also when it's a field in any object.

Chuu
Sep 11, 2004

Grimey Drawer
Any advice for the best way to go about trying to figure out why GDB is constantly lying to me about the return values of functions called on standard collections?

As an example, I have a std::deque that I am constantly adding/removing items from. The logic works correctly, and I've added a ton of asserts to make sure the contents are consistent with what I expect them to be.

In GDB, I always get very weird results when interacting with it. For example, on the first iteration -- where I know the collection has zero elements -- I'll get .size() == 1, .empty() == true, and .size() stringstream'd and then .str()'d results in "0" like I'd expect. Every time an iteration occurs, i.e. every time I hit the breakpoint again -- size() is always 0 or 1 (which is what I expect) when checked via asserts, but gdb always reports a value that is incremented from the previous iteration -- i.e. 1,2,3,....

Anyone run into something like this before? I've wasted hours trying to track this down.

I'm using RHEL devtoolset-4 for reference. That's gcc 5.2.1 and gdb 7.20. I've tried many permutations of compiler flags, but will take any advice on particular values of interest. Code must be compiled targeting gnu-c++14 or std-c++14.

feedmegin
Jul 30, 2008

Chuu posted:

Any advice for the best way to go about trying to figure out why GDB is constantly lying to me about the return values of functions called on standard collections?

As an example, I have a std::deque that I am constantly adding/removing items from. The logic works correctly, and I've added a ton of asserts to make sure the contents are consistent with what I expect them to be.

In GDB, I always get very weird results when interacting with it. For example, on the first iteration -- where I know the collection has zero elements -- I'll get .size() == 1, .empty() == true, and .size() stringstream'd and then .str()'d results in "0" like I'd expect. Every time an iteration occurs, i.e. every time I hit the breakpoint again -- size() is always 0 or 1 (which is what I expect) when checked via asserts, but gdb always reports a value that is incremented from the previous iteration -- i.e. 1,2,3,....

Anyone run into something like this before? I've wasted hours trying to track this down.

I'm using RHEL devtoolset-4 for reference. That's gcc 5.2.1 and gdb 7.20. I've tried many permutations of compiler flags, but will take any advice on particular values of interest. Code must be compiled targeting gnu-c++14 or std-c++14.

Are you compiling with optimisations?

If you literally want to see what the actual return value was, do info registers eax (32-bit) or rax (64-bit), which the SysV ABI specifies as the return register on x86, immediately after the function call.

Chuu
Sep 11, 2004

Grimey Drawer

feedmegin posted:

Are you compiling with optimisations?

If you literally want to see what the actual return value was, do info registers eax (32-bit) or rax (64-bit), which the SysV ABI specifies as the return register on x86, immediately after the function call.

I've tried various combinations of explicitly setting -O0, -g, -Og, and -ggdb with no success. I've been manually looking at the makefiles that cmake generates to verify the settings are being used.

The funny thing is, the debugger is significantly better behaved when I compile using cmake's "optimized with debug info", which I believe is -g -O2.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

stepi might shed some light.

What flags does cmake use?

netcat
Apr 29, 2008
nvm

netcat fucked around with this message at 17:05 on Jan 17, 2016

Beef
Jul 26, 2004
I had terrible experiences trying to coerce CMake to insert specific flags, experiences like CMake completely ignoring some user-defined variables such as CFLAGS in favor of its own guesses.

Be paranoid, print out every build command.

MrMoo
Sep 14, 2000

It's not too crazy, it just takes more work to figure out by example than from the documentation. A makefile-generator system by definition ends up more verbose though, due to the different modes it supports. For example, each build type has a set of flags and there are separate flags for each build step; here is link-time optimization for MSVC:
code:
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /GL")
set(CMAKE_EXE_LINKER_FLAGS_RELEASE "${CMAKE_EXE_LINKER_FLAGS_RELEASE} /LTCG")
set(CMAKE_SHARED_LINKER_FLAGS_RELEASE "${CMAKE_SHARED_LINKER_FLAGS_RELEASE} /LTCG")
set(CMAKE_MODULE_LINKER_FLAGS_RELEASE "${CMAKE_MODULE_LINKER_FLAGS_RELEASE} /LTCG")

The Gay Bean
Apr 19, 2004
Assume I have several small bits of work to be done, that I want to parallelize. Each bit of work is small, and will be done at ~30 fps; the work must also be done sequentially (video encoding, if you're curious, but I'm curious about the general case). My current instinct on how to approach this would be one of two ways:

code:
for (...)
{
  std::thread thread1(&myWorker1,mywork1);
  std::thread thread2(&myWorker2,mywork2);
  ...

  thread1.join();
  thread2.join();
  ...
}
Or, using OpenMP (which unfortunately doesn't work in the version of Clang that Apple ships with). My questions are:

1. Is there anything braindead about either of the above approaches?
2. Is there a better way?

The Gay Bean fucked around with this message at 03:21 on Jan 20, 2016

Ralith
Jan 12, 2011


The Gay Bean posted:

Assume I have several small bits of work to be done, that I want to parallelize. Each bit of work is small, and will be done at ~30 fps; the work must also be done sequentially (video encoding, if you're curious, but I'm curious about the general case). My current instinct on how to approach this would be one of two ways:

code:
std::thread thread1(&myWorker1,mywork1);
std::thread thread2(&myWorker2,mywork2);
...

thread1.join();
thread2.join();
Or, using OpenMP (which unfortunately doesn't work in the version of Clang that Apple ships with). My questions are:

1. Is there anything braindead about either of the above approaches?
2. Is there a better way?

Assuming this is purely CPU-bound work, you probably want to use a thread pool with a work queue. The thread pool can spin up exactly the number of threads that your hardware can execute most efficiently without any excess overhead.
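A minimal sketch of that thread pool + work queue idea (hand-rolled here for illustration; real code would more likely pull in an existing library):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned n = std::thread::hardware_concurrency()) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();  // drain remaining jobs, then exit
    }
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (done_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // execute outside the lock
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> jobs_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};
```

The point versus spawn-per-frame is that the threads are created once and reused, so the per-frame cost is just a lock and a condition-variable wakeup.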

The Gay Bean
Apr 19, 2004
Yeah, I've used that approach in other cases, but the problem here (and something I omitted before) is that myWorker1, ... in this example all have an attached object/state for a video encoder. The encoder has to receive frames in order. I'm treating the encoder as a black box in this case - there are a lot of ways to parallelize video encoding but I'm not concerned with that for the sake of this example. (in actuality, each encoder state is spinning up threads internally as far as I can tell).

So another approach is to maintain N threads for N encoders and feed queues for all of these encoders. Given the two cases - N persistent threads for N encoders vs. creating a thread for each loop - is there a large difference in speed?

Ralith
Jan 12, 2011


The Gay Bean posted:

Yeah, I've used that approach in other cases, but the problem here (and something I omitted before) is that myWorker1, ... in this example all have an attached object/state for a video encoder. The encoder has to receive frames in order. I'm treating the encoder as a black box in this case - there are a lot of ways to parallelize video encoding but I'm not concerned with that for the sake of this example. (in actuality, each encoder state is spinning up threads internally as far as I can tell).

So another approach is to maintain N threads for N encoders and feed queues for all of these encoders. Given the two cases - N persistent threads for N encoders vs. creating a thread for each loop - is there a large difference in speed?
I'm confused. Are you trying to perform strictly sequential operations on a single object? Then don't use threads at all, especially if it's already taking advantage of hardware concurrency internally. Are you trying to write a tool that can encode a bunch of unrelated video in parallel? I'd just write a tool (or, better yet, use an existing one like ffmpeg) that can encode a single thing at a time, and launch however many of them is necessary to saturate your hardware.

The Gay Bean
Apr 19, 2004
I'm sorry for being vague, I was just trying to come up with the smallest example of the problem possible.

We have 4 sources of video frames and we want to encode these frames at 30 fps in real time. These are delivered by a C++ library. We have wrapped ffmpeg encoding in a C++ object, and are essentially doing what you're suggesting - running several instances of ffmpeg - to encode them. The below code spins at 25 FPS while occupying 35% of the (4-core) CPU:

code:
while (...)
{
  m_encoder1.encodeFrameInPlace((const char *)wrapperFocus.data1);
  m_encoder2.encodeFrameInPlace((const char *)wrapperFocus.data2);
  m_encoder3.encodeFrameInPlace((const char *)wrapperFocus.data3);
  m_encoder4.encodeFrameInPlace((const char *)wrapperFocus.data4);
}
The below code spins at 35 FPS while occupying 60% of the CPU:

code:
while (...)
{
  std::thread thread1(std::bind(&VideoFileEncoder::encodeFrameInPlace,&m_encoder1,(const char *)wrapperFocus.data1));
  std::thread thread2(std::bind(&VideoFileEncoder::encodeFrameInPlace,&m_encoder2,(const char *)wrapperFocus.data2));
  std::thread thread3(std::bind(&VideoFileEncoder::encodeFrameInPlace,&m_encoder3,(const char *)wrapperFocus.data3));
  std::thread thread4(std::bind(&VideoFileEncoder::encodeFrameInPlace,&m_encoder4,(const char *)wrapperFocus.data4));
  thread1.join();
  thread2.join();
  thread3.join();
  thread4.join();
}
My question is whether or not there is some theoretical reason why the above would be worse than having 4 persistent threads.

The Gay Bean fucked around with this message at 03:45 on Jan 20, 2016

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

There is overhead to thread creation and destruction, but you'd have to profile it to see whether it mattered for your workload. Doing it 4x per frame you might well notice, I suppose. There are lots of libraries that will do thread pooling for you, if you don't want to do it yourself.

The Gay Bean
Apr 19, 2004
Thanks guys. Well, we've hit the 30 FPS mark, anyway, and I know something that I can come back and optimize if there is a need later.

Chuu
Sep 11, 2004

Grimey Drawer
If anyone was curious what the debugging issue turned out to be, in a resource class someone set up a diamond in the class hierarchy, and GDB and GCC were accessing different objects with the same name.

Ralith
Jan 12, 2011


The Gay Bean posted:

My question is whether or not there is some theoretical reason why the above would be worse than having 4 persistent threads.
Oh, I see. I would definitely set up 4 persistent threads.

nielsm
Jun 1, 2009



The Gay Bean posted:

I'm sorry for being vague, I was just trying to come up with the smallest example of the problem possible.

We have 4 sources of video frames and we want to encode these frames at 30 fps in real time. These are delivered by a C++ library. We have wrapped ffmpeg encoding in a C++ object, and are essentially doing what you're suggesting - running several instances of ffmpeg - to encode them. The below code spins at 25 FPS while occupying 35% of the (4-core) CPU:


This sounds like the obvious use case for a barrier, used to sync four persistent threads between each frame.
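A hand-rolled sketch of such a barrier (C++20 later added std::barrier; this is the classic generation-counting version you can write in C++11, with a toy run_frames driver standing in for the encode loop):

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// A simple reusable barrier: all threads block until `count` of them arrive.
class Barrier {
public:
    explicit Barrier(int count) : threshold_(count), remaining_(count) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(m_);
        int gen = generation_;
        if (--remaining_ == 0) {
            ++generation_;              // release everyone, start next cycle
            remaining_ = threshold_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    const int threshold_;
    int remaining_;
    int generation_ = 0;
};

// Demo driver: `workers` persistent threads each process `frames` frames,
// staying in lockstep at the barrier between frames.
int run_frames(int workers, int frames) {
    Barrier b(workers);
    std::atomic<int> units{0};
    std::vector<std::thread> ts;
    for (int w = 0; w < workers; ++w)
        ts.emplace_back([&] {
            for (int f = 0; f < frames; ++f) {
                ++units;              // stand-in for "encode one frame"
                b.arrive_and_wait();  // wait for everyone before the next frame
            }
        });
    for (auto& t : ts) t.join();
    return units.load();
}
```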

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!
Does anyone know how to make Visual Studio (2013/2015) do custom build steps based on the filename extension rather than having to manually specify the custom build step for each file as you add it?
I've managed to get a .targets file imported as part of the project, but every tutorial I've found for doing this kind of thing seems to be for an older version where the suggested syntax is no longer supported.

Specifically I'm trying to make it build .proto files to the intermediate .pb.h and .pb.cc files.

Illusive Fuck Man
Jul 5, 2004
RIP John McCain feel better xoxo 💋 🙏
Taco Defender
I can't find what i'm looking for in <algorithm>. I have two std::set and want to know if any elements of set A are in set B, in linear time. I'd rather not use set_intersection because I don't actually care what the element is, or if there are more than one. Am I missing something or do I need to write this myself? I guess it's simple enough.

nielsm
Jun 1, 2009



roomforthetuna posted:

Does anyone know how to make Visual Studio (2013/2015) do custom build steps based on the filename extension rather than having to manually specify the custom build step for each file as you add it?
I've managed to make a .targets file be getting imported as part of the project, but every tutorial I've found for doing this kind of thing seems to be for an older version where the suggested syntax is no longer supported.

Specifically I'm trying to make it build .proto files to the intermediate .pb.h and .pb.cc files.

Making proper new build steps for MSBuild, especially the C++ compilation framework in it, can be quite a task.
One I've worked on and which might be useful as an example is the Yasm targets from Aegisub.
Note that that file doesn't stand on its own, it uses a custom task implemented in the "tasks.props" file next to it. It should be possible to simplify it by removing the logic that can cut up paths and paste together relative paths under the intermediate output directory.

I can write some additional annotations to the file if you want.

nielsm
Jun 1, 2009



Illusive Fuck Man posted:

I can't find what i'm looking for in <algorithm>. I have two std::set and want to know if any elements of set A are in set B, in linear time. I'd rather not use set_intersection because I don't actually care what the element is, or if there are more than one. Am I missing something or do I need to write this myself? I guess it's simple enough.

I don't think anything better than O(m*log(n)) (m and n being sizes of the two sets) worst case is possible without somehow digging into the tree structure of the implementation. Like, you could trivially detect one case of no intersection if minimum in one set is greater than maximum in the other. You can maybe also do some heuristics to improve the average case, based on the distribution of elements apparent from traversing the tree.
But in the end, there should always be cases where you need to do a lookup of every element in the first set, into the other set. And that's O(m*log(n)).

Illusive Fuck Man
Jul 5, 2004
No, it's pretty simple to write an algorithm linear in the sizes of the sets (use iterators), but it sounds like the kind of thing the standard library would have, and possibly be better than what I write.

edit: I'm assuming set iterators are at least constant amortized time to increment. That is true, isn't it?

edit2: I found a way to make one of them an unordered set so w/e problem solved

Illusive Fuck Man fucked around with this message at 17:33 on Jan 20, 2016

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
std::set_intersection with an out iterator that just sets a bool to true on increment would be O(N) and much faster than anything involving an unordered set.
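The linear-time walk in question is just the classic merge step over two sorted ranges, equivalent to std::set_intersection with a do-nothing output iterator; a generic sketch:

```cpp
#include <cassert>
#include <set>

// Linear-time check whether two sorted sets share any element.
// O(m + n) comparisons; bails out on the first common element.
template <typename Set>
bool intersects(const Set& a, const Set& b) {
    auto i = a.begin();
    auto j = b.begin();
    while (i != a.end() && j != b.end()) {
        if (*i < *j)      ++i;  // advance whichever side is behind
        else if (*j < *i) ++j;
        else return true;       // *i == *j: common element found
    }
    return false;
}
```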

Xarn
Jun 26, 2015

Illusive Fuck Man posted:

No, it's pretty simple to write an algorithm linear in the sizes of the sets (use iterators), but it sounds like the kind of thing the standard library would have, and possibly be better than what I write.

edit: I'm assuming set iterators are at least constant amortized time to increment. That is true, isn't it?

edit2: I found a way to make one of them an unordered set so w/e problem solved

Yeah, linear time is easy and it is up to you to implement it. I would also expect it to be faster than making one of the sets unordered and doing lookups in it.

Illusive Fuck Man
Jul 5, 2004

Plorkyeran posted:

Set_intersection with an out iterator that just sets a bool to true on increment would be O(N) and much faster than anything involving an unordered set.

That sounds like what I want, thanks. I'll look into that.

I assumed unordered set lookups were constant time and very fast. Is that wrong?

nielsm
Jun 1, 2009



Illusive Fuck Man posted:

That sounds like what I want, thanks. I'll look into that.

I assumed unordered set lookups were constant time and very fast. Is that wrong?

Unordered sets are usually implemented as a hash table. Best case lookup is constant time, but worst case is linear time.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
Asymptotic performance isn't everything. A hash table lookup's constant factors (e.g. the time spent hashing the value) dwarf those of incrementing an iterator.

Beef
Jul 26, 2004

The Gay Bean posted:

Assume I have several small bits of work to be done, that I want to parallelize. Each bit of work is small, and will be done at ~30 fps; the work must also be done sequentially (video encoding, if you're curious, but I'm curious about the general case). My current instinct on how to approach this would be one of two ways:
...

It sounds like you are trying to implement pipeline parallelism by hand. It is typically both more efficient and less programming effort to use a tasking library. Intel Threading Building Blocks even has a pipeline construct that you can use out of the box.

From my years of parallel programming experience, I can definitely tell you that there are very few reasons to do manual threading. OpenMP, Cilk, TBB, and the like are easier to use and typically a lot faster than anything you whip up by hand.


Beef
Jul 26, 2004

Plorkyeran posted:

Asymptotic performance isn't everything. A hash table lookup's constant factors (e.g. the time spent hashing the value...) dwarfs that of incrementing an iterator.

Hashing a value is basically free from a CPU core's point of view; the performance penalty comes from doing a random, unpredictable memory access. You basically have 300 cycles where your CPU core sits on its rear end, waiting for the DRAM access to complete.

It is pretty hard to beat simple linear search under 10k elements. And even if hashing beats you under that number, it's because you are doing a silly microbenchmark and all the buckets get loaded into your cache hierarchy.
