Stevor
Feb 18, 2004

THIS IS VERY BIZARRE

shrughes posted:


Gerblyn posted:

I think...

I think you are right Gerblyn, and your explanation really makes sense to me. The instance of the class is created in the main function, and I could implement what shrughes said without a compiler error, but the problem is obviously in the functions I'm calling. That being said, I understand more about instances of classes now and what they do. I will keep these things in mind and try to rework my code.
Thanks guys.

e: after encapsulating all of the code for the sprite (drawing, mechanics, etc.) and getting a better grasp of how the constructor and class instancing work, I've gotten the application to do what I wanted, and I understand it even better now.

Stevor fucked around with this message at 16:07 on Feb 7, 2011

Theseus
Jan 15, 2008

All I know is if there is a God, he's laughin' his ass off.
I have what I hope is a stupid, easily-answered question.

I have a union such that every instance of it must be allocated on a 16-byte boundary. I'm using C++ and the g++ compiler. Unfortunately, I'm having some issues with it: the __attribute__ ((aligned(16))) directive doesn't seem to work. My instances of the union seem to have ended up on 8-byte boundaries instead! For performance reasons, they're being declared on the stack, which I assume is the source of the issue: I've read around a bit to try to find a workaround, and there seems to be a general consensus that alignment of variables on the stack is not guaranteed. I would make them static to get them off the stack, but I need to make the application multithreaded in the future, so that's not an option. Does anyone have any suggestions for forcing alignment on the stack? Instances of the union are 16 bytes in size themselves, but I'm not averse to increasing that to as much as 32 bytes if needed.

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!
One way you can force alignment of a thing that will work on the stack is to allocate your-alignment-size of extra space, then set a pointer to (start_of_space + alignment) &~ (alignment - 1)

I don't know if the reference point being a pointer will make your performance worse though.

(I found this method used in a chess engine, which aligned board-data to a boundary so that shifts and bitwise operations on a pointer could be used to get board coordinates, back in the day when shifts and bitwise operations were unambiguously faster than mathematical ones.)
code:
//For example
#define ALIGNMENT 16
 char buffer[sizeof(YourUnion)+ALIGNMENT];
 YourUnion *pUnion=(buffer+ALIGNMENT)&~(ALIGNMENT-1);
 //done, pUnion is now an aligned YourUnion object.
If you were making an array of them you wouldn't need to align them individually, you could just make the buffer like
char buffer[sizeof(YourUnion)*ARRAYSIZE+ALIGNMENT];

vvvv Edited to fix my mistake - not my math, just my typing. I had it right but then I replaced hardcoded 16 with ALIGNMENT and accidentally dropped my -1.

roomforthetuna fucked around with this message at 17:07 on Feb 11, 2011

shrughes
Oct 11, 2008

(call/cc call/cc)

Theseus posted:

Does anyone have any suggestions for forcing alignment on the stack?

What architecture are you targeting? What version of GCC are you using? I have a hard time _not_ getting 16-byte alignment.

When using a union u { char z; }, I get proper 16-byte alignment whether I put it on the union, or on the field 'z', or on a particular variable allocated on the stack. When using a union u { struct { int i, j, k, l; } z; }, for example, I can't seem to avoid 16-byte alignment. Even when I use a union u { char z[16]; }.
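
For example, something along these lines is what I'm testing with (just a sketch, obviously not your actual union):
code:
#include <cstdio>

union __attribute__ ((aligned(16))) u {
    struct { int i, j, k, l; } z;
};

int main() {
    u a;   // a local (stack) variable
    // __alignof__ is the GCC extension; newer compilers also have C++11 alignof
    std::printf("alignof(u) = %u, &a %% 16 = %u\n",
                (unsigned)__alignof__(u),
                (unsigned)((unsigned long)&a % 16));
    return 0;
}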

Edit: Also, roomforthetuna's math is off.

Theseus
Jan 15, 2008

All I know is if there is a God, he's laughin' his ass off.

shrughes posted:

What architecture are you targeting? What version of GCC are you using? I have a hard time _not_ getting 16-byte alignment.

When using a union u { char z; }, I get proper 16-byte alignment whether I put it on the union, or on the field 'z', or on a particular variable allocated on the stack. When using a union u { struct { int i, j, k, l; } z; }, for example, I can't seem to avoid 16-byte alignment. Even when I use a union u { char z[16]; }.

Edit: Also, roomforthetuna's math is off.

I am targeting the 32-bit x86 architecture. On many 64-bit machines, 16-byte alignment is the default, but this is not the case on 32-bit machines. My GCC version is 4.4.3, though I have a system locked to 3.4.6 that I also want to run it on.

Hughlander
May 11, 2005

Theseus posted:

I have what I hope is a stupid, easily-answered question.

I have a union such that every instance of it must be allocated on a 16-byte boundary. I'm using C++ and the g++ compiler. Unfortunately, I'm having some issues with it: the __attribute__ ((aligned(16))) directive doesn't seem to work. My instances of the union seem to have ended up on 8-byte boundaries instead! For performance reasons, they're being declared on the stack, which I assume is the source of the issue: I've read around a bit to try to find a workaround, and there seems to be a general consensus that alignment of variables on the stack is not guaranteed. I would make them static to get them off the stack, but I need to make the application multithreaded in the future, so that's not an option. Does anyone have any suggestions for forcing alignment on the stack? Instances of the union are 16 bytes in size themselves, but I'm not averse to increasing that to as much as 32 bytes if needed.

What I've done is:
code:
struct SomeStruct
{

};

union SomeUnion
{
    SomeStruct Data;
    char Padding[((sizeof(SomeStruct) + 15) / 16) * 16]; // round the size up to a multiple of 16
};

Hughlander fucked around with this message at 19:34 on Feb 11, 2011

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

roomforthetuna posted:

I don't know if the reference point being a pointer will make your performance worse though.
Now I'm curious - someone more low-level than me in modern architecture, would this hurt your performance? Other than the initial assignment I mean.

I suppose I can run a quick test myself!

And I have, and the results are, frankly, weird.
code:
  char buffer[2048];
  char *pbuffer=(char*)(((DWORD)buffer+15)&~15);
  clock_t tm1=clock();
  for (int i=0; i<1000000000; i++) {
    buffer[32]='a';
  }
  clock_t tm2=clock();
  for (int i=0; i<1000000000; i++) {
    pbuffer[32]='a';
  }
  clock_t tm3=clock();
  TRACE(_T("tm2-tm1 = %d\ntm3-tm2 = %d\n"),tm2-tm1,tm3-tm2);
(Forgive my non-64-bit-compatible Windowsisms, and also, apparently my earlier example of how you can manually force alignment wouldn't compile because you have to cast away from a pointer before you can use a bitwise and, then cast back, but you can do that yourself if you decide to use the method!)
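
Something like this is the cast-corrected version I mean (a sketch, using uintptr_t from <stdint.h>; substitute DWORD or size_t if your compiler doesn't have it):
code:
#include <stdint.h>

union YourUnion { char data[16]; };   // stand-in for the real 16-byte union

#define ALIGNMENT 16

int main() {
    char buffer[sizeof(YourUnion) + ALIGNMENT];
    // round-trip through an integer type to do the bitwise AND, then cast back
    YourUnion *pUnion = (YourUnion*)(((uintptr_t)buffer + ALIGNMENT) & ~(uintptr_t)(ALIGNMENT - 1));
    pUnion->data[0] = 0;   // pUnion is 16-byte aligned and stays within buffer
    return 0;
}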

So anyway, the results of this test were
tm2-tm1 = 3581
tm3-tm2 = 2920
- reference by the array buffer was actually slower than by the pointer.

I thought this might be because one was run while things were still starting up, so I switched them around, pbuffer first, which gave the results
tm2-tm1 = 3334
tm3-tm2 = 3519
- array buffer still slower than pointer.

So then I thought, maybe it's *because* the pointer is better aligned (it was; buffer=0x0012f6cc, pbuffer=0x0012f6d0), so I changed it to pbuffer[28] so they'd be working on the exact same byte. Results (pbuffer still going first)
tm2-tm1 = 3343
tm3-tm2 = 3528
- array buffer still slower than pointer.

So then I thought, how about if they're both actually the same value! So I added some padding bytes before buffer, to bring it to a round 16. Both buffer and pbuffer were now 0x0012f6d0, and referencing pbuffer[32] and buffer[32]. results:
tm2-tm1 = 3323
tm3-tm2 = 3544
- array buffer still slower than pointer. (Reminder - after the first one, the results are pbuffer first then buffer.)

So, er, what's up with this? Is it because the pointer is already in a register, but using the array means it gets loaded into a register every time before you can add 32 to it? Is this something that would optimise away under a speed optimization (I used a "no optimize" build)?

Anyway, in conclusion, using this method to force alignment of your data appears to, at the very least, not significantly hinder performance.

raminasi
Jan 25, 2005

a last drink with no ice
I have another move semantics question (I'm still feeling pretty :saddowns: about all this). Given my (hopefully fixed) class definition from before:
code:
class material {
	std::string _name;

	material(const material & src); // it turns out I don't want copying
	material & operator = (const material & src);

public:
	const std::string & name() const { return _name; }

	material(material && src) : _name(std::move(src._name)) { } // derp derp
	explicit material(const cppw::Instance * inst);
	
	material & operator = (material && src) { _name = std::move(src._name); return *this; }

	friend bool operator == (const material & lhs, const material & rhs);
};
Why, later, does
code:
void some_consumer::test(material && m) {
	material n(m);
}
fail to compile with error C2248: 'material::material' : cannot access private member declared in class 'material' (with Intellisense telling me that the copy constructor is inaccessible)? I thought that I wouldn't need to use std::move because m is already an rvalue reference, so I'm clearly missing something.

Paniolo
Oct 9, 2007

Heads will roll.
Adding std::move should fix it. I ran into a similar thing myself; it seems that you always need to add std::move. Someone who understands the mechanics a little better can probably explain why.

That Turkey Story
Mar 30, 2003

GrumpyDoctor posted:

I thought that I wouldn't need to use std::move because m is already an rvalue reference, so I'm clearly missing something.

The type of m there is an rvalue reference type, however, using a named rvalue reference in an expression always yields an lvalue. You only "see" rvalues when they are actual temporaries or when you are directly working with the return of a function whose return type is an rvalue reference type (such as with std::move).
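
So in the test function you need the explicit std::move (a sketch, using the material class from above):
code:
#include <utility>   // std::move

void some_consumer::test(material && m) {
	material n(std::move(m));   // m is named, hence an lvalue here; std::move yields an rvalue again
}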

Sneftel
Jan 28, 2009
What I think is going on there is that m is itself an lvalue (despite being an rvalue-reference), causing it to prefer to bind to the (private) lvalue-taking constructor. By shoving a std::move in there, you get an rvalue version of it, which binds to the rvalue-taking constructor.

But I could be wrong about all that.

EDIT: But if so, I'm in good company!

Optimus Prime Ribs
Jul 25, 2007

Is it possible to access types created with typedef inside of a templated class?
I tried doing it like this:
code:
template <class _Ty>
struct MyClass
{
	typedef int MyFooTest;
	MyFooTest	getFoo();
};

template <class _Ty>
MyClass<_Ty>::MyFooTest MyClass<_Ty>::getFoo()
{
	return 0;
}
But I just get the error: missing ';' before 'MyClass<_Ty>::getFoo'.
I'm not that great with templates so I imagine I'm doing something pretty wrong here.

OddObserver
Apr 3, 2009
You need to say
'typename MyClass<_Ty>::MyFooTest' when referring to such a type in templated contexts.
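
i.e. the out-of-class definition from the question becomes (same getFoo, just with typename added):
code:
template <class _Ty>
typename MyClass<_Ty>::MyFooTest MyClass<_Ty>::getFoo()
{
	return 0;
}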

Optimus Prime Ribs
Jul 25, 2007

Well that was simple.
Thanks buddy. :)

litghost
May 26, 2004
Builder

Optimus Prime Ribs posted:

Well that was simple.
Thanks buddy. :)

Just for a little more detail, the problem here is dependent names: the compiler can't tell that MyClass<_Ty>::MyFooTest is a type until it knows what _Ty is, so you have to say so explicitly with typename.

Jam2
Jan 15, 2008

With Energy For Mayhem
I picked up "The C Programming Language" and I'm just getting started with the language. I want to start using C to tackle programming puzzles and develop skills along the way.

Which environment is a better development environment for C, Windows or OS X?

Brecht
Nov 7, 2009

Jam2 posted:

Which environment is a better development environment for C, Windows or OS X?
OS X, unquestionably.

HFX
Nov 29, 2004

Brecht posted:

OS X, unquestionably.

Cygwin / MinGW and Eclipse make it a bit more of a tossup, but I would probably agree with Brecht for the most part.

Gerblyn
Apr 4, 2007

"TO BATTLE!"
Fun Shoe

roomforthetuna posted:

So, er, what's up with this? Is it because the pointer is already in a register, but using the array means it gets loaded into a register every time before you can add 32 to it? Is this something that would optimise away under a speed optimization (I used a "no optimize" build)?

Anyway, in conclusion, using this method to force alignment of your data appears to, at the very least, not significantly hinder performance.

My best guess would be that the processor can access memory aligned on a 16-byte boundary faster than memory on a 4-byte boundary, though I don't know enough about processor architecture to say for sure. If you run the code in a debugger, you should be able to examine the assembly that the compiler has produced for each loop. You might be able to spot a difference in the way the pointer arithmetic works between loops, which could explain the difference as well...

Jam2 posted:

Which environment is a better development environment for C, Windows or OS X?

I use MS Visual C++ and I find it a pretty solid system to work in. It's bloody expensive though, so you may prefer using Eclipse, which is free. I've never used it for C++ myself, but I know it's a very popular choice.

Optimus Prime Ribs
Jul 25, 2007

Gerblyn posted:

I use MS Visual C++ and I find it a pretty solid system to work in. It's bloody expensive though, so you may prefer using Eclipse, which is free.

Visual Studio is what I use for C++ development as well. I don't like VS2010 one bit, but VS2008 does everything I need it to and I've never had a reason to use anything else. But I got lucky and got it for free through school. v:shobon:v
It's certainly not a bad choice, but as for the "better" choice, yeah, I'd go with OS X.

shrughes
Oct 11, 2008

(call/cc call/cc)

roomforthetuna posted:

So, er, what's up with this? Is it because the pointer is already in a register, but using the array means it gets loaded into a register every time before you can add 32 to it? Is this something that would optimise away under a speed optimization (I used a "no optimize" build)?

Generally speaking there would be two differences: with pbuffer you'll be accessing memory relative to some register with the value of the pointer (e.g. accessing %rax+32 with a hard-coded offset, if the pointer is stored in %rax), but using buffer directly you might be accessing memory relative to the %rbp register. Since your buffer is 2048 bytes, you'll get an instruction writing to %rbp-2016 or something.

Since 2016 doesn't fit in a byte, this takes a longer instruction, and probably a more expensive instruction, than one that writes to %rax+32. It's certainly a different instruction.

Or maybe it's moving %rsp down to the bottom of the array, and then accessing %rsp+32. The instruction encoding is different for the %rsp register for some reason.

For example, writing 'a' to %rax-80, %rbp-80, and %rsp-80:
code:
c6 40 b0 61             movb   $0x61,-0x50(%rax)
c6 45 b0 61             movb   $0x61,-0x50(%rbp)
c6 44 24 b0 61          movb   $0x61,-0x50(%rsp)
Maybe the %rsp-using instruction is slower. I've never understood the purpose of using both the %rbp and %rsp registers for stack frames, and I don't know what VC++ would output.

Duke of Straylight
Oct 22, 2008

by Y Kant Ozma Post

Jam2 posted:

Which environment is a better development environment for C, Windows or OS X?

It's C. It probably works on your toaster. Just use whatever environment you're comfortable with and whatever IDE or editor works best for you.

Jam2
Jan 15, 2008

With Energy For Mayhem
What do I need to get started writing code and compiling C on windows? What about on OS X?

Mustach
Mar 2, 2003

In this long line, there's been some real strange genes. You've got 'em all, with some extras thrown in.

Gerblyn posted:

I use MS Visual C++ and I find it a pretty solid system to work in. It's bloody expensive though, so you may prefer using Eclipse, which is free. I've never used it for C++ myself, but I know it's a very popular choice.
Visual C++ Express Edition doesn't cost any money.

On OS X, you would use XCode, which is also free.

Also, installing either of those two gives you access to command-line compilers, which you may prefer over an IDE: cl on Windows and clang on OS X.
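
If you go the command-line route, the traditional smoke test is enough to check everything is hooked up (e.g. cl hello.c with Visual C++, or gcc/clang hello.c -o hello on OS X):
code:
/* hello.c - just checks the compiler and linker are working */
#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}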

Mustach fucked around with this message at 16:38 on Feb 13, 2011

nielsm
Jun 1, 2009



Jam2 posted:

What do I need to get started writing code and compiling C on windows?

Visual C++ Express

Jam2 posted:

What about on OS X?

Xcode Tools from your OS X install DVD.

Brecht
Nov 7, 2009

Mustach posted:

Visual C++ Express Edition doesn't cost any money.

On OS X, you would use XCode, which is also free.

Also, installing either of those two gives you access to command-line compilers, which you may prefer over an IDE: cl on Windows and clang on OS X.
And gcc, which is what you should be using if you're just learning the language.

edit: clang is good too though

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

roomforthetuna posted:

So, er, what's up with this? Is it because the pointer is already in a register, but using the array means it gets loaded into a register every time before you can add 32 to it? Is this something that would optimise away under a speed optimization (I used a "no optimize" build)?

Microbenchmarks without optimization are meaningless — the compiler doesn't necessarily even use the same instruction selection and register allocation algorithms for non-optimized builds. Even -O1 kills your loops in both these cases, or at least it does in clang.

That said, I agree with shrughes's analysis of your results; it's almost certainly some vagary of instruction selection.
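
(If you do want loops like that to survive optimization, the usual trick is to make the stores observable - a sketch, reusing the pbuffer from the earlier test:)
code:
  volatile char *vbuffer = pbuffer;   // volatile-qualified view of the same bytes
  for (int i = 0; i < 1000000000; i++) {
    vbuffer[32] = 'a';                // each store must actually happen, so the loop can't be deleted
  }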

Mustach
Mar 2, 2003

In this long line, there's been some real strange genes. You've got 'em all, with some extras thrown in.

Brecht posted:

And gcc, which is what you should be using if you're just learning the language.

edit: clang is good too though
I think clang is better for a beginner, because while it supports all of the gcc flags they're likely to see while googling things, it gives monstrously better error messages.

Scaevolus
Apr 16, 2007

shrughes posted:

Since 2016 doesn't fit in a byte, this takes a longer instruction, and probably a more expensive instruction, than one that writes to %rax+32. It's certainly a different instruction.

Wouldn't they probably be the same size when converted to uOps?

pseudorandom name
May 6, 2007

At the very least, it'll be more expensive in the sense that the instruction is longer, with all the implications that has for the I-cache, decoder, etc.

Scaevolus
Apr 16, 2007

Speculating about a test like this without actually reading the assembly is pointless.

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

Scaevolus posted:

Speculating about a test like this without actually reading the assembly is pointless.

So you learned your lesson from the minecraft project :v:

volatile bowels
Sep 7, 2009

All-Star
what does z=x++ + y mean?

I know z = ++x + y means x = x + 1 and then add y to the new x. I'm a little confused about the first one... I know I could just throw it into a compiler, but I need to figure it out by hand for a test at some point.


vvv Thanks!

volatile bowels fucked around with this message at 06:56 on Feb 15, 2011

DeciusMagnus
Mar 16, 2004

Seven times five
They were livin' creatures
Watch 'em come to life
Right before your eyes
z is equal to the current (before increment) value of x added to y. After the next sequence point, x will be incremented by one.

Scaevolus
Apr 16, 2007

Otto Skorzeny posted:

So you learned your lesson from the minecraft project :v:

I was running tests, and reading the assembly. :colbert:

shrughes
Oct 11, 2008

(call/cc call/cc)

DeciusMagnus posted:

z is equal to the current (before increment) value of x added to y. After the next sequence point, x will be incremented by one.

No, x will be incremented before the next sequence point. After the next sequence point, it will have been incremented by one. And if x is a non-primitive type, it will be incremented before the expression x++ returns the original value of x.
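
Concretely, with plain ints:
code:
int x = 3, y = 4;
int z = x++ + y;   // z == 7: the old value of x goes into the addition
                   // once the full expression is done, x == 4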

that awful man
Feb 18, 2007

YOSPOS, bitch

shrughes posted:

Or maybe it's moving %rsp down to the bottom of the array, and then accessing %rsp+32. The instruction encoding is different for the %rsp register for some reason.

For example, writing 'a' to %rax-80, %rbp-80, and %rsp-80:
code:
c6 40 b0 61             movb   $0x61,-0x50(%rax)
c6 45 b0 61             movb   $0x61,-0x50(%rbp)
c6 44 24 b0 61          movb   $0x61,-0x50(%rsp)

The encoding is longer when using ESP/RSP than any other register because its code is used as an escape in the Mod R/M byte to indicate that a SIB byte follows.

shrughes posted:

I've never understood the purpose of using both the %rbp and %rsp registers for stack frames

I don't have a definitive answer for this, but I point out:
  • The ENTER and LEAVE instructions assume the use of EBP as a frame pointer.
  • On the 8086, you could only address memory relative to a base register (BX or BP), an index register (SI or DI), or the sum of a base register and an index register. When 32-bit mode was introduced things improved, with the interpretation of the Mod R/M byte being simplified and the introduction of the SIB byte. But 32-bit routines could still call 16-bit routines so you have to be backward-compatible...
  • You could get away with using only ESP, and indeed some RISC machines only use a stack pointer, but every time you pushed something onto the stack the offsets for the arguments/locals would change. This wouldn't be a huge problem today, but compiler technology was not so advanced in the early 80s and once you've got libraries that use that sort of linkage...

There are probably more reasons, but it's late so :effort:

pseudorandom name
May 6, 2007

Using BP also makes stack unwinding really easy.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
There's a standard code-generation optimization called frame pointer elimination that does exactly what you're suggesting; it can actually be a fairly nice win on x86-32, given the paucity of registers. Since it, by definition, destroys the chain of stack frames, it usually does nasty things to utilities that rely on walking that, e.g. stack trace dumpers and other debugging tools. Most exceptions implementations use metadata schemes which are capable of walking through FPE frames, but not all of them; IIRC, FPE breaks Windows SEH (or would if SEH didn't just disable it).

The only time you *can't* do FPE is when a function dynamically varies its stack usage, e.g. because it uses variable-length arrays or alloca(); in that case you're forced to keep the frame pointer around so that you have a stable reference to the locals.
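
(For reference, GCC exposes this as -fomit-frame-pointer / -fno-omit-frame-pointer; a function like the sketch below is the kind that hangs on to its frame pointer even with FPE enabled, because of the alloca. Hypothetical example, using the Unix <alloca.h>:)
code:
#include <alloca.h>
#include <string.h>

// The alloca makes the stack size vary at runtime, so the compiler keeps
// %ebp/%rbp as a stable base for the locals even under -fomit-frame-pointer.
int sum_bytes(const char *src, int n)
{
    char *tmp = (char *)alloca(n);   // dynamic stack allocation
    memcpy(tmp, src, n);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += tmp[i];
    return total;
}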

rjmccall fucked around with this message at 10:01 on Feb 15, 2011

Ciaphas
Nov 20, 2005

> BEWARE, COWARD :ovr:


A dumb question about debuggers. Are they designed to work only with executables output by particular compilers, or at least a limited set of compilers?

I ask because I've been asked if I'd like to use Visual Studio at work instead of Sun Studio, the caveat being that we still have to use SunCC for compiling for now. So I'd like to use the Visual Studio debugger if possible.
