floWenoL
Oct 23, 2002

That Turkey Story posted:

Wait, what? If anything I can think of cases where unsigned operations can be optimized whereas signed operations cannot, not the other way around. Maybe you know something I don't, but either way, that doesn't change the fact that using an unsigned type may be correct whereas a signed type isn't (or vice versa). Pick your type based on what makes your code more correct.

Overflow behavior for signed integers is undefined, so the compiler can pretty much assume that signed integers don't overflow and optimize accordingly. Consider,

code:
bool foo(int a) {
  return (a + 3) > a;
}
The compiler can assume that a + 3 doesn't overflow and thus can compile the function to simply:

code:
_Z3fooi:
.LFB2:
        movl    $1, %eax
        ret
whereas replacing 'int' with 'unsigned int' would force rollover behavior, and thus the compiler cannot assume that a + 3 > a:

code:
.globl _Z3fooj
        .type   _Z3fooj, @function
_Z3fooj:
.LFB2:
        leal    3(%rdi), %eax
        cmpl    %eax, %edi
        setb    %al
        movzbl  %al, %eax
        ret
Of course, there may be optimizations that work for unsigned only (>=0 tests can be eliminated, etc.) but I'm pretty sure those are far less applicable in most code.

In any case, the recommendation (as I understand it) applies to what to pick as a default signedness. If you need unsigned, of course you should use it, but use cases where you really need unsigned (other than bitfields), and not just a bigger int, are rare.


Cheesus
Oct 17, 2002

Let us retract the foreskin of ignorance and apply the wirebrush of enlightenment.
Yam Slacker
While I've worked primarily with Perl and PHP in the past 9 years, before that I worked primarily in C. I have a fair knowledge of objects in the above languages and C++ (circa 1992), but for all intents and purposes I'm pretty much a C++ noob.

I think the biggest thing that's tripping me up is templates, but I'm slowly getting there...

Currently I'm working with the Boost libraries and Spirit in particular to write a parser that will be used from a Perl script to parse log file entries. I have this "working" but am convinced I'm doing many things "wrong" and would appreciate some advice on organization.

I've figured out how to chain Boost/Spirit grammars together, but it's making for a mighty unwieldy single file. So I'm breaking out the grammars, including their related functions, into other files.

What is the recommended method for doing this? I'm currently extracting the grammar definitions into .hpp files and including them into my main .cpp file. That seems nasty and wrong but I don't see a more reasonable way of doing it.

JoeNotCharles
Mar 3, 2005

Yet beyond each tree there are only more trees.

Cheesus posted:

Currently I'm working with the Boost libraries and Spirit in particular to write a parser that will be used from a Perl script to parse log file entries. I have this "working" but am convinced I'm doing many things "wrong" and would appreciate some advice on organization.

I've figured out how to chain Boost/Spirit grammars together, but it's making for a mighty unwieldy single file. So I'm breaking out the grammars, including their related functions, into other files.

Wow, how complex are these log files that you can't just use a line-by-line reader and boost::regex?

Vanadium
Jan 8, 2005

floWenoL posted:


01:52 <@floWenoL> something like 'if you're calling out to C++ _from perl_ to
                  parse log files you don't know perl as well as you think you
                  do'
I like C++ and all but perhaps you should stick to perl for your text processing needs.

ZorbaTHut
May 5, 2005

wake me when the world is saved

That Turkey Story posted:

First, before going into anything at all, I'd recommend using iterators here, which as a side-effect avoids the issue of sign entirely, and if for some reason you didn't do that, I'd still say use string::size_type instead of int and just don't write an algorithm which relies on negative values. The reason is, if you write your loop correctly and your loop uses the proper size type, your code works for all strings. If, on the other hand, you use int, your loop variable will risk overflow on larger strings no matter how you write it.

It would be hard for me to disagree with more points here :v:

First, the only thing I agree with: Yes, using iterators bypasses the issue of sign nicely. Sometimes, however, that's just not practical - I'm giving a small example of a situation where it breaks in an unexpected and subtle way, not a full-fledged example of a case where it's an inevitable bug (since I don't believe those exist).

However, I strongly disagree that using a signed int will cause a problem here. The vast majority of 32-bit systems aren't even capable of making a single 2gb allocation, due to reserved chunks of the address space, and even if they're capable of doing it, trying to get a string to do it is one of the most dubious things I've heard of (what happens when you append to it and the string decides to reallocate? What if you use .c_str() and the string decides to reallocate, which it is allowed to do?) It's just not something that ever comes up - I consider this to be well within the bounds of "this never happens". And even if it does, I'd say a better solution would be to use a longer signed type, like "long" (which will be 64-bit on virtually any sane system where .size() can be 1<<31) or "long long" (which will be 64-bit and which is nearly part of C++), not an unsigned type, for the same power-distribution reasons I mentioned beforehand.

Also, saying "if you write it correctly, it always works! :hurr:" is kind of meaningless, because obviously it'll work if you write it correctly. What I'm getting at here is that unsigned variables make it much easier to introduce subtle unexpected bugs, thereby either reducing the chance that you'll write it correctly or increasing the amount of effort and time it takes to write it correctly (take your pick) while signed variables avoid a small but significant class of errors.

quote:

Either way, your for loop example is hardly indicative of why you should never use unsigned types even if that code actually were able to handle all strings. If anything, at least just recommend defaulting to signed if you are unsure rather than disallowing unsigned types completely for arithmetic calculations, though even that I'd disagree with. Use the type that makes sense for the job. Period.

And this is why I believe in style guides, not style requirements. Yes, I agree with what you're saying - however I also think that unsigned is very rarely the right decision. I'd phrase it more as "don't use unsigned for arithmetic unless you're absolutely sure of what you're doing, and comment it if so".

more falafel please
Feb 26, 2005

forums poster

ZorbaTHut posted:

It would be hard for me to disagree with more points here :v:

First, the only thing I agree with: Yes, using iterators bypasses the issue of sign nicely. Sometimes, however, that's just not practical - I'm giving a small example of a situation where it breaks in an unexpected and subtle way, not a full-fledged example of a case where it's an inevitable bug (since I don't believe those exist).

And it's a bad example, because the simpler, more correct solution would be to use iterators, in which case you would have had to go way out of your way to introduce a bug like that.

ZorbaTHut posted:

It's just not something that ever comes up - I consider this to be well within the bounds of "this never happens".

It's those subtle bugs -- like the ones that can be easily introduced by using offsets instead of iterators -- that make "this never happens" turn into "But I never thought that would happen". Every simple programming error that can be made, will probably be made, often by you. Use a simpler and less error-prone solution where possible.

ZorbaTHut posted:

And even if it does, I'd say a better solution would be to use a longer signed type, like "long" (which will be 64-bit on virtually any sane system where .size() can be 1<<31)

I'm not aware of any 32-bit system in which long is 64-bit, is that what you're saying?

ZorbaTHut posted:

or "long long" (which will be 64-bit and which is nearly part of C++), not an unsigned type, for the same power-distribution reasons I mentioned beforehand.

Well, "long long" technically isn't a part of C++. Compilers/runtimes in general can deal with it, but 64-bit long long on 32-bit systems generally means software manipulation, which is significantly slower. But the real problem here is that you're saying "adding 1 more bit to make sure you don't overflow/underflow is silly, you should add 32 bits." It is silly to add one bit, but just extending that to 32 bits is almost as silly -- it's a variation of throwing hardware at the problem, when the problem only exists because you're not using the correct method in the first place.

ZorbaTHut posted:

Also, saying "if you write it correctly, it always works! :hurr:" is kind of meaningless, because obviously it'll work if you write it correctly. What I'm getting at here is that unsigned variables make it much easier to introduce subtle unexpected bugs, thereby either reducing the chance that you'll write it correctly or increasing the amount of effort and time it takes to write it correctly (take your pick) while signed variables avoid a small but significant class of errors.

What he's saying is that using the correct approach means that your chance of not writing it correctly is decreased. Use signed ints for ints that will always be in a particular range that can be negative, use unsigneds for ints that will always be in a particular nonnegative range, and don't use ints when there's a more appropriate and less error-prone abstraction available to you.

ZorbaTHut
May 5, 2005

wake me when the world is saved

more falafel please posted:

And it's a bad example, because the simpler, more correct solution would be to use iterators, in which case you would have had to go way out of your way to introduce a bug like that.

Okay I want to start this off by saying something:

I have never claimed that iterators shouldn't be used.

Okay? Can we end this part of the discussion? The point I'm making is that signed should almost always be preferred over unsigned. When I bring this up, the counter-argument is usually "but the STL containers are unsigned, so they can in theory store more than a signed value's worth of things", or "well, what if you know it can't be negative, like the number of items in a container". That's what I'm debating.

If your response to "you should use signed instead of unsigned" is "you're wrong, you should only use iterators!" then we're not in a debate, we're not even talking about the same thing.

As for my example, I've annoyingly frequently dealt with algorithms where I needed to compute things for adjacent pairs of elements. In this case, yes, you can use iterators (with a rather ugly itr[1] or *(itr + 1) for the next element, or an equally ugly technique including two iterators, and in both cases an extra conditional needed to start the loop) - but in several cases I've also needed to worry about the indexes of values, either because I need to store indices in a manner which is preserved across multiple copies/versions of the container, or because I need to append things to the container later, or because I need to store information in some way which is transferable to someone who doesn't own that particular instance. I suppose I could technically store iterators, and then turn *that* into indices at the transfer point, or traverse with iterators and then get indices out of that, but it honestly seems kind of dumb when I can just stick with indices in the first place.

I didn't bother to write up an entire algorithm because it just didn't seem applicable to "signed versus unsigned". There are cases where you want to use indices instead of iterators, and sometimes you also want a loop inside them, and sometimes you want to do math on indices (in fact, frequently you want to do math on indices, as frequently this is why you need indices in the first place.)

more falafel please posted:

It's those subtle bugs -- like the ones that can be easily introduced by using offsets instead of iterators, that make "this never happens" turn into "But I never thought that would happen". Every simple programming error that can be made, will probably be made, often by you. Use a simpler and less error-prone solution where possible.

more falafel please posted:

I'm not aware of any 32-bit system in which long is 64-bit, is that what you're saying?

Range overflows on in-memory containers due to signed variables are a bug that really can't exist on any common 32-bit system. 32-bit Windows is, as I remember, simply incapable of allocating a contiguous 2gb of memory. It just can't do it, period, /3gb or not. I seem to remember that 32-bit Linux is the same unless you go and jump through a lot of hoops. Both of these are due to small chunks taken out of the memory space at various locations. Try malloc(0x80000000) on your compiler and, unless you've seriously tweaked your system, it'll return 0.

Meanwhile, on 64-bit systems, "long" is easily sufficient for any size that you might need - once again, no processor being made today or even in the projected far future is even capable of addressing more than a tiny fraction of the address space. In both cases, the "signed" version is sufficient for the actual range you'll be using, avoiding the nasty problems with basic math that "unsigned" is occasionally prone to.

more falafel please posted:

Well, "long long" technically isn't a part of C++. Compilers/runtimes in general can deal with it, but 64-bit long long on 32-bit systems generally means software manipulation, which is significantly slower. But the real problem here is that you're saying "adding 1 more bit to make sure you don't overflow/underflow is silly, you should add 32 bits." It is silly to add one bit, but just extending that to 32 bits is almost as silly -- it's a variation of throwing hardware at the problem, when the problem only exists because you're not using the correct method in the first place.

It technically isn't, but it's the next best thing. It's in C++0x, it has been for years, it's absolutely going to be in the final standard, it's part of C99, and every serious compiler on the planet supports it. Unless you know you're going to be using a compiler from 2001, you can really rely on its existence now.

Adding one more bit to make sure you don't overflow/underflow isn't silly. Adding one more bit to make sure you don't overflow, while adding a similarly large (or, IMHO, even larger) chance of underflow, is silly. I'm not saying "signed solves all your problems" or even "signed doesn't create problems", I'm saying "signed solves far more problems than it creates".

more falafel please posted:

What he's saying is that using the correct approach means that your chance of not writing it correctly is decreased. Use signed ints for ints that will always be in a particular range that can be negative, use unsigneds for ints that will always be in a particular nonnegative range

And I am saying that is the incorrect approach. Unsigned is dangerous and error-prone if you're doing any sort of math to it, and very few people sit down and think "okay, I'm doing math . . . am I doing math to an unsigned variable? Am I absolutely sure this math is safe, when my instincts tell me it is? Oh no, it's not, it might become negative one!"

My point, in summary:

It's rare that a value is too large for signed someinttype but small enough for unsigned someinttype. It's relatively common that math involves comparisons or subtraction that can cause unsigned to accidentally underflow or behave unexpectedly. It's extremely rare that the range of "unsigned" is required and that just making the type longer isn't an acceptable solution. Mixing signed and unsigned can lead to weird results (assert((unsigned int)0xa0000000 > (int)0xb0000000) fails on my system, for example), and in lieu of a magic bullet that can make all these problems go away, using "signed" and only resorting to unsigned when it is absolutely necessary will save you from more grief than any other option.

Basically what I'm saying here is along similar lines to "you probably shouldn't be using reinterpret_cast as a common feature in your code."

Zombywuf
Mar 29, 2008

ZorbaTHut posted:

(assert((unsigned int)0xa0000000 > (int)0xb0000000) fails on my system, for example)

Any decent compiler ought to give a warning on that. Which is a good reason to use unsigned for data that should never be negative. When you get Warning: comparison between signed and unsigned types on line XXX you know it's time to put in something like this:

code:
#include <limits>     // std::numeric_limits
#include <stdexcept>  // std::range_error

int safe_unsigned_to_int(unsigned int in) {
  if (in > (unsigned int)std::numeric_limits<int>::max())
    throw std::range_error("overflow");
  else
    return in;
}

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

ZorbaTHut posted:

If your response to "you should use signed instead of unsigned" is "you're wrong, you should only use iterators!" then we're not in a debate, we're not even talking about the same thing.

So your argument basically boils down to "but what if I really want to use the wrong tool for this job?"

ZorbaTHut posted:

Range overflows on in-memory containers due to signed variables are a bug that really can't exist on any common 32-bit system. 32-bit Windows is, as I remember, simply incapable of allocating a contiguous 2gb of memory.

Lazy lists. :colbert:

ZorbaTHut posted:

Meanwhile, on 64-bit systems, "long" is easily sufficient for any size that you might need - once again, no processor being made today or even in the projected far future is even capable of addressing more than a tiny fraction of the address space.

Not with the LLP64 data model, which MSVC uses (longs are still 32-bit there).

crazypenguin
Mar 9, 2005
nothing witty here, move along
Zorba provides a slightly contrived situation but a very good example of the kind of subtle and unintuitive bugs unsignedness can produce, and all you can do is rave about how the situation is slightly contrived?

The point is you can do perfectly normal things with signed integers because it's pretty easy to avoid +/- 2 billion (all you have to consider is whether 32 bits is big enough). You pretty much always have to start thinking about modular arithmetic with unsigned integers because it's drat hard to avoid 0.

greatn
Nov 15, 2006

by Lowtax
I'm having to do a little int to string conversion and I'm getting a silly little error I can't figure out.

The background here is that this is some firmware where there is no room for including the standard library or extraneous header files for conversion, so I'm converting it myself. It isn't too bad because my only possible numbers are between 0 and 200. All I have to do is split the numbers up and convert them a character at a time, by adding '0'.

Let's say my value is 199, I'll always get "1991" though and I don't know why.

Here's my code:

code:
int encodercount = GetEncoderCount() ;
int hundreds = encodercount / 100 ;
int tens = (encodercount % 100) / 10 ;
int ones = (encodercount %100) % 10 ;

char hundred = (char)hundreds + '0' ;
char ten = (char)tens + '0' ;
char one = (char)ones + '0' ;

char number[3] ;
number[0] = hundred ;
number[1] = ten ;
number[2] = one ;

PutString(number) ;
Now this code pretty much works, except there is a phantom '1' put in at the end for some reason, and the value of '100' looks really, really weird (which I just realized is because I'm adding '0' to whatever the value of 10 is, which wouldn't work; I can fix that easily).

Any ideas?

greatn fucked around with this message at 21:57 on Jul 3, 2008

Standish
May 21, 2001

greatn posted:

Now this code pretty much works, except there is a phantom '1' put in at the end for some reason
Don't you need to null-terminate the "number" string?

csammis
Aug 26, 2003

Mental Institution

greatn posted:

code:
char number[3] ;
char number[0] = hundred ;
char number[1] = ten ;
char number[2] = one ;

I don't think you mean this. Also number should be declared with size 4, and the last entry set to '\0'

greatn
Nov 15, 2006

by Lowtax

csammis posted:

I don't think you mean this. Also number should be declared with size 4, and the last entry set to '\0'

Yeah, I mistyped that, no chars there. Ah, I forgot I needed to do that, thanks, with the '\0'. It's been so long since I dealt with having to actually null terminate I forgot you even had to.

floWenoL
Oct 23, 2002

Avenging Dentist posted:

So your argument basically boils down to "but what if I really want to use the wrong tool for this job?"

Yes, iterators are the right tool for the job 100% of the time! :downs:

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

floWenoL posted:

Yes, iterators are the right tool for the job 100% of the time! :downs:

Then show me an example where both iterators are inappropriate and the unsigned-ness of size_t is an issue. :colbert:

floWenoL
Oct 23, 2002

That Turkey Story posted:

First, before going into anything at all, I'd recommend using iterators here, which as a side-effect avoids the issue of sign entirely, and if for some reason you didn't do that, I'd still say use string::size_type instead of int and just don't write an algorithm which relies on negative values.

It's worth pointing out that using iterators here isn't correct either. For an empty string, begin() == end(), and so you'd be comparing one before the beginning of the array, which isn't necessarily valid (according to the standard). Not to mention the fact that if you used == instead of < (as is common with iterators) that would be an error, too.

floWenoL
Oct 23, 2002

Avenging Dentist posted:

Then show me an example where both iterators are inappropriate and the unsigned-ness of size_t is an issue. :colbert:

It's not that iterators are entirely inappropriate, it's just that sometimes using indices is clearer; recommending "always use iterators" is the C++ equivalent of the (equally fallacious) mantra of "always use pointers". If you have a vector and you know you won't have enough elements to run into size issues, using iterators is unnecessary, verbose, and in fact may introduce bugs due to the fact that you have to repeat the name of the container twice. This is exacerbated by the fact that said verbosity encourages copying-and-pasting iterator-using for loops so as to avoid having to type out vector<blah blah>::const_iterator or "typename T::const_iterator" (don't forget the typename!) yet again.

Entheogen
Aug 30, 2004

by Fragmaster
i just use iterators for FOREACH in C++ and use signed integers for everything else.

code:
#define FOREACH(_it,_l) for(__typeof((_l).begin()) _it=((_l).begin());(_it)!=(_l).end();(_it)++)
the reason to use signed integers for indecies of arrays is that you can set it to -1 to signify it not pointing to anything. Which sometimes you have to do. Otherwise what is the issue? both take up the same space.

That Turkey Story
Mar 30, 2003

crazypenguin posted:

Zorba provides a slightly contrived situation but a very good example of the kind of subtle and unintuitive bugs unsignedness can produce, and all you can do is rave about how the situation is slightly contrived?
You need to reread this conversation if that is what you pulled out. The problem is the signed version won't work for all strings whereas the unsigned version will. You're trading writing a proper algorithm for the ability to avoid an amateur mistake that could easily be picked up in testing. One is correct, the other is not.

crazypenguin posted:

The point is you can do perfectly normal things with signed integers because it's pretty easy to avoid +/- 2 billion (all you have to consider is whether 32 bits is big enough). You pretty much always have to start thinking about modular arithmetic with unsigned integers because it's drat hard to avoid 0.
What the heck are you people talking about? I'm no genius here, but avoiding going below 0 is not exactly rocket science. Maybe if someone were coming from a language without unsigned values I can see them being in the habit of not thinking about sign when jumping into C++, but you have to get over that. In C++ you are working with standard data-structures that deal with particular types and you are simply wrong if your algorithms that deal with them are using a mismatched type (unless you force certain constraints that are unnecessary for an algorithm written with the intended types). I don't understand how we can even be having this discussion.

floWenoL posted:

It's worth pointing out that using iterators here isn't correct either. For an empty string, begin() == end(), and so you'd be comparing one before the beginning of the array, which isn't necessarily valid (according to the standard). Not to mention the fact that if you used == instead of < (as is common with iterators) that would be an error, too.
Iterators are perfectly fine here. Of course you can't just swap out signed types for iterators and have the algorithm work the way you wrote it, just like you can't swap out signed values for unsigned and have it "work." You have to write the algorithm appropriately for the type. The difference is that when you write the code correctly with size_type or iterators, your code will work for all strings. If instead you write it with int as was done in that example, it simply won't work for all strings. It boggles my mind that I have to argue with computer scientists because they simply insist on using the incorrect type for their algorithm. It's not hard to simply write your algorithm to not rely on negative values. Again, here is your choice -- write your code correctly, or write it incorrectly just to avoid an amateur mistake that can easily be found in testing anyway.

At least Avenging Dentist and falafel are sane.

floWenoL posted:

It's not that iterators are entirely inappropriate, it's just that sometimes using indices is clearer; recommending "always use iterators" is the C++ equivalent to the (equally fallacious) mantra of "always use pointers".
It's not that you should always use iterators, just that it's a more generic and safer default than using random access, especially if you are doing so with the wrong type. Why? Firstly, it gets rid of this whole data-type issue which I honestly didn't even realize people had a problem with until this thread, and secondly, if you write your algorithm with iterators it's a much more generic framework for the algorithm that can easily be adapted to work with any range that has similar requirements. It's not any more difficult than working with indices at the high level, it's just different.

floWenoL posted:

If you have a vector and you know you won't have enough elements to run into size issues, using iterators is unnecessary, verbose, and in fact may introduce bugs due to the fact that you have to repeat the name of the container twice. This is exacerbated by the fact that said verbosity encourages copying-and-pasting iterator-using for loops so as to avoid having to type out vector<blah blah>::const_iterator or "typename T::const_iterator" (don't forget the typename!) yet again.
Those errors you are talking about would be caught by the compiler before you even run the program, and I know you hate hearing it, but the STL, TR1, and boost have plenty of tools available that take away redundancy. We'd all love to have auto and decltype I know, but for now it's the best one can do.

Entheogen posted:

the reason to use signed integers for indecies of arrays is that you can set it to -1 to signify it not pointing to anything. Which sometimes you have to do. Otherwise what is the issue? both take up the same space.
:sigh: Maybe everyone really should just use java. I don't know how to explain this anymore clearly.

That Turkey Story fucked around with this message at 01:55 on Jul 4, 2008

Entheogen
Aug 30, 2004

by Fragmaster
hey, i never had issues with C++ iterators and STL containers. I use Java for bigger projects because its easier to work with. C++ is good for small programs when you just want to calculate some stuff though.

oh ok, is all the fuss about because many functions take size_t instead of int? i don't see how this could become an issue unless you are dealing with large indecies.

Entheogen fucked around with this message at 01:54 on Jul 4, 2008

floWenoL
Oct 23, 2002

Entheogen posted:

I use Java for bigger projects because its easier to work with. C++ is good for small programs when you just want to calculate some stuff though.

Yeah, C++ is a toy language that will never be used for large-scale applications or for applications that require performance.

quote:

oh ok, is all the fuss about because many functions take size_t instead of int? i don't see how this could become an issue unless you are dealing with large indecies.

I like how you jumped in the discussion without knowing anything of what's being discussed.

It's spelled "indices", btw.

That Turkey Story
Mar 30, 2003

Quit being a jerk floWenoL. Let's all just relax and have a nice discussion about how much C++ owns and how everyone is a terrible programmer except for me, you and AD.

That Turkey Story fucked around with this message at 02:27 on Jul 4, 2008

crazypenguin
Mar 9, 2005
nothing witty here, move along

That Turkey Story posted:

You need to reread this conversation if that is what you pulled out. The problem is the signed version won't work for all strings whereas the unsigned version will. You're trading writing a proper algorithm for the ability to avoid an amateur mistake that could easily be picked up in testing. One is correct, the other is not.
There's a difference between library code and application code.

In library code, you need to be able to handle everything. There are a lot of great examples of how hard this is to do; the most famous is Bentley's paper on writing a binary search algorithm. Amusingly, it had an error that nobody noticed for two decades. Unsignedness is important for correctness there because you need to work correctly with almost unknown inputs.

(As an aside, I wrote a compiler recently that comes with a really fun complication example. The language had simple constructs like "for i := a..b" and it turns out you have to contort the translation of for loops in unexpected ways to accommodate b = MAX_INT. Exercise left for the reader!)

In application code, you know your inputs a lot better. You know you're never going to be working with data structures 2GB in size on a 32 bit machine. You're not that loving insane. You also need to do applicationy things, like iterate over all elements of a list except the last one because that's a special case. Well, okay let's go to size()-1! Perfectly reasonable thing to do.

You should simply NOT write application code like you would library code or you will never finish the project. Not as long as we're still using languages like our current ones, anyway. If the above blog post isn't a good enough example of how every piece of library code needs to be written in weird ways to handle odd corner cases, I'm not sure what is.

It's really quite simple, for signed integers, all you have to do is ask "will the magnitude of this value ever exceed 2 billion" and you know whether the code is correct. For unsigned values, you have to ask "will the magnitude of this value ever exceed 4 billion, oh and for every single calculation I make with it, could it possibly underflow?"

KISS principle. Don't over engineer things. That's all I'm saying, anyway.

That Turkey Story
Mar 30, 2003

I find it hard to believe that using the proper datatype is over-engineering.

crazypenguin
Mar 9, 2005
nothing witty here, move along

That Turkey Story posted:

I find it hard to believe that using the proper datatype is over-engineering.

The preceding paragraph explains the conceptual complexity, the preceding examples show some of the practical complexity, and the fact that this discussion started with Google's recommended coding guidelines suggests that they've probably run into exactly these same problems internally.

sarehu
Apr 20, 2007

(call/cc call/cc)

schnarf posted:

code:
#define EVENBITS(a) (a & 1) | ((a & (1<<2)) >> 1) | ((a & (1<<4)) >> 2) | ((a & (1<<6)) >> 3)

Here's an algorithm that scales better with respect to integer size. I don't think it's a good idea to use a macro.

code:
uint32_t evenbits(uint64_t x) {
  x = (x & 0x1111111111111111LL) | ((x & 0x4444444444444444LL) >> 1);
  x = (x & 0x0303030303030303LL) | ((x & 0x3030303030303030LL) >> 2);
  x = (x & 0x000f000f000f000fLL) | ((x & 0x0f000f000f000f00LL) >> 4);
  x = (x & 0x000000ff000000ffLL) | ((x & 0x00ff000000ff0000LL) >> 8);
  x = (x & 0x000000000000ffffLL) | ((x & 0x0000ffff00000000LL) >> 16);
  return (uint32_t) x;
}

vanjalolz
Oct 31, 2006

Ha Ha Ha HaHa Ha
Wow what the hell, how does that even work.

Hubis
May 18, 2003

Boy, I wish we had one of those doomsday machines...
what the crap happened in this thread?

Using "integral indices" instead of iterators only ever even *remotely* "makes sense" because people learned how to deal with containers from examples based on C or Address+Offset arrays. Iterators are a hell of a lot more intuitive (especially when you consider that a given container might have an internal implementation utterly divorced from Address+Offset, such as a linked list), and the only reason they don't seem that way is because programmers have the C-style array approach ingrained in their brains.

Also, using "-1" as an "invalid flag" is fairly bad form, since it's semantically meaningless and is effectively a magic number. Yes, it allows you to write code like
code:
if (idx < 0)
    return NULL;
else
    return pointer_list[idx];
But guess what? If you didn't explicitly know that you were dealing with a base+offset memory-mapped array and that -1 (or, for that matter, -anything) was a sentinel value, then that code is utterly opaque and not at all self-documenting.

POKEMAN SAM
Jul 8, 2004

sarehu posted:

Here's an algorithm that scales better with respect to integer size. I don't think it's a good idea to use a macro.

Why not? It can be verified that it works properly just like the function you posted can.

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

Ugg boots posted:

Why not? It can be verified that it works properly just like the function you posted can.

Macros are evil, and unless you absolutely have to, it's preferable to use an inline function to enforce type safety and all that good stuff.

crazypenguin
Mar 9, 2005
nothing witty here, move along

Nuke Mexico posted:

what the crap happened in this thread? Using "integral indices" instead of iterators...

crazypenguin posted:

Zorba provides a slightly contrived situation but a very good example of the kind of subtle and unintuitive bugs unsignedness can produce, and all you can do is rave about how the situation is slightly contrived?
That's what happened. :colbert: I wonder where this discussion might have gone if the example hadn't been about indexing into a data structure at all. Still not a real-world example, but at least we can stop arguing about iterators! You can't have negative beers on the wall, right?!
code:
for(unsigned i = 99; i >= 0; --i)
   cout << i << " bottles of beer on the wall.." << endl;

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

crazypenguin posted:

code:
for(unsigned i = 99; i >= 0; --i)
   cout << i << " bottles of beer on the wall.." << endl;

That Turkey Story posted:

Again, here is your choice -- write your code correctly, or write it incorrectly just to avoid an amateur mistake that can easily be found in testing anyway.

The argument is that you're substituting something that fails on a known edge case (or in this example, fails 100% of the time) and is easy to diagnose if you actually, you know, run the code, in order to save yourself from the minimal effort to write code that doesn't fail for 50% of the valid use cases.

Entheogen
Aug 30, 2004

by Fragmaster

sarehu posted:

Here's an algorithm that scales better with respect to integer size. I don't think it's a good idea to use a macro.

code:
uint32_t evenbits(uint64_t x) {
  x = (x & 0x1111111111111111LL) | ((x & 0x4444444444444444LL) >> 1);
  x = (x & 0x0303030303030303LL) | ((x & 0x3030303030303030LL) >> 2);
  x = (x & 0x000f000f000f000fLL) | ((x & 0x0f000f000f000f00LL) >> 4);
  x = (x & 0x000000ff000000ffLL) | ((x & 0x00ff000000ff0000LL) >> 8);
  x = (x & 0x000000000000ffffLL) | ((x & 0x0000ffff00000000LL) >> 16);
  return (uint32_t) x;
}

What does L stand for in there?

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

Entheogen posted:

What does L stand for in there?

LL is the suffix for long long.

Entheogen
Aug 30, 2004

by Fragmaster
what does even bits mean? is it an AND operation with something that has 1's only in even places so the product only has 1's in even positions?

vvvv thanks that makes perfect sense now vvvvv

What purposes would you want to use the evenbits function for?

Entheogen fucked around with this message at 20:01 on Jul 4, 2008

sarehu
Apr 20, 2007

(call/cc call/cc)
It might be more kosher to use ULL for unsigned long longs, but they were less than 0x8000000000000000 anyway.

The 'even' bits are those in even-numbered positions, starting at zero. Take the number 74 for example, and write it in binary: 01001010. The even-numbered bits are the ones that represent "2 to the Nth power", for even numbers N.

code:
01001010
 ^ ^ ^ ^  the even bits
The expression evenbits(74) should return the number 8, which has binary representation 1000.

crazypenguin
Mar 9, 2005
nothing witty here, move along

Avenging Dentist posted:

The argument is that you're substituting something that fails on a known edge case (or in this example, fails 100% of the time) and is easy to diagnose if you actually, you know, run the code, in order to save yourself from the minimal effort to write code that doesn't fail for 50% of the valid use cases.

It is intended to be obvious that it fails; it would be perfectly reasonable code if you had just not insisted on unsigned integers for a variable that "can't be negative."

I guess I'll give one more example a shot. In the Linux kernel, where it IS perfectly reasonable to use unsigned integers, there have been many security vulnerabilities (and this isn't just confined to Linux) related to underflowing unsigned integers. The vast majority of these could have been avoided by using a larger signed integer, though obviously that's not always a viable solution for something like a kernel. If high-profile, security-sensitive things like kernels can make subtle mistakes involving underflow of unsigned integers, why would you recommend using unsigned more than is absolutely strictly necessary, inflicting this entire class of bugs on application code?

Really, if you want to convince me I'm wrong, instead of focusing on the illustrative example, find something horribly wrong with this:

crazypenguin posted:

It's really quite simple, for signed integers, all you have to do is ask "will the magnitude of this value ever exceed 2 billion" and you know whether the code is correct. For unsigned values, you have to ask "will this value ever exceed 4 billion, oh and for every single calculation I make with it, could it possibly underflow?"

ehnus
Apr 16, 2003

Now you're thinking with portals!

Avenging Dentist posted:

Macros are evil, and unless you absolutely have to, it's preferable to use an inline function to enforce type safety and all that good stuff.

Inline functions are all fine and dandy until your compiler decides that, no, it doesn't actually want to inline any more.


Zombywuf
Mar 29, 2008

ehnus posted:

Inline functions are all fine and dandy until your compiler decides that, no, it doesn't actually want to inline any more.

Your compiler's guess as to whether it should inline is probably as good as yours. It's a bit of a black art.
