piratepilates
Mar 28, 2004

So I will learn to live with it. Because I can live with it. I can live with it.




Because the way to do it in C, or maybe just the way he would do it that somehow I also know to do, is to have a loop and two index variables, one for each end, and to just compare each one while moving inwards in the string, effectively cutting the string in half and comparing the two halves to each other.
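For reference, a minimal sketch of that two-index loop, assuming a NUL-terminated string. The name is_palindrome is made up for illustration and is not the code under discussion. It uses indices rather than pointers, so nothing ever steps before the start of the string:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s reads the same forwards and
 * backwards, 0 otherwise. Assumes s is a valid NUL-terminated string. */
int is_palindrome(const char *s)
{
    assert(s != NULL);

    size_t len = strlen(s);
    if (len == 0)
        return 1;              /* empty string: trivially a palindrome */

    size_t left = 0;           /* index walking in from the front */
    size_t right = len - 1;    /* index walking in from the back */

    while (left < right) {
        if (s[left] != s[right])
            return 0;          /* mismatch between the two ends */
        left++;
        right--;               /* safe: right > left >= 0 here */
    }
    return 1;
}
```

Calling is_palindrome("racecar") walks both indices inward until they meet in the middle, with no copies and no reversal.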


HORATIO HORNBLOWER
Sep 21, 2002

no ambition,
no talent,
no chance

Tesseraction posted:

Interesting - thanks for the clarification. It's a good example of a compiler not following its supposed principles (i.e. not changing semantics) due to ambiguous syntax. Good eye!

This interpretation could not possibly be more wrong.

shrughes
Oct 11, 2008

(call/cc call/cc)

shrughes posted:

You said the compiler changed semantics when in fact it didn't.

And also, the problem was not ambiguous syntax.

FamDav
Mar 29, 2008

piratepilates posted:

Because the way to do it in C, or maybe just the way he would do it that somehow I also know to do, is to have a loop and two index variables, one for each end, and to just compare each one while moving inwards in the string, effectively cutting the string in half and comparing the two halves to each other.

no, im pretty sure indexing forward and backwards through a string is entirely different from making a copy of the first half, a copy of the second half, reversing the second half, then comparing those.

NFX
Jun 2, 2008

Fun Shoe

Isilkor posted:

As soon as your calculation produces a pointer that lies outside of that (real or fictional) array, even as a temporary value, the behavior is undefined - unless the pointer points one past the last element, which is allowed as a special exception.

Is it really UB if the pointer isn't dereferenced?

Consider:
code:
#include <stdlib.h>

void test(int* arr, size_t len)
{
    int* ptrA;
    int* ptrB = 0;
    int* ptrC = ptrA + len;
    int* ptrD = arr;
    int* ptrE = arr - 5;
    ptrD--;
    int crash = *ptrE + *ptrD + *ptrC + *ptrB + *ptrA;
}

int main()
{
    int* dummy = malloc(sizeof(int)*10);
    if (!dummy)
        return 1;
    test(dummy, 10);
    free(dummy);
    return 0;
}
At what point does this cause undefined behaviour?

Is there any difference between C and C++ in that regard?

shrughes
Oct 11, 2008

(call/cc call/cc)

NFX posted:

At what point does this cause undefined behaviour?

int* ptrC = ptrA + len;

The value of ptrA is indeterminate and now you're trying to use it -- so there you have undefined behavior.

int* ptrE = arr - 5;

This would be undefined behavior.

ptrD--;

This would be undefined behavior.

int crash = *ptrE + *ptrD + *ptrC + *ptrB + *ptrA;

If undefined behavior hadn't already happened, and if ptrA through ptrE had values somewhere in [arr, arr + len), you would still have undefined behavior because you're using the value of the object returned by a call to malloc.

Also, if you did initialize the array, a signed integer overflow could hypothetically happen in that expression, which would be undefined behavior (if it did happen).
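For contrast, here is a rough sketch of a version that stays inside the rules. The names sum_ends and run_demo are made up for illustration. Every pointer stays within [arr, arr + len], the one-past-the-end pointer is formed but never dereferenced, and the buffer comes from calloc so its contents are determinate:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical rewrite of test(): only pointers to real elements are
 * dereferenced, and no pointer ever leaves [arr, arr + len]. */
int sum_ends(const int *arr, size_t len)
{
    assert(arr != NULL && len > 0);

    const int *first = arr;        /* points at arr[0] */
    const int *end = arr + len;    /* one past the last element: legal to form */
    const int *last = end - 1;     /* back inside the array: arr[len - 1] */

    /* end itself is never dereferenced; that would be undefined. */
    return *first + *last;
}

/* Allocates with calloc so the elements start out as zero, unlike the
 * malloc'd buffer in the original snippet. Returns -1 if allocation fails. */
int run_demo(void)
{
    int *buf = calloc(10, sizeof *buf);
    if (!buf)
        return -1;
    buf[0] = 3;
    buf[9] = 4;
    int result = sum_ends(buf, 10);
    free(buf);
    return result;
}
```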

shrughes fucked around with this message at 08:25 on Feb 5, 2014

Opinion Haver
Apr 9, 2007

If you have char foo[5], then char* bar = foo + 6; is undefined behavior. char* bar = foo + 5 isn't (although you'll obviously get garbage/might fault if you try to dereference bar), which is why this idiom works:

code:
void doSomething(char *arr, int len) {
    char *p;
    for (p = arr; p < arr + len; p++) {
        // stuff
    }
}
The relevant part of the standard here is section 6.5.6:

quote:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Gazpacho
Jun 18, 2004

by Fluffdaddy
Slippery Tilde

NFX posted:

Is it really UB if the pointer isn't dereferenced?
Yes really, that's what the standard says very explicitly. The address calculation, and not only dereferencing it, is undefined. The C++ standard has the same limitations, although they are worded differently.

In a program with several instances of undefined behavior, there's no exact requirement of just how it will fail and which part of the program will appear to be responsible. It's just beyond the scope of the standard.

Coffee Mugshot
Jun 26, 2010

by Lowtax
With that last example, I almost feel like we're talking past each other. Just because your compiler figures it out fine on your particular architecture, it doesn't mean the behavior is defined by the language specification. I'm sure gcc4.6+ won't blow up on the snippet posted, but that's probably because the forward-propagate RTL pass is turned on by default at every optimization level. But it's even strange to assume two different C/C++ compilers would return the same results for that snippet, although they might.

SurgicalOntologist
Jun 17, 2004

My professor just sent code that included this (Matlab, so horror already):

code:
for i_col = 1:n_col
    total = 0;
    for i_row = 1:n_row
        total = total + data(i_row, i_col);
    end
    means(i_col) = total / n_row;
end
I emailed him and said "Hey Prof, Just so you know, you can just do mean(data) and it will automatically give you the means for every column. You don't have to loop over the whole dataset."

He said, "Thanks, but I like doing things my way. The faster way is usually more confusing." :doh:

He regularly gives us 1000-line scripts that could be written in about 10 lines. This is in a class where we're supposed to be learning applied data analysis skills.

Relatedly, last week we were talking about probability distributions. He had prepared a text file with 1000 realizations of rolling two six-sided dice. We spent twenty minutes helping everyone load the text file into Matlab. I tried to suggest sum(randi(6,2,1000)) but he wouldn't listen.

SurgicalOntologist fucked around with this message at 19:48 on Feb 5, 2014

Tesseraction
Apr 5, 2009

HORATIO HORNBLOWER posted:

This interpretation could not possibly be more wrong.

shrughes posted:

And also, the problem was not ambiguous syntax.

After sleeping on it and coming back I realise I misread the original code, but I'm still not sure how changing a jlt instruction to a cmp then jz isn't considered changed semantics?

seiken
Feb 7, 2005

hah ha ha

Tesseraction posted:

After sleeping on it and coming back I realise I misread the original code, but I'm still not sure how changing a jlt instruction to a cmp then jz isn't considered changed semantics?

The language does not define semantics in terms of instructions.

Steve French
Sep 8, 2003

Okay, let me recap and make sure I'm understanding this correctly. Please correct any misunderstandings or inaccuracies.

The C standard says, in 6.5.6, regarding additive operators:

quote:

7: For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

quote:

8: When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
(emphasis added)

So, in my code, the char *s parameter, in the context of that function, does not point to an element in an array, so it behaves like a pointer to the first element of an array of length one. Results of the operation that point to anything that is not in the array or one past the end of the array are undefined, so s - 1 is undefined.

Is this correct?

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
It's not that "s - 1" is "undefined", it's that "s - 1" is an "undefined operation".

That means that anything can happen when that code is executed. Your program can crash. Your computer can launch the missiles. After you execute an undefined operation, all bets are off.

Obviously, C compilers do not commonly compile undefined operations into those things, but it means that you can't reason about the program anymore.

Steve French
Sep 8, 2003

Sure, that's what I meant by undefined in this context. But otherwise, you agree with my summary and interpretation?

QuarkJets
Sep 8, 2008

SurgicalOntologist posted:

My professor just sent code that included this (Matlab, so horror already):

code:
for i_col = 1:n_col
    total = 0;
    for i_row = 1:n_row
        total = total + data(i_row, i_col);
    end
    means(i_col) = total / n_row;
end
I emailed him and said "Hey Prof, Just so you know, you can just do mean(data) and it will automatically give you the means for every column. You don't have to loop over the whole dataset."

He said, "Thanks, but I like doing things my way. The faster way is usually more confusing." :doh:

He regularly gives us 1000-line scripts that could be written in about 10 lines. This is in a class where we're supposed to be learning applied data analysis skills.

Relatedly, last week we were talking about probability distributions. He had prepared a text file with 1000 realizations of rolling two six-sided dice. We spent twenty minutes helping everyone load the text file into Matlab. I tried to suggest sum(randi(6,2,1000)) but he wouldn't listen.

Your professor is literally teaching coding horrors to students. Was it a class on Matlab specifically? If so, then you need to formally complain about how he is loving up his Matlab instruction. Was it a class on general programming? If so, then you need to formally complain about how he's teaching Matlab in a general programming course.

Tesseraction
Apr 5, 2009

seiken posted:

The language does not define semantics in terms of instructions.

Okay, but how does changing "x > y" to "x != y" not count as a fundamental change in the behaviour of a program?

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Tesseraction posted:

Okay, but how does changing "x > y" to "x != y" not count as a fundamental change in the behaviour of a program?

Over the set of values for which the program behaviour is defined, in this context those two operations are the same. For all other values, because the behaviour isn't defined anyway, it doesn't matter which operation you choose. There is no situation where the behaviour both is defined, and differs between the two operations, so there is no fundamental change. Hence, a legal optimization.
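A made-up pair of loops that shows the idea (this is not the original snippet, and the function names are hypothetical). For every call where arr really points at len elements, the two comparisons stop at exactly the same place, so swapping one for the other changes nothing a valid program could observe:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: counts elements using a relational comparison. */
size_t count_with_less_than(const int *arr, size_t len)
{
    assert(arr != NULL);
    size_t n = 0;
    for (const int *p = arr; p < arr + len; p++)
        n++;
    return n;
}

/* Same loop with an inequality test. p starts at arr and moves one element
 * at a time, so it can only reach arr + len by stepping through each
 * element; both loops therefore run the same number of times. */
size_t count_with_not_equal(const int *arr, size_t len)
{
    assert(arr != NULL);
    size_t n = 0;
    for (const int *p = arr; p != arr + len; p++)
        n++;
    return n;
}

/* Returns 1 when both versions agree for every valid length of a small
 * sample array, 0 otherwise. */
int loops_agree(void)
{
    int data[5] = {1, 2, 3, 4, 5};
    for (size_t len = 0; len <= 5; len++) {
        if (count_with_less_than(data, len) != len)
            return 0;
        if (count_with_not_equal(data, len) != len)
            return 0;
    }
    return 1;
}
```

The only inputs where the two could differ are ones where the pointer arithmetic was already undefined, which is the whole point.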

SurgicalOntologist
Jun 17, 2004

QuarkJets posted:

Your professor is literally teaching coding horrors to students. Was it a class on Matlab specifically? If so, then you need to formally complain about how he is loving up his Matlab instruction. Was it a class on general programming? If so, then you need to formally complain about how he's teaching Matlab in a general programming course.

No, it's basically a math class. And I actually taught a Matlab workshop to about half the students in the class, and they all know that this is ridiculous. My main complaint is that he is wasting so much time. This guy is your stereotypical aspy math professor: he has no people skills and does things his own way. But he's a technical genius and is the only person in our department qualified to teach this stuff. I would prefer if he would just drop the Matlab portion of the class altogether but that's not going to happen.

Dylan16807
May 12, 2010

Suspicious Dish posted:

That means that anything can happen when that code is executed. Your program can crash. Your computer can launch the missiles. After you execute an undefined operation, all bets are off.
And even that is an understatement. Once the program will inevitably perform an undefined operation at some point in the future, all bets are off. So it can crash and launch the missiles without even reaching the bad line.

Tesseraction
Apr 5, 2009

ShoulderDaemon posted:

Over the set of values for which the program behaviour is defined, in this context those two operations are the same. For all other values, because the behaviour isn't defined anyway, it doesn't matter which operation you choose. There is no situation where the behaviour both is defined, and differs between the two operations, so there is no fundamental change. Hence, a legal optimization.

Thanks. This actually stemmed from me not registering that they passed length as size_t and so was unsigned/wouldn't underflow. I'm used to signed-only. Apologies for the gently caress-up.

NFX
Jun 2, 2008

Fun Shoe

shrughes posted:

int crash = *ptrE + *ptrD + *ptrC + *ptrB + *ptrA;

If undefined behavior hadn't already happened, and if ptrA through ptrE had values somewhere in [arr, arr + len), you would still have undefined behavior because you're using the value of the object returned by a call to malloc.

I considered throwing in a memset(), but hey, if it's gonna crash it's gonna crash.

I had no idea that illegal pointer math is undefined, but I can understand why it's practical.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

Dylan16807 posted:

And even that is an understatement. Once the program will inevitably perform an undefined operation at some point in the future, all bets are off. So it can crash and launch the missiles without even reaching the bad line.

No, I didn't think that was the case. I thought the C specification was defined in terms of an "abstract machine", which could only go off the rails when the undefined operation was hit. I'm probably wrong, though.

Dylan16807
May 12, 2010

Tesseraction posted:

Thanks. This actually stemmed from me not registering that they passed length as size_t and so was unsigned/wouldn't underflow. I'm used to signed-only. Apologies for the gently caress-up.

It doesn't matter if it's signed or unsigned. It breaks the rules to do s-1 or s-2 or s-3, so the compiler only has to care about the case where size is 1 or larger.

Suspicious Dish posted:

No, I didn't think that was the case. I thought the C specification was defined in terms of an "abstract machine", which could only go off the rails when the undefined operation was hit. I'm probably wrong, though.

The abstract machine only specifies the behavior of valid C programs, which does not include programs that invoke undefined behavior. If your program is guaranteed to hit an undefined operation, it no longer has any meaning.

Another way of looking at it: Compilers are free to reorder parts of code as long as they preserve the semantics of the abstract machine, and they don't have to take undefined behavior into account when doing so. So a printf followed by calculating an invalid pointer may or may not print, depending on the compiler's mood.

http://blog.regehr.org/archives/232 talks about this issue, and shows a way to delay the point at which undefined behavior becomes inevitable.
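A commonly cited illustration of the same effect, sketched here with hypothetical function names (it is not taken from the linked post): once a pointer has been dereferenced, the compiler is allowed to assume it was non-null, so a later null check can legally be optimized away.

```c
#include <stddef.h>

/* Hypothetical example. The dereference comes first, so the compiler may
 * assume p is non-null from here on and is allowed to drop the check below.
 * With a null p the whole call is undefined, not just the first line. */
int read_then_check(const int *p)
{
    int value = *p;        /* undefined if p is NULL */
    if (p == NULL)         /* an optimizer may treat this as always false */
        return -1;
    return value;
}

/* Checking before the dereference keeps every call well-defined. */
int check_then_read(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

Both functions return the same thing for any valid pointer. Only check_then_read has a defined result for NULL.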

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
Thanks for the excellent reference! Glad to be wrong.

JawnV6
Jul 4, 2004

So hot ...
Compilers can propagate the undefined operation's lack of constraint to an earlier point in the execution.

Is that the glibbest?

raminasi
Jan 25, 2005

a last drink with no ice
I've always just thought about it as "operations aren't undefined, programs are."

Steve French
Sep 8, 2003

Ok I have to admit at this point that I was feigning ignorance a bit in my last post. That explanation cannot possibly be right, or there's a big problem with the standard library, because how else would you write a strlen function with the correct behavior?

Steve French posted:

So, in my code, the char *s parameter, in the context of that function, does not point to an element in an array, so it behaves like a pointer to the first element of an array of length one. Results of the operation that point to anything that is not in the array or one past the end of the array are undefined, so s - 1 is undefined.

C code:
size_t strlen(const char *s);
So, in this code, the const char *s parameter, in the context of this function, does not point to an element in an array, so it behaves like a pointer to the first element of an array of length one. Results of an operation that point to anything that is not in the array or one past the end of the array result in undefined behavior, so s + 2 has undefined behavior.

So 6.5.6.7, as quoted above, which dictates that s is treated like a pointer to the first element of an array of length one, must not quite apply directly in either of these cases.

So how is the pointer treated in the context of 6.5.6.8? Because if it's the above, then seemingly any operation on a string function argument accessing more than just the first two characters has undefined behavior (the standard doesn't seem to treat going more than one past the end of an array any differently than going before the start of the array). The compiler can't always know what the array being passed into the function looks like, either. If I compiled my palindrome function into a shared library, it seems that in order to guarantee correct behavior on elements that *are* within the array bounds (unknown to the compiler, and even to the compiled code at runtime), the compiled output must have sane behavior for all possible pointer inputs (except for those that result in overflows, of course).

At any rate, even if this is actually undefined behavior and bad, someone should probably tell the folks in charge of glibc:

C code:
char *
STRNCPY (char *s1, const char *s2, size_t n)
{
  char c;
  char *s = s1;

  --s1;
...

Dessert Rose
May 17, 2004

awoken in control of a lucid deep dream...

Steve French posted:

C code:
size_t strlen(const char *s);
So, in this code, the const char *s parameter, in the context of this function, does not point to an element in an array

If you pass a pointer to something that isn't an array of characters to this function, it's you causing the undefined behavior, not the function.

edit for more detail:
The compiler is not sniffing around for things that are undefined and deliberately causing mischief when it finds them (though that's a thing it is, by the standard, allowed to do).

The compiler is required to generate code that operates correctly under defined conditions, without taking into account undefined conditions. So, in the context of this function, the only way this function even has meaning is if you passed it a pointer to an array of char. If you pass it a pointer to anything else then the code the compiler generated doesn't have to do anything sensible at all.

strlen is a particularly good example because it's a very simple function (count how many times you can increment s until *s == 0) and it can go horribly wrong if its preconditions are not met (pass it a pointer to something that isn't an array, or to an array of char that isn't null-terminated, and who knows what will happen?) but whether or not the behavior is undefined is entirely up to the caller.

Dessert Rose fucked around with this message at 23:01 on Feb 5, 2014

Steve French
Sep 8, 2003

Dessert Rose posted:

If you pass a pointer to something that isn't an array of characters to this function, it's you causing the undefined behavior, not the function.

You're missing the point, which is that according to the above interpretation of the standard that I believe to be wrong, s + 2 is undefined. So, you know, strlen("foo") would result in undefined behavior.

pseudorandom name
May 6, 2007

It's also worth noting that the C standard's abstract machine is written so that you can implement C without a single contiguous address space.

I think it may even not have to be byte addressable, although now that threading is in there that may have changed.

Furthermore, all bets are off when you're dealing with the implementation of the C runtime library, because that's targeting a specific C compiler, not the abstract machine.

Oh, and it may surprise you to learn that your strlen() function routinely reads 3, 7 or even 255 bytes past the end of your char array, with no ill effect. Also, it isn't written in C.

Dessert Rose
May 17, 2004

awoken in control of a lucid deep dream...

Steve French posted:

You're missing the point, which is that according to the above interpretation of the standard that I believe to be wrong, s + 2 is undefined. So, you know, strlen("foo") would result in undefined behavior.

No, because strlen("foo") results in us entering the body of strlen with a const char *s pointing to the first element of a four-element char array ['f', 'o', 'o', 0]. In this context s+2 is defined.
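A small sketch to make that concrete. The helper names are invented for illustration, and count_chars is only a stand-in for a naive strlen, not the real library routine:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: walks s the way a naive strlen would, assuming s
 * points into a NUL-terminated char array. */
size_t count_chars(const char *s)
{
    assert(s != NULL);
    const char *p = s;
    while (*p != '\0')
        p++;                /* stays within the array, stops on the NUL */
    return (size_t)(p - s);
}

/* The literal "foo" has type char[4]: 'f', 'o', 'o', '\0'. */
size_t literal_size(void)
{
    return sizeof "foo";
}

/* s + 2 points at the second 'o'. s + 4 would be one past the end (fine to
 * form, not to dereference); s + 5 would be undefined. */
char third_char(void)
{
    const char *s = "foo";
    return *(s + 2);
}
```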

Steve French
Sep 8, 2003

Dessert Rose posted:

No, because strlen("foo") results in us entering the body of strlen with a const char *s pointing to the first element of a four-element char array ['f', 'o', 'o', 0]. In this context s+2 is defined.

Which would be awfully nice for the compiler to know, but it doesn't.

Dessert Rose
May 17, 2004

awoken in control of a lucid deep dream...

Steve French posted:

Which would be awfully nice for the compiler to know, but it doesn't.

It doesn't have to. When generating the code for strlen [if it were written in C], it has to generate code that produces correct results for defined behavior. The only way this code:

code:
int strlen(const char *s)
{
  int len = 0;
  while (*s++ != 0)
  {
    len++;
  }
  return len;
}
has any meaning is if s points to a 0-terminated array of char (or a single char 0). So the compiler generates code that works off of that assumption. If you violate the preconditions, it is you that has caused the program to stumble into undefined-behavior land.

Steve French
Sep 8, 2003

Dessert Rose posted:

It doesn't have to. When generating the code for strlen [if it were written in C], it has to generate code that produces correct results for defined behavior. The only way this code:

code:
int strlen(const char *s)
{
  int len = 0;
  while (*s++ != 0)
  {
    len++;
  }
  return len;
}
has any meaning is if s points to a 0-terminated array of char (or a single char 0). So the compiler generates code that works off of that assumption. If you violate the preconditions, it is you that has caused the program to stumble into undefined-behavior land.

You need to read my statements more carefully. I am *not* saying it is actually undefined behavior.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Steve French posted:

Which would be awfully nice for the compiler to know, but it doesn't.

This isn't how undefined assumptions work in compilers. The compiler does know that, because if you didn't pass a large enough array into the function, then the result would be undefined, and the compiler is allowed to assume that your program is well-defined. Compilers don't check to make sure that a program is well-defined, they simply assume that it is and perform optimizations within that context, because it doesn't matter if the optimizations break not-well-defined programs; those programs were already undefined according to spec anyway.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Steve French posted:

So, in this code, the const char *s parameter, in the context of this function, does not point to an element in an array, so it behaves like a pointer to the first element of an array of length one.
Why do you think s does not point to an element in an array? The compiler can see that you're doing things which are legal if it points to an array and illegal if it does not, therefore it knows that s points to an array. If you then proceed to pass a non-array as s you've formed an invalid program.

Deus Rex
Mar 5, 2005

I don't understand, and feel like I am missing something obvious (I do not really write very much C).

quote:

For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.

The interpretation of this function:

code:
void myfun(const char *s) {
  // for purposes of the standard, s is treated as a pointer to an array of length one
  s = s + 1; // well-defined; one past the end.
  // s is again a pointer to an object that is not an element of an array, so it is
  // treated as a pointer to an array of length one
  s = s + 1; // again, one past the end 
}
which would presumably be equivalent to code used in a naive strlen implementation, seems pretty different to me than:

code:
void myfun(const char *s) {
  // for purposes of the standard, s is treated as a pointer to a one element array
  s = s + 2; // undefined access to element more than one past the last element of an array
}

Deus Rex fucked around with this message at 23:21 on Feb 5, 2014

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
s + 1 is not a pointer to an object. "a pointer to an object that is not an element of an array" is not the same thing as "a pointer not pointing at an element of an array".
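A rough sketch of the difference, with made-up function names. The length-one rule applies to a pointer that actually points at a lone object; it does not restart from the one-past-the-end pointer you get after adding 1.

```c
#include <stddef.h>

/* Hypothetical demo. A lone char acts like an array of length one for
 * pointer arithmetic, so &c + 1 is its one-past-the-end pointer. */
ptrdiff_t single_object_span(void)
{
    char c = 'x';
    char *p = &c;          /* pointer to an object that is not an array element */
    char *end = p + 1;     /* legal: one past the end of the "array of one" */
    /* end + 1 would be undefined: end does not point to an object, so the
     * length-one rule does not start over from it. */
    return end - p;
}

/* Inside a real array the bounds come from the array itself. */
ptrdiff_t array_span(void)
{
    char buf[4] = "foo";
    char *p = buf;
    p = p + 1;             /* points at buf[1] */
    p = p + 1;             /* points at buf[2] */
    p = p + 2;             /* buf + 4: one past the end, still legal */
    return p - buf;
}
```

So repeated s = s + 1 is fine for strlen only because the caller handed it a pointer into a real array that is long enough, not because each step gets a fresh array of one.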


Vanadium
Jan 8, 2005

How does all of this interact with taking a char* pointer to the beginning of a large struct and then accessing it byte for byte? :allears:
