Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Pookster
Jul 4, 2007
I just stick this near the beginning of my script (I route everything on a site through one script) and then get on with things as normal.
php:
<?
// Kill magic quotes because it's stupid.
if (get_magic_quotes_gpc()) {
    function stripslashes_deep($value) {
        if (is_array($value)) {
            $value = array_map('stripslashes_deep', $value);
        }
        else {
            $value = stripslashes($value);
        }
        return $value;
    }
    $_POST = array_map('stripslashes_deep', $_POST);
    $_GET = array_map('stripslashes_deep', $_GET);
    $_COOKIE = array_map('stripslashes_deep', $_COOKIE);
    $_REQUEST = array_map('stripslashes_deep', $_REQUEST);
}
?>

Adbot
ADBOT LOVES YOU

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

MrMoo posted:

It looks like mb_check_encoding() would be a good choice,

http://hk2.php.net/manual/en/function.mb-check-encoding.php

The comments on that manual page make me unsure what to think. They suggest that the commenters think that the function is lacking, but I don't feel like I know enough about this stuff to evaluate their objections. I feel like I'm not able to work out what I need to do to make sure my app is safe.

DarkLotus
Sep 30, 2001

Lithium Hosting
Personal, Reseller & VPS Hosting
30-day no risk Free Trial &
90-days Money Back Guarantee!

Pookster posted:

I just stick this near the beginning of my script (I route everything on a site through one script) and then get on with things as normal.
php:
<?
// Kill magic quotes because it's stupid.
if (get_magic_quotes_gpc()) {
    function stripslashes_deep($value) {
        if (is_array($value)) {
            $value = array_map('stripslashes_deep', $value);
        }
        else {
            $value = stripslashes($value);
        }
        return $value;
    }
    $_POST = array_map('stripslashes_deep', $_POST);
    $_GET = array_map('stripslashes_deep', $_GET);
    $_COOKIE = array_map('stripslashes_deep', $_COOKIE);
    $_REQUEST = array_map('stripslashes_deep', $_REQUEST);
}
?>

FYI, you can use $_POST = stripslashes_deep($_POST); since your function already does an array_map.

Pookster
Jul 4, 2007

DarkLotus posted:

FYI, you can use $_POST = stripslashes_deep($_POST); since your function already does an array_map.

That's very true. Luckily I haven't had to use servers with magic quotes enabled for a while so this code wasn't even getting called.

isagoon
Aug 31, 2009

by Peatpot

Hammerite posted:

For information, and in relation to the discussion we were having about sanitising user input, this is the function I have adopted for sanitising user input that is expected to be of type string. (PHP_MAJOR_VERSION is defined elsewhere.)

php:
<?

define( 'STRING_TO_UPPERCASE'   ,  1 );
define( 'STRING_TO_LOWERCASE'   ,  2 );
define( 'STRING_NO_TRIM'        ,  4 );
define( 'STRING_NO_STRIP_CR'    ,  8 );
define( 'STRING_STRIP_NEWLINES' , 16 );
define( 'STRING_ESCAPE_HTML'    , 32 );
function sanitise_string ($x, $flags = 0) {
    $x = (string)$x;
    if ( PHP_MAJOR_VERSION < 6 and get_magic_quotes_gpc() ) {
        $x = stripslashes($x);
    }
    if ( $flags & STRING_TO_UPPERCASE ) {
        $x = strtoupper($x);
    } else if ( $flags & STRING_TO_LOWERCASE ) {
        $x = strtolower($x);
    }
    if ( ( $flags & STRING_NO_TRIM ) == 0 ) {
        $x = trim($x);
    }
    if ( ( $flags & STRING_NO_STRIP_CR ) == 0 ) {
        $x = str_replace("\r", '', $x);
    }
    if ( $flags & STRING_STRIP_NEWLINES ) {
        $x = str_replace("\n", '', $x);
    }
    if ( $flags & STRING_ESCAPE_HTML ) {
        $x = htmlspecialchars($x);
    }
    return $x;
}

?>

:psyduck: No offense, but this is a pretty good example of how not to write code. So much of this should be reformatted or otherwise changed.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

isagoon posted:

:psyduck: No offense, but this is a pretty good example of how not to write code. So much of this should be reformatted or otherwise changed.

I'd love to respond to this comment, but you seem to have gone out of your way to provide absolutely no specific criticisms whatsoever.

gwar3k1
Jan 10, 2005

Someday soon

Hammerite posted:

I'd love to respond to this comment, but you seem to have gone out of your way to provide absolutely no specific criticisms whatsoever.

It confuses me a bit but maybe its my lack of understanding of PHP than your code. Here's some thoughts though:

$flags = 0 should be $flags=null if you want it to be optional, otherwise $flags becomes a constant (0).

Your ifs use inconsistent notation: "and" "&". Shouldn't ands be && or is & something specific.

$flags & STRING_TO_UPPERCASE will always equal true: you have defined both and your if isn't looking for a specific value.
$flags & STRING_TO_LOWERCASE will never be called because of the above.

Why would you remove new lines and not replace them with some identifier so that the formatting matches when the string is drawn from the database? I know this pisses off users just from seeing people's Facebook status complaining about the fact line breaks used to be removed.

Again, I'm hobbyist amateur with PHP so these are just things I've picked up as contradictory to my knowledge/experience.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

gwar3k1 posted:

Your ifs use inconsistent notation: "and" "&". Shouldn't ands be && or is & something specific.

& is "bitwise and".

A call to this function might look like this:

$mystring = sanitise_string(@$_POST['mystring'], STRING_TO_UPPERCASE | STRING_ESCAPE_HTML)

STRING_TO_UPPERCASE is in binary ...000001 and STRING_ESCAPE_HTML is in binary ...100000, so bitwise or-ing them together with | yields ...100001. Now, the conditions $flags & STRING_TO_UPPERCASE and $flags & STRING_ESCAPE_HTML are true, but $flags & STRING_STRIP_NEWLINES is false.

Additionally, you're mistaken when you say that

quote:

$flags & STRING_TO_UPPERCASE will always equal true

bearing in mind that you were under the impression that "&" means the same thing as "&&", you're incorrect because if $flags is equal to zero (as per its default value), $flags && STRING_TO_UPPERCASE will be false (zero interpreted as a Boolean is false).

gwar3k1
Jan 10, 2005

Someday soon
Thanks for the explaination.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

gwar3k1 posted:

Why would you remove new lines and not replace them with some identifier so that the formatting matches when the string is drawn from the database? I know this pisses off users just from seeing people's Facebook status complaining about the fact line breaks used to be removed.
Precaution for inputs that aren't supposed to permit line breaks anyway, like in a form <input type="text">. I wouldn't use it for something that came from a <textarea>.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

isagoon posted:

:psyduck: No offense, but this is a pretty good example of how not to write code. So much of this should be reformatted or otherwise changed.

You know, I looked back at that function and decided I didn't like the way it was laid out very much. I have no idea what your objections are to it, but I reformatted it to be more pleasing. Or at least, more pleasing to me. I have no idea where you are coming from.
code:

define( 'STR_TO_UPPERCASE'   ,  1 );
define( 'STR_TO_LOWERCASE'   ,  2 );
define( 'STR_NO_TRIM'        ,  4 );
define( 'STR_NO_STRIP_CR'    ,  8 );
define( 'STR_STRIP_NEWLINES' , 16 );
define( 'STR_ESCAPE_HTML'    , 32 );
function sanitise_str ($x, $flags = 0) {
    $x = (string)$x;
    if ( PHP_MAJOR_VERSION < 6 and get_magic_quotes_gpc() ) {
        $x = stripslashes($x);
    }
    if (  $flags & STR_TO_UPPERCASE   ) { $x = strtoupper($x);            }
    if (  $flags & STR_TO_LOWERCASE   ) { $x = strtolower($x);            }
    if ( ~$flags & STR_NO_TRIM        ) { $x = trim($x);                  }
    if ( ~$flags & STR_NO_STRIP_CR    ) { $x = str_replace("\r", '', $x); }
    if (  $flags & STR_STRIP_NEWLINES ) { $x = str_replace("\n", '', $x); }
    if (  $flags & STR_ESCAPE_HTML    ) { $x = htmlspecialchars($x);      }
    return $x;
}

isagoon
Aug 31, 2009

by Peatpot

Hammerite posted:

You know, I looked back at that function and decided I didn't like the way it was laid out very much. I have no idea what your objections are to it, but I reformatted it to be more pleasing. Or at least, more pleasing to me. I have no idea where you are coming from.
code:

define( 'STR_TO_UPPERCASE'   ,  1 );
define( 'STR_TO_LOWERCASE'   ,  2 );
define( 'STR_NO_TRIM'        ,  4 );
define( 'STR_NO_STRIP_CR'    ,  8 );
define( 'STR_STRIP_NEWLINES' , 16 );
define( 'STR_ESCAPE_HTML'    , 32 );
function sanitise_str ($x, $flags = 0) {
    $x = (string)$x;
    if ( PHP_MAJOR_VERSION < 6 and get_magic_quotes_gpc() ) {
        $x = stripslashes($x);
    }
    if (  $flags & STR_TO_UPPERCASE   ) { $x = strtoupper($x);            }
    if (  $flags & STR_TO_LOWERCASE   ) { $x = strtolower($x);            }
    if ( ~$flags & STR_NO_TRIM        ) { $x = trim($x);                  }
    if ( ~$flags & STR_NO_STRIP_CR    ) { $x = str_replace("\r", '', $x); }
    if (  $flags & STR_STRIP_NEWLINES ) { $x = str_replace("\n", '', $x); }
    if (  $flags & STR_ESCAPE_HTML    ) { $x = htmlspecialchars($x);      }
    return $x;
}


Ok, allow me to explain.

First off, $x = (string)$x; is unneeded. PHP will automatically cast the argument to a string.

Second, single sentence blocks should not be braced.

Third, but nitpick, is that you may want to use && instead of AND. Either will work, but most everybody prefers to use &&.

Since you aren't doing anything with $x after your if statement, you could just return each one of those right then and there instead of reassigning it.

The bitwise stuff is fine.

The logic of the function is sound, but formatting is also an important. A lot of PHP coders don't seem to follow good formatting rules.

isagoon fucked around with this message at 16:33 on Jun 14, 2010

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

isagoon posted:

Ok, allow me to explain.

First off, $x = (string)$x; is unneeded. If the developer supplies an object or something that != a string to your function, it isn't the functions fault and casting the argument to a string isn't going to accomplish much (unless, that is you are passing it a class with a tostring magic method). Even if you did do something like that, I believe that PHP will automatically case the object to a string. Any way you look at it, that line does nothing.
You might well be right, but making sure it's a string when it comes out the other end of the function gives me peace of mind.

quote:

Second, single sentence blocks should not be braced.
I disagree. Every block must be braced.

quote:

Third, but nitpick, is that you may want to use && instead of AND. Either will work, but most everybody prefers to use &&.
If you say so, I guess. I prefer "and". Providing "and" and "or" as keywords is one of the vanishingly few things PHP can claim to have as an advantage over other languages (trivial though it is). :)

quote:

Since you aren't doing anything with $x after your if statement, you could just return each one of those right then and there instead of reassigning it.
This is not true. I might be (probably will be) doing more than one thing to $x, depending on the value of $flags.

i am not zach
Apr 16, 2007

by Ozmaugh
Having an issue with an array and a foreach loop, trying to figure out whats going on...

php:
<?
      print_r($arrFieldData['options']);
      
      $intCount = '0';

      foreach ($arrFieldData['options'] as $strOptionValue => $strOptionDesc) {
            
        if ($strOptionValue == 'default') {
          $arrOptionData['default']['value'] = $strOptionValue;
          $arrOptionData['default']['desc'] = $strOptionDesc;
        }
        
        $arrOptionData[$intCount]['value'] = $strOptionValue;
        $arrOptionData[$intCount]['desc'] = $strOptionDesc;
        
        $intCount++;
      }
            
      print_r($arrOptionData);?>
Output of the first print_r is as follows:
php:
<?
Array
(
    [default] => MacBook
    [0] => MacBook (Unibody)
    [1] => MacBook Pro
    [2] => MacBook Pro (Unibody)
    [3] => MacBook Air
    [4] => iBook
    [5] => PowerBook G4
)
?>
second one is
php:
<?
(
    [default] => Array
        (
            [value] => 0
            [desc] => MacBook (Unibody)
        )

    [0] => Array
        (
            [value] => default
            [desc] => MacBook
        )

    [1] => Array
        (
            [value] => 0
            [desc] => MacBook (Unibody)
        )

    [2] => Array
        (
            [value] => 1
            [desc] => MacBook Pro
        )

    [3] => Array
        (
            [value] => 2
            [desc] => MacBook Pro (Unibody)
        )

    [4] => Array
        (
            [value] => 3
            [desc] => MacBook Air
        )

    [5] => Array
        (
            [value] => 4
            [desc] => iBook
        )

    [6] => Array
        (
            [value] => 5
            [desc] => PowerBook G4
        )

)
?>
My question is how on earth is $arrOptionData['default'] set to '0' and 'MacBook (Unibody)' instead of 'default' and 'MacBook'

edit. And it only happens if the very first value in $arrFieldData['options'] is the default... otherwise works as I intended

i am not zach fucked around with this message at 16:56 on Jun 14, 2010

isagoon
Aug 31, 2009

by Peatpot

Hammerite posted:

You might well be right, but making sure it's a string when it comes out the other end of the function gives me peace of mind.

You aren't making sure of anything. PHP will cast it to a string no matter what it starts out as, and those function calls will return a string no matter what. If something other than a string goes into your function, something severely hosed is going on. Also, if what you pass is not a string, that sentence is not going to do anything to prevent something bad happening later on.

I disagree. Every block must be braced.

If you say so, I guess. I prefer "and". Providing "and" and "or" as keywords is one of the vanishingly few things PHP can claim to have as an advantage over other languages (trivial though it is). :)

and that's exactly why you shouldn't use them :)

This is not true. I might be (probably will be) doing more than one thing to $x, depending on the value of $flags.

As you wish. For that code snippit, you could just return immediately

isagoon fucked around with this message at 17:01 on Jun 14, 2010

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

i am not zach posted:

Having an issue with an array and a foreach loop, trying to figure out whats going on...

My question is how on earth is $arrOptionData['default'] set to '0' and 'MacBook (Unibody)' instead of 'default' and 'MacBook'

edit. And it only happens if the very first value in $arrFieldData['options'] is the default... otherwise works as I intended

Try changing if ($strOptionValue == 'default') to if ($strOptionValue === 'default'). I think you're unintentionally getting a result of "true" from the comparison when PHP tries to compare a non-numeric string to integer 0.

isagoon posted:

You aren't making sure of anything. PHP will cast it to a string no matter what it starts out as, and those function calls will return a string no matter what.
What if I use

$x = sanitise_str(@$_GET['something_or_other'], STR_NO_TRIM | STR_NO_STRIP_CR);

and magic_quotes_gpc is off? Then it'll pass through the function unchanged. And the input might indeed happen not to be a string if I'm using it on GET or POST data.

isagoon posted:

As you wish. For that code snippit, you could just return immediately
I don't understand what you mean. The only one of those "if" statements where I could replace the assignment with a "return" statement and not change the function's behaviour is the last one, the htmlspecialchars() one.

i am not zach
Apr 16, 2007

by Ozmaugh

Hammerite posted:

Try changing if ($strOptionValue == 'default') to if ($strOptionValue === 'default'). I think you're unintentionally getting a result of "true" from the comparison when PHP tries to compare a non-numeric string to integer 0.

That was it... thanks

corgski
Feb 6, 2007

Silly goose, you're here forever.

isagoon posted:

As you wish. For that code snippit, you could just return immediately

The entire point of the snippet is to process the input in one or more ways depending on the flags. The operations aren't exclusive, so you can't return until you've iterated through all the possible operations that could be performed on that string.

corgski fucked around with this message at 20:03 on Jun 14, 2010

Yay
Aug 4, 2007

isagoon posted:

Second, single sentence blocks should not be braced.

No, no no no! This is terrible practice, as it (amongst other reasons) makes it less readable amongst other code, doesn't allow for brace matching, and doesn't allow for additional statements to be added in the future. It goes against pretty much every coding standard PHP has. Drupal, evolt, Zend, PEAR, Some dude, and even PHPBB all document the requirement for braces on one-statement if blocks.

PHP itself doesn't have a coding standard a la the Java standard, but between them, those above provide a pretty comprehensive list of dos-and-donts, most of which tally up.

isagoon
Aug 31, 2009

by Peatpot

Hammerite posted:

Try changing if ($strOptionValue == 'default') to if ($strOptionValue === 'default'). I think you're unintentionally getting a result of "true" from the comparison when PHP tries to compare a non-numeric string to integer 0.

What if I use

$x = sanitise_str(@$_GET['something_or_other'], STR_NO_TRIM | STR_NO_STRIP_CR);

and magic_quotes_gpc is off? Then it'll pass through the function unchanged. And the input might indeed happen not to be a string if I'm using it on GET or POST data.

No, just no. No matter what GET or POST data you are processing, it will either be a string or any array. Manually casting the input to a string whether it's an array or a string will do _nothing_. A string can't be "sort of" or "kind of" a string, it's a string. You cannot send objects through GET or POST, but you can send representations of objects, as well as representations of arrays. If you do such a thing with an array, PHP will automatically convert it to an array.

I don't understand what you mean. The only one of those "if" statements where I could replace the assignment with a "return" statement and not change the function's behaviour is the last one, the htmlspecialchars() one.


Er, oh, I see. I didn't see that those were a bunch of different ifs pushed together.

Peanut and the Gang
Aug 24, 2009

by exmarx

isagoon posted:



If I was about to use a function named sanitise_str(), I would expect to be able to pass any user input to it and get a string as output. If you don't include the typecast, then the return value could be an array. If I'm expecting a string and I get an array, this could very well lead to an exploit.
And why would anyone spend the time to decide whether a typecast is superfluous? Even if it is, you might make a change to the code later to something that would require that typecast.

isagoon
Aug 31, 2009

by Peatpot

barbarianbob posted:

If I was about to use a function named sanitise_str(), I would expect to be able to pass any user input to it and get a string as output. If you don't include the typecast, then the return value could be an array. If I'm expecting a string and I get an array, this could very well lead to an exploit.
And why would anyone spend the time to decide whether a typecast is superfluous? Even if it is, you might make a change to the code later to something that would require that typecast.

:psyduck: I really don't understand this. It isn't permissible in any other language, why should you expect valid output when you input something totally unexpected?

If you code has an exploit in such a situation, you are doing something very, very, very wrong.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

isagoon posted:

:psyduck: I really don't understand this. It isn't permissible in any other language, why should you expect valid output when you input something totally unexpected?
Well, the thing is, if the function is being used to tackle user-supplied input, you're opening it up to input that is totally unexpected. Given that the input is received from a user who's using your site faithfully, you have nothing to worry about. If you're receiving input from a malicious user, it's as well to make sure your public-facing scripts coerce the input to something harmless and do their job as well as possible. It's not as though doing a simple typecast adds a lot of work for the server to do.

isagoon
Aug 31, 2009

by Peatpot

Hammerite posted:

Well, the thing is, if the function is being used to tackle user-supplied input, you're opening it up to input that is totally unexpected. Given that the input is received from a user who's using your site faithfully, you have nothing to worry about. If you're receiving input from a malicious user, it's as well to make sure your public-facing scripts coerce the input to something harmless and do their job as well as possible. It's not as though doing a simple typecast adds a lot of work for the server to do.

No, it doesn't. The user cannot supply anything but a string. The only way a user could supply an array is if he used a script to POST (or GET, for that matter) an array of strings. Remember, there is a difference between a string representation of an object (RE: Serialization) and an actual data type.

isagoon
Aug 31, 2009

by Peatpot
Argh, quote != edit

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

isagoon posted:

No, it doesn't. The user cannot supply anything but a string. The only way a user could supply an array is if he used a script to POST (or GET, for that matter) an array of strings. Remember, there is a difference between a string representation of an object (RE: Serialization) and an actual data type.

If a user sends the request script.php?foo=a&foo=b then $_GET["foo"] will be an honest-to-god array. This is trivial for a user to do, and doesn't require any script or programming or anything on the user's end. If the script is expecting $_GET["foo"] to be a string, then bad things could happen because various operations will not perform as the programmer expected. Explicitly casting it to a string removes this possibility.

Tad Naff
Jul 8, 2004

I told you you'd be sorry buying an emoticon, but no, you were hung over. Well look at you now. It's not catching on at all!
:backtowork:

ShoulderDaemon posted:

If a user sends the request script.php?foo=a&foo=b then $_GET["foo"] will be an honest-to-god array. This is trivial for a user to do, and doesn't require any script or programming or anything on the user's end. If the script is expecting $_GET["foo"] to be a string, then bad things could happen because various operations will not perform as the programmer expected. Explicitly casting it to a string removes this possibility.

What, I always had to do script.php?foo[]=a&foo[]=b to get $_GET['foo']=array('a','b')... did things change? I know I've seen Perl work like the above...

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

FeloniousDrunk posted:

What, I always had to do script.php?foo[]=a&foo[]=b to get $_GET['foo']=array('a','b')... did things change? I know I've seen Perl work like the above...

It's possible I'm misremembering and you're correct; it's been more than a while since I had to use PHP.

Tad Naff
Jul 8, 2004

I told you you'd be sorry buying an emoticon, but no, you were hung over. Well look at you now. It's not catching on at all!
:backtowork:

ShoulderDaemon posted:

It's possible I'm misremembering and you're correct; it's been more than a while since I had to use PHP.

I always hated that PHP wouldn't do that, but I guess it was just too hard to implement. I'll have to give it a shot tomorrow on the 5.3 machine.

isagoon
Aug 31, 2009

by Peatpot

FeloniousDrunk posted:

What, I always had to do script.php?foo[]=a&foo[]=b to get $_GET['foo']=array('a','b')... did things change? I know I've seen Perl work like the above...

You're right, that should produce an array, which is what I have been saying.

A better alternative to simply casting would be to check if it's an array and return false or throw an exception.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

isagoon posted:

You're right, that should produce an array, which is what I have been saying.

A better alternative to simply casting would be to check if it's an array and return false or throw an exception.

No you haven't! You've been insisting all along that the typecast is redundant and will make no difference, and that "the user cannot supply anything but a string" (your words). The best sense I can make of your posts is that you're making some academic point about how if the user sends an array then it will be serialised in the form of a string, which doesn't have any bearing on the fact that it'll be an array by the time my script sees it. Now you're recognising that it can be an array, but saying "err, yeah it could be an array, but you should do something different to deal with the possibility anyway"? As if it matters which of various options I use to deal with (presumably) malicious input, provided that it's made safe.

isagoon
Aug 31, 2009

by Peatpot

Hammerite posted:

No you haven't! You've been insisting all along that the typecast is redundant and will make no difference, and that "the user cannot supply anything but a string" (your words). The best sense I can make of your posts is that you're making some academic point about how if the user sends an array then it will be serialised in the form of a string, which doesn't have any bearing on the fact that it'll be an array by the time my script sees it. Now you're recognising that it can be an array, but saying "err, yeah it could be an array, but you should do something different to deal with the possibility anyway"? As if it matters which of various options I use to deal with (presumably) malicious input, provided that it's made safe.

Ok, we're talking about two different things there.

First off, if you input an array into a function that wants a string, you will get "Array". There is no way to find an exploit off this. An array tostring is just "Array".

Second, if you want to catch that condition, you shouldn't just blindly cast it to a string. You will get the same as above. Try it for yourself if you don't believe me. If you want to catch it, you should be specifically checking if it's an array.

Regardless of if you manually cast it to a string, you will still get "Array".

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

isagoon posted:

Ok, we're talking about two different things there.

First off, if you input an array into a function that wants a string, you will get "Array". There is no way to find an exploit off this. An array tostring is just "Array".

If it is one of PHP's builtin functions, yes, that is true. If I want to pass the supposedly sanitised variable to some function (whether written by me or someone else) that for whatever reason accepts a string or an array, then I probably will not be satisfied with the results of passing it an array when I'm under the impression that I'm passing it a string.

isagoon posted:

Second, if you want to catch that condition, you shouldn't just blindly cast it to a string. You will get the same as above. Try it for yourself if you don't believe me. If you want to catch it, you should be specifically checking if it's an array.

Regardless of if you manually cast it to a string, you will still get "Array".

I know that if you cast an array to string, you get "Array". I'm not sure why this matters; I only want to make the input safe, not make sense of it. I'm also not sure why I'd reject a simple statement (the typecast) that should work in every single contingency in favour of one that would only deal with the specific case where the supplied value is an array, and would leave a (hypothetical) other exploit open. After all, gently caress, we both know PHP itself is full of holes and bad design decisions, so how confident can you be that that would cover every possible exploit? Why wouldn't you go for the simple option that provides peace of mind? This is what I don't understand about the position you're arguing.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



You could just return (string)$x. I think that makes it a lot more obvious to someone reading it that the function is guaranteed to return a string, especially if they're not familiar with the behavior of the PHP built-ins.

edit:

quote:

I know that if you cast an array to string, you get "Array". I'm not sure why this matters

It sort-of matters because as long as you pass something (that might be an array) through one (probably any - I can't see it from the post editing mode) of the PHP functions you're using in that sanitizer, that your input will be mangled into a string automagically before that function even operates on it, so it's useless to do it yourself. Doing useless things can be confusing to other people who might read your code unless there's a policy in place to do them or something.

My bike shed would be orange.

Munkeymon fucked around with this message at 17:10 on Jun 15, 2010

McGlockenshire
Dec 16, 2005

GOLLOCKS!
Rewinding a bit, as I didn't see this addressed.

Hammerite posted:

Should I be checking user input to make sure it is well-formed UTF-8? Is there a builtin function for that? What is the worst-case scenario if I fail to screen for bad UTF-8?
Yes, yes, and hilarious XSS. Someone used broken UTF-7 sequences to make XSS exploits workable on Google for IE.

This is a derived and cleaned up version of what we use on $_GET, $_POST and $_REQUEST at work:
php:
<?
/**
 * Clean an array of null values and corrupt UTF-8 sequences
 * @return array
 **/
    function clean_input_corruption($input) {
        foreach($input as $k => $v) {
            if(is_array($v)) {
                $input[$k] = clean_input_corruption($v);
                continue;
            } // end if
            $input[$k] = iconv('UTF-8', 'UTF-8//IGNORE', str_replace("\r\n", "\n", $v));
            $input[$k] = preg_replace('/\0/', '', $v);
        } // end foreach
        return $input;
    } // end clean_input_corruption?>
The iconv call is what un-breaks broken UTF-8.

This works for us because:
1) We never get input that contains newlines in which we'd want to preserve carriage returns.
2) If the user submits broken UTF-8 to us, we don't care that the default behavior of iconv is to stop on error thus possibly truncating the string at the break.
3) We never get input from users that contains null.

Peanut and the Gang
Aug 24, 2009

by exmarx

quote:

Typecasting to string argument

The thing is, php's built-in functions are inconsistent a lot of the time. Some string functions force a typecast to the string "Array" but others (such as the str_replace calls) allow an array as the input and then return an array. If you use the flags for sanitize_str that run only the str_replace's (or match none of the if conditions at all), then your output can be an array and not a string.
My point assumes that the programmer never wants an array as output, since if I was expecting input as an array, I'd loop through it and run this function separately on each of the elements. I'm assuming the following phpdocs:
code:
/**
 * Sanitizes an input string based on the passed filter flags
 *
 * @param string $x = the string to filter
 * @param int $flags = filter flags based off of STR_* constants
 * @return string the sanitized string
 */
function sanitize_str($x, $flags=0){...}
And here's an example of input that's expected as a string but when given an array can lead to exploitation (control over the admin password): http://milw0rm.com/exploits/9410

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
Thanks for the response McGlockenshire.

I propose to additionally screen user input using the following function:

php:
<?
function check_utf8_validity ($x) {
    // Based on code posted to the PHP manual page for mb_check_encoding() by
    // "javalc6 at gmail dot com" on 24/12/2009. Checks that a string is valid
    // UTF-8; also checks that it contains no non-printing characters except
    // for 09 (TAB / horizontal tab), 0A (LF / line feed), and
    // 0D (CR / carriage return). Returns true if input satisfies these
    // criteria, false otherwise.
    $x = (string)$x;
    $length = strlen($x);
    for ($i=0; $i<$length; $i++) {
        $c = ord($x[$i]);
        if ( $c > 126 ) {
            if      ( $c > 244 ) { return false; }
            else if ( $c > 239 ) { $bytes = 4;   }
            else if ( $c > 223 ) { $bytes = 3;   }
            else if ( $c > 193 ) { $bytes = 2;   }
            else                 { return false; }
            if ( $i + $bytes > $length ) { return false; }
            while ( $bytes > 1 ) {
                $i++;
                $b = ord($x[$i]);
                if ( $b < 128 or $b > 191 ) { return false; }
                $bytes--;
            }
        } else if ( $c < 32 and
                    $c != 9 and
                    $c != 10 and
                    $c != 13
                    ) {
            return false;
        }
    }
    return true;
}
?>

Hammerite fucked around with this message at 01:01 on Jun 16, 2010

Trinitrotoluene
Dec 25, 2004

I am having issues getting a function to work that parses some bbcode into HTML.

The BBcode:

code:
[url="http://www.google.co.uk"]Google[/url]
My PHP:

code:
function parseurl ($string) {
$search = array(
    '@\[url\s*=\s*(.*?)\s*\](.*?)\[\/url\]@si'
);
$replace = array(
    '<a href="\\1">\\2</a>'
);
return preg_replace($search , $replace, $string);
}
What is happening in the rendered HTML:

code:
<a href="%22http://www.google.co.uk%22">Google</a><br>
I know %22 is ASCII for " but why the hell is it invading my url and malforming it!!

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Trinitrotoluene posted:

I know %22 is ASCII for " but why the hell is it invading my url and malforming it!!

I do not know why the " is being turned into %22. I do know that you could stop it being part of the matched subexpression by modifying the "search" regex to

'@\[url\s*=\s*"?([^\]\s]*?)"?\s*\](.*?)\[\/url\]@si'

Unfortunately that means that e.g.
code:
[url="http://www.google.co.uk]Google[/url]
(with only a beginning quote or only an ending quote) will be regarded as a valid BBCode URL tag. I am not good enough at regular expressions to help further.

Adbot
ADBOT LOVES YOU

Peanut and the Gang
Aug 24, 2009

by exmarx

Hammerite posted:

Unfortunately that means that e.g.
code:
[url="http://www.google.co.uk]Google[/url]
(with only a beginning quote or only an ending quote) will be regarded as a valid BBCode URL tag. I am not good enough at regular expressions to help further.

You can use a match and a backreference to allow double quotes, single quotes, or no quotes:

code:
function parseurl ($string) {
$search = array(
    '@\[url\s*=([\'"]?)\s*(.*?)\s*\\1\](.*?)\[\/url\]@si'
);
$replace = array(
    '<a href="\\2">\\3</a>'
);
return preg_replace($search , $replace, $string);
}
The also means you'll need to increase the numbers in the references for the replace strings.

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply