markerstore
Dec 5, 2003
Canny!

Nevett posted:

This is in C#, in an ASP.NET project I've inherited.

Apart from missing a "break", given a sane implementation of substring and string comparison (e.g. substring doesn't allocate a new string), this isn't really much different from the normal approach, is it?


Nevett
Aug 31, 2001

Well, here's the method that uses it:

code:
	public Boolean CheckDate(String strDate)
	{
		Boolean bValid = false;
		//must match all conditions to be true
		if (strDate.Trim().Length == 10)
		{
			if (strDate.Split('/').Length == 3)
			{
				String[] oDateParts = strDate.Split('/');
				if (oDateParts[0].Length == 2 && oDateParts[1].Length == 2 && oDateParts[2].Length == 4)
				{
					if (IsNumeric(oDateParts[0]) && IsNumeric(oDateParts[1]) && IsNumeric(oDateParts[2]))
					{
						if (Convert.ToInt32(oDateParts[0]) < 32 && Convert.ToInt32(oDateParts[1]) < 13)
						{
							bValid = true;
						}
					}
				}
			}
		}
		return bValid;
	}
:stare:

SirViver
Oct 22, 2008

markerstore posted:

Apart from missing a "break", given a sane implementation of substring and string comparison (e.g. substring doesn't allocate a new string), this isn't really much different from the normal approach, is it?
Substring does allocate a new string. You can read the characters directly from the string, though, making substring completely unnecessary for reading single characters. That said, how would you implement substring without allocating a new string?

Disregarding the stupidity of the actual usage scenario, these would've been saner implementations in my opinion:
code:
public bool HasNumericDigitsOnly(string s)
{
    foreach(char c in s)
    {
        if (c < '0' || c > '9')
            return false;
    }
    return true;
}

public bool IsIntegerParsable(string s)
{
    int tmp;
    return Int32.TryParse(s, out tmp);
}
All that is completely irrelevant considering it's used for broken date parsing. Feb 31st my rear end.
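For anyone following along in another language, here is a rough Python 3 sketch of SirViver's two helpers. The snake_case names are just translations and do not come from the original project. The sketch makes the difference between the helpers concrete: one accepts only the characters 0-9, and the other accepts whatever the integer parser accepts.

```python
def has_numeric_digits_only(s: str) -> bool:
    """Mirror of HasNumericDigitsOnly: every character must be ASCII '0'-'9'."""
    for c in s:
        if c < "0" or c > "9":
            return False  # bail out on the first non-digit; no flag variable needed
    return True


def is_integer_parsable(s: str) -> bool:
    """Rough analogue of Int32.TryParse: does int() accept the string?"""
    try:
        int(s)
    except ValueError:
        return False
    return True
```

With these helpers, "-3", " 42 " and "+7" are integer-parsable but not digits-only. The empty string counts as digits-only because the loop never runs, just as with the C# foreach. Python's int() also accepts non-ASCII decimal digits such as "٣", and unlike Int32 it has no overflow limit. Treat it as an analogy rather than a faithful port of TryParse.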

This is how I'd have done it to begin with:
code:
public bool IsValidDate(string dateString)
{
    DateTime date;
    return DateTime.TryParseExact(dateString, "dd/MM/yyyy",
        CultureInfo.InvariantCulture, DateTimeStyles.None, out date); 
}
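A comparable sketch in Python 3, assuming the same day/month/year order, relies on datetime.strptime for the calendar check. There is one wrinkle: strptime's %d and %m also accept single digits. To mimic the exact dd/MM/yyyy format, the sketch runs a fixed-width shape check first. The names is_valid_date and _DD_MM_YYYY are illustrative.

```python
import re
from datetime import datetime

# Fixed-width shape: two digits, slash, two digits, slash, four digits.
# [0-9] rather than \d, because Python's \d also matches non-ASCII digits.
_DD_MM_YYYY = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def is_valid_date(date_string: str) -> bool:
    """Rough analogue of DateTime.TryParseExact(s, "dd/MM/yyyy", ...)."""
    if _DD_MM_YYYY.fullmatch(date_string) is None:
        return False
    try:
        # strptime does the calendar check: month lengths and leap years.
        datetime.strptime(date_string, "%d/%m/%Y")
    except ValueError:
        return False
    return True
```

So "29/02/2012" passes, while "31/02/2010", "29/02/2011" and "1/2/2010" are all rejected.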

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Regexes are actually a pretty good idea for this simple validation.

Honestly, return $str =~ /^\d+$/ is a much clearer and more legible way of expressing "this should consist solely of digits", and return $str =~ m!^(\d{2})/(\d{2})/\d{4}$! && int($1) < 32 && int($2) < 13 is a much better way of expressing that CheckDate thingy.

They're still simpler and clearer even after translating them to a language that isn't so straightforward with regards to regex matching. Of course, TryParse is really the better way of handling it. You can even use the parsed result as an out parameter, since you know the calling code is probably immediately going to parse it once you tell it it's good.
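Jabor's two expressions translate to Python 3 roughly as follows. The function names are made up for the example. fullmatch anchors the whole string, which is slightly stricter than ^...$ because $ also matches before a trailing newline. re.ASCII restricts \d to 0-9. The date version faithfully copies CheckDate's logic, so it still happily accepts 31/02/2010, which is why a real date parser is the better tool.

```python
import re

# Digits-only test; re.ASCII keeps \d from matching other Unicode digits.
_DIGITS_ONLY = re.compile(r"\d+", re.ASCII)

# CheckDate equivalent: capture day and month, then range-check them.
_SLASH_DATE = re.compile(r"(\d{2})/(\d{2})/\d{4}", re.ASCII)


def is_all_digits(s: str) -> bool:
    return _DIGITS_ONLY.fullmatch(s) is not None


def check_date(s: str) -> bool:
    m = _SLASH_DATE.fullmatch(s)
    return m is not None and int(m.group(1)) < 32 and int(m.group(2)) < 13
```

As expected, check_date("31/02/2010") returns True, just like the original.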

zeekner
Jul 14, 2007

SirViver posted:

Substring does allocate a new string. You can read the characters directly from the string, though, making substring completely unnecessary for reading single characters. That said, how would you implement substring without allocating a new string?

...

He may be referring to Java, where substring uses the parent's char array. http://www.javamex.com/tutorials/memory/string_memory_usage.shtml

Lysandus
Jun 21, 2010

Nevett posted:

This is in C#, in an ASP.NET project I've inherited.

code:
	public Boolean IsNumeric(String strString)
	{
		Boolean bValid = true;
		for (Int32 nI = 0; nI < strString.Length; nI++)
		{
			String strChar = strString.Substring(nI, 1);
			if (strChar != "0" && strChar != "1" && strChar != "2"
			 && strChar != "3" && strChar != "4" && strChar != "5"
			 && strChar != "6" && strChar != "7" && strChar != "8"
			 && strChar != "9")
			{
				bValid = false;
			}
		}
		return bValid;
	}


Run this thread through there and see how long it takes.

SirViver
Oct 22, 2008

Geekner posted:

He may be referring to Java, where substring uses the parent's char array. http://www.javamex.com/tutorials/memory/string_memory_usage.shtml
Ah okay, makes sense for languages with immutable strings. Though couldn't that lead to cases where a large string is unnecessarily kept in memory because a "sub-string" is still referencing it? That could be worked around, though, I guess.

Jabor posted:

Regexes are actually a pretty good idea for this simple validation.

Honestly, return $str =~ /^\d+$/ is a much clearer and more legible way of expressing "this should consist solely of digits", and return $str =~ m!^(\d{2})/(\d{2})/\d{4}$! && int($1) < 32 && int($2) < 13 is a much better way of expressing that CheckDate thingy.
Maybe you're right, but to be honest, personally I avoid regexes as much as possible. Unless you're well versed in regex syntax and use them constantly they tend to become maintenance nightmares quickly. For people who learn regex only to solve a problem and immediately forget how it works afterwards (like me) they also end up completely indecipherable, regardless of how simple. But yes, for very simple cases that are unlikely to need correction and if using static precompiled ones (for better performance) they should be acceptable.

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

markerstore posted:

Apart from missing a "break", given a sane implementation of substring and string comparison (e.g. substring doesn't allocate a new string), this isn't really much different from the normal approach, is it?
Well, only roughly, even if you limit yourself to nonnegative integers (since the function will report that neither "-3" nor "1.3" is numeric) :stare:

Though SirViver is almost surely on the mark: since ASCII 0-9 are contiguous, the normal approach is to check something like
if (a[i] >= '0' && a[i] <= '9') { ... }. But I disagree on the regex point; I mean, /^-?\d+(\.\d+)?$/ isn't the goriest regex if you wanted to do it right, anyway.
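Both of Dijkstracula's checks are easy to try out in Python 3. The sketch below uses hypothetical names. It calls fullmatch so that the signed-decimal pattern must cover the entire string, instead of matching the digits inside something like "abc12". It also spells digits as [0-9] to stay ASCII-only.

```python
import re

# Optional minus sign, one or more digits, optional fractional part.
_SIGNED_DECIMAL = re.compile(r"-?[0-9]+(\.[0-9]+)?")


def is_signed_decimal(s: str) -> bool:
    # fullmatch: the pattern has to account for every character of s.
    return _SIGNED_DECIMAL.fullmatch(s) is not None


def is_ascii_digit(ch: str) -> bool:
    # The contiguous-range check: '0'..'9' are consecutive code points.
    return "0" <= ch <= "9"
```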

b0lt
Apr 29, 2005

SirViver posted:

Substring does allocate a new string. You can read the characters directly from the string, though, making substring completely unnecessary for reading single characters. That said, how would you implement substring without allocating a new string?

Strings are immutable in C#, like in Java; why would it allocate a new string?

king_kilr
May 25, 2007

b0lt posted:

Strings are immutable in C#, like in Java; why would it allocate a new string?

Because otherwise you need to keep a ref to the old string, which depending on the relative sizes of the strings could be bad.
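Python's str slicing always copies, so it can't show the sharing directly. A memoryview over bytes, however, behaves like the Java substring scheme zeekner describes, and it demonstrates king_kilr's point. This is only an analogy; it is not how the .NET or Java string classes are implemented.

```python
# A large buffer standing in for the "parent" string (10 MB of zero bytes).
big = bytes(10_000_000)

# Slicing bytes copies: `copied` is an independent 10-byte object.
copied = big[100:110]

# Slicing a memoryview does not copy: `view` points into big's buffer.
view = memoryview(big)[100:110]

print(len(copied), len(view))  # 10 10
print(view.obj is big)         # True: the view holds a reference to all of `big`

# While `view` is alive, all 10 MB stay alive. Copying out the few bytes you
# need and releasing the view lets `big` be freed once nothing else uses it.
small = bytes(view)
view.release()
```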

Flobbster
Feb 17, 2005

"Cadet Kirk, after the way you cheated on the Kobayashi Maru test I oughta punch you in tha face!"

SirViver posted:

Maybe you're right, but to be honest, personally I avoid regexes as much as possible. Unless you're well versed in regex syntax and use them constantly they tend to become maintenance nightmares quickly. For people who learn regex only to solve a problem and immediately forget how it works afterwards (like me) they also end up completely indecipherable, regardless of how simple.

Then do something like this:

code:
public bool IsValidDate(string str)
{
    // Regex.IsMatch lives in System.Text.RegularExpressions; the @"" verbatim
    // string stops C# from treating \d as an (invalid) escape sequence.
    return Regex.IsMatch(str, @"^\d{1,2}/\d{1,2}/\d{4}$");
}
Put the test in a separate function with a descriptive name, and put the regex inside it. That way the function name tells you what's going on, even if you forget what the regex does. Not being able to keep the regex syntax in your brain isn't an excuse to avoid using them when they're the best tool for the job.

Of course, in the case of dates, they're not the best tool for the job, I was just using this as an example. Dates should always be parsed/tested using whatever calendar facilities your framework provides.
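In Python 3, Flobbster's suggestion might look like the sketch below. The pattern is compiled once at module level and hidden behind a descriptive function name. Both names are hypothetical. Python's re module already caches recently used patterns, so the module-level compile is mostly about readability here.

```python
import re

# Compiled once at import time; [0-9] keeps it to ASCII digits.
_LOOSE_SLASH_DATE = re.compile(r"[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}")


def looks_like_slash_date(s: str) -> bool:
    """True if s has the d/m/yyyy or dd/mm/yyyy *shape*.

    This says nothing about whether the date exists on the calendar;
    that job belongs to the date library.
    """
    return _LOOSE_SLASH_DATE.fullmatch(s) is not None
```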

Lexical Unit
Sep 16, 2003

code:
// foo.h:

#ifndef FOO_H
#define FOO_H

struct foo
{
  foo(int id);
  static void bar(int id);
  static void baz(int id);
  manager m;  // manager is declared elsewhere (not shown)

  // other stuff ...
};

static foo* HACK;

#endif


// foo.cc:

#include "foo.h"

foo::foo(int id)
{
  HACK = this;
  m.register_event (bar, 3);
  m.register_event (baz, 4);
}

void foo::bar(int id)
{
  // uses HACK
}

void foo::baz(int id)
{
  // uses HACK
}
Note that it's completely possible and straightforward to use non-static methods for events.
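For comparison, here is the non-static version sketched in Python 3. A bound method such as self.bar already carries its instance, so no HACK global is needed. Manager, register_event and fire are hypothetical stand-ins for the snippet's manager. In C++, the usual equivalent would be a lambda that captures this, stored in a std::function.

```python
from typing import Callable, Dict, List


class Manager:
    """Hypothetical stand-in for `manager`: maps event ids to callbacks."""

    def __init__(self) -> None:
        self._handlers: Dict[int, List[Callable[[int], None]]] = {}

    def register_event(self, handler: Callable[[int], None], event_id: int) -> None:
        self._handlers.setdefault(event_id, []).append(handler)

    def fire(self, event_id: int) -> None:
        for handler in self._handlers.get(event_id, []):
            handler(event_id)


class Foo:
    def __init__(self) -> None:
        self.m = Manager()
        self.seen: List[str] = []
        # Bound methods remember `self`, so each Foo gets its own callbacks.
        self.m.register_event(self.bar, 3)
        self.m.register_event(self.baz, 4)

    def bar(self, event_id: int) -> None:
        self.seen.append(f"bar({event_id})")

    def baz(self, event_id: int) -> None:
        self.seen.append(f"baz({event_id})")
```

Creating a second Foo doesn't disturb the first one's handlers, which is exactly what a single static HACK pointer gets wrong.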

No Pants
Dec 10, 2000

SirViver posted:

Substring does allocate a new string. You can read the characters directly from the string, though, making substring completely unnecessary for reading single characters. That said, how would you implement substring without allocating a new string?

Disregarding the stupidity of the actual usage scenario, these would've been saner implementations in my opinion:
That the first is implemented at all is a bad thing, since the framework provides Char.IsNumber(Char).

Darth Nemesis
Nov 11, 2003

I <3 majins,
omg disgaea fag

No Pants posted:

Char.IsNumber(Char)
That's going to match things like ⅞ and ௰. Do you really want it to accept all of these?

No Pants
Dec 10, 2000

Darth Nemesis posted:

That's going to match things like ⅞ and ௰. Do you really want it to accept all of these?
Fine. ch <= '\x00ff' && char.IsDigit(ch), then.

Edit: That actually makes things worse. Carry on. :(

No Pants fucked around with this message at 21:49 on Aug 31, 2010

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Lexical Unit posted:

Note that it's completely possible and straightforward to use non-static methods for events.

You tend to see stuff like this a lot when someone couldn't figure out the syntax for taking a member pointer. Which is not completely unreasonable.

Smugdog Millionaire
Sep 14, 2002

8) Blame Icefrog

No Pants posted:

Fine. ch <= '\x00ff' && char.IsDigit(ch), then.

Edit: That actually makes things worse. Carry on. :(

http://dotnetpad.net/ViewPaste/v5vLYwZDTUStflE3PsCHbA :confused:

Darth Nemesis
Nov 11, 2003

I <3 majins,
omg disgaea fag
IsDigit is better, but still not restrictive enough:

http://dotnetpad.net/ViewPaste/Lp6md7ZXDkmeyXlb-uNjmA
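Python 3's str predicates draw a similar set of lines. They are loosely comparable to Char.IsNumber and Char.IsDigit, but they are based on Unicode numeric properties rather than exactly the same rules as .NET. The loop below prints each check for a few characters like the ones in this discussion.

```python
import unicodedata

samples = ["7", "\u0663", "\u00b2", "\u215e", "\u0bf0"]

for ch in samples:
    print(
        f"{unicodedata.name(ch):30}",
        "isnumeric:", ch.isnumeric(),
        "isdigit:", ch.isdigit(),
        "isdecimal:", ch.isdecimal(),
        "ascii 0-9:", "0" <= ch <= "9",
    )
```

The results: ⅞ and ௰ are numeric but not digits, ² is a digit but not decimal, and ٣ passes every check except the ASCII range check. Only the explicit '0'-'9' comparison accepts nothing but ASCII digits.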

Spectral Elvis
Jul 23, 2007

This, this here, this is what a coding horror looks like. Macros and functions renamed, and several hundred lines of sub-horror snipped out (largely to protect certain secrets, and partly to actually make it a little clearer, if there is such a thing).

This is a chunk of functionality programmed entirely by macros. The general pattern is (and there are a number of different implementation files):

code:
#define CODE_TO_EXECUTE_ON_SUCCESS \
  ... blah blah blah
#define CODE_TO_EXECUTE_ON_FAILURE \
  ... blah blah blah
#include "that_file_you_want.c"
That may seem like a weird thing to do to begin with, and you'd be right: it is downright ridiculous. The real horror, however, is in the implementations. Here, for example, is a nice piece of 'client' code.

code:
#ifndef SOME_BEHAVIOUR
int CuriouslyNamedFunction(char *prod,void (*err_func)(char *msg),void (*okay_func)())
#endif
{
... 
200 lines of awkward pre-amble full of macro spaghetti.
...
#ifdef SOME_OBSCURE_BEHAVIOUR1
#define DO_SOMETHING_MACRO_ALL \
if(!strncmp(xxxid, "magic_string" MAJOR_VERSION "_" XXX_ARCH ";",sizeof(XXX_ARCH)-1+11)) found=1;

   found=0;
#include "another_c_module.c"
   if(!found) {
      syslog(LOG_CRIT, "This box is not a XXXX.",namebuf);
      exit(1);
   }

... snip  150 lines ...

#else /* SOME_OBSCURE_BEHAVIOUR1 */
#define DO_SOMETHING_MACRO_ALL \
{ \
  int ii,imv; \
  CONFUSING_MACRO(imv,xxx); \
  for(ii=0;mangleData[ii].mv;++ii) \
    if(mangleData[ii].mv == imv) \
      break; \
  if(mangleData[ii].sem < 0) { \
    p = strrchr(xxx,'-'); \
    if(!p) { \
      syslog(LOG_CRIT,"something bad" ERROR3); \
      exit(1); \
    } \
    if((p2 = strchr(p,';')) && ((time_t)time(0) > (time_t)atoi(p2+1))) { \
      syslog(LOG_ERR, "something bad"); \
    } else if(!(p2 = strchr(xxx,'.')) || p2[1]!='r' || p2[2]!=*MAJOR_VERSION) { \
      ;                         /* Wrong xxxx version */ \
    } else { \
      imv = atoi(p+1); \
      imv = atoi(p+1); \
      ii = mangleData[ii].sem; \
      ANOTHER_MACRO(-ii,imv); \
    } \
  } \
}
#define ANOTHER_MACRO(xxxv,xxxnum) \
{ \
  if(xxxv == 1) { \
    for(mv=0,i=2;i<SOME_MAGIC_NUMBER;++i) { \
      un.array[i] += xxxnum; \
    } \
    un.array[1] += xxxnum; \
  } else { \
    un.array[xxxv] += xxxnum; \
    un.array[1] += xxxnum; \
  } \
}
#include "another_c_module.c"

... snip 100 ...

#define DO_SOMETHING_MACRO_ALL \
{ \
  int ii,imv; \
  CONFUSING_MACRO(imv,xxx); \
  for(ii=0;mangleData[ii].mv;++ii) \
    if(mangleData[ii].mv == imv) \
      break; \
  if(mangleData[ii].sem < 0) { \
    if((p = strchr(xxx,';')) && p[1]=='i') \
      crtMode=1; \
    p = strrchr(xxx,'-'); \
    if(!p) { \
      syslog(LOG_CRIT,"something else bad... this feels suspiciously familiar" ERROR3); \
      exit(1); \
    } \
    if((p2 = strchr(p,';')) && ((time_t)time(0) > (time_t)atoi(p2+1))) { \
      syslog(LOG_ERR,"something else bad"); \
    } else if(!(p2 = strchr(xxx,'.')) || p2[1]!='r' || p2[2]!=*MAJOR_VERSION) { \
      ;                         /* Wrong xxxx version */ \
    } else { \
      imv = atoi(p+1); \
      ii = mangleData[ii].sem; \
      ANOTHER_MACRO(-ii,imv); \
    } \
  } \
}
#undef ANOTHER_MACRO
#define ANOTHER_MACRO(xxxv,xxxnum) \
{ \
  if(xxxv == 1) { \
    for(mv=0,i=2;i<SOME_MAGIC_NUMBER;++i) { \
      xsems[i] += xxxnum; \
    } \
    xsems[1] += xxxnum; \
  } else { \
    xsems[xxxv] += xxxnum; \
    xsems[1] += xxxnum; \
  } \
}

#define lblnext lblnext2
#define done_all done_all2
#include "another_c_module.c"

... more horror
}
While you're digesting that, and some of the beautiful sub-horrors within, remember this is just the client code. The implementation code is truly a thing of hideous beauty.

And there's bonus horror! A lot of this code is handling semaphores badly. The implementation code handles semaphores badly. Debugging this kind of code is near enough impossible.

Oh, in case you're wondering. Yes, there is a reason it is written this way. And, no, it isn't a good reason.
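The client function's own signature already takes err_func and okay_func pointers. That hints at the saner design: pass the success and failure behaviour in as functions, instead of #define-ing code and #include-ing a .c file around it. A minimal Python 3 sketch of that shape follows. check_product and its prefix test are hypothetical and say nothing about what the real code checks.

```python
from typing import Callable


def check_product(prod: str,
                  on_error: Callable[[str], None],
                  on_ok: Callable[[], None]) -> bool:
    """Hypothetical check: caller-specific behaviour arrives as arguments."""
    if not prod.startswith("magic_string"):
        on_error(f"{prod!r} does not match")
        return False
    on_ok()
    return True
```

Each caller supplies its own handlers (for example errors.append and a small lambda), and the checking logic exists exactly once.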

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



SirViver posted:

Maybe you're right, but to be honest, personally I avoid regexes as much as possible. Unless you're well versed in regex syntax and use them constantly they tend to become maintenance nightmares quickly. For people who learn regex only to solve a problem and immediately forget how it works afterwards (like me) they also end up completely indecipherable, regardless of how simple. But yes, for very simple cases that are unlikely to need correction and if using static precompiled ones (for better performance) they should be acceptable.

My regex-allergic coworkers will write hundreds of lines of tedious, impenetrable if soup to grab some information from a coded string and I'll come along and make a regular expression that's less fragile* and does the same job - often more. Of course, it depends on what you do all day, but they're super loving helpful if you get the hang of them.

*As in, written so they won't break if there's an extra character anywhere in the string. This is important here because when their spaghetti code can't handle a tiny variation in the input, it costs someone else (who is already really loving busy) time to sort it out and if that happens every day, then someone is wasting tons of time dealing with it.

Spectral Elvis
Jul 23, 2007

I figured it would be unfair not to give an example of one implementation.

This is a genuine reduction of the C file included 3 times in the 'client' example.

code:
#define HAPPY_FUN_MACRO(buf,n) { if(((unsigned char)buf[n]) != \
  ((HAPPY_FUN_MAGIC2>>((n-4)*8)) & 0xff)) XXX_ERROR("Byte " #n " does not match in HAPPY_FUN_MACRO?!"); }

#define DOING_MACRO0(buf) HAPPY_FUN_MACRO(buf,0)
...
#define DOING_MACRO7(buf) HAPPY_FUN_MACRO(buf,7)


#ifndef SOMETHING_IVE_NEVER_UNDERSTOOD
int main() {
#endif

  ... snip ...

#include "yet_another_c_file_full_of_this_kind_of_crap.c"

#if defined(SOMETHING_WERE_INTERESTED_IN) && defined(DO_SOMETHING_PRE_ANOTHER_THING)
    if(!strcmp(SOMETHING_WERE_INTERESTED_IN,(char *)(fbuffer+12))) {
      DO_SOMETHING_PRE_ANOTHER_THING;
    }
#endif

  ... snip ...

    if(xxxbuf[0]=='@') {
#ifdef DO_SOMETHING_MACRO_IF_NOT_THE_OTHER_ONE
      {
        DO_SOMETHING_MACRO_IF_NOT_THE_OTHER_ONE;
      }
#endif
#ifndef DO_SOMETHING_ELSE_MACRO_I_DONT_REMEMBER
      continue;         
#endif
    }

    DOING_MACRO0(xxxbuf,base);
    DOING_MACRO3(xxxbuf,base);
    DOING_MACRO5(xxxbuf,base);
    DOING_MACRO7(xxxbuf,base);
#ifdef SORT_OF_DEBUGGING
    DOING_MACRO1(xxxbuf,base);
    DOING_MACRO2(xxxbuf,base);
    DOING_MACRO4(xxxbuf,base);
    DOING_MACRO6(xxxbuf,base);
#endif

#if defined(SOMETHING_WERE_INTERESTED_IN) && defined(DO_SOMETHING_POST_ANOTHER_THING)
    if(!strcmp(SOMETHING_WERE_INTERESTED_IN,(char *)(fbuffer+12))) {
      DO_SOMETHING_POST_ANOTHER_THING;
    }
#endif
#ifdef DO_SOMETHING_MACRO_ALL
    {
      DO_SOMETHING_MACRO_ALL;
    }
#endif
 lblnext:
    ;
#ifndef NOT_SURE_ABOUT_THIS_MACRO
  ...
#endif
 done_all:
  ;
#else  /* The Windoze version (sigh) */

  ... a few hundred lines of bad things to do with DCOM if you write this kind of code.
#endif
And, yes, this here:

code:
#else  /* The Windoze version (sigh) */
Is a genuine non-ironic comment in the codebase.

SirViver
Oct 22, 2008

Flobbster posted:

Then do something like this:
[...]
Put the test in a separate function with a descriptive name, and put the regex inside it. That way the function name tells you what's going on, even if you forget what the regex does.
It didn't even occur to me to not do that to begin with :). Still, I'm wary of them and don't use them unless the parsing complexity really requires a regex. If the code is ten times longer than a regex but easier to read and maintain for me and more importantly my coworkers, I'll prefer the non-regex solution any day. One thing I've learned during my years is that writing "smart" or compact code at the expense of clarity does not help at all in maintaining it in the long run. Regexes, unless clearly the better solution (I'm not arguing against their use in general, just using them where there is no distinct advantage), make the code look a lot more hostile, even if the parsing that is being done is very simple.

That said, I have to admit that I very rarely run into the need to parse strings anyway. Maybe if I had to do a lot of that my opinion would be different v:shobon:v

Zombywuf
Mar 29, 2008

SirViver posted:

That said, I have to admit that I very rarely run into the need to parse strings anyway. Maybe if I had to do a lot of that my opinion would be different v:shobon:v

Yeah, maybe you'd learn to regex. They are not some deep mystery only comprehensible by the gods, they're the most trivial way of expressing a string pattern.

ErIog
Jul 11, 2001

:nsacloud:

SirViver posted:

It didn't even occur to me to not do that to begin with :). Still, I'm wary of them and don't use them unless the parsing complexity really requires a regex. If the code is ten times longer than a regex but easier to read and maintain for me and more importantly my coworkers, I'll prefer the non-regex solution any day. One thing I've learned during my years is that writing "smart" or compact code at the expense of clarity does not help at all in maintaining it in the long run. Regexes, unless clearly the better solution (I'm not arguing against their use in general, just using them where there is no distinct advantage), make the code look a lot more hostile, even if the parsing that is being done is very simple.

That said, I have to admit that I very rarely run into the need to parse strings anyway. Maybe if I had to do a lot of that my opinion would be different v:shobon:v

Your point makes sense, but I'm not sure it's applicable here. At some point brevity becomes its own simplicity. A single line regex with maybe a comment wrapped in a well-named function that calls the regex is probably a whole lot easier for other programmers to follow than an if..then soup spaghetti which can mask subtle bugs with the validation that will be hard to track down. The limits of the regex are almost always explicitly spelled out in the regex, and a regex is very simple to pull out to test separately if it does become an issue.

It's all about shades of grey here, but most regexes aren't really that complicated. They're handy.

ErIog fucked around with this message at 14:17 on Sep 1, 2010

Mustach
Mar 2, 2003

In this long line, there's been some real strange genes. You've got 'em all, with some extras thrown in.

porkfactor posted:

This is a genuine reduction of the C file included 3 times in the 'client' example.
I think Wes Anderson said that horrible things can have a kind of beauty to them. This is one of those things.

It's like they saw "X-macros" and the original Bourne shell's code and thought "those guys didn't go nearly far enough."

Captain Capacitor
Jan 21, 2008

The code you say?

ErIog posted:

Your point makes sense, but I'm not sure it's applicable here. At some point brevity becomes its own simplicity. A single line regex with maybe a comment wrapped in a well-named function that calls the regex is probably a whole lot easier for other programmers to follow than an if..then soup spaghetti which can mask subtle bugs with the validation that will be hard to track down. The limits of the regex are almost always explicitly spelled out in the regex, and a regex is very simple to pull out to test separately if it does become an issue.

It's all about shades of grey here, but most regexes aren't really that complicated. They're handy.

Regexes rule! I totally built an HTML parsing library using them! :haw:

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope

Zombywuf posted:

Yeah, maybe you'd learn to regex. They are not some deep mystery only comprehensible by the gods, they're the most trivial way of expressing a string pattern.

Yeah, this. You really should learn them, and they're really not that hard. They have applications other than programming too.

trex eaterofcadrs
Jun 17, 2005
My lack of understanding is only exceeded by my lack of concern.

Wheany posted:

Yeah, this. You really should learn them, and they're really not that hard. They have applications other than programming too.

:mmmhmm: Hey baby, s/your pants//g
:nyd: gently caress off, nerd

quiggy
Aug 7, 2010

[in Russian] Oof.


TRex EaterofCars posted:

:mmmhmm: Hey baby, s/your pants//g

Is it bad that this would work on me?

qntm
Jun 17, 2009

quiggy posted:

Is it bad that this would work on me?

The best chat-up lines are the insanely geeky ones because anybody they work on is an insane geek.

NotShadowStar
Sep 20, 2000

porkfactor posted:

Oh, in case you're wondering. Yes, there is a reason it is written this way. And, no, it isn't a good reason.

Please tell me because it's generated from some weird-rear end-language to C parser.

Spectral Elvis
Jul 23, 2007

NotShadowStar posted:

Please tell me because it's generated from some weird-rear end-language to C parser.

Oh, if only. That would even make sense.

Apparently the author thought that writing like this would be an awesome way to prevent someone from reverse engineering the code.

There are times I swear some of the crap I have to work with is some kind of elaborate practical joke.

spinflip
Sep 11, 2001

O_o Helo U

Captain Capacitor posted:

Regexes rule! I totally built an HTML parsing library using them! :haw:

Best stackoverflow post:

quote:

You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. HTML-plus-regexp will liquify the n​erves of the sentient whilst you observe, your psyche withering in the onslaught of horror. 
Rege̿̔̉x-based HTML parsers are the cancer that is killing StackOverflow it is too late it is too late we cannot be saved the trangession of a chi͡ld ensures regex will consume all living tissue (except for HTML which it cannot, as previously prophesied) dear lord help us how can anyone survive this scourge using regex to parse HTML has doomed humanity to an eternity of dread torture and security holes using regex as a tool to process HTML establishes a breach between this world and the dread realm of c͒ͪo͛ͫrrupt entities (like SGML entities, but more corrupt) a mere glimpse of the world of reg​ex parsers for HTML will ins​tantly transport a programmer's consciousness into a world of ceaseless screaming, he comes, the pestilent slithy regex-infection wil​l devour your HT​ML parser, application and existence for all time like Visual Basic only worse he comes he comes do not fi​ght he com̡e̶s, ̕h̵i​s un̨ho͞ly radiańcé destro҉ying all enli̍̈́̂̈́ghtenment, HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo​͟ur eye͢s̸ ̛l̕ik͏e liq​uid pain, the song of re̸gular exp​ression parsing will exti​nguish the voices of mor​tal man from the sp​here I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t​he final snuffing of the lie​s of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST the pon̷y he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
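The linked question asks how to find opening tags while skipping XHTML self-closing ones. For that task, Python's standard library already ships a tolerant HTML tokenizer, html.parser.HTMLParser. The sketch below uses hypothetical class and function names. It records start tags and deliberately ignores the handle_startendtag callback, which fires for tags ending in />.

```python
from html.parser import HTMLParser
from typing import List


class OpenTagCollector(HTMLParser):
    """Collect names of opening tags, skipping self-closing ones like <br/>."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for <br/>-style tags. The default forwards to handle_starttag
        # and handle_endtag; overriding it keeps them out of open_tags.
        pass


def open_tags(markup: str) -> List[str]:
    parser = OpenTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.open_tags
```

A plain <br> without the slash is still reported as a start tag, because HTMLParser only tokenizes; it does not apply HTML's void-element rules.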

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope

TRex EaterofCars posted:

:mmmhmm: Hey baby, s/your pants//g
:nyd: gently caress off, nerd

Well, I couldn't actually think of that many examples, and even those are close to programming.

Anyway: learn regular expressions, they're really loving useful and not hard at all.

(try http://www.weitz.de/regex-coach/ )

Captain Capacitor
Jan 21, 2008

The code you say?

I was hoping someone was going to post that.

I love working with regexes in Python, especially taking advantage of the string concatenation.

code:
>>> pattern = (
...     "^"                 # beginning of string
...     "M{0,4}"            # thousands - 0 to 4 M's
...     "(CM|CD|D?C{0,3})"  # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
...                         #            or 500-800 (D, followed by 0 to 3 C's)
...     "(XC|XL|L?X{0,3})"  # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
...                         #        or 50-80 (L, followed by 0 to 3 X's)
...     "(IX|IV|V?I{0,3})"  # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
...                         #        or 5-8 (V, followed by 0 to 3 I's)
...     "$"                 # end of string
... )
>>> print(pattern)
^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$

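A quick way to check the assembled pattern is to run a few candidates through it, as in this Python 3 sketch (the ROMAN name is arbitrary). Every part of the pattern is optional, so the empty string matches too. Reject it separately if that matters.

```python
import re

ROMAN = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

for candidate in ["MCMXCIV", "MMX", "IIII", "IC", ""]:
    print(repr(candidate), bool(ROMAN.match(candidate)))
```

This reports True for "MCMXCIV", "MMX" and "", and False for "IIII" and "IC".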
CHRISTS FOR SALE
Jan 14, 2005

"fuck you and die"
code:
.nav_list_link:hover, .nav_list_link:hover {
Why would it ever make sense to do this ever?!?

:suicide:

Lumpy
Apr 26, 2002

La! La! La! Laaaa!



College Slice

CHRISTS FOR SALE posted:

code:
.nav_list_link:hover, .nav_list_link:hover {
Why would it ever make sense to do this ever?!?

:suicide:

I'm not sure hitting paste twice is really a coding horror, especially since it doesn't affect functionality or introduce any display bugs.

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

Captain Capacitor posted:

I was hoping someone was going to post that.

I love working with regexes in Python, especially taking advantage of the string concatenation.

code:
>>> pattern = (
...     "^"                 # beginning of string
...     "M{0,4}"            # thousands - 0 to 4 M's
...     "(CM|CD|D?C{0,3})"  # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
...                         #            or 500-800 (D, followed by 0 to 3 C's)
...     "(XC|XL|L?X{0,3})"  # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
...                         #        or 50-80 (L, followed by 0 to 3 X's)
...     "(IX|IV|V?I{0,3})"  # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
...                         #        or 5-8 (V, followed by 0 to 3 I's)
...     "$"                 # end of string
... )
>>> print(pattern)
^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$

In Perl, you can do this a bit more easily by using the /x modifier on the regex. Python's re module isn't actually PCRE, but it has the same feature as the re.VERBOSE (re.X) flag, so you don't have to simulate it yourself. Cf.
code:
$re = qr{
    \(
        (?:
	    (?> [^()]+ )	# Non-parens without backtracking
	 |
	    (??{ $re })	        # Group with matching parens
	)*
    \)
}x;
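Python's counterpart of Perl's /x is the re.VERBOSE flag, which ignores unescaped whitespace and allows # comments inside the pattern. Here is the Roman numeral pattern from above, rewritten that way in Python 3. Recursive constructs like Perl's (??{ $re }) have no equivalent in the standard re module. If you need recursion, the third-party regex package supports it via (?R).

```python
import re

ROMAN_VERBOSE = re.compile(r"""
    ^
    M{0,4}              # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds - 900, 400, 0-300, or 500-800
    (XC|XL|L?X{0,3})    # tens - 90, 40, 0-30, or 50-80
    (IX|IV|V?I{0,3})    # ones - 9, 4, 0-3, or 5-8
    $
""", re.VERBOSE)
```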

CHRISTS FOR SALE
Jan 14, 2005

"fuck you and die"

Lumpy posted:

I'm not sure hitting paste twice is really a coding horror, especially since it doesn't affect functionality or introduce any display bugs.
Go delete my post.


Jonnty
Aug 2, 2007

The enemy has become a flaming star!

Captain Capacitor posted:

I was hoping someone was going to post that.

I love working with regexes in Python, especially taking advantage of the string concatenation.

code:
>>> pattern = (
...     "^"                 # beginning of string
...     "M{0,4}"            # thousands - 0 to 4 M's
...     "(CM|CD|D?C{0,3})"  # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
...                         #            or 500-800 (D, followed by 0 to 3 C's)
...     "(XC|XL|L?X{0,3})"  # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
...                         #        or 50-80 (L, followed by 0 to 3 X's)
...     "(IX|IV|V?I{0,3})"  # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
...                         #        or 5-8 (V, followed by 0 to 3 I's)
...     "$"                 # end of string
... )
>>> print(pattern)
^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$

I'm sure you probably already know this, but it's good practice to use r"raw strings" for regexes in Python, so that backslashes are passed through to the regex engine instead of being interpreted as string escapes (not that you needed any here, of course).
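Here is a small Python 3 demonstration of why the r prefix matters. In an ordinary string literal, \b becomes the backspace character before the regex engine ever sees it, so a word-boundary pattern silently matches nothing.

```python
import re

text = "a cat sat"

# Ordinary literal: "\b" is U+0008 (backspace), so re looks for literal
# backspaces around "cat" and finds nothing.
print(re.search("\bcat\b", text))           # None

# Raw literal: the backslash survives and re sees \b, a word boundary.
print(re.search(r"\bcat\b", text).group())  # cat

print(len("\b"), len(r"\b"))                # 1 2
```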
