Hiowf
Jun 28, 2013

We don't do .DOC in my cave.
The people who think they can detect valid phone numbers via regexes are the same people who believe you can't have dashes in your name. This example is a horror, regexes or not.

nielsm
Jun 1, 2009



Reminder that this regex discussion was spawned by this:

Coylter posted:

code:
Private Function IsZipCodeValid(ByRef strZipCode As String) As Boolean
		IsZipCodeValid = False
		strZipCode = strZipCode.Replace(" ", "")
        If strZipCode.Length = 6 Then
            strZipCode = strZipCode.ToLower
            If IsString(strZipCode.Substring(0, 1)) Then
                If IsNumber(strZipCode.Substring(1, 1)) Then
                    If IsString(strZipCode.Substring(2, 1)) Then
                        If IsNumber(strZipCode.Substring(3, 1)) Then
                            If IsString(strZipCode.Substring(4, 1)) Then
                                If IsNumber(strZipCode.Substring(5, 1)) Then
                                    IsZipCodeValid = True
                                End If
                            End If
                        End If
                    End If
                End If
            End If
        End If
	End Function

	Public Function IsNumber(ByVal chrNombre As Char) As Boolean
		If chrNombre >= "0" And chrNombre <= "9" Then
			IsNumber = True
		Else
			IsNumber = False
		End If
	End Function

	Private Function IsString(ByVal chrString As Char) As Boolean
		If chrString >= "a" And chrString <= "z" Then
			IsString = True
		Else
			IsString = False
		End If
	End Function
Well, let's just say this made my day. This, gentlemen, is part of the server code of something important.

An example of a DFA implemented in an ugly way, which would have been more readable as a regular expression.
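For comparison, here is a minimal sketch of the same check as a regular expression, in Python rather than the original VB.NET. The name `is_postal_code_valid` is made up for illustration. It only mirrors what the quoted code accepts (spaces ignored, case ignored, then letter-digit-letter-digit-letter-digit), so it is no more a real postal-code validator than the original was.

```python
import re

# Same shape the nested Ifs accept: letter, digit, letter, digit, letter, digit.
# [0-9] is used instead of \d because \d also matches non-ASCII digits in Python.
_POSTAL_SHAPE = re.compile(r"[A-Za-z][0-9][A-Za-z][0-9][A-Za-z][0-9]")


def is_postal_code_valid(code: str) -> bool:
    """Return True if code has the six-character shape the quoted code checks."""
    # The original strips spaces before looking at the characters; do the same.
    normalized = code.replace(" ", "")
    return _POSTAL_SHAPE.fullmatch(normalized) is not None
```

Unlike the quoted function, this version does not modify the caller's string as a side effect.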

down with slavery
Dec 23, 2013
STOP QUOTING MY POSTS SO PEOPLE THAT AREN'T IDIOTS DON'T HAVE TO READ MY FUCKING TERRIBLE OPINIONS THANKS

Skuto posted:

The people who think they can detect valid phone numbers via regexes are the same people who believe you can't have dashes in your name. This example is a horror, regexes or not.

The truth is that detecting valid phone numbers is a horror in and of itself. Or, better put, "what is a valid phone number" is a question that's way harder to answer than it appears at first glance. See also: recognizing an address.

Polio Vax Scene
Apr 5, 2009



down with slavery posted:

The truth is that detecting valid phone numbers is a horror in and of itself. Or, better put, "what is a valid phone number" is a question that's way harder to answer than it appears at first glance. See also: recognizing an address.

gently caress that sideways, throw the address to Bing's API and if they can't deal then it's their fault not yours.

Soricidus
Oct 21, 2010
freedom-hating statist shill

nielsm posted:

Reminder that this regex discussion was spawned by this:

An example of a DFA implemented in an ugly way, which would have been more readable as a regular expression.
Yeah, I don't think anyone sane disagrees that they make sense for simple cases. But input validation is rarely simple, and often unnecessary.

Soricidus fucked around with this message at 21:45 on Jul 17, 2014

EAT THE EGGS RICOLA
May 29, 2008

down with slavery posted:

The truth is that detecting valid phone numbers is a horror in and of itself. Or, better put, "what is a valid phone number" is a question that's way harder to answer than it appears at first glance. See also: recognizing an address.

Oh hay this is like validating names.

quote:

So, as a public service, I’m going to list assumptions your systems probably make about names. All of these assumptions are wrong. Try to make less of them next time you write a system which touches names.

-People have exactly one canonical full name.
-People have exactly one full name which they go by.
-People have, at this point in time, exactly one canonical full name.
-People have, at this point in time, one full name which they go by.
-People have exactly N names, for any value of N.
-People’s names fit within a certain defined amount of space.
-People’s names do not change.
-People’s names change, but only at a certain enumerated set of events.
-People’s names are written in ASCII.
-People’s names are written in any single character set.
-People’s names are all mapped in Unicode code points.
-People’s names are case sensitive.
-People’s names are case insensitive.
-People’s names sometimes have prefixes or suffixes, but you can safely ignore those.
-People’s names do not contain numbers.
-People’s names are not written in ALL CAPS.
-People’s names are not written in all lower case letters.
-People’s names have an order to them. Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
-People’s first names and last names are, by necessity, different.
-People have last names, family names, or anything else which is shared by folks recognized as their relatives.
-People’s names are globally unique.
-People’s names are almost globally unique.
-Alright alright but surely people’s names are diverse enough such that no million people share the same name.
-My system will never have to deal with names from China.
-Or Japan.
-Or Korea.
-Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes in common use.
-That Klingon Empire thing was a joke, right?
-Confound your cultural relativism! People in my society, at least, agree on one commonly accepted standard for names.
-There exists an algorithm which transforms names and can be reversed losslessly. (Yes, yes, you can do it if your algorithm returns the input. You get a gold star.)
-I can safely assume that this dictionary of bad words contains no people’s names in it.
-People’s names are assigned at birth.
-OK, maybe not at birth, but at least pretty close to birth.
-Alright, alright, within a year or so of birth.
-Five years?
-You’re kidding me, right?
-Two different systems containing data about the same person will use the same name for that person.
-Two different data entry operators, given a person’s name, will by necessity enter bitwise equivalent strings on any single system, if the system is well-designed.
-People whose names break my system are weird outliers. They should have had solid, acceptable names, like 田中太郎.
-People have names.

Steve French
Sep 8, 2003

quote:

There exists an algorithm which transforms names and can be reversed losslessly. (Yes, yes, you can do it if your algorithm returns the input. You get a gold star.)

What the hell is this intended to mean? Because I can think of endless ways to "transform" any data losslessly.

shrughes
Oct 11, 2008

(call/cc call/cc)
How many of your regexes can recognize the following phone number?

code:
same
Guess what folks, sometimes you've just got to validate phone numbers, and also email addresses, and regexes are a great way to do that.

TheresaJayne
Jul 1, 2011

shrughes posted:

How many of your regexes can recognize the following phone number?

code:
same
Guess what folks, sometimes you've just got to validate phone numbers, and also email addresses, and regexes are a great way to do that.

Phone numbering systems change.

For instance it used to be something like
High Wycombe 4882

Then
0494 4882

Then they changed it to

0494 724882

Then

01494 724882

and they now are talking about changing it to something like

020 494 724882

now what next.....
1100 1001 :)

Westie
May 30, 2013



Baboon Simulator

TheresaJayne posted:

Phone numbering systems change.

For instance it used to be something like
High Wycombe 4882

Then
0494 4882

Then they changed it to

0494 724882

Then

01494 724882

and they now are talking about changing it to something like

020 494 724882

now what next.....
1100 1001 :)

To add to the UK numbering poo poo, you can have three digit area codes to six digit area codes.

And unless you're a mobile and/or living in Bournemouth, the last bit of your number can't start with 0 or 1, or something silly like that.

edit: gently caress validating phone numbers, just call them.

Soricidus
Oct 21, 2010
freedom-hating statist shill

shrughes posted:

Guess what folks, sometimes you've just got to validate phone numbers, and also email addresses, and regexes are a great way to do that.
:laffo:

Email address validation with a regex doesn't get you anything. It's very likely you'll reject valid addresses, and likewise it's very likely users will input "valid" addresses that have typos in them. The way to validate an email address is to accept whatever the gently caress the user types and then send it an email with a validation link in it.

Phone numbers? gently caress landlines, send them a validation SMS.

shrughes
Oct 11, 2008

(call/cc call/cc)

Soricidus posted:

Email address validation with a regex doesn't get you anything. It's very likely you'll reject valid addresses, and likewise it's very likely users will input "valid" addresses that have typos in them. The way to validate an email address is to accept whatever the gently caress the user types and then send it an email with a validation link in it.

You're just repeating something that some retard wrote that you read somewhere. If you can't imagine a situation where you don't want to be sending emails, try thinking of ten of them. Then maybe you'll fall short and think of a few.

Hiowf
Jun 28, 2013

We don't do .DOC in my cave.
The idea that you can validate a phone number with a regex remains outright silly and misguided. You will know if it's valid if you CALL it, at which point "same" is going to fail just as hard as something that matches the regex but fails with "this number is no longer connected".

The situation for emails is slightly better because there's at least an RFC specifying the acceptable format. And indeed you could even check if you can resolve the MX for the domain specified. But knowing whether it'll actually work? That comes the moment you actually send the email.

Soricidus
Oct 21, 2010
freedom-hating statist shill

shrughes posted:

You're just repeating something that some retard wrote that you read somewhere. If you can't imagine a situation where you don't want to be sending emails, try thinking of ten of them. Then maybe you'll fall short and think of a few.
Tell us more about these situations where it is incredibly important that an email address be well-formed, but you don't give a gently caress whether or not it actually exists. :allears:

(And even in a situation like that, you still shouldn't be writing a regex. Use a library.)

shrughes
Oct 11, 2008

(call/cc call/cc)

Skuto posted:

The idea that you can validate a phone number with a regex remains outright silly and misguided.

That's weird because I worked on software that validated phone numbers with a regex and it worked fine. Except it didn't work with "same". So we fixed it so that it worked properly with "same". Then it handled "same" properly.

Skuto posted:

You will know if it's valid if you CALL it, at which point "same" is going to fail just as hard as something that matches the regex but fails with "this number is no longer connected".

But then their phone will ring, and you don't want to bother them.

Skuto posted:

The situation for emails is slightly better because there's at least an RFC specifying the acceptable format.

The RFC is irrelevant. There's stuff outside the RFC that you should accept and stuff inside that you should reject.

shrughes
Oct 11, 2008

(call/cc call/cc)

Soricidus posted:

Tell us more about these situations where it is incredibly important that an email address be well-formed, but you don't give a gently caress whether or not it actually exists. :allears:

Let's say you find a document with some text in it and you want to recognize email addresses.

Let's say you've got an address book with some text in it and you want to recognize email addresses and also recognize which are the same email addresses from multiple people's address books.

Soricidus posted:

(And even in a situation like that, you still shouldn't be writing a regex. Use a library.)

No, you should roll it yourself, because when a customer complains that you're not recognizing certain email addresses properly, you'll need to fix it.

P.S. You'll need to recognize that "John Smith"@foo.com is the same person as johnsmith@foo.com. Also, two people with the same phone number aren't the same person if their phone numbers are "same".
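A small sketch of the matching rules described in that P.S., under loud assumptions: `email_match_key`, `same_phone`, and `PLACEHOLDER_PHONES` are names invented here, and folding a quoted local part into an unquoted one is the application rule the post asks for, not something the mail standards promise.

```python
# Assumption: the only placeholder seen so far is the literal text "same".
PLACEHOLDER_PHONES = {"same"}


def email_match_key(address: str) -> str:
    """Key for deciding two address-book entries refer to the same mailbox."""
    local, _, domain = address.rpartition("@")
    # Drop the quotes and spaces of a quoted local part, then ignore case.
    local = local.strip('"').replace(" ", "").lower()
    return f"{local}@{domain.lower()}"


def same_phone(a: str, b: str) -> bool:
    """Two entries share a phone only if neither is a placeholder like 'same'."""
    a_key, b_key = a.strip().lower(), b.strip().lower()
    if a_key in PLACEHOLDER_PHONES or b_key in PLACEHOLDER_PHONES:
        return False
    return a_key == b_key
```

Whether this folding is right depends entirely on the data set, which is the argument for being able to change it when a customer complains.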

shrughes fucked around with this message at 10:34 on Jul 18, 2014

Soricidus
Oct 21, 2010
freedom-hating statist shill

shrughes posted:

Let's say you find a document with some text in it and you want to recognize email addresses.

Let's say you've got an address book with some text in it and you want to recognize email addresses and also recognize which are the same email addresses from multiple people's address books.
So ... we're suddenly going to completely change the subject from input validation to entity extraction and data mining?

shrughes posted:

No, you should roll it yourself, because when a customer complains that you're not recognizing certain email addresses properly, you'll need to fix it.
Then use a library with a license that lets you modify or extend it. You'd have to be an idiot to wilfully reinvent the wheel when there are already dozens of open source and commercial libraries dedicated to the task.

shrughes
Oct 11, 2008

(call/cc call/cc)

Soricidus posted:

Tell us more about these situations where it is incredibly important that an email address be well-formed, but you don't give a gently caress whether or not it actually exists. :allears:

Edit: never mind, the answer is your mom.

shrughes
Oct 11, 2008

(call/cc call/cc)

Soricidus posted:

So ... we're suddenly going to completely change the subject from input validation to entity extraction and data mining?


Uh yeah, these are user inputs and you're validating them? User inputs an email into a computer system... you validate it? What the gently caress do you think we're talking about?

Edit: Pro tip: Validating something means that you find it to be valid. "Is this a valid email address?" Why gee, don't mention all the situations where you can't just email the guy. That's different!

Edit: Seriously, if you can't think of situations where you might want to know if some text is really a realistic email address... learn how to think. You're basically the furry in the alley right now, going full throttle with your lack of imagination. I'll give you some examples to get the ball started. Suppose your hot mother's writing an SMTP server and she asks you for help...

Soricidus posted:

Then use a library with a license that lets you modify or extend it. You'd have to be an idiot to wilfully reinvent the wheel when there are already dozens of open source and commercial libraries dedicated to the task.

Let me let you in on an industry secret...:

Libraries are crap.

shrughes fucked around with this message at 11:09 on Jul 18, 2014

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



shrughes posted:

Uh yeah, these are user inputs and you're validating them? User inputs an email into a computer system... you validate it? What the gently caress do you think we're talking about?

Those are different things. The use cases are sufficiently different that different methods make sense. For input validation, the simple & sure-fire way is to shoot an email to the address as written and include a validation link. For data mining, you'll need to decide how fuzzy your matching should be. Depending on the size/quality of the data set, it could be as simple as grabbing all tokens containing an @-symbol.
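The crude end of that spectrum might look like the sketch below. `tokens_with_at` is a made-up name, and trimming punctuation from the ends of each token is an added assumption, not part of the suggestion above.

```python
def tokens_with_at(text: str) -> list[str]:
    """Return whitespace-separated tokens that contain '@'."""
    candidates = []
    for token in text.split():
        # Trim sentence punctuation and brackets from the ends only.
        token = token.strip(".,;:()<>[]\"'")
        if "@" in token:
            candidates.append(token)
    return candidates
```

This will happily pick up things that are not addresses, which is the fuzziness trade-off: for a small, clean data set that may be good enough.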

Hiowf
Jun 28, 2013

We don't do .DOC in my cave.

shrughes posted:

...with a regex and it worked fine. Except it didn't...

No more needs to be said.

The rest of your post(s) is trying to weasel out of recognizing validity and into recognizing what's reasonably certain to be a specific type of input.

Hiowf
Jun 28, 2013

We don't do .DOC in my cave.

quote:

But then their phone will ring, and you don't want to bother them.

Then why do you care if it's a valid phone number if you're not going to use it anyway?

You don't - that's the whole point. You just care if it's likely to be intended to be a phone number and not something else.

shrughes
Oct 11, 2008

(call/cc call/cc)
You people don't know what the word "valid" means.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

shrughes posted:

Let's say you find a document with some text in it and you want to recognize email addresses.

Let's say you've got an address book with some text in it and you want to recognize email addresses and also recognize which are the same email addresses from multiple people's address books.


No, you should roll it yourself, because when a customer complains that you're not recognizing certain email addresses properly, you'll need to fix it.

P.S. You'll need to recognize that "John Smith"@foo.com is the same person as johnsmith@foo.com. Also, two people with the same phone number aren't the same person if their phone numbers are "same".

code:
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
 \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
 \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
 \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)

xtal
Jan 9, 2011

by Fluffdaddy

Shinku ABOOKEN posted:

No they are not. The "...now you have two problems :v:" meme is garbage.

quote:

A regex is basically a deterministic finite automaton (DFA) and your string is the tape it runs on. As simple as that.

O ok

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Apparently "valid" means "it looks like a phone number to the special snowflake program I wrote", and is completely unrelated to whether or not it is something you could enter into a telephone somewhere in the world and be connected to another telephone system.

Jewel
May 2, 2009

I don't know much about how the internals of phone cable switch stations work but isn't the best way to solve this debate to look at how they work? If you call a number it gets routed based on the area code and such; so does it:

A: Use an internal system to check if an area code/number is valid before routing
B: Try to route regardless and return an "error code" if it's not a valid location to route to

Although yeah shrughes' point on "valid" meaning "technically a possible real phone number" is true. You can't actually confirm anything about a phone number other than "does it fit to a schema" and even then the schemas vary so much that why bother. A phone number might not even belong to anyone/be in service and no coding in the world can tell if that's true or not (short of a system that automatically called the number but that's horrible and still not really a solution).

Emails have the same problem too to a much lesser extent, though I think all emails require a '@' then a '.' with at least one character to all sides of each but I might be wrong. Either way in that case it's best to send a verification mail and to make the user enter it twice to avoid typos and make sure someone/they actually own it, rather than trying to see if it conforms to your "standard".

Hiowf
Jun 28, 2013

We don't do .DOC in my cave.

Jewel posted:

You can't actually confirm anything about a phone number other than "does it fit to a schema" and even then the schemas vary so much that why bother. A phone number might not even belong to anyone/be in service and no coding in the world can tell if that's true or not (short of a system that automatically called the number but that's horrible and still not really a solution).

I agree with this (note that you're just repeating what was already said), but I don't see what it has to do with the sentence just before it.

What do you think "technically a possible real phone number" looks like that makes you believe you can 100% reliably identify it *and* not get false positives?

quote:

Emails have the same problem too to a much lesser extent, though I think all emails require a '@' then a '.' with at least one character to all sides of each but I might be wrong.

No need for the '.'. Try mailing root@localhost.
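If a shape check is wanted at all, about the loosest one that still accepts `root@localhost` is sketched below. `looks_like_email` is a hypothetical name, and passing this check says nothing about whether mail will be delivered.

```python
def looks_like_email(address: str) -> bool:
    """Loose shape check: something, an '@', then something. No dot required."""
    # Split on the last '@' so a quoted local part containing '@' still passes.
    local, at, domain = address.rpartition("@")
    return bool(local) and bool(at) and bool(domain)
```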

Hiowf
Jun 28, 2013

We don't do .DOC in my cave.

quote:

I don't know much about how the internals of phone cable switch stations work but isn't the best way to solve this debate to look at how they work?

Sure, which model from which manufacturer do you want to look at? :-)

If you think this is a tractable problem in the real world then please read "falsehoods programmers believe about names" and "falsehoods programmers believe about time" and imagine you could probably write a "falsehoods programmers believe about phone numbers" and "falsehoods programmers believe about addresses".

Edit:
1) http://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/ (f;b vvvv)
2) https://code.google.com/p/libphonenumber/

Hiowf fucked around with this message at 14:11 on Jul 18, 2014

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

(?s).*

Problem solved.

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



Skuto posted:

Sure, which model from which manufacturer do you want to look at? :-)

If you think this is a tractable problem in the real world then please read "falsehoods programmers believe about names" and "falsehoods programmers believe about time" and imagine you could probably write a "falsehoods programmers believe about phone numbers" and "falsehoods programmers believe about addresses".

http://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/

e: :catstare: "From where the Chinese restaurant used to be, two blocks down, half a block toward the lake, next door to the house where the yellow car is parked, Managua, Nicaragua."

Carthag Tuek fucked around with this message at 14:10 on Jul 18, 2014

canis minor
May 4, 2011

All this talk about regexes and validation reminds me of having an issue registering on kotaku or some other shitsite some time ago, where a phone number was required, and giving a correct one would still trip their JavaScript validation. In the end I removed the bound events through the console, but then, after submitting my data, I got a 503 error, I could not log in, and trying to use reset password triggered another 503. I guess somebody forgot about server-side validation...

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip
You guys really can't imagine a validation scenario for phone numbers where you want to catch as many obvious typos (and instances where an area code was omitted) as possible and calling the number to validate it isn't acceptable? You're drinking somebody's kool-aid.


pre:
Name        ________________

Address     ________________
            ________________
            ________________

Emergency   ________________
contact #

...

Alereon
Feb 6, 2004

Dehumanize yourself and face to Trumpshed
College Slice
:siren:This is a stupid argument:siren: and all you fuckers who have trouble validating phone numbers should just move to somewhere in the North American Numbering Plan Area. Remember that with phone numbers you actually do have access to an authority that says which ones are valid so if you actually care you can just pay a tiny fraction of a cent per query to ask.

Volmarias
Dec 31, 2002

EMAIL... THE INTERNET... SEARCH ENGINES...
Edit: nevermind.

Volmarias fucked around with this message at 14:29 on Jul 18, 2014

Germstore
Oct 17, 2012

A Serious Candidate For a Serious Time
Paying someone whose entire business is knowing this poo poo is really the best way to go. Any reasonable person, even one who knows USPS Pub 28 by heart, would think 'S Ave E' in Chicago is 'South Avenue East,' and they would be wrong.

Hiowf
Jun 28, 2013

We don't do .DOC in my cave.

Alereon posted:

:siren:This is a stupid argument:siren: and all you fuckers who have trouble validating phone numbers should just move to somewhere in the North American Numbering Plan Area.

I think the second part of this sentence invalidates the first. (Note that you have to move your customers too)

quote:

Remember that with phone numbers you actually do have access to an authority that says which ones are valid so if you actually care you can just pay a tiny fraction of a cent per query to ask.

Scales well for international websites. And tell me, does it work for extensions too in the USA?

I'm a bit baffled that even after some discussion here people still think "I'll catch some typos" is a good trade-off against "I'll reject valid input", but I'll shut up now and just continue internally calling you stupid the next time my address, phone number or name gets rejected as invalid (all of which happen or used to happen all the loving time).

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Validation doesn't have to reject. It can be quite useful to simply warn that you selected US for your country but only have 9 digits in your phone number. I see sites periodically that warn if I mistype my email address as @gmali.com, and it's nice.

(There are lots of times that you might want to be able to email someone or call them, but not do it right then. "Phone number at destination" for travel is one, but if you have someone enter home/work/mobile numbers are you going to ring them all? Adding someone to my address book is another; please don't mail my friend to say I did so.)
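A sketch of warn-don't-reject, assuming two tiny lookup tables that are illustrative only. `contact_warnings`, `EXPECTED_DIGITS`, and `DOMAIN_TYPOS` are invented names, and the caller is expected to show the warnings and still accept the form.

```python
# Illustrative tables only; neither is anywhere near complete.
EXPECTED_DIGITS = {"US": 10}  # national number, without the leading country code
DOMAIN_TYPOS = {"gmali.com": "gmail.com"}


def contact_warnings(country: str, phone: str, email: str) -> list[str]:
    """Return human-readable warnings; an empty list means nothing looked odd."""
    warnings = []
    # Count digits only, so spaces, dashes, and brackets never trigger a warning.
    digits = sum(ch.isdigit() for ch in phone)
    expected = EXPECTED_DIGITS.get(country)
    if expected is not None and digits != expected:
        warnings.append(
            f"{country} numbers usually have {expected} digits; this one has {digits}."
        )
    domain = email.rpartition("@")[2].lower()
    suggestion = DOMAIN_TYPOS.get(domain)
    if suggestion:
        warnings.append(f"Did you mean @{suggestion}?")
    return warnings
```

Because the result is only ever a hint, a wrong guess (say, a US number typed with its country code) costs the user a glance instead of a blocked form.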

JawnV6
Jul 4, 2004

So hot ...

Snapchat A Titty posted:

http://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/

e: :catstare: "From where the Chinese restaurant used to be, two blocks down, half a block toward the lake, next door to the house where the yellow car is parked, Managua, Nicaragua."

Yeah I got an address in Costa Rica that started with "From where the big tree used to be," and resigned myself to a cab at that point. The chip facility that processed millions of dollars worth of equipment a year had official stationery that listed its address as "500 meters west of the Firestone plant" because that was visible from the highway. There's generally a local hanging around the landmarks who will dereference the back half in exchange for a conversation.

Hiowf
Jun 28, 2013

We don't do .DOC in my cave.

Subjunctive posted:

Validation doesn't have to reject. It can be quite useful to simply warn that you selected US for your country but only have 9 digits in your phone number. I see sites periodically that warn if I mistype my email address as @gmali.com, and it's nice.

That's a decent solution. I'd guess continuously getting warnings "your name is not valid" is annoying, but not nearly as much as just refusing to post the form entirely. And it also avoids being totally blocked when the form can't accept your address, but when you mutilate it into the shape that the JS expects, you get a rejection because the back-end actually checks an address database and doesn't find a match [1]. I've had this happen.

[1] This is also a stupid idea if there's a remote possibility the database isn't 100% up to date.
