|
The people who think they can detect valid phone numbers via regexes are the same people who believe you can't have dashes in your name. This example is a horror, regexes or not.
|
# ? Jul 17, 2014 19:57 |
|
Reminder that this regex discussion was spawned by this: Coylter posted:
An example of a DFA implemented in an ugly way, which would have been more readable as a regular expression.
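For illustration only: the code being discussed is not reproduced in this thread, so the following is a hedged Python sketch of the same trade-off rather than the original. It assumes a made-up `NNN-NNNN` format, and `matches_by_hand` and `matches_by_regex` are hypothetical names. Both functions accept exactly the same strings; the first spells the state machine out by hand, the second states the pattern as a regular expression.

```python
import re

DIGITS = "0123456789"


def matches_by_hand(s):
    """Hand-rolled state machine for three digits, a dash, four digits."""
    state = 0  # how many characters of the pattern have been accepted so far
    for ch in s:
        if state in (0, 1, 2, 4, 5, 6, 7) and ch in DIGITS:
            state += 1
        elif state == 3 and ch == "-":
            state += 1
        else:
            # Wrong character for this state, or input continues past the end.
            return False
    return state == 8  # accept only if the whole pattern was consumed


def matches_by_regex(s):
    """The same language, stated as a regular expression."""
    return re.fullmatch(r"[0-9]{3}-[0-9]{4}", s) is not None
```

Neither version says anything about whether the number is real; they only agree on what shape of string they accept.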
|
|
# ? Jul 17, 2014 20:56 |
Skuto posted: The people who think they can detect valid phone numbers via regexes are the same people who believe you can't have dashes in your name. This example is a horror, regexes or not.
The truth is that detecting valid phone numbers is a horror in and of itself. Or, better put, "what is a valid phone number" is a question that's way harder to answer than it appears at first glance. See also: recognizing an address
|
|
# ? Jul 17, 2014 21:19 |
down with slavery posted: The truth is that detecting valid phone numbers is a horror in and of itself. Or, better put, "what is a valid phone number" is a question that's way harder to answer than it appears at first glance. See also: recognizing an address
gently caress that sideways, throw the address to Bing's API and if they can't deal then it's their fault not yours.
|
|
# ? Jul 17, 2014 21:32 |
|
nielsm posted:Reminder that this regex discussion was spawned by this: Soricidus fucked around with this message at 21:45 on Jul 17, 2014 |
# ? Jul 17, 2014 21:43 |
|
down with slavery posted: The truth is that detecting valid phone numbers is a horror in and of itself. Or, better put, "what is a valid phone number" is a question that's way harder to answer than it appears at first glance. See also: recognizing an address
Oh hay this is like validating names.
quote: So, as a public service, I’m going to list assumptions your systems probably make about names. All of these assumptions are wrong. Try to make less of them next time you write a system which touches names.
|
# ? Jul 17, 2014 22:20 |
|
quote: There exists an algorithm which transforms names and can be reversed losslessly. (Yes, yes, you can do it if your algorithm returns the input. You get a gold star.)
What the hell is this intended to mean? Because I can think of endless ways to "transform" any data losslessly.
|
# ? Jul 18, 2014 06:53 |
|
How many of your regexes can recognize the following phone number?
code:
|
# ? Jul 18, 2014 07:01 |
|
shrughes posted: How many of your regexes can recognize the following phone number?
Phone numbering systems change. For instance, it used to be something like
High Wycombe 4882
Then 0494 4882
Then they changed it to 0494 724882
Then 01494 724882
and now they are talking about changing it to something like 020 494 724882
now what next.....
1100 1001
|
# ? Jul 18, 2014 07:09 |
|
TheresaJayne posted: Phone numbering systems change.
To add to the UK numbering poo poo, you can have anything from three-digit to six-digit area codes. And unless you're a mobile and/or living in Bournemouth, the last bit of your number can't start with 0 or 1, something silly like that.
edit: gently caress validating phone numbers, just call them.
|
# ? Jul 18, 2014 07:45 |
|
shrughes posted:Guess what folks, sometimes you've just got to validate phone numbers, and also email addresses, and regexes are a great way to do that. Email address validation with a regex doesn't get you anything. It's very likely you'll reject valid addresses, and likewise it's very likely users will input "valid" addresses that have typos in them. The way to validate an email address is to accept whatever the gently caress the user types and then send it an email with a validation link in it. Phone numbers? gently caress landlines, send them a validation SMS.
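The "accept whatever they type, then send a validation link" approach could look roughly like the sketch below. This is a minimal illustration under stated assumptions, not anyone's actual implementation: `start_verification`, `confirm`, the in-memory `pending` dict and the `https://example.com/verify` URL are all hypothetical, and actually sending the message is left out.

```python
import secrets

# token -> address awaiting confirmation.
# Assumption: an in-memory dict stands in for whatever storage a real system uses.
pending = {}


def start_verification(address):
    """Accept the address exactly as typed and return the link to email to it."""
    token = secrets.token_urlsafe(32)  # unguessable, URL-safe token
    pending[token] = address
    # Hypothetical URL; a real site would use its own route.
    return f"https://example.com/verify?token={token}"


def confirm(token):
    """Return the confirmed address, or None for an unknown or already-used token."""
    return pending.pop(token, None)
```

The address only counts as valid once `confirm` returns it, which happens only if someone received the mail and followed the link. No format check is involved at any point.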
|
# ? Jul 18, 2014 09:32 |
|
Soricidus posted:Email address validation with a regex doesn't get you anything. It's very likely you'll reject valid addresses, and likewise it's very likely users will input "valid" addresses that have typos in them. The way to validate an email address is to accept whatever the gently caress the user types and then send it an email with a validation link in it. You're just repeating something that some retard wrote that you read somewhere. If you can't imagine a situation where you don't want to be sending emails, try thinking of ten of them. Then maybe you'll fall short and think of a few.
|
# ? Jul 18, 2014 09:53 |
|
The idea that you can validate a phone number with a regex remains outright silly and misguided. You will know if it's valid if you CALL it, at which point "same" is going to fail just as hard as something that matches the regex but fails with "this number is no longer connected". The situation for emails is slightly better because there's at least an RFC specifying the acceptable format. And indeed you could even check if you can resolve the MX for the domain specified. But knowing whether it'll actually work? Only at the moment you actually send the email.
|
# ? Jul 18, 2014 10:14 |
|
shrughes posted:You're just repeating something that some retard wrote that you read somewhere. If you can't imagine a situation where you don't want to be sending emails, try thinking of ten of them. Then maybe you'll fall short and think of a few. (And even in a situation like that, you still shouldn't be writing a regex. Use a library.)
|
# ? Jul 18, 2014 10:20 |
|
Skuto posted: The idea that you can validate a phone number with a regex remains outright silly and misguided.
That's weird because I worked on software that validated phone numbers with a regex and it worked fine. Except it didn't work with "same". So we fixed it so that it worked properly with "same". Then it handled "same" properly.
Skuto posted: You will know if it's valid if you CALL it, at which point "same" is going to fail just as hard as something that matches the regex but fails with "this number is no longer connected".
But then their phone will ring, and you don't want to bother them.
Skuto posted: The situation for emails is slightly better because there's at least an RFC specifying the acceptable format.
The RFC is irrelevant. There's stuff outside the RFC that you should accept and stuff inside that you should reject.
|
# ? Jul 18, 2014 10:28 |
|
Soricidus posted: Tell us more about these situations where it is incredibly important that an email address be well-formed, but you don't give a gently caress whether or not it actually exists.
Let's say you find a document with some text in it and you want to recognize email addresses. Let's say you've got an address book with some text in it and you want to recognize email addresses and also recognize which are the same email addresses from multiple people's address books.
Soricidus posted: (And even in a situation like that, you still shouldn't be writing a regex. Use a library.)
No, you should roll it yourself, because when a customer complains that you're not recognizing certain email addresses properly, you'll need to fix it.
P.S. You'll need to recognize that "John Smith"@foo.com is the same person as johnsmith@foo.com. Also, two people with the same phone number aren't the same person if their phone numbers are "same".
shrughes fucked around with this message at 10:34 on Jul 18, 2014 |
# ? Jul 18, 2014 10:31 |
|
shrughes posted:Let's say you find a document with some text in it and you want to recognize email addresses. shrughes posted:No, you should roll it yourself, because when a customer complains that you're not recognizing certain email addresses properly, you'll need to fix it.
|
# ? Jul 18, 2014 10:51 |
|
Soricidus posted:Tell us more about these situations where it is incredibly important that an email address be well-formed, but you don't give a gently caress whether or not it actually exists. Edit: never mind, the answer is your mom.
|
# ? Jul 18, 2014 10:56 |
|
Soricidus posted: So ... we're suddenly going to completely change the subject from input validation to entity extraction and data mining?
Uh yeah, these are user inputs and you're validating them? User inputs an email into a computer system... you validate it? What the gently caress do you think we're talking about?
Edit: Pro tip: Validating something means that you find it to be valid. "Is this a valid email address?" Why gee, don't mention all the situations where you can't just email the guy. That's different!
Edit: Seriously, if you can't think of situations where you might want to know if some text is really a realistic email address... learn how to think. You're basically the furry in the alley right now, going full throttle with your lack of imagination. I'll give you some examples to get the ball started. Suppose your hot mother's writing an SMTP server and she asks you for help...
Soricidus posted: Then use a library with a license that lets you modify or extend it. You'd have to be an idiot to wilfully reinvent the wheel when there are already dozens of open source and commercial libraries dedicated to the task.
Let me let you in on an industry secret...: Libraries are crap.
shrughes fucked around with this message at 11:09 on Jul 18, 2014 |
# ? Jul 18, 2014 10:58 |
|
shrughes posted:Uh yeah, these are user inputs and you're validating them? User inputs an email into a computer system... you validate it? What the gently caress do you think we're talking about? Those are different things. The use cases are sufficiently different that different methods make sense. For input validation, the simple & sure-fire way is to shoot an email to the address as written and include a validation link. For data mining, you'll need to decide how fuzzy your matching should be. Depending on the size/quality of the data set, it could be as simple as grabbing all tokens containing an @-symbol.
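For the data-mining case, the "grab all tokens containing an @-symbol" idea might be sketched like this in Python. It is deliberately fuzzy and assumes whitespace-separated text; `candidate_addresses` and the `TRIM` character set are hypothetical choices for illustration, not a recommended standard.

```python
# Punctuation that tends to cling to a token in running prose.
# Assumption: this set is illustrative and should be tuned to the data set.
TRIM = ".,;:()<>[]\"'"


def candidate_addresses(text):
    """Return tokens that contain an '@' with something on both sides of it."""
    candidates = []
    for token in text.split():
        token = token.strip(TRIM)
        # Split on the last '@' so the domain is whatever follows it.
        local, sep, domain = token.rpartition("@")
        if sep and local and domain:
            candidates.append(token)
    return candidates
```

Because it splits on whitespace, this misses quoted local parts that contain spaces and happily returns things that merely look like addresses. Whether that is acceptable depends on the size and quality of the data set.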
|
# ? Jul 18, 2014 12:34 |
|
shrughes posted:...with a regex and it worked fine. Except it didn't... No more needs to be said. The rest of your post(s) is trying to weasel out of recognizing validity and into recognizing what's reasonably certain to be a specific type of input.
|
# ? Jul 18, 2014 12:57 |
|
quote: But then their phone will ring, and you don't want to bother them.
Then why do you care if it's a valid phone number if you're not going to use it anyway? You don't - that's the whole point. You just care if it's likely to be intended to be a phone number and not something else.
|
# ? Jul 18, 2014 13:00 |
|
You people don't know what the word "valid" means.
|
# ? Jul 18, 2014 13:08 |
|
shrughes posted:Let's say you find a document with some text in it and you want to recognize email addresses. code:
|
# ? Jul 18, 2014 13:18 |
|
Shinku ABOOKEN posted:No they are not. The "...now you have two problems " meme is garbage. quote:A regex is basically a deterministic finite automaton (DFA) and your string is the tape it runs on. As simple as that. O ok
|
# ? Jul 18, 2014 13:22 |
|
Apparently "valid" means "it looks like a phone number to the special snowflake program I wrote", and is completely unrelated to whether or not it is something you could enter into a telephone somewhere in the world and be connected to another telephone system.
|
# ? Jul 18, 2014 13:25 |
|
I don't know much about how the internals of phone cable switch stations work but isn't the best way to solve this debate to look at how they work? If you call a number it gets routed based on the area code and such; so does it:
A: Use an internal system to check if an area code/number is valid before routing
B: Try to route regardless and return an "error code" if it's not a valid location to route to
Although yeah shrughes' point on "valid" meaning "technically a possible real phone number" is true. You can't actually confirm anything about a phone number other than "does it fit to a schema" and even then the schemas vary so much that why bother. A phone number might not even belong to anyone/be in service and no coding in the world can tell if that's true or not (short of a system that automatically called the number but that's horrible and still not really a solution).
Emails have the same problem too to a much lesser extent, though I think all emails require a '@' then a '.' with at least one character to all sides of each but I might be wrong. Either way in that case it's best to send a verification mail and to make the user enter it twice to avoid typos and make sure someone/they actually own it, rather than trying to see if it conforms to your "standard".
|
# ? Jul 18, 2014 13:37 |
|
Jewel posted: You can't actually confirm anything about a phone number other than "does it fit to a schema" and even then the schemas vary so much that why bother. A phone number might not even belong to anyone/be in service and no coding in the world can tell if that's true or not (short of a system that automatically called the number but that's horrible and still not really a solution).
I agree with this (note that you're just repeating what was already said), but I don't see what it has to do with the sentence just before it. What do you think "technically a possible real phone number" looks like that makes you believe you can 100% reliably identify it *and* not get false positives?
quote: Emails have the same problem too to a much lesser extent, though I think all emails require a '@' then a '.' with at least one character to all sides of each but I might be wrong.
No need for the '.'. Try mailing root@localhost.
|
# ? Jul 18, 2014 13:53 |
|
quote: I don't know much about how the internals of phone cable switch stations work but isn't the best way to solve this debate to look at how they work?
Sure, which model from which manufacturer do you want to look at? :-)
If you think this is a tractable problem in the real world then please read "falsehoods programmers believe about names" and "falsehoods programmers believe about time" and imagine you could probably write a "falsehoods programmers believe about phone numbers" and "falsehoods programmers believe about addresses".
Edit:
1) http://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/ (f;b vvvv)
2) https://code.google.com/p/libphonenumber/
Hiowf fucked around with this message at 14:11 on Jul 18, 2014 |
# ? Jul 18, 2014 14:02 |
|
(?s).* Problem solved.
|
# ? Jul 18, 2014 14:03 |
|
Skuto posted:Sure, which model from which manufacturer do you want to look at? :-) http://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/ e: "From where the Chinese restaurant used to be, two blocks down, half a block toward the lake, next door to the house where the yellow car is parked, Managua, Nicaragua." Carthag Tuek fucked around with this message at 14:10 on Jul 18, 2014 |
# ? Jul 18, 2014 14:06 |
|
All this talk about regexes and validation reminds me of an issue I had registering on kotaku or some other shitsite some time ago, where a phone number was required, and giving a correct one would still trip their javascript validation. In the end I removed the bound events through the console, but then, after submitting my data, I got a 503 error; I could not log in, and trying to use the password reset triggered another 503. I guess somebody forgot about server-side validation...
|
# ? Jul 18, 2014 14:08 |
|
You guys really can't imagine a validation scenario for phone numbers where you want to catch as many obvious typos (and instances where an area code was omitted) as possible and calling the number to validate it isn't acceptable? You're drinking somebody's kool-aid.
pre:
Name      ________________
Address   ________________
          ________________
          ________________
Emergency ________________
contact #
...
|
# ? Jul 18, 2014 14:22 |
|
This is a stupid argument and all you fuckers who have trouble validating phone numbers should just move to somewhere in the North American Numbering Plan Area. Remember that with phone numbers you actually do have access to an authority that says which ones are valid so if you actually care you can just pay a tiny fraction of a cent per query to ask.
|
# ? Jul 18, 2014 14:27 |
|
Edit: nevermind.
Volmarias fucked around with this message at 14:29 on Jul 18, 2014 |
# ? Jul 18, 2014 14:27 |
|
Paying someone whose entire business is knowing this poo poo is really the best way to go. Any reasonable person, even one who knows USPS Pub 28 by heart, would think 'S Ave E' in Chicago is 'South Avenue East,' and they would be wrong.
|
# ? Jul 18, 2014 15:07 |
|
Alereon posted: This is a stupid argument and all you fuckers who have trouble validating phone numbers should just move to somewhere in the North American Numbering Plan Area.
I think the second part of this sentence invalidates the first. (Note that you have to move your customers too)
quote: Remember that with phone numbers you actually do have access to an authority that says which ones are valid so if you actually care you can just pay a tiny fraction of a cent per query to ask.
Scales well for international websites. And tell me, does it work for extensions too in the USA?
I'm a bit baffled that even after some discussion here people still think "I'll catch some typos" is a good trade-off against "I'll reject valid input", but I'll shut up now and just continue internally calling you stupid the next time my address, phone number or name gets rejected as invalid (all of which happen or used to happen all the loving time).
|
# ? Jul 18, 2014 15:23 |
|
Validation doesn't have to reject. It can be quite useful to simply warn that you selected US for your country but only have 9 digits in your phone number. I see sites periodically that warn if I mistype my email address as @gmali.com, and it's nice. (There are lots of times that you might want to be able to email someone or call them, but not do it right then. "Phone number at destination" for travel is one, but if you have someone enter home/work/mobile numbers are you going to ring them all? Adding someone to my address book is another; please don't mail my friend to say I did so.)
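A warn-instead-of-reject check along those lines could be sketched as below, in Python. The function names, the `LIKELY_DOMAIN_TYPOS` table and the exact wording of the messages are assumptions made up for illustration. The ten-digit expectation for the US reflects the North American Numbering Plan; extensions and anything unusual simply produce a warning the user can ignore.

```python
# Hypothetical table of common domain typos; a real one would be longer.
LIKELY_DOMAIN_TYPOS = {"gmali.com": "gmail.com", "gmial.com": "gmail.com"}


def phone_warnings(country, phone):
    """Return human-readable warnings; never reject the input outright."""
    digits = [c for c in phone if c.isdigit()]
    warnings = []
    if country == "US":
        # Allow an optional leading country code 1 before the ten digits.
        if len(digits) == 11 and digits[0] == "1":
            digits = digits[1:]
        if len(digits) != 10:
            warnings.append(
                f"US numbers usually have 10 digits; this one has {len(digits)}."
            )
    return warnings


def email_warnings(email):
    """Suggest a correction when the domain looks like a known typo."""
    local, _, domain = email.rpartition("@")
    fixed = LIKELY_DOMAIN_TYPOS.get(domain.lower())
    if fixed:
        return [f"Did you mean {local}@{fixed}?"]
    return []
```

An empty list means "nothing looked odd", not "this is valid". The form is submitted either way, so a correct but unusual number or address is never blocked.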
|
# ? Jul 18, 2014 16:12 |
|
Snapchat A Titty posted: http://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/
Yeah I got an address in Costa Rica that started with "From where the big tree used to be," and resigned myself to a cab at that point. The chip facility that processed millions of dollars worth of equipment a year had official stationery that listed its address as "500 meters west of the Firestone plant" because that was visible from the highway. There's generally a local hanging around the landmarks who will dereference the back half in exchange for a conversation.
|
# ? Jul 18, 2014 16:16 |
|
|
Subjunctive posted:Validation doesn't have to reject. It can be quite useful to simply warn that you selected US for your country but only have 9 digits in your phone number. I see sites periodically that warn if I mistype my email address as @gmali.com, and it's nice. That's a decent solution. I'd guess continuously getting warnings "your name is not valid" is annoying, but not nearly as much as just refusing to post the form entirely. And it also avoids being totally blocked when the form can't accept your address, but when you mutilate it into the shape that the JS expects, you get a rejection because the back-end actually checks an address database and doesn't find a match [1]. I've had this happen. [1] This is also a stupid idea if there's a remote possibility the database isn't 100% up to date.
|
# ? Jul 18, 2014 16:32 |