|
Use .NET regexes.
|
# ? Sep 20, 2017 01:51 |
|
akadajet posted:cool no. not cool. go is bad not as bad as javascript though
|
# ? Sep 20, 2017 02:49 |
|
gonadic io posted:don't do anything with user input other than display it back to them. make them enter different parts in different fields so you don't have to split it up with anything more complicated than a .split('/') very much this ^ email addresses are NP hard lol imo just send off a url for the user to click before accepting the email as valid
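A minimal sketch of the "barely validate, then send a link" approach from the post above. Everything named here is hypothetical: `looks_like_email`, `start_confirmation`, `confirm`, the in-memory `pending` dict (a stand-in for real storage) and the `example.invalid` URL are assumptions for illustration, and actually sending the mail is left out.

```python
import secrets

# token -> address awaiting confirmation (in-memory stand-in for a database)
pending = {}


def looks_like_email(address):
    """Only check for a non-empty local part and domain around the last '@'."""
    local, sep, domain = address.rpartition("@")
    return bool(sep and local and domain)


def start_confirmation(address, base_url="https://example.invalid/confirm"):
    """Record a random token for the address and return the URL to mail out."""
    if not looks_like_email(address):
        raise ValueError("not an email address")
    token = secrets.token_urlsafe(32)
    pending[token] = address
    return f"{base_url}?token={token}"


def confirm(token):
    """Return the confirmed address, or None if the token is unknown or already used."""
    return pending.pop(token, None)
```

The check deliberately accepts dotless domains such as name@google; the clicked link is what decides whether the address is real.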
|
# ? Sep 20, 2017 04:42 |
|
also dont have too many fields cause youre bound to create a situation where theyre impossible to fill out correctly for some subset of users
|
# ? Sep 20, 2017 04:46 |
|
Powaqoatse posted:email addresses are NP hard lol really? I'm willing to believe you but my reduction muscles are highly atrophied and my searching has failed
|
# ? Sep 20, 2017 04:53 |
|
CommunistPancake posted:he is the go guy
|
# ? Sep 20, 2017 04:58 |
|
I don't get this "don't run regexes provided by the user" talk, sometimes users want to search something in text too. Is this why github search is basically useless.
|
# ? Sep 20, 2017 05:31 |
|
pokeyman posted:really? I'm willing to believe you but my reduction muscles are highly atrophied and my searching has failed nah it was a joke but email addresses can get surprisingly weird so trying to validate too much on them is a fools errand
|
# ? Sep 20, 2017 05:35 |
|
Vanadium posted:I don't get this "don't run regexes provided by the user" talk, sometimes users want to search something in text too. Is this why github search is basically useless. epics in house custom code searcher is a million times better than anything I've found for github or java in general and that's p lol to me
|
# ? Sep 20, 2017 05:39 |
|
sarehu posted:Use .NET regexes. I like how they added timeout arguments to their match methods haha
|
# ? Sep 20, 2017 05:45 |
|
tef posted:us:
you guys are being very adamant about this with a pretty confusing set of arguments. who has mentioned (posix) syntax at all? everything is indeed perl-derived syntax and perl-like semantics, which is fine, not least since posix semantics is actually quite inconsistently interpreted and implemented (the standard is unfortunately a bit unclear at one point, and some have taken it as a chance to do a perl/posix mix)
exponential cases are indeed not *that* common, but, on the one hand: what is the argument for marching into a situation where the potential for them exists? (it is *not* easy to tell ahead of time if the potential for exponential matching time is there); and, on the other hand: quadratic/cubic matching time issues are fairly common, and are fairly easily triggered by accident in practice (e.g. a subexpression of the form (E|F) where E and F have a string in common, repeated in some way, any failing match containing instances of such a string will be retried for every occurrence that can be matched against that subgroup)
in a strange use of money (not mine) obsessing about these issues is my day job, my current research grant is in practical parser use, which has ended up being used to do a bit of diving in practical implementations (to e.g. discuss such performance aspects and semantics differences), a bit of data mining and static analysis of expressions, and then algorithm construction in support of such efforts. one of the annoying happy discoveries early on in the research direction was how cox had beaten us to a lot of our ideas in re2, so i am indeed a bit enamored with it
AWWNAW posted:I like how they added timeout arguments to their match methods haha
somewhat disingenuously i have skipped mentioning that pcre does this too by default, it takes 10 million "steps" then just errors out if it hasn't completed matching (can be overridden with PCRE_EXTRA_MATCH_LIMIT). which seems safer, but is hardly a complete solution to the problem
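To illustrate the step-limit idea from the post above, here is a toy backtracking matcher with a step budget. It is a sketch only: the tiny pattern dialect (literal characters, `.`, and a postfix `*`, matched against the whole string), the names `parse`, `backtrack_match` and `StepLimitExceeded`, and the default budget of 10,000 are assumptions made for illustration. It is not how PCRE or .NET implement their limits.

```python
class StepLimitExceeded(Exception):
    """Raised when the matcher has used up its step budget."""


def parse(pattern):
    """Split a pattern into (char, starred) items; '*' applies to the char before it."""
    items = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        items.append((pattern[i], starred))
        i += 2 if starred else 1
    return items


def backtrack_match(pattern, text, step_limit=10_000):
    """Whole-string match by naive backtracking; returns (matched, steps_used)."""
    items = parse(pattern)
    steps = 0

    def go(pi, ti):
        nonlocal steps
        steps += 1
        if steps > step_limit:
            raise StepLimitExceeded(f"gave up after {step_limit} steps")
        if pi == len(items):
            return ti == len(text)
        ch, starred = items[pi]
        if starred:
            # Greedy: consume as much as possible, then back off one char at a time.
            end = ti
            while end < len(text) and (ch == "." or text[end] == ch):
                end += 1
            for k in range(end, ti - 1, -1):
                if go(pi + 1, k):
                    return True
            return False
        if ti < len(text) and (ch == "." or text[ti] == ch):
            return go(pi + 1, ti + 1)
        return False

    return go(0, 0), steps
```

With this sketch, `backtrack_match("a*b", "aaab")` succeeds in a handful of steps, while `backtrack_match(".*.*.*.*.*x", "a" * 30)` raises `StepLimitExceeded`, because every way of dividing the string among the five stars is retried before the match can fail. As the post says, a budget like this bounds the damage but does not remove the underlying polynomial or exponential behaviour.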
|
# ? Sep 20, 2017 08:39 |
|
Cybernetic Vermin posted:in a strange use of money (not mine) obsessing about these issues is my day job, my current research grant is in practical parser use, which has ended up being used to do a bit of diving in practical implementations (to e.g. discuss such performance aspects and semantics differences), a bit of data mining and static analysis of expressions, and then algorithm construction in support of such efforts. one of the annoying happy discoveries early on in the research direction was how cox had beaten us to a lot of our ideas in re2, so i am indeed a bit enamored with it Do you have a seriouspinion on Kleenex? As I understand, it's an attempt to move beyond classic regexes, but without going all the way to full PCRE, and with very good performance.
|
# ? Sep 20, 2017 08:51 |
|
Athas posted:Do you have a seriouspinion on Kleenex? As I understand, it's an attempt to move beyond classic regexes, but without going all the way to full PCRE, and with very good performance.
Oh, met one of the Kleenex guys at a conference this summer actually! I must admit that I don't know the software very well beyond that presentation and a bit of chat over lunch, but I came away with a very positive impression of the effort. The theory appears sound (in that they have picked up something established and carefully shaped it into doing what they need) and what it actually does seems useful and interesting. Going by their numbers it indeed appears extremely fast, though with a heavy-weight compilation approach.
The ambitious compilation approach they take ended up the main chat subject, it does not appear to scale great; they construct a deterministic transducer which is then compiled into C to be compiled down further, but then a sufficiently nasty Kleenex program causes the transducer and then the C code to explode in a way which may make things impractical (e.g. an example given had a small program produce 130 kloc of C). It looks very well thought out though, so I wouldn't be surprised if they find ways to keep things under some control (perhaps not realizing sufficiently dense parts of the automaton in C, rather falling back on a lookup table).
One of those projects which likely lives and dies with the team getting research funding though, so fingers crossed they land more money.
|
# ? Sep 20, 2017 09:09 |
|
Vanadium posted:I don't get this "don't run regexes provided by the user" talk, sometimes users want to search something in text too. Is this why github search is basically useless. counterpoint: if you want to search for .*.*.*.*.* then do that on your own cpu.
|
# ? Sep 20, 2017 09:10 |
|
gonadic io posted:don't do anything with user input other than display it back to them. make them enter different parts in different fields so you don't have to split it up with anything more complicated than a .split('/') Ralith posted:try to use it, and return an error if it doesn't work yeah, i guess this makes sense. i couldn't really come up with any counterexamples that would really need a regex. i blame my brain damage on our lovely codebase
|
# ? Sep 20, 2017 11:36 |
|
It's fairly common to do bullshit like that, partly because it's not so bad if 99% of your customers are in an English first language country and have ascii names. For example at work we take the user's name and split it on the first space char to make first and last name. We also force the names to have first letter upper case and all others lowercase which especially pisses me off because that means that my name is Mcgonadic instead of the proper McGonadic
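A small sketch of the name handling described above, to show where it goes wrong. The function name `naive_normalize` and the sample names are made up for illustration; this is an assumption about the general shape of such code, not the poster's actual codebase.

```python
def naive_normalize(full_name):
    """Split on the first space, then force 'Xxxxx' casing on both halves."""
    first, _, last = full_name.partition(" ")
    # str.capitalize() upper-cases the first character and lower-cases the rest,
    # which destroys any internal capitals.
    return first.capitalize(), last.capitalize()


print(naive_normalize("Gonadic McGonadic"))       # ('Gonadic', 'Mcgonadic')
print(naive_normalize("Mary Jane van der Berg"))  # ('Mary', 'Jane van der berg')
print(naive_normalize("Björk"))                   # ('Björk', '')
```

Internal capitals are lost, everything after the first space becomes the "last name", and single-name users end up with an empty surname.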
|
# ? Sep 20, 2017 12:00 |
|
See also: timezone assumptions, address assumptions (including postcode ones) We had fun with a customer without a street name - their small village's houses were just numbered all together. In the end we just got them to put the village name into the street field too
|
# ? Sep 20, 2017 12:02 |
|
pretty sure email doesn't need a . after the @
|
# ? Sep 20, 2017 16:23 |
|
Athas posted:Do you have a seriouspinion on Kleenex? As I understand, it's an attempt to move beyond classic regexes, but without going all the way to full PCRE, and with very good performance. is this made by the tissue company or is someone trolling for a lawsuit
|
# ? Sep 20, 2017 16:53 |
|
Cybernetic Vermin posted:yeah, sorry for trollish reply, the conversation i was in was responding to a post saying "just use pcre" (with little context other than "how to regex?!") noting it has pitfalls and suggesting re2. i really do not care a lot what your increasingly minuscule usecase is
see, i'd believe you if you hadn't invested a lot of time maligning me with the mongo ref and constructing elaborate fantasies where regexes are the wrong thing no matter what implementation is in use, but sure, you're totally above it lol
anyway an old friend showed up and i wanted to introduce y'all again: A Comprehensive Study of Real-World Numerical Bug Characteristics
|
# ? Sep 20, 2017 17:14 |
|
JewKiller 3000 posted:no. not cool. go is bad go away "jewkiller"
|
# ? Sep 20, 2017 17:55 |
|
gonadic io posted:counterpoint: if you want to search for .*.*.*.*.* then do that on your own cpu. Counter-Counterpoint: you can do it in very, very, very minimal space and O(N) where N is the size of corpus.
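A rough sketch of why that can hold: instead of backtracking, track the set of pattern positions that are reachable after each input character. The state set never has more entries than the pattern has items plus one, so the work is proportional to text length times pattern length, with memory independent of the text. The tiny dialect (literals, `.`, postfix `*`, whole-string match) and the names `parse` and `nfa_match` are assumptions for illustration, not any real engine's API.

```python
def parse(pattern):
    """Split a pattern into (char, starred) items; '*' applies to the char before it."""
    items = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        items.append((pattern[i], starred))
        i += 2 if starred else 1
    return items


def nfa_match(pattern, text):
    """Whole-string match by simulating all pattern positions at once."""
    items = parse(pattern)

    def closure(states):
        # A starred item may match zero times, so its successor is reachable for free.
        result = set()
        for s in states:
            result.add(s)
            while s < len(items) and items[s][1]:
                s += 1
                result.add(s)
        return result

    current = closure({0})
    for c in text:
        nxt = set()
        for s in current:
            if s == len(items):
                continue  # pattern already finished, this thread cannot eat more input
            ch, starred = items[s]
            if ch == "." or ch == c:
                nxt.add(s if starred else s + 1)
        current = closure(nxt)
        if not current:
            return False
    return len(items) in current
```

Here `nfa_match(".*.*.*.*.*x", "a" * 10000)` fails after a single pass over the text, where a backtracking matcher would retry every way of dividing the input among the stars.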
|
# ? Sep 20, 2017 18:05 |
|
Cybernetic Vermin posted:who has mentioned (posix) syntax at all? maybe don't barge in on a discussion you haven't been following
|
# ? Sep 20, 2017 18:33 |
|
hackbunny posted:maybe don't barge in on a discussion you haven't been following uh, i apologize, i must have missed it, legit point me to this part of the discussion
|
# ? Sep 20, 2017 18:40 |
|
Cybernetic Vermin posted:uh, i apologize, i must have missed it, legit point me to this part of the discussion Just backtrack until you find it.
|
# ? Sep 20, 2017 18:42 |
|
NihilCredo posted:Just backtrack until you find it.
|
# ? Sep 20, 2017 19:25 |
|
NihilCredo posted:Just backtrack until you find it.
|
# ? Sep 20, 2017 19:35 |
|
NihilCredo posted:Just backtrack until you find it.
|
# ? Sep 20, 2017 19:52 |
|
NihilCredo posted:Just backtrack until you find it. heh
|
# ? Sep 20, 2017 19:53 |
|
Shinku ABOOKEN posted:is this made by the tissue company or is someone is trolling for a lawsuit What would the lawsuit be about? Nobody is making any money here.
|
# ? Sep 20, 2017 20:21 |
|
You don't have to make money to be sued for infringing on a trademark.
|
# ? Sep 20, 2017 20:28 |
|
CPColin posted:You don't have to make money to be sued for infringing on a trademark. You do, however, have to be causing a likelihood of consumer confusion or at least a likelihood of diluting or tarnishing a trademark.
|
# ? Sep 20, 2017 21:09 |
|
NihilCredo posted:Just backtrack until you find it.
|
# ? Sep 20, 2017 21:18 |
|
leper khan posted:pretty sure email doesn't need a . after the @ yeah, but gently caress the people who try that since google bought the .google tld, some employees are apparently starting to use name@google addresses. they deserve everything they get
|
# ? Sep 20, 2017 21:52 |
|
redleader posted:yeah, but gently caress the people who try that less than they wanted?
|
# ? Sep 20, 2017 22:16 |
|
Cybernetic Vermin posted:uh, i apologize, i must have missed it, legit point me to this part of the discussion the discussion was iirc about the many frustratingly similar but incompatible re syntaxes, and it's 100% true: sed has one, grep another, javascript, icu etc. someone proposed, in the usual hyperbolic yospos style, pcre as "the" standard. the word "pcre" triggered you and the rest is history
|
# ? Sep 20, 2017 22:24 |
|
hackbunny posted:the discussion was iirc about the many frustratingly similar but incompatible re syntaxes, and it's 100% true and most of them are 'predates pcre' or 'like pcre for most of it'
|
# ? Sep 21, 2017 00:09 |
|
hackbunny posted:someone proposed, in the usual hyperbolic yospos style,
|
# ? Sep 21, 2017 00:59 |
|
NihilCredo posted:Just backtrack until you find it.
|
# ? Sep 21, 2017 02:40 |
|
NihilCredo posted:Just backtrack until you find it.
|
# ? Sep 21, 2017 03:25 |