sarehu
Apr 20, 2007

(call/cc call/cc)
Use .NET regexes.

JewKiller 3000
Nov 28, 2006

by Lowtax

no. not cool. go is bad

not as bad as javascript though

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



gonadic io posted:

don't do anything with user input other than display it back to them. make them enter different parts in different fields so you don't have to split it up with anything more complicated than a .split('/')

like what validation are you doing anyway? with email addresses for example imo the most you should care about is that it has an "@" and then a "." after the @. even that is probably wrong honestly.

very much this ^

email addresses are NP hard lol

imo just send off a url for the user to click before accepting the email as valid
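a rough sketch of that approach, assuming the loosest possible check (an "@" with something on either side) and a random token in the confirmation link. `looks_like_email`, `make_confirmation_url` and the example.invalid URL are made-up names for illustration, not anyone's real API:

```python
import secrets


def looks_like_email(address: str) -> bool:
    """Loose sanity check only: an "@" with text on both sides."""
    local, sep, domain = address.rpartition("@")
    # Deliberately no check for a "." in the domain part.
    return bool(sep) and bool(local) and bool(domain)


def make_confirmation_url(base_url: str) -> tuple:
    """Return (token, url). Store the token; mail the url to the user."""
    token = secrets.token_urlsafe(32)  # unguessable random token
    return token, f"{base_url}?token={token}"


if __name__ == "__main__":
    address = "someone@example.invalid"
    if looks_like_email(address):
        token, url = make_confirmation_url("https://example.invalid/confirm")
        print("would send:", url)
```

the address only counts as valid once someone actually clicks the link, so the string check is just there to catch obvious typos.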

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



also dont have too many fields cause youre bound to create a situation where theyre impossible to fill out correctly for some subset of users

pokeyman
Nov 26, 2006

That elephant ate my entire platoon.

Powaqoatse posted:

email addresses are NP hard lol

really? I'm willing to believe you but my reduction muscles are highly atrophied and my searching has failed

Gazpacho
Jun 18, 2004

by Fluffdaddy
Slippery Tilde

CommunistPancake posted:

he is the go guy
wish he would !! lmao

Vanadium
Jan 8, 2005

I don't get this "don't run regexes provided by the user" talk, sometimes users want to search something in text too. Is this why github search is basically useless.

Carthag Tuek
Oct 15, 2005

Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang



pokeyman posted:

really? I'm willing to believe you but my reduction muscles are highly atrophied and my searching has failed

nah it was a joke but email addresses can get surprisingly weird so trying to validate too much on them is a fools errand

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

Vanadium posted:

I don't get this "don't run regexes provided by the user" talk, sometimes users want to search something in text too. Is this why github search is basically useless.

epics in house custom code searcher is a million times better than anything I've found for github or java in general and that's p lol to me

AWWNAW
Dec 30, 2008

sarehu posted:

Use .NET regexes.

I like how they added timeout arguments to their match methods haha

Cybernetic Vermin
Apr 18, 2005

tef posted:

us:

there is posix syntax, pcre syntax, and countless variants

pcre won, posix lost


you:

so i read this webpage about why they used re2 and missed the point, it was about running arbitrary code from users

it was slower, sure, but, like it was faster on pathological cases, a bit like when someone writes a really bad for loop

you guys are being very adamant about this with a pretty confusing set of arguments. who has mentioned (posix) syntax at all? everything is indeed perl-derived syntax and perl-like semantics, which is fine, not least since posix semantics is actually quite inconsistently interpreted and implemented (the standard is unfortunately a bit unclear at one point, and some have taken it as a chance to do a perl/posix mix)

exponential cases are indeed not *that* common, but, on the one hand: what is the argument for marching into a situation where the potential for them exists? (it is *not* easy to tell ahead of time if the potential for exponential matching time is there); and, on the other hand: quadratic/cubic matching time issues are fairly common, and are fairly easily triggered by accident in practice (e.g. take a subexpression of the form (E|F) where E and F have a string in common, repeated in some way: any failing match containing instances of such a string will be retried for every occurrence that can be matched against that subgroup)
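a small way to poke at this, assuming python's `re` module as the backtracking engine (the thread is about pcre, which is a different implementation, so treat this as an analogy). `(a|a)*b` is the smallest (E|F) where both branches share a string; `failed_match_seconds` is a made-up helper, and the timings will differ per machine and version, so none are claimed here:

```python
import re
import time

# (a|a)* : every "a" can be consumed by either branch of the alternation.
PATTERN = re.compile(r"(a|a)*b")


def failed_match_seconds(n: int) -> float:
    """Time a match that has to fail: n copies of "a" and no "b"."""
    text = "a" * n
    start = time.perf_counter()
    result = PATTERN.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # no "b" in the text, so no match is possible
    return elapsed


if __name__ == "__main__":
    # Keep n small: on a backtracking engine the work can grow very fast.
    for n in (10, 14, 18):
        print(n, f"{failed_match_seconds(n):.4f}s")
```

the point is only that the input looks harmless and the pattern looks harmless, and the cost shows up on the failing match.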

in a strange use of money (not mine) obsessing about these issues is my day job, my current research grant is in practical parser use, which has ended up being used to do a bit of diving in practical implementations (to e.g. discuss such performance aspects and semantics differences), a bit of data mining and static analysis of expressions, and then algorithm construction in support of such efforts. one of the annoying happy discoveries early on in the research direction was how cox had beaten us to a lot of our ideas in re2, so i am indeed a bit enamored with it


AWWNAW posted:

I like how they added timeout arguments to their match methods haha

somewhat disingenuously i have skipped mentioning that pcre does this too by default, it takes 10 million "steps" then just errors out if it hasn't completed matching (can be overridden with PCRE_EXTRA_MATCH_LIMIT). which seems safer, but is hardly a complete solution to the problem
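to make the step-limit idea concrete, here is a toy backtracking matcher with a budget. this is not how pcre implements its limit; it is a sketch assuming a tiny syntax (literals, `.`, and `x*`), and `search`, `StepLimitExceeded` and the default `max_steps` value are all made up for illustration:

```python
class StepLimitExceeded(Exception):
    """Raised when the matcher uses up its step budget."""


def search(pattern: str, text: str, max_steps: int = 100_000) -> bool:
    """Unanchored backtracking search over literals, '.', and 'x*'."""
    steps = 0

    def match_here(p: int, t: int) -> bool:
        nonlocal steps
        steps += 1
        if steps > max_steps:
            raise StepLimitExceeded(f"gave up after {max_steps} steps")
        if p == len(pattern):
            return True  # whole pattern consumed
        if p + 1 < len(pattern) and pattern[p + 1] == "*":
            # Try the starred character zero times, then once, twice, ...
            while True:
                if match_here(p + 2, t):
                    return True
                if t < len(text) and pattern[p] in (".", text[t]):
                    t += 1
                else:
                    return False
        if t < len(text) and pattern[p] in (".", text[t]):
            return match_here(p + 1, t + 1)
        return False

    # Try every possible start position in the text.
    return any(match_here(0, start) for start in range(len(text) + 1))


if __name__ == "__main__":
    print(search("a.c", "xxabcxx"))
    try:
        search(".*.*.*.*.*z", "a" * 200, max_steps=10_000)
    except StepLimitExceeded as exc:
        print("aborted:", exc)
```

the budget turns a runaway match into an error the caller has to handle, which is the "safer, but hardly a complete solution" trade-off: the expression still does not get an answer.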

Athas
Aug 6, 2007

fuck that joker

Cybernetic Vermin posted:

in a strange use of money (not mine) obsessing about these issues is my day job, my current research grant is in practical parser use, which has ended up being used to do a bit of diving in practical implementations (to e.g. discuss such performance aspects and semantics differences), a bit of data mining and static analysis of expressions, and then algorithm construction in support of such efforts. one of the annoying happy discoveries early on in the research direction was how cox had beaten us to a lot of our ideas in re2, so i am indeed a bit enamored with it

Do you have a seriouspinion on Kleenex? As I understand, it's an attempt to move beyond classic regexes, but without going all the way to full PCRE, and with very good performance.

Cybernetic Vermin
Apr 18, 2005

Athas posted:

Do you have a seriouspinion on Kleenex? As I understand, it's an attempt to move beyond classic regexes, but without going all the way to full PCRE, and with very good performance.

Oh, met one of the Kleenex guys at a conference this summer actually! I must admit that I don't know the software very well beyond that presentation and a bit of chat over lunch, but I came away with a very positive impression of the effort. The theory appears sound (in that they have picked up something established and carefully shape it into doing what they need) and what it actually does seems useful and interesting. Going by their numbers it indeed appears extremely fast, though with a heavy-weight compilation approach.

The ambitious compilation approach they take ended up the main chat subject, it does not appear to scale great; they construct a deterministic transducer which is then compiled into C to be compiled down further, but then a sufficiently nasty Kleenex program causes the transducer and then the C code to explode in a way which may make things impractical (e.g. an example given had a small program produce 130 kloc of C). It looks very well thought out though, so I wouldn't be surprised if they find ways to keep things under some control (perhaps not realize sufficiently dense parts of the automaton in C, rather falling back on a lookup table). One of those projects which likely lives and dies with the team getting research funding though, so fingers crossed they land more money.

gonadic io
Feb 16, 2011

>>=

Vanadium posted:

I don't get this "don't run regexes provided by the user" talk, sometimes users want to search something in text too. Is this why github search is basically useless.

counterpoint: if you want to search for .*.*.*.*.* then do that on your own cpu.

redleader
Aug 18, 2005

Engage according to operational parameters

gonadic io posted:

don't do anything with user input other than display it back to them. make them enter different parts in different fields so you don't have to split it up with anything more complicated than a .split('/')

like what validation are you doing anyway? with email addresses for example imo the most you should care about is that it has an "@" and then a "." after the @. even that is probably wrong honestly.



Ralith posted:

try to use it, and return an error if it doesn't work

yeah, i guess this makes sense. i couldn't really come up with any counterexamples that would really need a regex. i blame my brain damage on our lovely codebase

gonadic io
Feb 16, 2011

>>=
It's fairly common to do bullshit like that, partly because it's not so bad if 99% of your customers are in an English first language country and have ascii names. For example at work we take the user's name and split it on the first space char to make first and last name. We also force the names to have first letter upper case and all others lowercase which especially pisses me off because that means that my name is Mcgonadic instead of the proper McGonadic
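A minimal sketch of that kind of normalisation, to show where it goes wrong. This is a guess at the logic described, not the actual code from work; `split_and_normalise` and the example names are made up:

```python
def split_and_normalise(full_name: str) -> tuple:
    """Split on the first space, then force 'Xxxxx' casing on both parts."""
    first, _, last = full_name.partition(" ")
    # str.capitalize() uppercases the first character and lowercases the rest.
    return first.capitalize(), last.capitalize()


if __name__ == "__main__":
    print(split_and_normalise("Example McGonadic"))  # ('Example', 'Mcgonadic')
    print(split_and_normalise("Madonna"))            # ('Madonna', '')
```

Anyone with an internal capital, a multi-word surname, or a single name gets mangled, and the original input is lost unless it is stored separately.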

gonadic io
Feb 16, 2011

>>=
See also: timezone assumptions, address assumptions (including postcode ones)

We had fun with a customer without a street name - their small village's houses were just numbered all together. In the end we just got them to put the village name into the street field too

leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.
pretty sure email doesn't need a . after the @

Workaday Wizard
Oct 23, 2009

by Pragmatica

Athas posted:

Do you have a seriouspinion on Kleenex? As I understand, it's an attempt to move beyond classic regexes, but without going all the way to full PCRE, and with very good performance.

is this made by the tissue company or is someone trolling for a lawsuit

JawnV6
Jul 4, 2004

So hot ...

Cybernetic Vermin posted:

yeah, sorry for trollish reply, the conversation i was in was responding to a post saying "just use pcre" (with little context other than "how to regex?!") noting it has pitfalls and suggesting re2. i really do not care a lot what your increasingly minuscule use case is

see, i'd believe you if you hadn't invested a lot of time maligning me with the mongo ref and constructing elaborate fantasies where regexes are the wrong thing no matter what implementation is in use, but sure, you're totally above it lol

anyway an old friend showed up and i wanted to introduce y'all again A Comprehensive Study of Real-World Numerical Bug Characteristics

akadajet
Sep 14, 2003

JewKiller 3000 posted:

no. not cool. go is bad

not as bad as javascript though

go away "jewkiller"

Xarn
Jun 26, 2015

gonadic io posted:

counterpoint: if you want to search for .*.*.*.*.* then do that on your own cpu.

Counter-Counterpoint: you can do it in very, very, very minimal space and O(N) where N is the size of corpus.
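A rough sketch of what that can look like, assuming a toy syntax (literals, `.`, and `x*`) and a state-set simulation in the style of Thompson's construction. This is not RE2's code; `parse`, `closure` and `search_linear` are hypothetical names:

```python
def parse(pattern: str) -> list:
    """Turn 'ab*.' into [(char, starred), ...]."""
    tokens = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append((pattern[i], starred))
        i += 2 if starred else 1
    return tokens


def closure(states: set, tokens: list) -> set:
    """Add every state reachable by skipping starred tokens."""
    result = set()
    for s in states:
        while True:
            result.add(s)
            if s < len(tokens) and tokens[s][1]:
                s += 1  # a starred token may match zero times
            else:
                break
    return result


def search_linear(pattern: str, text: str) -> bool:
    """Unanchored search: one pass over text, tracking a set of states."""
    tokens = parse(pattern)
    accept = len(tokens)
    states = closure({0}, tokens)
    if accept in states:
        return True
    for ch in text:
        nxt = {0}  # a match may also start at the next character
        for s in states:
            sym, starred = tokens[s]
            if sym == "." or sym == ch:
                nxt.add(s if starred else s + 1)
        states = closure(nxt, tokens)
        if accept in states:
            return True
    return False


if __name__ == "__main__":
    print(search_linear(".*.*.*.*.*z", "a" * 10_000))
    print(search_linear("a.c", "xxabcxx"))
```

The state set never holds more than one entry per pattern position, so memory depends only on the pattern, and each input character is looked at once.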

hackbunny
Jul 22, 2007

I haven't been on SA for years but the person who gave me my previous av as a joke felt guilty for doing so and decided to get me a non-shitty av

Cybernetic Vermin posted:

who has mentioned (posix) syntax at all?

maybe don't barge in on a discussion you haven't been following

Cybernetic Vermin
Apr 18, 2005

hackbunny posted:

maybe don't barge in on a discussion you haven't been following

uh, i apologize, i must have missed it, legit point me to this part of the discussion

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Cybernetic Vermin posted:

uh, i apologize, i must have missed it, legit point me to this part of the discussion

Just backtrack until you find it.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

NihilCredo posted:

Just backtrack until you find it.

JawnV6
Jul 4, 2004

So hot ...

NihilCredo posted:

Just backtrack until you find it.

Workaday Wizard
Oct 23, 2009

by Pragmatica

NihilCredo posted:

Just backtrack until you find it.

Shaggar
Apr 26, 2006

NihilCredo posted:

Just backtrack until you find it.

heh

Athas
Aug 6, 2007

fuck that joker

Shinku ABOOKEN posted:

is this made by the tissue company or is someone trolling for a lawsuit

What would the lawsuit be about? Nobody is making any money here.

CPColin
Sep 9, 2003

Big ol' smile.
You don't have to make money to be sued for infringing on a trademark.

ulmont
Sep 15, 2010

IF I EVER MISS VOTING IN AN ELECTION (EVEN AMERICAN IDOL) ,OR HAVE UNPAID PARKING TICKETS, PLEASE TAKE AWAY MY FRANCHISE

CPColin posted:

You don't have to make money to be sued for infringing on a trademark.

You do, however, have to be causing a likelihood of consumer confusion or at least a likelihood of diluting or tarnishing a trademark.

tef
May 30, 2004

-> some l-system crap ->

NihilCredo posted:

Just backtrack until you find it.

redleader
Aug 18, 2005

Engage according to operational parameters

leper khan posted:

pretty sure email doesn't need a . after the @

yeah, but gently caress the people who try that

since google bought the .google tld, some employees are apparently starting to use name@google addresses. they deserve everything they get

FamDav
Mar 29, 2008

redleader posted:

yeah, but gently caress the people who try that

since google bought the .google tld, some employees are apparently starting to use name@google addresses. they deserve everything they get

less than they wanted?

hackbunny
Jul 22, 2007

I haven't been on SA for years but the person who gave me my previous av as a joke felt guilty for doing so and decided to get me a non-shitty av

Cybernetic Vermin posted:

uh, i apologize, i must have missed it, legit point me to this part of the discussion

the discussion was iirc about the many frustratingly similar but incompatible re syntaxes, and it's 100% true: sed has one, grep another, javascript, icu etc. someone proposed, in the usual hyperbolic yospos style, pcre as "the" standard. the word "pcre" triggered you and the rest is history

tef
May 30, 2004

-> some l-system crap ->

hackbunny posted:

the discussion was iirc about the many frustratingly similar but incompatible re syntaxes, and it's 100% true

and most of them are 'predates pcre' or 'like pcre for most of it'

hackbunny
Jul 22, 2007

I haven't been on SA for years but the person who gave me my previous av as a joke felt guilty for doing so and decided to get me a non-shitty av

hackbunny posted:

someone proposed, in the usual hyperbolic yospos style,

pokeyman
Nov 26, 2006

That elephant ate my entire platoon.

NihilCredo posted:

Just backtrack until you find it.

ozymandOS
Jun 9, 2004

NihilCredo posted:

Just backtrack until you find it.
