|
tef posted: rob pike: "Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)"
well, n here is on the level of 50 for pcre to in practice never halt, where re2, a simpler and better planned library, would run in linear time (well, in |E|*|w| for an expression E and string w). i also note that we are talking about libraries here, and i think the pike quote is best applied as "since importing pcre takes one more keypress than re2 it is best to stick with re2 until pcre proves necessary"
JawnV6 posted: also yeah ive never had to put a regex in some perf critical CDN control plane but for mashing text from a data sheet into friendly #def's with vim/sed they're needs-suiting as heck and ive never had to care if it was exponentially matching MODULE_ or not
it is not that i particularly care what people are doing in their editors, but suggesting pcre as the first choice among regex libraries without qualifications is pretty bad, since it has few other benefits and, if nothing else, opens software up to denial of service attacks if applied to user input. even without expectations of malicious inputs it is pretty silly to spend the cycles if you are doing anything new and can go re2
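To make the "n around 50" point concrete, here is a minimal Python sketch. It is not how pcre or re2 are actually implemented; it is a toy comparison under stated assumptions. The pattern family `a?` repeated n times followed by `a` repeated n times, the token representation, and the function names `make_pattern`, `backtrack`, and `simulate` are all made up for illustration. The first matcher backtracks naively, the second tracks a set of possible positions in the pattern and so does work proportional to pattern length times input length.

```python
def make_pattern(n):
    """Tokens for the pattern a?a?...a? aa...a (n optional 'a', then n required 'a')."""
    return [("opt", "a")] * n + [("lit", "a")] * n


def backtrack(tokens, text, i=0, j=0, counter=None):
    """Naive backtracking matcher; counter[0] counts recursive calls."""
    if counter is None:
        counter = [0]
    counter[0] += 1
    if i == len(tokens):
        return j == len(text)
    kind, ch = tokens[i]
    fits = j < len(text) and text[j] == ch
    if kind == "opt":
        # Greedy: try consuming a character first, then fall back to skipping.
        if fits and backtrack(tokens, text, i + 1, j + 1, counter):
            return True
        return backtrack(tokens, text, i + 1, j, counter)
    return fits and backtrack(tokens, text, i + 1, j + 1, counter)


def closure(tokens, states):
    """Add every state reachable by skipping optional tokens."""
    seen = set()
    stack = list(states)
    while stack:
        i = stack.pop()
        if i in seen:
            continue
        seen.add(i)
        if i < len(tokens) and tokens[i][0] == "opt":
            stack.append(i + 1)
    return seen


def simulate(tokens, text):
    """State-set simulation; returns (matched, number of state visits)."""
    steps = 0
    current = closure(tokens, {0})
    for c in text:
        following = set()
        for i in current:
            steps += 1
            if i < len(tokens) and tokens[i][1] == c:
                following.add(i + 1)
        current = closure(tokens, following)
    return len(tokens) in current, steps


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        tokens, text = make_pattern(n), "a" * n
        calls = [0]
        backtrack(tokens, text, 0, 0, calls)
        _, visits = simulate(tokens, text)
        print(n, calls[0], visits)
```

In this toy the backtracking call count roughly doubles with every increment of n, while the state-set visit count stays bounded by (2n+1)*n. Real engines have optimizations this sketch ignores, so treat it as an illustration of the growth rates only.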
|
# ? Sep 19, 2017 10:15 |
|
dont regex user input ever, that's insane no matter what lib imo
|
# ? Sep 19, 2017 11:52 |
|
as a regular regex abuser
Powaqoatse posted: dont regex ever
|
# ? Sep 19, 2017 12:02 |
|
regexes are for loving around with a bunch of text on your local machine, not for automatic parsing/handling/extraction of anything ever
|
# ? Sep 19, 2017 12:55 |
|
but how else will i scrape web pages??
|
# ? Sep 19, 2017 12:56 |
|
MALE SHOEGAZE posted: Java packages and golang packages are very different so the scoping rules arent really comparable. One major difference is that functions and variables are allowed at the top level of a golang package. also, packages will be initialized once at runtime, and have mutable state, so it's more accurate to think of them as objects (this is why you can't import a single identifier from a go package -- that identifier is tied implicitly to the package because it might rely on the state of the package, so it doesn't make sense to be able to import a single identifier).
idk i haven't run into any of these issues. maybe make your packages smaller and more specific?
|
# ? Sep 19, 2017 13:38 |
|
every time the discussion comes up about pcre vs re2 I can't help but think that I've never run into the problems that PCRE has in actual practice. the above posts made me realize why...the idea of exposing regexes to my users seems scary and bad so I never do it.
|
# ? Sep 19, 2017 14:00 |
|
idk though, regexes are pretty good for some initial filtering of strings, and it is certainly preferable in many situations to doing a lot of messing about by hand. it would be nice if people could be brought onto one of the nicer ways of processing text, but history suggests that regexes are one of those things that programmers by and large like and use where a lot of other things have fallen by the wayside
either way, there are sort of three choices: 1) don't use regexes, 2) use whatever library you want but take a bit of care in how/where they are used, 3) use re2 or a similarly predictable library (dfa-compiling ones are often a good choice really)
i think 3 is fine, and at any rate should be where unqualified library picking advice ought to head
|
# ? Sep 19, 2017 14:29 |
|
tef posted: rob pike: "Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)"
rob pike is a big dumb idiot
|
# ? Sep 19, 2017 15:34 |
|
if n were usually small we wouldn't use computer
|
# ? Sep 19, 2017 16:03 |
|
gonadic io posted: but how else will i scrape web pages??
very carefully
|
# ? Sep 19, 2017 16:09 |
|
obligatory https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
|
# ? Sep 19, 2017 16:22 |
|
Powaqoatse posted: obligatory https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
the worst kind of pedantic poo poo
OP never asked about "parsing" html, some chuffed up nerd decided they wanted to drop unrelated truth bombs
now it's shared all over as if missing the point for a sweet dunk is admirable
|
# ? Sep 19, 2017 16:24 |
|
Cybernetic Vermin posted: well, n here is on the level of 50 for pcre to in practice never halt
like have you ever hit this in production? or is it the same toy case i've seen a million times that has zero bearing on regex in practice
have you heard grep sells u out if you have lots of partial matches, check out what this fish has to say
Cybernetic Vermin posted: it is not that i particularly care whatever people are doing in their editors, but suggesting pcre as the first choice among regex libraries without qualifications is pretty bad, since it with few other benefits if nothing else opens software up to denial of service attacks if applied to user input. even without expectations of malicious inputs it is pretty silly to spend the cycles if you are doing anything new and can go re2
|
# ? Sep 19, 2017 16:28 |
|
JawnV6 posted: the worst kind of pedantic poo poo
Like...I'm not sure they missed the point instead of just offering information. sure, SO answers are supposed to be strict answers to the question, but that's rarely the actual case.
it's widely shared because it's funny
|
# ? Sep 19, 2017 16:40 |
|
JawnV6 posted: like have you ever hit this in production? or is it the same toy case i've seen a million times that has zero bearing on regex in practice
yeah, sure, parse your data with pcre, stick it in mongodb, and glue it all together with bash scripts. i am not sure *why* you are really set on making poor choices, but variety certainly is the spice of life~
e: eh, to not just be a troll: it is way more natural patterns than you may expect that trip these things up, but that turns into a much bigger discussion. use what you please. it is rare that there are huge issues, but i maintain that re2 is a good piece of software to suggest to people, much more systematic engineering and no traps like that
Cybernetic Vermin fucked around with this message at 18:35 on Sep 19, 2017 |
# ? Sep 19, 2017 18:12 |
|
So what if I really want the ability to use negative lookahead / lookbehind (which it looks like re2 does not support), but I don't need full on recursive PCRE? Should I use like oniguruma or something? I know very little about the differences between regex engines.
|
# ? Sep 19, 2017 19:47 |
|
no no, the backtracking issues are endemic and re2 is the odd man out, avoiding them only by giving up a *lot* of features, so if you need something that re2 does not do then pcre is likely a good choice. main issue with it other than that is it has adopted a lot of the weird perl features (e.g. (*PRUNE)), in many cases with very subtle semantic differences from perl. so one may be wise to be a bit restrictive with what one uses as there is bound to be some strange stuff on the fringes of the library
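On the negative lookahead question above: for some simple cases the lookahead can be traded for an optional capture group plus a check in ordinary code, which keeps the pattern itself free of lookaround. The sketch below uses only Python's standard `re` module; the `foo`/`bar` patterns and the function names are made up for illustration, and this trick is not a general replacement for lookaround.

```python
import re

# Hypothetical example: find "foo" that is not immediately followed by "bar".
WITH_LOOKAHEAD = re.compile(r"foo(?!bar)")

# Same intent without lookaround: optionally capture the unwanted suffix,
# then discard the matches where it was present.
WITHOUT_LOOKAROUND = re.compile(r"foo(bar)?")


def starts_with_lookahead(text):
    """Start offsets of matches found using the negative lookahead."""
    return [m.start() for m in WITH_LOOKAHEAD.finditer(text)]


def starts_without_lookaround(text):
    """Start offsets found by filtering on the optional group instead."""
    return [
        m.start()
        for m in WITHOUT_LOOKAROUND.finditer(text)
        if m.group(1) is None
    ]


if __name__ == "__main__":
    sample = "foo foobar foofoo barfoo"
    print(starts_with_lookahead(sample))
    print(starts_without_lookaround(sample))
```

This only works here because the rejected suffix cannot overlap the start of another match. Patterns where it can will need a real lookaround engine or a different approach.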
|
# ? Sep 19, 2017 19:55 |
|
The issues are a lot smaller too if you precompile all your regexes and NEVER EVER let users specify the regex itself
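A minimal sketch of that advice, assuming Python's standard `re` module. The `MODULE_` pattern and the function names are hypothetical. The developer-written pattern is compiled once at import time, and any text that comes from a user is escaped so it can only ever be matched literally.

```python
import re

# Hypothetical fixed pattern, written by the developer and compiled once.
MODULE_NAME = re.compile(r"MODULE_[A-Z0-9_]+")


def find_module_names(text):
    """Return every MODULE_* identifier found in text."""
    return MODULE_NAME.findall(text)


def count_literal(user_text, haystack):
    """Count occurrences of user_text, treating it as a literal string.

    re.escape neutralises metacharacters, so the user never controls
    the structure of the pattern itself.
    """
    return len(re.findall(re.escape(user_text), haystack))


if __name__ == "__main__":
    print(find_module_names("#define MODULE_UART_BASE 0x10"))
    print(count_literal("a.b", "a.b axb a.b"))
```

Escaping only stops users from supplying pattern syntax. A badly constructed developer-written pattern can still be slow on unlucky input.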
|
# ? Sep 19, 2017 20:08 |
|
Cybernetic Vermin posted: yeah, sure, parse your data with pcre, stick it in mongodb, and glue it all together with bash scripts. i am not sure *why* you are really set on making poor choices, but variety certainly is the spice of life~
again, you're the one presuming a shitload about where i'm jamming these? are you actually running sed scripts as a part of your production workflow or needlessly maligning a console tool based on your ridiculous preconceptions
the use cases I describe don't amount to "put pcre in front of web client users and run them against arbitrary input before jamming output into other terrible choices" but you're so stuck on that irrelevant point that i'm not sure how to ground you back in reality
|
# ? Sep 19, 2017 20:19 |
|
Thermopyle posted: every time the discussion comes up about pcre vs re2 I can't help but think that I've never run into the problems that PCRE has in actual practice.
I have actually run into non-linear performance of PCRE once, when I wrote what I thought was a fairly simple regex (i.e. no backrefs) and then had it run over all .txt files in a specific folder (and its subfolders). I didn't run into the worst case exponential behaviour, but since the corpus was fairly large I was not amused either.
JawnV6 posted: have you heard grep sells u out if you have lots of partial matches, check out what this fish has to say
It sells you out a bit, but keeps the same asymptotic complexity. PCRE doesn't even attempt to.
|
# ? Sep 19, 2017 20:21 |
|
Thermopyle posted: Like...I'm not sure they missed the point instead of just offering information. sure, SO answers are supposed to be strict answers to the question, but that's rarely the actual case.
if he was just sharing info he wouldn't be such a huge rear end about it.
|
# ? Sep 19, 2017 20:24 |
|
cis autodrag posted: if he was just sharing info he wouldn't be such a huge rear end about it.
i've never read that post as being an rear end, just being funny
but regardless i wasn't trying to say it was a pure emotionless info dump, only that it didn't have to be a direct answer to the question there
|
# ? Sep 19, 2017 20:28 |
|
yeah i'm not sure I agree with the rear end in a top hat readings. it's just a bit tongue-in-cheek.
DONT THREAD ON ME fucked around with this message at 20:35 on Sep 19, 2017 |
# ? Sep 19, 2017 20:32 |
|
JawnV6 posted: again, you're the one presuming a shitload about where i'm jamming these? are you actually running sed scripts as a part of your production workflow or needlessly maligning a console tool based on your ridiculous preconceptions
yeah, sorry for trollish reply, the conversation i was in was responding to a post saying "just use pcre" (with little context other than "how to regex?!") by noting it has pitfalls and suggesting re2. i really do not care a lot what your increasingly minuscule usecase is
regexes have a bad reputation which is to a great extent caused by bad uses and bad implementations, but going "it is fine to use x for regex matching because you shouldn't use regex matching (for y) anyway" is pretty drat lovely advice when applied in general without a specific y in sight
if you have a well judged fragment as your actual case that's fine, good luck to you
|
# ? Sep 19, 2017 20:35 |
|
MALE SHOEGAZE posted: yeah i'm not sure I agree with the rear end in a top hat readings. it's just a bit tongue-in-cheek.
is that like reading tea leaves?
|
# ? Sep 19, 2017 20:58 |
|
MALE SHOEGAZE posted: Java packages and golang packages are very different so the scoping rules arent really comparable. One major difference is that functions and variables are allowed at the top level of a golang package.
quote: also, packages will be initialized once at runtime, and have mutable state, so it's more accurate to think of them as objects
quote: the end result of this is that go makes it very difficult to organize your code, which is especially an issue in a language that has no generics and demands that you have lots of functions named add_int32, add_int64, etc.
quote: add issues with cyclic dependencies on top of this, and you have a real poo poo show. at least in my experience. i was clearly not arranging my go code properly because i had lots of problems.
|
# ? Sep 19, 2017 21:48 |
|
JawnV6 posted: the worst kind of pedantic poo poo
To say nothing of the fact that XHTML open tags are strictly regular and can be parsed very easily using a regular expression.
|
# ? Sep 19, 2017 22:28 |
|
If it's valid xhtml just use an xml parser
|
# ? Sep 19, 2017 22:31 |
|
never hit it in prod, but i found our email validation regex was badly constructed when a unit test took 15 minutes to run
|
# ? Sep 19, 2017 22:45 |
|
if not regex, then what should i use for user input validation? or is this a question for the tp thread
|
# ? Sep 19, 2017 22:47 |
|
redleader posted: if not regex, then what should i use for user input validation? or is this a question for the tp thread
don't do anything with user input other than display it back to them. make them enter different parts in different fields so you don't have to split it up with anything more complicated than a .split('/')
like what validation are you doing anyway? with email addresses for example imo the most you should care about is that it has an "@" and then a "." after the @. even that is probably wrong honestly.
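The email check described above needs no regex at all. A minimal Python sketch follows; the function name `looks_like_email` is made up for illustration. As the post itself says, even this much is probably wrong for some valid addresses (for example, ones with no dot in the domain part), so it is a sanity check at best.

```python
def looks_like_email(address):
    """True if address has an "@" with a "." somewhere after it."""
    _, at, rest = address.partition("@")
    # partition returns an empty separator when "@" is absent.
    return at == "@" and "." in rest


if __name__ == "__main__":
    for candidate in ("user@example.com", "userexample.com", "user@localhost"):
        print(candidate, looks_like_email(candidate))
```

Because it is plain string handling, its running time is linear in the length of the input no matter what the user types.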
|
# ? Sep 19, 2017 22:54 |
|
cis autodrag posted: if he was just sharing info he wouldn't be such a huge rear end about it.
no jokes allowed on the internet
|
# ? Sep 19, 2017 23:02 |
|
redleader posted:if not regex, then what should i use for user input validation? or is this a question for the tp thread
|
# ? Sep 19, 2017 23:03 |
|
Cybernetic Vermin posted: yeah, sorry for trollish reply, the conversation i was in was responding to a post saying "just use pcre" (with little context other than "how to regex?!") by noting it has pitfalls and suggesting re2. i really do not care a lot what your increasingly minuscule usecase is
try posting on hacker news
|
# ? Sep 20, 2017 01:28 |
|
us: there is posix syntax, pcre syntax, and countless variants. pcre won, posix lost
you: so i read this webpage about why they used re2
and missed the point, it was about running arbitrary code from users. it was slower, sure, but, like, it was faster on pathological cases, a bit like when someone writes a really bad for loop
|
# ? Sep 20, 2017 01:30 |
|
Shaggar posted: rob pike is a big dumb idiot
who the gently caress is rob pike?
|
# ? Sep 20, 2017 01:30 |
|
oh...
|
# ? Sep 20, 2017 01:31 |
|
he is the go guy
|
# ? Sep 20, 2017 01:35 |
|
CommunistPancake posted: he is the go guy
cool
|
# ? Sep 20, 2017 01:36 |