|
Empress Brosephine posted:ooo that's a good idea using Python to do it. Thanks! They are written in javascript.
|
# ? Apr 23, 2021 23:07 |
|
Empress Brosephine posted:ooo that's a good idea using Python to do it. Thanks! Tangentially related but I've used regex101.com dozens of times so far for regex building stuff. Pretty useful site because of its ability to easily paste test cases
|
# ? Apr 30, 2021 03:02 |
|
CarForumPoster posted:Tangentially related but I've used regex101.com dozens of times so far for regex building stuff. Pretty useful site because of its ability to easily paste test cases +1 Regex101.com, I love that it can explain how the regular expression works as well. That was the piece I needed to be able to wrap my head around regex.
|
# ? Apr 30, 2021 16:54 |
|
I'm going to check that out, thanks
|
# ? Apr 30, 2021 22:34 |
|
FWIW 50+ times over ~4 years using regexs to accomplish something and I still don’t understand them well enough to write one from scratch. I just always Google->stack overflow->regex101 if needed-> test in code. It’s a crutch.
|
# ? May 1, 2021 04:00 |
|
CarForumPoster posted:FWIW 50+ times over ~4 years using regexs to accomplish something and I still don’t understand them well enough to write one from scratch. I just always Google->stack overflow->regex101 if needed-> test in code. I usually either write a simple one by hand then move to regex101 to test a bunch of edge cases and fix my inevitable mistakes, or skip all that and jump to regex101 and use the sidebar to write a regex. Anyone who tells you that they can write flawless regex without help is lying to you.
|
# ? May 1, 2021 04:12 |
|
No, I think a lot of people have learned regular expressions well enough to code them without help. That doesn't mean that they never make mistakes or that they know every detail of every extension in PCRE, but well enough to rattle off something basic like ^(?:\w+:\w+,)*\w+:\w+$, sure. It's just another programming language, one with at most 20 features you actually need to remember: \, ., \w, \d, \s, \S, [], [^], ^, $, (), ?:, |, *, *?, +, +?, maybe {}. I can never remember the lookahead or back-reference stuff, but some people really swear by it. Like anything else, it's hard to retain and build expertise if you're not using it for months at a time, but if you ever start needing it day-to-day, you'll learn it quickly enough. rjmccall fucked around with this message at 06:33 on May 1, 2021 |
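For anyone following along, here is a rough Python sketch of what that example pattern accepts. The helper name `is_pair_list` and the sample strings are made up for illustration; only the pattern itself comes from the post.

```python
import re

# The pattern from the post: comma-separated key:value pairs of word characters.
PAIRS = re.compile(r"^(?:\w+:\w+,)*\w+:\w+$")


def is_pair_list(text: str) -> bool:
    """Return True if text looks like 'key:value,key:value,...'."""
    return PAIRS.match(text) is not None


if __name__ == "__main__":
    # Hypothetical inputs: a trailing comma or an empty item should be rejected.
    for sample in ["a:1", "a:1,b:2,c:3", "a:1,", "a:1,,b:2", "a 1"]:
        print(repr(sample), is_pair_list(sample))
```

The non-capturing group `(?:...)` handles every item that ends in a comma, and the final `\w+:\w+` is what forbids a trailing comma.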
# ? May 1, 2021 06:31 |
|
I can do pretty good, though I also enjoy regex crosswords so maybe I'm special. I could never remember what \b and \w do until I tried out the Execute Program regex course. Somehow that seared it into my brain. It teaches other parts of regexes too!
|
# ? May 1, 2021 07:01 |
|
I can never remember the language-specific extensions so I always have to look them up every time. for example the syntax for named subpatterns is different between C# and Python and I have to look it up whenever I use it in either. I should really write myself a cheat sheet or something I guess.
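As a small cheat-sheet entry, here is how the Python side of that looks. This is only a sketch: the date pattern and the `parse_date` helper are invented for the example. Python spells a named group `(?P<name>...)`, while .NET/C# spells it `(?<name>...)`.

```python
import re
from typing import Optional

# Python syntax for named groups is (?P<name>...).
# The C#/.NET equivalent would be written (?<year>\d{4}) and so on.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def parse_date(text: str) -> Optional[dict]:
    """Return the named groups of the first ISO-style date in text, or None."""
    match = DATE.search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_date("posted on 2021-05-01 at 11:10"))
```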
|
# ? May 1, 2021 11:10 |
|
Also I don't know whether the point was lost but regular expressions aren't a suitable solution for the general problem of extracting class names from an HTML page. I suggested that as a solution because it was a specific page that the poster wanted to do that to, and that page is unlikely to present false positives or if it does you could just work around them. If you had to do it for pages in general then you should use an HTML parsing library.
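For the general case, a minimal sketch using Python's standard-library `html.parser` might look like the following. The `ClassCollector` class and `class_names` function are names invented here, and a real page may call for a more forgiving third-party parser.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every class name that appears in a class attribute of a tag."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def class_names(html: str) -> set:
    """Parse the HTML and return the set of class names used on its tags."""
    parser = ClassCollector()
    parser.feed(html)
    parser.close()
    return parser.classes


if __name__ == "__main__":
    page = '<div class="a b"><p class="b c">class="fake"</p><!-- <i class="hidden"> --></div>'
    print(sorted(class_names(page)))
```

Unlike a regex over the raw text, this ignores `class="..."` strings that only appear in text content or inside comments, which is exactly the false-positive problem mentioned above.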
|
# ? May 1, 2021 11:12 |
|
The only time I usually have to deal with regex is with SSO, since the identity provider typically has usernames stored differently than the legacy systems I'm trying to integrate and I can use regex to manipulate the name (turn f.lastname into firstname.lastname). Doing this regex is an absolute nightmare, but it's possible. Typically I just use a different feature and map the names separately since I can't trust the customer's ldap anyways. Once in a while a guy comes along with a misspelled principal name, or he's lastname.f for some reason and the regex doesn't work.
|
# ? May 1, 2021 16:39 |
|
Step one of learning something is to stop trying to convince yourself that it's impossible to learn. Step two of learning regular expressions is probably to get in the habit of using them for find/replace in your editor of choice as often as possible. It'll be slower than whatever you're currently doing at first, but you won't get better at writing regexps without just spending time doing it.
|
# ? May 1, 2021 17:04 |
|
Jerk McJerkface posted:The only time I usually have to deal with regex is with SSO, since the identity provider typically has usernames stored differently than the legacy systems I'm trying to integrate and I can use regex to manipulate the name (turn f.lastname into firstname.lastname). Doing this regex is an absolute nightmare, but it's possible. Typically I just use a different feature and map the names separately since I can't trust the customer's ldap anyways. Once in a while a guy comes along with a misspelled principal name, or he's lastname.f for some reason and the regex doesn't work. oh man that doesn't sound like a good idea but you gotta do what you gotta do i guess
|
# ? May 2, 2021 05:49 |
|
I personally like https://regexr.com/ Add me to the list of people who use regex on a weekly basis and still need to look up basic poo poo.
|
# ? May 2, 2021 06:52 |
|
I like https://www.debuggex.com/ but I don't do anything particularly complicated.
|
# ? May 2, 2021 14:08 |
|
rjmccall posted:No, I think a lot of people have learned regular expressions well enough to code them without help. That doesn't mean that they never make mistakes or that they know every detail of every extension in PCRE, but well enough to rattle off something basic like ^(?:\w+:\w+,)*\w+:\w+$, sure. It's just another programming language, one with at most 20 features you actually need to remember: \, ., \w, \d, \s, \S, [], [^], ^, $, (), ?:, |, *, *?, +, +?, maybe {}. I can never remember the lookahead or back-reference stuff, but some people really swear by it. Regexs are like a programming language, although one with a syntax akin to brainfuck.
|
# ? May 2, 2021 15:14 |
Sure, regular expressions have a terse syntax, but the structure would be the same if you used a keyword-based syntax with whitespace between tokens and such, but it would also be so much more to write. Some regex syntaxes do allow whitespace to not be significant and even comments, but still use the common symbols for operators.
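Python's `re.VERBOSE` flag is one such syntax. As a sketch, here is the key:value pattern quoted earlier in the thread written both ways; the constant names are made up for the example.

```python
import re

# Terse form, as usually written.
PAIRS_TERSE = re.compile(r"^(?:\w+:\w+,)*\w+:\w+$")

# With re.VERBOSE, whitespace in the pattern is ignored and # starts a comment,
# but the operators are still the usual symbols.
PAIRS_VERBOSE = re.compile(
    r"""
    ^
    (?: \w+ : \w+ , )*   # zero or more "key:value," items
    \w+ : \w+            # final "key:value" with no trailing comma
    $
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    for sample in ["a:1,b:2", "a:1,", "a:1"]:
        print(repr(sample), bool(PAIRS_TERSE.match(sample)), bool(PAIRS_VERBOSE.match(sample)))
```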
|
|
# ? May 2, 2021 15:47 |
|
Bruegels Fuckbooks posted:oh man that doesn't sound like a good idea but you gotta do what you gotta do i guess The issue here (which I agree is bad) is that I've been doing replatforming of a CMS that has an internal identity pool/structure/permissions already inside it. The replatforms typically also include using SSO w/SAML. It's not a hard technical task, since our SAML module is a standard open source one. The issue is that users used to access the client via their username inside the client. Typically firstname.lastname. However when they go to saml, usually it's the LDAP name or auth token from SAML that contains a different assertion ID, and I don't know what it is. Sometimes this is initial.lastname or it's from before they were married or divorced and the names don't match. Or in ldap they are just an ID number and their people name isn't used. In our latest release we just added an alias field to the internal user identity config. So they do a saml auth, and then the cms checks the assertion Id against the user saml alias field, and then impersonates the name of the actual user so the legacy config/permissions still work. We also provide an API that they can pipe names/identities etc into it. It works well now and there's no regex required.
|
# ? May 2, 2021 15:51 |
|
nielsm posted:Sure, regular expressions have a terse syntax, but the structure would be the same if you used a keyword-based syntax with whitespace between tokens and such, but it would also be so much more to write. Some regex syntaxes do allow whitespace to not be significant and even comments, but still use the common symbols for operators. My point is that with programming languages it's generally considered a good trade-off if the syntax is more readable but takes longer to write. Sometimes I think that attitude is taken too far, but certainly regexs have a reputation for being difficult to read, and I don't consider that surprising given the syntax.
|
# ? May 2, 2021 16:07 |
|
In a lot of cases the problem is that the logic you're trying to implement is inherently complicated. Consider the following regex for email address validation taken from the accepted answer to How to validate an email address using a regular expression?:code:
|
# ? May 2, 2021 16:39 |
|
Generally speaking, you're better off using a selection of multiple regular expressions combined with some actual code in your programming language of choice. Or a formal grammar and a parser generator. Knowing when it's no longer appropriate to write "a regex" as your solution is part of being able to use regexes effectively.
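A rough Python sketch of that approach, using the email case as the example. The rules here are deliberately loose and hypothetical (they are not the RFC grammar), and `looks_like_email` and `DOMAIN_LABEL` are names invented for the illustration.

```python
import re

# One small regex for a single domain label: letters, digits, inner hyphens.
# This is an assumed, simplified rule, not the full specification.
DOMAIN_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def looks_like_email(address: str) -> bool:
    """Loose plausibility check: ordinary code for structure, a tiny regex per piece."""
    # Plain code handles the overall shape: split on the LAST "@".
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        return False
    # Each dot-separated label is checked with the small regex.
    labels = domain.split(".")
    return len(labels) >= 2 and all(DOMAIN_LABEL.fullmatch(label) for label in labels)


if __name__ == "__main__":
    for sample in ["someone@example.com", "no-at-sign", "a@b", "a@b..com"]:
        print(repr(sample), looks_like_email(sample))
```

Each piece stays small enough to read, and changing one rule does not mean re-deriving a single giant pattern.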
|
# ? May 2, 2021 23:36 |
|
You validate an email address by sending an email with a link to click IMO, but that's not a regex issue.
|
# ? May 3, 2021 16:00 |
|
Munkeymon posted:You validate an email address by sending an email with a link to click IMO, but that's not a regex issue. And this lets you turn your regex into something like \S+@\S+ which is much easier to remember!
|
# ? May 3, 2021 16:54 |
|
pokeyman posted:And this lets you turn your regex into something like \S+@\S+ which is much easier to remember! you can technically have spaces in an email address if you want. something like "My Butt"@somethingawful.com is a valid email address (the quotes are part of the address) I think .+@.+ or some variant like [^@]+(@[^@]*)*@[^@]+ would be ok but for all I know they're not, the rules for email addresses are extremely permissive. tbh if you really need to check whether an email address is well-formed you probably want to use a finite state machine, not a regular expression (because of the quoting rules); and if you want to check whether it is valid you should just try sending email to it. of course none of this stops you saying "gently caress your fancy-rear end email address that you only created for the sake of being technically correct, I'm arbitrarily disallowing it" and if you're in a position to do that then hell, you should.
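To make the difference concrete, here is a quick Python comparison of the three patterns mentioned so far against the quoted address. The dictionary keys and the `accepted_by` helper are labels invented for this sketch; the patterns themselves are the ones from the posts.

```python
import re

CANDIDATES = {
    "no_spaces": re.compile(r"\S+@\S+"),
    "anything": re.compile(r".+@.+"),
    "last_at": re.compile(r"[^@]+(@[^@]*)*@[^@]+"),
}


def accepted_by(address: str) -> list:
    """Names of the candidate patterns that match the whole address."""
    return [name for name, pattern in CANDIDATES.items() if pattern.fullmatch(address)]


if __name__ == "__main__":
    for sample in ['"My Butt"@somethingawful.com', "user@example.com", "nope"]:
        print(repr(sample), accepted_by(sample))
```

The `\S+@\S+` version is the only one that rejects the address with a space in it; none of the three says anything about whether mail to the address would actually arrive.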
|
# ? May 3, 2021 17:35 |
|
Hammerite posted:you can technically have spaces in an email address if you want. something like "My Butt"@somethingawful.com is a valid email address (the quotes are part of the address) Technically sure, but basically every email service out there prevents you from registering an email address with a space in it, so I'm not gonna support some weird rear end edge case. I think it makes sense to do some basic validations to make sure the input data looks like an email address, but beyond that implementing a regex that validates an email address down to the exact email address specifications doesn't really provide a tangible benefit. Anyway, you are obviously on the same page about dumb "technically correct for the sake of being technically correct" email addresses so I digress.
|
# ? May 3, 2021 17:45 |
|
my email address is @@@.at don't @ me
|
# ? May 3, 2021 17:47 |
|
Are there any programs or websites out there that will let me automatically diagram out IIF statements (and ideally work in reverse too)? I work with a dumb program that gives me access to limited customization options and that's the most used one, but following IIF statements 5/6 levels deep with multiple branches is a real pain.
|
# ? May 4, 2021 16:57 |
|
Hammerite posted:tbh if you really need to check whether an email address is well-formed you probably want to use a finite state machine, not a regular expression (because of the quoting rules) Regular expressions are a language for encoding finite state machines (weird backreference stuff aside)
|
# ? May 4, 2021 17:00 |
|
raminasi posted:Regular expressions are a language for encoding finite state machines (weird backreference stuff aside) The two representations are equivalent in power but sometimes one representation is considerably more simple than another. Think about the set of all binary strings that contain an even number of 1s and an odd number of 0s. That's very easy to describe as an FSM but the regex for it isn't quite as simple.
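As a sketch of how small the FSM side is, here it is as a Python transition table. The state names and the `accepts` function are invented for the example; each state just records the parity of the 0s and 1s seen so far.

```python
# State names are (parity of 0s)(parity of 1s): E = even, O = odd.
TRANSITIONS = {
    ("EE", "0"): "OE", ("EE", "1"): "EO",
    ("OE", "0"): "EE", ("OE", "1"): "OO",
    ("EO", "0"): "OO", ("EO", "1"): "EE",
    ("OO", "0"): "EO", ("OO", "1"): "OE",
}
START = "EE"
ACCEPT = {"OE"}  # odd number of 0s, even number of 1s


def accepts(bits: str) -> bool:
    """Run the four-state machine over a string of '0' and '1' characters."""
    state = START
    for symbol in bits:
        # A KeyError here means the input contained something other than 0 or 1.
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPT


if __name__ == "__main__":
    for sample in ["0", "011", "", "01", "0011"]:
        print(repr(sample), accepts(sample))
```

Four states and eight transitions cover the whole language, whereas the equivalent regex has to spell out how the parities interleave.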
|
# ? May 4, 2021 17:42 |
|
raminasi posted:Regular expressions are a language for encoding finite state machines (weird backreference stuff aside) After looking on wikipedia, it turns out "finite state machine" has a more restrictive definition than I thought. What I thought qualified as a finite state machine is actually a "pushdown automaton". I thought that a finite-state machine could have finitely many variables (e.g. an integer telling you how many levels of nesting you have entered), apparently not? It seems weird to conceptualise that as being backed by a "stack" with symbols from the singleton alphabet but I guess that's what wiki is telling me.
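A minimal Python sketch of the kind of machine described here, with a single integer tracking nesting depth. The `balanced` function is invented for illustration. The counter has no upper bound, which is what takes it outside the formal finite-state definition; counting up and down is the same as pushing and popping one repeated symbol on a stack.

```python
def balanced(text: str) -> bool:
    """Check that parentheses in text are properly nested, ignoring other characters."""
    depth = 0  # unbounded counter: equivalent to a stack holding one kind of symbol
    for ch in text:
        if ch == "(":
            depth += 1       # "push"
        elif ch == ")":
            depth -= 1       # "pop"
            if depth < 0:    # closed more than were opened
                return False
    return depth == 0


if __name__ == "__main__":
    for sample in ["(a(b)c)", "(()", "())(", ""]:
        print(repr(sample), balanced(sample))
```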
|
# ? May 4, 2021 17:48 |
|
Finite state machines in an introductory theory class are very restricted--they're not even pushdown automata because they don't have stacks--but finite state machines in general are not. There are models for those as well but they don't get taught early on.
|
# ? May 4, 2021 17:52 |
|
ultrafilter posted:Finite state machines in an introductory theory class are very restricted--they're not even pushdown automata because they don't have stacks--but finite state machines in general are not. There are models for those as well but they don't get taught early on. so are you saying that "finite state machine" has 2 different meanings? how come? Wikipedia's article on "finite-state machine" defines them as being strictly less capable than pushdown automata. My degree is in maths and stats, all my cs knowledge I picked up as a hobbyist or later on the job, it doesn't come from introductory classes but it is limited in some areas.
|
# ? May 4, 2021 18:58 |
|
Formally, a finite state machine is characterized by storing a statically-bounded amount of information. A pushdown automaton can store unbounded information because the stack can grow without limit. Programmers often say “finite state machine” for the design pattern of a finite control: a system that receives discrete events and switches on some enumerated internal state to decide how to respond, and where a response can include changing that enumerated state. It’s not meant to imply that the total amount of information stored is finite, though; and formally, even Turing machines are built around a finite control.
|
# ? May 4, 2021 19:39 |
A finite state machine definitely can't have any variables, other than the current state. If you want one with variables that all have a finite set of valid values then sure, you can build a multidimensional matrix with each variable in one dimension, then enumerate all positions in the matrix into new state values (in a single dimension), and have a stupidly complex state machine. At least as a mathematical model. If you want one with variables that have a (conceptually) infinite range of valid values, then your state machine is no longer finite.
Pushdown automata are a different class of computational capability. They have a stack of infinite capacity, so they do have an infinite number of possible states. As far as I remember, Turing machines are the next step up in computational capability.
You can make the argument that digital computers are finite state machines, they just tend to have somewhere around 2^1000000000000 different states, which (in isolation) makes them pretty good at pretending to be Turing machines. On the other hand, they also have I/O devices that let them output data to external systems of unknown capabilities, and that output could potentially affect the input stream, so I'd argue they are fully Turing complete given an appropriate I/O device. (Thematically appropriate would be a tape station with some kind of automatic cutting and splicing to make it look like there was a spool with infinite capacity in either direction.)
Edit: ^ above me is better at being concise
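The matrix-flattening idea can be sketched in a few lines of Python. The two variables and their values are hypothetical, and `flatten_states` is a name invented for the example.

```python
from itertools import product

# Hypothetical machine with two finite-valued variables.
VARIABLES = {
    "light": ["red", "amber", "green"],
    "door": ["open", "closed"],
}


def flatten_states(variables: dict) -> dict:
    """Map every combination of variable values to a single integer state id."""
    names = list(variables)
    combos = product(*(variables[name] for name in names))
    return {combo: state_id for state_id, combo in enumerate(combos)}


STATE_IDS = flatten_states(VARIABLES)

if __name__ == "__main__":
    # 3 values x 2 values = 6 plain states, one per cell of the matrix.
    for combo, state_id in STATE_IDS.items():
        print(state_id, combo)
```

The number of flattened states is the product of the sizes of the variables' value sets, which is why this gets "stupidly complex" quickly and why it stops working as soon as one variable has an unbounded range.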
|
|
# ? May 4, 2021 19:52 |
|
rjmccall posted:Formally, a finite state machine is characterized by storing a statically-bounded amount of information. A pushdown automaton can store unbounded information because the stack can grow without limit. This is the distinction I was getting at. There are also different types of finite state machines that people study to develop theories about various systems. For instance, timed automata are very important for reasoning about real-time systems.
|
# ? May 4, 2021 20:56 |
|
There are linear bounded automata, in between pushdown automata and Turing machines. Each of these machine classes corresponds to a type of language in the Chomsky hierarchy:
FSM = regular language
Pushdown automata = context-free language
LBA = context-sensitive language
Turing machine = unrestricted language
|
# ? May 4, 2021 21:24 |
|
Finite state machines and Turing machines each accept the same class of languages whether you allow nondeterminism or not, nondeterministic pushdown automata are strictly more powerful than the deterministic ones, and it's still an open question for linear bounded automata (because no one really cares).
|
# ? May 4, 2021 21:29 |
|
Hammerite posted:so are you saying that "finite state machine" has 2 different meanings? how come? Wikipedia's article on "finite-state machine" defines them as being strictly less capable than pushdown automata. the class where all the cs majors learn this poo poo is called "theory of computation" or "automata theory." it was the first course i ever took where the professor assigned his own textbook. the warning sign was the subtitle "a gentle introduction" - that was a lie, the class was not gentle.
|
# ? May 4, 2021 23:36 |
|
Is there a string length limit that regex cannot handle? Namely in older systems (It is monk code, based off of lisp).
|
# ? May 7, 2021 03:46 |
|
If there is, it would be specific to that particular system, rather than a general regex thing.
|
# ? May 7, 2021 03:57 |