|
Hughmoris posted: Rookie perl question:
iirc perl has regex support, might try that
|
# ? Apr 15, 2024 03:44 |
|
Hughmoris posted: Rookie perl question:
An example
To explain for a perl rookie, the check is:
pre:
if ($var =~ /(:|,)22(,|$)/) {
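For anyone who wants to poke at it, here is a minimal runnable sketch of that check. The `IP:port,port,...` line format and the `has_port_22` name are assumptions for illustration; the original question's exact input isn't shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: true if port 22 appears in the port list.
# Assumes lines look like "IP:port,port,..." (format guessed from the thread).
sub has_port_22 {
    my ($var) = @_;
    # (:|,)  the 22 must start right after the colon or a comma
    # (,|$)  and must end at a comma or at the end of the string
    return $var =~ /(:|,)22(,|$)/ ? 1 : 0;
}

# Made-up sample lines, including near misses like 220 and 8022.
for my $line ('10.0.0.1:22,80', '10.0.0.2:80,22', '10.0.0.3:220,8022', '10.0.0.22:80') {
    printf "%-20s %s\n", $line, has_port_22($line) ? 'yes' : 'no';
}
```

The delimiters on both sides are what keep ports like 220 or 8022, or an address ending in .22, from counting as a hit.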
biznatchio fucked around with this message at 04:33 on Apr 15, 2024 |
# ? Apr 15, 2024 04:29 |
|
shoeberto posted:iirc perl has regex support, might try that
|
# ? Apr 15, 2024 08:58 |
|
shoeberto posted: iirc perl has regex support, might try that
biznatchio posted: An example
This is what I was looking for, thanks! I'm writing some simple utility scripts for work and, while Python would be easier for me, I just find Perl fun to write.
|
# ? Apr 15, 2024 12:58 |
|
Pablo Bluth posted: While the concept and implementation of regexps predate Perl, the advanced form of regexp that we all know and love arose in Perl. I doubt there will be another language where regexps are quite so central to the language's identity.
Joke's on you, I was being a gigantic dumbass... on PURPOSE!! But yeah, sorry, I just assumed OP would have had the experience to write an appropriate regex. The other post nailed it.
|
# ? Apr 15, 2024 14:05 |
|
In the interest of being helpful, I really like iterating on regexes here: https://regex101.com/
|
# ? Apr 15, 2024 14:06 |
|
shoeberto posted: Joke's on you, I was being a gigantic dumbass... on PURPOSE!!
|
# ? Apr 15, 2024 14:26 |
|
Actually measuring minor performance differences at this scale is very difficult and depends on the distribution of inputs. However, there is a different way to structure the pattern:
code:
This is "find a literal 22 as a word that is somewhere after a colon". The backslash-b is a "word boundary" that automatically handles all non-letters-digits-or-underscores as well as the beginning and ending boundaries of the string. Also, since there's known to be a single colon separating the IP from the port list, this pattern might make more intuitive sense because it shows that everything to the left of the colon doesn't matter. If you want additional validation, the full list of ports, etc., then it changes, but this is still twice as fast as fully splitting the string.
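A pattern fitting that description might look like the sketch below. It is reconstructed from the wording above ("a literal 22 as a word, somewhere after a colon"), so treat the exact pattern and the `has_port_22_wb` name as assumptions rather than what was actually posted.

```perl
use strict;
use warnings;

# Hypothetical helper, reconstructed from the description only.
# Assumes a single colon separates the IP from the port list.
sub has_port_22_wb {
    my ($var) = @_;
    # :     start looking only after the colon
    # .*    skip over any earlier ports
    # \b22\b  a 22 that is not part of a longer number or word
    return $var =~ /:.*\b22\b/ ? 1 : 0;
}

# Made-up sample lines; the last one has 22 only on the IP side.
for my $line ('10.0.0.1:22,80', '10.0.0.2:80,22', '10.0.0.3:220,8022', '22.0.0.1:80') {
    printf "%-20s %s\n", $line, has_port_22_wb($line) ? 'yes' : 'no';
}
```

Because digits count as word characters, `\b` refuses to match in the middle of 220 or 8022, which is what lets it stand in for the explicit comma/end-of-string alternation.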
|
# ? Apr 16, 2024 05:04 |
|
PhantomOfTheCopier posted: Actually measuring minor performance differences of this scale is very difficult and depends on the distribution of inputs, however, there is a different way to structure the pattern:
I was curious about performance of the different solutions. Anyone know best practices for benchmarking scripts in modern perl? I'm thinking I could generate a 1 GB file of IP addresses and ports, and benchmark the different solutions.
|
# ? Apr 16, 2024 21:50 |
|
Hughmoris posted:I was curious about performance of the different solutions. Anyone know best practices for benchmarking scripts in modern perl? I'm thinking I could generate a 1 GB file of IP addresses and ports, and benchmark the different solutions.
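One low-effort option is the core `Benchmark` module and its `cmpthese` function. The sketch below is only a starting point: the sample data is made up, the `IP:port,port,...` format is assumed, and the word-boundary pattern is a guess at the one described earlier in the thread. For the 1 GB experiment you would read lines from your generated file instead of building `@lines` in memory.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data in the assumed "IP:port,port,..." format;
# every third line gets port 22, the rest get 8080.
my @lines = map { "10.0.0.$_:" . join(',', 80, 443, ($_ % 3 ? 8080 : 22)) } 1 .. 200;

# Each candidate counts how many lines contain port 22.
my %solutions = (
    alternation => sub { my $n = grep { /(:|,)22(,|$)/ } @lines; $n },
    word_bound  => sub { my $n = grep { /:.*\b22\b/ } @lines; $n },
    split_list  => sub {
        my $n = 0;
        for my $line (@lines) {
            my (undef, $ports) = split /:/, $line, 2;
            $n++ if grep { $_ == 22 } split /,/, $ports;
        }
        $n;
    },
);

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, \%solutions);
```

`cmpthese` prints a rate for each sub plus a table of relative differences. Results will shift with the mix of matching and non-matching lines, so it is worth feeding it data shaped like the real input.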
|
# ? Apr 16, 2024 23:37 |
|
In my experience, the only time you ever need to worry about regex performance is if you have backtracking, and even then only when you fall into a catastrophic backtracking case, which generally only happens when you stack multiple variable-length operators on top of each other. Neither of the provided regexes uses a concerning backtracking pattern, so their performance differences are almost certainly immaterial, even across 1 GB of data. From a performance perspective, Knuth's quote on the matter applies here:
quote: The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.
Choose one or the other based on readability, because in this case you're going to spend more wall-clock time looking at the pattern as the developer than the CPU will spend actually executing it.
(And it is worth pointing out that neither pattern will work correctly if the strings might ever contain IPv6 addresses; some additional pattern work would be necessary to distinguish colons in the address from the colon separating the address from the port list.)
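To make the IPv6 point concrete: a pattern of the form `/:.*\b22\b/` would report a hit on `[2001:db8::22]:80,443`, because the 22 inside the address sits after a colon. One possible workaround, sketched below, is to look only at the run of digits and commas after the last colon. The `has_port` name and the assumption that a port list is always present at the end of the line are mine, not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper; assumes the port list is always present and always
# follows the LAST colon, e.g. "[2001:db8::22]:80,443" or "10.0.0.1:22".
sub has_port {
    my ($line, $port) = @_;
    # Capture only the trailing run of digits and commas after the final colon.
    my ($ports) = $line =~ /:([\d,]+)$/ or return 0;
    my $found = grep { $_ eq $port } split /,/, $ports;
    return $found ? 1 : 0;
}

# Made-up sample lines mixing IPv4 and bracketed IPv6 addresses.
for my $line ('[2001:db8::22]:80,443', '[2001:db8::1]:22,80', '10.0.0.1:22') {
    printf "%-24s %s\n", $line, has_port($line, 22) ? 'yes' : 'no';
}
```

This does not validate the address at all. It just sidesteps the ambiguity by ignoring everything before the final colon.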
|
# ? Apr 18, 2024 02:45 |
|
Catastrophic backtracking is certainly possible, but happily much less likely in Perl because of the regexp optimizations. I remember being surprised at how awful some of the regex libraries in popular languages were a few (maybe five) years ago.
|
# ? Apr 19, 2024 05:18 |