salted hash browns
Mar 26, 2007
ykrop
man 2007 ya'll old


Rufus Ping
Dec 27, 2006





I'm a Friend of Rodney Nano
java 7 added support for named groups in regexes e.g.

(?<foo>\w+)(?<bar>\d+)

is there some way to get the names of all the groups from a Matcher object if i dont know them in advance?

(like how i can get all the values of the groups by calling group(i) for 0<i<=groupCount() )

Rufus Ping
Dec 27, 2006





I'm a Friend of Rodney Nano
i mean preferably without doing somematcher.pattern.toString() and using... a regex... to find chevron-enclosed names lol

wins32767
Mar 16, 2007

Milkie Galore posted:

i mean preferably without doing somematcher.pattern.toString() and using... a regex... to find chevron-enclosed names lol

This is the one true way.

vapid cutlery
Apr 17, 2007

php:
<?
"it's george costanza" ?>

Milkie Galore posted:

i mean preferably without doing somematcher.pattern.toString() and using... a regex... to find chevron-enclosed names lol

that's how stargate did it

Shaggar
Apr 26, 2006

Milkie Galore posted:

java 7 added support for named groups in regexes e.g.

(?<foo>\w+)(?<bar>\d+)

is there some way to get the names of all the groups from a Matcher object if i dont know them in advance?

(like how i can get all the values of the groups by calling group(i) for 0<i<=groupCount() )

java regex protip. group(0) always refers to the entire matched string. group(1) is the first one, and the last one is groupcount()+1. its loving stupid but thats the way it works.

as far as names go, i dont see any way to get the name if you have the index. you'll have to regex the regex. altho maybe theres a commons lib that adds that functionality 4 u
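A minimal sketch of the "regex the regex" approach, assuming the pattern only uses the plain (?<name>...) syntax. GroupNames and namesIn are made-up names for illustration. It is a heuristic, not a regex parser: it will be fooled by things like a group-looking sequence inside a character class or a \Q...\E quoted section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNames {
  // Java group names start with a letter and contain only letters and digits.
  // The lookbehind skips "\(" (an escaped paren). Lookbehind syntax such as
  // (?<= or (?<! is not matched because a letter must follow the '<'.
  private static final Pattern NAMED_GROUP =
      Pattern.compile("(?<!\\\\)\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

  // Returns the group names in the order they appear in the pattern source.
  static List<String> namesIn(Pattern p) {
    List<String> names = new ArrayList<>();
    Matcher m = NAMED_GROUP.matcher(p.pattern());
    while (m.find()) {
      names.add(m.group(1));
    }
    return names;
  }

  public static void main(String[] args) {
    Pattern p = Pattern.compile("(?<foo>\\w+)(?<bar>\\d+)");
    System.out.println(namesIn(p)); // [foo, bar]
  }
}
```

Since the names come out in source order, this gives a name list but not the name-to-index mapping when unnamed groups are mixed in.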

Shaggar
Apr 26, 2006
tbh theres probably a way better way to do what u want tho.

Shaggar
Apr 26, 2006
dynamic regex group naming smells of p-language hackery.

Meiwaku
Jan 10, 2011

Fun for the whole family!

Shaggar posted:

regex group naming smells

vapid cutlery
Apr 17, 2007

php:
<?
"it's george costanza" ?>
basically anything that's not compile-time checked like using strings for poo poo is the worst

vapid cutlery
Apr 17, 2007

php:
<?
"it's george costanza" ?>
i hope you're at least burying this regex inside some class with a simple external API

Rufus Ping
Dec 27, 2006





I'm a Friend of Rodney Nano

Shaggar posted:

java regex protip. group(0) always refers to the entire matched string. group(1) is the first one, and the last one is groupcount()+1. its loving stupid but thats the way it works.

the last one is groupcount(), note that i used <= not <
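A small sketch of the indexing being described: group(0) is the whole match, and the capture groups run from 1 to groupCount() inclusive. The class name, helper, pattern, and input are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
  // Collects capture groups 1..groupCount(); group 0 (the whole match)
  // is not included in groupCount().
  static List<String> captures(Matcher m) {
    List<String> out = new ArrayList<>();
    for (int i = 1; i <= m.groupCount(); i++) {
      out.add(m.group(i));
    }
    return out;
  }

  public static void main(String[] args) {
    Matcher m = Pattern.compile("(\\w+)-(\\d+)").matcher("abc-123");
    if (m.matches()) {
      System.out.println(m.group(0));  // abc-123
      System.out.println(captures(m)); // [abc, 123]
    }
  }
}
```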

Shaggar posted:

tbh theres probably a way better way to do what u want tho.

dynamic regex group naming smells of p-language hackery.

i dont think there really is unfortunately, other than insisting the user uses numbers rather than names

vapid cutlery posted:

basically anything that's not compile-time checked like using strings for poo poo is the worst

i hope you're at least burying this regex inside some class with a simple external API

yeah as best i can/need to. its a peculiar one, its user-supplied but at compile time (its part of a parser generator)

homercles
Feb 14, 2010

You have to cheat:

code:
import java.lang.reflect.Method;
import java.util.Map;
import java.util.regex.Pattern;

class RegexTest {
  @SuppressWarnings("unchecked")
  public static void main(String[] args) throws Exception {
    Pattern p = Pattern.compile("(?<foo>\\w+)(?<bar>\\d+)");
    // Map<String, Integer> florp = p.namedGroups();  // won't compile: not public in Java 7

    // Cheat: call the non-public Pattern.namedGroups() via reflection.
    // This relies on JDK internals; newer JDKs may refuse setAccessible without --add-opens.
    Method m = Pattern.class.getDeclaredMethod("namedGroups");
    m.setAccessible(true);
    Map<String, Integer> florp = (Map<String, Integer>) m.invoke(p);
    // Print each group name with its group index.
    for (Map.Entry<String, Integer> e : florp.entrySet()) {
      System.out.format("%s => %d%n", e.getKey(), e.getValue());
    }
  }
}

Shaggar
Apr 26, 2006
lol

Rufus Ping
Dec 27, 2006





I'm a Friend of Rodney Nano
ah i saw that private map but didn't know there was a way to get at it, thanks homercles!

Rufus Ping
Dec 27, 2006





I'm a Friend of Rodney Nano
also brb throwing up

Nomnom Cookie
Aug 30, 2009



Milkie Galore posted:

the last one is groupcount(), note that i used <= not <


i dont think there really is unfortunately, other than insisting the user uses numbers rather than names


yeah as best i can/need to. its a peculiar one, its user-supplied but at compile time (its part of a parser generator)
Using user-supplied regexes opens you to a complexity DoS
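One way to bound the damage, sketched under assumptions: wrap the input in a CharSequence that throws once a deadline passes, since the matcher reads its input through charAt() while backtracking. BoundedMatch, DeadlineCharSequence, and matchesWithin are made-up names, not a JDK API. How badly a given pattern backtracks varies by JDK version, so the demo pattern may simply finish instead of timing out.

```java
import java.util.regex.Pattern;

class BoundedMatch {
  // CharSequence wrapper that aborts once the deadline has passed.
  static final class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineNanos;

    DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
      this.inner = inner;
      this.deadlineNanos = deadlineNanos;
    }

    @Override public char charAt(int index) {
      // Subtraction keeps the comparison safe if nanoTime() wraps around.
      if (System.nanoTime() - deadlineNanos > 0) {
        throw new IllegalStateException("regex match timed out");
      }
      return inner.charAt(index);
    }

    @Override public int length() { return inner.length(); }

    @Override public CharSequence subSequence(int start, int end) {
      return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
    }

    @Override public String toString() { return inner.toString(); }
  }

  // Matches the whole input, throwing IllegalStateException if it takes too long.
  static boolean matchesWithin(Pattern p, String input, long timeoutMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    return p.matcher(new DeadlineCharSequence(input, deadline)).matches();
  }

  public static void main(String[] args) {
    Pattern nested = Pattern.compile("(a+)+$"); // nested quantifiers, a classic backtracking risk
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 40; i++) sb.append('a');
    sb.append('!');
    try {
      System.out.println(matchesWithin(nested, sb.toString(), 200));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

This only limits wall-clock time per match; it does nothing about the pattern itself being expensive.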

NeoHentaiMaster
Jul 13, 2004
More well adjusted then you'd think.

Shaggar posted:

tbh theres probably a way better way to do what u want tho.

Write it in Perl.

Jonny 290
May 5, 2005



[ASK] me about OS/2 Warp
perl regex is fully sick

Max Facetime
Apr 18, 2009

Nomnom Cookie posted:

Using user-supplied regexes opens you to a complexity DoS

more importantly it makes for a nasty api

may I suggest

code:
interface SomeParser {
  SomeParser register(String key, String valueRegex);
  void parse(String content);
  String getValue(String key);
}
which at least limits the nastiness
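A rough sketch of what an implementation might look like. The post only gives the interface, so the semantics here are assumptions: each key gets its own regex, parse() records the first match of each, and getValue() returns null for anything that did not match. FirstMatchParser and SomeParserDemo are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

interface SomeParser {
  SomeParser register(String key, String valueRegex);
  void parse(String content);
  String getValue(String key);
}

// Hypothetical implementation; the behavior below is assumed, not from the thread.
class FirstMatchParser implements SomeParser {
  private final Map<String, Pattern> patterns = new LinkedHashMap<>();
  private final Map<String, String> values = new LinkedHashMap<>();

  @Override
  public SomeParser register(String key, String valueRegex) {
    patterns.put(key, Pattern.compile(valueRegex));
    return this; // allows chained register() calls
  }

  @Override
  public void parse(String content) {
    values.clear();
    for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
      Matcher m = e.getValue().matcher(content);
      if (m.find()) {
        values.put(e.getKey(), m.group()); // assumed: first match wins
      }
    }
  }

  @Override
  public String getValue(String key) {
    return values.get(key); // null if the key was not registered or did not match
  }
}

class SomeParserDemo {
  public static void main(String[] args) {
    SomeParser p = new FirstMatchParser()
        .register("word", "[a-z]+")
        .register("number", "\\d+");
    p.parse("abc 123");
    System.out.println(p.getValue("word"));   // abc
    System.out.println(p.getValue("number")); // 123
  }
}
```

The caller names values by key, so group names never leak out of the regex and nothing has to be recovered from the Pattern.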

X-BUM-RAIDER-X
May 7, 2008

NeoHentaiMaster posted:

Write it in Perl.

Rufus Ping
Dec 27, 2006





I'm a Friend of Rodney Nano

Nomnom Cookie posted:

Using user-supplied regexes opens you to a complexity DoS

yes - though the user in this case is another coder, using [the output of] my code in theirs (think antlr,yacc,sablecc etc) so it doesnt really matter and theyre unlikely to do this to themselves and its not my problem if they do

its not like im running a website where ppl can type an arbitrary regex in a box dont worry

homercles
Feb 14, 2010

:tipshat:

AAAUUUUUGGGHHHHHHHHHH why aren't you writing this in perl6/c#/ruby/clojure/chicken scheme/delphi/anything but C++/elisp and bind it to ctrl-alt-metal-f-g-6/$MY_FAVOURITE_SHITTY_LANGUAGE

write it in scala tho scala is p.cool and totes not my favourite lovely language

homercles
Feb 14, 2010

rotor posted:

guys guys guys

can we just let shaggar be shaggar and get back to talking about how im at a difficult crossroads in my career and am also pretty depressed about the whole thing?

idk maybe you're just getting old

getting older

homercles
Feb 14, 2010

carbon date rotor's posts

tef
May 30, 2004

-> some l-system crap ->

Nomnom Cookie posted:

Using user-supplied regexes opens you to a complexity DoS

http://code.google.com/p/re2/ :v:

Tiny Bug Child
Sep 11, 2004

Avoid Symmetry, Allow Complexity, Introduce Terror

vapid cutlery posted:

basically anything that's ... compile...d ... is the worst

tef
May 30, 2004

-> some l-system crap ->

Tiny Bug Child posted:

writing software is just a lot more fun in general when you're working against the user

this is actually quite true.

vapid cutlery
Apr 17, 2007

php:
<?
"it's george costanza" ?>

homercles posted:

:tipshat:

AAAUUUUUGGGHHHHHHHHHH why aren't you writing this in perl6/c#/ruby/clojure/chicken scheme/delphi/anything but C++/elisp and bind it to ctrl-alt-metal-f-g-6/$MY_FAVOURITE_SHITTY_LANGUAGE

write it in scala tho scala is p.cool and totes not my favourite lovely language

you forgot objective c

vapid cutlery
Apr 17, 2007

php:
<?
"it's george costanza" ?>
xpath is kinda cool i wonder if there's benchmarks for GDataXML

Zizzyx
Sep 18, 2007

INTERGALACTIC CAN CHAMPION

i want a good reason to use xpath. it's neat

vapid cutlery
Apr 17, 2007

php:
<?
"it's george costanza" ?>

Zizzyx posted:

i want a good reason to use xpath. it's neat

whenever i have to scrape a webpage i'm going to use xpath from now on. having something that's aware of the *ml hierarchy is invaluable
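A minimal sketch of hierarchy-aware scraping with the JDK's built-in XPath support. It assumes the markup is already well-formed XML, which real-world HTML often is not. The class name, method name, sample markup, and the class='post' selector are all made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathScrape {
  // Returns the href of every link nested anywhere inside a div with class="post".
  static List<String> linkTargets(String xhtml) throws Exception {
    XPath xpath = XPathFactory.newInstance().newXPath();
    NodeList nodes = (NodeList) xpath.evaluate(
        "//div[@class='post']//a/@href",
        new InputSource(new StringReader(xhtml)),
        XPathConstants.NODESET);
    List<String> out = new ArrayList<>();
    for (int i = 0; i < nodes.getLength(); i++) {
      out.add(nodes.item(i).getNodeValue());
    }
    return out;
  }

  public static void main(String[] args) throws Exception {
    String page =
        "<html><body>"
        + "<div class=\"post\"><p><a href=\"/a\">one</a></p></div>"
        + "<div class=\"nav\"><a href=\"/skip\">x</a></div>"
        + "<div class=\"post\"><a href=\"/b\">two</a></div>"
        + "</body></html>";
    // Prints the two hrefs under the "post" divs and skips the one under "nav".
    System.out.println(linkTargets(page));
  }
}
```

The selector ignores how deeply the link is nested, which is the part a regex over the raw markup tends to get wrong.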

0xB16B00B5
Aug 24, 2006

by Y Kant Ozma Post

vapid cutlery posted:

whenever i have to scrape a webpage i'm going to use xpath from now on. having something that's aware of the *ml hierarchy is invaluable

your posts are on the bottom

vapid cutlery
Apr 17, 2007

php:
<?
"it's george costanza" ?>

0xB16B00B5 posted:

your posts are on the bottom

because i'm so grassroots? thanks

0xB16B00B5
Aug 24, 2006

by Y Kant Ozma Post
youre welcome

i appreciate your contributions to yawspaws

Meiwaku
Jan 10, 2011

Fun for the whole family!

vapid cutlery posted:

whenever i have to scrape a webpage i'm going to use xpath from now on. having something that's aware of the *ml hierarchy is invaluable

I wish SO badly I could do this, but unfortunately some huge percentage of HTML is a horrible mess. Probabilistic tree's to the rescue.

vapid cutlery
Apr 17, 2007

php:
<?
"it's george costanza" ?>

Meiwaku posted:

I wish SO badly I could do this, but unfortunately some huge percentage of HTML is a horrible mess. Probabilistic tree's to the rescue.

i think beautifulsoup was a good way to deal with those actually, idk if it's available outside of python though

tef
May 30, 2004

-> some l-system crap ->

Meiwaku posted:

I wish SO badly I could do this, but unfortunately some huge percentage of HTML is a horrible mess. Probabilistic tree's to the rescue.

libxml/lxml handled a bunch of these with xpath. it works.

vapid cutlery
Apr 17, 2007

php:
<?
"it's george costanza" ?>

tef posted:

libxml/lxml handled a bunch of these with xpath. it works.

yeah iirc ios includes libxml2 and GDataXML+HTML is built on top of it and it rocks


zeekner
Jul 14, 2007

i've run into some really bad xpath implementations, but if you find a good one it's hella nice
