ToxicFrog
Apr 26, 2008


E: hey new page let's get that quote tag in there

yaoi prophet posted:

Tumblr lets users reblog/like posts they see on their dashboard. When you look at an individual post, it also shows you the last 50 or so 'notes' (reblogs, likes, and replies) on that post, not only by that user, but by all people who reblogged that post. Pretty interesting, right? Sounds like the sort of data you might want to visualize if you're into that sort of thing.

Well, you can't. There's no way, API or scraping, to get all the notes; you have to manually do XmlHttpRequests. And even then you can't get all of them because it likes repeating notes and at some point it gives you the same set of 50 over and over again. Fortunately, you can detect this because it gives you the same XHR url, but still. Ugh.

^^ At least you can use XHR rather than having to scrape the page and then extract the info you need from only-mostly-valid HTML 4.something.

I would kill for a stand-alone library that takes arbitrarily hosed up HTML as input and gives you its best guess at the DOM, rather than the current choice between "only support XHTML/XML" and "shoot yourself".

evensevenone posted:

Is there anyone in the world who upon seeing the words "ANSI C" would think "C99"?

Personally, I'd reply with "do you mean ANSI C89 or ANSI C99" unless it was obvious from context.


Opinion Haver
Apr 9, 2007

ToxicFrog posted:

E: hey new page let's get that quote tag in there


^^ At least you can use XHR rather than having to scrape the page and then extract the info you need from only-mostly-valid HTML 4.something.

I would kill for a stand-alone library that takes arbitrarily hosed up HTML as input and gives you its best guess at the DOM, rather than the current choice between "only support XHTML/XML" and "shoot yourself".

Hahahaha you think I'm getting XML. I'm getting raw HTML that the javascript would normally inject straight into the page; the injected HTML then itself has another XHR embedded. Tumblr has an XML API but it's useless for getting actual metadata.

On an unrelated note, the SQL database username for furaffinity, one of the largest furry art (read: porn) sites, got leaked. How'd this happen? They were serving one of their .sys files, and someone figured out the magic path to use. I don't remember what else got found, but this line just stuck with me:
code:
addslashes(stripslashes(stripslashes($value)))
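
The stacked calls look like hand-rolled SQL escaping. A minimal sketch of the usual alternative, bound parameters, written with Python's sqlite3 rather than their PHP stack; the table, column, and function names are made up for illustration and are not from the leaked code:

```python
import sqlite3

def find_user(conn, name):
    # The driver binds `name` as data, so quotes inside it cannot change the SQL text.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()

# Hypothetical in-memory table, only here to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("o'brien",)])

rows = find_user(conn, "o'brien")            # a legitimate name containing a quote
injected = find_user(conn, "x' OR '1'='1")   # a classic injection attempt
```

With binding, the name containing a quote is found normally and the injection string is just a name that matches nobody, so there is nothing to add or strip slashes from.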

Spime Wrangler
Feb 23, 2003

Because we can.

so do you think it's a 'they just suck' coding horror or a 'must protect our market share' coding horror?

Opinion Haver
Apr 9, 2007

Spime Wrangler posted:

so do you think it's a 'they just suck' coding horror or a 'must protect our market share' coding horror?

I don't think that Tumblr has significant competition right now. The purpose is different from Twitter (more long-form posts) and Wordpress.org (more social). Not to mention that Tumblr's infrastructure is just crappy; it keeps going down/being slow and apparently right now people's unfollow/follow counts are being wonky. I think they just suck.

Zombywuf
Mar 29, 2008

That sounds like Tumblr are doing REST and doing it right. Where's the problem?

Opinion Haver
Apr 9, 2007

Zombywuf posted:

That sounds like Tumblr are doing REST and doing it right. Where's the problem?

The problem is that the data they're giving back is unreliable; instead of giving me the next 50 notes chronologically, it's giving me some random set that I may or may not have seen before.
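
A rough sketch of the workaround, assuming (as mentioned earlier) that the loop shows up as a repeated XHR URL. `fetch_page` is a hypothetical caller-supplied function, not a Tumblr API, and the fake pages below exist only to exercise the loop:

```python
def collect_notes(fetch_page, start_url, max_pages=1000):
    """Follow 'more notes' links until the next URL repeats or runs out.

    fetch_page(url) is assumed to return (list_of_notes, next_url_or_None).
    Notes must be hashable (e.g. strings) so duplicates can be dropped.
    """
    seen_urls = set()
    seen_notes = set()
    notes = []
    url = start_url
    while url is not None and url not in seen_urls and len(seen_urls) < max_pages:
        seen_urls.add(url)
        page_notes, url = fetch_page(url)
        for note in page_notes:
            if note not in seen_notes:  # the server may hand back repeats
                seen_notes.add(note)
                notes.append(note)
    return notes

# Fake server for demonstration: page "c" links back to "b", like the loop described.
FAKE_PAGES = {
    "a": (["n1", "n2"], "b"),
    "b": (["n2", "n3"], "c"),
    "c": (["n3", "n4"], "b"),
}
result = collect_notes(FAKE_PAGES.__getitem__, "a")
```

This only stops the scraper from spinning forever and drops duplicates; it can't recover notes the server never returns.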

Vanadium
Jan 8, 2005

yaoi prophet posted:

The problem is that the data they're giving back is unreliable; instead of giving me the next 50 notes chronologically, it's giving me some random set that I may or may not have seen before.

Sounds perfectly stateless to me. :smug:

Zombywuf
Mar 29, 2008

p.s. what is tumblr? I can tell it has lots of themes but what does it do and why would anyone want data out of it?

wwb
Aug 17, 2004

tumblr is a place for hipsters too wordy for twitter and not smart enough to write full blog posts.

quote:

I would kill for a stand-alone library that takes arbitrarily hosed up HTML as input and gives you its best guess at the DOM, rather than the current choice between "only support XHTML/XML" and "shoot yourself".

http://htmlagilitypack.codeplex.com/

Tell me where to send the bullets.

NotShadowStar
Sep 20, 2000

yaoi prophet posted:

The problem is that the data they're giving back is unreliable; instead of giving me the next 50 notes chronologically, it's giving me some random set that I may or may not have seen before.

But it wasn't designed to be doing what you're doing with it in the first place, so I don't understand what the hell you're complaining about.

"Tumblr doesn't have an API to access this" is valid. "I'm going around and submitting XHR mimicking what Tumblr does and it's not working properly" is not.

Zombywuf
Mar 29, 2008

ToxicFrog posted:

I would kill for a stand-alone library that takes arbitrarily hosed up HTML as input and gives you its best guess at the DOM, rather than the current choice between "only support XHTML/XML" and "shoot yourself".

libxml2, also you know, webkit and even mshtml.

dark_panda
Oct 25, 2004

Zombywuf posted:

libxml2, also you know, webkit and even mshtml.

libtidy might help, too.

Nippashish
Nov 2, 2005

Let me see you dance!
R/parallel: Accessible Parallel Computing for Desktop Computers in R

R/parallel posted:

The package R/parallel enables the parallel execution of loops without data dependencies just by adding a single function: runParallel.

Here's one of their examples:
code:
  myFunction <- function( arg1, arg2, ... )
  {
    # Initial sequence of statements
    # Initializing variables and checking arguments
    variable1 <- constant1
    variable2 <- otherFunction( arg1 )
    ...
    if( "rparallel" %in% names( getLoadedDLLs()) )
    {
      runParallel( resultVar=c("resultVar1", "resultVar2" ),
                   resultOp= c("resultOP1", "resultOP2" ) )
    }
    else
    {
      # loop
      for( index in FirstValue:LastValue )
      {
        (more statements/loops/expressions/function calls/etc)
        ...
        tempVar1 <- functionA( arg1[ index ], variable1, ... )
        tempVar2 <- functionB( arg2[ index ], variable1, ... )
        resultVar1 <- resultOP1( resultVar1, tempVar1)
        resultVar2 <- resultOP2( resultVar2, tempVar2)
      }
    }

    # Finalizing calculation. Final sequence of statements
    (more statements/expressions/function calls/etc)
    ...

    return( anyCalculatedValue )
  }
When you call runParallel the environment figures out what if block you are in, looks for the corresponding else block (which must contain a for loop), and executes the loop it finds there in parallel.

:stare:
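
For contrast, a sketch of the explicit style, where the loop body is a function handed to a pool and the reduction is written out. This is Python's concurrent.futures, not anything from R/parallel, and the function names are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def loop_body(x):
    # Stand-in for the per-index work (functionA in the R example).
    return x * x

def parallel_sum_of_squares(values, workers=4):
    # map() returns results in input order; the loop body has no
    # dependencies between iterations, which is what makes this safe.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(loop_body, values))
    # Explicit reduction step (the role resultOP1 plays in the R example).
    return reduce(operator.add, partials, 0)

total = parallel_sum_of_squares(range(1, 6))
```

Threads are used here only to keep the sketch runnable anywhere; for CPU-bound work in CPython, ProcessPoolExecutor offers the same map interface. Either way, nothing has to go looking for an else block.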

Bozart
Oct 28, 2006

Give me the finger.

Nippashish posted:

R/parallel: Accessible Parallel Computing for Desktop Computers in R


Here's one of their examples:
code:
else block that runs no matter what?
When you call runParallel the environment figures out what if block you are in, looks for the corresponding else block (which must contain a for loop), and executes the loop it finds there in parallel.

:stare:

For the first time I am impressed with Matlab's parfor loops.

ToxicFrog
Apr 26, 2008


Zombywuf posted:

libxml2, also you know, webkit and even mshtml.

dark_panda posted:

libtidy might help, too.

Well, I've just been schooled. :downs: Tidy looks ideal, it's nice and lightweight (I don't need a full XML parser, let alone a rendering engine) and has bindings to both Scala and Lua. This'll definitely come in handy. Thanks.

(HTMLAgilityPack looks nice too, but the .NET dependency is a problem.)

Opinion Haver
Apr 9, 2007

NotShadowStar posted:

But it wasn't designed to be doing what you're doing with it in the first place, so I don't understand what the hell you're complaining about.

"Tumblr doesn't have an API to access this" is valid. "I'm going around and submitting XHR mimicking what Tumblr does and it's not working properly" is not.

I'm basically saying that Tumblr's website doesn't work properly. This weird behavior also happens when I click the 'Load more notes' button a bunch of times in a real browser.

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope

ToxicFrog posted:

I would kill for a stand-alone library that takes arbitrarily hosed up HTML as input and gives you its best guess at the DOM, rather than the current choice between "only support XHTML/XML" and "shoot yourself".

If you use Python, Beautiful Soup

Zombywuf
Mar 29, 2008

Wheany posted:

If you use Python, Beautiful Soup

Do not use Beautiful Soup, use lxml.etree, which is the Python front end to libxml2. It's fast and turns HTML into DOMs (with XPath (with regexes)).
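
To show what "best guess at the DOM" means in practice, here is a dependency-free sketch. It deliberately swaps in the standard library's html.parser instead of lxml (a third-party install), so none of this is lxml's API, and the recovery rule is a crude assumption of mine; real parsers such as libxml2 apply more elaborate rules:

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

class SoupTreeBuilder(HTMLParser):
    """Builds a nested (tag, children) tree, guessing at missing end tags."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close everything up to the nearest matching open tag;
        # a stray end tag with no match is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1][1].append(data.strip())

def parse_soup(html):
    builder = SoupTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

# Unclosed <b>, stray </i>: both get tolerated rather than rejected.
tree = parse_soup("<div><p>one<b>bold</p><p>two</i></div>")
```

The unclosed `<b>` is closed when its parent `<p>` ends and the stray `</i>` is dropped, which is the kind of guess an XML-only parser refuses to make.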

trex eaterofcadrs
Jun 17, 2005
My lack of understanding is only exceeded by my lack of concern.

dark_panda posted:

libtidy might help, too.

NekoHTML is a good option too.

POKEMAN SAM
Jul 8, 2004
This is more of an API/OS/hardware horror than a coding horror, but I stumbled across this set of APIs today, and it amazed me how easy it was to call the functions and make awesome applications even more awesome:

http://msdn.microsoft.com/en-us/library/dd692964(v=VS.85).aspx

I'd like to write a WinAmp plugin and feed the music to this API :D

NotShadowStar
Sep 20, 2000
Pretty sure you'd need UAC elevation to do that, since changing the same properties in the Control Panel asks for UAC access.

WinXP is fair game though, and it would be awesome as hell.

Malloc Voidstar
May 7, 2007

Fuck the cowboys. Unf. Fuck em hard.

NotShadowStar posted:

Pretty sure you'd need UAC elevation to do that, since changing the same properties in the Control Panel asks for UAC access.
Obviously he should inject into explorer.exe, then.

csammis
Aug 26, 2003

Mental Institution
DegaussMonitor

My first thought was to check the URL as this must have been an extremely well-designed MSDN parody site :psyduck:

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Aleksei Vasiliev posted:

Obviously he should inject into explorer.exe, then.

explorer doesn't run elevated by default :confused: - at least not that I can see?

Munkeymon fucked around with this message at 15:53 on Jan 27, 2011

Malloc Voidstar
May 7, 2007

Fuck the cowboys. Unf. Fuck em hard.

Munkeymon posted:

explorer doesn't run elevated by default :confused: - at least not that I can see?
It's on the hardcoded UAC whitelist, which means injecting into it bypasses UAC on the default Windows 7 level of security (doesn't affect Vista)
That code was released during the beta and still works on a fully-patched W7 system.

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope

Zombywuf posted:

Do not use Beautiful Soup, use lxml.etree, which is the Python front end to libxml2. It's fast and turns HTML into DOMs (with XPath (with regexes)).

I thought the whole point of BS was quick screen scraping:

quote:

You didn't write that awful page. You're just trying to get some data out of it. Right now, you don't really care what HTML is supposed to look like.

Neither does this parser.

NotShadowStar
Sep 20, 2000

Aleksei Vasiliev posted:

It's on the hardcoded UAC whitelist, which means injecting into it bypasses UAC on the default Windows 7 level of security (doesn't affect Vista)
That code was released during the beta and still works on a fully-patched W7 system.

Wow, people keep touting that Apple doesn't have problems with OSX because nobody has targeted it. I don't think that's the case, because Apple just uses standard unix permissions instead of weirdo crap like this. Finder doesn't run in a special privilege, it asks permission just like every single other application.

I think the real horror here is Microsoft keeps designing these elaborate security systems that always need special exemption for poo poo to work right, so people will just go after the exemptions and it all falls apart.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Aleksei Vasiliev posted:

It's on the hardcoded UAC whitelist, which means injecting into it bypasses UAC on the default Windows 7 level of security (doesn't affect Vista)
That code was released during the beta and still works on a fully-patched W7 system.

Oh, fantastic. I'll have to remember this whenever I get around to upgrading.

NotShadowStar posted:

Wow, people keep touting that Apple doesn't have problems with OSX because nobody has targeted it. I don't think that's the case, because Apple just uses standard unix permissions instead of weirdo crap like this. Finder doesn't run in a special privilege, it asks permission just like every single other application.

I think the real horror here is Microsoft keeps designing these elaborate security systems that always need special exemption for poo poo to work right, so people will just go after the exemptions and it all falls apart.

That's how explorer/UAC works in Vista, but the problem is that Windows users aren't used to security prompts and bitch incessantly about it.

There's no need for a special exemption per se, but thousands of forum posts by the ignorant and change-averse about how much better it is to just disable UAC than to deal with something that approaches a real security system are apparently pretty convincing. Maybe if the loonix world would just heed this lesson, they'd make some headway among regular desktop users :kiddo:

Scaevolus
Apr 16, 2007

Wheany posted:

I thought the whole point of BS was quick screen scraping:

The point was easy screen scraping. BeautifulSoup is written in Python, don't expect it to be fast.

lxml has essentially the same interface (and XPath), so there's really no reason not to use it.

Scaevolus fucked around with this message at 19:21 on Jan 27, 2011

JediGandalf
Sep 3, 2004

I have just the top prospect YOU are looking for. Whaddya say, boss? What will it take for ME to get YOU to give up your outfielders?

csammis posted:

DegaussMonitor

My first thought was to check the URL as this must have been an extremely well-designed MSDN parody site :psyduck:
You'd be surprised how many people still use CRTs.

Or is it more the fact that you can't believe there is a DegaussMonitor function?

csammis
Aug 26, 2003

Mental Institution

JediGandalf posted:

You'd be surprised how many people still use CRTs.

Or is it more the fact that you can't believe there is a DegaussMonitor function?

It's sort of both...I know that a lot of people still use CRTs, but at the same time I'm surprised that at the time of introduction of Windows Vista there was enough customer pressure on Microsoft to introduce APIs for software control of degaussing the monitor, and I'm surprised it's controllable via software at all.

Impotence
Nov 8, 2010
Lipstick Apathy

yaoi prophet posted:

Tumblr lets users reblog/like posts they see on their dashboard. When you look at an individual post, it also shows you the last 50 or so 'notes' (reblogs, likes, and replies) on that post, not only by that user, but by all people who reblogged that post. Pretty interesting, right? Sounds like the sort of data you might want to visualize if you're into that sort of thing.

This is awesome.

quote:

<a class="more_notes_link" href="#" onclick="this.style.display='none';document.getElementById('notes_loading_POSTID').style.display = 'inline';if(window.ActiveXObject)var tumblrReq=new ActiveXObject('Microsoft.XMLHTTP');else if(window.XMLHttpRequest)var tumblrReq=new XMLHttpRequest();else return false;tumblrReq.onreadystatechange=function(){if(tumblrReq.readyState==4){var notes_html=tumblrReq.responseText.split('<!-- START '+'NOTES -->')[1].split('<!-- END '+'NOTES -->')[0];if(window.tumblrNotesLoaded)if(tumblrNotesLoaded(notes_html)==false)return;var more_notes_link=document.getElementById('more_notes_POSTID');var notes=more_notes_link.parentNode;notes.removeChild(more_notes_link);notes.innerHTML+=notes_html;if(window.tumblrNotesInserted)tumblrNotesInserted(notes_html);}};tumblrReq.open('GET','/notes/POSTID/REBLOG_KEY?from_c=',true);tumblrReq.send();return false;">Show more notes</a><span id="notes_loading_POSTID" style="display:none;">Loading...</span>

NotShadowStar
Sep 20, 2000
I really hope compatibility issues are the reason they put it all in inline JavaScript instead of putting the JS somewhere else and attaching it to links by class or ID.

I know tumblr uses Rails and Rails has had a link_to_function method for making a link reference a JS function for like... ever. Or even link_to_remote, which causes strain on the server with http overhead and translating a template. Cripes.

trex eaterofcadrs
Jun 17, 2005
My lack of understanding is only exceeded by my lack of concern.

NotShadowStar posted:

I really hope compatibility issues are the reason they put it all in inline JavaScript instead of putting the JS somewhere else and attaching it to links by class or ID.

I know tumblr uses Rails and Rails has had a link_to_function method for making a link reference a JS function for like... ever. Or even link_to_remote, which causes strain on the server with http overhead and translating a template. Cripes.

I think the real horror is that the poo poo is splitting on html comment tags.
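
Part of why: in the inline handler, `split('<!-- START NOTES -->')[1]` is undefined when the marker is absent (an error page, say), so the chained `.split` throws. A sketch of the same marker-based extraction in Python with that case handled; the function name and sample strings are mine, only the two marker strings come from the quoted handler:

```python
START_MARKER = "<!-- START NOTES -->"
END_MARKER = "<!-- END NOTES -->"

def extract_notes_html(response_text):
    """Return the text between the two markers, or None if either is missing."""
    start = response_text.find(START_MARKER)
    if start == -1:
        return None
    start += len(START_MARKER)
    end = response_text.find(END_MARKER, start)
    if end == -1:
        return None
    return response_text[start:end]

page = "<html><!-- START NOTES --><li>reblog</li><!-- END NOTES --></html>"
fragment = extract_notes_html(page)
missing = extract_notes_html("<html>error page</html>")
```

It is still string slicing on comment markers, so it is no less fragile about what sits between them; it just fails quietly instead of throwing.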

PraxxisParadoX
Jan 24, 2004
bittah.com
Pillbug

NotShadowStar posted:

I really hope compatibility issues are the reason they put it all in inline JavaScript instead of putting the JS somewhere else and attaching it to links by class or ID.

I know tumblr uses Rails and Rails has had a link_to_function method for making a link reference a JS function for like... ever. Or even link_to_remote, which causes strain on the server with http overhead and translating a template. Cripes.

Tumblr is written in PHP (...)

NotShadowStar
Sep 20, 2000
Weird, you're right: http://www.marco.org/55384019. It took a bunch of Googling to get there; there was a lot of confusion about it since it has the design sensibilities of a Rails app.

Zombywuf
Mar 29, 2008

Wheany posted:

I thought the whole point of BS was quick screen scraping:

The gap between intent and result is very large. lxml handles tag soup way better than Beautiful Soup; it also does everything else better and faster.

dwazegek
Feb 11, 2005

WE CAN USE THIS :byodood:
Constructor arguments are for scrubs:

code:
public static string Argument1Init { get; set;}
public static int Argument2Init { get; set;}

private string _argument1;
private int _argument2;

public SomeClass()
{
  _argument1 = Argument1Init;
  _argument2 = Argument2Init;
}
(names are changed)
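
To spell out the hazard, here is the same pattern transliterated to Python (not the original C#), next to plain constructor arguments. The class and attribute names are placeholders, same as above:

```python
class SomeClass:
    # Mirrors the static "init" properties: shared, mutable, global state.
    argument1_init = ""
    argument2_init = 0

    def __init__(self):
        # Whatever happens to be in the class-level slots right now wins.
        self._argument1 = SomeClass.argument1_init
        self._argument2 = SomeClass.argument2_init

class SomeClassFixed:
    def __init__(self, argument1, argument2):
        # Each instance states its own inputs; nothing depends on call order.
        self._argument1 = argument1
        self._argument2 = argument2

SomeClass.argument1_init = "first"
SomeClass.argument2_init = 1
a = SomeClass()

SomeClass.argument1_init = "second"  # another caller sets only one slot...
b = SomeClass()                      # ...and silently inherits the other

fixed = SomeClassFixed("first", 1)
```

The second instance picks up `1` from the first caller's setup, and with threads the two assignments and the construction can interleave arbitrarily.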

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope

Zombywuf posted:

The gap between intent and result is very large. lxml handles tag soup way better than Beautiful Soup; it also does everything else better and faster.

Yeah, I checked lxml after making my post. I've just had Beautiful Soup lodged in my skull for so long from people asking about HTML parsing (using regexes) that it had become my kneejerk answer.


Orzo
Sep 3, 2004

IT! IT is confusing! Say your goddamn pronouns!

dwazegek posted:

Constructor arguments are for scrubs:

code:
public static string Argument1Init { get; set;}
public static int Argument2Init { get; set;}

private string _argument1;
private int _argument2;

public SomeClass()
{
  _argument1 = Argument1Init;
  _argument2 = Argument2Init;
}
(names are changed)
oh my god, this is horrible
