Coding Horrors: You can gather all your technical debt into one easy framework!


ToxicFrog: Apr 26, 2008

E: hey new page let's get that quote tag in there

yaoi prophet posted:

Tumblr lets users reblog/like posts they see on their dashboard. When you look at an individual post, it also shows you the last 50 or so 'notes' (reblogs, likes, and replies) on that post, not only by that user, but by all people who reblogged that post. Pretty interesting, right? Sounds like the sort of data you might want to visualize if you're into that sort of thing.

Well, you can't. There's no way, API or scraping, to get all the notes; you have to manually do XmlHttpRequests. And even then you can't get all of them because it likes repeating notes and at some point it gives you the same set of 50 over and over again. Fortunately, you can detect this because it gives you the same XHR url, but still. Ugh.

^^ At least you can use XHR rather than having to scrape the page and then extract the info you need from only-mostly-valid HTML 4.something.

I would kill for a stand-alone library that takes arbitrarily hosed up HTML as input and gives you its best guess at the DOM, rather than the current choice between "only support XHTML/XML" and "shoot yourself".

evensevenone posted:

Is there anyone in the world who upon seeing the words "ANSI C" would think "C99"?

Personally, I'd reply with "do you mean ANSI C89 or ANSI C99" unless it was obvious from context.

# ? Jan 25, 2011 19:46

Opinion Haver: Apr 9, 2007

ToxicFrog posted:

E: hey new page let's get that quote tag in there

^^ At least you can use XHR rather than having to scrape the page and then extract the info you need from only-mostly-valid HTML 4.something.

I would kill for a stand-alone library that takes arbitrarily hosed up HTML as input and gives you its best guess at the DOM, rather than the current choice between "only support XHTML/XML" and "shoot yourself".

Hahahaha you think I'm getting XML. I'm getting raw HTML that the javascript would normally inject straight into the page; the injected HTML then itself has another XHR embedded. Tumblr has an XML API but it's useless for getting actual metadata.

In an unrelated note, the SQL database username for furaffinity, one of the largest furry art (read: porn) sites got leaked. How'd this happen? They were serving one of their .sys files, and someone figured out the magic path to use. I don't remember what else got found, but this line just stuck with me:

code:

addslashes(stripslashes(stripslashes($value)))
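
To see why that chain is a horror, here is a rough Python sketch. The two helpers are simplified stand-ins for PHP's addslashes and stripslashes (they handle quotes and backslashes only and ignore NUL bytes), and `sanitize` is a hypothetical name for the quoted line, so treat this as an illustration rather than an exact port.

```python
def addslashes(s: str) -> str:
    """Simplified stand-in for PHP's addslashes: escape ', " and backslash."""
    return "".join("\\" + ch if ch in "'\"\\" else ch for ch in s)


def stripslashes(s: str) -> str:
    """Simplified stand-in for PHP's stripslashes: drop one level of backslashes."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\":
            i += 1                # skip the backslash itself
            if i < len(s):
                out.append(s[i])  # keep whatever it was escaping
                i += 1
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def sanitize(value: str) -> str:
    """Hypothetical name for the quoted line."""
    return addslashes(stripslashes(stripslashes(value)))


print(sanitize("it's"))      # quote gets escaped, as intended
print(sanitize("C:\\new"))   # the backslash is silently eaten
print(sanitize("a\\\\b"))    # an already-escaped backslash vanishes entirely
```

The double strip throws away any backslash the user legitimately typed before the single escape is applied, so the input is mangled, and the one pass of escaping is still no substitute for parameterized queries.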

# ? Jan 25, 2011 19:52

Spime Wrangler: Feb 23, 2003; Because we can.

so do you think it's a 'they just suck' coding horror or a 'must protect our market share' coding horror?

# ? Jan 25, 2011 21:25

Opinion Haver: Apr 9, 2007

Spime Wrangler posted:

so do you think it's a 'they just suck' coding horror or a 'must protect our market share' coding horror?

I don't think that Tumblr has significant competition right now. The purpose is different from Twitter (more long-form posts) and Wordpress.org (more social). Not to mention that Tumblr's infrastructure is just crappy; it keeps going down/being slow and apparently right now people's unfollow/follow counts are being wonky. I think they just suck.

# ? Jan 25, 2011 21:53

Zombywuf: Mar 29, 2008

That sounds like Tumblr are doing REST and doing it right. Where's the problem?

# ? Jan 25, 2011 22:01

Opinion Haver: Apr 9, 2007

Zombywuf posted:

That sounds like Tumblr are doing REST and doing it right. Where's the problem?

The problem is that the data they're giving back is unreliable; instead of giving me the next 50 notes chronologically, it's giving me some random set that I may or may not have seen before.
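
For reference, a minimal sketch of the workaround mentioned earlier in the thread: keep following the "next" URL, stop as soon as a URL repeats, and drop any note already seen. The `fetch` callable, the URLs, and the page shape are all made up for illustration; none of this is Tumblr's actual API.

```python
from typing import Callable, List, Optional, Tuple

# One page of results: the notes on it, plus the URL of the next page (or None).
Page = Tuple[List[str], Optional[str]]


def collect_notes(first_url: str,
                  fetch: Callable[[str], Page],
                  max_pages: int = 1000) -> List[str]:
    """Follow next-page URLs until one repeats, deduplicating notes."""
    seen_urls = set()
    seen_notes = set()
    notes: List[str] = []
    url: Optional[str] = first_url
    while url is not None and url not in seen_urls and len(seen_urls) < max_pages:
        seen_urls.add(url)
        page_notes, url = fetch(url)
        for note in page_notes:
            if note not in seen_notes:   # the server likes repeating notes
                seen_notes.add(note)
                notes.append(note)
    return notes


# Hypothetical fake server that eventually loops back to an earlier page.
FAKE_PAGES = {
    "/notes?from=0": (["a", "b"], "/notes?from=1"),
    "/notes?from=1": (["b", "c"], "/notes?from=2"),
    "/notes?from=2": (["c", "a"], "/notes?from=1"),
}

print(collect_notes("/notes?from=0", FAKE_PAGES.__getitem__))
```

This only tells you when to give up; it can't recover notes the server never hands out.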

# ? Jan 25, 2011 22:22

Vanadium: Jan 8, 2005

yaoi prophet posted:

The problem is that the data they're giving back is unreliable; instead of giving me the next 50 notes chronologically, it's giving me some random set that I may or may not have seen before.

Sounds perfectly stateless to me. :smug:

# ? Jan 25, 2011 22:26

Zombywuf: Mar 29, 2008

p.s. what is tumblr? I can tell it has lots of themes but what does it do and why would anyone want data out of it?

# ? Jan 25, 2011 23:00

wwb: Aug 17, 2004

tumblr is a place for hipsters too wordy for twitter and not smart enough to write full blog posts.

quote:

I would kill for a stand-alone library that takes arbitrarily hosed up HTML as input and gives you its best guess at the DOM, rather than the current choice between "only support XHTML/XML" and "shoot yourself".

http://htmlagilitypack.codeplex.com/

Tell me where to send the bullets.

# ? Jan 25, 2011 23:18

NotShadowStar: Sep 20, 2000

yaoi prophet posted:

The problem is that the data they're giving back is unreliable; instead of giving me the next 50 notes chronologically, it's giving me some random set that I may or may not have seen before.

But it wasn't designed to be doing what you're doing with it in the first place, so I don't understand what the hell you're complaining about.

"Tumblr doesn't have an API to access this" is valid. "I'm going around and submitting XHR mimicking what Tumblr does and it's not working properly" is not.

# ? Jan 25, 2011 23:31

Zombywuf: Mar 29, 2008

ToxicFrog posted:

I would kill for a stand-alone library that takes arbitrarily hosed up HTML as input and gives you its best guess at the DOM, rather than the current choice between "only support XHTML/XML" and "shoot yourself".

libxml2, also you know, webkit and even mshtml.

# ? Jan 25, 2011 23:56

dark_panda: Oct 25, 2004

Zombywuf posted:

libxml2, also you know, webkit and even mshtml.

libtidy might help, too.

# ? Jan 26, 2011 00:51

Nippashish: Nov 2, 2005; Let me see you dance!

R/parallel: Accessible Parallel Computing for Desktop Computers in R

R/parallel posted:

The package R/parallel enables the parallel execution of loops without data dependencies just by adding a single function: runParallel.

Here's one of their examples:

code:

  myFunction <- function( arg1, arg2, ... )
  {
    # Initial sequence of statements
    # Initializing variables and checking arguments
    variable1 <- constant1
    variable2 <- otherFunction( arg1 )
    ...
    if( "rparallel" %in% names( getLoadedDLLs()) )
    {
      runParallel( resultVar=c("resultVar1", "resultVar2" ),
                   resultOp= c("resultOP1", "resultOP2" ) )
    }
    else
    {
      # loop
      for( index in FirstValue:LastValue )
      {
        (more statements/loops/expressions/function calls/etc)
        ...
        tempVar1 <- functionA( arg1[ index ], variable1, ... )
        tempVar2 <- functionB( arg2[ index ], variable1, ... )
        resultVar1 <- resultOP1( resultVar1, tempVar1)
        resultVar2 <- resultOP2( resultVar2, tempVar2)
      }
    }

    # Finalizing calculation. Final sequence of statements
    (more statements/expressions/function calls/etc)
    ...

    return( anyCalculatedValue )
  }

When you call runParallel the environment figures out what if block you are in, looks for the corresponding else block (which must contain a for loop), and executes the loop it finds there in parallel.

:stare:

# ? Jan 26, 2011 05:29

Bozart: Oct 28, 2006; Give me the finger.

Nippashish posted:

R/parallel: Accessible Parallel Computing for Desktop Computers in R

Here's one of their examples:
code:
else block that runs no matter what?
When you call runParallel the environment figures out what if block you are in, looks for the corresponding else block (which must contain a for loop), and executes the loop it finds there in parallel.

For the first time I am impressed with Matlab's parfor loops.

# ? Jan 26, 2011 06:17

ToxicFrog: Apr 26, 2008

Zombywuf posted:

libxml2, also you know, webkit and even mshtml.

dark_panda posted:

libtidy might help, too.

Well, I've just been schooled. :downs:

Tidy looks ideal, it's nice and lightweight (I don't need a full XML parser, let alone a rendering engine) and has bindings to both Scala and Lua. This'll definitely come in handy. Thanks.

(HTMLAgilityPack looks nice too, but the .NET dependency is a problem.)

# ? Jan 26, 2011 06:19

Opinion Haver: Apr 9, 2007

NotShadowStar posted:

But it wasn't designed to be doing what you're doing with it in the first place, so I don't understand what the hell you're complaining about.

"Tumblr doesn't have an API to access this" is valid. "I'm going around and submitting XHR mimicking what Tumblr does and it's not working properly" is not.

I'm basically saying that Tumblr's website doesn't work properly. This weird behavior also happens when I click the 'Load more notes' button a bunch of times in a real browser.

# ? Jan 26, 2011 06:21

Wheany: Mar 17, 2006; Spinya^{ha^{ha^haha}ha}ha_{ha_{ha_haha}ha}ha!; Doctor Rope

ToxicFrog posted:

I would kill for a stand-alone library that takes arbitrarily hosed up HTML as input and gives you its best guess at the DOM, rather than the current choice between "only support XHTML/XML" and "shoot yourself".

If you use Python, Beautiful Soup

# ? Jan 26, 2011 07:22

Zombywuf: Mar 29, 2008

Wheany posted:

If you use Python, Beautiful Soup

Do not use Beautiful Soup, use lxml.etree, which is the Python front end to libxml2. It's fast and turns HTML into DOMs (with XPath (with regexes)).

# ? Jan 26, 2011 13:05

trex eaterofcadrs: Jun 17, 2005; My lack of understanding is only exceeded by my lack of concern.

dark_panda posted:

libtidy might help, too.

NekoHTML is a good option too.

# ? Jan 26, 2011 13:58

POKEMAN SAM: Jul 8, 2004

This is more of an API/OS/hardware horror than a coding horror, but I stumbled across this set of APIs today, and it amazed me how easy it was to call the functions and make awesome applications even more awesome:

http://msdn.microsoft.com/en-us/library/dd692964(v=VS.85).aspx

I'd like to write a WinAmp plugin and feed the music to this API

# ? Jan 27, 2011 01:31

NotShadowStar: Sep 20, 2000

Pretty sure you'd need UAC elevation to do that, since changing the same properties in the Control Panel asks for UAC access.

WinXP is fair game though, and it would be awesome as hell.

# ? Jan 27, 2011 02:02

Malloc Voidstar: May 7, 2007; Fuck the cowboys. Unf. Fuck em hard.

NotShadowStar posted:

Pretty sure you'd need UAC elevation to do that, since changing the same properties in the Control Panel asks for UAC access.

Obviously he should inject into explorer.exe, then.

# ? Jan 27, 2011 02:18

csammis: Aug 26, 2003; Mental Institution

DegaussMonitor

My first thought was to check the URL as this must have been an extremely well-designed MSDN parody site :psyduck:

# ? Jan 27, 2011 15:13

Munkeymon: Aug 14, 2003; Motherfucker's got an armor-piercing crowbar! Rigoddamndicu𝜆ous.

Aleksei Vasiliev posted:

Obviously he should inject into explorer.exe, then.

explorer doesn't run elevated by default :confused:

- at least not that I can see?

Munkeymon fucked around with this message at 15:53 on Jan 27, 2011

# ? Jan 27, 2011 15:50

Malloc Voidstar: May 7, 2007; Fuck the cowboys. Unf. Fuck em hard.

Munkeymon posted:

explorer doesn't run elevated by default - at least not that I can see?

It's on the hardcoded UAC whitelist, which means injecting into it bypasses UAC on the default Windows 7 level of security (doesn't affect Vista)
That code was released during the beta and still works on a fully-patched W7 system.

# ? Jan 27, 2011 15:56

Wheany: Mar 17, 2006; Spinya^{ha^{ha^haha}ha}ha_{ha_{ha_haha}ha}ha!; Doctor Rope

Zombywuf posted:

Do not use Beautiful Soup, use lxml.etree, which is the Python front end to libxml2. It's fast and turns HTML into DOMs (with XPath (with regexes)).

I thought the whole point of BS was quick screen scraping:

quote:

You didn't write that awful page. You're just trying to get some data out of it. Right now, you don't really care what HTML is supposed to look like.

Neither does this parser.

# ? Jan 27, 2011 16:20

NotShadowStar: Sep 20, 2000

Aleksei Vasiliev posted:

It's on the hardcoded UAC whitelist, which means injecting into it bypasses UAC on the default Windows 7 level of security (doesn't affect Vista)
That code was released during the beta and still works on a fully-patched W7 system.

Wow, people keep touting that Apple doesn't have problems with OSX because nobody has targeted it. I don't think that's the case, because Apple just uses standard unix permissions instead of weirdo crap like this. Finder doesn't run in a special privilege, it asks permission just like every single other application.

I think the real horror here is Microsoft keeps designing these elaborate security systems that always need special exemption for poo poo to work right, so people will just go after the exemptions and it all falls apart.

# ? Jan 27, 2011 16:37

Munkeymon: Aug 14, 2003; Motherfucker's got an armor-piercing crowbar! Rigoddamndicu𝜆ous.

Aleksei Vasiliev posted:

It's on the hardcoded UAC whitelist, which means injecting into it bypasses UAC on the default Windows 7 level of security (doesn't affect Vista)
That code was released during the beta and still works on a fully-patched W7 system.

Oh, fantastic. I'll have to remember this whenever I get around to upgrading.

NotShadowStar posted:

Wow, people keep touting that Apple doesn't have problems with OSX because nobody has targeted it. I don't think that's the case, because Apple just uses standard unix permissions instead of weirdo crap like this. Finder doesn't run in a special privilege, it asks permission just like every single other application.

I think the real horror here is Microsoft keeps designing these elaborate security systems that always need special exemption for poo poo to work right, so people will just go after the exemptions and it all falls apart.

That's how explorer/UAC works in Vista, but the problem is that Windows users aren't used to security prompts and bitch incessantly about it.

There's no need for a special exemption per se, but thousands of forum posts by the ignorant and change-averse about how much better it is to just disable UAC than to deal with something that approaches a real security system are apparently pretty convincing. Maybe if the loonix world would just heed this lesson, they'd make some headway among regular desktop users :kiddo:

# ? Jan 27, 2011 17:07

Scaevolus: Apr 16, 2007

Wheany posted:

I thought the whole point of BS was quick screen scraping:

The point was easy screen scraping. BeautifulSoup is written in Python, don't expect it to be fast.

lxml has essentially the same interface (and XPath), so there's really no reason not to use it.

Scaevolus fucked around with this message at 19:21 on Jan 27, 2011

# ? Jan 27, 2011 19:12

JediGandalf: Sep 3, 2004; I have just the top prospect YOU are looking for. Whaddya say, boss? What will it take for ME to get YOU to give up your outfielders?

csammis posted:

DegaussMonitor

My first thought was to check the URL as this must have been an extremely well-designed MSDN parody site

You'd be surprised how many people still use CRTs.

Or is it more of the fact that you can't believe there is a DegaussMonitor function.

# ? Jan 27, 2011 21:05

csammis: Aug 26, 2003; Mental Institution

JediGandalf posted:

You'd be surprised how many people still use CRTs.

Or is it more of the fact that you can't believe there is a DegaussMonitor function.

It's sort of both...I know that a lot of people still use CRTs, but at the same time I'm surprised that at the time of introduction of Windows Vista there was enough customer pressure on Microsoft to introduce APIs for software control of degaussing the monitor, and I'm surprised it's controllable via software at all.

# ? Jan 27, 2011 21:11

Impotence: Nov 8, 2010; Lipstick Apathy

yaoi prophet posted:

Tumblr lets users reblog/like posts they see on their dashboard. When you look at an individual post, it also shows you the last 50 or so 'notes' (reblogs, likes, and replies) on that post, not only by that user, but by all people who reblogged that post. Pretty interesting, right? Sounds like the sort of data you might want to visualize if you're into that sort of thing.

This is awesome.

quote:

<a class="more_notes_link" href="#" onclick="this.style.display='none';document.getElementById('notes_loading_POSTID').style.display = 'inline';if(window.ActiveXObject)var tumblrReq=new ActiveXObject('Microsoft.XMLHTTP');else if(window.XMLHttpRequest)var tumblrReq=new XMLHttpRequest();else return false;tumblrReq.onreadystatechange=function(){if(tumblrReq.readyState==4){var notes_html=tumblrReq.responseText.split('')[1].split('')[0];if(window.tumblrNotesLoaded)if(tumblrNotesLoaded(notes_html)==false)return;var more_notes_link=document.getElementById('more_notes_POSTID');var notes=more_notes_link.parentNode;notes.removeChild(more_notes_link);notes.innerHTML+=notes_html;if(window.tumblrNotesInserted)tumblrNotesInserted(notes_html);}};tumblrReq.open('GET','/notes/POSTID/REBLOG_KEY?from_c=',true);tumblrReq.send();return false;">Show more notes</a><span id="notes_loading_POSTID" style="display:none;">Loading...</span>

# ? Jan 27, 2011 22:05

NotShadowStar: Sep 20, 2000

I really hope compatibility issues are the reason they just put it all in inline JavaScript instead of putting the JS somewhere else and hooking up the links by class or ID.

I know tumblr uses Rails and Rails has a link_to_function method for making a link to reference a JS function for like... ever. Or even link_to_remote, which causes strain on the server with http overhead and translating a template. Cripes.

# ? Jan 27, 2011 23:44

trex eaterofcadrs: Jun 17, 2005; My lack of understanding is only exceeded by my lack of concern.

NotShadowStar posted:

I really hope compatibility issues are the reason they just put it all in inline JavaScript instead of putting the JS somewhere else and hooking up the links by class or ID.

I know tumblr uses Rails and Rails has a link_to_function method for making a link to reference a JS function for like... ever. Or even link_to_remote, which causes strain on the server with http overhead and translating a template. Cripes.

I think the real horror is that the poo poo is splitting on html comment tags.
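
To illustrate, a small sketch of the marker-splitting approach and a slightly less fragile version of it. The marker strings below are placeholders I made up (the ones in the quoted snippet did not survive), and `between_markers` is a hypothetical helper name.

```python
from typing import Optional

# Placeholder markers, assumed for illustration only.
START = "<!-- BEGIN -->"
END = "<!-- END -->"


def naive_extract(body: str) -> str:
    """What the inline handler effectively does: chained splits, no checks."""
    return body.split(START)[1].split(END)[0]


def between_markers(body: str, start: str = START, end: str = END) -> Optional[str]:
    """Return the text between the markers, or None if either is missing."""
    _, found_start, rest = body.partition(start)
    if not found_start:
        return None
    fragment, found_end, _ = rest.partition(end)
    if not found_end:
        return None
    return fragment


page = "<html>junk" + START + "<li>note</li>" + END + "more junk</html>"
print(between_markers(page))
print(between_markers("<html>error page with no markers</html>"))

try:
    naive_extract("<html>error page with no markers</html>")
except IndexError:
    print("naive version blows up when the markers are missing")
```

Either way the client is depending on comments embedded in a full HTML page, so any template change that touches them breaks it.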

# ? Jan 28, 2011 00:20

PraxxisParadoX: Jan 24, 2004; bittah.com; Pillbug

NotShadowStar posted:

I really hope compatibility issues are the reason they just put it all in inline JavaScript instead of putting the JS somewhere else and hooking up the links by class or ID.

I know tumblr uses Rails and Rails has a link_to_function method for making a link to reference a JS function for like... ever. Or even link_to_remote, which causes strain on the server with http overhead and translating a template. Cripes.

Tumblr is written in PHP (...)

# ? Jan 28, 2011 01:29

NotShadowStar: Sep 20, 2000

Weird, you're right http://www.marco.org/55384019. Took a bunch of Googling about it to get there, there was a lot of confusion about it since it has the design sensibilities of a Rails app.

# ? Jan 28, 2011 02:52

Zombywuf: Mar 29, 2008

Wheany posted:

I thought the whole point of BS was quick screen scraping:

The gap between intent and result is very large. lxml handles tag soup way better than Beautiful Soup, it also does everything else better and faster.

# ? Jan 28, 2011 11:54

dwazegek: Feb 11, 2005; WE CAN USE THIS

Constructor arguments are for scrubs:

code:

public static string Argument1Init { get; set;}
public static int Argument2Init { get; set;}

private string _argument1;
private int _argument2;

public SomeClass()
{
  _argument1 = Argument1Init;
  _argument2 = Argument2Init;
}

(names are changed)
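
For anyone wondering why this is bad, a rough Python translation of the same pattern next to the boring alternative. The class and attribute names are just as made up as the ones above; it's only meant to show the failure mode, assuming some unrelated code touches the statics in between.

```python
class SomeClass:
    # Class-level "arguments" that callers must set before constructing.
    argument1_init = ""
    argument2_init = 0

    def __init__(self):
        # Whatever happens to be in the shared statics right now wins.
        self._argument1 = SomeClass.argument1_init
        self._argument2 = SomeClass.argument2_init


class SaneClass:
    def __init__(self, argument1: str, argument2: int):
        # Each instance receives exactly what its caller passed in.
        self._argument1 = argument1
        self._argument2 = argument2


# Caller A sets up its "arguments"...
SomeClass.argument1_init = "for A"
SomeClass.argument2_init = 1
# ...but unrelated code (or another thread) gets in first.
SomeClass.argument1_init = "for B"
SomeClass.argument2_init = 2

a = SomeClass()
print(a._argument1, a._argument2)   # A's object ends up built from B's values

sane = SaneClass("for A", 1)
print(sane._argument1, sane._argument2)
```

The shared statics turn every construction into a race, and nothing in the constructor signature tells you the setup step exists.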

# ? Jan 28, 2011 12:16

Wheany: Mar 17, 2006; Spinya^{ha^{ha^haha}ha}ha_{ha_{ha_haha}ha}ha!; Doctor Rope

Zombywuf posted:

The gap between intent and result is very large. lxml handles tag soup way better than Beautiful Soup, it also does everything else better and faster.

Yeah, I checked lxml after making my post. I've just had Beautiful Soup lodged in my skull for so long from people asking about HTML parsing (using regexes) that it had become my kneejerk answer.

# ? Jan 28, 2011 14:06

Orzo: Sep 3, 2004; IT! IT is confusing! Say your goddamn pronouns!

dwazegek posted:

Constructor arguments are for scrubs:

code:

public static string Argument1Init { get; set;}
public static int Argument2Init { get; set;}

private string _argument1;
private int _argument2;

public SomeClass()
{
  _argument1 = Argument1Init;
  _argument2 = Argument2Init;
}

(names are changed)

oh my god, this is horrible

# ? Jan 28, 2011 15:40
