Sidpret
Jun 17, 2004

I rewrote everything to use Bing and they're being much nicer to me so far. Just an FYI in case anyone else ever needs to do something like this.


lmao zebong
Nov 25, 2006

NBA All-Injury First Team
I wrote a python script that scraped a google results page and got banned from google constantly - I ended up setting up a sleep command for a random amount of time between 20-35 seconds and after that I never had an issue getting banned again. I see you set it up for Bing now but if you wanted to go back to Google that worked for me.
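For anyone copying this trick, the delay is just a uniformly random sleep between requests; a minimal sketch (the 20-35 second window comes from the post above, and `polite_sleep` is a made-up name):

```python
import random
import time

def polite_sleep(min_s=20.0, max_s=35.0):
    """Wait a random interval so the request timing doesn't look scripted."""
    time.sleep(random.uniform(min_s, max_s))
```

Call it once between each query in the scraping loop.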

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



You might also consider httplib and re-using the connection object. I had a problem with urllib2 holding sockets open until the program ended, which is really lovely for the server.

Python code:
from httplib import HTTPConnection

connection = HTTPConnection(host_name)
for word in word_list:
    connection.request('GET', query_url.format(word))
    response = connection.getresponse().read()
    # do stuff with response here
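As a side note for anyone reading this later: on Python 3, `httplib` was renamed `http.client`, but the connection-reuse pattern is the same. A sketch wrapping the loop in a function (`fetch_words` is a made-up name; the host and URL template are placeholders):

```python
from http.client import HTTPConnection  # Python 3 name for httplib

def fetch_words(host_name, query_url, word_list):
    """Issue all queries over one reused connection instead of opening a
    fresh socket per request, then close the socket explicitly."""
    connection = HTTPConnection(host_name)
    results = {}
    for word in word_list:
        connection.request('GET', query_url.format(word))
        results[word] = connection.getresponse().read()
    connection.close()  # release the socket instead of holding it open
    return results
```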

Polio Vax Scene
Apr 5, 2009



This is really vague/stupid but I'm trying to find an example of some language that has some conceptual way of using the numerators and denominators of fractions to avoid precision errors entirely. Help? Also I mean some sort of code that I can examine, not just a demonstration.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Manslaughter posted:

This is really vague/stupid but I'm trying to find an example of some language that has some conceptual way of using the numerators and denominators of fractions to avoid precision errors entirely. Help? Also I mean some sort of code that I can examine, not just a demonstration.

Haskell's standard library includes Data.Ratio, which is used to implement rational numbers at arbitrary precision.

The GMP library is a cross-platform implementation in C of, among many other things, arbitrary-precision rationals.
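Python's standard library takes the same approach with `fractions.Fraction`, and since CPython's `Lib/fractions.py` is plain Python, it doubles as readable example code for the "something I can examine" request. A quick demonstration of the exact-arithmetic idea:

```python
from fractions import Fraction

# 0.1 + 0.2 accumulates binary floating-point error...
assert 0.1 + 0.2 != 0.3
# ...but storing numerator/denominator pairs keeps arithmetic exact
total = Fraction(1, 10) + Fraction(2, 10)
assert total == Fraction(3, 10)
assert (total.numerator, total.denominator) == (3, 10)
```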

Strong Sauce
Jul 2, 2003

You know I am not really your father.






Google was using IE with the Bing Toolbar to do those searches. What did they expect when Bing received data saying honeypot word <Y> ended up with a click to website <X>? All of that was just a whole bunch of bullshit Google conjured up when there was a big deal about Bing improving its search results. It couldn't possibly be because Bing was actually, you know, researching new ways to improve its algorithms.

I'm not a very big fan of Microsoft (although I think one of my cousins does work on something Bing Search related) but I felt that this "sting" was very un-Google-like for them to do. It was just a huge stink over nothing.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
The thing is that Bing must have gone to Google to get those results, somehow.

Somehow the Bing Toolbar had special knowledge that a user following a link from Google to a specific website implies a correlation, and the Bing servers knew how to parse the Google URLs. Right?

JawnV6
Jul 4, 2004

So hot ...
Oh, you mean the documented behavior of the Bing toolbar that the Google engineers agreed to when they installed it before doing that test? The exact same thing that Chrome does across millions of installs? Yeah, pretty much.

JawnV6 fucked around with this message at 23:52 on Aug 24, 2012

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

JawnV6 posted:

Oh, you mean the documented behavior of the Bing toolbar that the Google engineers agreed to when they installed it before doing that test? The exact same thing that Chrome does across millions of installs? Yeah, pretty much.

Chrome does not do any user tracking. It's a myth that has been dispelled by multiple Google engineers. If you have a better citation than Google engineers, please let me know.

JawnV6
Jul 4, 2004

So hot ...

Suspicious Dish posted:

Chrome does not do any user tracking. It's a myth that has been dispelled by multiple Google engineers. If you have a better citation than Google engineers, please let me know.

It was Google Toolbar, not Chrome. My bad. The relevant question still stands, and Google engineers were "mystified" when the Bing toolbar sent a clickstream, exactly as it said it would when they agreed to install it.

Strong Sauce
Jul 2, 2003

You know I am not really your father.





If someone like me can figure out how Bing got those results then someone at Google (perhaps say the Google Toolbar team, or you know, the loving Google Fellow in charge of the SRA) would have realized, "Oh if they are tracking what users search for and what they click, and the associated page they might be able to get results from that."

How does a loving "Google Fellow" who is in charge of something as important as Google's Search Ranking Algorithm not understand the way that Bing was getting the data?

If Bing got data for such an obvious non-word, you would think Google could deduce that they're probably tracking what you type and click through from the toolbar. But instead the guy thinks that they somehow detect the keyword and scrape the results from Google, rather than use the resources that Microsoft has put into developing Bing?

I mean, in their loving "test" they search for a keyword, then click on the only loving link associated with that search. Why, the only way Bing could have gotten data like that is through Google!

But the media is so enamored with Google that even after Bing posted a response, no one at Google got so much as a wrist slap for creating this brouhaha, because it benefited Google and the media benefited from the huge page views. It was a completely bogus fabrication just to slow down the slight increase in usage of Bing.

The most important thing to come out of that fiasco is that Google finally admitted that they can technically induce their SRA to include "synthetic results." But don't worry, they like totally scrapped that code once their farce of a sting was over.

Strong Sauce fucked around with this message at 06:46 on Aug 25, 2012

pseudorandom name
May 6, 2007

The problem was that instead of saying "Gee, Google is returning different results than us, maybe we should figure out why that happens" and then doing that, they said "gently caress it, just copy Google."

So while Google devoted the time, money and effort in making their spelling correction system awesome (which is how they noticed this in the first place), Microsoft just copied them and Bing's spelling correction is still terrible.

They then proved this by rigging Google to return garbage results for nonsensical queries, and watched as Bing blindly copied that directly into their index.

If you can't understand why this is embarrassing to the Bing team, you're a moron.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

It doesn't matter how Bing copied, whether from a toolbar or not. The thing is that Bing directly copied Google's results. There was code in the Bing toolbar to notice the search string when Google was in the URL address bar when the user clicked the specific link, and send it back to Microsoft.

They directly copied Google's index, through indirect means. Crowdsourcing (I hate that word, but it's appropriate here I think) their large install base to get around Google's API restrictions.

Strong Sauce
Jul 2, 2003

You know I am not really your father.





Suspicious Dish posted:

It doesn't matter how Bing copied, whether from a toolbar or not. The thing is that Bing directly copied Google's results. There was code in the Bing toolbar to notice the search string when Google was in the URL address bar when the user clicked the specific link, and send it back to Microsoft.

They directly copied Google's index, through indirect means. Crowdsourcing (I hate that word, but it's appropriate here I think) their large install base to get around Google's API restrictions.

Do you have a link to someone saying Bing only does this when the user was at a Google URL in the address bar? No one ever says that in any of the articles mentioned, and Google's blog posts never say this or anything to this effect.

I don't know how Bing Toolbar works in the background when you opt-in to their default services. But my guess of how it works is that every page you visit gets saved to some storage over at Bing, and I'm also assuming that they also store what links you clicked on that page, and then they use this data later on as a signal as part of their overall search algorithm/ranking system.

I am also guessing that one of those signals is related to what words are on a page when the user clicks on a URL because there might be some correlation between that word and that URL.

So obviously when Google makes up a new word and starts typing in the search on Google, Bing Toolbar caches the SERP with this made up word in it, and saves it on disk somewhere for a rainy day.

Now when Google tries to search for that new word on Bing, Bing's signal collector goes through its whole process of trying to determine the best results. Except none of the signals can find a match for this new word, until finally they get a result in their Bing Toolbar signal. There is ONE result! It's the same exact link that Google kept clicking over and over again. So faced with the fact that the only association they can find for this new word is when Google went onto their own website to search for it, and then clicked the URL repeatedly pretending to be users clicking on a link, Bing returns the only result it knows about for this newly made-up word as the best result.

This explains why the spelling-mistake search "torsorophy" ended up showing the correct first result on Bing even though Bing did not correct the spelling. Most likely a lot of random users have Bing Toolbar installed but still use Google as their search engine. They search "torsorophy" on Google, Google corrects it to the correct spelling, "tarsorrhaphy", and displays the SERP for that word. The first result is Wikipedia, and considering most people only click on the first 3 links or so, it's not hard to reason that most users clicked on the Wikipedia article and Bing Toolbar sent this data to its servers. Then when users started searching "torsorophy" on Bing, Bing can't really find any strong correlation to this word until it looks into its Bing Toolbar signal and discovers that when this word is on a page, a lot of users click on a certain link (to Wikipedia). So given that this is the strongest signal for the misspelling of tarsorrhaphy, Bing displays that result first, because it ends up ranking best among all the signals collected for "torsorophy".

Here's how Google could have tested whether Bing was only caching results/signals when it was a Google SERP. Create a new domain that hasn't been indexed by anyone. Create a page with a made-up word and a link to a website that has no association with the domain or with the made-up word (assuming the word is even pronounceable), then do the same exact experiment they did for the SERPs. If Bing is telling the truth about their clickstream signals, then eventually searching for that made-up word on Bing will reveal that URL on the page. And if it never shows up on Bing's website, then Google has much more solid evidence that Bing is just straight up copying the results when users are on a Google page using Bing Toolbar.

Then for even further proof, ask someone with a relatively popular website to do the same thing, so Bing can't use the excuse that it doesn't apply the signal for new domains, because then you can show that there were no results on this really popular website.

To me it seems very un-Google-like of them not to do their due diligence in determining whether or not Bing was actually copying Google SERPs directly. Not having a control by doing the same thing on a non-Google site seems like an obvious oversight for them, and again very un-Google-like. If they seriously believed Bing was copying them, I figure they would have run more tests than the one they did.

Strong Sauce fucked around with this message at 10:07 on Aug 25, 2012

Sidpret
Jun 17, 2004

lmao zebong posted:

I wrote a python script that scraped a google results page and got banned from google constantly - I ended up setting up a sleep command for a random amount of time between 20-35 seconds and after that I never had an issue getting banned again. I see you set it up for Bing now but if you wanted to go back to Google that worked for me.

You win. Bing started kicking me off, I went back to google, implemented exactly this procedure, and it works fine. For some reason my program hangs after about 1,000 searches and won't continue searching or exit with an error. Kinda hard to debug so for now I'm just going to keep restarting it from where it left off. At any rate, thanks for this.

lmao zebong
Nov 25, 2006

NBA All-Injury First Team

Sidpret posted:

You win. Bing started kicking me off, I went back to google, implemented exactly this procedure, and it works fine. For some reason my program hangs after about 1,000 searches and won't continue searching or exit with an error. Kinda hard to debug so for now I'm just going to keep restarting it from where it left off. At any rate, thanks for this.

Yeah, I'm really not too sure how Google checks if you're doing automated queries, but they're pretty good at it. I had a constant 10-second sleep time which I originally hoped would work, but they still figured me out and banned me again. I figure the random sleep time made it look a bit more genuine, or it may just be that I'm waiting long enough that they don't catch it. I only had to run through 100 queries, though; I can't imagine having to wait for your script to run if you're pumping through 1000+.

Sidpret
Jun 17, 2004

When I thought it would be easy to do this I didn't bother to be careful with my queries. I managed to cut the number significantly by writing a few little checks in the program to make sure that it isn't going to query the same thing twice etc. I also decided there were a few things I could live without.

Still, it's going to be about 18k queries. My program stopped hanging so at 30 seconds per query it should be done... in about another 120 hours. It will probably be faster though, I think a lot of the later queries are going to be dupes of the earlier ones, so they won't run.
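The "don't query the same thing twice" check can be as small as a seen-set in front of the query loop. A sketch (`dedupe_queries` is a made-up name, and case-insensitive stripping is only a guess at what should count as a duplicate):

```python
def dedupe_queries(queries):
    """Yield each query once, skipping case-insensitive repeats."""
    seen = set()
    for q in queries:
        key = q.strip().lower()
        if key not in seen:
            seen.add(key)
            yield q
```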

Ulio
Feb 17, 2011


ultrafilter posted:

Again, that's a huge topic. Are you looking for a general introduction, or do you have a specific problem in mind?

Yes an introduction would be good.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Sidpret posted:

When I thought it would be easy to do this I didn't bother to be careful with my queries. I managed to cut the number significantly by writing a few little checks in the program to make sure that it isn't going to query the same thing twice etc. I also decided there were a few things I could live without.

Still, it's going to be about 18k queries. My program stopped hanging so at 30 seconds per query it should be done... in about another 120 hours. It will probably be faster though, I think a lot of the later queries are going to be dupes of the earlier ones, so they won't run.

How timely does it have to be? SEOMoz and ahrefs have a secondary indexing service that shows google results updated weekly I think.

Obdicut
May 15, 2012

"What election?"
Hey all.

I'm trying to teach myself Ruby, using Why's guide, and ActiveState Komodo.

I have little to no previous programming experience-- just mucking around in VB to kludge things inside Word/Excel.

I'm going along fine, except for the 'require' function. I created a hash, saved it as 'filenameX' in the same directory as 'programX', and I used 'require 'filenameX' as the first line of 'programX'. I get a bunch of error messages, and it doesn't work. Do I have to tell it what directory to look in, or what am I doing wrong?

Johnny Cache Hit
Oct 17, 2011

Obdicut posted:

I get a bunch of error messages, and it doesn't work. Do I have to tell it what directory to look in, or what am I doing wrong?

Maybe! It's impossible to say without knowing the actual error message and seeing the code in question.

You could try taking that to the Rails Megathread because they'll probably be very helpful. I don't think CoC has a Ruby-only megathread?

I don't know how in depth _why's guide is but if you are very new to programming you might take to Learn Ruby the Hard Way -- people are super happy with Learn Python the Hard Way so hopefully the port is good too.

Lamont Cranston
Sep 1, 2006

how do i shot foam

Obdicut posted:

Hey all.

I'm trying to teach myself Ruby, using Why's guide, and ActiveState Komodo.

I have little to no previous programming experience-- just mucking around in VB to kludge things inside Word/Excel.

I'm going along fine, except for the 'require' function. I created a hash, saved it as 'filenameX' in the same directory as 'programX', and I used 'require 'filenameX' as the first line of 'programX'. I get a bunch of error messages, and it doesn't work. Do I have to tell it what directory to look in, or what am I doing wrong?

Can't be certain this is your problem, but the current directory is no longer included by default in your load path (in the $LOAD_PATH variable) as of 1.9. You can either simply add it ($: is an alias for $LOAD_PATH):
code:
$: << "."
or use the require_relative method instead.

Obdicut
May 15, 2012

"What election?"

Lamont Cranston posted:

Can't be certain this is your problem, but the current directory is no longer included by default in your load path (in the $LOAD_PATH variable) as of 1.9. You can either simply add it ($: is an alias for $LOAD_PATH):
code:
$: << "."
or use the require_relative method instead.

Yep, require_relative did it. Thank you very much.

The Born Approx.
Oct 30, 2011
Parallel computing nerds: is MPI_ALLREDUCE a blocking communication function? Considering what it does, I assume it must be, but I'm getting some unexpected behavior that is making me second guess myself.

The Born Approx.
Oct 30, 2011

The Born Approx. posted:

Parallel computing nerds: is MPI_ALLREDUCE a blocking communication function? Considering what it does, I assume it must be, but I'm getting some unexpected behavior that is making me second guess myself.

Welp, after a long day and a half wasted debugging this issue, what I've figured out is that if I call this routine more than once in a given subroutine, it segfaults unless I give it the same buffers each time. Wtf? Do I have to call some other routine to flush some sort of MPI parameters or something?

tarepanda
Mar 26, 2011

Living the Dream
Is there a thread I could go to for questions about Haartraining?

Cizzo
Jul 5, 2007

Haters gonna hate.
Okay, so I need some advice.

At my current internship, I have been tasked with looking for a way to speed up this horrible process of reporting in Excel that they do which involves going to various internal websites and just filling in the numbers from the website to this spreadsheet.

I basically have had no Excel experience before this. So my question is, what would be the best way to go about this?

I can export all the data from these websites into CSV or XLS but I can't really figure out what method of approach would be best. I unfortunately don't know VBA but would be happy if someone could give a good reference site for it. My Google Fu for VBA resources always resulted in websites using Excel from 10 years ago and really nonsensical context.

Zhentar
Sep 28, 2003

Brilliant Master Genius

Cizzo posted:

I unfortunately don't know VBA but would be happy if someone could give a good reference site for it.

http://msdn.microsoft.com/en-us/library/office/gg278919

stubblyhead
Sep 13, 2007

That is treason, Johnny!

Fun Shoe

Cizzo posted:

Okay, so I need some advice.

At my current internship, I have been tasked with looking for a way to speed up this horrible process of reporting in Excel that they do which involves going to various internal websites and just filling in the numbers from the website to this spreadsheet.

I basically have had no Excel experience before this. So my question is, what would be the best way to go about this?

I can export all the data from these websites into CSV or XLS but I can't really figure out what method of approach would be best. I unfortunately don't know VBA but would be happy if someone could give a good reference site for it. My Google Fu for VBA resources always resulted in websites using Excel from 10 years ago and really nonsensical context.

Wrox's Excel 2003 VBA book was good, can't vouch for anything more current though. MSDN plus the online help should get you in the right direction though.

Sab669
Sep 24, 2009

So I'm working on a project for people to carpool together. But getting half way through the beginning stages, I realized I have no idea how to 'compare' zip codes. So say you have this guy searching for rides in zip code 01234, how would I know that zip code 03456 might be the adjacent state? I have no idea what the proper terminology or anything would be called to try and google it.

baquerd
Jul 2, 2007

by FactsAreUseless

Sab669 posted:

So I'm working on a project for people to carpool together. But getting half way through the beginning stages, I realized I have no idea how to 'compare' zip codes. So say you have this guy searching for rides in zip code 01234, how would I know that zip code 03456 might be the adjacent state? I have no idea what the proper terminology or anything would be called to try and google it.

You want to make an undirected graph with edge weights.

shrughes
Oct 11, 2008

(call/cc call/cc)

Sab669 posted:

So I'm working on a project for people to carpool together. But getting half way through the beginning stages, I realized I have no idea how to 'compare' zip codes. So say you have this guy searching for rides in zip code 01234, how would I know that zip code 03456 might be the adjacent state? I have no idea what the proper terminology or anything would be called to try and google it.

Ignore baquerd's reply, it's wrong.

Presumably you want a zipcode dictionary, a lookup table giving you "the" latitude and longitude for each zipcode (and perhaps states, too). Then use a 2D geographic index so that people can search for fellow carpoolers near themselves. If that's not available in your database engine, you can use a space-filling curve, probably a Hilbert curve, to accomplish the same result with a linear index and some careful geometry. Once you've gathered nearby results, you can use the haversine formula to find the distance between two latitude/longitude pairs. (You probably need to do some algebra or trig with the formula if you want to know the precise N-mile rectangle around a given coordinate, but that might be overkill unless you really are struggling on database performance there.) Of course, zipcodes don't really have an individual lat/long to them, and some zipcodes can be quite large. If you got actual addresses you could probably use some maps API to get more precise information, but I've never looked at that possibility.

Edit: Of course you're not really looking for rides from a given area, you're looking for rides that pass through a given area. So what you really want is the rider with the route that goes closest (in terms of driving time) to a given address.

shrughes fucked around with this message at 22:26 on Aug 29, 2012
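The haversine step mentioned above is short enough to sketch. This assumes a spherical Earth (radius in miles here) and plain latitude/longitude in degrees:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, spherical approximation

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points (degrees)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))
```

Fine for "is this zipcode centroid within N miles" filtering; for anything needing survey-grade accuracy you'd want an ellipsoidal model instead.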

Sab669
Sep 24, 2009

Well this sounds like a lot more than I can chew, haha. I think I found some open source framework to help :v:

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

This might help:
http://stackoverflow.com/questions/2410529/php-mysql-zip-code-proximity-search

Depending on your needs you might be able to offload some of it to Google maps.

Sab669
Sep 24, 2009

Google Maps was the original plan, but my teammate who was supposed to be researching it has been pretty terrible and we've pretty much had to scrap that and do it ourselves.

Deus Rex
Mar 5, 2005

edit: never mind, but look into PostGIS if you're using Postgres (if you're using mysql and don't need to, don't)

baquerd
Jul 2, 2007

by FactsAreUseless

shrughes posted:

Ignore baquerd's reply, it's wrong.

It may be suboptimal, and it will have some issues with weirdly shaped or large zip codes, but it's not wrong. Populate your graph once, which will be O(n^2), discarding edge weights over a certain size. Create a dictionary to the graph nodes, then when a request comes in, return all the neighbors. If this is an academic project it should be a good enough model to use in a proof of concept.
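A sketch of that approach: one O(n^2) pass over centroid pairs, keeping only edges under a cutoff. The centroids here are placeholder coordinates; real zip-code centroids would come from a lookup table, and for lat/lon you'd want a great-circle distance rather than the plain Euclidean one used below:

```python
from itertools import combinations
from math import hypot

def build_neighbor_graph(centroids, max_distance):
    """Connect zip codes whose centroids lie within max_distance of each
    other, discarding longer edges (the O(n^2) population step)."""
    graph = {zipcode: set() for zipcode in centroids}
    for a, b in combinations(centroids, 2):
        (x1, y1), (x2, y2) = centroids[a], centroids[b]
        if hypot(x2 - x1, y2 - y1) <= max_distance:
            graph[a].add(b)
            graph[b].add(a)
    return graph
```

Answering a request is then just a dictionary lookup returning the precomputed neighbor set.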

Sidpret
Jun 17, 2004

Could you use data from openstreetmap? http://planet.openstreetmap.org

Doh004
Apr 22, 2007

Mmmmm Donuts...
Does anyone know if there's a service out there that would allow me to programmatically accept online payments and have them processed into individual accounts?

I run a service that allows people to keep track of their finances. I would like to incorporate the option for people to send a "bill" to their individual members. This bill would just be for a determined amount (done on my side) and have it directed to be deposited into the user's organization's bank account - as in I do not want any of this money to ever be in my hands.

I've already looked into WePay, Stripe, and Google Checkout/Wallet. The thing they're all lacking is the ability to have money deposited into different bank accounts. The only option is to have users create their own accounts on the WePay/Stripe website and set up their bank accounts manually, and this is something I'd like to avoid. Instead, I would want one WePay/Stripe account and just route the money directly into a bank account that the user had provided on their account on my side.

Any ideas? I know this isn't necessarily programming related, but Stripe says it's "Payments for Developers" and this is in essence what I'm looking for.


Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

I think what you're looking at is two different problems. The first is collecting payments; because of the security required by payment processors, they are not going to want to be switching this account around at the drop of a hat. They're going to want that account to be rock solid and in good standing. As far as I know there is no legal way to set up a payment firehose that comes straight from the consumer and ends up straight in the pocket of a quasi-random person except for:
- Bitcoins (lol)
- PayPal (assuming everyone has paypal, q.v. WePay, Stripe, Google Wallet, etc.)

The problem with what you want is that it's basically a banking interconnect, which is usually reserved for the financial networks themselves. The only way I've seen to do this in the past is to consolidate collection into one account (e.g. PayPal) and then transfer into others, either using the native service (e.g. PayPal) or getting into the banking network, which usually doesn't happen unless you ARE a bank/services provider, or are doing enough money with one of them that they'll do it for you.
