  • Locked thread
Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.





:what: welp, at least there's a workaround, but am I doing something fantastically stupid here? I mean, even stupider than my normal level.

To be clear, you're seeing sorted fail to finish sorting when I hand it the output of Series.clip.

Munkeymon

SurgicalOntologist posted:

Isn't there a sorted method of Series? If so it's probably more reliable. In any case you should report this on the pandas github.

The only one I can find is for an Index, but I only started using Pandas on Tuesday, so my knowledge and understanding of its capabilities is very much lacking :)

Munkeymon

vikingstrike posted:

Here are the docs for a Series sort: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort.html?highlight=sort#pandas.Series.sort
and the DataFrame sort: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort.html?highlight=sort#pandas.DataFrame.sort

If your DataFrame is df, then just use df.sort(['var1', 'var2']). For Series, it should just be s.sort().

I know this is from forever ago, but these (and .order) are even more hilariously broken.

unrelated, but has GitHub never had the ability to just search bugs or have I just never reported a bug on a project with hundreds of bugs?

Munkeymon

ohgodwhat posted:

I've never had an issue with pd's sorts, which isn't to say it can't happen, I'm just curious what you're doing that you're seeing a problem.

I didn't think there were NaNs in the data but there were, basically. Being new to Pandas, I thought the fancy slicing notation like s[(s > 0) & (s < whatever)] was the same as .clip(0, whatever). Turns out the first filters out NaNs and the second doesn't. I did accidentally find an inconsistency between the behavior of .sort and .order, so hooray for stupid newbies, I guess?
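
A rough sketch of that difference, in plain Python rather than pandas so it runs anywhere. The clip function here is a hypothetical stand-in that only imitates how Series.clip leaves NaN in place; the data values are made up. Since every comparison against NaN is False, a leftover NaN is also the kind of thing that can confuse sorted().

```python
import math

nan = float("nan")
data = [3.0, nan, -1.0, 12.0, 2.0]  # made-up values


def clip(x, lo, hi):
    # Hypothetical stand-in for Series.clip: clamps real values, passes NaN through.
    return x if math.isnan(x) else max(lo, min(x, hi))


# Boolean-mask style filter: any comparison with NaN is False, so NaN is dropped.
masked = [x for x in data if (x > 0) and (x < 10)]

# Clip style: every element is kept, including the NaN.
clipped = [clip(x, 0.0, 10.0) for x in data]

print(masked)
print(sum(math.isnan(x) for x in clipped))
```

The mask shrinks the data and drops the NaN; the clip keeps the length and keeps the NaN, which is why the two are not interchangeable.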

Munkeymon

the posted:

Fullstack backend Python developer at a small startup of less than a dozen people.

Here's the one I did for an interview I had the other week:

quote:

Using the CDC Birth Vital Statistics data set and Python, design and develop one or more insightful views of the data, data facets, or subsets of data. What did you learn and can demonstrate from the data that surprised you? Bonus points for using AppEngine.

Because they're an AppEngine shop. I was surprised how quick and easy it was to get my idiot baby "compute r of any two things and group by a third" idea working on AppEngine having never used it before and I'm not even a full time Python dev right now.

What I'm getting at is think of something harder or just steal that idea :)

Munkeymon

The only time I ever used Smith-Waterman my sequences were too short to get any benefit out of using numpy over a mostly naive implementation :\

Looking at it again, I could probably abuse a multi-level generator to speed it up some :unsmigghh:

Munkeymon

Haystack posted:

I'd also add that Python's official documentation is excellent, and good for getting oriented by.

Beyond that, it depends on what you want to do. In my estimation, Python's strong points are web programming, scientific scripting, systems administration, general scripting, and rapid prototyping.

Just chiming in to say that's how I learned: I read through the docs and fiddled with examples in a REPL. I already knew several other not-super-dissimilar languages (JS, PHP) at the time, though.

Munkeymon

Thermopyle posted:

Just to be clear you're using a dict there, and a dict is the python equivalent of an associative array. Continue as you were. You're doing the right thing.

PHP arrays are actually more like OrderedDicts, but that distinction wouldn't matter much most of the time.

Munkeymon

Thermopyle posted:

I don't understand this. What's bumpy about writing methods?

edit: oops, there's another page. I still don't get where you're coming from.

It's more annoying to write self.property than simply property as you would in most languages with the concept of a class method, for one.

Munkeymon

tef posted:

(Notably, most languages doesn't include JavaScript, Ruby, Lua. Ruby has uninheritable singleton class methods, and uh, javascript is prototypical)

So we're talking C# or Java here. Python actually has class methods too.

Adding C++, VB and Objective-C* you get half of the top 10 most used languages and the set that a majority of people are going to be used to. So, yeah, Python is going to have some explaining to do because the way it handles scope resolution in methods just is different from the "norm" as most people see it. Note that I'm not saying it's bad or complaining! Just pointing out that the average experienced programmer is going to see explicit self, get annoyed and assume it's some weird language wart because their home language doesn't make them do that.

"concept of methods that belong to instances of classes (or 'objects in memory' if you prefer) as opposed to freestanding functions" since "concept of a class method" wasn't sufficient.

*sort of: self is required to run setters/getters but not to access underlying members (for anyone like me who hasn't used it and unlike me doesn't feel like reading through the docs)
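
For anyone coming from one of those languages, a minimal sketch of what explicit self looks like in practice. The Counter class and its names are made up for illustration.

```python
class Counter:
    instances = 0  # class attribute, shared by every instance

    def __init__(self, start=0):
        self.value = start       # instance state always goes through self
        Counter.instances += 1

    def bump(self):
        # A bare `value` here would not resolve to the attribute;
        # Python only finds it via self.value.
        self.value += 1
        return self.value

    @classmethod
    def created(cls):
        # Class methods receive the class itself instead of an instance.
        return cls.instances
```

The instance is just the first parameter of every method, so the lookup that C# or Java does implicitly is spelled out here.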

Munkeymon

Jose Cuervo posted:

Is this more along the lines of what you were saying? I am not sure I have used a Pool correctly here - multiple processes get spawned (when I look in Task Manager, but they each use <10% CPU), and the code does not terminate even after 7 minutes of running, where the unparallelized code is done after 3 minutes.

I was able to get a speedup with joblib by breaking the list of (bldg_id, bldg_coord) tuples into 4 equal sized lists, and then using joblib to run four processes in parallel (i.e. run the four for loops in parallel). When I run the code this way four processes do get spawned and they use 100% CPU (25% each). It just seems like a hacky way to get things done, but I suppose it is better than nothing.

You can also set chunksize to the length of the list divided by the pool_size. Someone might be able to say for sure, but I think multiprocessing is using such a small default chunksize to move data between processes that it's actually spending a ton of (wall-clock) time waiting for the OS to move tiny chunks of data around, and the solution I found was to basically force it to move the whole dang list (or as close to that as the OS will allow) to the process at once.

I wouldn't call it "hacky", but I'm biased :)
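
A sketch of the chunksize idea, not a tuned solution. The work function is a hypothetical stand-in for the per-building computation, and chunksize_for and run_parallel are made-up helper names; Pool.map and its chunksize argument are the real multiprocessing API.

```python
import math
from multiprocessing import Pool


def chunksize_for(n_items, pool_size):
    # One chunk per worker, rounded up so nothing is left over.
    return max(1, math.ceil(n_items / pool_size))


def work(item):
    # Hypothetical stand-in for the real per-item computation.
    return item * item


def run_parallel(items, pool_size=4):
    # Hands each worker one large chunk instead of many small ones.
    with Pool(pool_size) as pool:
        return pool.map(work, items, chunksize=chunksize_for(len(items), pool_size))
```

Call run_parallel from under an `if __name__ == "__main__":` guard, which matters on Windows in particular because worker processes re-import the main module.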

Munkeymon

Karthe posted:

JWT plugin

Hmm, I've read that phrase today... where was - oh https://auth0.com/blog/2015/03/31/critical-vulnerabilities-in-json-web-token-libraries/

Careful!

Munkeymon

BeautifulSoup uses lxml (and other parsers) under the hood so the only reason not to use it is if you really hate its API and know that your other favorite parser is perfectly capable of handling whatever you happen to throw at it.

Munkeymon

onionradish posted:

True. I didn't mean I used LXML "rather than" BeautifulSoup to be :smuggo:. That was intended to be clarification on why I didn't use BeautifulSoup in the code snippet. I learned on and have done all my parsing with LXML so I know its syntax and just haven't had a reason to switch.

That's fair, but I feel like I see a lot of advice to avoid BS because lxml is better at parsing, which grates on me.

Munkeymon

Thermopyle posted:

Didn't BS used to not use lxml? I feel like that was a thing and the reason people started recommending lxml instead.

Maybe at some point but it has for years and was using lxml when I first saw that advice pop up :shrug:


onionradish posted:

From a quick scan of BS docs, the syntax looks pretty similar. Not sure how to do a "first-child" selector on games, though. We only want the first TD element in those rows.

It's an iterator, so just call next() once.

Munkeymon

I think it supports :first-of-type but you're still getting an iterator back, so...

E: you could .find('td')

Munkeymon fucked around with this message at 20:55 on Apr 7, 2015

Munkeymon

They overcomplicated
Python code:
[s.strip() for s in BS(html).find(id='main').strings if s.strip()]  # s, not str, so the builtin isn't shadowed
in their example :colbert:

I'm sure that can be done without calling strip twice, but I have a sinus headache and don't feel like giving myself a nested generator headache, too.
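
One way to strip only once, sketched on a plain list of strings so it runs without Beautiful Soup installed; clean is a made-up helper name. Beautiful Soup 4 also documents a .stripped_strings generator that may make this unnecessary.

```python
def clean(strings):
    # The inner generator strips each string once;
    # the outer comprehension drops whatever ended up empty.
    return [s for s in (raw.strip() for raw in strings) if s]


print(clean(["  a ", "\n", "b", "   "]))
```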

Munkeymon

onionradish posted:

gently caress, I can't tell whether responses are real or just cruel obfuscation and code golf anymore....

I'm going back to LXML.

E: Dominoes' joke is the only one I got...

I was being serious. The example on the Soupy page is what you get if you don't read the BS docs. It's like a bitcoiner's example of a credit card transaction that includes steps like 'fish wallet out of jorts' and 'remove card from wallet (ugh!)'. Obviously you don't have to use a generator but you also don't have to do convoluted poo poo with isinstance to get text out of BS like they're pretending you do.

Munkeymon

hooah posted:

I'm trying to evaluate a program I wrote by comparing predicted lines against ground-truth lines. How can I pull a line from each file? I tried doing for ground_truth_line, pred_line in ground_truth, predicted: where ground_truth and predicted are the respective files, but I got an error "too many values to unpack".

I think you want
Python code:
from itertools import izip

for ground_truth_line, pred_line in izip(ground_truth, predicted):
   #etc

Munkeymon

hooah posted:

"Cannot import name 'izip'" :( I'm using PyCharm with Python 3.4.1.

My bad - I forgot I was living in the future now


KICK BAMA KICK posted:

In Python 3 just use the built-in zip. As I recall the difference is that Python 2's zip has the limitation that it needs actual sequences like lists or tuples rather than iterators, which izip was invented to handle. In Python 3 they just made zip capable of handling iterators and got rid of izip.

zip in 2.x handles iterators just fine but it consumes them in their entirety immediately in order to return a list as opposed to izip which returns an iterator that consumes the input iterators as needed, so zip in 2.x doesn't actually do what hooah was trying to do which is iterate over the lines in the files until there was a difference.
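
In Python 3 the built-in zip is lazy in the same way izip was, so the original goal can be sketched like this. io.StringIO objects stand in for the two open files, and first_difference is a made-up name.

```python
import io


def first_difference(ground_truth, predicted):
    # zip() in Python 3 pulls one line from each file per step,
    # so nothing past the first mismatch is read.
    pairs = zip(ground_truth, predicted)
    for line_no, (truth_line, pred_line) in enumerate(pairs, start=1):
        if truth_line != pred_line:
            return line_no
    return None


truth = io.StringIO("a\nb\nc\n")
pred = io.StringIO("a\nx\nc\n")
print(first_difference(truth, pred))
```

Note that zip stops at the shorter input, so files of different lengths with a matching prefix would come back as None here.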

Munkeymon

Use the CSV module and maybe fileinput if you want to make batching the files a bit easier but you've already written that part so whatevs.

Python code:
import csv

with open(studyfile_name, 'rb') as study_f:
    study = csv.DictReader(study_f, dialect=csv.excel_tab)
    filtered_study = [line for line in study if line['Accuracy-T'] not in ['2', '3'] and int(line['RT-P']) >= 100]
(untested)

Munkeymon

Poizen Jam posted:

Thanks for the input- I'll try some of the solutions posted. I originally adapted the code I posted from a script that would cross reference two different files; ie. It would check Phase 1 for incorrect answers and remove from analysis those same items in Phase 2. That script works flawlessly, but as I discovered iterating over a list currently being modified seems to be problematic for Python. Would I be correct in assuming it has something to do with skipping lines when a previous line has been deleted? If so couldn't you insert a command to continuously restart the line iterations from the beginning until it reached the end without encountering a 2 or a 3?

Just don't modify a list while iterating over it. It's A Bad Idea in any language I can think of. Make a new list with the items you want or write them out to the result file immediately when you find them.
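
A small demonstration of the skipping that was asked about, using made-up answer codes. Removing an item shifts everything after it one slot to the left, so the loop steps over the next item.

```python
answers = ["2", "2", "3", "1"]  # made-up data

# Buggy: removing while iterating shifts later items left, so one gets skipped.
buggy = list(answers)
for a in buggy:
    if a in ("2", "3"):
        buggy.remove(a)

# Safer: build a new list holding only the items to keep.
kept = [a for a in answers if a not in ("2", "3")]

print(buggy)
print(kept)
```

The buggy loop leaves a "2" behind, while the new-list version gets it right in one pass with no restarts.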

Munkeymon

Most CSV libraries either know how to or can be trivially configured to parse tab delimited files since it's (mostly) just a matter of looking for '\t' instead of ','
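
With Python's csv module that configuration is one keyword argument. A Python 3 sketch on in-memory data: the Accuracy-T and RT-P column names come from the earlier post, while the Item column and the rows are made up.

```python
import csv
import io

# Made-up tab-delimited data standing in for a study file.
raw = "Item\tAccuracy-T\tRT-P\nA\t1\t250\nB\t2\t300\nC\t1\t90\n"

reader = csv.DictReader(io.StringIO(raw), delimiter="\t")
rows = [
    row for row in reader
    if row["Accuracy-T"] not in ("2", "3") and int(row["RT-P"]) >= 100
]
print([row["Item"] for row in rows])
```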

Munkeymon

ch3cooh posted:

I'm an engineer with an oil and gas company that up until 6 weeks ago knew nothing about coding. That's when my boss assigned me the task of integrating of our daily operations reporting system (basically just gui slapped on an Oracle DB) into Spotfire. I managed to teach myself enough SQL to pull in the data I want (though I won't say it's particularly efficient). Now I am running into things that apparently have to be done through IronPython. So my questions are:

A) how different is IronPython from Python?

B) can anyone recommend some accessible IronPython/Python references? (By this I mean things that won't assume I'm already conversant in coding and just need to Python tips but that I need to start from zero)

A) Not very. There are a few things it does differently because it's evaluated by the DLR instead of the CPython interpreter but you're unlikely to run into them, for the most part*. The main problems you're going to run into are that IronPython doesn't have the complete Python standard library and can't use anything that depends on a C extension. You can solve the first problem by simply copying any pure-Python library straight into IronPython's Lib directory or just including it in your project like a normal library.

B) https://docs.python.org/2/ will work just fine with the caveat that not everything they document might be available out of the box as I explained earlier.

* some libraries might depend on CPython behavior rather than just C extensions and not work when run under IronPython - like Werkzeug (last I checked), so Flask and Bottle are out, unfortunately, assuming nobody's patched them since I last tried a few years ago.

Munkeymon

Cloud9 just reminded me they exist the other day and I was pleasantly surprised at how slick it all was for an IDE running in a browser.

Munkeymon

Dominoes posted:

Database question. Using Django, but it's more of a general ORM question. I'm looking to store data that can have one of a few set choices. It seems like a common way of doing this is by using integers, since they take up less space compared to writing the data out for each row. Is there a clean way to map these integers to their names cleanly in an ORM like Django's or SQLAlchemy?

Do databases intelligently stores charfields to avoid repetition?

It seems like this article provides a solution, although its focus is more about ways to restrict choices than data storage; it uses integer for choices as its example. Also: It has a simple solution using Django's choices kwarg, and comments about Python not having enum. Maybe that, with 3.4's enum would be an ideal solution.

You should probably have a table that stores the choices and use foreign keys in whatever table they're related to to reference them. That way you can keep your data (available choices) out of your logic layer and adding or removing one becomes much simpler.
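
A minimal sketch of that layout using the sqlite3 module from the standard library instead of Django's ORM. The status and ticket tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE ticket (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        status_id INTEGER NOT NULL REFERENCES status(id)
    );
    INSERT INTO status (id, name) VALUES (1, 'open'), (2, 'closed');
    INSERT INTO ticket (title, status_id) VALUES ('fix sort', 1), ('update docs', 2);
""")

# The join maps the stored integers back to their names.
rows = conn.execute(
    "SELECT ticket.title, status.name FROM ticket "
    "JOIN status ON status.id = ticket.status_id ORDER BY ticket.id"
).fetchall()
print(rows)
```

Adding a choice is then an INSERT into status rather than a code change.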

Munkeymon

Ghost of Reagan Past posted:

Suppose I wanted to run a function for a specified time period (it gathers data from the Internet and prints it to a file). I can do
Python code:
t = time.time() + 86400
while t > time.time():
    # DO THINGS
to run it for a day, but there has to be a better way to do this.

Python code:
import datetime
import time

dayFromNow = datetime.datetime.now() + datetime.timedelta(days=1)
secondsBetweenRuns = 1 #use a float to sleep for less than a second

while datetime.datetime.now() < dayFromNow:
    #do stuff here
    time.sleep(secondsBetweenRuns) #I'm assuming you don't want it to run constantly
If you want it to run less often than once a minute or so, just set up a cron job instead of rolling your own.

Munkeymon

sharktamer posted:

Is there a favourite user agent parser? All the ones I've googled for seem to be slightly out of date, they don't seem to detect windows 10 and microsoft edge for example.

http://www.useragentstring.com/pages/api.php

Munkeymon

Rime posted:

So this launches a window which displays the correct @ character, but the window then immediately freezes up and must be terminated. Any idea why? Windows 10, Python 2.7.

code:
#snip

You probably need to call libtcod.console_wait_for_keypress(True) at the end of the loop. Right now it looks like it's just drawing an @ over and over again as quickly as it can.

Look at https://kooneiform.wordpress.com/2009/03/29/241/ for an example

Munkeymon

SurgicalOntologist posted:

...and it might be marginally faster using generators, with less indexing.

Python code:
horses.sort()
smallest_diff = min(second - first for first, second in zip(horses, horses[1:]))
and izip if it's python 2, I think.

Wouldn't you want to use islice instead of regular slicing for a throwaway list or is the interpreter somehow smart enough to not make a new list object in that case?

(yes I understand the underlying values aren't copied, but the list object would be and I don't see any reason to copy the object when it's just going to be iterated once and thrown away)
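
For reference, the islice version would look something like this, with made-up values for horses. As far as I know CPython does build a new list for horses[1:], but whether skipping that copy is measurably faster is untested here.

```python
from itertools import islice

horses = [10, 3, 8, 15, 5]  # made-up values
horses.sort()

# islice skips the first element lazily instead of building horses[1:].
smallest_diff = min(b - a for a, b in zip(horses, islice(horses, 1, None)))
print(smallest_diff)
```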

Munkeymon

Fergus Mac Roich posted:

I don't know how codingame works but my takeaway would be that it's a lot easier to track this stuff down if you can code it so that you can use a debugger(mocking the external data) :v:

The only feedback you get from their UI is the output of an assert like 'Error: expected "a" but found "b"' but you can do the equivalent of good old printf debugging, though they'll truncate output :rolleyes:

It's really frustrating when you pass all the sample test cases but fail submission cases, which are intentionally different to prevent you from cheating.

Munkeymon

Functions work like stored procedures in SQL*. You pass in arguments (or not) and they do something that's hopefully related to their name and return a value (or not).

*I think you said in the general thread that you had some SQL experience, if not ignore that sentence I guess

Munkeymon

Ekster posted:

I'm new to Python and I just finished writing my first assembler for an exercise, complete with symbol lookup table. I've never done (serious) text processing before so I feel like I've learned a lot. Here's a simplified version of what I did to get the necessary tokens:

code:
for line in file:
	tokenList = re.split(expression, line)
	tokenList = list(filter(None, tokenList))
	token = tokenList[position]


Everything works fine but I was wondering if there's perhaps a way to skip the filter step? Using re.split() generates a lot of empty entries at different locations depending on the line I feed it.

What's the expression look like? You can probably add capture groups to make getting just the parts you want as easy as tokenList.group('name') as seen in https://docs.python.org/3.4/library/re.html#re.match.group
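
Without seeing the real expression, here is a guess at what the named-group approach could look like. The assembly syntax (an optional "label:", a mnemonic, an optional operand) is entirely hypothetical, as are the group names and the parse helper.

```python
import re

# Hypothetical syntax: optional "label:", a mnemonic, an optional operand.
LINE_RE = re.compile(
    r"\s*(?:(?P<label>\w+):)?\s*(?P<mnemonic>[A-Za-z]+)\s*(?P<operand>\S+)?"
)


def parse(line):
    match = LINE_RE.match(line)
    if match is None:
        return None  # blank or unparseable line
    # Groups that did not take part in the match come back as None.
    return match.group("label"), match.group("mnemonic"), match.group("operand")


print(parse("loop: ADD R1"))
```

Because each token is pulled out by name, there are no empty entries to filter afterwards.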

Munkeymon

QuarkJets posted:

Likewise, why should the people reviewing the results give a poo poo about the underlying code? When I'm reading a research paper I almost never want to see the underlying code. If I want to reproduce the results of a specific project then I should be writing my own code to do that in order to prevent biases from creeping in, so looking at someone else's notebook is the last thing that I should ever do.

They might want to throw their own data at your code?

Munkeymon

pmchem posted:

...why can't they do that with a regular Python script? There's nothing magical about ipynb that allows people to provide different inputs and regenerate output that can't be done with regular scripts/modules/IDEs.

QJ was arguing against sharing code at all, not about how it's shared and that bugs me purely from a consumer of scientific output standpoint because it's a great way to accidentally hide errors that affect results.

Munkeymon

flosofl posted:

But that's *not* what he's saying? Unless I completely missed something elsewhere, he even mentioned having the code available for review to make sure the methodologies are appropriate and correct. He's just saying he doesn't see the point of a Notebook at all and it's sloppier than using a formally coded program.

Yeah, maybe I was reading that paragraph in isolation rather than proper context, but that's what it seemed like at the time :shrug:

Munkeymon

SurgicalOntologist posted:

I used Chrome dev tools to find the POST that generates the data, but it includes a whole lot of json. I tried copy-pasting it from Chrome and keep getting JavaScript errors. I still need to try copy-pasting the headers as well. But I'm wondering if it would be possible to use dryscrape or a similar tool to get the POST payload that gets sent when I click the "download" button. Then I send that payload through requests.

In the Network tab of the Chrome dev tools, you can copy the request as a cURL command which will include the headers. You can also just copy the headers to see what's being sent and incorporate that into a urllib request.
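
A sketch of the urllib side using Python 3's urllib.request. The URL, headers, and payload are all placeholders; substitute whatever the dev tools actually show. The request is built but not sent.

```python
import json
import urllib.request

# Placeholder URL, headers, and payload; copy the real ones from dev tools.
url = "https://example.com/api/download"
headers = {"Content-Type": "application/json", "X-Requested-With": "XMLHttpRequest"}
payload = {"report": "daily", "format": "csv"}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),  # the request body must be bytes
    headers=headers,
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
print(req.get_method(), req.get_header("Content-type"))
```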

Munkeymon

dreesemonkey posted:

Tomorrow I'm going to try and get it to write the results to a database via ODBC maybe and see if I can setup the multiple products in an array to run it looped in one script.

Consider using https://docs.python.org/3.5/library/sqlite3.html It's a great starter database for small to medium sized applications and it comes with Python.
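
Roughly what writing looped results into sqlite3 could look like. The price table, its columns, and the rows are made up since the actual script's output isn't shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # pass a filename instead to keep the data
conn.execute("CREATE TABLE price (product TEXT, checked_on TEXT, amount REAL)")

# Made-up results, one tuple per product check.
results = [
    ("widget", "2015-01-01", 9.99),
    ("widget", "2015-01-02", 8.49),
    ("gadget", "2015-01-01", 20.0),
]
conn.executemany("INSERT INTO price VALUES (?, ?, ?)", results)
conn.commit()

lowest = conn.execute(
    "SELECT product, MIN(amount) FROM price GROUP BY product ORDER BY product"
).fetchall()
print(lowest)
```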

Munkeymon

dreesemonkey posted:

I have an existing database platform that is my day job and it would make it super easy for me to manipulate the data/create reports/charts/etc. But I'll keep it in mind.

You're probably good, then. I thought maybe you were talking about using ODBC with Access and figured I'd point out a good alternative.

Munkeymon

Melian Dialogue posted:

When I type something that has a paranthesis, frequently it autofills in the closing paranthesis. However, when Im typing once I get to that paranthesis, I can't just hit space or enter to move past it, I have to reach across my keyboard and use the arrow buttons. Is there some sort of shortcut (like alt or tab or something) where it lets me skip the word?

Sometimes editors that do that will skip the auto-generated right-paren with... right-paren. What's the point? Dunno! That's why I usually turn that useless bullshit off.
