|
What are people's recommendations for modules to scrape information from and then interact with a website? I know Beautiful Soup is very popular, but there are a lot of others that get mentioned as well. The application is a bot that first scrapes tabulated info, then analyses and implements the best moves to make in an online management game.
|
# ? Oct 16, 2015 21:20 |
|
|
# ? May 8, 2024 07:33 |
I've been working on a simple script using Selenium. Selenium can parse HTML and interact with webpage elements, though if you're using a webpage from 2003 you might run into issues with frames. I should probably use Beautiful Soup more in my script, but I'm using Selenium to log in to a page with user-provided credentials, navigate to another page, identify the first <tr> that matches a specific color, and select the adjacent multiple-select box. From there it reads from specific files based on the content of one of the <td> elements and interacts with drop-down menus to select the value from the file. Then it cycles until a specific message appears on the page. Or at least it will when I get around to finishing the script instead of doing the task manually. I'm a novice, but you can look for elements by ID, XPath, class, and so on. It's pretty handy.
|
|
# ? Oct 16, 2015 21:32 |
|
QuarkJets posted:Joining on an empty string is kind of pointless, really, because strings are immutable: in Python you can never put more or less stuff into a string, you can only create a new string. Do you mean code:
|
# ? Oct 17, 2015 03:25 |
|
I usually use a combination of requests and Beautiful Soup to download and scrape information from websites. Not sure if there is anything newer that has supplanted this. For working with tabular data I use pandas.
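A minimal sketch of the Beautiful Soup half of that workflow, assuming the page has already been downloaded (for example with requests.get(url).text, which is not called here so the sketch runs offline). The table, its id, and the column names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page content standing in for a downloaded response body.
HTML = """
<table id="squad">
  <tr><th>Player</th><th>Skill</th><th>Wage</th></tr>
  <tr><td>Alice</td><td>82</td><td>1200</td></tr>
  <tr><td>Bob</td><td>75</td><td>900</td></tr>
</table>
"""


def parse_table(html, table_id):
    """Return the table's body rows as a list of dicts keyed by header text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    rows = table.find_all("tr")
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all("th")]
    return [
        dict(zip(headers, (cell.get_text(strip=True) for cell in row.find_all("td"))))
        for row in rows[1:]
    ]


records = parse_table(HTML, "squad")
# Cell values come back as strings, so convert before comparing numerically.
best = max(records, key=lambda r: int(r["Skill"]))
```

A list of dicts like this can be passed straight to pandas.DataFrame(records) if the analysis step is easier there.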
|
# ? Oct 17, 2015 03:26 |
|
fritz posted:Do you mean I meant: code:
|
# ? Oct 17, 2015 04:38 |
|
Not actually a Python question, more of a reStructuredText question, but: is there some facility in ReST to optionally show / hide content? Wider context: I'm writing a Xmas pub quiz and naturally thought that it would be easy to do it all in ReST, including answers, and then produce printouts with and without the answers. Seems an obvious use case, and the various slide programs do something similar with notes, but I can't find any prior art.
|
# ? Oct 19, 2015 13:50 |
|
Zero Gravitas posted:What are people's recommendations for modules to scrape information from and then interact with a website? I know beautifulsoup is very popular but there's a lot of others that get mentioned as well. BSoup bogs down on giant tables (like 20M+) but otherwise works great.
|
# ? Oct 19, 2015 16:48 |
|
I'm trying to explore and understand pandas and numpy. I have a data set that looks like this: code:
|
# ? Oct 20, 2015 02:39 |
|
Hughmoris posted:I'm trying to explore and understand pandas and numpy. I have a data set that looks like this: Something like slicing the dataframe with df[df['UNIT']==unit].mean(), as long as mean() works with times like that. You can also use .query(), but I don't know how to use it because it causes some exception in the underlying parsing library... I should try to resolve that problem.
|
# ? Oct 20, 2015 02:47 |
|
In this post I am going to complain about a Python module that fails at doing the one thing it's supposed to do: Advanced Python Scheduler.
- Instead of storing the schedule (and other config) in a human-readable format, it serializes everything to bytes, so you cannot read it with your own eyes or use things like SQL "update" to change the schedule once it has been saved.
- The only way of changing the schedule of a job is to use the reschedule_job() method, however that method always throws an AttributeError, essentially meaning once the schedule has been made, ya can't change it.
- For reasons unknown, the scheduler quit executing jobs, which I could live with, but it never raised an exception, and only logged a warning. The specific message given was "Run time of job <blah> next run at: <blah> was missed by <a few seconds>". The VM could have been starved for resources, or maybe there was an NTP glitch or something, I dunno. There does not seem to be any consensus about what could cause it. I ended up dropping and recreating the jobs, and also modified the job creation code to use a 30 second grace period.
People, I beg you, before you create your applications, spend a bit of time vetting the modules you are going to use. My vetting process is:
- install
- read docs and do "hello world"
- try out a few obvious error cases
- try some of the logic I will need in my app
If I can't get through those steps quickly and without major drama, or I find other issues like those mentioned above, I don't use the module. It's better to vet first and use quality stuff in production rather than have it blow up or stop working right when some manager "NEEDS IT RIGHT NOW".
|
# ? Oct 20, 2015 02:52 |
|
Hughmoris posted:I'm trying to explore and understand pandas and numpy. I have a data set that looks like this: Phone posting, but get discharge delay into numeric units and then use groupby: df.groupby("unit", sort=True)["discharge_delay"].mean() This will find the mean discharge delay for each unit in the data. You can then take the series that's returned and easily plot a bar graph in matplotlib or with the built-in pandas plotting functions.
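To make the two steps concrete, here is a small sketch. The original data set isn't shown in this thread, so the frame below is invented: the column names follow the post, and the "HH:MM:SS" string format for the delay is an assumption.

```python
import pandas as pd

# Hypothetical data; only the column names come from the post above.
df = pd.DataFrame({
    "unit": ["ICU", "ICU", "ER", "ER"],
    "discharge_delay": ["00:30:00", "01:30:00", "00:10:00", "00:20:00"],
})

# Step 1: get the delay into numeric units (minutes here).
df["delay_minutes"] = pd.to_timedelta(df["discharge_delay"]).dt.total_seconds() / 60

# Step 2: mean delay per unit, returned as a Series indexed by unit.
mean_delay = df.groupby("unit", sort=True)["delay_minutes"].mean()

# mean_delay.plot(kind="bar") would draw the bar graph (needs matplotlib).
```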
|
# ? Oct 20, 2015 03:53 |
|
Like 50% of posts on the first page of http://stackoverflow.com right now are about Python.
|
# ? Oct 22, 2015 20:56 |
|
My Rhythmic Crotch posted:In this post I am going to complain about a python module that fails at doing the one thing it's supposed to do: Advanced Python Scheduler. add_job() with replace_existing=True will allow you to modify the job on the fly, including the schedule. not sure why modify_job() and reschedule_job() don't work this way but whatever!
|
# ? Oct 22, 2015 22:00 |
|
Is there a smarter way to achieve this: code:
|
# ? Oct 23, 2015 23:59 |
|
I think that's fine, except you should say 'v is not None' instead of 'v != None'
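The reason, as a quick sketch: != goes through the object's own comparison methods, which a class can override, while is not checks identity and cannot be overridden. The AlwaysEqual class below is hypothetical and exists only to show the difference.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


v = AlwaysEqual()

# != falls back to inverting __eq__, so this object looks "equal to None".
looks_not_none = (v != None)  # noqa: E711 (deliberate, to show the pitfall)

# Identity comparison ignores __eq__ entirely.
really_not_none = (v is not None)
```

With such an object, a filter written with != None would wrongly drop the value, while is not None keeps it.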
|
# ? Oct 24, 2015 04:32 |
|
Cingulate posted:Is there a smarter way to achieve this: Instead of defining a function, you could define a generator. If the exception is raised then you simply wouldn't yield anything, otherwise you'd yield the (key, value) pair. This would let you define one dictionary instead of two
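A sketch of that idea, since the original code isn't preserved in this thread. The function compute and the choice of ZeroDivisionError are placeholders for whatever the real function and exception are.

```python
def compute(value):
    """Hypothetical stand-in for the real function; it fails on some inputs."""
    return 1 / value


def successful_items(inputs):
    """Yield (key, result) pairs, skipping any input on which compute raises."""
    for key, value in inputs.items():
        try:
            result = compute(value)
        except ZeroDivisionError:
            continue  # nothing is yielded for this key
        yield key, result


inputs = {"a": 2, "b": 0, "c": 4}

# dict() consumes the generator, so only one dictionary is ever built.
results = dict(successful_items(inputs))
```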
|
# ? Oct 24, 2015 05:38 |
|
QuarkJets posted:Instead of defining a function, you could define a generator. If the exception is raised then you simply wouldn't yield anything, otherwise you'd yield the (key, value) pair. This would let you define one dictionary instead of two code:
|
# ? Oct 24, 2015 12:33 |
|
You could do something like: code:
|
# ? Oct 24, 2015 15:46 |
|
Your loop should go inside the generator. Personally, I would factor out the exception-skipping from the actual logic of the function. Python code:
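The snippet from this post isn't preserved here, so the following is only a guess at the shape of that suggestion: a generic generator that owns the loop and the try/except, with the exception type passed in as an argument. The name skip_exceptions is made up.

```python
def skip_exceptions(func, items, exceptions):
    """Yield (item, func(item)) for every item on which func does not raise.

    `exceptions` is an exception class, or a tuple of classes, to swallow.
    Anything else propagates as usual.
    """
    for item in items:
        try:
            result = func(item)
        except exceptions:
            continue
        yield item, result


# The function itself (int here) stays free of any error-skipping logic.
parsed = dict(skip_exceptions(int, ["1", "x", "3"], ValueError))
```

Keeping the skipping in one helper means the wrapped function can still be called directly when a failure should be loud.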
|
# ? Oct 24, 2015 15:51 |
|
SurgicalOntologist posted:Your loop should go inside the generator. I like this one. Defining the exception that you catch as a function input kind of makes me frown but I don't really know why
|
# ? Oct 24, 2015 21:05 |
|
I'd just use the original version. Or probably write out the for loop explicitly. Just because comprehensions exist doesn't mean you need to use them.
|
# ? Oct 24, 2015 21:09 |
|
Nippashish posted:I'd just use the original version. Or probably write out the for loop explicitly. Just because comprehensions exist doesn't mean you need to use them. I'm grateful for all the proposals, I'm not sure I can apply any of them directly, but they're all showing me stuff I hadn't thought of that'll be useful otherwise. Note, I'm actually running the function inside joblib so it may all be a bit more problematic.
|
# ? Oct 24, 2015 21:14 |
|
What's wrong with the comprehension here? It seems pretty clean.
|
# ? Oct 24, 2015 22:09 |
|
It's two, and they are fairly redundant.
|
# ? Oct 24, 2015 22:36 |
|
Cingulate posted:It's two, and they are fairly redundant. Yeah there are two, but until there's a nice equivalent in Python for something like Scala's flatmap, I think what you have is clean and readable. First you map your values to the function outputs, then you filter out the None's.
|
# ? Oct 24, 2015 22:44 |
|
You could do something like: Python code:
|
# ? Oct 25, 2015 00:12 |
|
KernelSlanders posted:What's wrong with the comprehension here? It seems pretty clean. There's nothing wrong with the original pair of comprehensions, it's clean and totally good code. But the thread was asked for alternatives, and that's fun, so we're coming up with alternatives
|
# ? Oct 25, 2015 01:00 |
|
Is there a painless way of moving my primary (3.4) anaconda environment to 3.5? I have a huge bunch of conda packages installed, some from binstar.
|
# ? Oct 26, 2015 01:20 |
|
Cingulate posted:Is there a painless way of moving my primary (3.4) anaconda environment to 3.5? I have a huge bunch of conda packages installed, some from binstar. Won't conda update --all do it? EDIT: No it won't. Sorry, I thought they had released 3.5 and that's what you were asking.
|
# ? Oct 26, 2015 02:04 |
|
conda install python=3.5.* worked for me. But that may have been a bad idea because now conda update --all doesn't work, so I suspect there are some incompatibilities present in the packages I already have installed.
|
# ? Oct 26, 2015 03:30 |
|
I don't think there's a way to do what you ask. You can create a Python 3.5 virtual environment with this: code:
|
# ? Oct 26, 2015 05:18 |
|
Oh well.
|
# ? Oct 26, 2015 09:57 |
|
Two more beginner questions, should anybody find the time to go through them ... What's the difference between "&" and "and"? I know I can only use one of them for certain series stuff, but I don't get the principled reason. I often use list comps to filter. But some terseness is lost when unpacking it to a regular for loop. Is there some magic hack I'm not seeing to make this code:
code:
|
# ? Oct 27, 2015 12:39 |
|
The operator and is the builtin logical operator you should probably be using unless you're dealing with numpy arrays. For the loop, you could pre-filter with a generator or list comprehension, although I don't know that it's any clearer than your second example. Python code:
Python code:
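The snippets from this post are missing, so here is a small sketch of the pre-filtering idea with made-up data: the condition moves out of the loop body and into a generator expression that the for loop iterates over.

```python
numbers = [3, -1, 4, -1, 5, -9]

# Filtering inside the loop body.
total_nested = 0
for n in numbers:
    if n > 0:
        total_nested += n

# Pre-filtering with a generator expression: same result, one less level
# of nesting, and no intermediate list is built.
total_prefiltered = 0
for n in (n for n in numbers if n > 0):
    total_prefiltered += n
```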
|
# ? Oct 27, 2015 14:22 |
|
Cingulate posted:Two more beginner questions, should anybody find the time to go through them ... & is a bitwise operation. It means convert the two operands into sequences of bits, and return the integer value that results from &ing the sequences together. Or at least, for integers that's what it does, and in principle if you implement it for your own classes you don't have to return an int, you can return absolutely anything you want and it doesn't have to even have anything to do with bitwise operations. But at least for integers it's bitwise and. Type this in and try it for some pairs of integers: code:
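The snippet that went with this post is missing; a minimal stand-in for trying it on integers could look like this. The helper show is made up for illustration.

```python
a, b = 6, 3  # 0b0110 and 0b0011

# Bitwise: a bit is set in the result only where it is set in both operands.
bitwise = a & b

# Logical: both operands are truthy, so `and` returns the second one.
logical = a and b


def show(x, y):
    """Format two ints and their bitwise AND as 4-bit binary strings."""
    return f"{x:04b} & {y:04b} = {x & y:04b}"


line = show(a, b)
```

Trying a pair with no bits in common, such as 4 and 3, makes the difference obvious: 4 & 3 is 0, while 4 and 3 is 3.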
"and" and "or" on the other hand are logical operators. You should use & and | when you really mean to do operations on bits (which is a numerical calculation rather than a logical proposition), and you should use "and" and "or" when you really mean to express logical conjunction and disjunction ("I want to add VAT if the shipping address is in Texas AND Venus is rising in Capricorn"). To do otherwise is failing to effectively communicate what your code is meant to be doing.
|
# ? Oct 27, 2015 14:25 |
|
Ah okay, thanks guys. I indeed operate a lot on numpy arrays, or at least stuff that is very similar (pandas series).
|
# ? Oct 27, 2015 14:55 |
|
I'm reasonably certain that internally Series objects use numpy arrays. I use bitwise and (&) all the time when indexing on Series. Say you have something stored in a series with a datetime index and you want the average within a time slice: code:
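The code from this post is missing, so this is a sketch of the pattern with an invented series (one value per day). It also shows why the keyword and can't be used here.

```python
import pandas as pd

# Hypothetical series: values 0..9 for the first ten days of October 2015.
index = pd.date_range("2015-10-01", periods=10, freq="D")
s = pd.Series(range(10), index=index, dtype=float)

start = pd.Timestamp("2015-10-03")
end = pd.Timestamp("2015-10-06")

# Each comparison gives a boolean array; & combines them element by element.
# The parentheses are needed because & binds more tightly than >= and <.
mask = (s.index >= start) & (s.index < end)
window_mean = s[mask].mean()

# `and` asks for the truth value of a whole array, which is ambiguous.
try:
    (s.index >= start) and (s.index < end)
    and_failed = False
except ValueError:
    and_failed = True
```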
|
# ? Oct 28, 2015 04:05 |
|
If you're just comparing to the index, may as well slice: Python code:
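A sketch of the slicing version, again with an invented daily series since the original snippet isn't preserved. One thing to keep in mind: label-based slices on a sorted datetime index include both endpoints.

```python
import pandas as pd

# Hypothetical series: values 0..9 for the first ten days of October 2015.
index = pd.date_range("2015-10-01", periods=10, freq="D")
s = pd.Series(range(10), index=index, dtype=float)

# Both the start and the end label are included in the result.
window = s.loc["2015-10-03":"2015-10-05"]
window_mean = window.mean()
```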
|
# ? Oct 28, 2015 04:27 |
|
Emacs Headroom posted:I'm reasonably certain that internally Series objects use numpy arrays. This is an example of what I was talking about where whatever these object types are have had the behaviour of & customised to do something the library creator thought would be useful. Now, I don't know anything about this library, but it strikes me as bad. I would prefer to see code that relies less on the reader being intimately familiar with how the & and >= operators are implemented for this library's data types, at the cost of most likely being more verbose.
|
# ? Oct 28, 2015 11:28 |
Hammerite posted:This is an example of what I was talking about where whatever these object types are have had the behaviour of & customised to do something the library creator thought would be useful. Now, I don't know anything about this library, but it strikes me as bad. I would prefer to see code that relies less on the reader being intimately familiar with how the & and >= operators are implemented for this library's data types, at the cost of most likely being more verbose.
|
# ? Oct 28, 2015 14:03 |