|
QuarkJets posted:Are all instances writing to the same output file, as described (foo.txt)? That would be a problem. Use temporary working directories to solve that. Nope, but it turned out that every single invocation of the program (of which there were hundreds) insisted on writing a small log file with the same name. Which, together with the temp working directory being an NFS share, and some combination of NFS client and server versions, made things go to heck. So, Python did exactly what it was supposed to.
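For anyone hitting the same thing, a minimal sketch of the workaround QuarkJets suggests: give every invocation its own temporary working directory so same-named log files never collide (the `run.log` name is just an illustration).

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def isolated_workdir():
    """Run a block inside a fresh temp directory so per-invocation
    log files with a fixed name never collide across instances."""
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            yield tmp
        finally:
            # restore cwd before the temp dir gets deleted
            os.chdir(old_cwd)

# each invocation writes its log into its own directory
with isolated_workdir() as workdir:
    with open("run.log", "w") as fh:
        fh.write("log output\n")
```

Pointing the temp directory at local disk (rather than an NFS share) also sidesteps the NFS client/server weirdness described above.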
|
# ? Dec 20, 2021 16:51 |
|
Oof. I wonder if there's a way to virtualize the root path without resorting to a docker container (although that's probably the right move anyway, containerizing)
|
# ? Dec 20, 2021 17:51 |
|
Hey, I wanted to come back and thank this thread for pointing me to better tools, although my environment is still a complete mess. So far I got rid of the Anaconda environment, and installed miniforge instead. I'm using mamba instead of conda in the base environment, and now my system isn't trying to link against x86 libraries loaded from anaconda anymore. However, before being able to pip install uwsgi, I had to specify LDFLAGS to manually tell it where to load one library. There's a long-running GitHub issue about this: https://github.com/unbit/uwsgi/issues/2361 . I hope that somebody with a detailed understanding of how setuptools works can understand why the setup.py is able to find some libraries installed through homebrew and manually pass them to the linker, but not others. My honeymoon with miniforge / mamba also ended when I couldn't get jupyter notebooks installed just by doing mamba install jupyter, and had to hack in a symlink to make the newer version of a library appear to be an older one: https://github.com/conda/conda/issues/9038. My overall impression of anaconda / miniforge / conda / mamba is that this is all just a barely working duct-taped-together disaster.
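For anyone else fighting the uwsgi build on a Homebrew Mac, the shape of the workaround looks something like this. This is a hypothetical example: which library is missing and where it lives depends entirely on your setup, so treat the openssl path as a placeholder.

```shell
# hypothetical: point the linker at Homebrew's openssl keg
# (substitute whichever library setup.py fails to find)
LDFLAGS="-L$(brew --prefix openssl)/lib" pip install uwsgi
```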
|
# ? Dec 20, 2021 21:02 |
|
It's a pretty outstanding ecosystem for Windows and Linux, but lol Mac
|
# ? Dec 21, 2021 09:31 |
|
Is there a way to get the url of the currently active firefox tab in python? I've been looking for hours, but can't find anything. All answers say to use Selenium, but either they or I don't understand how it works. As far as I can tell you can only start an instance of a browser and control that, but you can't control other instances. You used to be able to get it from a json file in your profile, but apparently they moved to a proprietary format. Oh and these stackoverflow scrapers are driving me crazy, what a bunch of assholes ruining my search results.
|
# ? Jan 1, 2022 11:53 |
|
uguu posted:Is there a way to get the url of the currently active firefox tab in python? My experience with Selenium suggests you're right. Selenium allows you to make a new browser window and then do things to it. It can't (as far as I know) interact with pre-existing windows.
|
# ? Jan 1, 2022 13:09 |
|
From my quick search it looks like you can reconnect to a browser session using selenium. You'd have to get the session information from the existing Firefox window, patch it into the reconnect function and then go from there. Have you explored this yet? Not sure what is involved in getting the existing session information.
|
# ? Jan 1, 2022 16:06 |
|
a dingus posted:From my quick search it looks like you can reconnect to a browser session using selenium. You'd have to get the session information from the existing Firefox window, patch it into the reconnect function and then go from there. Have you explored this yet? Not sure what is involved in getting the existing session information. I missed that, I'll look into it, thanks dingus! Edit: looks like it's to reconnect to a selenium session. A request to add this functionality was denied. https://github.com/seleniumhq/selenium-google-code-issue-archive/issues/18#issuecomment-191402419 uguu fucked around with this message at 17:26 on Jan 1, 2022 |
# ? Jan 1, 2022 17:14 |
|
I was looking at this: https://stackoverflow.com/questions/8344776/can-selenium-interact-with-an-existing-browser-session it looks like an older answer, hopefully that works for what you're looking to do Edit: looks like it's doing the same thing as above
|
# ? Jan 1, 2022 17:33 |
|
What's the use case? Is it possible to maybe run an extension in Firefox that would listen on a port locally, then reply back with the open tab's information in response to a request? Or is relying on such a thing being present not possible?
|
# ? Jan 1, 2022 22:29 |
|
Yeah, the way selenium interacts with the browser seems to rely on debugger functions that aren't generally on - you have to launch chrome specifically to allow Selenium to control it, for example. More info on the use case can help because I might be doing something similar.
|
# ? Jan 2, 2022 02:42 |
|
Epsilon Plus posted:What's the use case? Is it possible to maybe run an extension in Firefox that would listen on a port locally, then reply back with the open tab's information in response to a request? Or is relying on such a thing being present not possible? That's a good idea and possible. I made a program, a script really, which copies text to a folder. I highlight some text, press windows-q, a popup appears and I enter a two-letter code to select the folder. If I copy text from a browser I'd like it to add the URL automatically. It can kind of detect whether I'm copying from a browser ('firefox' in windowname), but getting the URL is an ordeal. Edit: another suggestion was to read the JSON session file. I dismissed it at first because Mozilla moved to a proprietary jsonlz4 format. But I found this https://gist.github.com/Tblue/62ff47bef7f894e92ed5 which should unpack it. I'll try that today. Edit2: I got it working! I'll throw it up on GitHub in case it's useful to you. Pretty janky though. uguu fucked around with this message at 07:53 on Jan 2, 2022 |
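For the curious: the jsonlz4 session files are ordinary JSON behind an 8-byte "mozLz40\0" header followed by LZ4 block compression. A hedged sketch of unpacking one — the third-party `lz4` package is assumed for the decompression step, and the path to the session file varies per profile:

```python
import json

MOZLZ4_MAGIC = b"mozLz40\0"  # Firefox's custom 8-byte header

def is_mozlz4(data: bytes) -> bool:
    """Check for Firefox's custom LZ4 header."""
    return data[:8] == MOZLZ4_MAGIC

def read_session(path: str) -> dict:
    """Decompress a .jsonlz4 session file into a dict.
    Requires the third-party 'lz4' package."""
    import lz4.block  # deferred so is_mozlz4 works without lz4
    with open(path, "rb") as fh:
        data = fh.read()
    if not is_mozlz4(data):
        raise ValueError("not a mozLz4 file")
    return json.loads(lz4.block.decompress(data[8:]))
```

The linked Tblue gist does essentially this; the sketch above is just the minimal core.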
# ? Jan 2, 2022 05:12 |
|
uguu posted:That's a good idea and possible. I wonder if a greasemonkey script might be able to do that. It ain't python though.
|
# ? Jan 2, 2022 09:01 |
|
I don't know anything about Greasemonkey (and very little about Python), but this does the trick. https://github.com/PaaEl/TextOrganiser list_fftabs returns a dictionary with tab names and urls and I compare that with the name of the current active window.
|
# ? Jan 2, 2022 11:01 |
|
Could use some help with Enums: Python code:
I'm using the value of a few Enums in my code and I'm getting tired of typing .value at the end of everything, plus it looks gross punished milkman fucked around with this message at 22:14 on Jan 5, 2022 |
# ? Jan 5, 2022 21:59 |
|
punished milkman posted:Could use some help with Enums: I don't think so, but I think it's pretty rare that I actually need to access the values of an enum directly (I might pass in a value to get its enumerated name, e.g. Foo('bar')). Are you sure that you want an enum and not a dataclass?
|
# ? Jan 5, 2022 22:15 |
|
I have never been super fond of enums in general, especially not how they work in python, but you can put them into a dataclass. Python code:
Python code:
|
# ? Jan 5, 2022 22:21 |
|
I use dataclasses all the time but for some reason I thought an Enum was more appropriate… I’m just trying to maintain a central source of truth for keys referenced throughout the codebase. Sounds like I should just scrap the Enums and use dataclasses even more!
|
# ? Jan 5, 2022 22:36 |
|
punished milkman posted:I use dataclasses all the time but for some reason I thought an Enum was more appropriate… I’m just trying to maintain a central source of truth for keys referenced throughout the codebase. Sounds like I should just scrap the Enums and use dataclasses even more! It's possible that they are, it's just hard to tell without knowing more about your specific situation. That does kind of sound like something I'd want to use an Enum for, but I usually use Enums for their names, not their values. Those names are hashable and can be used as keys directly: Python code:
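A sketch of that pattern — member names as dict keys directly, plus a str mixin for the cases where you genuinely need string values but are tired of typing .value (the Column enum here is made up for illustration):

```python
from enum import Enum

class Column(str, Enum):
    """str mixin: members compare equal to their string values,
    so call sites don't need .value."""
    NAME = "name"
    PRICE = "price"

# members are hashable and work as dict keys directly
row = {Column.NAME: "widget", Column.PRICE: 9.99}

assert Column.PRICE == "price"          # no .value needed
assert Column("price") is Column.PRICE  # lookup by value
```

Subclassing str means the members also serialize cleanly (e.g. through json or string formatting) without any unwrapping.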
|
# ? Jan 5, 2022 23:39 |
|
I'm working on a personal project that uses BeautifulSoup to parse a webpage, pulls out data of interest, etc. I'm trying to write pytest tests and realizing the way I've constructed the class is likely going to require me to repeatedly hammer the relevant webserver which is going to add a lot of time and also some amount of bandwidth for the server I'm hitting (non-profit research organization). My thinking is that I should reformulate the main class I'm using to optionally take file contents passed to it by the test, and just ignore the url if file contents are present. Grossly simplified, what I have now is page_ripper and I'm thinking of changing it to be like page_ripper2, but I'm pretty new to automated testing and wanted to get some input if there's a more elegant way to do what I'm proposing? Python code:
Python code:
|
# ? Jan 6, 2022 06:11 |
|
BeastOfExmoor posted:I'm working on a personal project that uses BeautifulSoup to parse a webpage, pulls out data of interest, etc. I'm trying to write pytest tests and realizing the way I've constructed the class is likely going to require me to repeatedly hammer the relevant webserver which is going to add a lot of time and also some amount of bandwidth for the server I'm hitting (non-profit research organization). page_ripper2 is ugly, it's really 2 different classes smooshed together. It would be much cleaner to just have a class that uses BeautifulSoup to do stuff with "html_data". This is more concise and commits fewer cardinal sins. Class names should also not be snake_case, see PEP8. I also can almost guarantee that __init__ is not returning a bool, so your annotations are whack Python code:
Python code:
|
# ? Jan 6, 2022 06:52 |
|
QuarkJets posted:page_ripper2 is ugly, it's really 2 different classes smooshed together. It would be much cleaner to just have a class that uses BeautifulSoup to do stuff with "html_data". This is more concise and commits fewer cardinal sins. Class names should also not be snake_case, see PEP8. I also can almost guarantee that __init__ is not returning a bool, so your annotations are whack Yea, sorry I threw that code together really quick just to distill my idea down and missed some PEP8 stuff, etc. I'm writing this as part of a library (mostly as practice, since its unlikely anyone else will use it) and trying to make it as user friendly as possible so I wanted to avoid having the end user worry about several lines of code (or a nested function call) to populate the PageRipper object, but your logic makes sense. The second example is exactly what I needed. I couldn't quite figure out how to make the Enum stuff work for this case, but that does it.
|
# ? Jan 6, 2022 07:58 |
|
I understand the appeal of a one-size-fits-all design, but it's better to write classes and functions that do straightforward things. This makes your code self-documenting and easier to understand, and it will actually make your project easier to use. Take your original implementation of page_ripper2: it's unclear that one parameter invalidates the other without actually reading the code (or perhaps a docstring), and a user may think that it's necessary to populate both parameters when really they just need one or the other. They may assume that passing a URL will cause a request to be made even if an html string is passed in, but that is not the case. There are a lot of other potential pitfalls with this kind of design. Even if you write really good docstrings to explain what's going on, it's better if the signature makes that obvious from the get-go Most of what I'm talking about is explained in Clean Code by RC Martin, which is a really excellent primer on writing better code. In your case, I think you should have a class that parses HTML text into attributes, and a function that accepts a URL and returns an instance of that class. Optionally, as a convenience, have a function that accepts HTML text data as input and returns an instance of that class as well. Using descriptive names and type annotations, anyone can then come along and immediately understand what these pieces are doing. This also makes it much easier to write good tests.
|
# ? Jan 6, 2022 10:06 |
|
a thing that might be nice there is for the url fetcher to be a classmethod:Python code:
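Sketching that shape (the original post's code was lost; stdlib html.parser stands in for BeautifulSoup so the example is self-contained, and the class/method names are illustrative):

```python
from html.parser import HTMLParser

class _TitleParser(HTMLParser):
    """Tiny stand-in for BeautifulSoup: grabs <title> text."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

class PageRipper:
    def __init__(self, html_data: str) -> None:
        parser = _TitleParser()
        parser.feed(html_data)
        self.title = parser.title

    @classmethod
    def from_url(cls, url: str) -> "PageRipper":
        """Convenience constructor: fetch, then hand off to __init__."""
        from urllib.request import urlopen  # or requests.get
        with urlopen(url) as resp:
            return cls(resp.read().decode("utf-8", "replace"))

# tests construct directly from a string; no network needed
ripper = PageRipper("<html><head><title>Test Page</title></head></html>")
```

The nice property: tests call `PageRipper(html_text)` with canned data, while real callers use `PageRipper.from_url(...)`, and neither parameter invalidates the other.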
|
# ? Jan 6, 2022 13:26 |
|
BeastOfExmoor posted:I'm working on a personal project that uses BeautifulSoup to parse a webpage, pulls out data of interest, etc. I'm trying to write pytest tests and realizing the way I've constructed the class is likely going to require me to repeatedly hammer the relevant webserver which is going to add a lot of time and also some amount of bandwidth for the server I'm hitting (non-profit research organization). If you're going to do it from a file I'd try to avoid having to create/maintain a fake set of data for testing for a whole bunch of reasons and instead create a test fixture that caches a local version of all of the relevant content and a timestamp (could just be dumped into a json file or whatever) so you can smash it with all the tests you want and not worry about maintaining it in the future. That may already be what you're thinking, but if not it will save you some pain.
|
# ? Jan 6, 2022 15:03 |
|
Wallet posted:If you're going to do it from a file I'd try to avoid having to create/maintain a fake set of data for testing for a whole bunch of reasons and instead create a test fixture that caches a local version of all of the relevant content and a timestamp (could just be dumped into a json file or whatever) so you can smash it with all the tests you want and not worry about maintaining it in the future. That may already be what you're thinking, but if not it will save you some pain. I am not certain but I think pytest fixtures will even do the caching for you automatically if you set the scope appropriately (or maybe that's not even necessary, I don't remember)
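One way to sketch the disk-cache idea is a pytest-agnostic helper that you then wrap in a session-scoped fixture in conftest.py. All names here are illustrative, and the fixture wiring in the comment assumes pytest and requests:

```python
import json
import os
import time

def cached_fetch(cache_path: str, fetch_fn, max_age_s: float = 86400.0):
    """Return cached content if it's fresh; otherwise call
    fetch_fn() once and store its result with a timestamp."""
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            entry = json.load(fh)
        if time.time() - entry["fetched_at"] < max_age_s:
            return entry["content"]
    content = fetch_fn()
    with open(cache_path, "w") as fh:
        json.dump({"fetched_at": time.time(), "content": content}, fh)
    return content

# in conftest.py you might wrap it roughly like:
# @pytest.fixture(scope="session")
# def page_html(tmp_path_factory):
#     cache = tmp_path_factory.mktemp("cache") / "page.json"
#     return cached_fetch(str(cache), lambda: requests.get(URL).text)
```

With scope="session" the fetch happens at most once per test run even across many test files, and the on-disk timestamp lets you persist it across runs too.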
|
# ? Jan 6, 2022 22:37 |
|
QuarkJets posted:I understand the appeal of a one-size-fits-all design, but it's better to write classes and functions that do straightforward things. This makes your code self-documenting and easier to understand, and it will actually make your project easier to use. Phobeste posted:a thing that might be nice there is for the url fetcher to be a classmethod: Wallet posted:If you're going to do it from a file I'd try to avoid having to create/maintain a fake set of data for testing for a whole bunch of reasons and instead create a test fixture that caches a local version of all of the relevant content and a timestamp (could just be dumped into a json file or whatever) so you can smash it with all the tests you want and not worry about maintaining it in the future. That may already be what you're thinking, but if not it will save you some pain. QuarkJets posted:I am not certain but I think pytest fixtures will even do the caching for you automatically if you set the scope appropriately (or maybe that's not even necessary, I don't remember) Thank you all, this is extraordinarily helpful. Classmethods were completely off my radar, but that definitely seems like a much better option. I'll also have to look and see if I can figure out the caching thing. I started reading through Clean Code as well.
|
# ? Jan 7, 2022 07:33 |
|
Wallet posted:If you're going to do it from a file I'd try to avoid having to create/maintain a fake set of data for testing for a whole bunch of reasons and instead create a test fixture that caches a local version of all of the relevant content and a timestamp (could just be dumped into a json file or whatever) so you can smash it with all the tests you want and not worry about maintaining it in the future. That may already be what you're thinking, but if not it will save you some pain. I have a related question to this. I’m working on a project that involves using the boto3 API to describe EC2 instance details and analyze the results. I have a bunch of fake dictionaries meant to structure the response and really don’t like it. I like the idea of actually querying the API and storing the response, and using the data for many of my tests. How can I do this in a way that doesn’t require me to query the API every time I run my unit tests? Would the fixture not run every time I run pytest? Or do I want it to run every time I run pytest so if AWS changes their API schema we can detect that on a PR? At what point does that become an integration test as opposed to plain unit tests? An additional downside of this is that I also need to have a valid instance_id to query too, which is an additional dependency. I’m not entirely sure how to make the correct tradeoffs here.
|
# ? Jan 7, 2022 18:51 |
|
The rule of thumb here is that you aren't writing the AWS API, so you shouldn't unit test it, it's fine to use a mock response. As part of upgrading boto3 versions you would check to see if your stored mock responses are invalid, which will be fairly obvious because the new boto3 release notes will have a section on breaking changes (if any). Python specifically has the moto package for mocking AWS services in this manner.
|
# ? Jan 7, 2022 19:20 |
|
If you are parsing AWS data in some way, write a json that contains some actual AWS data and load it with a fixture.
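A sketch of that approach: a trimmed describe_instances-style response checked into the repo, with the code under test consuming it as a plain dict. The field names follow boto3's documented response shape for ec2.describe_instances; the instance id and the inline JSON are placeholders for a file you'd actually store under something like tests/data/:

```python
import json

# trimmed-down shape of ec2.describe_instances(); in practice
# this JSON would live in a file loaded by a pytest fixture
SAMPLE = """
{
  "Reservations": [
    {"Instances": [
      {"InstanceId": "i-0abc123", "InstanceType": "t3.micro",
       "State": {"Name": "running"}},
      {"InstanceId": "i-0def456", "InstanceType": "t3.micro",
       "State": {"Name": "stopped"}}
    ]}
  ]
}
"""

def running_instance_ids(response: dict) -> list:
    """The code under test: pull ids of running instances."""
    return [
        inst["InstanceId"]
        for res in response["Reservations"]
        for inst in res["Instances"]
        if inst["State"]["Name"] == "running"
    ]

response = json.loads(SAMPLE)
```

Since the JSON is source-controlled, a schema change in a boto3 upgrade shows up as a deliberate diff to the test data rather than a mystery failure.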
|
# ? Jan 7, 2022 22:28 |
|
QuarkJets posted:If you are parsing AWS data in some way, write a json that contains some actual AWS data and load it with a fixture. This ended up being exactly what I ended up doing. The additional benefit is that we can store json in source control as well. I've yet to add it as a fixture, just putting in directly into the test function for now, but I'll fix that once I ask the following silly question. I'm very new to testing, and have a basic organizational question. What are the principles behind organizing tests into classes and class methods? for example, I have the following: Python code:
Should I just put all my dozen or so tests into a single class? Strictly divide class by function? Not really sure why I should be making the decisions I'm making here.
|
# ? Jan 7, 2022 23:03 |
|
That's an old school way of organizing tests, I like pytest a lot more than unittest and I define most if not all of my tests as functions. The advantage of using a class is that you can use __init__ to get the same initialization steps for a bunch of tests, but fixtures do that for you too. You can also share state between tests, but I view that as a downside (it goes against the entire concept of a "unit" test)
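A sketch of the function-per-test style being described — plain functions, with shared setup in a helper (under pytest you'd register the helper as a fixture and take it as a test argument; every name here is made up):

```python
def make_inventory():
    """Shared setup. With pytest you'd decorate this with
    @pytest.fixture and accept it as a parameter in each test."""
    return {"instances": [{"id": "i-1", "state": "running"},
                          {"id": "i-2", "state": "stopped"}]}

def count_running(data):
    """Example code under test."""
    return sum(1 for i in data["instances"] if i["state"] == "running")

# pytest discovers any top-level function named test_*
def test_counts_running():
    data = make_inventory()
    assert count_running(data) == 1

def test_collects_ids():
    data = make_inventory()
    assert [i["id"] for i in data["instances"]] == ["i-1", "i-2"]
```

No classes, no self, no shared mutable state between tests — each function builds what it needs and asserts one thing.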
|
# ? Jan 7, 2022 23:28 |
|
setUp() and tearDown() are the recommended ways to share (de-) initialisation stages between tests in unittest, but I agree with just using pytest - apart from anything else it requires less typing (also it can run unittest tests, so no need to rewrite existing tests)
|
# ? Jan 7, 2022 23:40 |
|
QuarkJets posted:That's an old school way of organizing tests, I like pytest a lot more than unittest and I define most if not all of my tests as functions. The advantage of using a class is that you can use __init__ to get the same initialization steps for a bunch of tests, but fixtures do that for you too. You can also share state between tests, but I view that as a downside (it goes against the entire concept of a "unit" test) This is what I do. It's cleaner that way, and you can easily add tags to your tests if you have enough of them that you will often only want to run a subset of them at once. The Iron Rose posted:This ended up being exactly what I ended up doing. The additional benefit is that we can store json in source control as well. For this I basically just have a function that loads test data from the folder I store json test data in based on a file name, and then in the files with the actual tests for a given set of tests that needs to load some particular bit of data I add a scoped fixture that calls that load function (and then loads the data into the database or whatever I need it to do). Not sure how all that would work using unittest instead of pytest.
|
# ? Jan 8, 2022 00:28 |
|
i mean I think at this point moving to pytest is highly recommended, simple to do, and brings lots of perks :P Thanks for the advice folks!
|
# ? Jan 8, 2022 00:34 |
|
Hell yeah And while we're on the topic of project upgrades, who here is still using setup.py? Cause you don't need that poo poo, and in fact setuptools discourages having one.
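For reference, a minimal pyproject.toml that replaces setup.py under setuptools might look like this — the project name, version, and dependencies are placeholders:

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"          # placeholder
version = "0.1.0"
dependencies = ["requests"]  # whatever you actually depend on

[project.optional-dependencies]
test = ["pytest"]
```

With this in place, `pip install -e .[test]` gives you an editable install plus the test tooling, and there's no setup.py to maintain.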
|
# ? Jan 8, 2022 01:11 |
|
The Iron Rose posted:i mean I think at this point moving to pytest is highly recommended, simple to do, and brings lots of perks :P this was/is not simple ahhh I could use some further advice here I think, largely because I don't understand modules and packaging well enough yet, but whatever. I've now spent 3-4 hours trying to redo my tests and it's a little frustrating, so I am asking for more help! Currently I have a directory structure like so: code:
also i know i should just be using the serverless application model at this point which would solve this and other problems but I learned about it a week into the project and don't want to re-write all my infrastructure. E: i hate rubber ducking. i think i figured this out, if I use python -m pytest instead of pytest it works fine, per https://doc.pytest.org/en/latest/explanation/goodpractices.html I do have another follow up though... Any advice on automating this? I'm using github actions for my CI/CD. I want to ideally have it so we run test_evaluate_instance when I make changes to src/evaluate_instance, and run test_stop_instance.py when I make changes to src/stop_instance. I think I can use markers for this like so: code:
e2: I love rubber ducking, `python -m pytest -k "name_of_file"` is my friend. The Iron Rose fucked around with this message at 22:38 on Jan 8, 2022 |
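For the CI side, GitHub Actions supports this directly via path filters on the triggering event, without any pytest markers. A sketch with illustrative file paths and a made-up workflow name:

```yaml
# .github/workflows/test-stop-instance.yml (illustrative)
name: test stop_instance
on:
  pull_request:
    paths:
      - "src/stop_instance/**"
      - "tests/test_stop_instance.py"
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt pytest
      - run: python -m pytest tests/test_stop_instance.py
```

One workflow per lambda keeps each PR running only the tests for what actually changed; a second workflow for evaluate_instance would mirror this with its own paths.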
# ? Jan 8, 2022 22:14 |
|
__init__.py designates a directory as a package. It can be empty but can also be used to organize your API, e.g. if you want a function deep within your package tree to be importable from a shallower level, you would do this by importing that function from __init__.py at whatever level you desire. You may be at the point where you should figure out how to install your package; it's really nice being able to just execute 'pytest' from your root directory to run everything. I don't have any familiarity with the github CI tools, but it's nice to have a pipeline that runs your tests whenever you create any pull request; that should be easy to set up. All of the tests, in your case, not just specific ones.
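A runnable sketch of that re-export pattern: build a tiny package in a temp directory where __init__.py lifts a deep function to the top level. The package and function names are made up; in a real project the files would just live in your source tree.

```python
import os
import sys
import tempfile

# demo_pkg/utils/parsing.py defines parse(); demo_pkg/__init__.py
# re-exports it so callers can do `from demo_pkg import parse`
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "demo_pkg", "utils"))
files = {
    "demo_pkg/__init__.py": "from .utils.parsing import parse\n",
    "demo_pkg/utils/__init__.py": "",
    "demo_pkg/utils/parsing.py": "def parse(s):\n    return s.strip()\n",
}
for rel, body in files.items():
    with open(os.path.join(tmp, rel), "w") as fh:
        fh.write(body)

sys.path.insert(0, tmp)
import demo_pkg  # noqa: E402

# the deep function is now available at the package top level
result = demo_pkg.parse("  hello  ")
```

The sys.path fiddling is only there to make the sketch self-contained; installing the package (pip install -e .) is the real-world equivalent.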
|
# ? Jan 9, 2022 03:59 |
|
On the topic of unit tests, are there any good books or sites on how to retrofit an existing codebase with unit tests, especially one that's reasonably large (around 8k SLoC) and involves a lot of external calls? I know that generally it's going to boil down to 'chunk it apart and start working on pieces' but it'd be nice if someone had already written some stuff from the perspective that I'm going into. Most of the stuff I've read highly leans into test driven development, which is a good paradigm, but I'm working on an existing product I didn't start on. We have a backlog item to write some unit tests later, and I'm interested in doing that.
|
# ? Jan 9, 2022 07:00 |
|
|
Sounds like Working Effectively with Legacy Code by Michael Feathers might help. It doesn't use Python but I think the advice should be generally applicable.
|
# ? Jan 9, 2022 20:51 |