bolind
Jun 19, 2005



Pillbug

QuarkJets posted:

Are all instances writing to the same output file, as described (foo.txt)? That would be a problem. Use temporary working directories to solve that.

Nope, but it turned out that every single invocation of the program (of which there were hundreds) insisted on writing a small log file with the same name.

Which, together with a temp working directory that was an NFS share, and some combination of NFS client and server versions, made things go to heck.

So, Python did exactly what it was supposed to.
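For reference, a minimal sketch of the usual workaround, assuming each invocation only needs private scratch space (run_isolated and the example command are made up for illustration):

Python code:
```python
import subprocess
import sys
import tempfile

def run_isolated(cmd):
    """Run one invocation in its own scratch directory, so hundreds of
    parallel runs can't clobber each other's identically named log files.
    Sketch only: point tempfile at local disk rather than an NFS share."""
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(cmd, cwd=workdir, check=True)

# e.g. run_isolated([sys.executable, "some_noisy_program.py"])
```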


QuarkJets
Sep 8, 2008

Oof. I wonder if there's a way to virtualize the root path without resorting to a docker container (although that's probably the right move anyway, containerizing)

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
Hey, I wanted to come back and thank this thread for pointing me to better tools, although my environment is still a complete mess.

So far I got rid of the Anaconda environment and installed miniforge instead. I'm using mamba instead of conda in the base environment, and now my system isn't trying to link against x86 libraries loaded from Anaconda anymore.

However, before being able to pip install uwsgi, I had to set LDFLAGS to manually tell the linker where to find one library. There's a long-running GitHub issue about this: https://github.com/unbit/uwsgi/issues/2361 . I hope that somebody with a detailed understanding of how setuptools works can explain why the setup.py is able to find some libraries installed through Homebrew and manually pass them to the linker, but not others.

My honeymoon with miniforge / mamba also ended when I couldn't get jupyter notebooks installed just by doing mamba install jupyter, and had to hack in a symlink to make the newer version of a library appear to be an older one: https://github.com/conda/conda/issues/9038.

My overall impression of anaconda / miniforge / conda / mamba is that this is all just a barely working duct-taped together disaster.

QuarkJets
Sep 8, 2008

It's a pretty outstanding ecosystem for Windows and Linux, but lol Mac

uguu
Mar 9, 2014

Is there a way to get the URL of the currently active Firefox tab in Python?
I've been looking for hours, but can't find anything.
All the answers say to use Selenium, but either they or I don't understand how it works.
As far as I can tell you can only start an instance of a browser and control that; you can't control other instances.
You used to be able to get it from a JSON file in your profile, but apparently they moved to a proprietary format.
Oh, and these Stack Overflow scrapers are driving me crazy, what a bunch of assholes ruining my search results.

Sad Panda
Sep 22, 2004

I'm a Sad Panda.

uguu posted:

Is there a way to get the URL of the currently active Firefox tab in Python?
I've been looking for hours, but can't find anything.
All the answers say to use Selenium, but either they or I don't understand how it works.
As far as I can tell you can only start an instance of a browser and control that; you can't control other instances.
You used to be able to get it from a JSON file in your profile, but apparently they moved to a proprietary format.
Oh, and these Stack Overflow scrapers are driving me crazy, what a bunch of assholes ruining my search results.

My experience with Selenium suggests you're right. Selenium allows you to make a new browser window and then do things to it. It can't (as far as I know) interact with pre-existing windows.

a dingus
Mar 22, 2008

Rhetorical questions only
Fun Shoe
From my quick search it looks like you can reconnect to a browser session using selenium. You'd have to get the session information from the existing Firefox window, patch it into the reconnect function and then go from there. Have you explored this yet? Not sure what is involved in getting the existing session information.

uguu
Mar 9, 2014

a dingus posted:

From my quick search it looks like you can reconnect to a browser session using selenium. You'd have to get the session information from the existing Firefox window, patch it into the reconnect function and then go from there. Have you explored this yet? Not sure what is involved in getting the existing session information.

I missed that, I'll look into it, thanks dingus!

Edit: looks like that's for reconnecting to a session Selenium itself launched.
A request to add this functionality was denied.
https://github.com/seleniumhq/selenium-google-code-issue-archive/issues/18#issuecomment-191402419

uguu fucked around with this message at 17:26 on Jan 1, 2022

a dingus
Mar 22, 2008

Rhetorical questions only
Fun Shoe
I was looking at this: https://stackoverflow.com/questions/8344776/can-selenium-interact-with-an-existing-browser-session it looks like an older answer, hopefully that works for what you're looking to do

Edit* looks like it's doing the same thing as above :(

death cob for cutie
Dec 30, 2006

dwarves won't delve no more
too much splatting down on Zot:4
What's the use case? Is it possible to maybe run an extension in Firefox that would listen on a port locally, then reply back with the open tab's information in response to a request? Or is relying on such a thing being present not possible?

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Yeah, the way selenium interacts with the browser seems to rely on debugger functions that aren't generally on - you have to launch chrome specifically to allow Selenium to control it, for example. More info on the use case can help because I might be doing something similar.

uguu
Mar 9, 2014

Epsilon Plus posted:

What's the use case? Is it possible to maybe run an extension in Firefox that would listen on a port locally, then reply back with the open tab's information in response to a request? Or is relying on such a thing being present not possible?

That's a good idea and possible.

I made a program, a script really, which copies text to a folder. I highlight some text, press Windows+Q, a popup appears, and I enter a two-letter code to select the folder.

If I copy text from a browser I'd like it to add the URL automatically. It can kind of detect whether I'm copying from a browser ('firefox' in windowname), but getting the URL is an ordeal.

Edit: another suggestion was to read the JSON session file. I dismissed it at first because Mozilla moved to the proprietary jsonlz4 format. But I found this
https://gist.github.com/Tblue/62ff47bef7f894e92ed5
which should unpack it. I'll try that today.

Edit2: I got it working! I'll throw it up on GitHub in case it's useful to you. Pretty janky though.

uguu fucked around with this message at 07:53 on Jan 2, 2022

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

uguu posted:

That's a good idea and possible.

I made a program, a script really, which copies text to a folder. I highlight some text, press Windows+Q, a popup appears, and I enter a two-letter code to select the folder.

If I copy text from a browser I'd like it to add the URL automatically. It can kind of detect whether I'm copying from a browser ('firefox' in windowname), but getting the URL is an ordeal.

Edit: another suggestion was to read the JSON session file. I dismissed it at first because Mozilla moved to the proprietary jsonlz4 format. But I found this
https://gist.github.com/Tblue/62ff47bef7f894e92ed5
which should unpack it. I'll try that today.

Edit2: I got it working! I'll throw it up on GitHub in case it's useful to you. Pretty janky though.

I wonder if a greasemonkey script might be able to do that. It ain't python though.

uguu
Mar 9, 2014

I don't know anything about Greasemonkey (and very little about Python), but this does the trick.
https://github.com/PaaEl/TextOrganiser
list_fftabs returns a dictionary of tab names and URLs, and I compare those against the name of the currently active window.
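For anyone following along, here's a rough sketch of what extracting tabs from the (decompressed) session file can look like. The structure below (windows → tabs → entries, with a 1-based index) is assumed from Firefox's sessionstore format, and this list_fftabs is a guess at what the linked repo does, not its actual code:

Python code:
```python
import json

def list_fftabs(session_json: str) -> dict:
    """Map tab titles to URLs from a decompressed Firefox session file.
    recovery.jsonlz4 must first be un-mozLz4'd (e.g. via the Tblue gist)."""
    session = json.loads(session_json)
    tabs = {}
    for window in session.get("windows", []):
        for tab in window.get("tabs", []):
            # 'index' is 1-based and points at the tab's current history entry
            entry = tab["entries"][tab["index"] - 1]
            tabs[entry.get("title", "")] = entry["url"]
    return tabs
```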

punished milkman
Dec 5, 2018

would have won
Could use some help with Enums:

Python code:
from enum import Enum

class Foo(Enum):
    BAR = "bar"

Foo.BAR
Is there something I can override so that in the interactive console, when I call `Foo.BAR`, it returns the value from `Foo.BAR.value` (“bar”) instead of `<Foo.BAR: "bar">`?

I'm using the value of a few Enums in my code and I'm getting tired of typing .value at the end of everything, plus it looks gross

punished milkman fucked around with this message at 22:14 on Jan 5, 2022
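One workaround does exist for the console part specifically, hedged because it only changes display: override __repr__ on the Enum, since the interactive console prints repr().

Python code:
```python
from enum import Enum

class Foo(Enum):
    BAR = "bar"

    def __repr__(self):
        # the REPL displays repr(), so Foo.BAR now shows as 'bar'
        return repr(self.value)
```

Alternatively, mixing in str (class Foo(str, Enum)) makes members compare equal to their string values, which removes most of the .value noise in ordinary code without touching the console display.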

QuarkJets
Sep 8, 2008

punished milkman posted:

Could use some help with Enums:

Python code:
from enum import Enum

class Foo(Enum):
    BAR = "bar"

Foo.BAR
Is there something I can override so that in the interactive console, when I call `Foo.BAR`, it returns `Foo.BAR.value` instead of `<Foo.BAR: "bar">`?

I'm using the value of a few Enums in my code and I'm getting tired of typing .value at the end of everything, plus it looks gross

I don't think so, but I think it's pretty rare that I actually need to access the values of an enum directly (I might pass in a value to get its enumerated name, e.g. Foo('bar')). Are you sure that you want an enum and not a dataclass?

12 rats tied together
Sep 7, 2006

I have never been super fond of enums in general, especially not how they work in python, but you can put them into a dataclass

Python code:
>>> from dataclasses import dataclass
>>> @dataclass
... class Foo:
...     BAR: str = "bar"
... 
>>> Foo.BAR
'bar'
this is handy if your enums are really just a deduplication of effort for typing in some constant values. since python doesn't really have constant values, you could also just cram them into a module, and then import it

Python code:
# foo.py
BAR = "bar"

# some_other_file.py
import foo

foo.BAR

punished milkman
Dec 5, 2018

would have won
I use dataclasses all the time but for some reason I thought an Enum was more appropriate… I’m just trying to maintain a central source of truth for keys referenced throughout the codebase. Sounds like I should just scrap the Enums and use dataclasses even more!

QuarkJets
Sep 8, 2008

punished milkman posted:

I use dataclasses all the time but for some reason I thought an Enum was more appropriate… I’m just trying to maintain a central source of truth for keys referenced throughout the codebase. Sounds like I should just scrap the Enums and use dataclasses even more!

It's possible that they are, it's just hard to tell without knowing more about your specific situation. That does kind of sound like something I'd want to use an Enum for, but I usually use Enums for their names, not their values. Those names are hashable and can be used as keys directly:

Python code:
from enum import Enum, auto

class Farts(Enum):
    silent = auto()
    squeaker = auto()
    loud = auto()
    deafening = auto()

fart_culprits = {Farts.silent: ["John", "Sarah", "Melissa"],
    Farts.deafening: ["Eric"]}
print(fart_culprits[Farts.silent])

BeastOfExmoor
Aug 19, 2003

I will be gone, but not forever.
I'm working on a personal project that uses BeautifulSoup to parse a webpage, pulls out data of interest, etc. I'm trying to write pytest tests and realizing the way I've constructed the class is likely going to require me to repeatedly hammer the relevant webserver which is going to add a lot of time and also some amount of bandwidth for the server I'm hitting (non-profit research organization).

My thinking is that I should reformulate the main class I'm using to optionally take file contents passed to it by the test, and just ignore the url if they're present.

Grossly simplified, what I have now is page_ripper and I'm thinking of changing it to be like page_ripper2, but I'm pretty new to automated testing and wanted to get some input if there's a more elegant way to do what I'm proposing?

Python code:
import requests
from bs4 import BeautifulSoup

class page_ripper():
    def __init__(self, url: str) -> bool:
        response = requests.get(url)
        soup = BeautifulSoup(response.text)
        # Do some stuff


class page_ripper2():
    def __init__(self, url: str, html_data: bytes = None) -> bool:
        if html_data:
            soup = BeautifulSoup(html_data)
        else:
            response = requests.get(url)
            soup = BeautifulSoup(response.text)
        # Do some stuff
Secondly, this code takes some parameters and I'd like to limit what can be passed to only specific options. Right now I'm essentially doing something like below, but I feel like there's probably a much smarter way. Feels like something with enum or data classes might be able to handle this with simpler code, but so far everything I've tried hasn't seemed to handle it any better.

Python code:
class another_example():
    def __init__(self, url: str, period: str = "year") -> None:
        if period not in ["year", "month", "day"]:
            raise ValueError("Error, invalid period")

QuarkJets
Sep 8, 2008

BeastOfExmoor posted:

I'm working on a personal project that uses BeautifulSoup to parse a webpage, pulls out data of interest, etc. I'm trying to write pytest tests and realizing the way I've constructed the class is likely going to require me to repeatedly hammer the relevant webserver which is going to add a lot of time and also some amount of bandwidth for the server I'm hitting (non-profit research organization).

My thinking is that I should reformulate the main class I'm using to optionally take file contents passed to it by the test, and just ignore the url if they're present.

Grossly simplified, what I have now is page_ripper and I'm thinking of changing it to be like page_ripper2, but I'm pretty new to automated testing and wanted to get some input if there's a more elegant way to do what I'm proposing?

Python code:
import requests
from bs4 import BeautifulSoup

class page_ripper():
    def __init__(self, url: str) -> bool:
        response = requests.get(url)
        soup = BeautifulSoup(response.text)
        # Do some stuff


class page_ripper2():
    def __init__(self, url: str, html_data: bytes = None) -> bool:
        if html_data:
            soup = BeautifulSoup(html_data)
        else:
            response = requests.get(url)
            soup = BeautifulSoup(response.text)
        # Do some stuff
Secondly, this code takes some parameters and I'd like to limit what can be passed to only specific options. Right now I'm essentially doing something like below, but I feel like there's probably a much smarter way. Feels like something with enum or data classes might be able to handle this with simpler code, but so far everything I've tried hasn't seemed to handle it any better.

Python code:
class another_example():
    def __init__(self, url: str, period: str = "year") -> None:
        if period not in ["year", "month", "day"]:
            raise ValueError("Error, invalid period")

page_ripper2 is ugly; it's really 2 different classes smooshed together. It would be much cleaner to just have a class that uses BeautifulSoup to do stuff with "html_data". This is more concise and commits fewer cardinal sins. Class names should also not be snake_case, see PEP8. I also can almost guarantee that __init__ is not returning a bool, so your annotations are whack

Python code:
class PageRipper:
    def __init__(self, html_data: bytes):
        soup = BeautifulSoup(html_data)
        # Do some stuff

def some_function(url):
    response = requests.get(url)
    page_ripper = PageRipper(response.text)
    # Do something with the PageRipper instance
For your second question, that approach is fine. Flipping over to an enum could be better if you have a lot of these checks or if you need to iterate over the enumerated types for some reason. You can do value lookups with enums, and you'll raise a ValueError explicitly if the provided value does not match any of the values in the enum:

Python code:
class TimePeriod(Enum):
    year = "year"
    month = "month"
    day = "day"

class another_example():
    def __init__(self, url: str, period_name: str = "year"):
        period = TimePeriod(period_name)  # Exception may occur here

BeastOfExmoor
Aug 19, 2003

I will be gone, but not forever.

QuarkJets posted:

page_ripper2 is ugly; it's really 2 different classes smooshed together. It would be much cleaner to just have a class that uses BeautifulSoup to do stuff with "html_data". This is more concise and commits fewer cardinal sins. Class names should also not be snake_case, see PEP8. I also can almost guarantee that __init__ is not returning a bool, so your annotations are whack

Python code:
class PageRipper:
    def __init__(self, html_data: bytes):
        soup = BeautifulSoup(html_data)
        # Do some stuff

def some_function(url):
    response = requests.get(url)
    page_ripper = PageRipper(response.text)
    # Do something with the PageRipper instance
For your second question, that approach is fine. Flipping over to an enum could be better if you have a lot of these checks or if you need to iterate over the enumerated types for some reason. You can do value lookups with enums, and you'll raise a ValueError explicitly if the provided value does not match any of the values in the enum:

Python code:
class TimePeriod(Enum):
    year = "year"
    month = "month"
    day = "day"

class another_example():
    def __init__(self, url: str, period_name: str = "year"):
        period = TimePeriod(period_name)  # Exception may occur here

Yea, sorry I threw that code together really quick just to distill my idea down and missed some PEP8 stuff, etc.

I'm writing this as part of a library (mostly as practice, since it's unlikely anyone else will use it) and trying to make it as user-friendly as possible, so I wanted to avoid having the end user worry about several lines of code (or a nested function call) to populate the PageRipper object, but your logic makes sense. The second example is exactly what I needed. I couldn't quite figure out how to make the Enum stuff work for this case, but that does it.

QuarkJets
Sep 8, 2008

I understand the appeal of a one-size-fits-all design, but it's better to write classes and functions that do straightforward things. This makes your code self-documenting and easier to understand, and it will actually make your project easier to use.

Take your original implementation of page_ripper2: it's unclear that one parameter invalidates the other without actually reading the code (or perhaps a docstring), and a user may think that it's necessary to populate both parameters when really they just need one or the other. They may assume that passing a URL will cause a request to be made even if an html string is passed in, but that is not the case. There are a lot of other potential pitfalls with this kind of design. Even if you write really good docstrings to explain what's going on, it's better if the signature makes that obvious from the get-go

Most of what I'm talking about is explained in Clean Code by RC Martin, which is a really excellent primer on writing better code.

In your case, I think you should have a class that parses HTML text into attributes, and a function that accepts a URL and returns an instance of that class. Optionally, as a convenience, have a function that accepts HTML text data as input and returns an instance of that class as well. Using descriptive names and type annotations, anyone can then come along and immediately understand what these pieces are doing. This also makes it much easier to write good tests.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
a thing that might be nice there is for the url fetcher to be a classmethod:

Python code:
class PageRipper:
    def __init__(self, html_data: bytes) -> None:
        """Build a PageRipper. In most circumstances, rather than building
        this class directly, use from_url."""
        soup = BeautifulSoup(html_data)
        # ... do stuff ...

    @classmethod
    def from_url(cls, url: str) -> 'PageRipper':
        response = requests.get(url)
        return cls(response.text)
but yes in general you'll want to separate the concerns of 1) getting the content, 2) parsing the content, and probably 3) doing any processing based on the content, into separate, independent chunks of code that can be tested individually.

Wallet
Jun 19, 2006

BeastOfExmoor posted:

I'm working on a personal project that uses BeautifulSoup to parse a webpage, pulls out data of interest, etc. I'm trying to write pytest tests and realizing the way I've constructed the class is likely going to require me to repeatedly hammer the relevant webserver which is going to add a lot of time and also some amount of bandwidth for the server I'm hitting (non-profit research organization).

My thinking is that I should reformulate the main class I'm using to optionally take file contents passed to it by the test, and just ignore the url if they're present.

If you're going to do it from a file I'd try to avoid having to create/maintain a fake set of data for testing for a whole bunch of reasons and instead create a test fixture that caches a local version of all of the relevant content and a timestamp (could just be dumped into a json file or whatever) so you can smash it with all the tests you want and not worry about maintaining it in the future. That may already be what you're thinking, but if not it will save you some pain.

QuarkJets
Sep 8, 2008

Wallet posted:

If you're going to do it from a file I'd try to avoid having to create/maintain a fake set of data for testing for a whole bunch of reasons and instead create a test fixture that caches a local version of all of the relevant content and a timestamp (could just be dumped into a json file or whatever) so you can smash it with all the tests you want and not worry about maintaining it in the future. That may already be what you're thinking, but if not it will save you some pain.

I am not certain but I think pytest fixtures will even do the caching for you automatically if you set the scope appropriately (or maybe that's not even necessary, I don't remember)

BeastOfExmoor
Aug 19, 2003

I will be gone, but not forever.

QuarkJets posted:

I understand the appeal of a one-size-fits-all design, but it's better to write classes and functions that do straightforward things. This makes your code self-documenting and easier to understand, and it will actually make your project easier to use.

Take your original implementation of page_ripper2: it's unclear that one parameter invalidates the other without actually reading the code (or perhaps a docstring), and a user may think that it's necessary to populate both parameters when really they just need one or the other. They may assume that passing a URL will cause a request to be made even if an html string is passed in, but that is not the case. There are a lot of other potential pitfalls with this kind of design. Even if you write really good docstrings to explain what's going on, it's better if the signature makes that obvious from the get-go

Most of what I'm talking about is explained in Clean Code by RC Martin, which is a really excellent primer on writing better code.

In your case, I think you should have a class that parses HTML text into attributes, and a function that accepts a URL and returns an instance of that class. Optionally, as a convenience, have a function that accepts HTML text data as input and returns an instance of that class as well. Using descriptive names and type annotations, anyone can then come along and immediately understand what these pieces are doing. This also makes it much easier to write good tests.

Phobeste posted:

a thing that might be nice there is for the url fetcher to be a classmethod:

Python code:
class PageRipper:
    def __init__(self, html_data: bytes) -> None:
        """Build a PageRipper. In most circumstances, rather than building
        this class directly, use from_url."""
        soup = BeautifulSoup(html_data)
        # ... do stuff ...

    @classmethod
    def from_url(cls, url: str) -> 'PageRipper':
        response = requests.get(url)
        return cls(response.text)
but yes in general you'll want to separate the concerns of 1) getting the content, 2) parsing the content, and probably 3) doing any processing based on the content, into separate, independent chunks of code that can be tested individually.

Wallet posted:

If you're going to do it from a file I'd try to avoid having to create/maintain a fake set of data for testing for a whole bunch of reasons and instead create a test fixture that caches a local version of all of the relevant content and a timestamp (could just be dumped into a json file or whatever) so you can smash it with all the tests you want and not worry about maintaining it in the future. That may already be what you're thinking, but if not it will save you some pain.

QuarkJets posted:

I am not certain but I think pytest fixtures will even do the caching for you automatically if you set the scope appropriately (or maybe that's not even necessary, I don't remember)

Thank you all, this is extraordinarily helpful. Classmethods were completely off my radar, but that definitely seems like a much better option. I'll also have to look and see if I can figure out the caching thing. I started reading through Clean Code as well.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Wallet posted:

If you're going to do it from a file I'd try to avoid having to create/maintain a fake set of data for testing for a whole bunch of reasons and instead create a test fixture that caches a local version of all of the relevant content and a timestamp (could just be dumped into a json file or whatever) so you can smash it with all the tests you want and not worry about maintaining it in the future. That may already be what you're thinking, but if not it will save you some pain.

I have a related question to this. I’m working on a project that involves using the boto3 API to describe EC2 instance details and analyze the results. I have a bunch of fake dictionaries meant to structure the response and really don’t like it. I like the idea of actually querying the API and storing the response, and using the data for many of my tests.

How can I do this in a way that doesn't require me to query the API every time I run my unit tests? Would the fixture not run every time I run pytest? Or do I want it to run every time I run pytest, so that if AWS changes their API schema we can detect that on a PR? At what point does that become an integration test as opposed to a plain unit test? An additional downside is that I also need a valid instance_id to query, which is an additional dependency. I'm not entirely sure how to make the correct tradeoffs here.

12 rats tied together
Sep 7, 2006

The rule of thumb here is that you aren't writing the AWS API, so you shouldn't unit test it; it's fine to use a mock response. As part of upgrading boto3 versions you would check whether your stored mock responses are invalid, which will be fairly obvious because the new boto3 release notes will have a section on breaking changes (if any).

Python specifically has the moto package for mocking AWS services in this manner.

QuarkJets
Sep 8, 2008

If you are parsing AWS data in some way, write a json that contains some actual AWS data and load it with a fixture.
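A sketch of that pattern, assuming the parsing side is a plain function. instance_ids and the file path are made-up names; the dict shape matches a describe_instances response:

Python code:
```python
import json

def instance_ids(response: dict) -> list:
    """Pull instance ids out of a describe_instances-shaped response."""
    return [inst["InstanceId"]
            for reservation in response.get("Reservations", [])
            for inst in reservation.get("Instances", [])]

# In the test suite, a fixture just loads a canned response kept in
# source control:
#
# @pytest.fixture
# def ec2_response():
#     with open("tests/data/describe_instances.json") as f:
#         return json.load(f)
```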

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

QuarkJets posted:

If you are parsing AWS data in some way, write a json that contains some actual AWS data and load it with a fixture.

This is exactly what I ended up doing. The additional benefit is that we can store the json in source control as well.

I've yet to add it as a fixture, just putting it directly into the test function for now, but I'll fix that once I ask the following silly question.

I'm very new to testing, and have a basic organizational question. What are the principles behind organizing tests into classes and class methods?

for example, I have the following:

Python code:
class TestTags(unittest.TestCase):
    
    def test_check_tags_excluded(self):
        ...
        self.assertDictEqual(result, expected_result)

    def test_check_tags_not_excluded(self):
        ...
        self.assertDictEqual(result, expected_result)

    def test_check_tags_not_exist(self):
        ...
        self.assertDictEqual(result, expected_result)

class TestSecurityGroupFunctions(unittest.TestCase):
    def test_has_ssh_open(self):
        ...

class TestSecurityGroupFull(unittest.TestCase):
    @mock.patch('boto3.client')
    def test_check_security_groups(self, mock_boto_client):
        ...
I don't love the organization of this. It was intuitive to do a separate class for each discrete function in my code... but then TestSecurityGroupFull calls the function check_security_groups, which in turn calls some of the functions tested individually in TestSecurityGroupFunctions, rendering the entire thing inconsistent and thus bad.

Should I just put all my dozen or so tests into a single class? Strictly divide class by function? Not really sure why I should be making the decisions I'm making here.

QuarkJets
Sep 8, 2008

That's an old school way of organizing tests, I like pytest a lot more than unittest and I define most if not all of my tests as functions. The advantage of using a class is that you can use __init__ to get the same initialization steps for a bunch of tests, but fixtures do that for you too. You can also share state between tests, but I view that as a downside (it goes against the entire concept of a "unit" test)
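As a sketch, the TestTags class above might shrink to plain functions like this. make_tags is a made-up stand-in for whatever setUp-style setup the real tests share (a @pytest.fixture would do the same job):

Python code:
```python
# pytest collects any top-level test_* function; no TestCase class required,
# and bare asserts replace assertDictEqual and friends
def make_tags():
    # stand-in for shared setup
    return {"env": "prod", "team": "infra"}

def test_check_tags_excluded():
    assert "env" in make_tags()

def test_check_tags_not_exist():
    assert "owner" not in make_tags()
```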

DoctorTristan
Mar 11, 2006

I would look up into your lifeless eyes and wave, like this. Can you and your associates arrange that for me, Mr. Morden?
setUp() and tearDown() are the recommended ways to share (de-) initialisation stages between tests in unittest, but I agree with just using pytest - apart from anything else it requires less typing (also it can run unittest tests, so no need to rewrite existing tests)

Wallet
Jun 19, 2006

QuarkJets posted:

That's an old school way of organizing tests, I like pytest a lot more than unittest and I define most if not all of my tests as functions. The advantage of using a class is that you can use __init__ to get the same initialization steps for a bunch of tests, but fixtures do that for you too. You can also share state between tests, but I view that as a downside (it goes against the entire concept of a "unit" test)

This is what I do. It's cleaner that way, and you can easily add tags to your tests if you have enough of them that you will often only want to run a subset of them at once.

The Iron Rose posted:

This ended up being exactly what I ended up doing. The additional benefit is that we can store json in source control as well.

I've yet to add it as a fixture, just putting in directly into the test function for now, but I'll fix that once I ask the following silly question.

For this I basically just have a function that loads test data from the folder I store json test data in based on a file name, and then in the files with the actual tests for a given set of tests that needs to load some particular bit of data I add a scoped fixture that calls that load function (and then loads the data into the database or whatever I need it to do). Not sure how all that would work using unittest instead of pytest.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:
i mean I think at this point moving to pytest is highly recommended, simple to do, and brings lots of perks :P


Thanks for the advice folks!

QuarkJets
Sep 8, 2008

Hell yeah

And while we're on the topic of project upgrades, who here is still using setup.py? Cause you don't need that poo poo, and in fact setuptools discourages having one.
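For anyone migrating, a minimal pyproject.toml sketch that replaces a trivial setup.py (the name and version are placeholders):

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"
version = "0.1.0"
requires-python = ">=3.8"
dependencies = []
```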

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

The Iron Rose posted:

i mean I think at this point moving to pytest is highly recommended, simple to do, and brings lots of perks :P


Thanks for the advice folks!

this was/is not simple ahhh

I could use some further advice here I think, largely because I don't understand modules and packaging well enough yet, but whatever. I've now spent 3-4 hours trying to redo my tests and it's a little frustrating, so I am asking for more help!

Currently I have a directory structure like so:

code:
.
├── src
│   ├── evaluate_instance
│   │   └── evaluate_instance.py
│   └── stop_instance
│       └── stop_instance.py
├── tests
│   ├── test_evaluate_instance.py
│   └── test_stop_instances.py
└── terraform
    ├── a.tf
    └── b.tf
Assuming I want to keep all of my tests and all of my python code in a single repo, how do I properly import evaluate_instance.py into my test file? I have a few __init__.py files scattered around src and its subdirectories, but I don't really know what they're for, other than that they're an older way of doing packages.


also i know i should just be using the serverless application model at this point, which would solve this and other problems, but I learned about it a week into the project and don't want to rewrite all my infrastructure.


E: i hate rubber ducking. i think i figured this out, if I use python -m pytest instead of pytest it works fine, per https://doc.pytest.org/en/latest/explanation/goodpractices.html

I do have another follow up though...

Any advice on automating this? I'm using GitHub Actions for my CI/CD. Ideally I want to run test_evaluate_instance when I make changes to src/evaluate_instance, and test_stop_instance.py when I make changes to src/stop_instance.

I think I can use markers for this like so:
code:
@pytest.mark.mytest
def test_conf():
    assert True
but it seems weird and incorrect to do that for every single function in a file.
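If decorating every function feels wrong, pytest also lets you tag a whole module at once via a module-level `pytestmark` (sketch; the marker name is an example):

```python
import pytest

# Applies the marker to every test in this module, no per-function decorators.
pytestmark = pytest.mark.evaluate_instance

def test_conf():
    assert True

def test_other():
    assert 1 + 1 == 2
```

Then `pytest -m evaluate_instance` selects everything in the file.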

e2: I love rubber ducking, `python -m pytest -k "name_of_file"` is my friend.
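On the CI side, a rough sketch of what a per-directory GitHub Actions workflow could look like using the `paths` filter (file names, trigger, and action versions here are assumptions, not from this project):

```yaml
name: test-evaluate-instance
on:
  pull_request:
    paths:
      - "src/evaluate_instance/**"
      - "tests/test_evaluate_instance.py"
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install pytest
      - run: python -m pytest tests/test_evaluate_instance.py
```

A second workflow with the stop_instance paths covers the other lambda.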

The Iron Rose fucked around with this message at 22:38 on Jan 8, 2022

QuarkJets
Sep 8, 2008

__init__.py designates a directory as a package. It can be empty, but it can also be used to organize your API, e.g. if you want a function deep within your package tree to be importable from a shallower level, you do that by importing it in the __init__.py at whatever level you desire.
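A standalone sketch of that re-export trick (mypkg, deep, and useful are invented names; the temp-dir setup exists only so the snippet runs on its own):

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "mypkg")
sub = os.path.join(pkg, "deep")
os.makedirs(sub)

# mypkg/deep/helper.py defines the function two levels down
with open(os.path.join(sub, "helper.py"), "w") as f:
    f.write("def useful():\n    return 42\n")
with open(os.path.join(sub, "__init__.py"), "w") as f:
    f.write("")
# mypkg/__init__.py lifts it to the top, so callers write `from mypkg import useful`
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from mypkg.deep.helper import useful\n")

sys.path.insert(0, root)
mypkg = importlib.import_module("mypkg")
print(mypkg.useful())  # 42
```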

You may be at the point where you should figure out how to install your package; it's really nice being able to just execute 'pytest' from your root directory to run everything.

I don't have any familiarity with the github CI tools, but it's nice to have a pipeline that runs your tests whenever you create any pull request; that should be easy to set up. All of the tests, in your case, not just specific ones.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
On the topic of unit tests, are there any good books or sites on how to retrofit an existing codebase with unit tests, especially one that's reasonably large (around 8k SLoC) and involves a lot of external calls? I know that generally it's going to boil down to 'chunk it apart and start working on pieces', but it'd be nice if someone had already written something from the perspective I'm coming from.

Most of the stuff I've read leans heavily into test-driven development, which is a good paradigm, but I'm working on an existing product I didn't start. We have a backlog item to write some unit tests later, and I'm interested in doing that.
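The usual move for external calls is to introduce a "seam": route the call through one function or parameter, then have tests inject a fake. A hedged sketch (fetch_status/evaluate are illustrative names, not from any real codebase):

```python
import urllib.request

def fetch_status(url):
    """The real external call; tests never hit it."""
    with urllib.request.urlopen(url) as resp:
        return resp.status

def evaluate(url, fetch=fetch_status):
    # `fetch` is the seam: production uses the default, tests inject a fake
    return "up" if fetch(url) == 200 else "down"

def test_evaluate_up():
    assert evaluate("http://example.invalid", fetch=lambda url: 200) == "up"

def test_evaluate_down():
    assert evaluate("http://example.invalid", fetch=lambda url: 503) == "down"
```

unittest.mock.patch gets you the same effect without changing signatures, at the cost of tests knowing the module layout.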


qsvui
Aug 23, 2003
some crazy thing
Sounds like Working Effectively with Legacy Code by Michael Feathers might help. It doesn't use Python but I think the advice should be generally applicable.
