Macichne Leainig
Jul 26, 2012

by VG
Not sure if this is better suited as a Python or general programming question, but I'll ask here since I'm using Python and pytesseract specifically.

Anyone have experience with OCR? I think I'm just misunderstanding something about Tesseract, but I'm at my wit's end trying to get this to work here.

I have a bunch of images like this - pre-processed, binary black and white images of numbers of interest:



I need to extract these numbers. I know it's possible because if I drop this image into the web demo of Tesseract here, it picks it up fine:



This is my call to PyTesseract, and I do not get any usable results out of it. I've tried other PSM modes as well. I have Tesseract v5.0.0-alpha.20210811 installed locally. I figured PSM 5 should be ideal because it's described as "a single uniform block of vertically aligned text," which this is, is it not?

Python code:
pytesseract.image_to_string(Image.open(f"results/{basename}_ocr.jpg"), config='digits --psm 5')
I'm almost sure I have Tesseract configured incorrectly because as seen above, Tesseract can clearly handle these images - just not with any of the PSM modes I've tried (and I did try them all, just as a sanity test...).

mr_package
Jun 13, 2000
How much should I worry about setting things in os.environ? There's a warning in the docs about memory leaks, but it's unclear to me how much it matters in practice. If you dig into it, the warning is about assigning values of different lengths/sizes. Is this really a problem in real-world usage? https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/putenv.3.html

quote:

BUGS
Successive calls to setenv() or putenv() assigning a differently sized value to the same name will result in a memory leak. The FreeBSD semantics for these functions (namely, that the contents of value are copied and that old values remain accessible indefinitely) make this bug unavoidable. Future versions may eliminate one or both of these semantic guarantees in order to fix the bug.

I think my case, where I make a dictionary copy and then pass it to subprocess.run, is probably fine anyway (it should clean up when the subprocess finishes, yes?)

code:
e = os.environ.copy()
e["DEVELOPER_DIR"] = "path/to/xcode/dir"
subprocess.run([python, "build_macos.py"], env=e)
It's not clear to me whether there's a workaround, such as removing the value entirely: the Python docs indicate unsetenv() is called when deleting from os.environ, but nothing suggests that works around this memory leak. The explicit error condition is "successive calls to setenv() or putenv()", so it's sort of implied, but I don't know enough about BSD guts to say.

I'm thinking we're talking about a few bytes per day, so odds are it would take years to notice anyway?

edit: someone please tell me how to generate that awesome linted "python code" quoted text above. I've seen it a few times but it's not in the documented PHPBB codes is it?

mr_package fucked around with this message at 19:05 on Sep 13, 2021

Wallet
Jun 19, 2006

mr_package posted:

How much should I worry about setting things in os.environ?
I think something like this probably gets around the weirdness of copying the environment itself:
Python code:
subprocess.run([python, "build_macos.py"], env={**os.environ, "DEVELOPER_DIR": "path/to/xcode/dir"})

mr_package posted:

edit: someone please tell me how to generate that awesome linted "python code" quoted text above. I've seen it a few times but it's not in the documented PHPBB codes is it?

Instead of regular code tags, you use code=python to get the highlights; you can see it if you quote someone who's using it.

Wallet fucked around with this message at 19:19 on Sep 13, 2021

QuarkJets
Sep 8, 2008

mr_package posted:

How much should I worry about setting things in os.environ? There's a warning in the docs about memory leaks but it's unclear to me how much this can matter. The warning if you dig into it is about assigning values with different length/sizes. Is this really a problem in real world usage? https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/putenv.3.html

I think in my case where I am making a dictionary copy and then passing it to subprocess.run is probably fine anyway (it should clean up when subprocess finishes, yes?)

The memory leak only applies to successive calls to setenv() setting the same key with differently sized values; in other words, the old value persists even after the variable is deleted or reassigned. In practice this is an insane edge case with no meaningful impact: if you are repeatedly setting the env of a continuous process then you're doing something weird and there's almost certainly a better way. You are not doing that; each new process gets its env set once, and that env disappears when the process ends.
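To illustrate (with a hypothetical DEVELOPER_DIR value): the merged dict only ever exists in Python, the parent never calls setenv()/putenv(), and the child process still sees the variable.

```python
import os
import subprocess
import sys

# Build the child's environment without ever touching os.environ itself,
# so the parent process never calls setenv()/putenv() at all.
child_env = {**os.environ, "DEVELOPER_DIR": "path/to/xcode/dir"}  # hypothetical value

# The child sees the extra variable; the parent's environment is unchanged.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['DEVELOPER_DIR'])"],
    env=child_env,
    capture_output=True,
    text=True,
)
```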

QuarkJets
Sep 8, 2008

Loezi posted:

I need to associate Things (below: strings) with float ranges, s.t. some of the ranges go to either negative or positive infinity. The "trivial" implementation would probably be something like this:

Python code:
thresholds = {
    "low": (float('-inf'), 0),
    "medium": (0, 10),
    "high": (10, float('inf'))
}

def find(x: float) -> str:
    for key, (lower_bound, upper_bound) in thresholds.items():
        if lower_bound <= x < upper_bound: 
            return key
But I dislike find having to know stuff about what I feel like are the internals of thresholds. I could, instead, do this with lambdas:

Python code:
thresholds = {
    "low": lambda x: x < 0,
    "medium": lambda x: 0 <= x < 10,
    "high": lambda x: 10 <= x
}

def find(x: float) -> str:
    for key, check in thresholds.items():
        if check(x): 
            return key
but I'm not thrilled about that either, because lambdas seem like a great recipe to ensure that I get some hard-to-debug bug later on.

Naturally, I could add a custom class, perhaps along the lines of following:
Python code:
class Bounds:
    def __init__(self, lower_bound, upper_bound) -> None:
        self.lower_bound = lower_bound if lower_bound is not None else float('-inf')
        self.upper_bound = upper_bound if upper_bound is not None else float('inf')

    def __contains__(self, value: float) -> bool:
        return self.lower_bound <= value < self.upper_bound

thresholds = {
    "low": Bounds(None, 0),
    "medium": Bounds(0, 10),
    "high": Bounds(10, None)
}

def find(x: float) -> str:
    for key, bounds in thresholds.items():
        if x in bounds: 
            return key
This feels like I'm making a custom implementation of range which I dislike immensely.

Is there some part of the stdlib that I'm missing here, that would make this pattern nice and concise without going all lambda?

This feels like something you should just use a numpy array for, using dtype='object' (assuming you're dealing with actual objects and not just strings):

code:
import numpy as np

x = np.array(['medium' for _ in somearray], dtype='object')
x[somearray < 0] = "low"
x[somearray >= 10] = "high"

Loezi
Dec 18, 2012

Never buy the cheap stuff

cinci zoo sniper posted:

PEP-636 if you’re on 3.10, otherwise I would subclass dictionary to implement range checking inside dictionary key.

Thanks, PEP636 seems worth keeping in mind. Playing around a bit with a dictionary subclass, this seems like a neat approach, allowing me to write code like this:

Python code:
class BoundMapping(dict):

    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        for (lb, up), value in self.items():
            if lb <= key < up:
                return value

    def __setitem__(self, key, val):
        key = (
            key[0] if key[0] is not None else float('-inf'),
            key[1] if key[1] is not None else float('inf')
        )
        dict.__setitem__(self, key, val)

    def update(self, *args, **kwargs):
        for k, v in dict(*args, **kwargs).items():
            self[k] = v


bound_mapping = BoundMapping({
    (None, 0): 'low',
    (0, 10): 'medium',
    (10, None): 'high',
})

def get_label(value):
    return bound_mapping[value]
The dict subclass is, naturally, quite fugly, but I like the fact that this hides all the nastiness in that one class def while the rest of the code is super clean :cheers:


QuarkJets posted:

This feels like something you should just use a numpy array for, using dtype='object' assuming you're dealing with actual objects and not just strings

code:
import numpy as np

x = np.array(['medium' for _ in somearray], dtype='object')
x[somearray < 0] = "low"
x[somearray >= 10] = "high"

Not too hot on adding a dependency to numpy just to do something as simple as this.

Loezi fucked around with this message at 10:44 on Sep 14, 2021

Hed
Mar 31, 2004

Fun Shoe
Wow this caused me to look at 3.10 and the structural pattern matching looks neat.

Coming from doing a bit of Rust lately, the lack of an always-required catch-all makes me nervous, but it could lead to some “interesting” uses.

QuarkJets
Sep 8, 2008

Loezi posted:


Not too hot on adding a dependency to numpy just to do something as simple as this.

If that's the case then I would use map:

Python code:

def label_mapper(x):
    if x < 0:
        return "low" 
    if x < 10:
        return "medium" 
    return "high" 

mapped = map(label_mapper, some_other_iterable) 
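Incidentally, the stdlib piece Loezi was asking about might be bisect: keep the finite boundaries in a sorted list and index a label list with the insertion point. A minimal sketch, using the same thresholds as above:

```python
from bisect import bisect_right

# Sorted interior boundaries and one label per interval:
# (-inf, 0) -> "low", [0, 10) -> "medium", [10, inf) -> "high"
boundaries = [0, 10]
labels = ["low", "medium", "high"]

def find(x: float) -> str:
    # bisect_right counts how many boundaries are <= x,
    # which is exactly the index of x's interval.
    return labels[bisect_right(boundaries, x)]
```

This keeps the open-ended ranges implicit (no float('inf') anywhere) and the lookup is O(log n) in the number of boundaries.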

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
Hopefully this makes sense as I'm writing it, like it does in my head.

I have a QWidget window with two QLineEdit boxes and four QRadioButtons, grouped into two QButtonGroups of two each. I want to verify, before enabling the "Finish" button, that both QLineEdits are populated and both QButtonGroups have a checkedId() less than -1.

Where I'm running into an issue is finding some sort of editingFinished or focusOut type event for a QButtonGroup. I need self.check_complete() to run on that event for the QButtonGroups.

Python code:
    # Focus-out checks for completeness.
    self.ui.line_width.editingFinished.connect(self.check_complete)
    self.ui.line_projection.editingFinished.connect(self.check_complete)
...
def check_complete(self):
    if (self.ui.grp_fasciasides.checkedId() < -1 and
            self.ui.grp_covers.checkedId() < -1 and
            self.ui.line_width.text() != "" and
            self.ui.line_projection.text() != ""):
        self.ui.btn_finish.setEnabled(True)
    else:
        self.ui.btn_finish.setEnabled(False)
EDIT: Never mind, clicked did the trick.

D34THROW fucked around with this message at 19:48 on Sep 14, 2021

HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?

Loezi posted:

I need to associate Things (below: strings) with float ranges, s.t. some of the ranges go to either negative or positive infinity.

Is there some part of the stdlib that I'm missing here, that would make this pattern nice and concise without going all lambda?

I feel like you're overthinking this and the first solution is completely fine.

Mycroft Holmes
Mar 26, 2010

by Azathoth
I'm attempting to program adding and removing items from an array for a class assignment. I'm getting an error when I attempt to grow the array. My array is [3,77,2,1,0] and I am attempting to add 88 at position 2. My insert code is this:
code:
   def insert(self, index, newItem):
        """Inserts item at index in the array."""
        # grow if array is full
        if index > self.logicalSize:
            self.grow()
            for i in range(self.size(), index, -1):
                self.items[i] = self.items[i-1]
            self.items[index] = newItem
            self.logicalSize += 1
        else:
            for i in range(self.size(), index, -1):
                self.items[i] = self.items[i-1]
            self.items[index] = newItem
            self.logicalSize += 1
What am I doing wrong? When I modify the code to this:
code:
  def insert(self, index, newItem):
        """Inserts item at index in the array."""
        # grow if array is full
        if index >= self.logicalSize:
            self.grow()
            for i in range(self.size(), index, -1):
                self.items[i] = self.items[i-1]
            self.items[index] = newItem
            self.logicalSize += 1
        else:
            for i in range(self.size(), index, -1):
                self.items[i] = self.items[i-1]
            self.items[index] = newItem
            self.logicalSize += 1
It grows before it is supposed to.

Mycroft Holmes fucked around with this message at 23:54 on Sep 14, 2021

QuarkJets
Sep 8, 2008

Mycroft Holmes posted:

I'm attempting to program adding and removing items from an array for class. I'm throwing up an error when I attempt to grow the array. My array is [3,77,2,1,0] and I am attempting to add 88 at position 2.

What am I doing wrong?

You should post the specific exception, along with the traceback lines that Python prints when the exception is raised.

But look at this:

Python code:
for i in range(self.size(), index, -1):
    self.items[i] = self.items[i-1]

Presuming that size() is the length of your list (or whatever it is), you are immediately going out of bounds.

QuarkJets
Sep 8, 2008

Also, a lot of your code is repeated; you could do this:

code:
def insert(self, index, newItem):
    """Inserts item at index in the array."""
    # grow if array is full
    if index > self.logicalSize:
        self.grow()

    for i in range(self.size(), index, -1):
        self.items[i] = self.items[i-1]
    self.items[index] = newItem
    self.logicalSize += 1

Same problem though: e.g. if the size is 5, then index 5 is out of bounds.
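For comparison, here's a standalone toy sketch (not the actual Array class from the assignment) that grows only when physically full and shifts right starting from the last occupied slot, so the highest write is items[logical_size], which stays in bounds after growing:

```python
class SimpleArray:
    """Toy fixed-capacity array to illustrate insert without going out of bounds."""

    def __init__(self, capacity=5):
        self.items = [None] * capacity
        self.logical_size = 0

    def grow(self):
        # Double the physical capacity.
        self.items.extend([None] * len(self.items))

    def insert(self, index, new_item):
        # Grow only when the array is physically full.
        if self.logical_size == len(self.items):
            self.grow()
        # Shift right from the last occupied slot down to index;
        # the first write targets items[logical_size], which is in bounds.
        for i in range(self.logical_size, index, -1):
            self.items[i] = self.items[i - 1]
        self.items[index] = new_item
        self.logical_size += 1
```

With [3, 77, 2, 1, 0] filling the capacity of 5, inserting 88 at position 2 triggers a grow and yields [3, 77, 88, 2, 1, 0, ...].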

Da Mott Man
Aug 3, 2012


Unless I'm missing something, the easy way to do what you want is this; list manipulation functions are built in.

Python code:
class Thing:
    def __init__(self):
        self.items = [3,77,2,1,0]

    def insert(self, index, new_item):
        self.items.insert(index, new_item)

    def remove_by_index(self, index):
        self.items.pop(index)

    def remove_by_value(self, value):
        self.items.remove(value)

thing = Thing()  # avoid shadowing the builtin name "object"

thing.insert(2, 88)

print(thing.items)

thing.remove_by_index(2)

print(thing.items)

thing.remove_by_value(3)

print(thing.items)

Da Mott Man fucked around with this message at 04:28 on Sep 15, 2021

Loezi
Dec 18, 2012

Never buy the cheap stuff

HappyHippo posted:

I feel like you're overthinking this and the first solution is completely fine.

The toy example I've been using is naturally just that, a toy example. There's definitely value in hiding most of the logic re: processing the upper and lower bounds to a separate class in the actual thing I'm doing, rather than replicating that same logic in a billion places.

That being said, it might very well turn out that I was overthinking this in the long run. For now, I'm rather happy with the dictionary-based implementation for a "thing that represents potentially unbounded number ranges that I can query for membership, each range associated with a label".

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
Okay, now I have a real question.

I have a main menu wherein the user can select one of the calculators, or exit the program. The feature QPushButtons are in a QButtonGroup.

I have a constant list declared at the top of my guiQt module, ENABLED_FEATURES, which contains boolean values to control, in the MainMenu class, whether or not each button is enabled.

Python code:
ENABLED_FEATURES = [True, False, True, False]
...
class MainMenu(QtWidgets.QDialog):
    def __init__(self):
    ...
    # Set enabled buttons.
    self.enable_by_feature()

    def enable_by_feature(self):
        # Set up the id for each button.
        self.ui.grp_features.setId(self.ui.button_panroof, 0)
        self.ui.grp_features.setId(self.ui.button_comproof, 1)
        self.ui.grp_features.setId(self.ui.button_stormprotsf, 2)
        self.ui.grp_features.setId(self.ui.button_stormpanel, 3)
        log_debug(f"")
        for button in self.ui.grp_features.findChildren(
            PyQt5.QtWidgets.QPushButton):
            button.setEnabled(ENABLED_FEATURES[button.id()])
This is the code as it stands now - the intent is to set the ID of each QPushButton in the QButtonGroup, then loop through the QButtonGroup by id and perform setEnabled() based on the id's position in the ENABLED_FEATURES list. However, it is not doing...jack poo poo. And the internet is little help.

Thinking about it a second time, the easier solution would be to make ENABLED_FEATURES a dict with button names as the keys and bools as the values. The goal is so that I can just add it to the dict at the top of the code rather than hard-coding it into enable_by_feature(), further down the class - probably with a default value of True if the button can't be found in the list.

Wallet
Jun 19, 2006

D34THROW posted:

Thinking about it a second time, the easier solution would be to make ENABLED_FEATURES a dict with button names as the keys and bools as the values. The goal is so that I can just add it to the dict at the top of the code rather than hard-coding it into enable_by_feature(), further down the class - probably with a default value of True if the button can't be found in the list.

I can't speak to Qt much/at all but are you sure it's redrawing the buttons after you're disabling them?

Also, yeah, doing it by the index in a list of booleans without context is basically the same as just hard-coding the config values anyway. configparser is pretty quick to set up for this kind of thing to get them out of code entirely.
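To sketch the configparser route (the file name and keys here are made up), the feature flags would live in an ini file, with unknown buttons defaulting to enabled:

```python
import configparser

# Hypothetical features.ini:
#
# [features]
# button_panroof = yes
# button_comproof = no
# button_stormprotsf = yes
# button_stormpanel = no

config = configparser.ConfigParser()
config.read("features.ini")  # silently skipped if the file doesn't exist

def is_enabled(button_name: str) -> bool:
    # getboolean understands yes/no, true/false, 1/0; the fallback
    # keeps a button enabled when it isn't listed in the file.
    return config.getboolean("features", button_name, fallback=True)
```

The loop body in enable_by_feature() could then become button.setEnabled(is_enabled(button.objectName())), with no hard-coded list in the code at all.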

AfricanBootyShine
Jan 9, 2006

Snake wins.

I have what I think is a really simple question with NumPy. I finally have a job where I can actually use it for work, but it's been years since I did any real Python work, so I'm a bit lost.

I have a csv that contains readings for a bunch of samples at different wavelengths. I've pasted an example portion of it below. Normally it'll go all the way down to 300 nm. But I've trimmed it for everyone's sanity.

code:
Baseline 100%T,,SampleOx,,SampleOx1,,SampleOx2,,SampleRed,,SampleRed1,,SampleRed2,
Wavelength (nm),Abs,Wavelength (nm),Abs,Wavelength (nm),Abs,Wavelength (nm),Abs,Wavelength (nm),Abs,Wavelength (nm),Abs,Wavelength (nm),Abs
700,2.521076918,700,0.051371451,700,0.020255247,700,-0.000277047,700,-0.013994155,700,-0.040811472,700,-0.046730809
699,2.515768766,699,0.056336451,699,0.021696234,699,0.002584572,699,-0.014951141,699,-0.038384374,699,-0.042782523
698,2.51525569,698,0.054913107,698,0.020626975,698,0.005365098,698,-0.013208756,698,-0.039243225,698,-0.044276398
697,2.517320871,697,0.051321168,697,0.018043108,697,-0.001523819,697,-0.01844346,697,-0.039591964,697,-0.044799961
696,2.516803503,696,0.048457876,696,0.016133199,696,-0.003205611,696,-0.019673269,696,-0.042768963,696,-0.048874158
The first row has the sample names, which are spaced out because each sample contributes two columns: a wavelength and an absorbance reading. I want to make a 3D array so that I can easily pick the absorbance at 400 nm for SampleRed. What's the easiest way to feed this into a 3D array, but still retain info like the sample names and the wavelengths?

I want to build this to be extensible, as I will be taking readings using this system for the next few years.

QuarkJets
Sep 8, 2008

That csv layout is janky as gently caress and I suspect that you will need to write something custom to deal with it. It feels like you want a pandas multiindex dataframe for this but I don't think that the pandas csv reader will be able to easily figure out the layout on its own

E: although first thing you should do is try the pandas csv reader and see what it does
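For instance, a dependency-free reshaping pass (a sketch that assumes the layout shown above: a sample name over every pair of Wavelength/Abs columns) gets you a nested dict you can index by sample and wavelength:

```python
import csv
import io

def parse_export(text: str) -> dict:
    """Reshape the instrument export into {sample: {wavelength: absorbance}}."""
    rows = list(csv.reader(io.StringIO(text)))
    # Row 0 holds sample names in every other column; row 1 is the
    # repeated "Wavelength (nm),Abs" header, which we skip.
    samples = [name for name in rows[0] if name]
    readings = {name: {} for name in samples}
    for row in rows[2:]:
        for i, name in enumerate(samples):
            wavelength = float(row[2 * i])
            readings[name][wavelength] = float(row[2 * i + 1])
    return readings
```

From there, readings['SampleRed'][400.0] is a single lookup, and a pandas DataFrame can be built straight from the dict if you want tables.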

QuarkJets fucked around with this message at 19:48 on Sep 17, 2021

CarForumPoster
Jun 26, 2013

⚡POWER⚡

QuarkJets posted:

That csv layout is janky as gently caress and I suspect that you will need to write something custom to deal with it. It feels like you want a pandas multiindex dataframe for this but I don't think that the pandas csv reader will be able to easily figure out the layout on its own

E: although first thing you should do is try the pandas csv reader and see what it does

This looks pretty easy, though I'm not sure I fully understand the file layout. My first instinct is: ignore the first two lines of the file with pandas read_csv() and supply it with column headings. It looks like it's one row, one data set.

e.g. columns: [Baseline 100%T, SampleOx_Wavelength (nm), SampleOx_Abs, and so on]

QuarkJets
Sep 8, 2008

Yeah, it's easy enough to create a 2D table with labels; I just don't think there's an obvious way to make one that's 3D without some twiddling.

DoctorTristan
Mar 11, 2006

I would look up into your lifeless eyes and wave, like this. Can you and your associates arrange that for me, Mr. Morden?

AfricanBootyShine posted:

I have a csv that contains readings for a bunch of samples at different wavelengths. I want to make a 3D array so that I can easily pick the absorbance at 400 nm for SampleRed. What's the easiest way to feed this into a 3D array, but still retain info like the sample names and the wavelengths?

I want to build this to be extensible, as I will be taking readings using this system for the next few years.

As posters above have commented, this is easy enough to turn into a pandas DataFrame via pandas.read_csv(). That is almost certainly what you actually want to do: retrieving the correct data from a NumPy 3D array is going to be a lot more awkward.

Your data structure does look quite odd, though. Is there any reason you've arranged things as

code:
|    Baseline 100%T     |       SampleOx        |      SampleOx1        |   ...
| Wavelength (nm) | Abs | Wavelength (nm) | Abs | Wavelength (nm) | Abs |   ...
|       700       |     |       700       |     |       700       |     |   ...
|       699       |     |       699       |     |       699       |     |   ...
|       698       |     |       698       |     |       698       |     |   ...

All of the wavelength entries in each row seem identical, so wouldn't it make much more sense (and be much easier to work with) to arrange the data like this:

code:
| Wavelength (nm) | Baseline 100%T Abs | SampleOx Abs | SampleOx1 Abs | ...
|       700       |                    |              |               | ...
|       699       |                    |              |               | ...
|       698       |                    |              |               | ...
?

Biffmotron
Jan 12, 2007

I'm going to third that this is a really weird data format. Unless there's a very good reason to do otherwise, data should be tidy: every row is a unique observation and every column is a measurement of the same type across observations. If your data is tidy, analysis and plotting become much easier. If not, you're fighting the data at almost every step.

Original Data with slightly renamed columns
code:
	WL	Abs	WL.1	Abs.1	WL.2	Abs.2	WL.3	Abs.3	WL.4	Abs.4	WL.5	Abs.5	WL.6	Abs.6
0	700	2.52107691	700	0.051371451	700	0.020255247	700	-0.000277047	700	-0.013994155	700	-0.040811472	700	-0.046730809
1	699	2.51576876	699	0.056336450        699	0.021696234	699	0.002584572	699	-0.014951141	699	-0.038384374	699	-0.042782522
2	698	2.51525569	698	0.054913107	698	0.020626975	698	0.005365098	698	-0.013208756	698	-0.039243225	698	-0.044276398
3	697	2.51732087	697	0.051321168	697	0.018043108	697	-0.001523819	697	-0.018443460	697	-0.039591964	697	-0.044799961
4	696	2.51680350	696	0.048457876	696	0.016133198	696	-0.003205610	696	-0.019673269	696	-0.042768963	696	-0.048874158
Same data, but tidy
code:
	Sample		700_nm			699_nm			698_nm			697_nm			696_nm
0	Baseline 	2.521076918	2.515768766	2.51525569		2.517320871	2.516803503
1	SampleOx	0.051371451	0.056336450	0.054913107	0.051321168	0.048457876
2	SampleOx1	0.020255247	0.021696234	0.020626975	0.018043108	0.016133198999999997
3	SampleOx2	-0.000277047	0.002584572	0.005365098	-0.001523819	-0.0032056109999999997
4	SampleRed	-0.013994155	-0.014951141	-0.013208756	-0.01844346		-0.019673269
5	SampleRed1	-0.040811472	-0.038384374	-0.039243225	-0.039591964	-0.042768963
6	SampleRed2	-0.046730809	-0.042782522	-0.044276398	-0.044799961	-0.048874158
There are multiple ways to do this, but the one that I like is reading in the data starting with the second row as dfx, and then constructing a new dataframe from the columns. The dataframe is built from a dictionary with keys that become the column names, and then a list of cell entries as the values. The list of cell entries is created by selecting the Abs.* columns in each row from the raw data. This can be turned into a function that takes the names of the samples and the corresponding Abs.* columns as arguments.

Python code:
dfx = pd.read_csv('pth/your_file.csv', header=1)

dfy = pd.DataFrame({'Sample':['Baseline 100%T','SampleOx','SampleOx1','SampleOx2','SampleRed','SampleRed1','SampleRed2',],
					'700_nm':dfx[['Abs', 'Abs.1', 'Abs.2', 'Abs.3', 'Abs.4', 'Abs.5', 'Abs.6']].loc[0],
					'699_nm':dfx[['Abs', 'Abs.1', 'Abs.2', 'Abs.3', 'Abs.4', 'Abs.5', 'Abs.6']].loc[1],
					'698_nm':dfx[['Abs', 'Abs.1', 'Abs.2', 'Abs.3', 'Abs.4', 'Abs.5', 'Abs.6']].loc[2],
					'697_nm':dfx[['Abs', 'Abs.1', 'Abs.2', 'Abs.3', 'Abs.4', 'Abs.5', 'Abs.6']].loc[3],
					'696_nm':dfx[['Abs', 'Abs.1', 'Abs.2', 'Abs.3', 'Abs.4', 'Abs.5', 'Abs.6']].loc[4]}).reset_index(drop=True)

AfricanBootyShine
Jan 9, 2006

Snake wins.

Thanks for all the help. Looks like I need to sit down with pandas for a few hours.

I agree that the format of the data is incredibly goofy. It's what the instrument spits out when data is exported, so tidying the dataset is something I'd like to write a script to automate.

The initial analysis is dead simple; I can do it in Excel in ten minutes. But I also need to do some deconvolution on the spectra, which requires some real tools.

SurgicalOntologist
Jun 17, 2004

It sort of looks like a pandas MultiIndex as columns, except the labels aren't repeated. I would suggest "manually" constructing a MultiIndex for the column axis. Then you can stack or similar to tidy the dataset.

If you really want to try the 3D thing, the library you want is xarray. But I don't think it would actually help with reading the data, just manipulating it, depending on what you need to do. And it's probably overkill in this case; it really shines with data on a grid (e.g. volumetric).
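
Rough sketch of the MultiIndex idea (sample names are assumptions, and this is untested against the real file):

```python
import pandas as pd

# Stand-in for the raw export: repeated (WL, Abs) column pairs.
raw = pd.DataFrame(
    [[700, 2.52, 700, 0.051],
     [699, 2.51, 699, 0.056]],
)
samples = ["Baseline", "SampleOx"]  # hypothetical sample names

# "Manually" label the columns with a (sample, field) MultiIndex.
raw.columns = pd.MultiIndex.from_product([samples, ["WL", "Abs"]])

# Keep only the Abs columns; use the first sample's wavelengths as the index.
tidy = raw.xs("Abs", axis=1, level=1)
tidy.index = raw[(samples[0], "WL")].rename("Wavelength (nm)")
```

From there, `stack` on the sample level would give one long tidy frame instead of one column per sample.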

Dawncloack
Nov 26, 2007
ECKS DEE!
Nap Ghost
I have a question about python, but it's more of a strategic question than a specific doubt.

I'm slowly working towards the point where I can jump into computer touching if (when) my current career becomes a gently caress. I am following a little study plan gently provided by a goon, which recommended I learn Python as a scripting language.

I got a book and started working through it, and I am handy with the basics, to the point where I made myself a script that backs up my files in a specific manner (tons of directory management and such, essentially a reimplementation of rsync). Now I'm working on parallelism.

In your opinion, what areas of python are important to learn, resume-wise? At what point can I say I know python without it being a massive lie?
Or perhaps that is the wrong question and I'm just going to be dropped in an unknown area and have to google stuff, so what areas are useful? I imagine parallelism and networking?

This, I bet, is a highly subjective question but I'd appreciate opinions.

Thanks in advance.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Dawncloack posted:

I got a book and started working through it, and I am handy with the basics, to the point where I made myself a script that backs up my files in a specific manner (tons of directory management and such, essentially a reimplementation of rsync). Now I'm working on parallelism.

In your opinion, what areas of python are important to learn, resume-wise? At what point can I say I know python without it being a massive lie?
Or perhaps that is the wrong question and I'm just going to be dropped in an unknown area and have to google stuff, so what areas are useful? I imagine parallelism and networking?

This, I bet, is a highly subjective question but I'd appreciate opinions.

Thanks in advance.

Writing Python in a list of skills won’t provide any credibility that you can do work as a programmer. It’s not really the right question or method to build a resume.

You need to show that you can complete work and projects. Pick a thing you wish existed and build a project, deploy it and put the code on GitHub. Include your GitHub on your resume.

Some suggestions:
-Pick an open source Python package you like and ask the devs what the process to make changes and pull requests is. See if you can contribute a PR. This is something that can go on a resume.
-Pick a topic you like and deploy a flask/django/dash web app about that thing. Put the code on GitHub with a working demo. One which makes some API calls and does some business process. Also popular: Build an ML/AI project in a Jupyter notebook. Then, deploy the model you built with a web app.


Have working demos, it will put you above the many, many, many other just starting out Python coders without engineering degrees.

Dawncloack
Nov 26, 2007
ECKS DEE!
Nap Ghost
I had no idea! Thanks a bunch!

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Dawncloack posted:

I had no idea! Thanks a bunch!

You're welcome. Feel free to stop by the resume thread in BFC or the YOSPOS interviewing thread for advice from a broader audience. Several people, including myself, who hire computer touchers regularly post there.

Here is the breakdown of me hiring an entry-level python person late 2020 at a startup to give you an idea of competition. Going rate is $20-30/hr depending on job and just how entry level they are.

CarForumPoster posted:

Yep, single position in 3 weeks. Here was the breakdown from my 91 applicants for an entry-level python job.

I was very curious what the breakdown was, since the advice I am leaning toward giving strawberrymoose is that he'd better have some good projects to show off, because he's gonna have a tough go at coding jobs.

~40% were Indian developers with hilariously bullshit resumes that didn't provide a GitHub, despite me saying it was the one hard qualification requirement in bold and including it as a required question to apply.

Of the remaining ~60% I went through and wrote their degrees. I may have missed a couple. I separated degrees into the tier I view them in for entry-level developers. Bold are the ones I interviewed or I at least wrote some favorable notes on. Ugrad means they're currently in undergrad, otherwise they'd graduated. The places in parenthesis are where their undergrad was from.





P.S. An open source package that has very little support and a creator who actively wants help is moviepy. It's useful, but it has been stalled on a major version update for a long time. It has 5k+ stars, so it's decently popular.

CarForumPoster fucked around with this message at 17:58 on Sep 18, 2021

SirPablo
May 1, 2004

Pillbug
Any recommendations of a pythonic way to make a density map of hundreds of small shapefiles on to a lat/lon grid? I found one option, geocube, but the code below just returns a single coverage and not a density.

code:
import geopandas as gpd
from geocube.api.core import make_geocube

wwas = gpd.read_file(f'https://mesonet.agron.iastate.edu/pickup/wwa/2021_tsmf_sbw.zip')
wwas = wwas[(wwas.PHENOM=='FF')&(wwas.STATUS=='NEW')&(wwas.SIG=='W')]
wwas['Z'] = 1

C = make_geocube(
    vector_data=wwas,
    measurements=['Z'],
    resolution=(-0.01, 0.01))

Biffmotron
Jan 12, 2007

I last did this three years ago, but the package I found most helpful was geopandas, which you're already using, plus Bokeh for the plotting. Bokeh is nice because it makes it easy to add geographic tiles, so you get streets and other features below your data. Bokeh is kinda weird, but there are plenty of tutorials floating around for pretty similar cases.

SirPablo
May 1, 2004

Pillbug

Biffmotron posted:

I last did this three years ago, but the package I found most helpful was geopandas, which you're already using, plus Bokeh for the plotting. Bokeh is nice because it makes it easy to add geographic tiles, so you get streets and other features below your data. Bokeh is kinda weird, but there are plenty of tutorials floating around for pretty similar cases.

Not sure that is quite what I'm aiming at. Here's an example of one polygon that is rasterized at 0.01°×0.01°. I'd like to do this for hundreds of similar polygons, but the step I'm scratching my head on is counting them up grid by grid on a much larger domain, thus giving me a polygon density.

Only registered members can see post attachments!

SirPablo
May 1, 2004

Pillbug
Now with a white background. edit damnit lol

Only registered members can see post attachments!

accipter
Sep 12, 2003

SirPablo posted:

Not sure that is quite what I'm aiming at. Here's an example of one polygon that is rasterized at 0.01°×0.01°. I'd like to do this for hundreds of similar polygons, but the step I'm scratching my head on is counting them up grid by grid on a much larger domain, thus giving me a polygon density.



Make a raster of the entire area with a value of zero. Loop over each polygon, and increment all points in it by one. You should be able to do this with rasterio.
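
A dependency-free sketch of that loop — for real shapefiles `rasterio.features.rasterize` would replace the hypothetical ray-casting helper below, which only handles simple polygons:

```python
# Assumed helper: ray-casting point-in-polygon test (simple polygons only).
def point_in_polygon(x, y, poly):
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Toggle on each edge the horizontal ray from (x, y) crosses.
        if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

def polygon_density(polygons, lons, lats):
    # Zero raster over the grid, then +1 for every polygon covering a cell.
    counts = [[0] * len(lons) for _ in lats]
    for poly in polygons:
        for j, lat in enumerate(lats):
            for i, lon in enumerate(lons):
                if point_in_polygon(lon, lat, poly):
                    counts[j][i] += 1
    return counts

# Two overlapping squares counted on a coarse 3x3 grid of cell centers:
polys = [[(0, 0), (2, 0), (2, 2), (0, 2)], [(1, 1), (3, 1), (3, 3), (1, 3)]]
counts = polygon_density(polys, [0.5, 1.5, 2.5], [0.5, 1.5, 2.5])
```

The overlap cell ends up with a count of 2, the non-overlapping covered cells with 1, and everything else 0, which is exactly the density surface you're after.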

Bad Munki
Nov 4, 2008

We're all mad here.


You could also do it through something like qgis pretty readily. That’s what we do, composing hundreds of thousands up to millions of shapes, via a Python script that makes a few basic qgis calls.

SirPablo
May 1, 2004

Pillbug
Here's what I ended up doing.

code:
# Imports (added for completeness)
import numpy as np
import geopandas as gpd
from geocube.api.core import make_geocube

# Download data
wwas = gpd.read_file('https://mesonet.agron.iastate.edu/pickup/wwa/2021_tsmf_sbw.zip')
wwas['Z'] = 1

# Make array to add counts
lons, lats = np.meshgrid(np.arange(-120, -70, 0.1), np.arange(25, 60, 0.1))
shape = lons.shape
lons = lons.ravel()
lats = lats.ravel()
counts = lons * 0

# Loop through each FFW
for w in range(wwas.shape[0]):

  # Rasterize (pass a one-row GeoDataFrame, not a list of rows)
  C = make_geocube(
      vector_data=wwas.iloc[[w]],
      measurements=['Z'],
      resolution=(-0.01, 0.01))

  # Get geometry and values
  xs, ys = np.meshgrid(C.x.values, C.y.values)
  xs, ys = xs.flatten().round(1), ys.flatten().round(1)
  # nan= must be a keyword: the second positional arg of nan_to_num is copy
  Zs = np.nan_to_num(C.Z.values.flatten(), nan=0)

  # Add to density array
  before = counts.copy()
  for i in zip(xs, ys, Zs):
    iz = np.argmin((lons - i[0])**2 + (lats - i[1])**2)
    counts[iz] += i[2]

  # Clip to make sure we aren't adding to too many grids
  after = counts.copy()
  delta = np.clip(after - before, 0, 1)
  counts = before + delta
  del before, delta

Ranzear
Jul 25, 2013

Just found out dictionaries are now ordered in 3.7+, but they didn't add any native way to sort by key. That's the most pythonian thing ever.

QuarkJets
Sep 8, 2008

Ranzear posted:

Just found out dictionaries are now ordered in 3.7+, but they didn't add any native way to sort by key. That's the most pythonian thing ever.

I thought sorted already did that


nullfunction
Jan 24, 2005

Nap Ghost

QuarkJets posted:

I thought sorted already did that

sorted() gives you the sorted keys, not key-value pairs.

You can do something like this but I'm not aware of a native method on dict that would do this for you.

Python code:
>>> d = {"a": 1, "c": 3, "b": 2}
>>> s = {k: d[k] for k in sorted(d)}
>>> s
{'a': 1, 'b': 2, 'c': 3}
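
One near-native spelling, leaning on that 3.7+ insertion-order guarantee, is rebuilding the dict straight from its sorted items:

```python
d = {"a": 1, "c": 3, "b": 2}

# dicts preserve insertion order in 3.7+, so rebuilding from sorted
# items yields a key-sorted dict without reaching for OrderedDict.
s = dict(sorted(d.items()))
print(s)  # {'a': 1, 'b': 2, 'c': 3}

# Sorting by value is the same idea with a key function:
by_value = dict(sorted(d.items(), key=lambda kv: kv[1]))
```

Still not a method on dict itself, but it's a one-liner.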
