|
Not sure if this is better suited as a Python or general programming question, but I'll ask here since I'm using Python and pytesseract specifically. Anyone have experience with OCR? I think I'm just misunderstanding something about Tesseract, but I'm at my wit's end trying to get this to work here. I have a bunch of images like this - pre-processed, binary black and white images of numbers of interest: I need to extract these numbers out of here. I know it's possible because if I drop this image into the web demo of Tesseract here, it picks it up fine: This is my call to PyTesseract, and I do not get any usable results out of it. I've tried other PSM modes as well. I have Tesseract v5.0.0-alpha.20210811 installed locally. I figured PSM 5 should be ideal because it's described as "a single uniform block of vertically aligned text," which this is, is it not? Python code:
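The call itself isn't reproduced above, so this is only a hedged sketch of what a digits-only pytesseract call could look like, not the poster's code. One thing worth checking: the Tesseract docs describe PSM 5 as a block of *vertically aligned* text, which usually means text written top to bottom, while PSM 6 is "a single uniform block of text" and PSM 7 is a single line. The helper names `build_tesseract_config` and `read_digits` are made up for illustration, and the character whitelist may behave differently across Tesseract versions.

```python
def build_tesseract_config(psm: int = 6, whitelist: str = "0123456789") -> str:
    """Build a Tesseract config string restricted to the given characters."""
    # --psm selects the page segmentation mode; tessedit_char_whitelist
    # limits the characters Tesseract is allowed to output.
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_digits(image_path: str, psm: int = 6) -> str:
    """Run OCR on one image (needs pytesseract, Pillow and a local Tesseract)."""
    # Imported lazily so the config helper above works without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(img, config=build_tesseract_config(psm))
    return text.strip()
```

Trying `psm=6` and `psm=7` on a couple of the images is a cheap way to see whether the segmentation mode is the culprit.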
|
# ? Sep 13, 2021 17:04 |
|
|
# ? May 15, 2024 03:52 |
|
How much should I worry about setting things in os.environ? There's a warning in the docs about memory leaks but it's unclear to me how much this can matter. The warning, if you dig into it, is about assigning values with different lengths/sizes. Is this really a problem in real-world usage? https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/putenv.3.html quote: BUGS I think my case, where I am making a dictionary copy and then passing it to subprocess.run, is probably fine anyway (it should clean up when subprocess finishes, yes?) code:
I'm thinking we're talking about a few bytes per day so odds are it would take years to notice anyway..? edit: someone please tell me how to generate that awesome linted "python code" quoted text above. I've seen it a few times but it's not in the documented PHPBB codes is it? mr_package fucked around with this message at 19:05 on Sep 13, 2021 |
# ? Sep 13, 2021 19:02 |
|
mr_package posted:How much should I worry about setting things in os.environ? Python code:
mr_package posted:edit: someone please tell me how to generate that awesome linted "python code" quoted text above. I've seen it a few times but it's not in the documented PHPBB codes is it? Instead of regular code tags you use code=python to get the highlights, you can see it if you quote someone using it. Wallet fucked around with this message at 19:19 on Sep 13, 2021 |
# ? Sep 13, 2021 19:17 |
|
mr_package posted:How much should I worry about setting things in os.environ? There's a warning in the docs about memory leaks but it's unclear to me how much this can matter. The warning if you dig into it is about assigning values with different length/sizes. Is this really a problem in real world usage? https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/putenv.3.html The memory leak only applies to successive calls to setenv when trying to set the same key with differently sized values. In other words, the value persists even if the variable name is deleted or set to some other value with a different size. In practice, this is an insane edge case that doesn't have any meaningful impact; if you are repeatedly setting the env for a continuous process then you're doing something weird and there's almost certainly a better way. You are not doing that; each new process is getting its env set once, and that env should disappear along with the new process ending
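A minimal sketch of the copy-then-pass pattern being discussed, assuming the goal is just to hand a child process a few extra variables. `run_with_env` and `DEMO_SETTING` are hypothetical names for illustration; the point is that `os.environ.copy()` returns a plain dict, so the parent's real environment is never modified.

```python
import os
import subprocess
import sys


def run_with_env(extra: dict[str, str]) -> str:
    """Run a child Python process with extra variables, leaving os.environ untouched."""
    child_env = os.environ.copy()  # plain dict copy of the current environment
    child_env.update(extra)        # only the copy is changed
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['DEMO_SETTING'])"],
        env=child_env,             # the child sees the modified copy
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Whatever the child does with its environment goes away when the child exits, so nothing accumulates in the parent.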
|
# ? Sep 13, 2021 20:49 |
|
Loezi posted:I need to associate Things (below: strings) with float ranges, s.t. some of the ranges go to either negative or positive infinity. The "trivial" implementation would probably be something like this: This feels like something you should just use a numpy array for, using dtype='object' assuming you're dealing with actual objects and not just strings code:
|
# ? Sep 13, 2021 21:01 |
|
cinci zoo sniper posted:PEP-636 if you’re on 3.10, otherwise I would subclass dictionary to implement range checking inside dictionary key. Thanks, PEP636 seems worth keeping in mind. Playing around a bit with a dictionary subclass, this seems like a neat approach, allowing me to write code like this: Python code:
QuarkJets posted:This feels like something you should just use a numpy array for, using dtype='object' assuming you're dealing with actual objects and not just strings Not too hot on adding a dependency to numpy just to do something as simple as this. Loezi fucked around with this message at 10:44 on Sep 14, 2021 |
# ? Sep 14, 2021 10:38 |
|
Wow this caused me to look at 3.10 and the structural pattern matching looks neat. Coming from doing a bit of Rust lately the lack of having to always declare a catch-all makes me nervous but could lead to some “interesting” uses.
|
# ? Sep 14, 2021 15:50 |
|
Loezi posted:Thanks, PEP636 seems worth keeping in mind. Playing around a bit with a dictionary subclass, this seems like a neat approach, allowing me to write code like this: If that's the case then I would use map: Python code:
|
# ? Sep 14, 2021 16:19 |
|
Hopefully this makes sense as I'm writing it, like it does in my head. I have a QWidget window with two QLineEdit boxes and four QRadioButtons, grouped into two QButtonGroups of two each. I want to verify, before enabling the "Finish" button, that both QLineEdits are populated and both QButtonGroups have a checkedId() less than -1. Where I'm running into issue is finding some sort of editingFinished or focusOut type event for a QButtonGroup. I need self.check_complete() to run on that event of the QButtonGroups. Python code:
D34THROW fucked around with this message at 19:48 on Sep 14, 2021 |
# ? Sep 14, 2021 19:43 |
|
Loezi posted:I need to associate Things (below: strings) with float ranges, s.t. some of the ranges go to either negative or positive infinity. The "trivial" implementation would probably be something like this: I feel like you're overthinking this and the first solution is completely fine.
|
# ? Sep 14, 2021 21:06 |
|
I'm attempting to program adding and removing items from an array for class. I'm throwing up an error when I attempt to grow the array. My array is [3,77,2,1,0] and I am attempting to add 88 at position 2. My insert code is this: code:
code:
Mycroft Holmes fucked around with this message at 23:54 on Sep 14, 2021 |
# ? Sep 14, 2021 23:52 |
|
Mycroft Holmes posted:I'm attempting to program adding and removing items from an array for class. I'm throwing up an error when I attempt to grow the array. My array is [3,77,2,1,0] and I am attempting to add 88 at position 2. My insert code is this: You should post the specific exception, along with the lines that Python will tell you about when the exception is raised But look at this: Python code:
Presuming that size() is the length of your list (or whatever it is), you are immediately going out of bounds.
|
# ? Sep 15, 2021 02:08 |
|
Also, a lot of your code is repeated; you could do this: code:
Same problem though, e.g. If the size is 5 then index 5 is out of bounds.
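The original insert code isn't shown, so this is a guess at the intended exercise rather than a fix of the actual code: a manual insert that grows the list first and then shifts elements right. `insert_at` is a hypothetical name. The key detail is that valid indices run from 0 to `len(items) - 1`, so the extra slot has to exist before anything is written to it.

```python
def insert_at(items: list, index: int, value) -> None:
    """Insert value at index by growing the list and shifting elements right."""
    if not 0 <= index <= len(items):
        raise IndexError(f"index {index} out of range for length {len(items)}")
    items.append(None)  # grow first so the last slot exists
    # Walk from the new last slot down to index + 1, copying each left neighbour.
    for i in range(len(items) - 1, index, -1):
        items[i] = items[i - 1]
    items[index] = value
```

With `[3, 77, 2, 1, 0]`, `insert_at(data, 2, 88)` leaves `[3, 77, 88, 2, 1, 0]`, which is the same result the built-in `data.insert(2, 88)` gives.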
|
# ? Sep 15, 2021 02:13 |
|
Unless I'm missing something, the easy way to do what you want is this. List manipulation functions are builtins. Python code:
Da Mott Man fucked around with this message at 04:28 on Sep 15, 2021 |
# ? Sep 15, 2021 04:22 |
|
HappyHippo posted:I feel like you're overthinking this and the first solution is completely fine. The toy example I've been using is naturally just that, a toy example. There's definitely value in hiding most of the logic re: processing the upper and lower bounds to a separate class in the actual thing I'm doing, rather than replicating that same logic in a billion places. That being said, it might very well turn out that I was overthinking this in the long run. For now, I'm rather happy with the dictionary-based implementation for a "thing that represents potentially unbounded number ranges that I can query for membership, each range associated with a label"
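For readers who want to see the shape of such a thing, here is one possible sketch of a dictionary subclass that maps possibly unbounded ranges to labels. This is not Loezi's implementation (which isn't shown); `RangeDict`, the half-open `[low, high)` convention and the example labels are all assumptions made for illustration.

```python
import math


class RangeDict(dict):
    """dict keyed by (low, high) tuples; number lookups return the label
    of the first half-open range [low, high) that contains the number."""

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # A tuple is treated as an exact range key.
            return super().__getitem__(key)
        for (low, high), label in self.items():
            if low <= key < high:
                return label
        raise KeyError(key)


SIZES = RangeDict({
    (-math.inf, 0.0): "negative",
    (0.0, 10.0): "small",
    (10.0, math.inf): "large",
})
```

Lookups are a linear scan, which is fine for a handful of ranges. With the half-open convention, `math.inf` itself is not in any range and raises `KeyError`.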
|
# ? Sep 15, 2021 11:35 |
|
Okay, now I have a real question. I have a main menu wherein the user can select one of the calculators, or exit the program. The feature QPushButtons are in a QButtonGroup. I have a constant list declared at the top of my guiQt module, ENABLED_FEATURES, which contains boolean values to control, in the MainMenu class, whether or not each button is enabled. Python code:
Thinking about it a second time, the easier solution would be to make ENABLED_FEATURES a dict with button names as the keys and bools as the values. The goal is so that I can just add it to the dict at the top of the code rather than hard-coding it into enable_by_feature(), further down the class - probably with a default value of True if the button can't be found in the list.
|
# ? Sep 15, 2021 15:38 |
|
D34THROW posted:Thinking about it a second time, the easier solution would be to make ENABLED_FEATURES a dict with button names as the keys and bools as the values. The goal is so that I can just add it to the dict at the top of the code rather than hard-coding it into enable_by_feature(), further down the class - probably with a default value of True if the button can't be found in the list. I can't speak to Qt much/at all but are you sure it's redrawing the buttons after you're disabling them? Also, yeah, doing it by the index in a list of booleans without context is basically the same as just hard-coding the config values anyway. configparser is pretty quick to set up for this kind of thing to get them out of code entirely.
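A rough sketch of the configparser route, assuming a `[features]` section keyed by button name. The section name, the button names and the helper functions are all hypothetical; only the `configparser` calls themselves are standard library. Buttons that aren't listed default to enabled, matching the default suggested above.

```python
import configparser

# Stand-in for the contents of a config file on disk.
DEFAULT_CONFIG = """
[features]
roof_calculator = yes
deck_calculator = no
"""


def load_enabled_features(text: str) -> configparser.SectionProxy:
    """Parse config text and return the [features] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser["features"]


def is_enabled(features: configparser.SectionProxy, button_name: str) -> bool:
    """Buttons missing from the config default to enabled."""
    return features.getboolean(button_name, fallback=True)
```

On the Qt side this would presumably end up as something like `button.setEnabled(is_enabled(features, button.objectName()))` for each button in the group, assuming the object names match the config keys.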
|
# ? Sep 15, 2021 15:51 |
|
I have what I think is a really simple question with numPy. I finally have a job where I can actually use it for work, but it's been years since I did any real python work so I'm a bit lost. I have a csv that contains readings for a bunch of samples at different wavelengths. I've pasted an example portion of it below. Normally it'll go all the way down to 300 nm. But I've trimmed it for everyone's sanity. code:
I want to build this to be extensible, as I will be taking readings using this system for the next few years.
|
# ? Sep 17, 2021 14:46 |
|
That csv layout is janky as gently caress and I suspect that you will need to write something custom to deal with it. It feels like you want a pandas multiindex dataframe for this but I don't think that the pandas csv reader will be able to easily figure out the layout on its own E: although first thing you should do is try the pandas csv reader and see what it does QuarkJets fucked around with this message at 19:48 on Sep 17, 2021 |
# ? Sep 17, 2021 16:56 |
|
QuarkJets posted:That csv layout is janky as gently caress and I suspect that you will need to write something custom to deal with it. It feels like you want a pandas multiindex dataframe for this but I don't think that the pandas csv reader will be able to easily figure out the layout on its own This looks pretty easy, though I'm not sure I fully understand the file layout. My first instinct is: skip the first two lines of the file with pandas read_csv() and supply it with column headings. It looks like it's one row, one data set, e.g. columns: [Baseline 100%T,SampleOx_Wavelength (nm),SampleOx_Abs,and so on]
|
# ? Sep 17, 2021 20:53 |
|
Yeah it's easy enough to create a 2D table with labels, I just don't think there's an obvious way to make one that's 3D without some twiddling
|
# ? Sep 17, 2021 20:59 |
|
AfricanBootyShine posted:I have what I think is a really simple question with numPy. I finally have a job where I can actually use it for work, but it's been years since I did any real python work so I'm a bit lost. As posters above have commented, this is easy enough to turn into a pandas DataFrame via the pandas.read_csv() function. This is almost certainly what you actually want to do - a NumPy 3D array is going to be a lot more awkward to retrieve the correct data from. Your data structure does look quite odd, though. Is there any reason you've arranged things as code:
code:
|
# ? Sep 17, 2021 21:23 |
|
I'm going to third that this is a really weird data format. Unless there's a very good reason to do otherwise, data should be tidy where every row is a unique observation and each column is measurement of the same type across observations. If your data is tidy, analysis and plotting becomes much easier. If not, you're fighting the data at almost every step. Original Data with slightly renamed columns code:
code:
Python code:
|
# ? Sep 17, 2021 22:01 |
|
Thanks for all the help. Looks like I need to sit down with pandas for a few hours. I agree that the format of the data is incredibly goofy. It's what the instrument spits out when data is exported, so tidying the dataset is something I'd like to write a script to automate. The initial analysis is dead simple; I can do it in Excel in ten minutes. But I also need to do some deconvolution on the spectra, which requires some real tools.
|
# ? Sep 17, 2021 23:45 |
|
It sort of looks like a pandas MultiIndex as columns, except the labels aren't repeated. I would suggest "manually" constructing a MultiIndex for the column axis. Then you can stack or similar to tidy the dataset. If you really want to try the 3D thing, the library you want is xarray. But I don't think it would actually help with reading the data, just manipulating it, depending on what you need to do. And it's probably overkill in this case; it really shines with data on a grid (e.g. volumetric).
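Since the sample file isn't reproduced here, the following is only a dependency-free sketch of the tidying idea, under a guessed layout: row one holds a sample name above each pair of columns, row two repeats "Wavelength (nm), Abs", and the readings follow. `tidy_spectra`, the record keys and the `EXAMPLE` data are invented for illustration; the real export may well differ.

```python
import csv
import io


def tidy_spectra(text: str) -> list[dict]:
    """Turn paired (wavelength, absorbance) columns into one record per reading."""
    rows = list(csv.reader(io.StringIO(text)))
    names, data = rows[0], rows[2:]      # rows[1] holds the repeated unit headers
    records = []
    for col in range(0, len(names), 2):  # each sample owns two adjacent columns
        sample = names[col].strip()
        if not sample:
            continue
        for row in data:
            # Skip ragged rows and empty cells instead of guessing values.
            if col + 1 >= len(row) or not row[col].strip() or not row[col + 1].strip():
                continue
            records.append({
                "sample": sample,
                "wavelength_nm": float(row[col]),
                "absorbance": float(row[col + 1]),
            })
    return records


# Made-up data in the assumed layout.
EXAMPLE = """SampleA,,SampleB,
Wavelength (nm),Abs,Wavelength (nm),Abs
800,0.10,800,0.20
799,0.11,799,0.21
"""
```

A list of records like this can be handed straight to `pandas.DataFrame(records)` to get one row per observation.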
|
# ? Sep 18, 2021 02:32 |
|
I have a question about python, but it's more of a strategic question than a specific doubt. I'm slowly working towards the point where I can jump into computer touching if (when) my current career becomes a gently caress. I am following a little study plan kindly provided by a goon, who recommended I learn python as a scripting language. I got a book and started working through it, and I am handy with the basics, to the point where I made myself a script that backs up my files in a specific manner (tons of directory management and such, essentially a reimplementation of rsync). Now I'm working on parallelism. In your opinion, what areas of python are important to learn, resume-wise? At what point can I say I know python without it being a massive lie? Or perhaps that is the wrong question and I'm just going to be dropped in an unknown area and have to google stuff, so what areas are useful? I imagine parallelism and networking? This, I bet, is a highly subjective question but I'd appreciate opinions. Thanks in advance.
|
# ? Sep 18, 2021 13:00 |
|
Dawncloack posted:I got a book and started working through it, and I am handy with the basics, to the point when I made myself a script that backups my files in a specific manner (tons of directory management and such, essentially a reimplementation of rsync). Now I'm working on parallelism. Writing Python in a list of skills won’t provide any credibility that you can do work as a programmer. It’s not really the right question or method to build a resume. You need to show that you can complete work and projects. Pick a thing you wish existed and build a project, deploy it and put the code on GitHub. Include your GitHub on your resume. Some suggestions: -Pick an open source Python package you like and ask the devs what the process to make changes and pull requests is. See if you can contribute a PR. This is something that can go on a resume. -Pick a topic you like and deploy a flask/django/dash web app about that thing. Put the code on GitHub with a working demo. One which makes some API calls and does some business process. Also popular: Build an ML/AI project in a Jupyter notebook. Then, deploy the model you built with a web app. Have working demos, it will put you above the many, many, many other just starting out Python coders without engineering degrees.
|
# ? Sep 18, 2021 14:04 |
|
I had no idea! Thanks a bunch!
|
# ? Sep 18, 2021 17:25 |
|
Dawncloack posted:I had no idea! Thanks a bunch! You're welcome. Feel free to stop by the resume thread in BFC or the YOSPOS interviewing thread for advice from a broader audience. Several people, including myself, who hire computer touchers regularly post there. Here is the breakdown of me hiring an entry-level python person late 2020 at a startup to give you an idea of competition. Going rate is $20-30/hr depending on job and just how entry level they are. CarForumPoster posted:Yep, single position in 3 weeks. Here was the breakdown from my 91 applicants for an entry-level python job. P.S. An open source package that has very little support and a creator that actively wants help is moviepy. It's useful, but has been stalled on a major version update for a long time. Has 5k+ stars so decently popular. CarForumPoster fucked around with this message at 17:58 on Sep 18, 2021 |
# ? Sep 18, 2021 17:41 |
|
Any recommendations for a pythonic way to make a density map of hundreds of small shapefiles on a lat/lon grid? I found one option, geocube, but the code below just returns a single coverage and not a density. code:
|
# ? Sep 25, 2021 01:42 |
|
I last did this three years ago, but the package that I found most helpful was geopandas, which you're using and then Bokeh for the plotting. Bokeh is nice because it makes it easy to add geographic tiles, so you get streets and other features below your data. Bokeh is kinda weird, but there are plenty of tutorials floating around for pretty similar cases.
|
# ? Sep 25, 2021 03:35 |
|
Biffmotron posted:I last did this three years ago, but the package that I found most helpful was geopandas, which you're using and then Bokeh for the plotting. Bokeh is nice because it makes it easy to add geographic tiles, so you get streets and other features below your data. Bokeh is kinda weird, but there are plenty of tutorials floating around for pretty similar cases. Not sure that is quite what I'm aiming at. Here's an example of one polygon that is rasterized at 0.01°x0.01°. I'd like to do this for hundreds of similar polygons, but the step I'm scratching my head on is counting them up grid by grid on a much larger domain, thus giving me a polygon density.
|
# ? Sep 25, 2021 04:08 |
|
Now with a white background. edit damnit lol
|
# ? Sep 25, 2021 04:09 |
|
SirPablo posted:Not sure that is quite what I'm aiming at. He's an example of one polygon that is rasterized at 0.01°x0.01°. I'd like to do this for hundreds of similar polygons, but the step I'm scratching my head on is counting them up grid by grid on a much larger domain, thus giving me a polygon density. Make a raster of the entire area with a value of zero. Loop over each polygon, and increment all points in it by one. You should be able to do this with rasterio.
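The zero-raster-plus-increment idea can be sketched in pure Python as below. This deliberately does not use rasterio (its rasterization helpers would most likely replace the inner loops in practice); the function names, the cell-centre test and the toy grid parameters are all assumptions made to keep the example self-contained.

```python
def point_in_polygon(x: float, y: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_density(polygons, x0, y0, cell, nx, ny):
    """Count, per grid cell, how many polygons contain the cell centre."""
    grid = [[0] * nx for _ in range(ny)]  # start with a raster of zeros
    for poly in polygons:
        for row in range(ny):
            cy = y0 + (row + 0.5) * cell
            for col in range(nx):
                cx = x0 + (col + 0.5) * cell
                if point_in_polygon(cx, cy, poly):
                    grid[row][col] += 1   # one more polygon covers this cell
    return grid
```

For hundreds of polygons on a 0.01° grid this brute-force loop would be slow; it's only meant to show the counting step, which stays the same whichever library does the rasterizing.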
|
# ? Sep 25, 2021 05:47 |
You could also do it through something like qgis pretty readily. That’s what we do, composing hundreds of thousands up to millions of shapes, via a Python script that makes a few basic qgis calls.
|
|
# ? Sep 25, 2021 13:25 |
|
Here's what I ended up doing. code:
|
# ? Sep 27, 2021 21:40 |
|
Just found out dictionaries are now ordered in 3.7+, but they didn't add any native way to sort by key. That's the most pythonian thing ever.
|
# ? Sep 28, 2021 06:44 |
|
Ranzear posted:Just found out dictionaries are now ordered in 3.7+, but they didn't add any native way to sort by key. That's the most pythonian thing ever. I thought sorted already did that
|
# ? Sep 28, 2021 20:02 |
|
|
QuarkJets posted:I thought sorted already did that sorted() gives you the sorted keys, not key-value pairs. You can do something like this but I'm not aware of a native method on dict that would do this for you. Python code:
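The snippet itself isn't reproduced here, so as a stand-in, this is one common way to do it: since dicts keep insertion order in 3.7+, rebuilding a dict from the sorted items yields a dict ordered by key. `sorted_by_key` is just a name picked for the example.

```python
def sorted_by_key(d: dict) -> dict:
    """Return a new dict whose insertion order follows the sorted keys."""
    # Sort on the key only, so the values never need to be comparable.
    return dict(sorted(d.items(), key=lambda kv: kv[0]))
```

Note that this builds a new dict rather than sorting in place, and the order is not maintained for keys added afterwards.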
|
# ? Sep 28, 2021 21:27 |