SurgicalOntologist
Jun 17, 2004

I would fork remotezip and modify it to support passing a session optionally. Looks like it's less than 200 lines so it should be pretty easy and is a worthwhile contribution to give back to the project

Bad Munki
Nov 4, 2008

We're all mad here.


Yeah, a PR was actually one of our considered options. Pretty sure the required changes would only amount to a few lines.

C2C - 2.0
May 14, 2006

Dubs In The Key Of Life


Lipstick Apathy
Is there a detailed explainer somewhere about imports/packages?

I'm on an M1 MacBook Pro & have also installed 3.10 via Homebrew. I've been back & forth between the "Non-Programmers Guide to Python 3" and "The Hitchhiker's Guide to Python" working through the chapters. The past couple of days, I've also been perusing "project"-type repos on Github and have found myself at odds with imports, packages, & things like 'requirements.txt'. As the screenshot below illustrates, I'm unable to run the programs but also unable (to reconcile?) the packages required for the programs. My guess is that there's a disparity between the version of Python pre-installed on the laptop (3.9?) and the version I have installed via Homebrew.

I'm not looking for a complete solution here on the forums & don't wanna' bog down the thread with baby's-first-python; any suggestions for another site that explains this process in detail would be fine.

QuarkJets
Sep 8, 2008

C2C - 2.0 posted:

Is there a detailed explainer somewhere about imports/packages?

I'm on an M1 MacBook Pro & have also installed 3.10 via Homebrew. I've been back & forth between the "Non-Programmers Guide to Python 3" and "The Hitchhiker's Guide to Python" working through the chapters. The past couple of days, I've also been perusing "project"-type repos on Github and have found myself at odds with imports, packages, & things like 'requirements.txt'. As the screenshot below illustrates, I'm unable to run the programs but also unable (to reconcile?) the packages required for the programs. My guess is that there's a disparity between the version of Python pre-installed on the laptop (3.9?) and the version I have installed via Homebrew.

I'm not looking for a complete solution here on the forums & don't wanna' bog down the thread with baby's-first-python; any suggestions for another site that explains this process in detail would be fine.



I think that your hunch is right.

Python and pip are companion packages: a specific installation of Python also has a pip associated with it. It looks like your pip3 is identifying that it does not have write permissions to whatever location it's associated with, so that may be the pip that came with your system python rather than the homebrew version of python. Note how it's installing to a location with "Python/3.9" in the path. Try using `which pip3` and `which python3` to investigate the paths of these executables and see if there's a different pip (possibly just "pip") in your Homebrew location that you need to be invoking.

"pip3 freeze > requirements.txt" - This dumps packages you'd need to set up your current configuration to a text file. This line overwrote the requirements.txt that was already in that directory (Todo_app). If this is a git repository you should be able to just `git checkout requirements.txt` to restore it. You could type `git status` first to see that you did indeed modify the file (`git status` will tell you which files have been modified and not committed yet, if any).

We've had other posters come into this thread to complain about Homebrew giving them broken python installations. I don't know what the deal is, but maybe you need to install pip separately, or maybe it has a different executable name (such as simply "pip")? You might also consider creating your own environment using miniconda. This process is painless and can write to any directory you want; it'll give you a lightweight python environment that's sandboxed away from everything else. You activate the environment with "conda activate <environment name>" and deactivate with "conda deactivate". Activation adds new paths to your environment so that the "python" and "pip" executables will be using your conda environment, and this will be way better than what you're dealing with now, this nebulous state of "am I using my system python or homebrew python?" Once your environment is active you should then be able to use pip to install whatever you need; "pip install -r requirements.txt" (using the repository's requirements.txt, not one you created yourself with pip freeze) should basically just work.
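
To settle the "which python am I actually using?" question from inside the interpreter, here is a minimal sketch. The function name `describe_interpreter` is made up for illustration; the calls it uses are all standard library. Running it with `python3 your_script.py` shows which installation that `python3` resolves to, and installing with `python3 -m pip install ...` keeps pip tied to that same interpreter.

```python
import site
import sys
import sysconfig


def describe_interpreter():
    """Collect the paths that show which Python installation is running."""
    return {
        "executable": sys.executable,  # full path of the running interpreter
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "site_packages": sysconfig.get_paths()["purelib"],  # default install target
        "user_site": site.getusersitepackages(),  # where `pip install --user` installs
    }


if __name__ == "__main__":
    for name, value in describe_interpreter().items():
        print(f"{name}: {value}")
```

If the paths printed here mention "3.9" while you expected the Homebrew 3.10, you are running the system interpreter.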

Asleep Style
Oct 20, 2010

C2C - 2.0 posted:

Is there a detailed explainer somewhere about imports/packages?

I'm on an M1 MacBook Pro & have also installed 3.10 via Homebrew. I've been back & forth between the "Non-Programmers Guide to Python 3" and "The Hitchhiker's Guide to Python" working through the chapters. The past couple of days, I've also been perusing "project"-type repos on Github and have found myself at odds with imports, packages, & things like 'requirements.txt'. As the screenshot below illustrates, I'm unable to run the programs but also unable (to reconcile?) the packages required for the programs. My guess is that there's a disparity between the version of Python pre-installed on the laptop (3.9?) and the version I have installed via Homebrew.

I'm not looking for a complete solution here on the forums & don't wanna' bog down the thread with baby's-first-python; any suggestions for another site that explains this process in detail would be fine.



there's a lot of moving parts here with python. it's a confusing topic for sure. I don't know the mac specific things like homebrew but I can answer generally

you're right that this is a python version issue. you can tell when pip says that the requirements are already satisfied under ".../Library/Python/3.9/...". This tells you that pip is being run from python 3.9. If you run `which python3` you should see the path to the actual python3 executable, which will probably tell you the version. `python3 -V` will definitely tell you what version you're running

also when you're running `pip install` you're doing so in the global python environment. this isn't recommended because the packages you install might conflict with python packages needed by your system. the way python solves this is with a virtual environment, which isolates your system python packages from the packages needed by your application.

to create a venv, run this command in your project folder
`$ python3 -m venv env`

this will create a venv in a directory called env. when you want to switch from your global python environment to your venv, you run
` $ source env/bin/activate`
at this point your prompt should include the name of the venv you have active. now you can run `pip install` without causing conflicts with your system python packages. note that you will also want to have your venv active before you run your code. the venv can be deactivated with `$ deactivate`.

your problem isn't totally solved, because your venv will be created with the python version that ran the `$ python3 -m venv env`. to create a venv from a different version of python you have to run the venv module using the version of python that you want. no idea where homebrew puts them, but it would look something like this:
`$ /absolute/path/to/my/python3.10/install -m venv env`

now when you activate that environment and run `python3 -V` you should see the version number that you expect
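
if you'd rather check from inside python, here's a minimal sketch. the function name `in_virtualenv` is made up for illustration; it relies on the fact that a venv leaves `sys.base_prefix` pointing at the installation that created it while `sys.prefix` points at the venv directory. (this only detects venv-style environments, not conda ones.)

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the installation that created it.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("python version:", sys.version.split()[0])
```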

this is the standard way to do this, but there are a couple tools that I like that make it less cumbersome to manage multiple python versions and venvs:
- pyenv (https://github.com/pyenv/pyenv)
- pyenv-virtualenv (https://github.com/pyenv/pyenv-virtualenv)

this makes it easier to manage multiple versions of python and create virtual environments based off of whatever version you need. following the installation instructions will also result in your venvs being activated whenever you cd into the project directory, which is very convenient

for more reading:
https://docs.python.org/3/library/venv.html

CarForumPoster
Jun 26, 2013

⚡POWER⚡
IDK if anyone needs this but I've wanted a simple to use and install way to OCR PDFs that might be a mix of already OCR'd, scanned documents with automated text added or just images for a while.

Here is a better-than-pytesseract way to take in PDFs, flatten to an image then OCR. Very little testing but it's the most promising and easiest to set up of any of the solutions I've tried. Assumes Anaconda which made this easier.

conda install pytorch using the proper lines for your OS/CUDA version from their website
code:
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
pip install easyocr
conda install -c conda-forge poppler
pip install pdf2image
code:
import easyocr
reader = easyocr.Reader(['en'])

from pdf2image import convert_from_path
# pages is a list of PIL objects
pages = convert_from_path('file.pdf', 250)

import io
img_byte_arr = io.BytesIO()
pages[1].save(img_byte_arr, format='PNG')  # pages[1] is the second page; use pages[0] for the first
img_byte_arr = img_byte_arr.getvalue()

text_output = reader.readtext(img_byte_arr, detail=0, paragraph=True)
for text in text_output:
    print(text)

C2C - 2.0
May 14, 2006

Dubs In The Key Of Life


Lipstick Apathy
Sorry for the tardy reply, y'all. I'm a bartender & just settled in to read the responses. I'll troubleshoot according to the guidance here & post back when I've successfully got things working on my end.

Thanks a TON for the advice & consideration!!!

QuarkJets
Sep 8, 2008

No worries drop by any time

Nigel Tufnel
Jan 4, 2005
You can't really dust for vomit.
I am writing a twitter bot and to do twitter threads I need to get the ID of each previous tweet and then each subsequent tweet is a reply to the previous one (I think, unless there's a way tweepy can do this automatically).

When I use tweepy (Twitter API v2) to post that first tweet

code:
import tweepy

client = tweepy.Client(consumer_key=config['consumer_key'], consumer_secret=config['consumer_secret'], access_token=config['access_token'], access_token_secret=config['access_token_secret'])
info = client.create_tweet(text="I am tweet number one")

print(type(info))
print(info)
those print statements return

code:
<class 'tweepy.client.Response'>
Response(data={'id': '1234567890', 'text': "I am tweet number one"}, includes={}, errors=[], meta={})
What I can't figure out is how to access that id and assign it to a variable which is what I need in order to reply to that first tweet.

boofhead
Feb 18, 2021

try:

code:
tweet_id = info.data['id']
Response is the class (from tweepy.client.Response) which has the properties data, includes, errors, and meta, so:

info.data
info.includes
info.errors
info.meta

and within info.data you have a dict with keys 'id' and 'text':

info.data['id']
info.data['text']
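
For the thread part, a minimal sketch of chaining replies, assuming tweepy's `create_tweet` is called with its `in_reply_to_tweet_id` argument. `post_thread` and `FakeClient` are made-up names; the fake client only stands in for `tweepy.Client` so the example runs without credentials or network access.

```python
from types import SimpleNamespace


def post_thread(client, texts):
    """Post texts as a thread; return the list of tweet IDs in order."""
    tweet_ids = []
    previous_id = None
    for text in texts:
        if previous_id is None:
            response = client.create_tweet(text=text)
        else:
            # Each later tweet replies to the one posted just before it.
            response = client.create_tweet(text=text, in_reply_to_tweet_id=previous_id)
        previous_id = response.data["id"]
        tweet_ids.append(previous_id)
    return tweet_ids


class FakeClient:
    """Hypothetical stand-in for tweepy.Client so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    def create_tweet(self, text, in_reply_to_tweet_id=None):
        self.calls.append((text, in_reply_to_tweet_id))
        tweet_id = str(len(self.calls))
        return SimpleNamespace(data={"id": tweet_id, "text": text})


fake = FakeClient()
ids = post_thread(fake, ["I am tweet number one", "I am tweet number two"])
print(ids)
```

With a real client you would pass your `tweepy.Client` instance instead of `FakeClient()`.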

Nigel Tufnel
Jan 4, 2005
You can't really dust for vomit.
That worked! Amazing, thank you.

Bad Munki
Nov 4, 2008

We're all mad here.


SurgicalOntologist posted:

I would fork remotezip and modify it to support passing a session optionally. Looks like it's less than 200 lines so it should be pretty easy and is a worthwhile contribution to give back to the project

Bad Munki posted:

Yeah, a PR was actually one of our considered options. Pretty sure the required changes would only amount to a few lines.
I don't actually write any code anymore :corsair: but I had one of my junior devs do this, seemed like a good grab for them to have under their belt and it looks like we're in business with the changes. Thanks for the input. :)

SurgicalOntologist
Jun 17, 2004

Nice! I didn't know about remotezip before you mentioned it, I'm going to keep it in mind in case the need arises.

FredMSloniker
Jan 2, 2008

Why, yes, I do like Kirby games.
After spending a couple of days wrestling with an unrelated problem, I'm back to my project to create a game book solver, using "Impudent Peasant!" as a test case. My code's getting pretty long, so I won't paste the whole thing, just the bits I have questions or comments about.

f-strings are my new best friend, by the way, QuarkJets. I've discovered their limitations, though (they don't always handle backslashes well), and in the process learned that implicit string joining only works on literal strings, so a number of f-strings became parenthesized items became "".join() instances. (flake8 has been helpful for error-checking, so thanks for that too.)

I changed the way state strings work. State strings are now a serialized version of a dictionary containing two keys:
  • section_data is everything you need to know about the character's current condition - skills, equipment, event flags, that sort of thing.
  • section_function is the function that will be called with section_data as input to return a state complex, which is a dictionary containing the following keys:
    • text is what you see when you read that section, including any choices to be made.
    • is_random is True when the choice in question is random.
    • choice_keys is a dictionary of choice keys, which are tuples with two elements:
      • choice_key[0] is the text of a given choice, such as "If you want to go left, turn to 3." (It'll actually contain BBcode.)
      • choice_key[1] is either the state string that choice would lead to or False if the choice cannot actually be made for some reason. (The reason will be in choice_key[0]; for example, "If you want to go left, turn to 3. (You can't do that because you don't have a torch.)")
      The values of these keys are the weights of the choices, used for random decisions. (Choices you can't actually make have a weight of 0.)
    • score: if given, this is a dead-end page, which means you've won or lost (or, in future books, gotten some score or other). If given, is_random is ignored and choice_keys must be empty.
    • silent: if given, we want to silently elide this section and the next. (I'll explain what eliding, silently or otherwise, means in a bit.)
So instead of a big section lookup table thing, the state string now contains the function for the section itself, which means get_state_complex looks like this:

Python code:
def get_state_complex(state_string):
    game_state = loads(state_string)
    section_data = game_state.get("section_data", {})
    section_function = game_state.get("section_function", start_of_game)
    return elide(section_function(section_data))

As before, "{}" is the starting state string, so we have a few lines to fill in the defaults there. Then we call the section function, which returns a state complex. We make a decision from that complex's choice keys, which gives us a new state string to put back into get_state_complex. When scoring, we'll score each possible state string; then, when playing, we can display the score for each choice at a given point.

But what's elide, you ask? Well, it's important to keep the state space small, but it's also useful to have section functions do small things. There will also be times when reaching the end of a section just means turning to a new one. So I wrote this (editor's notes are things I'd like an answer about):

Python code:
def elide(state_complex):
    """
    In order to reduce the number of game states we need to consider, we're going to elide sections together if the
    first section has only one valid choice.
    """
    valid_choices = dict((choice_key, weight) for choice_key, weight in state_complex["choice_keys"].items()
                         if choice_key[1])  # if choice_key[1] is falsy, this isn't a valid choice
    if len(valid_choices) != 1:  # zero or more than one valid choice, so we can't elide
        return state_complex
    for key in valid_choices:  # when there's only one candidate...
        choice_key = key  # ...there's only one choice. (Editor's note: is this the best way to do this?)
    new_state_complex = get_state_complex(choice_key[1])
    # combine the text from the old and new states. This is USUALLY simple, but there's one exception.
    if len(state_complex["choice_keys"]) == 1 and state_complex.get("silent", False):
        # if there was only one choice to begin with and we want the transition to be silent,
        # then we need to do this. Editor's note: the text of each state complex is formatted for posting on SA,
        # with a BBcode quote header (for instance, '[quote="Section 1"]') and footer ('[/quote]'). Normally,
        # if we're on (say) section 3 and we want to turn to section 47, we want to keep the texts separate with
        # their own quote blocks, but if we're doing a SILENT transition, we're pretending the two sections are the
        # same section. (I'll give an example of WHY we'd want to do this in a later post.)
        new_state_complex["text"] = "".join([  # str.join takes a single iterable, hence the list
            state_complex["text"][0:-len("\n[/quote]")],  # calculating for future-proofing.
            # re.DOTALL lets "." match the newlines inside the quoted text
            re.match("\\[[^\\]]*\\](.*)", new_state_complex["text"], re.DOTALL).group(1)
        ])
    else:  # either there's more than one choice, or we want to display the text for the one choice we have.
        new_state_complex["text"] = f"{state_complex['text']}\n\n{new_state_complex['text']}"
    return elide(new_state_complex)  # can we elide again?

With me so far? Anything I messed up? If I'm good to proceed, I'll put start_of_game(), and some helper functions, up next.
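
On the editor's note about pulling the single key out of valid_choices: two common idioms, sketched here with a made-up one-entry dictionary ("state-3" is a placeholder, not a real state string). Tuple unpacking has the side benefit of failing loudly if the "exactly one" assumption is ever wrong.

```python
valid_choices = {("If you want to go left, turn to 3.", "state-3"): 1}

# Tuple unpacking: raises ValueError unless there is exactly one key.
(choice_key,) = valid_choices

# next(iter(...)): returns the first key without checking for extras.
first_key = next(iter(valid_choices))

print(choice_key[1])
```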

Bad Munki
Nov 4, 2008

We're all mad here.


SurgicalOntologist posted:

Nice! I didn't know about remotezip before you mentioned it, I'm going to keep it in mind in case the need arises.

It’s such a huge life saver. We work with data products that are stored as zip archives running in the 4-12GB range. Often a user only wants one of the files therein, and having to pull 10GB to examine a 100kb annotation file is a major downer.

One can make all sorts of arguments about more suitable storage schemes, that horse is dead and pulped, but here we are. This pretty much entirely alleviates the shortcomings of that particular aspect.

In this particular case, it worked out of the box for some auth methods, but there were others we needed to support as well.
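
To illustrate why this saves so much: a zip reader only needs the central directory at the end of the archive plus the bytes of the member being extracted, and as far as I understand remotezip fetches just those pieces with HTTP range requests. The sketch below is not remotezip; it uses only the standard library, a made-up in-memory archive, and a hypothetical `CountingReader` class to show how little of the file `zipfile` touches when reading one small member.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """In-memory file that counts how many bytes are actually read."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


# Build a stand-in archive: one large member and one small annotation file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
    archive.writestr("big_product.bin", bytes(1_000_000))
    archive.writestr("annotation.txt", "small annotation")
archive_bytes = buffer.getvalue()

# Read only the small member and see how many bytes that required.
reader = CountingReader(archive_bytes)
with zipfile.ZipFile(reader) as archive:
    annotation = archive.read("annotation.txt").decode()

print(annotation)
print(f"read {reader.bytes_read} of {len(archive_bytes)} bytes")
```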

Bad Munki fucked around with this message at 22:44 on Oct 6, 2022

Seventh Arrow
Jan 26, 2005

I'm still chugging along with the codecademy python course, but I don't want to get rusty with the skills I've developed thus far. Is there a good not-leetcode site to this end?

I suspect that some might answer "do projects", but even this will require suggestions.

QuarkJets
Sep 8, 2008

FredMSloniker posted:

After spending a couple of days wrestling with an unrelated problem, I'm back to my project to create a game book solver, using "Impudent Peasant!" as a test case. My code's getting pretty long, so I won't paste the whole thing, just the bits I have questions or comments about.

f-strings are my new best friend, by the way, QuarkJets. I've discovered their limitations, though (they don't always handle backslashes well), and in the process learned that implicit string joining only works on literal strings, so a number of f-strings became parenthesized items became "".join() instances. (flake8 has been helpful for error-checking, so thanks for that too.)

Can you provide examples of these? I haven't experienced this but maybe that's just a fluke in how I write f-strings

Jigsaw
Aug 14, 2008

FredMSloniker posted:

f-strings are my new best friend, by the way, QuarkJets. I've discovered their limitations, though (they don't always handle backslashes well), and in the process learned that implicit string joining only works on literal strings, so a number of f-strings became parenthesized items became "".join() instances. (flake8 has been helpful for error-checking, so thanks for that too.)
You need to put an f before each string you want to implicitly join that has a formatted part, like this.
code:
x = 1
y = 2
f"{x}" f"{y}"
# produces "12"

f"{x}" "{y}"
# produces "1{y}"
The f-specifier doesn’t “carry over” to the other parts.

You’re right about the backslashes; you can’t have a backslash inside a formatted expression within an f-string.

Jigsaw fucked around with this message at 14:34 on Oct 7, 2022

Jigsaw
Aug 14, 2008

CarForumPoster posted:

IDK if anyone needs this but I've wanted a simple to use and install way to OCR PDFs that might be a mix of already OCR'd, scanned documents with automated text added or just images for a while.

Here is a better-than-pytesseract way to take in PDFs, flatten to an image then OCR. Very little testing but it's the most promising and easiest to set up of any of the solutions I've tried. Assumes Anaconda which made this easier.

conda install pytorch using the proper lines for your OS/CUDA version from their website
code:
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
pip install easyocr
conda install -c conda-forge poppler
pip install pdf2image
code:
import easyocr
reader = easyocr.Reader(['en'])

from pdf2image import convert_from_path
# pages is a list of PIL objects
pages = convert_from_path('file.pdf', 250)

import io
img_byte_arr = io.BytesIO()
pages[1].save(img_byte_arr, format='PNG')
img_byte_arr = img_byte_arr.getvalue()

text_output = reader.readtext(img_byte_arr, detail=0, paragraph=True)
for text in text_output:
    print(text)

Do you know how this compares to ocrmypdf? I’ve been using that and found it simple to use in the cases you describe and that it works pretty well.

FredMSloniker
Jan 2, 2008

Why, yes, I do like Kirby games.

QuarkJets posted:

Can you provide examples of these? I haven't experienced this but maybe that's just a fluke in how I write f-strings

As I recall, the problem I ran into was something like this:

Python code:
f'[quote={quote_header}]\n{text_block}\n{is_additional_info and f"{additional_info}\n" or ""}[/quote]'

This gives the error "f-string expression part cannot include a backslash", which I presume is about the {additional_info}\n bit of the line. So I changed it to:

Python code:
(
    f'[quote={quote_header}]\n{text_block}\n'
    (is_additional_info and f"{additional_info}\n" or "")
    '[/quote]'
)

But this gives the error "invalid syntax. Perhaps you forgot a comma?" on the line with is_additional_info. So I changed it to:

Python code:
"".join(
    f'[quote={quote_header}]\n{text_block}\n',
    (is_additional_info and f"{additional_info}\n" or ""),
    '[/quote]'
)

...which gave the error "str.join() takes exactly one argument (3 given)". Which is interesting, because flake8 did not catch that; I only saw it when I pasted that code into the py interpreter just now to make sure there weren't any typos. So a quick check of the documents later, I finally get it working with:

Python code:
"".join([
    f'[quote={quote_header}]\n{text_block}\n',
    (is_additional_info and f"{additional_info}\n" or ""),
    '[/quote]'
])

And I take a quick break to amend my code appropriately.
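
Another way to sidestep the backslash error is to build the optional piece before the f-string, so no backslash ever sits inside a replacement field. A minimal sketch: `build_quote` is a made-up helper name, and it assumes an empty `additional_info` means "no additional info" rather than using a separate is_additional_info flag.

```python
def build_quote(quote_header, text_block, additional_info=""):
    """Assemble a BBcode quote block, appending additional_info when present."""
    # Build the optional line outside the f-string so no backslash ends up
    # inside a replacement field (a syntax error before Python 3.12).
    extra = f"{additional_info}\n" if additional_info else ""
    return f"[quote={quote_header}]\n{text_block}\n{extra}[/quote]"


print(build_quote("Section 1", "You enter the cave.", "You have a torch."))
```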

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Jigsaw posted:

Do you know how this compares to ocrmypdf? I’ve been using that and found it simple to use in the cases you describe and that it works pretty well.

Haven’t tried it, but a few years ago I tried pytesseract, which wraps Tesseract, the same engine OCRmyPDF is based on, and it did not reliably get the text right. May have improved. In particular it created a lot of oddball characters on scanned docs that were relatively clean and straight.

I will give it a try!

CarForumPoster fucked around with this message at 01:27 on Oct 8, 2022

Zephirus
May 18, 2004

BRRRR......CHK

FredMSloniker posted:

As I recall, the problem I ran into was something like this:

And I take a quick break to amend my code appropriately.

You can do something like this to make things slightly more concise. While you can't use \n in a f-string you can call chr() to put it in - chr(10) returns the unicode character for the int you give it, in this case decimal 10 which is newline/linefeed.

You can use conditionals to put the additional info in without having to embed another f-string. Because it's only using one set of "'s to delimit the start and end of the f-string you can use '' inside the conditional to represent no value which is what you want if is_additional_info is not truthy.


Python code:
f"[quote={quote_header}]{chr(10)}{text_block}{chr(10)}{(additional_info + chr(10)) if is_additional_info else ''}[/quote]"
If additional_info is false(y) (i.e., zero length, or None) you don't have to use the additional boolean, you can instead just do this for the conditional.

Python code:
f"[{(additional_info + chr(10)) if additional_info else ''}"
If you've not come across truthiness before, this doc does some explaining on how/why a value might evaluate in an if/while expression https://mathspp.com/blog/pydonts/truthy-falsy-and-bool

E: Foxfire_ is right - you totally can and what I posted was wrong.

Zephirus fucked around with this message at 02:19 on Oct 8, 2022

FredMSloniker
Jan 2, 2008

Why, yes, I do like Kirby games.
Some further testing of my code has discovered a problem - namely, my plan for storing the section function itself in the state string won't work, or at least won't work the way I had planned. The error in question is:

TypeError: Object of type function is not JSON serializable

So that's a problem. Fortunately, it looks like I just need to use pickle instead of json. Unfortunately, I ran into this problem again:

Python 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pickle import dumps, loads
>>> a = {"foo": 1, "bar": 2}
>>> b = {"bar": 2, "foo": 1}
>>> a == b
True
>>> dumps(a) == dumps(b)
False

And unlike json, there's no sort_keys option. I need the serialized version of a dictionary to be identical no matter what order the keys are defined in, because I don't care in what order I wrote stuff down on the character sheet, just what I wrote down. What would you suggest? Do I need to pickle.dumps the section function and json.dumps the section data?

Also, on the topic of falsiness, since Zephirus brought it up: one thing that's annoyed me about Python versus Lua is that, in Lua, something being falsy means it's false or nil and truthy means it's anything else. In Python, though, some other things are falsy, including the number 0 and the string "", which keeps tripping me up. If I want to write a pluralizing function, for instance, (n == 1) and "" or "s" always returns "s"; I have to write (n != 1) and "s" or "" for it to work right. And I can't test if something is exactly True or False with a simple ==; I also have to check that it's the correct type.

That (condition) if (true result) else (false result) construction isn't something I've seen before, and I'll keep it in mind.

Foxfire_
Nov 8, 2010

If it isn't from you copying to forum, part of your first error is because you have some screwed up braces.

Merging adjacent string literals is a parsing thing, not an execution thing.

f"AB{thing1}" "CD" f"D{thing2}" is just another way to write f"AB{thing1}CDD{thing2}" in the same way that 12, 0xC, 0x0000_000C, or 0b0000_1100 are all ways to write the same number.

If you want to concatenate strings at execution-time, you'd use the + operator:
Python code:
def a_function_that_returns_a_string() -> str:
    return "butt"

# Doesn't work, a_function_that_returns_a_string() is a function call, not a string literal
"dick" a_function_that_returns_a_string()

# Fine, this is creating two strings and then concatenating them into a third string
"dick" + a_function_that_returns_a_string()

Zephirus posted:

While you can't use \n in a f-string
There's nothing wrong with using a \n in a f-string:
Python code:
In [62]: f"this is fine: {some_number}\n"
Out[62]: 'this is fine: 1\n'

FredMSloniker posted:

my plan for storing the section function itself in the state string won't work,
This seems like a very bad idea. You are on-purpose trying to store a snapshot of the program in your database state?

FredMSloniker posted:

I need the serialized version of a dictionary to be identical no matter what order the keys are defined in, because I don't care in what order I wrote stuff down on the character sheet, just what I wrote down.
Why do you care about the serialized representation? They will load back into the same key:value mappings. Order is also a sketchy concept to begin with for python dicts. They were unordered in python 2 and early python 3, and insertion order has only been guaranteed since 3.7, because people kept writing buggy code that relied on that behavior.

If you do need that for some reason, sort the dictionary keys into some canonical order prior to serializing:
Python code:
from collections import OrderedDict
import pickle

def serialize_canonically(some_dict):
    canonicalized = OrderedDict([(key, some_dict[key]) for key in sorted(some_dict.keys())])
    return pickle.dumps(canonicalized)

Foxfire_ fucked around with this message at 02:04 on Oct 8, 2022

QuarkJets
Sep 8, 2008

FredMSloniker posted:

Some further testing of my code has discovered a problem - namely, my plan for storing the section function itself in the state string won't work, or at least won't work the way I had planned. The error in question is:

TypeError: Object of type function is not JSON serializable

So that's a problem. Fortunately, it looks like I just need to use pickle instead of json. Unfortunately, I ran into this problem again:

Python 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pickle import dumps, loads
>>> a = {"foo": 1, "bar": 2}
>>> b = {"bar": 2, "foo": 1}
>>> a == b
True
>>> dumps(a) == dumps(b)
False

And unlike json, there's no sort_keys option. I need the serialized version of a dictionary to be identical no matter what order the keys are defined in, because I don't care in what order I wrote stuff down on the character sheet, just what I wrote down. What would you suggest? Do I need to pickle.dumps the section function and json.dumps the section data?

Also, on the topic of falsiness, since Zephirus brought it up: one thing that's annoyed me about Python versus Lua is that, in Lua, something being falsy means it's false or nil and truthy means it's anything else. In Python, though, some other things are falsy, including the number 0 and the string "", which keeps tripping me up. If I want to write a pluralizing function, for instance, (n == 1) and "" or "s" always returns "s"; I have to write (n != 1) and "s" or "" for it to work right. And I can't test if something is exactly True or False with a simple ==; I also have to check that it's the correct type.

That (condition) if (true result) else (false result) construction isn't something I've seen before, and I'll keep it in mind.

What if you just write the function name instead of the function? That should give you a json-serializable dict. I would avoid using pickle at all

When you write the dictionary, convert functions to function names (you should be able to use the function's __name__ attribute)

When you load the dictionary remap function names to functions. You're using a specific key for this already so these cases should be easy to find.
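
A minimal sketch of that name-based round trip, under some assumptions: `start_of_game` here is a placeholder body (the real one hasn't been posted), and `SECTION_FUNCTIONS`, `dump_state`, and `load_state` are made-up names for illustration.

```python
import json


def start_of_game(section_data):
    """Hypothetical section function; the real one is not shown in the thread."""
    return {"text": "You stand at the start.", "choice_keys": {}}


# Every section function that can appear in a state string is registered by name.
SECTION_FUNCTIONS = {func.__name__: func for func in (start_of_game,)}


def dump_state(section_function, section_data):
    """Serialize a game state, storing the function's name rather than the function."""
    return json.dumps(
        {"section_function": section_function.__name__, "section_data": section_data},
        sort_keys=True,
    )


def load_state(state_string):
    """Return (section_function, section_data), applying the defaults for "{}"."""
    game_state = json.loads(state_string)
    name = game_state.get("section_function", "start_of_game")
    return SECTION_FUNCTIONS[name], game_state.get("section_data", {})


state_string = dump_state(start_of_game, {"torch": True})
print(state_string)
```

An unknown name raises KeyError on load, which is usually what you want for a typo in a section name.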

On truthiness, you've got the construction mixed up. It should be
(true result) if (condition) else (false result)
That may be why you're seeing unexpected results. For instance
x = "" if n == 1 else "s"
This will set x to "" if n equals 1, otherwise x is set to "s"

You should avoid relying on truthiness and instead create actual bools based on conditionals. Instead of checking "if x" where x is an integer, which would use integer truthiness, I like to check the value of x: "if x > 0" checks an actual bool.
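
A quick sketch of which values count as false, plus one way to test for exactly True without a separate type check. The helper name is_exactly_true is just an illustration:

```python
# Values that count as false in a boolean context.
falsy_values = [False, None, 0, 0.0, "", [], {}]
print(any(bool(value) for value in falsy_values))  # False

def is_exactly_true(value):
    """True only for the bool True, not for truthy values such as 1."""
    return value is True

print(1 == True)              # True, because bool is a subclass of int
print(is_exactly_true(1))     # False
print(is_exactly_true(True))  # True
```

The identity check with `is` sidesteps the fact that 1 and True compare equal.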

FredMSloniker
Jan 2, 2008

Why, yes, I do like Kirby games.

Foxfire_ posted:

If it isn't from you copying to forum, part of your first error is because you have some screwed up braces.

Where did I mismatch? I presume you're talking about

Python code:
f'[quote={quote_header}]\n{text_block}\n{is_additional_info and f"{additional_info}\n" or ""}[/quote]'

But I don't see any messed-up braces.

Foxfire_ posted:

There's nothing wrong with using a \n in a f-string:
Python code:
In [62]: f"this is fine: {some_number}\n"
Out[62]: 'this is fine: 1\n'

The issue is the newline I bolded here:

f'[quote={quote_header}]\n{text_block}\n{is_additional_info and f"{additional_info}\n" or ""}[/quote]'

If I remove that, I don't get the error.

Foxfire_ posted:

This seems like a very bad idea. You are on-purpose trying to store a snapshot of the program in your database state?

This is what I'm trying to do.

I'm reading a game book, in this case one of the Fighting Fantasy game books. I know what section I'm on, what my character's stats are, what equipment I have, and so forth. For clarity, I'm going to call this the game state: everything I'd need to remember if I put the book down for a while and wanted to pick it back up and keep playing. Given the choices the game book presents me in a given game state, how likely am I to win if I choose each one? Indeed, how likely am I to win the book as a whole with perfect play?

To answer this question, I'm putting the game state into some form I can write down. I'm calling it the state string for now, though it may not wind up being an actual Python string. The important thing is that it doesn't matter how, exactly, I got to a specific game state; if the game state is the same, the state string should be as well. I consider {"foo": 1, "bar": 2} and {"bar": 2, "foo": 1} to be the same state, but json.dumps will produce different output for each unless I specify sort_keys = True. So I do.
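
As a minimal sketch of that idea (the helper name state_string and the scores table are illustrative assumptions, not the real program):

```python
import json

def state_string(game_state):
    """Return a canonical string for a game state, independent of key order."""
    return json.dumps(game_state, sort_keys=True)

a = {"foo": 1, "bar": 2}
b = {"bar": 2, "foo": 1}

# Insertion order differs, so the default dumps output differs too.
print(json.dumps(a) == json.dumps(b))      # False
print(state_string(a) == state_string(b))  # True

# Canonical strings can then be used as keys in a scoring table.
scores = {state_string(a): 1.0}
print(scores[state_string(b)])             # 1.0
```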

At any rate, in order to score the choices I'm looking at from a given game state, I need to know what those choices are and what game states they lead to. As I play, eventually I'll reach a game state that has no choices - either I've won or I've lost. In my scoring database, I mark the corresponding state string with a score of 1 or 0, respectively. (This is why I need state strings in the first place: to use as keys.) Any choice that leads to a game state has that game state's score.

Once I've scored all of the game states that can be reached from a given game state, I can then score that game state. If the choice is mine to make, obviously I'll make the choice with the highest score, so that's the score of this game state as well. However, in Fighting Fantasy books, often you have to roll some dice to make a decision. So I take the weighted average of the scores of the choices, depending on how likely each choice is to come up, and make that the score of this game state.

By doing this, I can work my way through the book, giving a score to every possible game state, including the very first - which is the probability of winning the book with perfect play.
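
The scoring scheme described above could be sketched roughly like this. The toy BOOK table, its "choose"/"roll"/"win"/"lose" labels, and the function names are all invented for the example, and it assumes the book has no loops back to an earlier game state:

```python
import json

# Hypothetical toy book: section -> (kind, options).
# "choose" options are next sections; "roll" options are (probability, next section).
BOOK = {
    1: ("choose", [2, 3]),
    2: ("roll", [(0.5, 4), (0.5, 5)]),
    3: ("roll", [(0.25, 4), (0.75, 5)]),
    4: ("win", []),
    5: ("lose", []),
}

scores = {}  # state string -> probability of winning with perfect play

def key(state):
    """Canonical state string used as the database key."""
    return json.dumps(state, sort_keys=True)

def score(state):
    """Score a game state, reusing any score already recorded for it."""
    k = key(state)
    if k in scores:
        return scores[k]
    kind, options = BOOK[state["section"]]
    if kind == "win":
        result = 1.0
    elif kind == "lose":
        result = 0.0
    elif kind == "choose":
        # The player picks the best option.
        result = max(score({"section": nxt}) for nxt in options)
    else:
        # Dice decide: weighted average of the outcomes.
        result = sum(p * score({"section": nxt}) for p, nxt in options)
    scores[k] = result
    return result

print(score({"section": 1}))  # 0.5
```

A real game state would carry stats and equipment as well as the section, but the max-versus-weighted-average split stays the same.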

However, I'm not just trying to get that number. Once I've solved the game book, I then want to show it off in a Let's Play. For that, I need to be able to show the information for any arbitrary (but actually reachable) game state: the text of the section (which we didn't previously care about), the available choices, and their scores. So right now, I'm looking at three programs:
  • A game book-specific program. Since I'm using the fan-made game book "Impudent Peasant!" as a test case, this is peasant.py. The primary job of this program is to take a state string and return the text of the corresponding section, the choices that are available from that section, and whether or not the choice is made randomly.
  • The solver program. Using the game book program, this will create a database of the scores of every possible game state.
  • The playing program. Using the game book program and the output of the solver program, this will allow me to actually play the game book. At each choice, it will provide neatly BBcoded output that I can paste into an LP, with each choice marked with the probability of success.
As for storing functions in a state string, that's simply my latest attempt to approach coding the solver program nicely. My initial approach was simply having a very large function with a long 'if we're on this section, do this, elif we're on this section, do this, elif ... else something has gone wrong'. Folks said I should give each section its own function, but then the main function would be 'if we're on this section, call this function, elif we're on this section, call this function, elif ... else something has gone wrong'. That doesn't seem much better to me. So my next thought was to split the game state into the section function (the function we use to determine what to do on this section) and the section data (everything but what section we're on, which is implicit in which section function we're using) and store both in a state string. Which json complained about.

Which brings me to this:

QuarkJets posted:

What if you just write the function name instead of the function? That should give you a json-serializable dict. I would avoid using pickle at all

When you write the dictionary, convert functions to function names (you should be able to use the function's __name__ attribute)

When you load the dictionary remap function names to functions. You're using a specific key for this already so these cases should be easy to find.

I follow you on steps one and two, but not three. Is there some way to get from a function name to a function without having to manually create a link, be it a case-select or a dictionary, for every single function? Can I do some sort of call_function(name, args)?

...actually, hold that thought. (Googles)

So are you suggesting I do locals()[name](args)? Because I like that idea. In fact, I suggested:

FredMSloniker posted:

return vars().get(section, lambda a: None)(state_string)

But you said:

QuarkJets posted:

I don't think that this does what you want it to do, and it'd be bad code even if it did work. Don't do this. If you absolutely need to be able to map function string names to functions, create a dict that contains those mappings and then refer to that.

It'd be better still if you didn't need to refer to a lookup table of function names at all and instead just called the functions directly.

So now I'm kind of lost as to what you think I should actually do.

Also:

QuarkJets posted:


On truthiness, you've got the construction mixed up. It should be
(true result) if (condition) else (false result)
That may be why you're seeing unexpected results. For instance
x = "" if n == 1 else "s"
This will set x to "" if n equals 1, otherwise x is set to "s"

My example was actually x = (n == 1) and "" or "s", but thanks for the correction on the if-else construction.

QuarkJets
Sep 8, 2008


f-strings don't handle backslashes inside of f-string expressions, that's the problem that you're seeing. You can solve this by just assigning that part of the expression to its own variable (e.g. the expression inside of the expression). In general you should try to avoid writing expressions with any amount of complexity in f-strings; you can do it but it makes your code way harder to read

What you wrote was this:
Python code:
new_string = f'[quote={quote_header}]\n{text_block}\n{is_additional_info and f"{additional_info}\n" or ""}[/quote]'
It'd be better to write something like this:

Python code:
new_additional_info = f"{additional_info}\n" if is_additional_info else ""
new_string = f'[quote={quote_header}]\n{text_block}\n{new_additional_info}[/quote]'
Or broken up over multiple lines:

Python code:
new_additional_info = f"{additional_info}\n" if is_additional_info else ""
new_string = (f'[quote={quote_header}]\n'
              f'{text_block}\n'
              f'{new_additional_info}[/quote]')
Or as a list join:
Python code:
new_strings = [f'[quote={quote_header}]', text_block]
if is_additional_info:
    new_strings.append(additional_info)
new_strings.append("[/quote]")
new_string = "\n".join(new_strings)

QuarkJets
Sep 8, 2008

FredMSloniker posted:

So now I'm kind of lost as to what you think I should actually do.

I think that you should make that mapping yourself rather than directly looking up functions out of locals(). The problem with using locals is that it gives you everything, even functions or variables that you might not want to use in your mapping. Usually you shouldn't be using locals() at all, but this is one case where it makes sense to use a filtered form of it to define this mapping. You could put this somewhere near the bottom of your code, after all functions have been defined:

Python code:
story_functions = {name: value for name, value in locals().items() if callable(value) and not name.startswith('_')}
Explaining what is going on here:
1. We're trying to create mappings between names and values
2. Iterating over all items (i.e. the key/value pairs) returned by locals()
3. Only keep pairs where the value is callable (e.g. a function, a class, or an instance of a class with the __call__ method defined). We use the built-in callable() function for this check
4. Only keep pairs where the name does not start with "_". Names that start with an underscore are considered private (but being Python, it's private in the loosest possible sense). This means you can create this mapping without accidentally including things like __loader__ or other functions that you may decide later shouldn't be included (e.g. if you had roll_luck, roll_strength, etc. calling a common roll_skills function, you could define its name as _roll_skills to exclude it from this mapping)

If you decided to label all of your functions in a common way (e.g. by having them start with the word "story" or something) then you could make this dict comprehension even more precise.

Either way, I don't think that you should be serializing code state, but I think story state should be fine and accomplishable with json. Using serialized function names instead of serialized functions is a lot cleaner and lets you represent everything in nice clean json. Even just importing pickle is often a code smell, it's very rare for it to make sense for a program to use pickle. The serializations that you want I think are better accomplished with function names, especially since IIRC you wanted these serializations to be human readable later

quote:

My example was actually x = (n == 1) and "" or "s", but thanks for the correction on the if-else construction.

This is not if-else assignment. What you're doing here is truthy assignment, which is why it's giving you results that you don't expect.

Breaking it down:
1. `n == 1` is evaluating to a bool (True or False)
2. `(n == 1) and "" or "s"` is a truthy evaluation. This isn't giving you either "" or "s" depending on the state of n, it's always going to give you "s": since "" is falsy, the `and` part is never truthy and the `or` always falls through to "s".

This is not an effective way to assign values to things, as it'd require pretty deep understanding of both Python's order of operations and how truthy comparisons work. What you probably want is:
x = "" if (n == 1) else "s"
This sets x to "" if n == 1, otherwise it sets x to "s"
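
A small side-by-side sketch of the two forms, using hypothetical function names, for anyone who wants to see the difference run:

```python
def plural_suffix_and_or(n):
    # "and/or" idiom: breaks here because "" is falsy,
    # so the `or` always falls through to "s".
    return (n == 1) and "" or "s"

def plural_suffix(n):
    # Conditional expression: picks a branch based only on the condition.
    return "" if n == 1 else "s"

for n in (0, 1, 2):
    print(n, repr(plural_suffix_and_or(n)), repr(plural_suffix(n)))
```

The and/or version returns "s" for every n, while the conditional expression returns "" only when n is 1.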

QuarkJets fucked around with this message at 05:40 on Oct 8, 2022

lazerwolf
Dec 22, 2009

Orange and Black
I have a situation where I have several workflows that require different input formats. Is creating a dict of workflow: function the best way to implement this?
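
For reference, a minimal sketch of the dict-of-workflow-to-function idea. The workflow names and loader functions are invented for the example, and this is one common way to do it rather than necessarily the best one for every case:

```python
import csv
import io
import json

def load_csv_input(text):
    """Parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def load_json_input(text):
    """Parse JSON text into Python objects."""
    return json.loads(text)

# Hypothetical workflow names mapped to the function that reads their input.
WORKFLOW_LOADERS = {
    "csv_workflow": load_csv_input,
    "json_workflow": load_json_input,
}

def load_input(workflow, text):
    """Look up the loader for a workflow and apply it to the raw text."""
    try:
        loader = WORKFLOW_LOADERS[workflow]
    except KeyError:
        raise ValueError(f"unknown workflow: {workflow!r}") from None
    return loader(text)

rows = load_input("csv_workflow", "name,score\nfoo,1\n")
print(rows)
```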

SirPablo
May 1, 2004

Pillbug
Since this thread seems more active than the AI thread and Data Science thread, figured I would ask here. I have a science background (Master's degree) and have been working with python for 10 years now. I'm trying to get more into ML, maybe even to switch careers. Are there any certification programs that employers actually value? I found some options listed here and here, but not sure if any of these are considered worthwhile. I'm not completely opposed to a graduate certificate from a university, but is that even necessary or just throwing money away? Any thoughts or recommendations?

CarForumPoster
Jun 26, 2013

⚡POWER⚡

SirPablo posted:

Since this thread seems more active than the AI thread and Data Science thread, figured I would ask here. I have a science background (Master's degree) and have been working with python for 10 years now. I'm trying to get more into ML, maybe even to switch careers. Are there any certification programs that employers actually value? I found some options listed here and here, but not sure if any of these are considered worthwhile. I'm not completely opposed to a graduate certificate from a university, but is that even necessary or just throwing money away? Any thoughts or recommendations?

Do projects and make them accessible to laypeople on github, then heavily promote your github on your resume, linkedin, personal webpage, etc. Contribute to OSS projects.

Accessible to laypeople means deploying the model as a single-page webapp with a working demo and pretty graphs they can interact with and which tell the story of the project. It also means a good readme that has pictures or a GIF of it working and an explanation of what it is.

FredMSloniker
Jan 2, 2008

Why, yes, I do like Kirby games.

QuarkJets posted:

f-strings don't handle backslashes inside of f-string expressions, that's the problem that you're seeing. You can solve this by just assigning that part of the expression to its own variable (e.g. the expression inside of the expression). In general you should try to avoid writing expressions with any amount of complexity in f-strings; you can do it but it makes your code way harder to read

Yeah, I got that.

QuarkJets posted:

This is not if-else assignment. What you're doing here is truthy assignment, which is why it's giving you results that you don't expect.

That's what I said, actually. I was using the truthy assignment because I didn't know about if-else assignment. I appreciate your correction on the structure of if-else assignment, though.

QuarkJets posted:

I think that you should make that mapping yourself rather than directly looking up functions out of locals().

So you think I should do something like:

Python code:
def section_1(section_data):
    pass  # stuff
def section_2(section_data):
    pass  # stuff
def section_3(section_data):
    pass  # stuff
# and so on until
def section_1000(section_data):
    pass  # stuff

section_dictionary = {
    "section_1": section_1,
    "section_2": section_2,
    "section_3": section_3,
    # and so on until
    "section_1000": section_1000
}

def process_state(game_state):
    section_dictionary[game_state["section_number"]](game_state["section_data"])

Or am I misunderstanding you? Because I'm really not seeing how a long list of functions followed by a long dictionary I have to roll by hand is better than:

Python code:
def section_1(section_data):
    pass  # stuff
def section_2(section_data):
    pass  # stuff
def section_3(section_data):
    pass  # stuff
# and so on until
def section_1000(section_data):
    pass  # stuff

section_dictionary = {name: value for name, value in locals().items() if callable(value) and name.startswith("section")}

def process_state(game_state):
    section_dictionary[game_state["section_number"]](game_state["section_data"])

QuarkJets posted:

Usually you shouldn't be using locals() at all,

Why not? What are the hazards of using it, and why does it exist if I shouldn't use it?

QuarkJets
Sep 8, 2008

SirPablo posted:

Since this thread seems more active than the AI thread and Data Science thread, figured I would ask here. I have a science background (Master's degree) and have been working with python for 10 years now. I'm trying to get more into ML, maybe even to switch careers. Are there any certification programs that employers actually value? I found some options listed here and here, but not sure if any of these are considered worthwhile. I'm not completely opposed to a graduate certificate from a university, but is that even necessary or just throwing money away? Any thoughts or recommendations?

Are you interested in an entry-level position or a position that's more in-line with having 10 years of Python experience? If the former, you could probably just start applying to places at any time, employers are desperate for practitioners and if you can claim any competency at all in software development or data analysis then they will train you in the tools that they're using - but you'll have all of the downsides of an entry-level position, so if you're already in a career then this might not be a great option. If you want a more senior position, a certificate program could help you learn the basics of ML specifically, then start applying for the positions you want to see if anyone bites, then if no one does create a github project to toy around with real projects, then apply to places again, and just repeat that loop.

YMMV as always, but experience > certification programs > anything else on your resume IMO. I speak as someone who has been on hiring boards for ML positions. The absolute best thing you can do is what CarForumPoster laid out, write some actual ML code and display the results. If you have little/no experience then a certification could be the first step in figuring out how to do that effectively and may even be enough to get hired.

In terms of which certificate programs may look better on your resume, I don't think that a $5000 online course is necessarily better than something that's low-cost, but name recognition (from a university) does count for a little. MIT's Open Learning catalog is what I'd check first, they have a big catalog of stuff that is literally free and will sometimes charge just a little bit for a certificate but I'm not sure how much ML content is there.
https://openlearning.mit.edu/courses-programs/mitx-courses?f%5B0%5D=course_availability%3A62

Seventh Arrow
Jan 26, 2005

SirPablo posted:

Since this thread seems more active than the AI thread and Data Science thread, figured I would ask here. I have a science background (Master's degree) and have been working with python for 10 years now. I'm trying to get more into ML, maybe even to switch careers. Are there any certification programs that employers actually value? I found some options listed here and here, but not sure if any of these are considered worthwhile. I'm not completely opposed to a graduate certificate from a university, but is that even necessary or just throwing money away? Any thoughts or recommendations?

All of these certifications are pretty useless from a resume/interview standpoint. As mentioned, having some projects on github would be good. If you want to pass a data science interview, you're probably going to want to grind leetcode-style puzzles because that's mainly what your coding test is going to consist of. You will also probably need to learn SQL as well.

edit: ok it might be a bit much to say that certifications are useless in toto, but from my own experience I would get one (1) cloud certification (AWS/Azure/GCP) and leave it at that.

Seventh Arrow fucked around with this message at 19:35 on Oct 8, 2022

QuarkJets
Sep 8, 2008

FredMSloniker posted:

So you think I should do something like:

Python code:
def section_1(section_data):
    pass  # stuff
def section_2(section_data):
    pass  # stuff
def section_3(section_data):
    pass  # stuff
# and so on until
def section_1000(section_data):
    pass  # stuff

section_dictionary = {
    "section_1": section_1,
    "section_2": section_2,
    "section_3": section_3,
    # and so on until
    "section_1000": section_1000
}

def process_state(game_state):
    section_dictionary[game_state["section_number"]](game_state["section_data"])

Or am I misunderstanding you? Because I'm really not seeing how a long list of functions followed by a long dictionary I have to roll by hand is better than:

Python code:
def section_1(section_data):
    pass  # stuff
def section_2(section_data):
    pass  # stuff
def section_3(section_data):
    pass  # stuff
# and so on until
def section_1000(section_data):
    pass  # stuff

section_dictionary = {name: value for name, value in locals().items() if callable(value) and name.startswith("section")}

def process_state(game_state):
    section_dictionary[game_state["section_number"]](game_state["section_data"])

I think that second form is fine, I just want to discourage direct lookup out of locals(): locals()[game_state["section_number"]](...) would be less good

The second form is a pre-filtered version of locals() defined in the global scope while the module is being imported, so it is way more predictable than directly looking up stuff out of locals() from inside of a function

quote:

Why not? What are the hazards of using it, and why does it exist if I shouldn't use it?

locals() will give you a lookup table of the names bound in the current scope at the time of the call. At module level that's the same table as globals(), but inside a function it only contains that function's local variables, so what you get back depends on where you call it. Consider the hypothetical where you didn't define a pre-filtered function mapping and just tried to look up a function name directly out of a call to locals() from inside a function: your module-level section functions wouldn't be in there at all, and the lookup would fail with a KeyError. Then, consider a scenario where you had temporarily redefined the function "section_2" inside of "section_3" (which you should not do, but we'll hypothesize; maybe it was done accidentally); then a call to locals() inside "section_3" would find that locally-defined version of the function and you'd look up the wrong object, even though both objects have the same name.

The reason that most projects shouldn't use locals() is that it's usually not necessary, so it's a code smell. But a code smell does not mean that you should never use it - it means that you should carefully assess whether it's necessary. This is a case where I think it makes sense to use it
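
A small sketch of what locals() returns in different places. All of the function names here are hypothetical, and the "shadowed" lambda stands in for an accidental local redefinition:

```python
def section_2(data):
    return "real section 2"

def lookup_inside_function(name):
    # Here locals() only holds this function's own variables (just `name`),
    # so module-level functions are not found.
    return locals().get(name)

def lookup_with_shadow(name):
    # Accidental local redefinition that hides the module-level section_2.
    section_2 = lambda data: "shadowed section 2"
    return locals().get(name)

print(lookup_inside_function("section_2"))   # None
print(lookup_with_shadow("section_2")("x"))  # shadowed section 2
print(section_2("x"))                        # real section 2
```

A mapping built once at module level avoids both surprises, because it is filled in a single known scope.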

SirPablo
May 1, 2004

Pillbug
Thanks for the recommendations, I'll focus on improving my GitHub presentation (which is zero!). My experience is certainly more hacky I feel, being one of the few coders in a scientific office where most of the software used by the staff is handed down from a contractor. Most of my work is quality-of-life type projects, somewhat small and only beneficial to what my office does.

FredMSloniker
Jan 2, 2008

Why, yes, I do like Kirby games.

QuarkJets posted:

The reason that most projects shouldn't use locals() is that it's usually not necessary, so it's a code smell. But a code smell does not mean that you should never use it - it means that you should carefully assess whether it's necessary. This is a case where I think it makes sense to use it

That's an interesting term, code smell. I can glark it, though. At any rate, I'll be back to the thread once I have more to show. Thanks for the help!

e: quick question. It's okay if I define function_dictionary after the functions that rely on it as long as I don't actually call any of those functions before then, right? I'm under the impression that Python isn't picky about that sort of thing.

FredMSloniker fucked around with this message at 22:00 on Oct 8, 2022

QuarkJets
Sep 8, 2008

FredMSloniker posted:

That's an interesting term, code smell. I can glark it, though. At any rate, I'll be back to the thread once I have more to show. Thanks for the help!

e: quick question. It's okay if I define function_dictionary after the functions that rely on it as long as I don't actually call any of those functions before then, right? I'm under the impression that Python isn't picky about that sort of thing.

It doesn't really matter whether the functions have been called yet or not, you just have to define function_dictionary after you've finished defining (or importing) all of the functions that you want it to contain.

The code inside of those functions won't execute until you actually call them, that's correct; not sure if that's also what you're asking. So storing the function names and pointers in a dict should be fine. You can also do goofy stuff like this:

Python code:
def temp():
    print(x)
x = 5
temp()
What I wrote here is pretty bad form, you usually shouldn't write code like this because it's hard for people to read. But sometimes it comes in handy.

FredMSloniker
Jan 2, 2008

Why, yes, I do like Kirby games.

QuarkJets posted:

The code inside of those functions won't execute until you actually call them, that's correct; not sure if that's also what you're asking.

What I mean is, in (say) Lua, if I had a function like

Lua code:
local look_at_thing = function(s)
    return my_big_table[s]
end

local my_big_table = {["foo"] = "bar", ["baz"] = "bow"}

print(look_at_thing("foo"))

I'd get the error attempt to index a nil value (global 'my_big_table'). It works if I take off the local, but in Lua, I want to keep the scope of things as small as possible, yes? But Python has only the vaguest notions of scope, from what I understand (is there any way to keep an import from grabbing everything in a module?), so as long as my_big_table is defined before I call look_at_thing, I'm good?
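
For comparison, a rough Python version of that Lua snippet, keeping the same names. It's only a sketch of the lookup timing, not a statement about how the real program should be laid out:

```python
def look_at_thing(s):
    # my_big_table is looked up when this line runs,
    # not when the function is defined.
    return my_big_table[s]

try:
    look_at_thing("foo")
except NameError as error:
    print("too early:", error)

my_big_table = {"foo": "bar", "baz": "bow"}

print(look_at_thing("foo"))  # bar
```

So the table only has to exist by the time the function is called. On the import side, `import module` binds just the module name, and `from module import look_at_thing` binds just that one name.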

CarForumPoster
Jun 26, 2013

⚡POWER⚡

SirPablo posted:

Thanks for the recommendations, I'll focus on improving my GitHub presentation (which is zero!). My experience is certainly more hacky I feel, being one of the few coders in a scientific office where most of the software used by the staff is handed down from a contractor. Most of my work is quality-of-life type projects, somewhat small and only beneficial to what my office does.

Here are some example single-page apps to give you ideas.


Reddit post with a Network Diagram of NSFW subreddits and their moderators + an explanation and an interactive map - the interactive map

Plotly Dash (i.e. Python-only) Gallery
