Jose Cuervo
Aug 25, 2004

Foxfire_ posted:

You have a meaningless-outside-your-application patient ID that uniquely identifies a patient. All of your deidentified tables reference that for knowing what patient a row is about. Then you have a table that maps from the patient ID to all the PHI about that patient (MRN, names, DOB, ...). That table could be either inside the same database or elsewhere if you wanted to segregate the dataset into PHI and deidentified parts.

You might also need multiple PHI tables for some stuff, like if you want to store multiple different-system/different-site MRNs for one actual person. For each table in the schema, you should be able to classify it as PHI or not and shouldn't be mixing PHI and not-PHI together in the same table. If you've done stuff right, you ought to be able to hand someone all your non-PHI tables and they'd be able to follow information about a particular subject, but not have any PHI about them.
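For concreteness, here is a minimal sketch of the split being described, assuming SQLAlchemy declarative models and the usual declarative Base; the table and column names are illustrative, not from the thread:

Python code:
from sqlalchemy import Column, Date, ForeignKey, Integer, String

class Patient(Base):
    __tablename__ = 'patients'

    # Surrogate key with no meaning outside the application;
    # every deidentified table references this.
    pid = Column(Integer, primary_key=True)


class PatientPHI(Base):
    __tablename__ = 'patient_phi'

    # All PHI lives here (and only here), keyed by pid; this table
    # can sit in the same database or a separate, segregated one.
    pid = Column(Integer, ForeignKey('patients.pid'), primary_key=True)
    mrn = Column(Integer)
    health_system = Column(String)
    first_name = Column(String)
    last_name = Column(String)
    date_of_birth = Column(Date)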

I guess I am struggling to understand how I would structure this. Based on what has been said, I would only need the MRN of the patient and the health system the patient belongs to in order to uniquely identify them. In the definition of the Patient model below I have set the primary key to be a compound key (mrn and health_system).

Python code:
class Patient(Base):
    __tablename__ = 'patients'

    mrn = Column(Integer, primary_key=True)
    health_system = Column(String, primary_key=True)
    pid = Column(Integer)
    first_name = Column(String)
    last_name = Column(String)
    date_of_birth = Column(Date) 

    diabetic = relationship('Diabetic', 
                            back_populates='patient', 
                            cascade='delete, all')


class Diabetic(Base):
    __tablename__ = 'diabetic'
    
    pid = Column(Integer, ForeignKey('patients.pid'), primary_key=True)
    date = Column(Date, primary_key=True)
    diabetic = Column(Boolean)  
    
    patient = relationship('Patient', back_populates='diabetic')   
However,
1. Where do I get the value for pid from? Do I need to generate that separately each time I want to enter a new patient?
2. Suppose I now have data from a given patient (I know their MRN and health system). In order to get their pid, do I first perform a search in Patient to find the pid, and then use the pid to insert their data into the relevant tables?


EDIT: If the above is correct, I just tried adding data to the database while adding "autoincrement=True" to the pid:

Python code:
class Patient(Base):
    __tablename__ = 'patients'

    health_system = Column(String, primary_key=True)
    mrn = Column(Integer, primary_key=True)   
    pid = Column(Integer, autoincrement=True)
...
However, this does not automatically add a new unique integer to the pid column each time a new patient is added. Not sure what I am doing incorrectly here.

Jose Cuervo fucked around with this message at 20:55 on Mar 24, 2021


Jose Cuervo
Aug 25, 2004

Hollow Talk posted:

Postgres even has a UUID type, but sqlite doesn't support that. :saddowns:

Ok, so to get around the autoincrement issue in SQLite I decided to simply generate the unique ID (integer) myself (see check_add_patient below).

I am now working on trying to put everything together with code to add data to the database:
Python code:
def check_add_patient(provider, MRN, df, session):
    """
    Checks the 'patients' table to see if the
    patient exists and adds them if they do not.
    Returns the patient.
    """
    pt = session.query(ddb.Patient) \
                .filter(and_(ddb.Patient.provider == provider, 
                             ddb.Patient.mrn == MRN)) \
                .first()
    if pt:
        return pt
    
    session.add(ddb.Patient(mrn=int(MRN), provider=provider, 
                            pid=session.query(ddb.Patient).count() + 1, 
                            first_name=df.iloc[0]['First_Name'],
                            last_name=df.iloc[0]['Last_Name']))
    session.commit()
    return session.query(ddb.Patient) \
                .filter(and_(ddb.Patient.provider == provider, 
                             ddb.Patient.mrn == MRN)) \
                .first()
    

def add_dose_data(provider, MRN, df, session):
    pt = check_add_patient(provider, MRN, df, session)
    print(pt)

    dosing = []
    for _, row in df.iterrows():
        pt.dosing.append(ddb.Dose(date=row['Admin_Date'],
                                  med_name='ESA',
                                  units='Units',
                                  value=row['ESA_Dose'],
                                  description=row['Description']))
        session.commit()


Session = sessionmaker(bind=engine)
for MRN, MRN_df in esa.groupby('MRN'):
    add_dose_data('Health_System', MRN, MRN_df, Session())
The check_add_patient function successfully checks whether a patient is already in the 'patients' table, and does not throw an IntegrityError when I try to add someone who already exists. Is the way I have written that function un-pythonic or not the best way for a database?

In the add_dose_data function, should I only perform a session.commit() call once I have added all the rows of data?

I have not been able to figure out how to use context managers with this code - the documentation seems to say I should be using a context manager... Any pointers?

Finally, is a try-except block the standard way to deal with the IntegrityError which is thrown when I try to add a row of data which has previously been added?

Thanks for all of the help so far.

Jose Cuervo
Aug 25, 2004

QuarkJets posted:

The simplest way to use a context manager here is to open a single session outside of your for loop with one. Example:

Python code:
Session = sessionmaker(bind=engine)
with Session() as session:
    for MRN, MRN_df in esa.groupby('MRN'):
        add_dose_data('Health_System', MRN, MRN_df, session)
You were opening 1 session per patient, and you weren't closing them. With a context manager you can just open 1 session, do all of your work, and then the context manager closes it for you. Or if you want 1 session per patient, you can just use the context manager inside of the for loop instead of outside of it.

To get even finer-grained than that, you could pass the Session (e.g. the session maker) and create sessions on the spot with a context manager using Session.begin(); then when the context closes, the transaction is also committed for you, or a rollback occurs if an exception is raised. Much more slick than doing your own commits or rollbacks. Example:

Python code:
# Note: this function expects a session maker now
# e.g. Session instead of Session()
# You should probably call it something else for easy reading, like session_maker
def check_add_patient(provider, MRN, df, Session):
    ...
    with Session.begin() as session:
        session.add(ddb.Patient(mrn=int(MRN), provider=provider, 
                                pid=session.query(ddb.Patient).count() + 1, 
                                first_name=df.iloc[0]['First_Name'],
                                last_name=df.iloc[0]['Last_Name']))
    # Session is committed and closed here, no need to commit
    ...

So (I think) I tried what you suggested, by wrapping the call to check_add_patient in a context manager and wrapping the .add() in a context manager as well:
Python code:
def check_add_patient(provider, MRN, df, session):
    """
    Checks the 'patients' table to see if the
    patient exists and adds them if they do not.
    Returns the patient.
    """
    pt = session.query(ddb.Patient) \
                        .filter(and_(ddb.Patient.provider == provider, 
                                     ddb.Patient.mrn == MRN)) \
                        .first()
    if pt:
        return pt
    
    session.add(ddb.Patient(mrn=int(MRN), provider=provider, 
                            pid=session.query(ddb.Patient).count() + 1, 
                            first_name=df.iloc[0]['First_Name'],
                            last_name=df.iloc[0]['Last_Name']))
    session.commit()
    return session.query(ddb.Patient) \
                    .filter(and_(ddb.Patient.provider == provider, 
                                 ddb.Patient.mrn == MRN)) \
                    .first()
    

def add_esa_data(provider, MRN, df, session_maker):
    """
    """
    with session_maker.begin() as session:
        pt = check_add_patient(provider, MRN, df, session)
    print(pt)

    dosing = []
    for _, row in df.iterrows():
        with session_maker.begin() as session:
            pt.dosing.append(ddb.Dose(date=row['Admin_Date'],
                                      med_name='ESA',
                                      units='Units',
                                      value=row['ESA_Dose'],
                                      description=row['Description']))


session_maker = sessionmaker(bind=engine)
for MRN, MRN_df in esa.groupby('MRN'):
    print(MRN)
    add_esa_data('Health_System', MRN, MRN_df, session_maker)
But running this produces a ResourceClosedError, which as far as I can tell is referring to the session being closed? But where is it being closed?

Jose Cuervo
Aug 25, 2004
I have written a function in Python named load_clustering. I have used this function in a number of places in other code. I now realise that a more appropriate name for this function is load_classification. Is there a way to modify the name of the function so that from now on I can use it as load_classification, but keep the old name as well so that the older code which uses load_clustering does not break?
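A minimal sketch of the aliasing being asked about: rename the function and keep the old name as a thin wrapper (the deprecation warning is optional, but it nudges callers to migrate):

Python code:
import warnings

def load_classification(path):
    """The renamed function (body elided here)."""
    ...

def load_clustering(*args, **kwargs):
    # Backwards-compatible alias for old callers.
    warnings.warn("load_clustering is deprecated; use load_classification",
                  DeprecationWarning, stacklevel=2)
    return load_classification(*args, **kwargs)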

Jose Cuervo
Aug 25, 2004

CarForumPoster posted:

Pycharm has refactoring options that I'd bet cover this use case by simply replacing that function where called.

I would normally do that but another person is using the code as well, and for the sake of argument assume I cannot get them to refactor the code which they are using.

Jose Cuervo
Aug 25, 2004
Okay, it seems like the right thing to do is refactoring and getting the other person to start using the new code. Will do. Thanks.

Jose Cuervo
Aug 25, 2004
In JupyterLab how would I change the terminal from Windows PowerShell to the Anaconda prompt? Been searching for a while and I thought

conda init

might be what does it, but that does not seem to work.

Jose Cuervo
Aug 25, 2004

CarForumPoster posted:

I use Jupyter Notebook but the general way you get to the "Anaconda" terminal from the command prompt is by running activate (activate.bat on Windows). This isn't a JupyterLab-specific answer though. I have a batch file that runs a Jupyter notebook on a schedule this way.

Explained a bit more: https://stackoverflow.com/questions/46305569/how-to-make-batch-files-run-in-anaconda-prompt

Got it, thanks.

Jose Cuervo
Aug 25, 2004
Not saying this is the optimal way to do this, but I have a file params.py which looks like this:

Python code:
#params.py
top_threshold = None
lag = None
delta = None
In short, it contains the various parameters for an algorithm (the actual list is about 15 parameters). Now I would like to make several runs of the algorithm using different parameter values. Using a Jupyter notebook, I have imported the module in the first cell, and then in the second cell I assign the new parameter values as follows:


Python code:
import params

params.top_threshold = 15
params.lag = 2
params.delta = 3.9
However, when I run the actual algorithm by calling it in subsequent cells, an error is thrown in the code whenever one of the parameters is used because the parameter is still None. Why?

Jose Cuervo
Aug 25, 2004
Based on the replies, what I wrote was confusing.

I have a module named params.py which stores the variables used by an algorithm, so that in the algorithm (defined in another module named algorithm.py, which imports params.py) I can simply write params.lag whenever I wish to use the lag variable. In my Jupyter notebook I import both params.py and algorithm.py at the very top, then in the following cell set the value of each parameter, and then finally call the algorithm (e.g., algorithm.run_alg()).

I think Foxfire_'s explanation is what is happening and why this is not working. I also agree with QuarkJets that using a dataclass is what I should do, and I will just pass the parameters as an argument.
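A minimal sketch of that dataclass approach, with field names mirroring params.py and run_alg assumed to accept the parameters as an argument:

Python code:
from dataclasses import dataclass
from typing import Optional

import algorithm  # the module described above

@dataclass
class Params:
    top_threshold: Optional[int] = None
    lag: Optional[int] = None
    delta: Optional[float] = None

# In the notebook: build the parameters and pass them explicitly,
# instead of mutating module-level globals.
params = Params(top_threshold=15, lag=2, delta=3.9)
result = algorithm.run_alg(params)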

Jose Cuervo
Aug 25, 2004
I am looking through free text strings (short hospitalization reasons) for the word 'parathyroidectomy'. I am able to do simple string matching (e.g., looking for 'para' in example_string), but this type of searching assumes that the word has been spelled correctly and will not catch paarthyroidectomy, even though that would be a relevant result. Is there any library which would help me search these strings for misspelled matches?
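For what it's worth, the standard library's difflib can do this kind of approximate matching; a minimal sketch, where the cutoff value would need tuning on real data:

Python code:
import difflib

reasons = ['paarthyroidectomy scheduled', 'chest pain',
           'admitted for parathyroidectomy']
target = 'parathyroidectomy'

for reason in reasons:
    # get_close_matches returns the words whose similarity ratio
    # against target is at least cutoff.
    words = reason.lower().split()
    if difflib.get_close_matches(target, words, n=1, cutoff=0.8):
        print(reason)  # prints the first and third reasons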

Jose Cuervo
Aug 25, 2004
I have a large amount of blood glucose time series data from here: https://public.jaeb.org/datasets/diabetes. Each row of the time series data has at least 4 columns: an identifier of which trial the data belongs to, an identifier of which subject from the trial the data belongs to, the date and time of the blood glucose reading, and the actual blood glucose value. Conservatively, there are over 20 million rows.

Given a list of subjects and a date time for each subject, I would like to be able to quickly obtain the next 2 hours of blood glucose data for each of those subjects, and I will be performing this query repeatedly.

I have no real experience with databases, but given the size of the data it seems like it would be best to store this data in a database and then run queries to extract the data I want.

Is an SQLite database with a single table the best way to store this data given the queries I want to run? The single table would have four columns (trial ID, subject ID, date time, and blood glucose value).

Jose Cuervo
Aug 25, 2004

Twerk from Home posted:

SQLite with a multi-column index on subject id and timestamp should work well. If you think you'll need to query by timestamp only but not subject only, reverse that order.

https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys

The subject ID is not unique across different trials, so I would want a multi-column index with trial ID, subject ID, and timestamp, right? Or would it be more efficient to create a new column (say trial_subject) which concatenates trial ID and subject ID which would be a unique identifier for each subject's data, and then use a multi-column index with trial_subject and timestamp?

I will never need to query by timestamp only because even subjects in the same trial participated over different weeks / months, depending on when they enrolled, so I will always be specifying the trial, the subject, and the timestamp.

Jose Cuervo
Aug 25, 2004

Foxfire_ posted:

Don't make a new column that's duplicating information from other columns. Having the primary key be (trial ID, subject ID, draw time timestamp) is fine if those are always unique. If those can be duplicated (e.g. a data entry error, or two measurements from the same tube of blood), just have a surrogate primary key (autocreated meaningless ascending integer). Throw some indexes at it later if it actually is slow, but I'd be surprised if it matters.

You shouldn't overthink this; it's not a particularly large database and it's going to be bottlenecked by python slowness, not the database. Even for a poorly designed schema, most of the time is going to be spent creating python object versions of the results, not looking up the results themselves.

Making up pessimistic numbers:
- 64 bytes text trial ID
- 64 bytes text subject ID
- 64 bytes timestamp because SQLite does datatypes bizarrely and will be storing text
- 8 bytes of floating point measurement
= 200 bytes per record * 20 million rows => ~4GB of database. That isn't particularly big.

So I am running into my first problems. I create my database and table as follows:
Python code:
import sqlite3

conn = sqlite3.connect('bg_time_series.db')
c = conn.cursor()
c.execute("""CREATE TABLE blood_glucose (
            trial_ID text,
            subject_ID text,
            date_time text,
            bg integer
            )""")
I have my time series data stored as CSV files (different CSV files for each of the trials), and I can read each CSV file into a pandas DataFrame. My plan then is to iterate over each row in the DataFrame and add that row of data to the database as follows:
Python code:
for _, row in df.iterrows():
    values_str = f'DCLP3, {row["SID"]}, {row["Date_Time"]}, {row["Value"]}'
    print(values_str)
    c.execute(f"INSERT INTO blood_glucose VALUES ({values_str})")
Which prints out the following error:
code:
DCLP3, DCLP3-001-001, 2017-12-02 00:05:33, 131

---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
<ipython-input-42-c97328299c38> in <module>
      2     values_str = f'DCLP3, {row["SID"]}, {row["Date_Time"]}, {row["Value"]}'
      3     print(values_str)
----> 4     c.execute(f"INSERT INTO blood_glucose VALUES ({values_str})")

OperationalError: near "00": syntax error
Not quite sure how to troubleshoot this.

Jose Cuervo fucked around with this message at 02:08 on Oct 31, 2022

Jose Cuervo
Aug 25, 2004

QuarkJets posted:

Don't form your own string of insert values, use placeholder binding instead:
https://docs.python.org/3/library/sqlite3.html#sqlite3-placeholders

Thanks. I am going to read through the documentation more carefully which is what I should have done in the first place.

Zoracle Zed posted:

Because DCLP3 isn't quoted, it's being interpreted as a variable, not a string. Ideally you want your database api to handle string escaping for you because 1) it's a huge headache and 2) the whole sql injection thing once you need to worry about malicious data, etc. Should look more like this (can't remember if sqlite uses %s or just % as the placeholder):

code:
c.execute("INSERT INTO blood_glucose VALUES (%s, %s, %s, %s)", ('DCLP3, row["SID"], row["Date_Time"], row["Value"]))
note instead there's also a pandas method for dumping a dataframe to sqlite: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

but... if you've already got the data in memory as a dataframe, are you sure you want/need to use a database at all, instead of working directly on the dataframe?

I have a single one of the trials in a DataFrame, but there are multiple trials and I estimate there will be about 20 million rows or more once everything is read in, and I figured this would be a good time to use a database.

Foxfire_ posted:

You're handing "INSERT INTO blood_glucose VALUES (DCLP3, DCLP3-001-001, 2017-12-02 00:05:33, 131)" with all the values substituted into a string to the database engine to execute. That's problematic because it has to parse the individual values back out, and its parsing expects to see strings in quotes, not bare. For this insert, it'd want:
code:
INSERT INTO blood_glucose VALUES ("DCLP3", "DCLP3-001-001", "2017-12-02 00:05:33", 131)
You still don't want to try to do that escaping yourself because it's hard and error-prone (if this were important and random people had access to it, bugs would also lead to security problems i.e. https://xkcd.com/327/).

Instead of handing a mixed data+instructions query to the database engine, you want to be telling it "Execute this purely instructions query (that is coming from a string constant) using these purely-data parameter bindings"

To do that with python's sqlite interface, you use either "?" for the parameter values, or ":somename" for a named one, then provide the bindings when you run the query. QuarkJets is pointing you at the documentation for it.

The query would look something like this:

Python code:
c.execute(
    "INSERT INTO blood_glucose VALUES (?, ?, ?, ?)", 
    ("DCLP3", row["SID"], row["Date_Time"], row["Value"])
)
or
Python code:
c.execute(
    "INSERT INTO blood_glucose VALUES (:study, :subject, :draw_time, :measurement)", 
    {
        "study": "DCLP3", 
        "subject": row["SID"], 
        "draw_time": row["Date_Time"], 
        "measurement": row["Value"])
    }
)
There is also executemany() that will insert multiple rows at once.

Other stuff:
- If you don't ever want to have null/blank values, add "NOT NULL" to the columns when you create the database. It will yell at you if you try to do an insert or edit that does that.
- If your sqlite installation is new enough, adding STRICT to the table creation options will turn on strict mode and disable a bunch of Javascript-esque "when I ask for something that doesn't make sense, apply type conversions instead of reporting an error"
- If it were me and this was something that I might keep around after a couple month gaps, I'd stick the units for the glucose measurement in the column name

This made things work just right. I am going to read through the documentation some more, but things seem to be working just fine now and I seem to be able to query with no issues.

Jose Cuervo
Aug 25, 2004
Finally getting a chance to keep working on this database. This is the code I have to set up and populate the database:
Python code:
import sqlite3
import pandas as pd

conn = sqlite3.connect('study_data.db')
c = conn.cursor()
c.execute("""CREATE TABLE blood_glucose (
            study_name text NOT NULL,
            SID text NOT NULL,
            date_time text NOT NULL,
            bg_mg_per_dL integer NOT NULL
            )""")

def insert_dataframe_data(study_name, df, c):
    for _, row in df.iterrows():
        c.execute(
            "INSERT INTO blood_glucose VALUES (:study_name, :SID, :date_time, :bg_mg_per_dL)", 
            {
                'study_name': study_name,
                'SID': row['SID'], 
                'date_time': row['Date_Time'], 
                'bg_mg_per_dL': row['Value']
            }
        )

for study_name in ['dclp3', 'wisdm', 'dclp5', 'sence']:
    df = pd.read_csv(f'/data/{study_name}_cgm_plus_features.csv')
    
    print(study_name)
    insert_dataframe_data(study_name, df, c)
    conn.commit()

conn.close()
This code generates a database with a single table with just over 25 million rows.
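As an aside, the executemany() that Foxfire_ mentioned would push the per-row loop down into the sqlite3 module; a sketch of the same insert function under that approach:

Python code:
def insert_dataframe_data(study_name, df, c):
    # Build all parameter tuples up front, then insert in one call.
    rows = [(study_name, row['SID'], row['Date_Time'], row['Value'])
            for _, row in df.iterrows()]
    c.executemany("INSERT INTO blood_glucose VALUES (?, ?, ?, ?)", rows)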

I then have the following code to obtain the time series subsequences I want for the list of (SID, date_time) pairs I need data for (on average there are between 10 and 100 pairs):
Python code:
ts_data = []
for SID, date_time in SID_date_time_pairs:
    ts_data += conn.execute("""
                            SELECT SID, date_time, bg_mg_per_dL FROM blood_glucose
                            WHERE SID == :SID
                            AND datetime(date_time) >= datetime(:date_time)
                            AND datetime(date_time) <= datetime(:date_time, '+24 hours')
                            """,
                            {'SID': SID,
                             'date_time': str(date_time)}).fetchall()

return pd.DataFrame(ts_data, columns=['SID', 'Date_Time', 'Value'])
The problem is that searching for a time series subsequence from a single pair takes several seconds, where I thought it would be much faster (sub 1 second), so the overall for loop can take minutes to run if there are 100 pairs to iterate over. Am I doing something wrong?

Jose Cuervo
Aug 25, 2004

UraniumAnchor posted:

My first off the hip guess is that you want to add an index to whatever column(s) you're searching on, otherwise it's doing a linear scan. Try checking https://www.sqlite.org/lang_createindex.html and see if that improves your times.

That did the trick, thank you!
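For later readers, a multi-column index covering the columns the query filters on might look something like this, using the same conn and c as above (the index name is arbitrary):

Python code:
c.execute("""CREATE INDEX idx_bg_sid_dt
             ON blood_glucose (SID, date_time)""")
conn.commit()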

Jose Cuervo
Aug 25, 2004
Let's say I have a pandas dataframe with a date column and I want to iterate over all the unique dates in that column.

Is
Python code:
for date in df['Dates'].unique():
    pass
equivalent to

Python code:
unique_dates = df['Dates'].unique()

for date in unique_dates:
    pass
As in, in the first bit of code the "df['Dates'].unique()" only gets evaluated once, right?
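Yes, they are equivalent: the for statement evaluates the iterable expression once, before the first iteration, and then iterates over the result. A quick way to convince yourself:

Python code:
def get_dates():
    print('evaluated')
    return ['2021-01-01', '2021-01-02']

for date in get_dates():  # prints 'evaluated' exactly once
    pass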

Jose Cuervo
Aug 25, 2004
Back with a SQLite question. I want to query a table and return all rows where the value in the SID column matches one of the SIDs in a list that I have.

I have tried
Python code:
c.execute("""SELECT SID, start_date FROM start_dates
            WHERE SID IN SIDs_to_use""",
          {'SIDs_to_use': SIDs_to_use}).fetchall()
where 'SIDs_to_use' is a Python list containing 10 SIDs. The error message I get is 'OperationalError: no such table: SIDs_to_use' and I cannot figure out how to get rid of it.

Edit: Figured out I was missing the :
Python code:
c.execute("""SELECT SID, start_date FROM start_dates
            WHERE SID IN :SIDs_to_use""",
          {'SIDs_to_use': SIDs_to_use}).fetchall()
but now the error is 'OperationalError: near ":SIDS_to_use": syntax error'.

Jose Cuervo fucked around with this message at 21:26 on Jan 27, 2023

Jose Cuervo
Aug 25, 2004

QuarkJets posted:

It seems like what you're trying to do is create a statement that looks like this:
Python code:
"""SELECT SID, start_date FROM start_dates
            WHERE SID IN (1, 2, 3, ...)"""
But assuming that SIDs are integers, you can't just provide a list and have it auto-convert to a string-encased tuple; the typing is all wrong, SQLite wants you to be a lot more specific than that. You need to have a statement that explicitly maps each SID to a binding as part of a statement:
Python code:
c.execute("""SELECT SID, start_date FROM start_dates
            WHERE SID IN (?, ?, ?, ?)""", [sid1, sid2, sid3, sid4])
Since you just want to query every item in the list, it's pretty easy to generate that mapping yourself:
Python code:
sids = [1, 2, 5, 7, 10]  # for example
sid_bindings = ",".join("?"*len(sids))  # becomes "?,?,?,?,?"
cur.execute(f"""select SID, start_date from start_dates
    where SID in ({sid_bindings})""", sids).fetchall()
Since we need 1 binding per list entry, we create a string with the correct number of bindings then insert it using an f-string.

Got it. Is writing basic SQL statements like this the best way to interact with a database, or should I be trying to learn SQLAlchemy or something similar instead?

Jose Cuervo
Aug 25, 2004
I am running into an issue. I have a Jupyter notebook which contains code that I want to run multiple times using different parameters. The table in the SQLite3 database from which the data will be pulled depends on a parameter passed in (there is one table with 4 hour data, one table with 6 hour data, and one table with 8 hour data). From reading stackoverflow posts (for example this one) it looks like the table name cannot be parameterized. The reasoning behind this is that they want to prevent SQL injection.

There are ways around this:
Python code:
select_str = f'SELECT SID, start_dt FROM tss_{tss_length_hrs}hr'
trlnt_SDTs = c.execute(f'{select_str} WHERE study_name == "trlnt"').fetchall()
where I create the select string first and then use it in the query.

I understand this is bad practice in general, but in this particular context where I am the user and will be supplying the tss_length_hrs string which will not be malicious, it should be ok, right?

Jose Cuervo
Aug 25, 2004

QuarkJets posted:

SQL Injection is only possible if you accept unsanitized user input, there is nothing wrong with what you've written here. Put the f-string directly into the execute statement if you want, even. Just so long as you control the content of 'tss_length_hrs' and aren't just slamming in whatever a user provides. Dictionary dispatch would work well for that

Can you elaborate on what you mean by dictionary dispatch and also explain why it would work well in this context?

Jose Cuervo
Aug 25, 2004

QuarkJets posted:

Sure, say that you want a user to be able to select which table is queried via a command line argument. Passing input directly to the query is dangerous because it would leave you vulnerable to SQL injection, but you can easily pass that value to a dictionary that maps to only valid names:


Python code:
user_input = sys.argv[1] # or use argparse
table_name = {'table1': 'table1',
              'table2': 'table2_with_nulls',
              'usernames': 'usernames'}[user_input]
This is dictionary dispatch; it's just mapping some input to a set of outputs. You could use the table_name variable here to build a SQL query without issue, because you know the valid table names already. If an invalid table name is provided then an exception is raised. If someone tries to attack you with an SQL injection input then it just dies at this dictionary lookup

Got it, thanks!

Jose Cuervo
Aug 25, 2004
I keep having SQL questions which I cannot seem to locate the answer to:

This code works just fine and returns the blood glucose values which fall in the 6-hour interval:
Python code:
c.execute("""
             SELECT dt_index, bg_mg_per_dL from blood_glucose
             WHERE SID == :SID
             AND datetime(dt_index) >= :start_dt
             AND datetime(dt_index) < datetime(:start_dt, '+6 hours')
             """,
             {'SID': SID,
              'start_dt': str(start_dt)}).fetchall()
Now I want to generalize the code and be able to specify a different time interval length:
Python code:
c.execute("""
             SELECT dt_index, bg_mg_per_dL from blood_glucose
             WHERE SID == :SID
             AND datetime(dt_index) >= :start_dt
             AND datetime(dt_index) < datetime(:start_dt, '+:tss hours')
             """,
             {'SID': SID,
              'start_dt': str(start_dt),
              'tss': str(tss_length_hrs)}).fetchall()
but the above code returns an empty list.

I know I can write this query using f-strings (because I am the one writing the query, so there is no user input to worry about sanitizing):
Python code:
query = f"SELECT dt_index, bg_mg_per_dL from blood_glucose " \
        f"WHERE SID == '{SID}' " \
        f"AND datetime(dt_index) >= '{start_dt}' " \
        f"AND datetime(dt_index) < datetime('{start_dt}', '+{tss_length_hrs} hours')"
but I would like to know how to write this query "properly".

Jose Cuervo
Aug 25, 2004
Sorry, I missed that, but I will ask in that thread in the future. I had previously asked SQLite questions in here and been answered; what I really meant was that my googling of search terms etc. had failed.

Zoracle Zed posted:

just guessing but I bet your SQL driver is (correctly, imo) not interpolating the tss variable here:

code:
datetime(:start_dt, '+:tss hours')
because it's inside a string literal. Try something like this (modulo whatever the appropriate string concatenation operator is for your db):

code:
datetime(:start_dt, STR_CONCAT('+', :tss, ' hours'))
String concatenation is the search term I needed to look for. In SQLite the concatenation operator is ||, so the following code works:
Python code:
c.execute("""
             SELECT dt_index, bg_mg_per_dL from blood_glucose
             WHERE SID == :SID
             AND datetime(dt_index) >= :start_dt
             AND datetime(dt_index) < datetime(:start_dt, '+' || :tss || ' hours')
             """,
             {'SID': SID,
              'start_dt': str(start_dt),
              'tss': tss_length_hrs}).fetchall()
Thanks!

Jose Cuervo
Aug 25, 2004
I am trying to parallelize (using the joblib module) some code which involves querying a database (the query I had issues with earlier).

Python code:
conn = sqlite3.connect('study_data.db')
c = conn.cursor()

parallel_ts3 = joblib.Parallel(n_jobs=n_cpus)(
                        joblib.delayed(diag.parallel_tss_entry)(c, sub_SID_SDTs, tss_length_hrs)
                        for sub_SID_SDTs in chunked_iterable(SID_SDTs, chunk_size))
The above code produced the following error:
code:
"""
Traceback (most recent call last):
  File "/home/bjl2n/.conda/envs/jdrf/lib/python3.9/site-packages/joblib/externals/loky/backend/queues.py", line 153, in _feed
    obj_ = dumps(obj, reducers=reducers)
  File "/home/bjl2n/.conda/envs/jdrf/lib/python3.9/site-packages/joblib/externals/loky/backend/reduction.py", line 271, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "/home/bjl2n/.conda/envs/jdrf/lib/python3.9/site-packages/joblib/externals/loky/backend/reduction.py", line 264, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "/home/bjl2n/.conda/envs/jdrf/lib/python3.9/site-packages/joblib/externals/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'sqlite3.Cursor' object

...

PicklingError: Could not pickle the task to send it to the workers.
"""
Is there something I can do differently to run the code in parallel?

Jose Cuervo
Aug 25, 2004

C2C - 2.0 posted:

More questions:

I used Pandas to create a DataFrame from my .json file and am working on learning Matplotlib to create graphs from the .json data. Some of the fields in the .json (like 'bpm', 'year', et al.) are integers but they're saved as strings in the .json file. Short of going back and editing the .json file itself, is there a way to order the strings that are actually integers in either ascending or descending order? Right now, my x- and y-axes are displaying the values in the order (I assume) they're appearing in the DataFrame. Would using int() on those strings and then creating a function to order them be possible? Here's one of the graphs as-is with the DataFrame ordering (again, I assume):


(Some of the .json entries don't have BPM values yet as I haven't calculated all of them; thus the initial empty BPM value on the x-axis)

You can use df['column_name'].astype(int) to convert the entries in a column to integers (note this will throw an error if any row entry in that column cannot be converted into an integer, e.g., if the entry is 'dog').
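If some entries may not parse cleanly, pd.to_numeric with errors='coerce' is a gentler variant that turns unparseable entries into NaN instead of raising; a sketch:

Python code:
import pandas as pd

# 'coerce' maps entries that cannot be parsed (e.g., '' or 'dog') to NaN.
df['bpm'] = pd.to_numeric(df['bpm'], errors='coerce')
df = df.sort_values('bpm')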

Jose Cuervo
Aug 25, 2004

Zoracle Zed posted:

first: sqlite (afaik) isn't great for parallel writing, so this is only worth doing if each task is only reading from the database.

second: the error you have there says "TypeError: cannot pickle 'sqlite3.Cursor' object". Can you see how to rearrange some things so the cursor object doesn't need to be shared between the jobs running in parallel?

The queries are only reading from the database.

And yes, I actually do not need to share the cursor but I did not realize that until you pointed it out, thanks!
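The rearrangement might look like the sketch below: pass the database path instead of a cursor and open a connection inside each worker, so nothing unpicklable crosses the process boundary. The worker is shown standalone here (in the thread's code it lives in diag), and its signature is assumed to change accordingly:

Python code:
import sqlite3
import joblib

def parallel_tss_entry(db_path, sub_SID_SDTs, tss_length_hrs):
    # Each worker opens and closes its own connection.
    conn = sqlite3.connect(db_path)
    try:
        c = conn.cursor()
        ...  # run this chunk's read-only queries with c
    finally:
        conn.close()

parallel_ts3 = joblib.Parallel(n_jobs=n_cpus)(
    joblib.delayed(parallel_tss_entry)('study_data.db', sub_SID_SDTs, tss_length_hrs)
    for sub_SID_SDTs in chunked_iterable(SID_SDTs, chunk_size))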

Jose Cuervo
Aug 25, 2004
I have a sqlite database with a table with the following columns: SID (the subject ID that the data belongs to), dt_index (contains the date and time that the blood glucose value was collected), and bg_mg_per_dL (the blood glucose value).

I now have a list of tuples, where each tuple contains a SID and a start date and time; for each pair I want to retrieve the blood glucose values for that SID starting at that start date and time and running for the next 6 hours.

Is the best (fastest?) way to achieve this to use a for loop? The length of the list of tuples can be anywhere from 5 entries to 1000.

Python code:
forecasts = []
for SID, start_dt in the_matches:
    forecasts.append(c.execute("""
                               SELECT dt_index, bg_mg_per_dL FROM blood_glucose
                               WHERE SID == :SID
                               AND datetime(dt_index) >= datetime(:date_time)
                               AND datetime(dt_index) < datetime(:date_time, '+' || :hrs || ' hours')
                               """,
                               {'SID': SID,
                                'date_time': start_dt,
                                'hrs': 6}).fetchall())
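A loop like this is reasonable at this scale, but if it ever becomes a bottleneck, one alternative worth profiling is loading the (SID, start_dt) pairs into a temporary table and fetching everything with a single joined query; a sketch (note it returns one flat result set rather than one list per pair):

Python code:
c.execute("CREATE TEMP TABLE windows (SID TEXT, start_dt TEXT)")
c.executemany("INSERT INTO windows VALUES (?, ?)",
              [(SID, str(start_dt)) for SID, start_dt in the_matches])
rows = c.execute("""
    SELECT bg.SID, bg.dt_index, bg.bg_mg_per_dL
    FROM blood_glucose bg
    JOIN windows w ON bg.SID = w.SID
    WHERE datetime(bg.dt_index) >= datetime(w.start_dt)
    AND datetime(bg.dt_index) < datetime(w.start_dt, '+6 hours')
    """).fetchall()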

Jose Cuervo fucked around with this message at 21:26 on May 25, 2023

Jose Cuervo
Aug 25, 2004

pmchem posted:

what's the easiest way to plot county-by-county geographic data in python these days along with some good data sources?

like say I want to plot rainfall, or average seasonal temperatures, and mosquito density for a county-by-county map of the USA. counties colored by amount. googling this leads down many not entirely useful rabbit holes.

are arcgis and geopandas the current hotness? does arcgis only interface with... arcgis, which I generally understand is a nonfree api?

I have used geopandas in the past and found it very powerful (for creating static maps; I don't know about interactive maps).

County shapefiles can be found here at different resolutions: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html

Perhaps some of the data you need can be found here: https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/county/mapping

They already have the data plotted at the county level, though, so perhaps you can just use those maps if they are plotting the data you want (they allow you to export as a PNG)?

Jose Cuervo
Aug 25, 2004
Writing a small Flask website that displays a bar chart. The bar chart displays the data for a single unit and there are 12 units. I want to easily navigate between the bar chart for each of the 12 units. I know I can build a form with a drop down selector and a submit button, and when I click the submit button the entire page will reload with the new data. I want to avoid reloading the entire page (and just update the bar chart) by using the drop down selector and fetch to accomplish this functionality.

What event do I need to listen out for with the drop down selector? I tried using 'click', but since you have to click once to bring up the list of options the fetch is being fired before a new unit has been selected (note that qrData is a variable which is declared earlier on the page, and I am trying to set it to the new data passed back by the "/qr_data" route).
JavaScript code:
    selectBtn.addEventListener('click',
                               function() {
                                 console.log('Clicked on the drop down');
                                 fetch("/qr_data", {method: "POST"})
                                    .then(newData => qrData=newData);
                               }
                              );
EDIT: Figured it out. I need to listen for the 'change' event.

Jose Cuervo fucked around with this message at 00:55 on Oct 3, 2023

Jose Cuervo
Aug 25, 2004
I want to make sure that a string that is passed in has the year-month format 'yyyy-mm'. So for example, '2022-09' would work or even '2022-9', but not '2022 09' or '22-09' etc. Should I be trying to use a regular expression for this?

Jose Cuervo
Aug 25, 2004
I thought I must be missing some slick way to use regular expressions for this task, but I think I will stick to checking that the length of the string is 7 or 6. If the length is 7, then I will check that the 5th character is '-', that the last two characters are one of 12 options ('01' through '12'), and that the first 4 characters are a number between 2006 and 2023 (there is no data pre-2006). If the length is 6, then the only thing that changes is that the last character needs to be one of 9 options ('1' through '9').
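For the record, the regular expression version is not too bad; a sketch that applies the same 2006 to 2023 range check after the match:

Python code:
import re

def is_valid_year_month(s):
    # Four-digit year, a dash, then a month 1-12 with optional leading zero.
    m = re.fullmatch(r'(\d{4})-(0?[1-9]|1[0-2])', s)
    return bool(m) and 2006 <= int(m.group(1)) <= 2023

is_valid_year_month('2022-09')  # True
is_valid_year_month('2022-9')   # True
is_valid_year_month('22-09')    # False
is_valid_year_month('2022 09')  # False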

Jose Cuervo
Aug 25, 2004
I have been watching Corey Schafer's Python Flask tutorial on YouTube and have made it past part 5:
https://www.youtube.com/watch?v=44PvX0Yv368
where he restructures the app to avoid issues with circular dependencies etc, and past part 7:
https://www.youtube.com/watch?v=803Ei2Sq-Zs
where he creates the user account etc.

The tutorial is somewhat old and uses the old style of SQLAlchemy queries, and I was trying to update the way in which the models were defined to be in the declarative mapping style.
This is the version which worked:
Python code:
File: __init__.py

from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_login import LoginManager

app = Flask(__name__)
app.config['SECRET_KEY'] = 'mysecretkeygoeshere'
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///tutorial_database.db'

db = SQLAlchemy(model_class=Base)

login_manager = LoginManager(app)
Python code:
File: models.py

from tutorial import db, login_manager
from flask_login import UserMixin

class User(db.Model, UserMixin):
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String(120), unique=True, nullable=False)
    password = db.Column(db.String(60), nullable=False)
This code worked fine (it follows along with what Corey has in his tutorial).

After reading the SQLAlchemy documentation and the flask-sqlalchemy documentation (https://flask-sqlalchemy.palletsprojects.com/en/3.1.x/models/), I updated the files to the following:
Python code:
File: __init__.py

from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_login import LoginManager
from sqlalchemy import MetaData
from sqlalchemy.orm import DeclarativeBase


class Base(DeclarativeBase):
    metadata = MetaData(naming_convention={
        "ix": 'ix_%(column_0_label)s',
        "uq": "uq_%(table_name)s_%(column_0_name)s",
        "ck": "ck_%(table_name)s_%(constraint_name)s",
        "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
        "pk": "pk_%(table_name)s"
    })


app = Flask(__name__)
app.config['SECRET_KEY'] = 'mysecretkeygoeshere'
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///tutorial_database.db'

db = SQLAlchemy(app=app, model_class=Base)

login_manager = LoginManager(app)
Python code:
File: models.py

from tutorial import db, login_manager
from flask_login import UserMixin
from sqlalchemy import Integer, String, Float, Boolean, DateTime, ForeignKey
from sqlalchemy.orm import Mapped, mapped_column, relationship


class User(db.Model, UserMixin):
    __tablename__ = 'users'

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    email: Mapped[str] = mapped_column(String(120), unique=True, nullable=False)
    password: Mapped[str] = mapped_column(String(60), nullable=False)
however, now when I attempt to run the website I get the following error message:
"sqlalchemy.exc.ArgumentError: Class '<class 'tutorial.models.User'>' already has a primary mapper defined."

I have tried looking this up but cannot find any results which point me towards what I am doing wrong.

Jose Cuervo
Aug 25, 2004

boofhead posted:

I'm not a flask/sqlalchemy person but is it possible that there already exists a table mapped to the User class (created during previous steps) called "user", and now in the latest step you're trying to tell it that the User class actually maps to an overriding table called "users"? It looks like you added this in after already creating the table from the class:

code:
__tablename__ = 'users'
and I think flask-sqlalchemy might automatically create table names from class names by just snakecasing them, rather than automatically adding plurals. Because the error you've posted seems to suggest that there's a conflict where the 'User' class is already mapped to a table somewhere, and it's not the table specified by __tablename__.

try removing the above line from the code and see if that works

I tried that but then it fails with the following error:
"sqlalchemy.exc.InvalidRequestError: Class <class 'tutorial.models.User'> does not have a __table__ or __tablename__ specified and does not inherit from an existing table-mapped class."

I also tried using the name 'user' but that did not work either (threw the same error message).

Generic Monk posted:

I want to say that they're providing the table name like that because 'user' is a reserved word in postgres. Better solution than mine when following that tutorial which was to call it 'AppUser' lmao

Agree on starting the database afresh and doing a db.create_all() just to eliminate database fuckery as a cause tho

I named the table 'users' because that is what was suggested in Section 2.3 of the book 'SQLAlchemy 2 in Practice' by Miguel Grinberg, where he states "A very common naming convention for database tables is to use the plural form of the entity in lowercase, so in this case the table [whose class name is Product] is given the products name. This contrasts with the convention used for the model class names, which prefers the singular form in camel case."

Both errors are being thrown when I try to run my populate_db.py script where I have the following code:
Python code:
File populate_db.py

from tutorial import app, db, bcrypt
from tutorial.models import User  # needed for the session.add below

with app.app_context():
    db.drop_all()
    db.create_all()

    # Add Test Users to DB
    db.session.add(User(email='test@tutorial.com',
                        password=bcrypt.generate_password_hash('password').decode('utf-8')))
    db.session.commit()
Surprisingly the code works if I do not try to include the Base class, that is I keep the original code for __init__.py:
Python code:
File: __init__.py

from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_login import LoginManager

app = Flask(__name__)
app.config['SECRET_KEY'] = 'mysecretkeygoeshere'
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///tutorial_database.db'

db = SQLAlchemy(app)   # <------ I made a mistake when copy pasting originally and this is what I should have had in the original code for __init__.py, not db = SQLAlchemy(model_class=Base)

login_manager = LoginManager(app)
but have the declarative style of model class definition in models.py:
Python code:
File: models.py

from tutorial import db, login_manager
from flask_login import UserMixin
from sqlalchemy import Integer, String, Float, Boolean, DateTime, ForeignKey
from sqlalchemy.orm import Mapped, mapped_column, relationship


class User(db.Model, UserMixin):
    __tablename__ = 'users'

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    email: Mapped[str] = mapped_column(String(120), unique=True, nullable=False)
    password: Mapped[str] = mapped_column(String(60), nullable=False)

Jose Cuervo
Aug 25, 2004
SQLAlchemy question here. I have two models:
Python code:
class Subject(Base):
    __tablename__ = 'subjects'

    id: Mapped[int] = mapped_column(primary_key=True)
    first_name: Mapped[str] = mapped_column(String(60), nullable=False)
    last_name: Mapped[str] = mapped_column(String(60), nullable=False)
    visits: Mapped[list['Visit']] = relationship(cascade='all, delete-orphan',
                                                 back_populates='subject')

class Visit(Base):
    __tablename__ = 'visits'

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    date: Mapped[str] = mapped_column(DateTime, nullable=False)
    amount_spent: Mapped[int] = mapped_column(Integer, nullable=False)
    units: Mapped[str] = mapped_column(String, nullable=False)
    subject_id: Mapped[int] = mapped_column(Integer, ForeignKey('subjects.id'), index=True)
    subject: Mapped['Subject'] = relationship(back_populates='visits')
I now want to write a query which returns the most recent visit for each subject in the database.
This is the query I have so far, but it does not work (results in a DetachedInstanceError error message):
Python code:
    q = select(Visit) \
                .join(Subject.visits) \
                .order_by(Visit.date.desc()).limit(1)
Any thoughts on what I am doing wrong?

Jose Cuervo
Aug 25, 2004

Vulture Culture posted:

DetachedInstanceError usually results from trying to query without a session, or using a session that's already been closed. What's your full error text, and what's the code that actually executes this statement?

I guess I should have said that I am using a Jupyter notebook, and in previous cells I have been able to successfully run other (simpler) queries against the database using the with Session() context managers, so I do not think the issue is that the session has already been closed.

The code for executing the statement is:
Python code:
with Session() as session:
    with session.begin():
        q = select(Visit) \
                .join(Subject.visits) \
                .order_by(Visit.date.desc()).limit(1)
        r = session.scalars(q).all()
        print(r[0])
and the full error message is:
code:
DetachedInstanceError                     Traceback (most recent call last)
File ~\Anaconda3\envs\cam\Lib\site-packages\IPython\core\formatters.py:708, in PlainTextFormatter.__call__(self, obj)
    701 stream = StringIO()
    702 printer = pretty.RepresentationPrinter(stream, self.verbose,
    703     self.max_width, self.newline,
    704     max_seq_length=self.max_seq_length,
    705     singleton_pprinters=self.singleton_printers,
    706     type_pprinters=self.type_printers,
    707     deferred_pprinters=self.deferred_printers)
--> 708 printer.pretty(obj)
    709 printer.flush()
    710 return stream.getvalue()

File ~\Anaconda3\envs\cam\Lib\site-packages\IPython\lib\pretty.py:410, in RepresentationPrinter.pretty(self, obj)
    407                         return meth(obj, self, cycle)
    408                 if cls is not object \
    409                         and callable(cls.__dict__.get('__repr__')):
--> 410                     return _repr_pprint(obj, self, cycle)
    412     return _default_pprint(obj, self, cycle)
    413 finally:

File ~\Anaconda3\envs\cam\Lib\site-packages\IPython\lib\pretty.py:778, in _repr_pprint(obj, p, cycle)
    776 """A pprint that just redirects to the normal repr function."""
    777 # Find newlines and replace them with p.break_()
--> 778 output = repr(obj)
    779 lines = output.splitlines()
    780 with p.group():

File ~\Anaconda3\envs\cam\Lib\site-packages\sqlalchemy\engine\row.py:245, in Row.__repr__(self)
    244 def __repr__(self) -> str:
--> 245     return repr(sql_util._repr_row(self))

File ~\Anaconda3\envs\cam\Lib\site-packages\sqlalchemy\sql\util.py:598, in _repr_row.__repr__(self)
    595 def __repr__(self) -> str:
    596     trunc = self.trunc
    597     return "(%s%s)" % (
--> 598         ", ".join(trunc(value) for value in self.row),
    599         "," if len(self.row) == 1 else "",
    600     )

File ~\Anaconda3\envs\cam\Lib\site-packages\sqlalchemy\sql\util.py:598, in <genexpr>(.0)
    595 def __repr__(self) -> str:
    596     trunc = self.trunc
    597     return "(%s%s)" % (
--> 598         ", ".join(trunc(value) for value in self.row),
    599         "," if len(self.row) == 1 else "",
    600     )

File ~\Anaconda3\envs\cam\Lib\site-packages\sqlalchemy\sql\util.py:565, in _repr_base.trunc(self, value)
    564 def trunc(self, value: Any) -> str:
--> 565     rep = repr(value)
    566     lenrep = len(rep)
    567     if lenrep > self.max_chars:

File models.py:93, in Visit.__repr__(self)
      92 def __repr__(self):
 ---> 93     return f"{self.date.strftime('%Y-%m-%d')}"

File ~\Anaconda3\envs\cam\Lib\site-packages\sqlalchemy\orm\attributes.py:566, in InstrumentedAttribute.__get__(self, instance, owner)
    564 except AttributeError as err:
    565     raise orm_exc.UnmappedInstanceError(instance) from err
--> 566 return self.impl.get(state, dict_)

File ~\Anaconda3\envs\cam\Lib\site-packages\sqlalchemy\orm\attributes.py:1086, in AttributeImpl.get(self, state, dict_, passive)
   1083 if not passive & CALLABLES_OK:
   1084     return PASSIVE_NO_RESULT
-> 1086 value = self._fire_loader_callables(state, key, passive)
   1088 if value is PASSIVE_NO_RESULT or value is NO_VALUE:
   1089     return value

File ~\Anaconda3\envs\cam\Lib\site-packages\sqlalchemy\orm\attributes.py:1116, in AttributeImpl._fire_loader_callables(self, state, key, passive)
   1108 def _fire_loader_callables(
   1109     self, state: InstanceState[Any], key: str, passive:  PassiveFlag
   1110 ) -> Any:
   1111     if (
   1112         self.accepts_scalar_loader
   1113         and self.load_on_unexpire
   1114         and key in state.expired_attributes
   1115     ):
-> 1116         return state._load_expired(state, passive)
   1117     elif key in state.callables:
   1118         callable_ = state.callables[key]

File ~\Anaconda3\envs\cam\Lib\site-packages\sqlalchemy\orm\state.py:798, in InstanceState._load_expired(self, state, passive)
    791 toload = self.expired_attributes.intersection(self.unmodified)
    792 toload = toload.difference(
    793     attr
    794     for attr in toload
    795     if not self.manager[attr].impl.load_on_unexpire
    796 )
--> 798 self.manager.expired_attribute_loader(self, toload, passive)
    800 # if the loader failed, or this
    801 # instance state didn't have an identity,
    802 # the attributes still might be in the callables
    803 # dict.  ensure they are removed.
    804 self.expired_attributes.clear()

File ~\Anaconda3\envs\cam\Lib\site-packages\sqlalchemy\orm\loading.py:1582, in load_scalar_attributes(mapper, state, attribute_names, passive)
   1580 session = state.session
   1581 if not session:
-> 1582     raise orm_exc.DetachedInstanceError(
   1583         "Instance %s is not bound to a Session; "
   1584         "attribute refresh operation cannot proceed" % (state_str(state))
   1585     )
   1587 no_autoflush = bool(passive & attributes.NO_AUTOFLUSH)
   1589 # in the case of inheritance, particularly concrete and abstract
   1590 # concrete inheritance, the class manager might have some keys
   1591 # of attributes on the superclass that we didn't actually map.
   1592 # These could be mapped as "concrete, don't load" or could be completely
   1593 # excluded from the mapping and we know nothing about them.  Filter them
   1594 # here to prevent them from coming through.

DetachedInstanceError: Instance <Visit at 0x166c3b74250> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: https://sqlalche.me/e/20/bhk3)
EDIT: You are correct. I was attempting to access r[0] from outside the cell / scope of the context manager.

So that solves the error message, but I would still appreciate any pointers on how to write the correct query, because what I have is not correct. As written it only returns a single result, but if I remove the .limit(1) then it returns all visits ordered by date, and not just the single most recent visit for each subject.

Jose Cuervo fucked around with this message at 02:17 on Dec 8, 2023

Jose Cuervo
Aug 25, 2004

Jose Cuervo posted:

SQLAlchemy question here. I have two models:
Python code:
class Subject(Base):
    __tablename__ = 'subjects'

    id: Mapped[int] = mapped_column(primary_key=True)
    first_name: Mapped[str] = mapped_column(String(60), nullable=False)
    last_name: Mapped[str] = mapped_column(String(60), nullable=False)
    visits: Mapped[list['Visit']] = relationship(cascade='all, delete-orphan',
                                                 back_populates='subject')

class Visit(Base):
    __tablename__ = 'visits'

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    date: Mapped[str] = mapped_column(DateTime, nullable=False)
    amount_spent: Mapped[int] = mapped_column(Integer, nullable=False)
    units: Mapped[str] = mapped_column(String, nullable=False)
    subject_id: Mapped[int] = mapped_column(Integer, ForeignKey('subjects.id'), index=True)
    subject: Mapped['Subject'] = relationship(back_populates='visits')
I now want to write a query which returns the most recent visit for each subject in the database.
This is the query I have so far, but it does not work (results in a DetachedInstanceError error message):
Python code:
    q = select(Visit) \
                .join(Subject.visits) \
                .order_by(Visit.date.desc()).limit(1)
Any thoughts on what I am doing wrong?

Once I addressed the issue of trying to access the result outside of the context manager, the query which returns what I want is:
Python code:
q = select(Subject.id, func.max(Visit.date)) \
        .join(Subject.visits) \
        .group_by(Visit.subject_id) \
        .order_by(Subject.last_name)
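One follow-up note: that query returns (subject id, max date) pairs. If the full Visit rows are wanted instead, a standard pattern is to join against a grouped subquery; a sketch in the same 2.0 style, assuming select and func are imported as above:

Python code:
latest = select(Visit.subject_id,
                func.max(Visit.date).label('max_date')) \
            .group_by(Visit.subject_id) \
            .subquery()

q = select(Visit) \
        .join(latest, (Visit.subject_id == latest.c.subject_id)
                      & (Visit.date == latest.c.max_date))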

Jose Cuervo
Aug 25, 2004
Is there a way to have a Python script run automatically when a file (or really a set of files) is placed in a folder? I think I have to set up some sort of process which listens(?) for files being placed in the folder, and then once that is detected I can trigger the script, but I don't know what keywords to look for to start researching the solution.

Fuller story:
I have a script that cleans / formats 4 or sometimes 5 different CSV files. This process occurs 4 times during the first four days of the month. Right now I am the only one who runs the script, and the two people who rely on the output are not able to run the script. I want to avoid running into an issue where I am sick (or hit by a bus) and they are unable to access the data they need. I want to be able to designate a folder to them, and have them copy-paste the raw CSV files into the folder. Once they have done that, ideally something(?) would start the script running to clean / format the CSV files, and output the cleaned / formatted files into a different folder.
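One common approach is the third-party watchdog package, which wraps the OS file-notification APIs; a minimal sketch, where process_csvs and the folder path are placeholders (a scheduled task that simply polls the folder would also work):

Python code:
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class CSVDropHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Fires whenever something new appears in the watched folder.
        if not event.is_directory and event.src_path.endswith('.csv'):
            process_csvs(event.src_path)  # placeholder for the cleaning script

observer = Observer()
observer.schedule(CSVDropHandler(), path='drop_folder', recursive=False)
observer.start()
try:
    while True:
        time.sleep(5)
finally:
    observer.stop()
    observer.join()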


Jose Cuervo
Aug 25, 2004
I use PyCharm community edition as my IDE for developing a flask application. When I run my Flask application through PyCharm, it sets up what I believe is a local (local to my computer) development server which then allows me to go to http://127.0.0.1:5000 and access the website from there.

My question is, can anyone else on the network I am on also navigate to http://127.0.0.1:5000 and see the website? Or am I the only one who can see the website because the server is on my computer?
