QuarkJets
Sep 8, 2008

Falcon2001 posted:

Speaking of plots and datavis/etc, I'm curious how people would approach this problem:

I've been playing around with Advent of Code stuff, which has a lot of problems involving 2d maps, often on cartesian planes (so X/Y can be negative) - I've struggled quite a bit with storing/handling these. Here's one example (note that probably storing the actual full grid is not the right solution for this puzzle, but it's indicative of the style of problem): https://adventofcode.com/2022/day/15 - I'm trying to get some stuff in order for this year's advent.

I've used a numpy ndarray, forcing the indexes to work by offsetting things, but that was super weird to work with and my code became very hard to read. I've recently moved over to using a pandas dataframe with a custom index (as seen here: https://stackoverflow.com/questions/53494616/how-to-create-a-matrix-with-negative-index-position), but that has its own weirdnesses.

You might say 'well just store the points in a list!' but there's a fair number of these problems where you're doing neighbor lookups regularly, and having the full grid populated is actually very useful; not to mention debugging/checking your work becomes a lot easier when you have a fully populated grid. I actually threw together my own ndarray -> image function using PIL to turn an ndarray into a graphical representation based on the integer value, but I'm increasingly thinking I'm doing it wrong.

So the question: What's the correct (or at least painless) way to store (and ideally visualize) a 2d cartesian grid where each coordinate can be an arbitrary data type? I'm not looking for code golf style solutions, as I'm using these problems as ways to explore concepts/etc, and so it's more useful to have something that I can dig through and visualize/debug/interact with than a perfect impenetrable one-line solution.

I'm not sure why you would need an arbitrary datatype, the whole problem is constrained by some integer arrays and 1 pair of integer coordinates (the offset of the starting point for the array). 0 if a beacon is not present, 1 if it is. Another array for sensors, another array for "beacon cannot be here" positions. I think that's it

If you want to get fancy you can define your own numpy dtype. 3 integer arrays or 1 array where the datatype is 3 integers, two ways of representing the same problem.
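
Something like this is what I mean by a custom dtype - the field names, grid size, and offset here are all made up, but one array then carries all three flags per cell:

Python code:
import numpy as np

# one structured dtype instead of three parallel integer arrays
cell = np.dtype([("beacon", np.int8), ("sensor", np.int8), ("no_beacon", np.int8)])

grid = np.zeros((40, 60), dtype=cell)   # shape is arbitrary here
offset = (-10, -20)                      # world coordinate of grid[0, 0]

sx, sy = 2, -15                          # world coordinates of some sensor
grid["sensor"][sx - offset[0], sy - offset[1]] = 1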


Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

QuarkJets posted:

I'm not sure why you would need an arbitrary datatype, the whole problem is constrained by some integer arrays and 1 pair of integer coordinates (the offset of the starting point for the array). 0 if a beacon is not present, 1 if it is. Another array for sensors, another array for "beacon cannot be here" positions. I think that's it

If you want to get fancy you can define your own numpy dtype. 3 integer arrays or 1 array where the datatype is 3 integers, two ways of representing the same problem.

Yeah, the arbitrary datatype part was more for other problems, as this one absolutely can be handled with a single integer even, as there's no real overlap between the three states. And honestly this problem in particular is probably really solved via math around distances to various points. But it does sound like dataframes are maybe a decent place to keep iterating on.

Zoracle Zed posted:

a dict or defaultdict with a tuple of ints as the key type would be my first try

eta: not great for visualization though

Yep, exactly. I'd love to be able to visualize these setups quickly like they do in the explanations (which I've sort of got working with my PIL solution) here's what one of their example explainers looks like:

code:
               1    1    2    2
     0    5    0    5    0    5
-2 ..........#.................
-1 .........###................
 0 ....S...#####...............
 1 .......#######........S.....
 2 ......#########S............
 3 .....###########SB..........
 4 ....#############...........
 5 ...###############..........
 6 ..#################.........
 7 .#########S#######S#........
 8 ..#################.........
 9 ...###############..........
10 ....B############...........
11 ..S..###########............
12 ......#########.............
13 .......#######..............
14 ........#####.S.......S.....
15 B........###................
16 ..........#SB...............
17 ................S..........B
18 ....S.......................
19 ............................
20 ............S......S........
21 ............................
22 .......................B....

StumblyWumbly posted:

Have you tried using DataFrame.iloc calls with Pandas? That can let you have weirdo indices and also check neighbors

I think, from reading the docs, that since .iloc uses integer position-based indexing, it runs into the whole "negative values are interpreted as indexing from the end" issue, but I'll dig into it. .loc has been fine so far for testing.

So it sounds like my dataframe approach isn't insane, at least for ones where visualizing it is reasonable (there was a falling sand puzzle previously that definitely benefited from that), so I'll stick with it for now. I think any time it's super slow is an example of a time where populating the whole map probably isn't a good idea anyway.

Chin Strap
Nov 24, 2002

I failed my TFLC Toxx, but I no longer need a double chin strap :buddy:
Pillbug

Falcon2001 posted:

So it sounds like my dataframe approach isn't insane, at least for ones where visualizing it is reasonable (there was a falling sand puzzle previously that definitely benefited from that), so I'll stick with it for now. I think any time it's super slow is an example of a time where populating the whole map probably isn't a good idea anyway.

You are doing it with a MultiIndex and not just a tuple-valued index, right?

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Chin Strap posted:

You are doing it with a MultiIndex and not just a tuple-valued index, right?

Not a tuple valued index no, just a standard 2d dataframe with custom indices - here's the generation code.

Python code:
self.df = pd.DataFrame(arr, index=range(min_y_index, max_y_index), columns=range(min_x_index, max_x_index))

Chin Strap
Nov 24, 2002

I failed my TFLC Toxx, but I no longer need a double chin strap :buddy:
Pillbug

Falcon2001 posted:

Not a tuple valued index no, just a standard 2d dataframe with custom indices - here's the generation code.

Python code:
self.df = pd.DataFrame(arr, index=range(min_y_index, max_y_index), columns=range(min_x_index, max_x_index))

Oh, I missed that you only wanted 2D. This is fine, or you could make a thin wrapper around a numpy ndarray with an offset for each axis stored, so that when you do `x[i, j]` it actually does `x[i - x_min, j - y_min]`, maybe. Could be faster for algorithms.
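
A rough sketch of that wrapper idea (class and argument names invented) - the offsets get subtracted, so the most negative coordinate lands at index 0:

Python code:
import numpy as np

class OffsetGrid:
    """Thin ndarray wrapper that accepts world coordinates, possibly negative."""

    def __init__(self, x_range, y_range, fill=0):
        self.x_min, self.y_min = x_range[0], y_range[0]
        shape = (x_range[1] - x_range[0], y_range[1] - y_range[0])
        self.arr = np.full(shape, fill)

    def __getitem__(self, key):
        x, y = key
        return self.arr[x - self.x_min, y - self.y_min]

    def __setitem__(self, key, value):
        x, y = key
        self.arr[x - self.x_min, y - self.y_min] = value

grid = OffsetGrid((-5, 30), (-3, 25))
grid[-5, -3] = 1   # the most negative corner maps to arr[0, 0]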

Zoracle Zed
Jul 10, 2001

Falcon2001 posted:

Yep, exactly. I'd love to be able to visualize these setups quickly like they do in the explanations (which I've sort of got working with my PIL solution) here's what one of their example explainers looks like:

the ascii grid isn't too bad:
code:
grid = {}
grid[0, 0] = 'O'
grid[-2, 1] = 'B'
grid[0, 3] = 'x'
grid[2, -2] = '#'

for y in range(-4, 6):  # y increases top to bottom, like the AoC example grids
    row = (grid.get((x, y), '.') for x in range(-4, 4))
    print(''.join(row))

........
........
......#.
........
....O...
..B.....
........
....x...
........
........

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Zoracle Zed posted:

the ascii grid isn't too bad:
code:
grid = {}
grid[0, 0] = 'O'
grid[-2, 1] = 'B'
grid[0, 3] = 'x'
grid[2, -2] = '#'

for y in range(-4, 6):  # y increases top to bottom, like the AoC example grids
    row = (grid.get((x, y), '.') for x in range(-4, 4))
    print(''.join(row))

........
........
......#.
........
....O...
..B.....
........
....x...
........
........

Yeah, that's what I started with, and I've got a helper function that creates an ASCII grid from my dataframes, but once you get past a certain size it's hard to read, and it's not like text editors are well made for that sort of horizontal/vertical scrolling/etc.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Related to all this pandas discussion: what's y'all's take on the future of pandas now that polars exists? I know that polars doesn't yet do everything that pandas does (though I'm not an expert on the details here), but it does a hell of a lot, and with a nicer API and vastly more speed. I'll be curious to see polars' impact over time given that it seems to be surging in popularity.

Personally, I've switched over to polars when I can, though I often still bang something out in pandas when I need something quick and the execution speed differences don't really matter.

Chin Strap
Nov 24, 2002

I failed my TFLC Toxx, but I no longer need a double chin strap :buddy:
Pillbug

Zugzwang posted:

Related to all this pandas discussion: what's y'all's take on the future of pandas now that polars exists? I know that polars doesn't yet do everything that pandas does (though I'm not an expert on the details here), but it does a hell of a lot, and with a nicer API and vastly more speed. I'll be curious to see polars' impact over time given that it seems to be surging in popularity.

Personally, I've switched over to polars when I can, though I often still bang something out in pandas when I need something quick and the execution speed differences don't really matter.

I think "nicer API" is a bit of a stretch. Haven't used it because at Google we don't really have approved Rust support yet (cross language compilation with all the custom things we do takes a lot of support and it is still definitely alpha) but it looks more like a thin layer over SQL style operations than the pandas library. Never felt super pythonic to me to have to write something like (stolen from reddit)

code:
# pandas
prices_df.loc['2023-03'] *= 1.1

# polars
polars_df.with_column(
    pl.when(pl.col('timestamp').is_between(
        datetime(2023, 3, 1),
        datetime(2023, 3, 31),
        include_bounds=True
    )).then(pl.col('val') * 1.1)
    .otherwise(pl.col('val'))
    .alias('val')
)
All the pl.col stuff especially makes me unhappy. Some of this is definitely an "I'm used to it" pandas bias and an "I come from R and pandas looks more like R" bias. Any time you have meaningful indices across dataframes and want to do operations, the auto-alignment that pandas does can really clean up code compared to having to do joins everywhere in polars (if I'm messing any of these points up out of polars ignorance, someone feel free to call me out).
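
A toy illustration of that alignment point (labels made up) - the indices don't line up, and pandas matches them by label instead of making you write a join:

Python code:
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["z", "y"])

print(a + b)   # x -> NaN, y -> 22.0, z -> 13.0, aligned by label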

Also there is something to be said for a much more mature package with more books and other packages that work with it. If the dataframe interchange protocols get off the ground there is less of an argument here. And you can always convert directly to a pandas dataframe for those api boundaries I guess (but I'm guessing that doing that too often will cause slowdowns too).

But I can't deny the speed. And there is definitely something to the fact that the API has been able to be built without an extreme amount of baggage it carries around. Pandas is trying its best to make things more consistent and streamlined but deprecation and removal take forever. Copy-on-write in Pandas 2 is probably going to become the default, in which case by Pandas 3 they are liable to remove all inplace operations and make things immutable the way Polars does which would be a big step forward.

I know some people don't like the chaining syntax of Polars, but that's how modern Pandas should be written more often than not anyway (and again how you see it with tidyverse stuff in R so it is the style I am used to).

I am a Pandas contributor though so part of my Polars reluctance may also just be clinging to what I know. I certainly would be giving polars a shot if I could, but would probably wind up going with pandas for non-performance critical parts because of the syntax.

BUUNNI
Jun 23, 2023

by Pragmatica
I'm a total Python noob. I'm trying to get more in-depth info on list comprehensions; it's one of the Python things I'm having trouble writing and using, since I don't think there is anything similar in Java and my instructor insists that we use them for everything.

I was wondering if there were any resources I could be pointed to that go over, for example, how to use a dict of lambda functions to sort a list of lists using a one-liner list comprehension. I already got the MetaSnake Python book and it's been very helpful, so thank you to whoever suggested it.

BUUNNI fucked around with this message at 01:58 on Oct 21, 2023

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

Chin Strap posted:

I think "nicer API" is a bit of a stretch. Haven't used it because at Google we don't really have approved Rust support yet (cross language compilation with all the custom things we do takes a lot of support and it is still definitely alpha) but it looks more like a thin layer over SQL style operations than the pandas library. Never felt super pythonic to me to have to write something like (stolen from reddit)

code:
# pandas
prices_df.loc['2023-03'] *= 1.1

# polars
polars_df.with_column(
    pl.when(pl.col('timestamp').is_between(
        datetime(2023, 3, 1),
        datetime(2023, 3, 31),
        include_bounds=True
    )).then(pl.col('val') * 1.1)
    .otherwise(pl.col('val'))
    .alias('val')
)
All the pl.col stuff especially makes me unhappy. Some of this is definitely an "I'm used to it" pandas bias and an "I come from R and pandas looks more like R" bias. Any time you have meaningful indices across dataframes and want to do operations, the auto-alignment that pandas does can really clean up code compared to having to do joins everywhere in polars (if I'm messing any of these points up out of polars ignorance, someone feel free to call me out).

Also there is something to be said for a much more mature package with more books and other packages that work with it. If the dataframe interchange protocols get off the ground there is less of an argument here. And you can always convert directly to a pandas dataframe for those api boundaries I guess (but I'm guessing that doing that too often will cause slowdowns too).

But I can't deny the speed. And there is definitely something to the fact that the API has been able to be built without an extreme amount of baggage it carries around. Pandas is trying its best to make things more consistent and streamlined but deprecation and removal take forever. Copy-on-write in Pandas 2 is probably going to become the default, in which case by Pandas 3 they are liable to remove all inplace operations and make things immutable the way Polars does which would be a big step forward.

I know some people don't like the chaining syntax of Polars, but that's how modern Pandas should be written more often than not anyway (and again how you see it with tidyverse stuff in R so it is the style I am used to).

I am a Pandas contributor though so part of my Polars reluctance may also just be clinging to what I know. I certainly would be giving polars a shot if I could, but would probably wind up going with pandas for non-performance critical parts because of the syntax.

Thanks for the perspective (and for contributing to pandas!). I agree 100% that using a well-established, mature package has a lot to be said for it. Basically everything in the Python data world supports pandas, or even relies on it. Given that, I should probably just get a lot better at pandas, and (at least for now) use polars for use cases when performance matters a lot. I'll check out that book you recommended.

QuarkJets
Sep 8, 2008

BUUNNI posted:

I'm a total Python noob. I'm trying to get more in-depth info on list comprehensions; it's one of the Python things I'm having trouble writing and using, since I don't think there is anything similar in Java and my instructor insists that we use them for everything.

List comprehensions are pretty intuitive, maybe you could just ask about the details you're having trouble with, they can't be that ba...

quote:


I was wondering if there were any resources I could be pointed to that go over, for example, how to use a dict of lambda functions to sort a list of lists using a one-liner list comprehension.

... okay

What your instructor wants you to learn is that if you can write a for loop that appends entries to a list, then you can turn that into a list comprehension. It doesn't matter whether that's a nested for loop or one with conditions; you can almost always refactor it down to a list comprehension (loops that break early are the main exception). The simplest way to figure this out is to write some dumb nested for loops that append to a list first, then build up the list comprehension starting from the innermost loop. This also lets you prove to yourself that your answer is correct, because you should feel confident about the nested append version
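
For example (toy data), the two versions line up clause for clause:

Python code:
# loop version: nested loops plus a condition, appending as you go
pairs = []
for x in range(3):
    for y in range(3):
        if x != y:
            pairs.append((x, y))

# comprehension version: same clauses, read left to right in the same order
pairs = [(x, y) for x in range(3) for y in range(3) if x != y]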

The biggest challenge is writing something that looks clean; people like to think that they're writing clever "one liners" that are really 10 lines long once you apply a character limit, and that look like incomprehensible bullshit. The key is to offload logic to other functions, which can also be easily tested. You can use these same functions in a dumb nested for loop or in a comprehension, so it's easy to think your way through what's happening
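
For example, with a (hypothetical) helper doing the real work, the comprehension stays short and the logic stays testable:

Python code:
def is_valid(row):
    """Hypothetical filter: keep rows whose values are all positive."""
    return all(value > 0 for value in row)

rows = [[1, 2], [-1, 3], [4, 5]]
kept = [row for row in rows if is_valid(row)]   # [[1, 2], [4, 5]]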

StumblyWumbly
Sep 12, 2007

Batmanticore!
Two tips for writing list comprehensions: include a comment about what you're trying to do, and use descriptive variable names, even if it triples the number of characters.

List comprehensions are often obvious only while you are writing them.

Also it seems like the kind of thing chatgpt would be good at, but I have not tested that

CarForumPoster
Jun 26, 2013

⚡POWER⚡

StumblyWumbly posted:

Also it seems like the kind of thing chatgpt would be good at, but I have not tested that

It's better than me, but that's not hard.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
RealPython has a good article on list comprehensions: https://realpython.com/list-comprehension-python/

(Their stuff is usually good, FYI)

BUUNNI
Jun 23, 2023

by Pragmatica

StumblyWumbly posted:

Also it seems like the kind of thing chatgpt would be good at, but I have not tested that

We're not allowed to use AI for our examinations :v:

StumblyWumbly posted:

Two tips for writing list comprehensions: include a comment about what you're trying to do, and use descriptive variable names, even if it triples the number of characters.

List comprehensions are often obvious only while you are writing them.

Also it seems like the kind of thing chatgpt would be good at, but I have not tested that

Thank you for this btw

BUUNNI fucked around with this message at 03:42 on Oct 24, 2023

Son of Thunderbeast
Sep 21, 2002

Falcon2001 posted:

Why not use UUIDv4 instead of generating this weird custom hashing thing? It's basically guaranteed not to collide, since v4 is generated from random bits. The collision chance of UUIDv4 is essentially zero - one common illustration is that you'd have to generate about a billion UUIDv4s per second for roughly 86 years before you'd have a 50% chance of a single duplicate.

Just wanted to update real quick--I reached a point where I was more or less at v1 and started looking at optimizations and the first thing I did was replace generate_p_index with a simple uuid.uuid4() call, and it's over 3x faster now lmao
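
The swap itself was basically this, assuming the old helper only needs to hand back a unique string:

Python code:
import uuid

def generate_p_index() -> str:
    # was a custom hashing scheme; now just a random v4 UUID
    return str(uuid.uuid4())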

Much appreciated!

TasogareNoKagi
Jul 11, 2013

quote:

https://code.djangoproject.com/ticket/373
Support database tables with composite keys. Created: 18 years ago. Status: Open.
https://code.djangoproject.com/ticket/6148
Support database schemas. Created: 16 years ago. Status: Open.

Guess I won't be using Django then.

duck monster
Dec 15, 2004

TasogareNoKagi posted:

Guess I won't be using Django then.

It's actually coming reasonably soon though. There's literally work happening right now to get CompositeField in. It's apparently fairly tricky though, and it's been a work in progress for quite a few months.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

TasogareNoKagi posted:

Guess I won't be using Django then.

Why would you use composite keys (idk poo poo about DBs)

Data Graham
Dec 28, 2009

📈📊🍪😋



Sometimes you want a natural key that isn't an auto-increment because you have to regularly import new data destructively over the old data and FK relationships have to keep working, and sometimes the only way to ensure row uniqueness that way is to combine multiple columns together. That's what I'm dealing with right now: it's a legacy ETL process, and SQLAlchemy supports it (though not very intuitively), but Django looks like a non-starter.
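
For reference, a composite primary key in SQLAlchemy's declarative style looks roughly like this (table and column names invented):

Python code:
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class ImportedRow(Base):
    __tablename__ = "imported_row"

    # uniqueness comes from the pair, so both columns are part of the primary key
    source = Column(String(64), primary_key=True)
    external_id = Column(Integer, primary_key=True)
    payload = Column(String)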

We’re probably going to use uuid pk’s instead anyway though, the composite key thing makes my brain hurt

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Even then you almost certainly want a synthetic primary key plus a separate unique constraint on the set of columns you think should be unique.
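
In Django terms that's roughly the following (model and field names invented): keep the implicit auto id as the primary key and put a UniqueConstraint on the natural-key columns.

Python code:
from django.db import models

class ImportedRow(models.Model):
    source = models.CharField(max_length=64)
    external_id = models.IntegerField()

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["source", "external_id"],
                name="unique_source_external_id",
            ),
        ]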

TasogareNoKagi
Jul 11, 2013

duck monster posted:

It's actually coming reasonably soon though. There's literally work happening right now to get CompositeField in. It's apparently fairly tricky though, and it's been a work in progress for quite a few months.

"Soon" isn't "now", though. Though it may be moot; PHB learned that I was using Python and demanded I rewrite it all in Typescript :rolleyes:

Jabor posted:

Even then you almost certainly want a synthetic primary key plus a separate unique constraint on the set of columns you think should be unique.

That was the suggested "solution" I saw while reading up on this holy war in DB design. In my view, it only solves the "composite foreign keys are icky" :airquote: issue :airquote: while adding complexity and storage space. Having the additional unique constraint means the DB is still going to create and maintain a composite index on those columns, so it isn't solving anything, just hiding it.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
The things it solves are major issues that you run into when you're trying to operate a system in reality instead of as a homework assignment that will be thrown away after it's been graded.

Notably, you now have a realistic path for changing your constraint when (not if, when) it turns out that your initial domain modelling wasn't correct.

Also you're actually saving space rather than spending more of it, since other tables referencing your synthetic key are smaller than if they had to reference your entire composite key.

duck monster
Dec 15, 2004

TasogareNoKagi posted:

"Soon" isn't "now", though. Though it may be moot; PHB learned that I was using Python and demanded I rewrite it all in Typescript :rolleyes:


Yeah, I've had this happen twice before (in one case the demand was to rewrite in PHP). In both cases it resulted in sharply written resignation letters (and in the case of the PHP one, the whole loving team followed me out the door. GG, shittyboss. That company doesn't exist anymore. One day people might learn that 30 years of experience matters.)

rowkey bilbao
Jul 24, 2023
How would you store a string describing a series of key:value pairs in a concise, human-editable way that can be safely parsed into a native Python dict? I'd rather not rely on eval(), for example.

I wrote a function with re.search and splits and it doesn't feel elegant at all.

spiritual bypass
Feb 19, 2008

Grimey Drawer
Would CSV work?
fart,1
butt,2
etc,3

rowkey bilbao
Jul 24, 2023
I should add: it's going to be on a single line with only a few values; keys and values are ASCII letters. It can look like this: "name:steph,position=shelter".

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
I always lean towards existing standards so that I can use built in tools and make it easier for other systems to interact with it, so have you considered something like JSON?

12 rats tied together
Sep 7, 2006

1. pick 2 delimiters: one for key+value and one for element. split, split, assign (sketch below).

2. use a markup language. yaml toml and json all allow mappings to be expressed in a single line. toml comes included in python but yaml is likely the most human readable. make sure to read the documentation for your yaml library because it does some type inference that might be surprising to you.
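
a minimal sketch of option 1, assuming the delimiters never show up inside keys or values:

Python code:
def parse_pairs(text, item_sep=",", kv_sep=":"):
    """Turn 'name:steph,position:shelter' into a dict; values stay strings."""
    return dict(item.split(kv_sep, 1) for item in text.split(item_sep) if item)

print(parse_pairs("name:steph,position:shelter"))   # {'name': 'steph', 'position': 'shelter'}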

Chin Strap
Nov 24, 2002

I failed my TFLC Toxx, but I no longer need a double chin strap :buddy:
Pillbug
Just use JSON.

rowkey bilbao
Jul 24, 2023
JSON is OK, but it needs double quotes, which add some visual noise. It probably makes the most sense though. Thanks for the pointers.

QuarkJets
Sep 8, 2008

rowkey bilbao posted:

How would you store a string describing a series of key:value pairs in a concise, human-editable way that can be safely parsed into a native Python dict? I'd rather not rely on eval(), for example.

I wrote a function with re.search and splits and it doesn't feel elegant at all.

I would use json
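
e.g. a one-line mapping parses straight to a dict with no eval() involved (keys borrowed from your example):

Python code:
import json

print(json.loads('{"name": "steph", "position": "shelter"}'))
# {'name': 'steph', 'position': 'shelter'}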

Foxfire_
Nov 8, 2010

Does human editable mean 'editable by a programmer' (use something standard like everyone else has said) or 'editable by Bob from Marketing who needs explicit very friendly error messages on typos'?

The Fool
Oct 16, 2003


Foxfire_ posted:

Does human editable mean 'editable by a programmer' (use something standard like everyone else has said) or 'editable by Bob from Marketing who needs explicit very friendly error messages on typos'?

code:
pd.DataFrame(data).to_excel('output.xlsx', index=False)

Macichne Leainig
Jul 26, 2012

by VG

The Fool posted:

code:
pd.DataFrame(data).to_excel('output.xlsx', index=False)

Then the fun begins when you have to reingest that XLSX :shepicide:

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Macichne Leainig posted:

Then the fun begins when you have to reingest that XLSX :shepicide:

I think Pandas reads it alright, but I'm sure there's complications.

Macichne Leainig
Jul 26, 2012

by VG
There is a read_excel but let me tell you about Bob from Marketing's inability to enter data consistently

I'd rather just use a CSV if I was doing anything tabular but even then that's not foolproof (and necessitates that the Bobs from Marketing know how to perform such an operation to begin with)

I feel like I'm having deja vu, didn't we have a discussion in this very thread a few pages back about ingesting data? Or is my brain finally loving with me too much

spiritual bypass
Feb 19, 2008

Grimey Drawer
Data ingestion is a common problem and CSV is always the solution


Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

spiritual bypass posted:

Data ingestion is a common problem and CSV is always the solution

Speaking of which, and this isn't really a Python question, but is there an open source CSV editor that has some of the functionality of excel without all the...overhead? It'd be nice to find something I can use to just muck around quickly with tabular data without having to constantly be like 'no, don't accidentally make it an xlsx, stop formatting it' etc - not that it's a super big problem or anything.
