|
hailthefish posted:ORM reduction 2-column foreign key of (gender_id, company_id) from table people to table company_gender_codes should allow your application to enforce its gender invariant
|
# ? Nov 12, 2018 13:59 |
|
|
# ? Jun 8, 2024 06:40 |
|
CRIP EATIN BREAD posted:the idea is you don't need a join at all, ever. the thing you described (make a column with a synthetic value, use it to look up data in another table if you need that data) is literally a join you're just pointlessly doing it in the client instead of in the database
|
# ? Nov 12, 2018 15:25 |
|
CRIP EATIN BREAD posted:you can use a lookup table like you suggest (join tables) but honestly you'll probably be fine just using a single character as your gender column and then using a separate table with no foreign key that identifies all the gender settings for a specific company, which will be a tuple like (company_id, code, description). obviously using a join table is "more correct" but constantly doing joins for some things is pointless. especially if they have changes, like say a company had A, B and C available, but then decided C isn't an option anymore. So now you'll also have to implement a "deleted" column and it all ends up being a lot of work when you could just use a column, and performance will probably be better avoiding the joins. if the performance hit of a join on a gender lookup table is an issue you probably need to rethink your database
|
# ? Nov 12, 2018 15:31 |
someone kill me, our dwh architect did finally see a blog post about big data so he now is convinced that we need big data solutions for subterabyte scale tabular data because whatever concoction he and our dba came up with can’t work through 10^8 rows of numerics and datetimes
|
|
# ? Nov 12, 2018 15:36 |
|
do your clients have a regulatory requirement for gender because if they dont then gender is varchar(255) or w/e
|
# ? Nov 12, 2018 15:47 |
|
yeah my point is with humans things get super complicated real quick in situations where normalization doesn't always make sense or is feasible. there's a lot of things you try ot treat as "flags" or whatever and it turns out that they really aren't flags and just are really a tag. uncurable mlady posted:do your clients have a regulatory requirement for gender because if they dont then gender is varchar(255) or w/e i'm going to assume yes here only because they are collecting it in the first place at all.
|
# ? Nov 12, 2018 15:57 |
uncurable mlady posted:do your clients have a regulatory requirement for gender because if they dont then gender is varchar(255) or w/e imo even if you are collecting not for a regulatory requirements then varchar is bad idea if you ever plan to analyse anything, you cant just give users variable inputs
|
|
# ? Nov 12, 2018 16:10 |
|
cinci zoo sniper posted:imo even if you are collecting not for a regulatory requirements then varchar is bad idea if you ever plan to analyse anything, you cant just give users variable inputs mefi did it iirc https://metatalk.metafilter.com/20050/Gender-Go-nuts-Somebody-did#838160 about 80% fall into easy categories but lol for the rest bob dobbs is dead fucked around with this message at 16:21 on Nov 12, 2018 |
# ? Nov 12, 2018 16:19 |
|
cinci zoo sniper posted:someone kill me, our dwh architect did finally see a blog post about big data so he now is convinced that we need big data solutions for subterabyte scale tabular data because whatever concoction he and our dba came up with can’t work through 10^8 rows of numerics and datetimes https://mobile.twitter.com/garybernhardt/status/600783770925420546
|
# ? Nov 12, 2018 16:30 |
|
cinci zoo sniper posted:someone kill me, our dwh architect did finally see a blog post about big data so he now is convinced that we need big data solutions for subterabyte scale tabular data because whatever concoction he and our dba came up with can’t work through 10^8 rows of numerics and datetimes lol if they can't handle such a small amount of rows (srsly 10^8 should not be a big deal for a properly managed rdbms) can't wait to see how their "big data" system performs.
|
# ? Nov 12, 2018 16:31 |
Finster Dexter posted:lol if they can't handle such a small amount of rows (srsly 10^8 should not be a big deal for a properly managed rdbms) can't wait to see how their "big data" system performs. i really dont mind waiting before seeing how his ""big data"" (its the guy who still cant ssh to copy-paste oneliners i wrote for him) performs, although im kinda safe here since technical implementation of "big data" would fall onto my shoulders so for now my strategy is to repeat "no, we do not need cassandra for this" every two weeks. im sure even i could figure out 10^8 rows with some indexes and maybe a few initial, expensive dtype conversion/normalisation/restructurisation operations to fix some warcrimes (like column that is literally varchar with 4 possible values - "first" through "fourth", with logically consistent with linguistics ordering wrt the entity tagging with those) normally yes but our dwh has 16gb ram lol bob dobbs is dead posted:mefi did it iirc i do this very frequently and yeah the other 20% is what you also need to do for business forecasting, e.g. "lol for the rest" is the actual work to be done. and then you have to deal with like up to half a dozen misspellings of a word *after* you santisie for whitespace and punctuation
|
|
# ? Nov 12, 2018 16:51 |
|
cinci zoo sniper posted:i do this very frequently and yeah the other 20% is what you also need to do for business forecasting, e.g. "lol for the rest" is the actual work to be done. and then you have to deal with like up to half a dozen misspellings of a word *after* you santisie for whitespace and punctuation you can also try reducing things with a phonetic approximation to get even closer.
|
# ? Nov 12, 2018 16:56 |
the weirdest part is that the guy previously worked on a dwh that was >10tb
|
|
# ? Nov 12, 2018 16:57 |
CRIP EATIN BREAD posted:you can also try reducing things with a phonetic approximation to get even closer. yeah i can do things like levenstein distance filtering to algorithmically combat minor misspellings but in the end of the day there will be good old manual labour to go through variations like a set - for instance other, etc, misc, miscellaneous, etcetera, n/a, not listed, not applicable - representing the same logical category. last case of this we had in production (partially varchar ticket resolution field for 7 years of history for customer support on 10^7 user scale) saw one of our guys go through the final 3000 variants of the field values that we couldn’t further reduce to summarise them to less than 20 resolution statuses
|
|
# ? Nov 12, 2018 17:03 |
|
cinci zoo sniper posted:the weirdest part is that the guy previously worked on a dwh that was >10tb I guess "worked on" means occasionally logged in to aws?
|
# ? Nov 12, 2018 17:19 |
Finster Dexter posted:I guess "worked on" means occasionally logged in to aws? weirdly enough no, t and l parts of etl. which is probably less weird in the hindsight since his main problem is e and the “metal” of anything that is not oracle
|
|
# ? Nov 12, 2018 17:22 |
|
CRIP EATIN BREAD posted:you can also try reducing things with a phonetic approximation to get even closer. Cool vid about a phonetic profanity filter https://m.youtube.com/watch?v=bJ5ppf0po3k
|
# ? Nov 12, 2018 17:37 |
|
I implemented a TTS for transcribing video to auto caption stuff for $AGENCY and I used Kaldi as the base with one of their pre trained models and it contains a LOT of racial slurs and generally bad terms. I’ve brought this up many times that they need to identify words they don’t want to get autotranslated (it will happen by accident etc) and they keep putting it off and now it’s gone live and I wonder how long before the software I’ve helped write will be on the news.
|
# ? Nov 12, 2018 17:56 |
|
they don’t have the time to pick words not to use but they did have plenty of time ensuring our in-dash dispatching and navigation system for drivers is blind accessible and 508 compliant
|
# ? Nov 12, 2018 18:09 |
|
accessibility is probably more important than bad words
|
# ? Nov 12, 2018 19:23 |
|
Shaggar posted:accessibility is probably more important than bad words read what I wrote. they wanted a system that was made for people driving cars to be usable by the blind.
|
# ? Nov 12, 2018 19:52 |
|
please don't be so ableist
|
# ? Nov 12, 2018 20:30 |
|
Powerful Two-Hander posted:in all seriousness though, you should use a join because if you don't and just have a single column someone is gonna be posting in this thread in future going "how the do I unfuck this single column that has some mysterious and arbitrary meaning" Let me tell you a secret friend, /* Leans in real close, in a quiet voice, "It already IS a hosed up single column that has mysterious and arbitrary meaning." I do appreciate the point of view for genders represented as IDs, therefore every organisation can specify their own genders and labels etc. Unfortunately, 1) there's already information stored for existing customers as gender in a single column that has mysterious and arbitrary meaning, 2) the assignment of meaning is hardcoded everywhere in the code base (including a hilarious SQL statement that assumes IF GENDER NOT "F" THEN "M") 3) and the real talk thing is, there's going to be a need to aggregate customers by genders over organisations. So assuming I let organisations specify id 1 = male for org A, id 123 = male for org B, id XYZ = male for org N, I still need to have some master mapping that states that "all these codes are considered male across all organisations". As the software is brutally shoved down the users throat, I'm thinking might as well just remove the choice from them to stop them from doing special snowflake things. Like, I get the ideological purity as genders represented as IDs, allowing for many genders per organisation but the reality of living in this cruel world with shitfuck sins of our forefathers architecture is pushing me to a solution where "M" is male. The gender as id solution is just niceties but it doesn't seem to bring anything to the table but more work for me.
|
# ? Nov 13, 2018 02:36 |
|
Oh, and efficiencies in join? Who cares! Codebase is so poo poo anyway we're spending a fortune on raw computing power to give the illusion that the system is just slow rather than unusable.
|
# ? Nov 13, 2018 02:37 |
|
floatman posted:Let me tell you a secret friend, your real issue is the hardcoded logic like if not f then m. switching your current system to a coded table with ids would be relatively simple, you just build a new table and insert the unique values from the current column and replace them with their newly generated ids. then you can add filtering in the UI for which client sees which ids. if at the end of the day the reality is you still need an M F or U to make a decision then the exercise is largely pointless. You can tell them instead you're going to use the hl7 administrative gender codes which are dumb but give you cover.
|
# ? Nov 13, 2018 03:14 |
|
floatman posted:Let me tell you a secret friend, do you work at epic? you're describing the exact situation I spent years trying to remedy
|
# ? Nov 13, 2018 05:01 |
|
implement a DSL using rows in a table as instructions so you can dynamically change the rules at any given time
|
# ? Nov 13, 2018 05:24 |
|
gender invariants should be maintained at the application level. change my mind
|
# ? Nov 13, 2018 05:33 |
|
represent gender as a set of floats in the range [0, 1] quantifying whatever aspects of gender are relevant to your application make them nullable and store it in a temporal table
|
# ? Nov 13, 2018 06:14 |
|
Experts Exchange for a while had a class called FemaleCache because it was simpler to just hold in memory the entire set of member ID's who set their gender to Female in their profile than it would have been to query the database for this information. The class hung around for several years beyond the point when anything was reading from it.
|
# ? Nov 13, 2018 06:51 |
|
CPColin posted:Experts Exchange for a while had a class called FemaleCache because it was simpler to just hold in memory the entire set of member ID's who set their gender to Female in their profile than it would have been to query the database for this information. The class hung around for several years beyond the point when anything was reading from it.
|
# ? Nov 13, 2018 07:01 |
|
CPColin posted:Experts Exchange for a while had a class called FemaleCache because it was simpler to just hold in memory the entire set of member ID's who set their gender to Female in their profile than it would have been to query the database for this information. The class hung around for several years beyond the point when anything was reading from it. did that, uh, help? at all?
|
# ? Nov 13, 2018 07:08 |
redleader posted:represent gender as a set of floats in the range [0, 1] quantifying whatever aspects of gender are relevant to your application *sighs drawing 0.69 katana*
|
|
# ? Nov 13, 2018 07:09 |
|
Ultimately, each organisation's mapping of gender codes to labels will be sent from one system to another utilising a http endpoint that wraps comma separated string values in json to be further wrapped in an XML I wish I was joking.
|
# ? Nov 13, 2018 07:11 |
|
CPColin posted:Experts Exchange for a while had a class called FemaleCache because it was simpler to just hold in memory the entire set of member ID's who set their gender to Female in their profile than it would have been to query the database for this information. The class hung around for several years beyond the point when anything was reading from it. Was this cache primarily used to fulfill search requests for Females to harass
|
# ? Nov 13, 2018 07:48 |
|
caches full of women
|
# ? Nov 13, 2018 12:33 |
cjs: i asked for extra space to do pg_dumpall on dwh server and our genious sysops gave me a mount with 500 iops hard limit because we are so out of ssds on our hardware that id need to wait 40 days to get what i asked for, 500gb. guess the average iowait cpu utilisation for the last hour
|
|
# ? Nov 13, 2018 12:50 |
|
Ploft-shell crab posted:caches full of women lol
|
# ? Nov 13, 2018 13:19 |
|
let’s talk about http/3. good luck with nat lol
|
# ? Nov 13, 2018 14:01 |
|
|
# ? Jun 8, 2024 06:40 |
Shinku ABOOKEN posted:let’s talk about http/3. good luck with nat lol tfw u nat but it still sucks
|
|
# ? Nov 13, 2018 14:56 |