  • Locked thread
Luigi Thirty
Apr 30, 2006

Emergency confection port.

Pie Colony posted:

that's because CR isn't a line break

it is on Mac and Apple II

thanks stebe

vodkat
Jun 30, 2012



cannot legally be sold as vodka
Anyone got any tips for dealing with lovely json in python?

I've got a bunch of scraped json data that needs to be put into dataframes but there are errant commas, apostrophes and quotation marks in there (gently caress people and their stupid names and company) that keep breaking the standard parsing libraries and it's driving me insane :psyduck:

spiritual bypass
Feb 19, 2008

Grimey Drawer
sounds malformed imo

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

ideally you should get the other party to fix their json

whether or not you can do that (but ESPECIALLY if you can't do that and need to come up with some loose frankenparser hack), put it all down in writing and make sure SOMEONE knows that you're being sent corrupted data and any complications or delays resulting are the other party's fault

vodkat
Jun 30, 2012



cannot legally be sold as vodka

NihilCredo posted:

ideally you should get the other party to fix their json

whether or not you can do that (but ESPECIALLY if you can't do that and need to come up with some loose frankenparser hack), put it all down in writing and make sure SOMEONE knows that you're being sent corrupted data and any complications or delays resulting are the other party's fault

lol im a grad student, no one values my time, not even me :negative:

Shaggar
Apr 26, 2006

vodkat posted:

Anyone got any tips for dealing with lovely json in python?

I've got a bunch of scraped json data that needs to be put into dataframes but there are errant commas, apostrophes and quotation marks in there (gently caress people and their stupid names and company) that keep breaking the standard parsing libraries and it's driving me insane :psyduck:

BUT JSON IS SO GOOD AND MUCH BETTER THAN XML!! HOW CAN THIS BE???L?

qhat
Jul 6, 2015


write a custom json parser to deal specifically with your lovely json, it's not hard

Doom Mathematic
Sep 2, 2008

Shaggar posted:

BUT JSON IS SO GOOD AND MUCH BETTER THAN XML!! HOW CAN THIS BE???L?

How do you deal with corrupted or invalid XML, typically?

necrotic
Aug 2, 2005
I owe my brother big time for this!
Regex that poo poo into the ground of course

Share Bear
Apr 27, 2004

vodkat posted:

lol im a grad student, no one values my time, not even me :negative:

is it possible at all to work with the source to fix their bug

do they make it obvious they're using something specifically lovely and hand rolled

Share Bear
Apr 27, 2004

Hi,

This is the guy using your stuff. It looks like your JSON is not valid according to several known libraries and parsers. Maybe we can work together to fix up your broken poo poo?? What language are you using

Love,
the grad student with a heart of gold

vodkat
Jun 30, 2012



cannot legally be sold as vodka

qhat posted:

write a custom json parser to deal specifically with your lovely json, it's not hard

necrotic posted:

Regex that poo poo into the ground of course

this is what i've done and it seems to be working for now. still a lovely and tedious way to waste away my day

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


Doom Mathematic posted:

How do you deal with corrupted or invalid XML, typically?

i have someone unironically trying to get me to generate a checksum of an xml file because 'if we download it and corrupt it we could maybe get one character wrong somewhere in the body of the file but it would still be valid'

i mean come on, if you're so worried that ftp is gonna corrupt your download how do you know the checksum isn't corrupt too?

JewKiller 3000
Nov 28, 2006

by Lowtax
there's... nothing wrong with asking for a checksum???

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


JewKiller 3000 posted:

there's... nothing wrong with asking for a checksum???

if we have some subtle corruption in the file then we would probably corrupt the checksum too because we'd have corrupted it at source. actual point to point ftp corruption is a non issue as far as i am concerned

it's like the auditors that want to watch me run a db query because they want to go line by line through the output to prove that it's the same as downloading it from the UI.

like sure, you can do that but don't kid yourself that you've done anything worthwhile with your time.

redleader
Aug 18, 2005

Engage according to operational parameters
i don't really see the problem? if both the file and the checksum are corrupted, they (probably) won't match, so just retry

use a proper hash function instead of a crappy checksum to get rid of the 'probably' qualifier
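A minimal sketch of the hash idea, assuming the sender ships a SHA-256 hex digest next to the file. The helper names and the sample XML are made up for illustration and are not anything from the thread.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_digest(data: bytes, expected_hex: str) -> bool:
    """Check the received bytes against the digest the sender published."""
    # compare_digest is a constant-time comparison; a plain == would also work here
    return hmac.compare_digest(sha256_hex(data), expected_hex.strip().lower())


# Hypothetical payload: what the sender has on disk.
original = b"<trades><trade id='1'>100</trade></trades>"
published = sha256_hex(original)  # shipped alongside the file

# One character wrong in the body: still well-formed XML, different digest.
corrupted = original.replace(b"100", b"900")

print(matches_digest(original, published))
print(matches_digest(corrupted, published))
```

If the digest itself got mangled in transit the comparison just fails as well, so the receiver retries either way.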

necrotic
Aug 2, 2005
I owe my brother big time for this!
Yeah if you care about the integrity over network use a checksum.

Soricidus
Oct 21, 2010
freedom-hating statist shill
holy loving poo poo c++ std::regex is irredeemably terrible in every way

LordSaturn
Aug 12, 2007

sadly unfunny

vodkat posted:

lol im a grad student, no one values my time, not even me :negative:

if I was doing this I'd first try to write code to sanitize the input into parseable JSON, and only if that proved to be difficult would I try to parse the lovely JSON myself

I call this principle "separation of baling wire"
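A rough sketch of that sanitize-first approach, assuming the breakage is something mechanical like trailing commas. `drop_trailing_commas`, `FIXERS` and `loads_lenient` are hypothetical names, and the regex is deliberately naive: it will also fire on a `, }` sequence inside a string value, so it is no substitute for getting the source fixed.

```python
import json
import re


def drop_trailing_commas(text: str) -> str:
    """Turn {"a": 1,} into {"a": 1}. Naive: does not know about string values."""
    return re.sub(r",\s*([}\]])", r"\1", text)


# Hypothetical list of cleanup passes, applied in order; add more as new breakage shows up.
FIXERS = [drop_trailing_commas]


def loads_lenient(text: str):
    """Parse strictly first and only reach for the fixers if that fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    for fix in FIXERS:
        text = fix(text)
    # Still raises JSONDecodeError if the fixers were not enough.
    return json.loads(text)


record = loads_lenient('{"name": "O\'Brien & Sons", "staff": [1, 2, 3,],}')
print(record)
```

Keeping the cleanup separate from the parsing means the standard `json` module still does the real work, and anything the fixers can't repair fails loudly instead of being silently misread.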

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


redleader posted:

i don't really see the problem? if both the file and the checksum are corrupted, they (probably) won't match, so just retry

use a proper hash function instead of a crappy checksum to get rid of the 'probably' qualifier

my point to them was that a partial download was the only failure case and as it's xml that would always be a malformed file and fail to parse so why bother with a checksum because if you're that paranoid about random bits flipping or whatever just give up now

though i guess while they're doing this the offshore devs can't be loving anything else up so there is that

Illusive Fuck Man
Jul 5, 2004
RIP John McCain feel better xoxo 💋 🙏
Taco Defender

Soricidus posted:

holy loving poo poo c++ std::regex is irredeemably terrible in every way

I like RE2 for regexes in c++

Shaggar
Apr 26, 2006

Powerful Two-Hander posted:

i have someone unironically trying to get me to generate a checksum of an xml file because 'if we download it and corrupt it we could maybe get one character wrong somewhere in the body of the file but it would still be valid'

i mean come on, if you're so worried that ftp is gonna corrupt your download how do you know the checksum isn't corrupt too?

sign it instead.

redleader
Aug 18, 2005

Engage according to operational parameters

Powerful Two-Hander posted:

my point to them was that a partial download was the only failure case and as it's xml that would always be a malformed file and fail to parse so why bother with a checksum because if you're that paranoid about random bits flipping or whatever just give up now

though i guess while they're doing this the offshore devs can't be loving anything else up so there is that

i'm not familiar with the failure modes of ftp, but if you're only worried about partial downloads then normal xml parsing will do

intuitively it seems like you're more likely to run into random bit flips over the network than from local storage or memory, but idk about any of that poo poo. if so then a checksum or whatever might help a bit

it's probably one of those can't hurt, won't help situations
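To make the two failure cases concrete, here is a small sketch with a made-up XML snippet and a hypothetical `parses` helper. A download cut off in the middle of the document fails to parse, while a changed character in the body parses fine, which is the case the checksum request is about.

```python
import xml.etree.ElementTree as ET

# Hypothetical file contents.
good = "<orders><order id='1'><qty>100</qty></order></orders>"


def parses(text: str) -> bool:
    """Return True if text is well-formed XML according to ElementTree."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


truncated = good[: len(good) // 2]    # partial download, cut mid-document
flipped = good.replace("100", "900")  # one character wrong in the body

print(parses(good), parses(truncated), parses(flipped))
```

A truncation that only loses trailing whitespace after the closing root tag would still parse, but at that point nothing of value is missing.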

AggressivelyStupid
Jan 9, 2012

necrotic posted:

Regex that poo poo into the ground of course

qhat
Jul 6, 2015


",".join(list)

python can be so loving terrible sometimes

Pointsman
Oct 9, 2010

If you see me posting about fitness
ASK ME HOW MY HELLRAISER TRAINING IS GOING

qhat posted:

",".join(list)

python can be so loving terrible sometimes

I used to hate this so much (and the only reason I still don't is that I don't write much Python any more).

Star War Sex Parrot
Oct 2, 2003

Star War Sex Parrot posted:

cls:

hello gdb, my old friend


today I accomplished baby's first buffer overflow exploit albeit in a controlled/prepared environment

there's something really cool/terrifying about injecting data and/or code into memory and then overwriting a function's return address to execute that new code, all from an input string

Corla Plankun
May 8, 2007

improve the lives of everyone

qhat posted:

",".join(list)

python can be so loving terrible sometimes

i was really surprised at how quickly i became stockholm-syndromed to this particular idiom

skimothy milkerson
Nov 19, 2006

qhat posted:

",".join(list)

python can be so loving terrible sometimes

yeah there needs to be a list.join(','). like bitch you know what i want give it to me

JewKiller 3000
Nov 28, 2006

by Lowtax
there should be one, and preferably only one, obvious* way to do it

* to guido, and no one else

skimothy milkerson
Nov 19, 2006

JewKiller 3000 posted:

there should be one, and preferably only one, obvious* way to do it

* to guido, and no one else

dont quote the zen of python

Shaggar
Apr 26, 2006

qhat posted:

",".join(list)

python can be so loving terrible sometimes

python is terrible all of the time

qhat
Jul 6, 2015


Shaggar posted:

python is terrible all of the time

the more i think about this the truer it becomes

qhat
Jul 6, 2015


i think by far the worst thing about python is the people who are so called experts at python

like someone asks a question about how to do something weird in python on SO or something and gets ambushed by a half dozen cunts trying to impose their best practice autism on the poster

Shaggar
Apr 26, 2006
that's core to the Linux/p-lang culture.

Ellie Crabcakes
Feb 1, 2008

Stop emailing my boyfriend Gay Crungus

That's pretty much core to any language ever in the history of computer programming.

Corla Plankun
May 8, 2007

improve the lives of everyone

Skim Milk posted:

yeah there needs to be a list.join(','). like bitch you know what i want give it to me

"thing".join("other thing") would be impossible if it went both ways though

edit- here's what happens for those of you who havent been braindamaged by python yet
code:
>>> 'thing'.join('other thing')
'othingtthinghthingethingrthing thingtthinghthingithingnthingg'
>>> 'other thing'.join('thing')
'tother thinghother thingiother thingnother thingg'
>>>
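For what it's worth, the usual argument for the method living on the separator is that `str.join` takes any iterable of strings, not just lists. A small sketch with made-up sample data:

```python
names = ["alice", "bob", "carol"]

# The same method works for a list, a generator and a dict (which iterates over its keys).
from_list = ",".join(names)
from_generator = ",".join(n.upper() for n in names)
from_dict_keys = ",".join({"x": 1, "y": 2})

# Non-strings have to be converted first, otherwise join raises TypeError.
from_ints = ",".join(map(str, [1, 2, 3]))

print(from_list, from_generator, from_dict_keys, from_ints)
```

A hypothetical list.join(',') would have to be repeated on tuples, generators, dict views and every other iterable to cover the same ground.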

JewKiller 3000
Nov 28, 2006

by Lowtax
the worst thing about python is that it looks so easy that it's almost impossible to keep a beginner from trying it and wasting their brain dijkstra-basic-style

Shaggar
Apr 26, 2006

John Big Booty posted:

That's pretty much core to any language ever in the history of computer programming.

not for c# or java

Mao Zedong Thot
Oct 16, 2008


Python is really good for little things and really bad for big things p much
