Ask General Programming Questions Not Worth Their Own Thread

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Ask General Programming Questions Not Worth Their Own Thread

Ciaphas: Nov 20, 2005; > BEWARE, COWARD

nielsm posted:

Do those documents really have multiple root elements? If so, that's definitely a problem, and you'd have to wrap it in a new (fake) root element around the entire document minus XML declarations to make that part valid.

That case was an example of two different problem data files but now that you mention it, yes!

:suicide:

If I were higher on the totem pole I would have pushed back and said "This data freaking sucks and we can't do anything with it without more money and time" but c'est la vie.

# ? Aug 10, 2016 18:35

Ciaphas: Nov 20, 2005; > BEWARE, COWARD

New question only this time it's future proofing rather than an imminent problem. One document I got that they don't want me to work through--yet--looks like this (abstracted):

XML code:

<document>
  <val_1>foo</val_1>
  <val_2>bar</val_2>
  <val_3>baz</val_3>
  <!--etc to unknown limit-->
</document>

What XPath expression would I use to get all the val_X nodes in one go? Last I checked a * in XPath was just for namespace wildcards, but can it be used like *s on shells, like /document/val_* or something?

# ? Aug 10, 2016 18:39

csammis: Aug 26, 2003; Mental Institution

XPath supports string functions like starts-with and contains. You'd want something like //*[starts-with(name(), 'val_')]

http://zvon.org/xxl/XPathTutorial/Output/example8.html
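For anyone without a full XPath engine handy, here's a rough Python sketch of the same prefix match. Python's standard-library xml.etree.ElementTree only implements a limited subset of XPath without functions like starts-with(), so this walks the tree and filters on the tag name instead; with a full XPath 1.0 engine the expression above should work as written. The sample is the abstracted document from the question, plus an `<other>` element I added just to show something being filtered out, and `elements_with_prefix` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<document>
  <val_1>foo</val_1>
  <val_2>bar</val_2>
  <val_3>baz</val_3>
  <other>ignored</other>
</document>"""

def elements_with_prefix(root, prefix):
    """Rough equivalent of //*[starts-with(name(), prefix)]."""
    # root.iter() yields root itself and every descendant, much like //*.
    return [el for el in root.iter() if el.tag.startswith(prefix)]

root = ET.fromstring(SAMPLE)
matches = elements_with_prefix(root, "val_")
for el in matches:
    print(el.tag, el.text)
```

One caveat: if the real documents use namespaces, ElementTree reports tags as `{namespace}localname`, so the prefix test would need to look at the part after the closing brace.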

# ? Aug 10, 2016 18:48

Ciaphas: Nov 20, 2005; > BEWARE, COWARD

Brilliant, thanks.

# ? Aug 10, 2016 18:58

gariig: Dec 31, 2004; Beaten into submission by my fiance; Pillbug

I'm looking for a simple HTTP server that I can run that will accept HTTP POSTs of files, save them to disk based on the URL route, and serve them back via GET. So if someone POSTs to http://server/upload/foo/bar/baz.txt, it will save to /<uploaddir>/foo/bar/baz.txt, and http://server/files/foo/bar/baz.txt will return the file. SSL and HTTP Basic Auth are needed as well. I could code something up, but I have to imagine this is a solved problem, except I can't find anything besides HFS.

# ? Aug 11, 2016 18:58

TooMuchAbstraction: Oct 14, 2012; I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.; Fun Shoe

gariig posted:

I'm looking for a simple HTTP server that I can run that will accept HTTP POSTs of files, save them to disk based on the URL route, and serve them back via GET. So if someone POSTs to http://server/upload/foo/bar/baz.txt, it will save to /<uploaddir>/foo/bar/baz.txt, and http://server/files/foo/bar/baz.txt will return the file. SSL and HTTP Basic Auth are needed as well. I could code something up, but I have to imagine this is a solved problem, except I can't find anything besides HFS.

Find a web framework that's easy to set up and use that? Django for example. It's stupid overkill but it gets the job done, plus then you know how to use a web framework if you need to make websites.

Or if overkill offends you, learn to write CGI scripts; it doesn't sound like what you want is all that complicated.
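To give a sense of how little code that is, here's a minimal sketch using only Python's standard library http.server. It is not a hardened server, just an illustration under some assumptions: the upload directory, the credentials, and the helper names (`safe_path`, `is_authorized`, `make_server`) are all made up for this example, the POST body is treated as the raw file contents (not a multipart form), and TLS is left as a comment.

```python
import base64
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import unquote, urlsplit

UPLOAD_ROOT = os.path.abspath("uploads")   # hypothetical upload directory
USERNAME, PASSWORD = "user", "change-me"   # hypothetical credentials

def safe_path(root, url_path, prefix):
    """Map /<prefix>/a/b.txt to <root>/a/b.txt, or None if it would escape root."""
    if not url_path.startswith(prefix):
        return None
    target = os.path.abspath(os.path.join(root, url_path[len(prefix):]))
    # Reject paths such as ../secret that resolve outside (or exactly to) root.
    if target == root or os.path.commonpath([root, target]) != root:
        return None
    return target

def is_authorized(header):
    """Compare an Authorization header value with the expected Basic credentials."""
    expected = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    return header == f"Basic {expected}"

class FileHandler(BaseHTTPRequestHandler):
    def _authorized_target(self, prefix):
        if not is_authorized(self.headers.get("Authorization")):
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="files"')
            self.end_headers()
            return None
        path = unquote(urlsplit(self.path).path)
        target = safe_path(UPLOAD_ROOT, path, prefix)
        if target is None:
            self.send_error(404)
        return target

    def do_POST(self):
        target = self._authorized_target("/upload/")
        if target is None:
            return
        length = int(self.headers.get("Content-Length", 0))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(self.rfile.read(length))   # raw request body becomes the file
        self.send_response(201)
        self.end_headers()

    def do_GET(self):
        target = self._authorized_target("/files/")
        if target is None:
            return
        if not os.path.isfile(target):
            self.send_error(404)
            return
        with open(target, "rb") as f:
            data = f.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def make_server(host="127.0.0.1", port=8000):
    """Build the server; for HTTPS, wrap server.socket with ssl.SSLContext.wrap_socket."""
    return ThreadingHTTPServer((host, port), FileHandler)
```

Calling `make_server().serve_forever()` starts it, and something like `curl -u user:change-me --data-binary @baz.txt http://127.0.0.1:8000/upload/foo/bar/baz.txt` should then store the file. Basic Auth without TLS sends the password effectively in the clear, so the SSL requirement is not optional in practice.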

# ? Aug 11, 2016 20:36

The Fool: Oct 16, 2003

gariig posted:

I'm looking for a simple HTTP server that I can run that will accept HTTP POSTs of files, save them to disk based on the URL route, and serve them back via GET. So if someone POSTs to http://server/upload/foo/bar/baz.txt, it will save to /<uploaddir>/foo/bar/baz.txt, and http://server/files/foo/bar/baz.txt will return the file. SSL and HTTP Basic Auth are needed as well. I could code something up, but I have to imagine this is a solved problem, except I can't find anything besides HFS.

I feel like node.js + express would be able to do this without too much additional work.

# ? Aug 11, 2016 20:45

PRADA SLUT: Mar 14, 2006; Inexperienced,
heartless,
but even so

I can't think of where to put this, but maybe someone here has an idea.

I'm looking for software that will take an image, pick out all the QR codes in that image, and scan them, then either save or export the data in a way that they can be "handed off" to another app. The QR codes just contain random letters and numbers, and I'm looking at about 10-50 codes per image.

Optimally it could be run automatically, where it reads a directory or whatever for images and scans them as they come in.

Commercial software is fine.

# ? Aug 12, 2016 15:16

csammis: Aug 26, 2003; Mental Institution

At my last job our ECM platform did this exact thing. We relied on a few third parties to do various OCR tasks...QR codes went through LEADTOOLS if I recall correctly but there are several heavy-lifting OCR engines that can do it. ABBYY FineReader is another.

If you want to look for software that does the job the keywords are "barcode recognition"

# ? Aug 12, 2016 15:28

Colonel J: Jan 3, 2008

I'm wondering about floating point math. I know that floats can lose precision, but I'm unsure exactly how it happens. Can I do precise math on numbers a thousandth apart? How about a millionth?

Like let's say I'm working with topological data - GPS points of two things close by can be (e.g.) 47.675432 and 47.675435. Can I reliably work with them, or should I transform them somehow, e.g. by subtracting 47 from them first, bringing them closer to zero?

This would be in c++, but how about on the GPU in a shader? I tried searching a bit but I'm unsure what to look for. If there's a good guide on this stuff I'd like to read it. Thanks!

Colonel J fucked around with this message at 20:26 on Aug 13, 2016

# ? Aug 13, 2016 20:19

pmchem: Jan 22, 2010

Colonel J posted:

If there's a good guide on this stuff I'd like to read it. Thanks!

https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

# ? Aug 13, 2016 20:31

ultrafilter: Aug 23, 2007; It's okay if you have any questions.

pmchem posted:

https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

This is much easier to read as a PDF.

# ? Aug 13, 2016 20:50

sarehu: Apr 20, 2007; (call/cc call/cc)

Colonel J posted:

I'm wondering about floating point math. I know that floats can lose precision, but I'm unsure exactly how it happens. Can I do precise math on numbers a thousandth apart? How about a millionth?

Floats or doubles are numbers of the form 1.011010011... * 2^n, with either 23 or 52 bits of precision after the binary point. (The digit before it is always 1.) Basic math operations will round the exact result to the nearest representable value.

Colonel J posted:

Like let's say I'm working with topological data - GPS points of two things close by can be (e.g.) 47.675432 and 47.675435. Can I reliably work with them, or should I transform them somehow, e.g. by subtracting 47 from them first, bringing them closer to zero?

If that's useful, and it can be, you might as well subtract out something much closer than that, like their mean or one of the values. You'd want to use doubles because single-precision floats aren't precise enough at this magnitude; you can calculate that.
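Here's one way to do that calculation, as a small Python sketch. Single-precision values between 32 and 64 are spaced 2^-18 (about 0.0000038) apart, which is already wider than the 0.000003 gap between the two sample coordinates. The `to_float32` helper is something I made up for the example; it round-trips a Python float (a double) through a 4-byte float using the standard struct module.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

a, b = 47.675432, 47.675435   # the two sample coordinates from the question

# The same difference, computed three ways.
diff_double = b - a                          # doubles throughout
diff_single = to_float32(b) - to_float32(a)  # inputs rounded to float32 first
diff_offset = to_float32(b - a)              # subtract in double, then store as float32

print(f"double:          {diff_double:.12e}")
print(f"float32 inputs:  {diff_single:.12e}")
print(f"offset, float32: {diff_offset:.12e}")
```

The third line is the "subtract a nearby value first" idea: once the numbers are close to zero, single precision has plenty of resolution for the small difference, while the second line shows what is lost when the full coordinates are squeezed into float32 before subtracting.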

Colonel J posted:

This would be in c++, but how about on the GPU in a shader? I tried searching a bit but I'm unsure what to look for. If there's a good guide on this stuff I'd like to read it. Thanks!

Same thing. You could imagine starting with doubles and subtracting a local offset so that single-precision floats are good enough, which would get you better GPU performance for whatever it is you're doing, but I'd save that for when you're desperate.

Generally the time you need to worry is when accumulating a whole bunch of math operations or when doing a few risky ones. I think your limiting factor here is GPS accuracy -- let's say you're accurate to a meter. Then if you take the distance between two points, that could be off by up to 2 meters. Double precision is enough to represent lat/long values to 5 nanometers. Your numbers, for example, could represent true locations that differ by somewhere between 0.000002 and 0.000004, depending on which way they got rounded.
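A rough way to sanity-check the nanometer-scale figure, sketched in Python. The assumptions here are mine: one degree is taken as roughly 111,320 meters (about right along the equator, smaller for longitude elsewhere), and `double_spacing` is a helper written for this example that returns the gap between adjacent doubles at a given magnitude.

```python
import math

METERS_PER_DEGREE = 111_320.0  # assumption: rough length of one degree at the equator

def double_spacing(x):
    """Gap between x and the next larger double, for a normal positive x."""
    _, exponent = math.frexp(x)      # x = m * 2**exponent with 0.5 <= m < 1
    return 2.0 ** (exponent - 53)

for degrees in (47.675432, 180.0):
    step_m = double_spacing(degrees) * METERS_PER_DEGREE
    print(f"{degrees:>10}: about {step_m:.2e} m between adjacent doubles")
```

Even at 180 degrees the spacing comes out to a few nanometers, so the representation is many orders of magnitude finer than any GPS fix.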

sarehu fucked around with this message at 22:40 on Aug 13, 2016

# ? Aug 13, 2016 22:28

Eela6: May 25, 2007; Shredded Hen

Numerical analysis is a tricky subject. You're already ahead of the curve by thinking about it.

As mentioned above, generally speaking, numerical error accumulates over a series of steps.

A common source of 'dangerous' error occurs when subtracting two things that are very close to each other.

If you have a ~ b and you run into a calculation like f(a,b, c) = ca-cb, you might get f*(a, b, c) = 0. This is bad!
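A small Python sketch of that failure, with values picked purely for illustration: `b` is meant to be 1 + 10^-17, but that change is too small to survive rounding to a double, so the computed result is exactly zero. The exact answer is worked out separately with rational arithmetic from the standard fractions module.

```python
from fractions import Fraction

def f(a, b, c):
    """The calculation from the post, in ordinary double precision."""
    return c * a - c * b

a = 1.0
b = 1.0 + 1e-17    # the increment is lost to rounding: b is stored as exactly 1.0
c = 1e6

computed = f(a, b, c)

# Exact reference, using rationals for the intended decimal values.
exact = Fraction(10**6) * (Fraction(1) - (Fraction(1) + Fraction(1, 10**17)))

print("computed:", computed)
print("exact:   ", float(exact))
```

The true answer is tiny but not zero, so anything downstream that divides by it or takes its sign gets the wrong result with no warning.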

My advice is to play around with 'worst case' scenarios in your implementation. Try it with single, double, and extended precision.

If you want to learn more, Numerical Analysis by Burden and Faires is a well respected introductory textbook on the subject.

Edit: just to be clear, I am by no means an expert. Just took a few courses on numerical analysis in undergrad. That PDF linked above seems like a good reference!

Eela6 fucked around with this message at 05:24 on Aug 15, 2016

# ? Aug 15, 2016 05:20

peramene: Oct 13, 2015; by Fluffdaddy

sorry to interrupt - "Small Excel Questions" has been archived fwiw

# ? Aug 16, 2016 06:25

csammis: Aug 26, 2003; Mental Institution

OP updated to reflect that. I left the link to the archived thread, but if someone wants to start a new one just PM me and I'll add it.

# ? Aug 16, 2016 14:50

LOOK I AM A TURTLE: May 22, 2003; "I'm actually a tortoise."; Grimey Drawer

There's been a "new" one for more than seven years: http://forums.somethingawful.com/showthread.php?threadid=3132163

# ? Aug 16, 2016 14:55

csammis: Aug 26, 2003; Mental Institution

Okay, OP updated. If anyone has any more updates they think I should make just let me know. I don't go searching for new megathreads on a regular basis (clearly)

# ? Aug 16, 2016 15:13

denzelcurrypower: Jan 28, 2011

Anyone know of a good way to get into ethical hacking or learning about security in general? It's hard to determine what resources are good. I've been coding in Java for about a year but haven't gotten into any security features.

# ? Aug 17, 2016 18:33

Bob Morales: Aug 18, 2006; ~~Just wear the fucking mask, Bob~~

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

Ornithology posted:

Anyone know of a good way to get into ethical hacking or learning about security in general? It's hard to determine what resources are good. I've been coding in Java for about a year but haven't gotten into any security features.

Huge topic so there's a huge amount of material out there.

You have broader topics like reverse engineering software, disassembling programs, using debuggers, packet tracers

Then you have books that detail exploits (that have since been patched) for Linux, Windows, Browsers, Phones...

Then you've got books on things like social engineering, phishing, the 'soft' side of security

Reading those books kind of just gives you a little more info than a script kiddie. To really 'learn' you have to have a deep understanding of your particular tools and platforms, which you'll gain by really digging into things yourself and understanding how all the underlying tech works (encryption, networking, how the OS works with memory and processes...)

Edit: enter 'reverse engineering' into Amazon and look at the results and the top-rated books in those categories are pretty safe bets

# ? Aug 17, 2016 19:06

Stinky_Pete: Aug 16, 2015; Stinkier than your average bear; Lipstick Apathy

Ornithology posted:

Anyone know of a good way to get into ethical hacking or learning about security in general? It's hard to determine what resources are good. I've been coding in Java for about a year but haven't gotten into any security features.

hackthissite.com offers a good hands-on introduction. You may also want to install Kali Linux, which is a pentesting distro with a bunch of tools to play with and learn about

But yeah security is a tough field and relies on a detailed understanding of memory on the stack and all sorts of soul-crushing (for most) stuff, so pick up a few books on the subject and get ready to look at a lot of assembly, probably.

Here's a sample syllabus I found https://www.cis.upenn.edu/~cse331/

# ? Aug 17, 2016 22:55

Linear Zoetrope: Nov 28, 2011; A hero must cook

I still don't understand how buffer overflows that allow the user to execute arbitrary code work. I mean, the concept is straightforward, but I have no idea how I'd be able to consistently target the instruction memory I'd want to overwrite. If you actually have a copy of the program sitting around, sure, you can run a debugger on it, and since programs generally execute pretty consistently you can target the overrun. On a remote program handling multiple connections allocating memory in unpredictable addresses? No clue how people manage it. If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards. (Though if that's the case I'm not sure how you'd get the address of the start of the block it needs to jump to).

E: This is specifically on execution of arbitrary code via overruns, getting programs to just print out a dump of the values of data you want is much easier to understand for me.

# ? Aug 17, 2016 23:06

moctopus: Nov 28, 2005

Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.

# ? Aug 17, 2016 23:10

Stinky_Pete: Aug 16, 2015; Stinkier than your average bear; Lipstick Apathy

moctopus posted:

Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.

What would the API do?

# ? Aug 17, 2016 23:12

Linear Zoetrope: Nov 28, 2011; A hero must cook

moctopus posted:

Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.

Going to have to be more specific here. Like, you can get reams of data for machine learning in the UCI repository, though a lot of times you need to massage it since they're not always consistently formatted. You can also find a lot of massive image datasets just by searching neural net challenges, even if you don't want to actually do any machine learning/NN stuff those can be useful. Synthetic user data (e.g. fake bank records or UN/Password combos or whatever) I'm sure exist, but I don't know them off the top of my head.

# ? Aug 17, 2016 23:13

moctopus: Nov 28, 2005

Jsor posted:

Going to have to be more specific here. Like, you can get reams of data for machine learning in the UCI repository, though a lot of times you need to massage it since they're not always consistently formatted. You can also find a lot of massive image datasets just by searching neural net challenges, even if you don't want to actually do any machine learning/NN stuff those can be useful. Synthetic user data (e.g. fake bank records or UN/Password combos or whatever) I'm sure exist, but I don't know them off the top of my head.

Thanks!

Stinky_Pete posted:

What would the API do?

I know I was being terribly vague, but I was going to start with the data and figure something out around it.

I guess I got to "big heap of data that I can query" and didn't think further.

I'd like it to be not made up data though.

# ? Aug 17, 2016 23:18

Stinky_Pete: Aug 16, 2015; Stinkier than your average bear; Lipstick Apathy

Jsor posted:

I still don't understand how buffer overflows that allow the user to execute arbitrary code work. I mean, the concept is straightforward, but I have no idea how I'd be able to consistently target the instruction memory I'd want to overwrite. If you actually have a copy of the program sitting around, sure, you can run a debugger on it, and since programs generally execute pretty consistently you can target the overrun. On a remote program handling multiple connections allocating memory in unpredictable addresses? No clue how people manage it. If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards. (Though if that's the case I'm not sure how you'd get the address of the start of the block it needs to jump to).

E: This is specifically on execution of arbitrary code via overruns, getting programs to just print out a dump of the values of data you want is much easier to understand for me.

The general answer is they don't, and it's hard and takes many many man-months of work to figure out an exploit for a given application

Here's a paper

https://www.sans.org/reading-room/whitepapers/securecode/buffer-overflow-attack-mechanism-method-prevention-386

# ? Aug 17, 2016 23:22

Stinky_Pete: Aug 16, 2015; Stinkier than your average bear; Lipstick Apathy

moctopus posted:

Thanks!

I know I was being terribly vague, but I was going to start with the data and figure something out around it.

I guess I got to "big heap of data that I can query" and didn't think further.

I'd like it to be not made up data though.

I'd recommend something in the UCI repo that jsor posted. You'll have to set up your own database and populate it with the data if you want to query it, though.
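A minimal sketch of that "set up a database and populate it" step, using Python's built-in sqlite3 and csv modules. The rows below are placeholder data with made-up column names standing in for whatever file gets downloaded, and `load_rows` and `average_by_region` are hypothetical names for this example only.

```python
import csv
import io
import sqlite3

# Placeholder rows standing in for a downloaded CSV; the columns are made up.
RAW_CSV = """region,year,value
north,2014,10.5
north,2015,11.0
south,2014,7.25
south,2015,8.0
"""

def load_rows(conn, csv_text):
    """Create a table and bulk-insert the CSV rows."""
    conn.execute("CREATE TABLE observations (region TEXT, year INTEGER, value REAL)")
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row is a dict, so named placeholders pick out the columns.
    conn.executemany(
        "INSERT INTO observations VALUES (:region, :year, :value)",
        reader,
    )
    conn.commit()

def average_by_region(conn):
    """One example query an API endpoint might wrap."""
    rows = conn.execute(
        "SELECT region, AVG(value) FROM observations GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())

conn = sqlite3.connect(":memory:")
load_rows(conn, RAW_CSV)
print(average_by_region(conn))
```

Swapping `":memory:"` for a file path keeps the data between runs, and real datasets will usually need some cleanup before the insert step.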

Edit: Actually, check out the BLS. They have tons of data that could be made easier to access in code

Stinky_Pete fucked around with this message at 23:30 on Aug 17, 2016

# ? Aug 17, 2016 23:24

moctopus: Nov 28, 2005

Stinky_Pete posted:

I'd recommend something in the UCI repo that jsor posted. You'll have to set up your own database and populate it with the data if you want to query it, though.

That was what I was planning on doing anyway so this is looking pretty good.

Thanks again guys!

# ? Aug 17, 2016 23:26

Peristalsis: Apr 5, 2004; Move along.

moctopus posted:

Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.

You should work for my new boss.

Boss: "We need a new program!"
Me: "To do what?"
Boss: "It should be written in Python, and use key-value pairs!"
Me: "Is there a spec, or some kind of use case?"
Boss: "This is agile development, the requirements come later!"
Me: "Okay, I'll start writing something. I'll have moctopus bring some data."

# ? Aug 18, 2016 00:31

moctopus: Nov 28, 2005

Peristalsis posted:

You should work for my new boss.

Boss: "We need a new program!"
Me: "To do what?"
Boss: "It should be written in Python, and use key-value pairs!"
Me: "Is there a spec, or some kind of use case?"
Boss: "This is agile development, the requirements come later!"
Me: "Okay, I'll start writing something. I'll have moctopus bring some data."

I have found locations of toilets in Moscow and pornhub comments.

# ? Aug 18, 2016 00:31

ExcessBLarg!: Sep 1, 2001

Jsor posted:

I still don't understand how buffer overflows that allow the user to execute arbitrary code work. ... On a remote program handling multiple connections allocating memory in unpredictable addresses?

In the classical buffer overflow scenario the buffer is located on the call stack and, historically, the call stack address is deterministic. Even with multiple threads, many programs maintain a thread pool of fixed size so the process is still deterministic. This is why address space layout randomization (ASLR) was/is such a big deal, as it made the location of call stacks non-deterministic and much more difficult to exploit.
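To make the "buffer on the call stack" picture concrete, here's a toy model in Python. It is only a simulation of the layout, not how a real stack frame looks: the frame is one bytearray holding a 16-byte buffer followed by an 8-byte saved return address, and the sizes, addresses and function names are all invented for the example.

```python
BUFFER_SIZE = 16
RETURN_SLOT = 8   # bytes reserved for the saved return address in this toy frame

def make_frame(return_address):
    """Lay out [ buffer | saved return address ] in one contiguous bytearray."""
    return bytearray(BUFFER_SIZE) + return_address.to_bytes(RETURN_SLOT, "little")

def unchecked_copy(frame, data):
    """Mimic a copy with no bounds check: long input runs past the buffer."""
    frame[0:len(data)] = data

def saved_return(frame):
    """Read back the return address the function would jump to on exit."""
    return int.from_bytes(frame[BUFFER_SIZE:BUFFER_SIZE + RETURN_SLOT], "little")

frame = make_frame(0x401000)
unchecked_copy(frame, b"A" * BUFFER_SIZE)
print(hex(saved_return(frame)))   # input fits: return address untouched

unchecked_copy(frame, b"A" * BUFFER_SIZE + (0xDEADBEEF).to_bytes(RETURN_SLOT, "little"))
print(hex(saved_return(frame)))   # input too long: return address replaced
```

The overwrite itself is the easy part. The attacker still has to know what address to write there, which is exactly what a deterministic stack location gives away and what ASLR takes back.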

Jsor posted:

If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards.

By the way, this is called a NOP slide and 1 MB worth is hardly absurd.

# ? Aug 18, 2016 00:40

JawKnee: Mar 24, 2007; You'll take the ride to leave this town along that yellow line

moctopus posted:

I have found locations of toilets in Moscow and pornhub comments.

how many toilets are there in the comments?

# ? Aug 18, 2016 00:41

Stinky_Pete: Aug 16, 2015; Stinkier than your average bear; Lipstick Apathy

JawKnee posted:

how many toilets are there in the comments?

there should be an API function for that

# ? Aug 18, 2016 02:06

JawnV6: Jul 4, 2004; So hot ...

Jsor posted:

I still don't understand how buffer overflows that allow the user to execute arbitrary code work.

Some practical exercises can be found at: https://microcorruption.com/

Jsor posted:

programs generally execute pretty consistently

You're wrong about this, but I'm not sure how.

# ? Aug 18, 2016 02:29

fritz: Jul 26, 2003

moctopus posted:

Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.

How about : http://www.planetary.org/blogs/guest-blogs/2016/hubble-series-1-how-to-find-hubble-data.html

# ? Aug 18, 2016 03:45

dupersaurus: Aug 1, 2012; Futurism was an art movement where dudes were all 'CARS ARE COOL AND THE PAST IS FOR CHUMPS. LET'S DRAW SOME CARS.'

Jsor posted:

I still don't understand how buffer overflows that allow the user to execute arbitrary code work. I mean, the concept is straightforward, but I have no idea how I'd be able to consistently target the instruction memory I'd want to overwrite. If you actually have a copy of the program sitting around, sure, you can run a debugger on it, and since programs generally execute pretty consistently you can target the overrun. On a remote program handling multiple connections allocating memory in unpredictable addresses? No clue how people manage it. If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards. (Though if that's the case I'm not sure how you'd get the address of the start of the block it needs to jump to).

E: This is specifically on execution of arbitrary code via overruns, getting programs to just print out a dump of the values of data you want is much easier to understand for me.

On phone so can't really link but Computerphile on YouTube did a video on it recently that explained it well.

# ? Aug 18, 2016 03:52

ultrafilter: Aug 23, 2007; It's okay if you have any questions.

moctopus posted:

Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.

If you want really large data sets, Amazon's Large Data Sets Repository is right up your alley.

# ? Aug 18, 2016 03:56

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Jsor posted:

I still don't understand how buffer overflows that allow the user to execute arbitrary code work. I mean, the concept is straightforward, but I have no idea how I'd be able to consistently target the instruction memory I'd want to overwrite. If you actually have a copy of the program sitting around, sure, you can run a debugger on it, and since programs generally execute pretty consistently you can target the overrun. On a remote program handling multiple connections allocating memory in unpredictable addresses? No clue how people manage it.

By the time people were exploiting remote programs, ASLR and W^X were in common practice. Before that, you could pretty much guarantee where your data would be loaded in memory statically.

Jsor posted:

If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards. (Though if that's the case I'm not sure how you'd get the address of the start of the block it needs to jump to).

E: This is specifically on execution of arbitrary code via overruns, getting programs to just print out a dump of the values of data you want is much easier to understand for me.

This is known as a heap spray and was what was used to defeat ASLR before W^X came into common use.

W^X stops people from executing memory that is also mapped writable, so you can't just jump to any writable memory.

The latest technique, return-oriented programming (ROP), is to use every set of "instruction, ret" in the original compiled code -- these are called "gadgets", and you chain them together by pushing all of their addresses onto the stack. This allows you to chain together a bootstrap program that writes the data you want, maps a page as executable and not writable, and jumps to it. There are programs to generate ROP chains for you now, even.

You might think this would be unrealistic, but reminder that in x86 you can jump to unaligned instructions, so you basically need to find any instruction followed by "C2", "C3", "CA" or "CB", even if it's part of a larger data block.
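A tiny Python illustration of the "even if it's part of a larger data block" point. The six bytes below encode `mov eax, 0xC3` followed by `ret`; the immediate operand happens to contain a C3 byte, so a plain byte scan reports a ret opcode in the middle of the first instruction as well as the intended one at the end. The function name is made up, and this only counts candidate bytes; it does not decode instructions.

```python
RET_OPCODES = {0xC2, 0xC3, 0xCA, 0xCB}   # the four bytes listed in the post

def ret_offsets(code):
    """Return every offset whose byte is one of the ret opcodes."""
    return [i for i, byte in enumerate(code) if byte in RET_OPCODES]

# mov eax, 0xC3 ; ret   (B8 is "mov eax, imm32" on x86)
code = bytes.fromhex("b8c3000000" "c3")
print(ret_offsets(code))
```

Offset 5 is the ret the compiler meant to emit; offset 1 only exists if execution starts in the middle of the mov, which x86 allows.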

# ? Aug 18, 2016 04:11

JawnV6: Jul 4, 2004; So hot ...

Suspicious Dish posted:

You might think this would be unrealistic, but reminder that in x86 you can jump to unaligned instructions, so you basically need to find any instruction followed by "C2", "C3", "CA" or "CB", even if it's part of a larger data block.

https://twitter.com/_drew_davidson/status/743451922657587200

Suspicious Dish posted:

By the time people were exploiting remote programs, ASLR and W^X were in common practice. Before that, you could pretty much guarantee where your data would be loaded in memory statically.

Fun to think about new features reducing ASLR's efficacy: Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector. Craft a page that looks like the target page, with a guess for the randomized portion, then write to it and see whether it had been deduped.

# ? Aug 18, 2016 05:06
