Coding Horrors: You can gather all your technical debt into one easy framework!

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Coding Horrors: You can gather all your technical debt into one easy framework!

RobotRob: Aug 7, 2007; Let's get weird, but not end of BSG weird.

Zorro KingOfEngland posted:

is there even a feasible way of looking at a file that big?

I use a program called OoberViewer for looking through large email logs at work. It doesn't open the whole file at once, so it is pretty quick to view a log. I'm not sure where to download it, as it was already on this machine, but I could probably get it to you if you would like to try it.
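For anyone wondering how a viewer can stay quick on a file that size: the usual trick is to seek to the region of interest and read only that. The following is a minimal Python sketch of the idea, not how OoberViewer actually works; the function names and the 64 KiB window are assumptions chosen for illustration.

```python
import os


def tail_text(path, max_bytes=64 * 1024):
    """Read only the final max_bytes of the file; return (text, started_mid_file)."""
    size = os.path.getsize(path)
    start = max(0, size - max_bytes)
    with open(path, "rb") as f:
        f.seek(start)  # jump straight to the tail, the rest is never read
        data = f.read()
    # Decode leniently: the window may begin in the middle of a multi-byte character.
    return data.decode("utf-8", errors="replace"), start > 0


def tail_lines(path, n=20, max_bytes=64 * 1024):
    """Return up to the last n complete lines found in the final max_bytes."""
    text, started_mid_file = tail_text(path, max_bytes)
    lines = text.splitlines()
    if started_mid_file and lines:
        # The first line is probably cut off, so drop it. This can discard one
        # whole line when the window happens to begin exactly on a line boundary.
        lines = lines[1:]
    return lines[-n:]
```

Memory use is bounded by `max_bytes` regardless of file size, so the same call behaves the same on a 163 KB log and a 163 GB one.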

# ? May 3, 2012 16:55

taqueso: Mar 8, 2004

Zorro KingOfEngland posted:

is there even a feasible way of looking at a file that big?

Maybe http://www.swiftgear.com/ltfviewer/features.html will work? I've never attempted to look at a file that big in Windows.

# ? May 3, 2012 17:04

Zorro KingOfEngland: May 7, 2008

nielsm posted:

Should have set up some log rotation.

That would have required the person who set it up to have some forethought. This is what happens when you offshore and let the offshore team do all of the environment configuration with absolutely no oversight. I had no idea we weren't rotating until I needed to look in that log for something.
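The stack behind that server isn't stated, so purely as an illustration: if the application were written in Python, size-based rotation could be set up with the standard library's `RotatingFileHandler`. The logger name, size limit, and backup count below are assumptions, not anything from the environment described.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rotating_logger(path, max_bytes=10 * 1024 * 1024, backups=5,
                         name="app.rotating"):
    """Return a logger that rolls `path` over before it reaches max_bytes.

    Older files are renamed path.1, path.2, ... and anything beyond
    `backups` is deleted, so total disk use stays bounded.
    """
    logger = logging.getLogger(name)  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With these defaults the log directory tops out at roughly 60 MB (the live file plus five backups) instead of growing without limit.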

Janin posted:

Am I reading that right? Does that say 163 GB?

That is correct. I couldn't believe it when I saw it either. Had to double check with google.

Zorro KingOfEngland fucked around with this message at 17:33 on May 3, 2012

# ? May 3, 2012 17:31

mister_gosh: May 24, 2002

I am in the midst of developing a large search application (or rather the glue between my GUI, Solr and/or Lucene, a database, etc.), and part of my solution may end up here one day.

One part of the process is a request that gets sent to a data miner, which extracts various field types from the files. Each request may have anywhere between 1 and 30,000 objects to mine at a time. The caller would then receive back an array of class objects (each object represents a file).

Each file's content gets split into at least 15 different fields (database id of file, file location, text, user defined metadata, references, etc.).

There is the potential to be as many as 30 different fields.

I started drafting the class object which would define these fields, but realized that there would be 30 getters and setters and that's when I thought of this thread. I am doing something wrong.

The proof of concept I designed mined the content into .property files (e.g. id=1234 location=C:/abc/123.txt user=john_doe text=call me ishmael...), but writing to the property file during the mining and reading it back in later seems inefficient. Now I'm wondering whether passing up to 30,000 objects back to the caller with 30 fields set is even worse.

The average byte size of each object would likely be not much larger than the content of my post. Hopefully this makes some sort of sense. Just in case anyone is curious, this is Java/Groovy.

mister_gosh fucked around with this message at 20:11 on May 3, 2012

# ? May 3, 2012 19:28

trex eaterofcadrs: Jun 17, 2005; My lack of understanding is only exceeded by my lack of concern.

Any reason you're not using solr or elasticsearch? They do all that kind of crap for you.

# ? May 3, 2012 19:32

ShadoX: Oct 4, 2004; There is no W!

mister_gosh posted:

I am in the midst of developing a large search application solution and now part of my solution may end up here one day.

One part of the process is a request that gets sent to a data miner, which extracts various field types from the files. Each request may have anywhere between 1 and 30,000 objects to mine at a time. The caller would then receive back an array of class objects (each object represents a file).

Each file's content gets split into at least 15 different fields (database id of file, file location, text, user defined metadata, references, etc.).

There is the potential to be as many as 30 different fields.

I started drafting the class object which would define these fields, but realized that there would be 30 getters and setters and that's when I thought of this thread. I am doing something wrong.

The proof of concept I designed mined the content into .property files (e.g. id=1234 location=C:/abc/123.txt user=john_doe text=call me ishmael...), but writing to the property file during the mining and reading it back in later seems inefficient. Now I'm wondering whether passing up to 30,000 objects back to the caller with 30 fields set is even worse.

The average byte size of each object would likely be not much larger than the content of my post. Hopefully this makes some sort of sense. Just in case anyone is curious, this is Java/Groovy.

I'm not entirely certain what your question is, but you should be able to store your fields and their values in a dictionary instead of using 30 specifically defined fields on the class. This would allow it to be flexible and easily expandable because you won't have to specify in code the names of all of the different fields.
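A rough sketch of what that could look like, written in Python for brevity even though the project is Java/Groovy (the same shape works with a `Map<String, Object>`). The class name `MinedFile` and the choice of required fields are assumptions made up for this example.

```python
class MinedFile:
    """One mined file: a thin wrapper around a dictionary of field values."""

    # Hypothetical: the fields every record must carry. Everything else is optional.
    REQUIRED = ("id", "location")

    def __init__(self, **fields):
        missing = [name for name in self.REQUIRED if name not in fields]
        if missing:
            raise ValueError("missing required fields: " + ", ".join(missing))
        self._fields = dict(fields)

    def get(self, name, default=None):
        """Look up a field by name; unknown fields return the default."""
        return self._fields.get(name, default)

    def set(self, name, value):
        """Add or overwrite a field without touching the class definition."""
        self._fields[name] = value

    def names(self):
        """Field names currently set, in sorted order."""
        return sorted(self._fields)
```

Going from 15 fields to 30 then means more keys, not more methods, and the required-field check keeps a little of the control that named getters would have given.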

# ? May 3, 2012 19:32

mister_gosh: May 24, 2002

Thanks for the responses!

I'm not sure if a dictionary would be much different, would it? It'd be about the same and less controlled. For instance, if I want to set the id in a dictionary:

object.set("id",id)
id = object.get("id")

Whereas if I'm doing an object with named fields:

object.setID(id)
id = object.getID()

There is the benefit that I don't have 30 getters and setters, but size/traffic-wise, it'd probably be about the same. I'd love to be proven wrong/shown the light though as I hate my present idea.

[EDIT: The more I think about it, the more it probably makes sense to use, like you said, a dictionary rather than a proprietary object I define. It would allow expansion more easily and require less overhead. Thanks!]

Regarding the question about Solr: this is actually partially for a Solr application. I am using Tika and other parsers available in Solr to extract some information from types like Word and graphics, but the bulk of my data is in a proprietary XML format. On top of that, 50% of the fields fed into Solr for a given file come from outside the file itself (such as its Subversion location, revision, user, and the MySQL database ID for that object).

The mining part of this application also needs to be independent of Solr as I am building another solution which uses solely Lucene. I need to mine the content with the same results but process it differently. That's why I need an array of values (or a bunch of property files to grab the data from) prior to opening the connection to Solr, Lucene or whatever to start the actual indexing process.
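One way to keep the miner independent of the indexer, and to avoid holding 30,000 records in memory at once, is to have the miner yield records lazily and hand them to whatever sink is plugged in. This is only a sketch in Python (a Java `Iterator` would play the same role); `extract`, `sink`, and the batch size are hypothetical stand-ins for the real parser and the Solr or Lucene writer.

```python
def mine(paths, extract):
    """Yield one field dictionary per file, produced only when asked for."""
    for path in paths:
        fields = dict(extract(path))  # extract() is a stand-in for the real parser
        fields["location"] = path     # metadata that comes from outside the file
        yield fields


def index_all(records, sink, batch_size=500):
    """Feed records to sink(batch) in fixed-size batches; return the record count."""
    batch = []
    count = 0
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            sink(batch)
            count += len(batch)
            batch = []
    if batch:  # flush whatever is left over
        sink(batch)
        count += len(batch)
    return count
```

Because `mine` knows nothing about the sink, the same mined records can feed a Solr client in one application and a plain Lucene writer in the other.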

mister_gosh fucked around with this message at 20:25 on May 3, 2012

# ? May 3, 2012 20:03

Impotence: Nov 8, 2010; Lipstick Apathy

trex eaterofcadrs posted:

Any reason you're not using solr or elasticsearch? They do all that kind of crap for you.

Out of curiosity, what are your thoughts on indextank?

# ? May 4, 2012 02:14

Huragok: Sep 14, 2011

I'm building an iOS app using Objective-J, PhoneGap and an iOS theming kit.

I'm a horror.

# ? May 4, 2012 09:24

abraham linksys: Sep 6, 2010

Huragok posted:

I'm building an iOS app using Objective-J

The rest of it is understandable, but man. I had no idea anything used Objective-J, outside of the handful of Cappuccino apps out there.

# ? May 4, 2012 09:31

Huragok: Sep 14, 2011

Truth be told, the runtime Objective-J compiler doesn't like being paired with anything but Cappuccino. It's more of a "hey, can this be done?" experiment than anything going into production. But it can be done! :downs:

# ? May 4, 2012 09:45

Sagacity: May 2, 2003; Hopefully my epitaph will be funnier than my custom title.

Oh PHP, you so crazy!

Apparently when running PHP through CGI it forgets to strip the querystring properly, so you can send command-line parameters to the PHP-CGI binary.

Of course, the PHP developers were quick to release a fix which...well, you know. They tried, though, God bless 'em.
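For context on the mechanism: as I understand the CGI specification (RFC 3875, section 4.4), a query string with no unencoded "=" is treated as a search string, split on "+", decoded, and handed to the script as command-line arguments. The sketch below illustrates that rule in Python under that reading; it is not PHP's or any web server's actual code, and the function names are made up.

```python
from urllib.parse import unquote


def cgi_argv_from_query(query_string):
    """Return the extra argv words a CGI server may derive from a query string.

    A query containing a literal '=' is ordinary form data and yields nothing.
    Otherwise it is split on '+' and each word is percent-decoded.
    """
    if not query_string or "=" in query_string:
        return []
    return [unquote(word) for word in query_string.split("+")]


def has_option_like_word(query_string):
    """True if any derived word starts with '-', i.e. looks like a CLI switch."""
    return any(word.startswith("-") for word in cgi_argv_from_query(query_string))
```

A request for `home.php?-s` therefore produces the argument `-s`, and a percent-encoded dash (`%2Ds`) decodes to the same thing, so a check has to run on the decoded words rather than the raw string.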

# ? May 4, 2012 09:58

Bonfire Lit: Jul 9, 2008; If you're one of the sinners who caused this please unfriend me now.

Sagacity posted:

Apparently when running PHP through CGI it forgets to strip the querystring properly, so you can send command-line parameters to the PHP-CGI binary.

The worst thing is that they used to ignore parameters, but a couple of years ago, they removed the code because nobody remembered why they had it. This is why you write comments, gentlemen.

# ? May 4, 2012 11:16

Huragok: Sep 14, 2011

Isilkor posted:

This is why you write comments, gentlemen.

Code without comments compiles faster :shepface:

# ? May 4, 2012 11:22

Zombywuf: Mar 29, 2008

Isilkor posted:

The worst thing is that they used to ignore parameters, but a couple of years ago, they removed the code because nobody remembered why they had it. This is why you write comments, gentlemen.

And good commit messages :-)

# ? May 4, 2012 11:31

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Huragok posted:

Objective-J

Huh. When I looked at it a while ago, it was just a very hacky parser.

Isilkor posted:

The worst thing is that they used to ignore parameters, but a couple of years ago, they removed the code because nobody remembered why they had it. This is why you write comments, gentlemen.

This is why you have commit messages, people.

# ? May 4, 2012 12:04

MononcQc: May 29, 2007

Sagacity posted:

Oh PHP, you so crazy!

Apparently when running PHP through CGI it forgets to strip the querystring properly, so you can send command-line parameters to the PHP-CGI binary.

Of course, the PHP developers were quick to release a fix which...well, you know. They tried, though, God bless 'em.

Thankfully nearly nobody runs PHP in that mode anymore. That bug is unlikely to show up at all (compared to other PHP vulnerabilities).

# ? May 4, 2012 12:51

Huragok: Sep 14, 2011

Suspicious Dish posted:

Huh. When I looked at it a while ago, it was just a very hacky parser.

Yeah the first version is pretty drat slow. Objective-J 2.0 is in the works according to this blog post. I have to admit for some reason Obj-C rubs me the wrong way but Obj-J doesn't.

# ? May 4, 2012 13:57

ToxicFrog: Apr 26, 2008

MononcQc posted:

Thankfully nearly nobody runs PHP in that mode anymore. That bug is unlikely to show up at all (compared to other PHP vulnerabilities).

You know who does run PHP in that mode?

Sony.

# ? May 4, 2012 13:59

MrMoo: Sep 14, 2000

Sony is obviously in awe of PHP function naming standards: redirectToProperPageUrl. Stay classy,

code:

case 'admin': ...
$isAdmin = true;
$runXssClean = false;

# ? May 4, 2012 14:43

the talent deficit: Dec 20, 2003; self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture

Janin posted:

Speaking as someone who's written three JSON parsers, no, the JSON encoding of astral characters has no virtue. It exists that way only to more closely resemble Javascript, which is not a goal any sane person should strive for.

Python's handling is much better. It has fixed-width for both BMP and astral, by using \u for BMP and \U for astral.

\uFF5E -> always 6 characters
\U0001D11E -> always 10 characters

javascript dates to a time before the extended plane existed. it only has escapes for the extended plane because netscape ran on some obscure platforms where transport streams were required to be 8 (or even 7!) bit clean. you should be thankful those escapes even exist
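To make the escapes being argued about concrete, here is a small Python sketch showing how a code point outside the BMP becomes the two `\u` escapes that JSON inherits from JavaScript. `surrogate_pair` is a helper written for this example; `json.dumps` and `json.loads` are the standard library calls.

```python
import json


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not an astral code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def json_escape(text):
    """Serialize text as an ASCII-only JSON string (json.dumps default)."""
    return json.dumps(text)


g_clef = "\U0001D11E"                 # Python literal: one fixed-width \U escape
print(json_escape(g_clef))            # JSON form: two \u escapes, 12 characters
print(json.loads(json_escape(g_clef)) == g_clef)  # the pair decodes back to one character
```

So the JSON form of an astral character is two six-character escapes, versus the single ten-character `\U` escape in a Python literal.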

as for your complaints about json allowing forms that aren't valid javascript, you're wrong. apart from u+2028 and u+2029 (which aren't even invalid javascript; they just cause problems when json is treated as executable code and not data) all json is also javascript

encoding detection isn't even in the scope of the discussion. if you can't determine the encoding of your javascript you can't execute it anyways. crockford's detection algorithm is just a heuristic that happens to work because of the structure of json and utf-x
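For reference, the heuristic in question (as I recall it from RFC 4627, section 3) looks at which of the first four bytes are zero. It relies on a JSON text starting with two ASCII characters. The Python sketch below follows that table; the fallback to UTF-8 for inputs shorter than four bytes is my own assumption.

```python
def detect_json_encoding(data):
    """Guess the Unicode encoding of JSON bytes from the null pattern of bytes 0-3."""
    if len(data) < 4:
        return "utf-8"  # assumption: too short to tell, so take the default
    nulls = tuple(byte == 0 for byte in data[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Any other pattern, including no nulls at all, is taken to be UTF-8.
    return patterns.get(nulls, "utf-8")
```

It is a heuristic in exactly the sense described: it never inspects a byte order mark or validates the data, it just exploits the fact that the opening characters of a JSON text are ASCII.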

(USER WAS PUT ON PROBATION FOR THIS POST)

# ? May 4, 2012 15:17

nonathlon: Jul 9, 2004; And yet, somehow, now it's my fault ...

Huragok posted:

Truth be told, the runtime Objective-J compiler doesn't like being paired with anything but Cappuccino. It's more of a "hey, can this be done?" experiment than anything going into production. But it can be done!

I looked at Objective-J some years ago, but was put off by the documentation being "Uh ... look at the Objective-C docs. It's mostly the same."

# ? May 4, 2012 15:43

tef: May 30, 2004; -> some l-system crap ->

ToxicFrog posted:

You know who does run PHP in that mode?

Sony.

http://mobile.twitter.com/bl4sty/status/198393049502593025

quote:

-dauto_prepend_file=php://input and some POST body is enough for RCE.

# ? May 4, 2012 16:02

tef: May 30, 2004; -> some l-system crap ->

MononcQc posted:

Thankfully nearly nobody runs PHP in that mode anymore. That bug is unlikely to show up at all (compared to other PHP vulnerabilities).

https://help.us.army.mil/cgi-bin/akohd.cfg/php/enduser/home.php?-s

thanks reddit!

# ? May 4, 2012 16:26

MononcQc: May 29, 2007

tef posted:

https://help.us.army.mil/cgi-bin/akohd.cfg/php/enduser/home.php?-s

thanks reddit!

Ah yes. Government and big corps, of course, will use outdated CGI stuff. I have yet again been too optimistic in assuming that the more efficient modules from around a decade ago would have been adopted by now :suicide:

# ? May 4, 2012 16:54

Simulated: Sep 28, 2001; Lowtax giveth, and Lowtax taketh away.; College Slice

The real horror is that those things are still live (as of this posting). Google alerts, CVE alerts, etc anyone? Anyone?

# ? May 4, 2012 17:11

Zombywuf: Mar 29, 2008

MononcQc posted:

Ah yes. Government and big corps, of course, will use outdated CGI stuff. I have yet again been too optimistic in assuming that the more efficient modules from around a decade ago would have been adopted by now

CGI is a perfectly good way of running code.

# ? May 4, 2012 18:41

Golbez: Oct 9, 2002; 1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Both the Sony and Army pages have been taken down.

# ? May 4, 2012 19:22

MononcQc: May 29, 2007

Zombywuf posted:

CGI is a perfectly good way of running code.

There's been better stuff than that for a long while. Even mod_php, libapache2-mod-fcgid, or php-fpm with nginx, for example.

Then again, not all servers fully support that part of the CGI spec and not all non-apache servers are vulnerable either.

# ? May 4, 2012 19:25

Zombywuf: Mar 29, 2008

MononcQc posted:

There's been better stuff than that for a long while. Even mod_php, libapache2-mod-fcgid, or php-fpm with nginx, for example.

Then again, not all servers fully support that part of the CGI spec and not all non-apache servers are vulnerable either.

Define better. I don't like resource leaks.

# ? May 4, 2012 20:00

Zombywuf: Mar 29, 2008

Golbez posted:

Both the Sony and Army pages have been taken down.

"Taken down" you say?

# ? May 4, 2012 20:01

Golbez: Oct 9, 2002; 1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Zombywuf posted:

"Taken down" you say?

By whom, I can't say. :tinfoil:

# ? May 4, 2012 20:11

MononcQc: May 29, 2007

Zombywuf posted:

Define better. I don't like resource leaks.

Generally faster? I never had real resource leaks there, but then again, I left PHP as soon as I could.

# ? May 4, 2012 20:21

ultramiraculous: Nov 12, 2003; "No..."; Grimey Drawer

So I've been skimming through this thread and really enjoying it. The hard admission I have is that I work in a shared/coworking space. We're making a mobile app, using fancy things like Objective-C and Scala and MongoDB. The gallery for our space is also co-hosting a PBR-sponsored event and a rep just wheeled in 20 cases of beer.

Am I....a bad person?

# ? May 4, 2012 22:19

PDP-1: Oct 12, 2004; It's a beautiful day in the neighborhood.

We got a new digital camera system today that supposedly can stream video out at 640x480 resolution. My attempt at connecting to the device worked, but spiked the CPU to 100% while generating a 3FPS framerate.

I figured I must be doing something stupid like allocating new bitmaps for every frame, so I hooked up a profiler. Nope, my program was nearly garbageless and was just spending 97% of its time stuck in awesomeVendorDLL.GetFrameData().

After some Googling around I found the vendor's reference implementation. It lets you set the camera's gain and shutter speed properties, read back how many frames it's collected since being turned on, etc. They didn't bother to include any code that actually reads image data back from the camera, though. Apparently using a camera to take pictures is such an obscure function that it's not worth documenting.

Finally, I contacted the sales rep and asked if he could put me in touch with someone on their software team. That was a no-go because "the software people don't like to be bothered with support questions".

It's late on Friday afternoon. I'm gonna take the weekend to cool down a bit and start the RMA process on Monday.

# ? May 4, 2012 22:42

New Yorp New Yorp: Jul 18, 2003; Only in Kenya.; Pillbug

PDP-1 posted:

Finally, I contacted the sales rep and asked if he could put me in touch with someone on their software team. That was a no-go because "the software people don't like to be bothered with support questions".

Haha, that means "the software people are outsourced and don't speak English, so there's no point in trying to communicate directly with them"

# ? May 4, 2012 22:54

Scaramouche: Mar 26, 2001; SPACE FACE! SPACE FACE!

Zombywuf posted:

CGI is a perfectly good way of running code.

Are you talking about CGI 'classic'? Out of process? Single threaded?

# ? May 5, 2012 01:39

Simulated: Sep 28, 2001; Lowtax giveth, and Lowtax taketh away.; College Slice

Zombywuf posted:

CGI is a perfectly good way of running code.

True, just not your code :razz:

# ? May 5, 2012 03:29

Zombywuf: Mar 29, 2008

Scaramouche posted:

Are you talking about CGI 'classic'? Out of process? Single threaded?

As many threads as you want, it's your process.

# ? May 5, 2012 16:46

shrughes: Oct 11, 2008; (call/cc call/cc)

Official Fix for PHP Flaw Easily Bypassed, Researchers Say

# ? May 5, 2012 17:03
