RobotRob
Aug 7, 2007

Let's get weird, but not end of BSG weird.

Zorro KingOfEngland posted:



is there even a feasible way of looking at a file that big?

I use a program called OoberViewer for looking through large email logs at work. It doesn't open the whole file at once so it is pretty quick to view a log. Not too sure where to download it at as it was already on this machine, but I could probably get it to you if you would like to try it.
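I don't know how OoberViewer does it internally, but the general trick of not opening the whole file can be sketched in a few lines: seek near the end and read only a small window. This is a rough illustration, not that program's method; `read_tail` and its `max_bytes` parameter are made-up names.

```python
import os


def read_tail(path, max_bytes=64 * 1024):
    """Return roughly the last max_bytes of a file as text.

    Only the final window is read, so this stays fast even on a
    file that is far too large to load into memory.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        start = max(0, size - max_bytes)
        f.seek(start)
        data = f.read()
    if start > 0:
        # We probably landed mid-line, so drop everything up to the
        # first newline. If the window holds no newline at all, this
        # leaves an empty result.
        _, _, data = data.partition(b"\n")
    return data.decode("utf-8", errors="replace")
```

Something like `print(read_tail("mail.log"))` would then show the end of the log; paging backwards is the same idea with a different seek offset.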

taqueso
Mar 8, 2004


:911:
:wookie: :thermidor: :wookie:
:dehumanize:

:pirate::hf::tinfoil:

Zorro KingOfEngland posted:


is there even a feasible way of looking at a file that big?

Maybe http://www.swiftgear.com/ltfviewer/features.html will work? I've never attempted to look at a file that big in Windows.

Zorro KingOfEngland
May 7, 2008

nielsm posted:

Should have set up some log rotation.

That would have required the person who set it up to have some forethought. This is what happens when you offshore and let the offshore team do all of the environment configuration with absolutely no oversight. I had no idea we weren't rotating until I needed to look in that log for something.
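For what it's worth, rotation does not take much forethought. A minimal sketch using Python's standard `logging.handlers.RotatingFileHandler`, assuming the application does its own logging (the logger name, size limit, and backup count here are made-up values, not anything from our environment):

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rotating_logger(name, path, max_bytes=10 * 1024 * 1024, backups=5):
    """Build a logger whose file is rolled over before it exceeds max_bytes.

    Old files are kept as path.1, path.2, ... up to `backups`;
    anything older is deleted, so total disk use stays bounded.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep these records out of the root logger
    return logger
```

With the defaults above the log can never grow past roughly 60 MB in total, which is a long way from 163 GB.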

Janin posted:

Am I reading that right? Does that say 163 GB?

That is correct. I couldn't believe it when I saw it either. Had to double check with google.

Zorro KingOfEngland fucked around with this message at 17:33 on May 3, 2012

mister_gosh
May 24, 2002

I am in the midst of developing a large search application (or rather the glue between my GUI, Solr and/or Lucene, a database, etc.) and now part of my solution may end up here one day.

One part of the process is a request gets sent to a data miner which extracts various field types from the files. Each request may have anywhere between 1 to 30,000 objects to mine at a time. The caller then would receive back an array of class objects (each object represents a file).

Each file has content that gets interspersed to at minimum 15 different fields (database id of file, file location, text, user defined metadata, references, etc.).

There is the potential to be as many as 30 different fields.

I started drafting the class object which would define these fields, but realized that there would be 30 getters and setters and that's when I thought of this thread. I am doing something wrong.

The proof of concept I designed mined the content into .property files (e.g. id=1234 location=C:/abc/123.txt user=john_doe text=call me ishmael...), but writing to the property file during the mining and reading the property file back in later on seems inefficient...but now I'm wondering if passing up to 30,000 objects back to the caller with 30 fields set is even worse.

The average byte size of each object would likely be not much larger than the content of my post. Hopefully this makes some sort of sense. Just in case anyone is curious, this is Java/Groovy.

mister_gosh fucked around with this message at 20:11 on May 3, 2012

trex eaterofcadrs
Jun 17, 2005
My lack of understanding is only exceeded by my lack of concern.
Any reason you're not using solr or elasticsearch? They do all that kind of crap for you.

ShadoX
Oct 4, 2004
There is no W!

mister_gosh posted:

I am in the midst of developing a large search application solution and now part of my solution may end up here one day.

One part of the process is a request gets sent to a data miner which extracts various field types from the files. Each request may have anywhere between 1 to 30,000 objects to mine at a time. The caller then would receive back an array of class objects (each object represents a file).

Each file has content that gets interspersed to at minimum 15 different fields (database id of file, file location, text, user defined metadata, references, etc.).

There is the potential to be as many as 30 different fields.

I started drafting the class object which would define these fields, but realized that there would be 30 getters and setters and that's when I thought of this thread. I am doing something wrong.

The proof of concept I designed mined the content into .property files (e.g. id=1234 location=C:/abc/123.txt user=john_doe text=call me ishmael...), but writing to the property file during the mining and reading the property file back in later on seems inefficient...but now I'm wondering if passing up to 30,000 objects back to the caller with 30 fields set is even worse.

The average byte size of each object would likely be not much larger than the content of my post. Hopefully this makes some sort of sense. Just in case anyone is curious, this is Java/Groovy.

I'm not entirely certain what your question is, but you should be able to store your fields and their values in a dictionary instead of using 30 specifically defined fields on the class. This would allow it to be flexible and easily expandable because you won't have to specify in code the names of all of the different fields.
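A rough sketch of what I mean, written in Python only for brevity (the same shape works with a Java `Map`). The field names come from your example; `REQUIRED_FIELDS` and `make_record` are hypothetical names, and the required set is a guess at which fields every file must have:

```python
# Fields every mined file is assumed to carry; anything else is optional.
REQUIRED_FIELDS = {"id", "location", "text"}


def make_record(**fields):
    """Build one mined-file record as a plain dictionary.

    Checking the required keys up front keeps some of the control
    that named getters and setters would otherwise give you.
    """
    missing = REQUIRED_FIELDS - set(fields)
    if missing:
        raise ValueError("missing required fields: %s" % sorted(missing))
    return dict(fields)


record = make_record(
    id=1234,
    location="C:/abc/123.txt",
    text="call me ishmael...",
    user="john_doe",  # optional field, no code change needed to add it
)
```

Adding a 31st field is then a data change rather than a new getter/setter pair.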

mister_gosh
May 24, 2002

Thanks for the responses!

I'm not sure if a dictionary would be much different, would it? It'd be about the same and less controlled. For instance, if I want to set the id in a dictionary:

object.set("id",id)
id = object.get("id")

Whereas if I'm doing an object with named fields:

object.setID(id)
id = object.getID()

There is the benefit that I don't have 30 getters and setters, but size/traffic-wise, it'd probably be about the same. I'd love to be proven wrong/shown the light though as I hate my present idea.

[EDIT: The more I think about it, the more it probably makes sense to use, like you said, a dictionary rather than a proprietary object I define. It would allow expansion more easily and require less overhead. Thanks!]

Regarding the question about Solr: this is actually partially for a Solr application. I am using Tika and other parsers available in Solr to extract some information from types like Word and graphics, but the bulk of my data is in a proprietary XML format. Adding to that is the fact that 50% of the fields to feed into Solr for a given file come from content outside of the file itself (such as its Subversion location, revision, user, and the MySQL database ID for that object).

The mining part of this application also needs to be independent of Solr as I am building another solution which uses solely Lucene. I need to mine the content with the same results but process it differently. That's why I need an array of values (or a bunch of property files to grab the data from) prior to opening the connection to Solr, Lucene or whatever to start the actual indexing process.

mister_gosh fucked around with this message at 20:25 on May 3, 2012

Impotence
Nov 8, 2010
Lipstick Apathy

trex eaterofcadrs posted:

Any reason you're not using solr or elasticsearch? They do all that kind of crap for you.

Out of curiosity, what are your thoughts on indextank?

Huragok
Sep 14, 2011
I'm building an iOS app using Objective-J, PhoneGap and an iOS theming kit.

I'm a horror.

abraham linksys
Sep 6, 2010

:darksouls:

Huragok posted:

I'm building an iOS app using Objective-J

:catstare:

The rest of it is understandable, but man. I had no idea anything used Objective-J, outside of the handful of Cappuccino apps out there.

Huragok
Sep 14, 2011
Truth be told, the runtime Objective-J compiler doesn't like being paired with anything but Cappuccino. It's more of a "hey, can this be done?" experiment than anything going into production. But it can be done! :downs:

Sagacity
May 2, 2003
Hopefully my epitaph will be funnier than my custom title.
Oh PHP, you so crazy!

Apparently when running PHP through CGI it forgets to strip the querystring properly, so you can send command-line parameters to the PHP-CGI binary.

Of course, the PHP developers were quick to release a fix which...well, you know. They tried, though, God bless 'em.
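For anyone wondering why a query string can turn into command-line arguments at all: as far as I can tell this comes from the CGI spec itself (RFC 3875, section 4.4), which says a query string with no unencoded "=" should be split on "+" and handed to the script as arguments. A small sketch of that rule only, not of PHP's or any web server's actual code; the function name is made up:

```python
from urllib.parse import unquote


def cgi_command_line_args(query_string):
    """Arguments a spec-following server would pass to a CGI script.

    Per the rule described in RFC 3875 section 4.4: a query string
    containing an unencoded '=' is form data and yields no arguments;
    otherwise it is split on '+' and each word is URL-decoded.
    """
    if not query_string or "=" in query_string:
        return []
    return [unquote(word) for word in query_string.split("+")]


print(cgi_command_line_args("a=1&b=2"))  # ordinary form data: []
print(cgi_command_line_args("-s"))       # a bare switch: ['-s']
```

So a request ending in `?-s` reaches the binary looking just like a switch typed on the command line, unless the binary ignores its arguments.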

Bonfire Lit
Jul 9, 2008

If you're one of the sinners who caused this please unfriend me now.

Sagacity posted:

Apparently when running PHP through CGI it forgets to strip the querystring properly, so you can send command-line parameters to the PHP-CGI binary.
The worst thing is that they used to ignore parameters, but a couple of years ago, they removed the code because nobody remembered why they had it. This is why you write comments, gentlemen.

Huragok
Sep 14, 2011

Isilkor posted:

This is why you write comments, gentlemen.

Code without comments compiles faster :shepface:

Zombywuf
Mar 29, 2008

Isilkor posted:

The worst thing is that they used to ignore parameters, but a couple of years ago, they removed the code because nobody remembered why they had it. This is why you write comments, gentlemen.

And good commit messages :-)

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

Huragok posted:

Objective-J

Huh. When I looked at it a while ago, it was just a very hacky parser.

Isilkor posted:

The worst thing is that they used to ignore parameters, but a couple of years ago, they removed the code because nobody remembered why they had it. This is why you write comments, gentlemen.

This is why you have commit messages, people.

MononcQc
May 29, 2007

Sagacity posted:

Oh PHP, you so crazy!

Apparently when running PHP through CGI it forgets to strip the querystring properly, so you can send command-line parameters to the PHP-CGI binary.

Of course, the PHP developers were quick to release a fix which...well, you know. They tried, though, God bless 'em.

Thankfully nearly nobody runs PHP in that mode anymore. That bug is unlikely to show up at all (compared to other PHP vulnerabilities).

Huragok
Sep 14, 2011

Suspicious Dish posted:

Huh. When I looked at it a while ago, it was just a very hacky parser.

Yeah the first version is pretty drat slow. Objective-J 2.0 is in the works according to this blog post. I have to admit for some reason Obj-C rubs me the wrong way but Obj-J doesn't.

ToxicFrog
Apr 26, 2008


MononcQc posted:

Thankfully nearly nobody runs PHP in that mode anymore. That bug is unlikely to show up at all (compared to other PHP vulnerabilities).

You know who does run PHP in that mode?

Sony.

MrMoo
Sep 14, 2000

Sony is obviously in awe of PHP function naming standards: redirectToProperPageUrl. Stay classy,
code:
case 'admin': ...
$isAdmin = true;
$runXssClean = false;

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





Janin posted:

Speaking as someone who's written three JSON parsers, no, the JSON encoding of astral characters has no virtue. It exists that way only to more closely resemble Javascript, which is not a goal any sane person should strive for.

Python's handling is much better. It has fixed-width for both BMP and astral, by using \u for BMP and \U for astral.

\uFF5E -> always 6 characters
\U0001D11E -> always 10 characters

javascript dates to a time before the extended plane existed. it only has escapes for the extended plane because netscape ran on some obscure platforms where transport streams were required to be 8 (or even 7!) bit clean. you should be thankful those escapes even exist

as for your complaints about json allowing forms that aren't valid javascript, you're wrong. apart from u+2028 and u+2029 (which aren't even invalid javascript; they just cause problems when json is treated as executable code and not data) all json is also javascript

encoding detection isn't even in the scope of the discussion. if you can't determine the encoding of your javascript you can't execute it anyways. crockford's detection algorithm is just a heuristic that happens to work because of the structure of json and utf-x
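for reference, a small sketch of the two escape styles being argued about, using python 3's standard library. the two code points are the ones from the quoted post; the variable names are just for illustration, and the `unicode_escape` codec is used here only to get the python-style escape back out as text:

```python
import json

bmp = "\uFF5E"         # a BMP code point
astral = "\U0001D11E"  # an astral (supplementary plane) code point

# Python-style escapes: \u + 4 hex digits for this BMP character,
# \U + 8 hex digits for the astral one.
bmp_python = bmp.encode("unicode_escape").decode("ascii")
astral_python = astral.encode("unicode_escape").decode("ascii")
print(bmp_python, len(bmp_python))
print(astral_python, len(astral_python))

# JSON (like JavaScript) only has \uXXXX, so the astral character is
# written as a UTF-16 surrogate pair: two escapes for one code point.
bmp_json = json.dumps(bmp)
astral_json = json.dumps(astral)
print(bmp_json)
print(astral_json)

# Decoding recombines the pair into the single original code point.
round_trip = json.loads(astral_json)
print(round_trip == astral)
```

either way the data survives the round trip; the difference is only whether a consumer has to know about surrogate pairs to read the escaped form.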

(USER WAS PUT ON PROBATION FOR THIS POST)

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

Huragok posted:

Truth be told, the runtime Objective-J compiler doesn't like being paired with anything but Cappuccino. It's more of a "hey, can this be done?" experiment than anything going into production. But it can be done! :downs:

I looked at Objective-J some years ago, but was put off by the documentation being "Uh ... look at the Objective-C docs. It's mostly the same."

tef
May 30, 2004

-> some l-system crap ->

ToxicFrog posted:

You know who does run PHP in that mode?

Sony.

http://mobile.twitter.com/bl4sty/status/198393049502593025

quote:

-dauto_prepend_file=php://input and some POST body is enough for RCE.

:ohdear:

tef
May 30, 2004

-> some l-system crap ->

MononcQc posted:

Thankfully nearly nobody runs PHP in that mode anymore. That bug is unlikely to show up at all (compared to other PHP vulnerabilities).

https://help.us.army.mil/cgi-bin/akohd.cfg/php/enduser/home.php?-s

thanks reddit!

MononcQc
May 29, 2007

Ah yes. Government and big corps, of course, will use outdated CGI stuff. I have yet again been too optimistic in assuming that more efficient modules from around a decade ago would have been adopted by now :suicide:

Simulated
Sep 28, 2001
Lowtax giveth, and Lowtax taketh away.
College Slice
The real horror is that those things are still live (as of this posting). Google alerts, CVE alerts, etc anyone? Anyone?

Zombywuf
Mar 29, 2008

MononcQc posted:

Ah yes. Government and big corps, of course, will use outdated CGI stuff. I have yet again been too optimistic in assuming that more efficient modules from around a decade ago would have been adopted by now :suicide:

CGI is a perfectly good way of running code.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
Both the Sony and Army pages have been taken down.

MononcQc
May 29, 2007

Zombywuf posted:

CGI is a perfectly good way of running code.
There's been better stuff than that for a long while. Even mod_php, libapache2-mod-fcgid, or php-fpm with nginx, for example.

Then again, not all servers fully support that part of the CGI spec and not all non-apache servers are vulnerable either.

Zombywuf
Mar 29, 2008

MononcQc posted:

There's been better stuff than that for a long while. Even mod_php, libapache2-mod-fcgid, or php-fpm with nginx, for example.

Then again, not all servers fully support that part of the CGI spec and not all non-apache servers are vulnerable either.

Define better. I don't like resource leaks.

Zombywuf
Mar 29, 2008

Golbez posted:

Both the Sony and Army pages have been taken down.

"Taken down" you say?

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Zombywuf posted:

"Taken down" you say?

By whom, I can't say. :tinfoil:

MononcQc
May 29, 2007

Zombywuf posted:

Define better. I don't like resource leaks.

Generally faster? I never had real resource leaks there, but then again, I left PHP as soon as I could.

ultramiraculous
Nov 12, 2003

"No..."
Grimey Drawer
So I've been skimming through this thread and really enjoying it. The hard admission I have is that I work in a shared/coworking space. We're making a mobile app, using fancy things like Objective-C and Scala and MongoDB. The gallery for our space is also co-hosting a PBR-sponsored event and a rep just wheeled in 20 cases of beer.

Am I....a bad person?

PDP-1
Oct 12, 2004

It's a beautiful day in the neighborhood.
We got a new digital camera system today that supposedly can stream video out at 640x480 resolution. My attempt at connecting to the device worked, but spiked the CPU to 100% while generating a 3FPS framerate.

I figured I must be doing something stupid like allocating new bitmaps for every frame, so I hooked up a profiler. Nope, my program was nearly garbageless and was just spending 97% of its time stuck in awesomeVendorDLL.GetFrameData().

After some Googling around I found the vendor's reference implementation. It lets you set the camera's gain and shutter speed properties, read back how many frames it's collected since being turned on, etc. They didn't bother to include any code that actually read image data back from the camera though, apparently using a camera to take pictures is such an obscure function that it's not worth documenting.

Finally, I contacted the sales rep and asked if he could put me in touch with someone on their software team. That was a no-go because "the software people don't like to be bothered with support questions".

It's late on Friday afternoon. I'm gonna take the weekend to cool down a bit and start the RMA process on Monday.

New Yorp New Yorp
Jul 18, 2003

Only in Kenya.
Pillbug

PDP-1 posted:

Finally, I contacted the sales rep and asked if he could put me in touch with someone on their software team. That was a no-go because "the software people don't like to be bothered with support questions".

Haha, that means "the software people are outsourced and don't speak English, so there's no point in trying to communicate directly with them"

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Zombywuf posted:

CGI is a perfectly good way of running code.

Are you talking about CGI 'classic'? Out of process? Single threaded?

Simulated
Sep 28, 2001
Lowtax giveth, and Lowtax taketh away.
College Slice

Zombywuf posted:

CGI is a perfectly good way of running code.

True, just not your code :razz:

Zombywuf
Mar 29, 2008

Scaramouche posted:

Are you talking about CGI 'classic'? Out of process? Single threaded?

As many threads as you want, it's your process.

shrughes
Oct 11, 2008

(call/cc call/cc)
Official Fix for PHP Flaw Easily Bypassed, Researchers Say
