Smugdog Millionaire
Sep 14, 2002

8) Blame Icefrog

ScaryFast posted:

it just sucks that 1 moron can flip his wig at me over a little stupid thing and then someone else bans me as a result.

How old was the guy who banned you? Was he real old?


shodanjr_gr
Nov 20, 2007

Scaevolus posted:

CLRS is a good choice.

Of course, you can't discuss data structures without covering algorithms as well.

Thanks for the suggestion! I'll probably get this from Amazon today!

mistermojo
Jul 3, 2004

covener posted:

For the original issue, create_images 2>&1 | tee -a log.txt is a somewhat common idiom. This sends stderr to the same file descriptor as stdout (which tee is already reading).

For your followup, create_images is likely waiting for data on stdin -- are you supposed to invoke it with any positional arguments, or feed it data via stdin?

a guess about your symptom and how your app wants input:
code:
create_images < input.txt 2>&1 | tee -a log.txt

haha never mind, I guess the professor just wanted us to > log.txt or | tee -a log.txt for each echo command. I thought there was supposed to be one command that would do it automatically

ScaryFast
Apr 16, 2003

It mentions IRC in your first post so I figured if ever there was a channel for posting things that don't deserve their own thread, this is it. Apparently you people pay for your Internets by the Kilobyte.

free bees posted:

How old was the guy who banned you? Was he real old?

I was thinking young. Real young. But I'm not going to bitch and moan about it here because I'd absolutely hate to see hybridfusion have to decide between water and Internet again this month because of my careless posting.

ScaryFast fucked around with this message at 22:31 on Feb 4, 2009

csammis
Aug 26, 2003

Mental Institution

ScaryFast posted:

It mentions IRC in your first post so I figured if ever there was a channel for posting things that don't deserve their own thread, this is it. Apparently you people pay for your Internets by the Kilobyte.

All joking aside, #cobol isn't SA and isn't run by SA; it's just a resource that happens to be run by goons. IRC drama doesn't belong on the forums.

Biscuit Hider
Apr 12, 2005

Biscuit Hider
Hey man, I pulled some strings.

code:
* Hybridfusion sets mode: -b *!*Grr@*.tbaytel.net
* Hybridfusion sets mode: -b *!*scary@*.tbaytel.net

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

ScaryFast posted:

I was thinking young. Real young. But I'm not going to bitch and moan about it here because I'd absolutely hate to see hybridfusion have to decide between water and Internet again this month because of my careless posting.

http://www.youtube.com/watch?v=StCoClvLb4Y

Ciaphas
Nov 20, 2005

> BEWARE, COWARD :ovr:


This might be a nonsense question but I'm really grasping at straws with a work issue here. Is switching the endianness of a floating point value any different from switching the endianness of an integer, or are there any gotchas to look out for?

(edit) I'm working in C++ as usual but it's not really a language specific issue--at least I don't think it is. Also, for reasons that are stupid and not worth detailing, I'm trying to go from little to big and then, later, back to little, if any of that matters.

Ciaphas fucked around with this message at 16:27 on Feb 5, 2009

TSDK
Nov 24, 2003

I got a wooden uploading this one
No different at all, but if you're compiling on a gcc-based toolchain then you will have to pay attention to strict aliasing.
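A minimal sketch of the point above, assuming 32-bit IEEE-754 floats: the byte swap itself is exactly the integer one, and copying through `std::memcpy` instead of casting a `float*` to an integer pointer keeps it clear of strict-aliasing trouble. One hedged caveat: the swapped bit pattern is not necessarily a sensible float (it can land on a NaN, which some FPUs may alter on load), so this sketch keeps the swapped value in an integer and only turns it back into a `float` after swapping back. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit floats");

// Reverse the byte order of a 32-bit integer.
std::uint32_t bswap32(std::uint32_t v) {
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}

// Copy the float's bytes into an integer (no pointer punning), then swap.
// The result stays an integer because the swapped pattern may not be a valid float.
std::uint32_t float_to_swapped_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bswap32(bits);
}

// Undo the swap and copy the bytes back into a float.
float float_from_swapped_bits(std::uint32_t swapped) {
    std::uint32_t bits = bswap32(swapped);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, `1.0f` is `0x3F800000`, so `float_to_swapped_bits(1.0f)` yields `0x0000803F`, and swapping back restores `1.0f` exactly.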

Ciaphas
Nov 20, 2005

> BEWARE, COWARD :ovr:


Bugger. And we don't use that technique in our code (or even use gcc, for that matter), so that's not a factor either.

I'm never going to find this stupid loving bug :cry:

(edit) Heh, turns out it wasn't even an endian OR precision issue, it was a "one side is in yards and the other is in meters, you useless twat" issue. Oh well, good to know that floating point byteswaps are no different from integral, at least :saddowns:

Ciaphas fucked around with this message at 18:34 on Feb 5, 2009

ndb
Aug 25, 2005

So, does anyone know of any resources about programming a shell, as in writing a CLI, as opposed to shell programming?

I have an idea for a research project for my Systems class, but I need some sort of resource on my proposal.

So far, I've found the source code of the BSH, and that's it, as well as maybe one article in the ACM's Digital Library which I don't have access to.

6174
Dec 4, 2004
I've got a problem and I'm not sure of the best way to proceed.

Basically I've got a flat file database that I need to ensure is in the correct format. It is a high-resolution molecular absorption database, so it basically has information saying that at a specific wavenumber, a particular molecule has a set of properties. This database gets used to provide a priori estimates and so forth when running simulations and the like.

Each line is a fixed width, and basically in a fixed Fortran format. They all follow the same general format, i.e. the first three characters specify the molecule and isotope, the next ten specify the wavenumber, etc. There are also variable fields whose format depends on the molecule/isotope of the line. There are also lots of exceptions.

I have a working C++ prototype, but it is going to become an unmaintainable mess if I continue with it in its current direction. Right now it is progressing line by line. It breaks the line up by substrings, and by what amount to large switch statements it determines the format that each of the substrings should be. Each substring/format of substring is passed off to a subroutine that validates the substring based on some work I did going backwards from some DFAs based on Fortran FORMAT statements.

What I would like to do is get the format out of the program itself and into a separate text file, particularly since there are a couple of formats this database could be in. That way I can add exceptions and new formats more easily. What should I be looking at to do this? Would it actually make things maintainable?

Speed is also a bit of a concern. My prototype goes through 1.73 million lines in about 6.75 minutes, and in different situations could be asked to deal with up to 2 million lines. However, maintainability is much more of a concern than speed since these validations aren't too frequent. I'm also willing to move to a different language if it would make the process significantly simpler/easier.

Edit: The level of errors that need to be reported are along the lines of saying "X field of Y line failed".

6174 fucked around with this message at 19:21 on Feb 5, 2009
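For readers unfamiliar with the FORMAT-based fields described above, here is a minimal sketch of validating one fixed-width substring against the Fortran `Iw` and `Fw.d` edit descriptors. It is deliberately stricter than a real Fortran reader (which has `BN`/`BZ` blank rules and accepts exponents on `F` input): it assumes values are right-justified with leading blanks, an optional sign, digits, and, for `Fw.d`, exactly `d` digits after the decimal point, as Fortran output writes them. Those rules and the function names are assumptions for illustration, not the database's documented format.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Strip leading blanks; Fortran output right-justifies numbers in their field.
static std::string_view skip_leading_blanks(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size() && s[i] == ' ') ++i;
    return s.substr(i);
}

// Length of the run of decimal digits starting at position i.
static std::size_t count_digits(std::string_view s, std::size_t i) {
    std::size_t n = 0;
    while (i + n < s.size() && std::isdigit(static_cast<unsigned char>(s[i + n]))) ++n;
    return n;
}

// Iw: a field of width w holding an optionally signed integer.
bool valid_Iw(std::string_view field, std::size_t w) {
    if (field.size() != w) return false;
    std::string_view s = skip_leading_blanks(field);
    std::size_t i = 0;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;
    std::size_t n = count_digits(s, i);
    return n > 0 && i + n == s.size();
}

// Fw.d: width w, optional sign, optional integer digits, '.', exactly d digits.
bool valid_Fwd(std::string_view field, std::size_t w, std::size_t d) {
    if (field.size() != w) return false;
    std::string_view s = skip_leading_blanks(field);
    std::size_t i = 0;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;
    std::size_t int_digits = count_digits(s, i);
    i += int_digits;
    if (i >= s.size() || s[i] != '.') return false;
    ++i;
    std::size_t frac = count_digits(s, i);
    return frac == d && int_digits + frac > 0 && i + frac == s.size();
}
```

For instance, `valid_Fwd("   12.345600", 12, 6)` accepts the field, while a 12-character field with only five decimals or no decimal point is rejected, which maps directly onto an "X field of Y line failed" report.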

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

6174 posted:

Each line is a fixed width, and basically in a fixed Fortran format. They all follow the same general format, i.e. the first three characters specify the molecule and isotope, the next ten specify the wavenumber, etc. There are also variable fields whose format depends on the molecule/isotope of the line. There are also lots of exceptions.

Your external specification idea sounds like it will inevitably become a domain-specific parsing language. You really, really don't want to write one of those; you could use an existing language, but those are typically for context-free grammars, and your file format very well might not be context-free. However, unless these subfields are very complex, I think you could just write a very simple stream parser, which really can be quite efficient and maintainable if you design it properly. In an object-oriented language, you would make classes for all the different molecule/isotope types; once you've read the molecule ID, you delegate to the appropriate class to read the rest of the line. If molecules tend to cluster in groups with similarly-formatted lines, you can use inheritance to take advantage of that, etc. The key thing is to not try to parse the entire line at once, but to allow earlier parts of the line to determine how later parts will be parsed; that's almost certainly how the original Fortran parsers worked, except that they couldn't exploit object-orientation to cut down on the redundancy and switches.
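A minimal C++ sketch of that design: read the leading molecule/isotope code first, then hand the rest of the line to an object chosen by that code, reporting failures as "field X of line Y failed". The column layout (a 3-character ID followed by a 10-character wavenumber), the IDs, and all class and field names here are hypothetical placeholders, not the real database format.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Parsed contents of one row (hypothetical fields).
struct Row {
    std::string id;
    std::string wavenumber;
    std::vector<std::string> extra;
};

// One parser per line format; similar formats can share code through inheritance.
class LineFormat {
public:
    virtual ~LineFormat() = default;
    // Parse everything after the ID; on failure set bad_field and return false.
    virtual bool parse_rest(const std::string& line, Row& row, std::string& bad_field) const {
        row.wavenumber = line.substr(3, 10);   // hypothetical columns 4-13
        if (row.wavenumber.find_first_not_of(" 0123456789.") != std::string::npos) {
            bad_field = "wavenumber";
            return false;
        }
        return true;
    }
};

// A format that adds one 15-character variable field (hypothetical).
class FormatWithQuanta : public LineFormat {
public:
    bool parse_rest(const std::string& line, Row& row, std::string& bad_field) const override {
        if (!LineFormat::parse_rest(line, row, bad_field)) return false;
        if (line.size() < 28) { bad_field = "quanta"; return false; }
        row.extra.push_back(line.substr(13, 15));
        return true;
    }
};

class RowParser {
    std::map<std::string, std::unique_ptr<LineFormat>> formats_;
public:
    void add(const std::string& id, std::unique_ptr<LineFormat> f) { formats_[id] = std::move(f); }

    // The ID read first decides how the later columns are parsed.
    std::optional<Row> parse(const std::string& line, long lineno,
                             std::vector<std::string>& errors) const {
        if (line.size() < 13) {
            errors.push_back("line " + std::to_string(lineno) + " is too short");
            return std::nullopt;
        }
        Row row;
        row.id = line.substr(0, 3);
        auto it = formats_.find(row.id);
        if (it == formats_.end()) {
            errors.push_back("field id of line " + std::to_string(lineno) + " failed");
            return std::nullopt;
        }
        std::string bad;
        if (!it->second->parse_rest(line, row, bad)) {
            errors.push_back("field " + bad + " of line " + std::to_string(lineno) + " failed");
            return std::nullopt;
        }
        return row;
    }
};
```

Adding a new format or exception then means adding one small class and one `add` call, rather than another branch in a large switch.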

6174
Dec 4, 2004

rjmccall posted:

Your external specification idea sounds like it will inevitably become a domain-specific parsing language. You really, really don't want to write one of those;

That is what I expected, and you're right, I don't want to write one which is why I made the post in the first place. And no, unfortunately the format isn't context-free (or at least if it is, I don't see how).

rjmccall posted:

However, unless these subfields are very complex, I think you could just write a very simple stream parser, which really can be quite efficient and maintainable if you design it properly.

I think this will work better than what I'm doing right now. Eventually I need to add checks along the lines of: if you see a line with a certain set of properties, there must be a mate to it with a similar set of properties. But those checks are much more specific and only need to be run on a subset of, say, 100k of the lines instead of the whole thing. They should probably be separate programs or at least a second pass through the file after the format has been validated.

POKEMAN SAM
Jul 8, 2004

Clock Explosion posted:

So, does anyone know of any resources about programming a shell, as in writing a CLI, as opposed to shell programming?

I have an idea for a research project for my Systems class, but I need some sort of resource on my proposal.

So far, I've found the source code of the BSH, and that's it, as well as maybe one article in the ACM's Digital Library which I don't have access to.

I wrote one for a UNIX programming class, and all I used was the Google and this book: http://www.kohala.com/start/apue.html

It's a great book for programming in a UNIX environment anyways, but it has some text in it about writing a shell.
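The core of such a shell is small; the books cover the hard parts (job control, signals, pipes, redirection). A minimal sketch of the read, split, fork, exec, wait cycle in C++, assuming a POSIX system: it splits on whitespace only (no quoting, pipes, or redirection), and the function names are made up for illustration.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Split a command line on whitespace (no quotes, pipes, or redirection).
std::vector<std::string> tokenize(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> args;
    for (std::string word; in >> word;) args.push_back(word);
    return args;
}

// Fork, exec the program in the child, and wait for it in the parent.
// Returns the child's exit status, or -1 if it did not exit normally.
int run_command(const std::vector<std::string>& args) {
    if (args.empty()) return 0;
    std::vector<char*> argv;
    for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;              // fork failed
    if (pid == 0) {                      // child: replace this process image
        execvp(argv[0], argv.data());
        _exit(127);                      // only reached if exec failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// Read-eval loop: prompt, read a line, run it; "exit" or end of input quits.
void shell_loop(std::istream& in, std::ostream& out) {
    std::string line;
    for (;;) {
        out << "> " << std::flush;
        if (!std::getline(in, line)) break;
        auto args = tokenize(line);
        if (!args.empty() && args[0] == "exit") break;
        run_command(args);
    }
}
```

Calling `shell_loop(std::cin, std::cout)` from `main` gives an interactive prompt; built-ins such as `cd` have to be handled in the parent before forking, since a child's directory change would not affect the shell.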

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

6174 posted:

They should probably be separate programs or at least a second pass through the file after the format has been validated.

I guess you could do it that way, but it might make more sense to just keep track of the rows after you've parsed them — I mean, your first program is basically doing a full parse, and these separate programs are basically doing a full parse plus some inter-row validations, so it's not like you're saving work by doing them in separate passes. Unless you're dead-set on performance — and you said you weren't — I would write this parser as if you were really going to use all the data in the file; i.e. have your main row-parsing procedure actually return an object containing all the information in the row. It's easy enough for a simple intra-row validator to just delete those objects if it doesn't need them; it's much more annoying to come back later and discover that you need to totally redesign your original parser to actually remember all this data it parsed out.

6174
Dec 4, 2004

rjmccall posted:

Unless you're dead-set on performance — and you said you weren't — I would write this parser as if you were really going to use all the data in the file;

My main concern would more be memory usage. Most of the lines are going to need inter-row comparison (sometimes set of rows too). I also wouldn't necessarily be able to discard lines from comparison until the file is completely read. Since we're talking a couple million rows (the version of the database in beta form right now is about 2.09 million), it would seem to me that unless I very carefully manage memory here I could easily consume so much that I'd waste a lot of time swapping to/from disk. Of course I could be mentally overestimating the memory usage.

edit: A little more info on size here if someone is willing to sanity check my memory estimate:

Each line is 160 characters long, containing maybe 20 parameters, of which 4 are variable format. Each of those variable parameters is 15 characters long and could feasibly hold another 15 parameters each (though none are quite that high yet). There are about 30-35 molecules, each with between 1 and 6ish isotopes. The format is generally dictated by molecule, but there are a couple where each isotope has its own format. All told there are about 100-150 distinct line formats, including the various exceptions to the standard rules.

Also the development machine I've got only has 1 GB of memory (with no funding to increase it anytime soon). In plain text format the database is already 323 MB (for the 2.09 million line version).

6174 fucked around with this message at 21:40 on Feb 5, 2009
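A rough sanity check of that estimate, using only the numbers above plus two assumptions: each line occupies 161 bytes on disk (160 characters plus a newline), and every parsed value is stored as an 8-byte double with no object or container overhead (real overhead will push the parsed figures higher). With 20 parameters per line, 4 of them variable fields holding up to 15 values each, the worst case is 16 + 4 × 15 = 76 values per row. For comparison, the last line uses a 200k-row single-molecule subset.

```latex
\begin{aligned}
\text{raw text:}\quad & 2.09\times10^{6}\ \text{lines} \times 161\ \text{B} \approx 3.36\times10^{8}\ \text{B} \approx 321\ \text{MiB} \\
\text{parsed row, worst case:}\quad & (16 + 4\times15)\times 8\ \text{B} = 608\ \text{B} \\
\text{all rows parsed:}\quad & 2.09\times10^{6} \times 608\ \text{B} \approx 1.27\times10^{9}\ \text{B} \approx 1.18\ \text{GiB} \\
\text{200k-row subset:}\quad & 2\times10^{5} \times 608\ \text{B} \approx 1.22\times10^{8}\ \text{B} \approx 116\ \text{MiB}
\end{aligned}
```

The raw-text figure agrees with the quoted 323 MB, so under these assumptions keeping every fully parsed row would not fit in 1 GB, while a single molecule's rows comfortably would.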

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
Well, I'm certainly not suggesting that you keep everything in memory just because. I'm just saying that you should write the parser so that your outermost loop can look like this:

code:
ValidateFile(file):
  stream <- OpenForInput(file)
  while not at end of stream:
    row <- ReadNextRow(stream)
    if row parsed correctly:
      free row
You still use a constant amount of memory this way, but it makes it easy to decide to keep things around, and you don't have to rework your parser to do so.

Regardless, if you need to check that there's a matching row for certain rows, and you don't have any constraints about where the matching row has to be, and you're not sure you can keep all the relevant rows in memory at once, you will have an interesting problem. Probably you will need to use a simple temporary database to persist things for you, or use a thin wrapper on top of your own caching/pinning scheme.

rjmccall fucked around with this message at 23:02 on Feb 5, 2009
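The pseudocode loop above might look like this in C++, as a sketch: `parse_row` is a hypothetical stand-in for the real per-line parser (it only checks the fixed width), and the optional `keep` callback is where you would decide to retain rows later without reworking the parser. Row data is not retained by default, because each `Row` goes out of scope at the end of its iteration unless the callback copies it somewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct Row {
    long lineno;
    std::string text;   // a real parser would hold typed fields here
};

// Hypothetical stand-in for the real parser: only checks the fixed width.
std::optional<Row> parse_row(const std::string& line, long lineno, std::size_t width) {
    if (line.size() != width) return std::nullopt;
    return Row{lineno, line};
}

// Returns the number of rows that failed, recording one message per failure.
// Parsed rows are dropped after each iteration unless `keep` stores a copy.
long validate_stream(std::istream& in, std::size_t width,
                     std::vector<std::string>& errors,
                     const std::function<void(const Row&)>& keep = nullptr) {
    long lineno = 0, failed = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++lineno;
        std::optional<Row> row = parse_row(line, lineno, width);
        if (!row) {
            ++failed;
            errors.push_back("line " + std::to_string(lineno) + " failed");
            continue;
        }
        if (keep) keep(*row);   // opt in to keeping rows without touching the parser
    }
    return failed;
}
```

Passing a lambda such as `[&](const Row& r) { kept.push_back(r); }` as `keep` is all it takes to switch from pure validation to collecting rows for inter-row checks.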

6174
Dec 4, 2004

rjmccall posted:

You still use a constant amount of memory this way, but it makes it easy to decide to keep things around, and you don't have to rework your parser to do so.

Ahh, I understand what you are saying now. I thought you were saying I should only ever parse a line once and apply all the inter-row checks based on that single parsing, which led my thinking to the potential memory problem.

I can restrict the inter-row comparisons a lot by saying that all inter-comparisons will always be dealing with at least the same molecule if not isotope. These subsets are then on the order of like 100k-200k for the larger ones, which is definitely feasible to store in memory. What I can't say is where the elements of these subsets are in the file since it is not sorted by molecule. In fact there isn't a fixed sorting of the file (it is sorted by wavenumber, but two lines can have the same wavenumber and how those are inter-sorted is not specified and varies through the file).

I'm thinking at this point it might be the simplest to have the initial parse validate the format and then write out the rows to temporary files based on molecule/isotope. Then for the inter-comparisons simply read in the temporary files using the same parser and conduct the checks based on the second parsing. That way at no point should I be trying to store too much in memory at once and the flow of the program should still be comprehensible enough that someone with only basic programming skills would be able to use/modify later if need be.
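A minimal sketch of that first pass: stream the file once and append each valid line to a temporary file chosen by its molecule/isotope code, so the second pass can load one group at a time. It assumes the code is the first 3 characters, uses a hypothetical `is_valid` placeholder for the real format check, and keeps one open `std::ofstream` per group (with 30-35 molecules of up to about 6 isotopes each, that is a couple of hundred files at most, though it is worth checking the platform's open-file limit).

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical placeholder for the real format check.
bool is_valid(const std::string& line) { return line.size() >= 3; }

// Turn the 3-character code into a safe file name (spaces become '_').
std::string group_file_name(std::string code) {
    for (char& c : code) if (c == ' ') c = '_';
    return "group_" + code + ".txt";
}

// Single pass: append each valid line to <dir>/group_<code>.txt, overwriting
// any earlier files. Lines keep their original order within each group.
// Returns the number of invalid lines.
long partition_by_code(std::istream& in, const fs::path& dir) {
    fs::create_directories(dir);
    std::map<std::string, std::ofstream> outputs;   // one open stream per group
    long invalid = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (!is_valid(line)) { ++invalid; continue; }
        std::string code = line.substr(0, 3);
        auto it = outputs.find(code);
        if (it == outputs.end())
            it = outputs.emplace(code, std::ofstream(dir / group_file_name(code))).first;
        it->second << line << '\n';
    }
    return invalid;   // streams flush and close when `outputs` is destroyed
}
```

The second pass can then reuse the same row parser on each group file in turn, so only one molecule's rows need to be in memory at once.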

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

6174 posted:

I'm thinking at this point it might be the simplest to have the initial parse validate the format and then write out the rows to temporary files based on molecule/isotope.

I would strongly suggest using a relational database for this work if possible, instead of temporary flat files. Postgres is quite good for being free, and it really does have very efficient algorithms for processing large quantities of data within the limits of available memory.

6174
Dec 4, 2004

rjmccall posted:

I would strongly suggest using a relational database for this work if possible, instead of temporary flat files. Postgres is quite good for being free, and it really does have very efficient algorithms for processing large quantities of data within the limits of available memory.

I understand your sentiment, but unfortunately there is no way that would fly with my boss. This program is step one of putting this flat file database into a relational database (and the data doesn't go well into a relational model anyways). The goal there is to have tools that take the existing flat file database, convert it to a relational database, and also export it to the existing formats, since there are a lot of existing programs written by others that depend on the flat file format. This parser will be used to verify that the output is correct, and to hammer down the format of the flat file, since the documentation about the database is incomplete and inconsistent.

I also have to keep the requirements of the tool relatively simple since it will almost certainly be distributed to people who have a technical upper limit of typing make. Those that can go further only know Fortran and would never trust a tool that used a database as an intermediary step (they would assume it is wrong since they are superstitious when it comes to computers).

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
I just meant using a temporary database instead of temporary files, but if you'll have to redistribute and don't want to tell people to install a dependent package, I can see the problem. In that case, you always have the option of, er, [strike]implementing your own relational database[/strike] implementing an external merge sort of your flat file and then doing validation on the sorted results. But you're right that it's probably less overall work to use temporary files if you can't use a database.
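For reference, a compact sketch of an external merge sort over lines of text: sort fixed-size chunks in memory, write each as a sorted run file, then k-way merge the runs with a min-heap, holding only one line per run in memory. Sorting by the whole line, the chunk size, and the run-file names are assumptions for illustration; for this data you would sort on a key such as the molecule/isotope code plus whatever fields the inter-row checks need.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Phase 1: read up to chunk_lines lines at a time, sort them, write a run file.
std::vector<fs::path> make_sorted_runs(const fs::path& input, const fs::path& dir,
                                       std::size_t chunk_lines) {
    assert(chunk_lines > 0);
    std::ifstream in(input);
    std::vector<fs::path> runs;
    std::vector<std::string> chunk;
    auto flush = [&] {
        if (chunk.empty()) return;
        std::sort(chunk.begin(), chunk.end());
        fs::path run = dir / ("run" + std::to_string(runs.size()) + ".txt");
        std::ofstream out(run);                 // closed when the lambda returns
        for (const auto& l : chunk) out << l << '\n';
        runs.push_back(run);
        chunk.clear();
    };
    for (std::string line; std::getline(in, line);) {
        chunk.push_back(line);
        if (chunk.size() == chunk_lines) flush();
    }
    flush();
    return runs;
}

// Phase 2: k-way merge of the sorted runs into one sorted output file.
void merge_runs(const std::vector<fs::path>& runs, const fs::path& output) {
    std::vector<std::ifstream> ins;
    for (const auto& r : runs) ins.emplace_back(r);
    using Item = std::pair<std::string, std::size_t>;   // (line, run index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;   // min-heap
    std::string line;
    for (std::size_t i = 0; i < ins.size(); ++i)
        if (std::getline(ins[i], line)) heap.emplace(line, i);
    std::ofstream out(output);
    while (!heap.empty()) {
        Item top = heap.top();
        heap.pop();
        out << top.first << '\n';
        if (std::getline(ins[top.second], line)) heap.emplace(line, top.second);
    }
}
```

Once the file is sorted by a suitable key, rows that must be checked against each other sit next to each other, so the validation pass can stream through the sorted output with a small window of rows.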

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh
Why not do both and use SQLite?

6174
Dec 4, 2004

Avenging Dentist posted:

Why not do both and use SQLite?

I'll have to read up on SQLite, but assuming I don't hit a wall on total memory usage when creating the temp files to begin with, that seems like a good option.

Also, thanks for the help rjmccall, you've certainly saved me from banging my head on the wall a few times if I had gone with my original thoughts.

defmacro
Sep 27, 2005
cacio e ping pong

Clock Explosion posted:

So, does anyone know of any resources about programming a shell, as in writing a CLI, as opposed to shell programming?

I have an idea for a research project for my Systems class, but I need some sort of resource on my proposal.

So far, I've found the source code of the BSH, and that's it, as well as maybe one article in the ACM's Digital Library which I don't have access to.

You might want to check out the CS:APP shell lab for this. It's what we used in my systems class and was a pretty fun lab.

narbsy
Jun 2, 2007

GT_Onizuka posted:

You might want to check out the CS:APP shell lab for this. It's what we used in my systems class and was a pretty fun lab.

Seconding this, it is a fun lab, and pretty informative. The book is a really good resource for the lab.

Jo
Jan 24, 2005

:allears:
Soiled Meat

Clock Explosion posted:

So, does anyone know of any resources about programming a shell, as in writing a CLI, as opposed to shell programming?

I have an idea for a research project for my Systems class, but I need some sort of resource on my proposal.

So far, I've found the source code of the BSH, and that's it, as well as maybe one article in the ACM's Digital Library which I don't have access to.

I have the source code for a Solaris shell I wrote a while ago. It's nothing too complex; it supports running commands, line editor mode, up-down line buffers, forking commands, and a few built-in functions. It is as simple as it needs to be, but no simpler.

http://www.omgitswebsite.com/JoSH/josh_source.tar

CAVEAT: This code could be loving horrible. Been a long while since I've looked at it, but it might give you some ideas about Yacc, Lex, and the like.

Please don't yell at me Avenging Dentist. :cry:

BubbaGrace
Jul 14, 2006

A while back someone posted a shell script that used Curl to download files from Filefront. All I had to do was input the URL to the Filefront page and the script handled all the redirecting.

The command was something like

code:
./getfilefront "http://files.filefront.com/Unreal+Tournament+2007+Trailer/;5444378;/fileinfo.html"
Since search is down I cannot find this post. Would anyone mind whipping another one up for me?

baquerd
Jul 2, 2007

by FactsAreUseless

Felony posted:

A while back someone posted a shell script that used Curl to download files from Filefront. All I had to do was input the URL to the Filefront page and the script handled all the redirecting.

The command was something like

code:
./getfilefront "http://files.filefront.com/Unreal+Tournament+2007+Trailer/;5444378;/fileinfo.html"
Since search is down I cannot find this post. Would anyone mind whipping another one up for me?

I'll do it for :10bux:

ndb
Aug 25, 2005

GT_Onizuka posted:

You might want to check out the CS:APP shell lab for this. It's what we used in my systems class and was a pretty fun lab.

Reading that actual lab helped me a lot, and it allowed me to change gears on a project, to hopefully match the scope of this project.

I'm just happy to find a reference. This is why I love CoC.

slovach
Oct 6, 2005
Lennie Fuckin' Briscoe
Is there a particularly good way to learn MMX / SSE?

Alpha blending (which I'm doing) sounds ripe for exploiting this.

At 32 bits per pixel, with 128 bits to work with, couldn't I be crunching math for 4 pixels at once? :eek:

POKEMAN SAM
Jul 8, 2004

slovach posted:

Is there a particularly good way to learn MMX / SSE?

Alpha blending (which I'm doing) sounds ripe for exploiting this.

At 32 bits per pixel, with 128 bits to work with, couldn't I be crunching math for 4 pixels at once? :eek:

I'm interested in this as well. :)

hamshu
May 12, 2007

Ice Vs. Fire
I've run into a problem recently. I've been working in Joomla and I have to code a simple horizontal menu. The thing is, I want to do it in the CSS instead of using a special module or an add-on. I have the menu set for the User1 area, which will be above the header. I made a menu and shoved it there. How do I edit the CSS to make it display horizontally? I know this is probably stupidly easy, but I don't do much coding or work with Joomla; I usually write content, so this is new to me. Thanks for the help.

Grand Poobah
Jun 20, 2005
I have a program I use for work that can pump out a report in HTML format. I hate how the program formats the HTML, and there is no good way to customize it. Fortunately, it will also write the information into XML format, so I figure that's my ticket to creating a new report. However, that's not my forte, and basic approaches like bringing it into Access or Excel aren't working for me.

What I'm looking for is a program that will allow me to easily import in XML files (these files also reference images created in the XML export process), and create/modify reports based off of those files. Ideally, it would run off of some database, because I would like to have the ability to add in my own information to the program, to customize the output. Access would have been perfect, if I could have gotten it to work.

Here's the process:
1. I export XML data for Project A from the program.
2. I open up 'Custom Report Program'.
3. I input Project A custom information (this information is saved so I can use it later for every Project A report).
4. I import/reference the XML data.
5. I can then print out a new report based on a combination of the XML data and the custom project info.

Anyone have any idea of a program that will do that? If not, how easy would it be to hire someone to create something that did? Even if it's an add-on to some other program.

EDIT: Glad to hear it should be easy; I got the right thread. Would a request to hire someone to do this go in this forum, SH/SC, or SA-Mart? I've asked my company guys, and 1) the IT department doesn't give a poo poo, and if I sent the request to them, it would get ignored; they prefer to play heroes to people with MS Office problems; 2) our Crystal Reports guy couldn't figure out a good way to do what I wanted.

Grand Poobah fucked around with this message at 18:52 on Feb 8, 2009

Mithaldu
Sep 25, 2007

Let's cuddle. :3:
How to put this... The question you asked is similar to asking whether there's a program that can import a report written in plain English. Since it sounds like you're at work somewhere, ask your local Perl junkie or sysadmin if they can help you convert an XML file into something useful.

Otherwise you're looking at hiring someone for a day or so, depending on the complexity.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

jschmidt posted:

Anyone have any idea of a program that will do that? If not, how easy would it be to hire someone to create something that did? Even if it's an add-on to some other program.

Building an XSL transform for your XML schema isn't a hard task for someone experienced, although if you go that approach you may need a new transform for every set of "custom information", depending on exactly what database features you're thinking of using.

Edit:

Mithaldu posted:

How to put this... The question you asked is similar to asking whether there's a program that can import a report written in plain English.

XML may be terrible, but it's nowhere near that bad, and there are frameworks like XSL for transforming it into different forms; this was, after all, one of the primary design purposes of XML.

ShoulderDaemon fucked around with this message at 18:55 on Feb 8, 2009

Mithaldu
Sep 25, 2007

Let's cuddle. :3:

ShoulderDaemon posted:

it's nowhere near that bad
Mind you, I have no idea what XSL is, but I disagree with that one part: it is exactly as good or bad as whoever wrote that specific schema, since XML inherently supports unlimited complexity. (And I've seen some of the worse ways to do it.)

julyJones
Feb 12, 2007

Stopped making sense.

slovach posted:

Is there a particularly good way to learn MMX / SSE?

I'm learning this stuff too. :) I'm using inline assembly but there are also "SSE intrinsics" in GCC and MS VS that provide C types and functions for MMX and SSE.

I don't know that there's a definitive how-to, but here's an inline assembly tutorial (gcc) I used to get started:
http://www.delorie.com/djgpp/doc/brennan/brennan_att_inline_djgpp.html
A nice directory of the instruction sets is here:
http://softpixel.com/~cwright/programming/simd/sse2.php
Other than a few tutorials and the Intel docs, documentation seems kind of sparse. :P

You might want to go with SSE2 because it's supported on P4 and later CPUs and allows you to also use MMX (64-bit) instructions on the XMM (128-bit) registers, which I've found really convenient.
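On the 4-pixels-at-once question: that is the usual starting point, although the multiplies need more than 8 bits, so each register of 8-bit channels gets widened into two registers of 16-bit lanes (2 pixels each). A minimal sketch with SSE2 intrinsics from `<emmintrin.h>` (x86 only), assuming one constant alpha `a` in 0..256 for all four pixels rather than per-pixel alpha, and computing `(src*a + dst*(256-a)) >> 8` per channel, a common cheap approximation of dividing by 255. It blends every channel including the alpha byte; `blend4` and `blend_channel` are made-up names.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>   // SSE2 intrinsics

// Scalar reference for one 8-bit channel, a in [0, 256].
std::uint8_t blend_channel(std::uint8_t s, std::uint8_t d, int a) {
    return static_cast<std::uint8_t>((s * a + d * (256 - a)) >> 8);
}

// Blend 4 packed 32-bit pixels (16 channels) with one constant alpha.
void blend4(const std::uint32_t* src, const std::uint32_t* dst, std::uint32_t* out, int a) {
    assert(a >= 0 && a <= 256);
    const __m128i zero = _mm_setzero_si128();
    const __m128i va = _mm_set1_epi16(static_cast<short>(a));
    const __m128i vb = _mm_set1_epi16(static_cast<short>(256 - a));
    __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst));

    // Widen bytes to 16-bit lanes: "lo" holds pixels 0-1, "hi" holds pixels 2-3.
    __m128i s_lo = _mm_unpacklo_epi8(s, zero), s_hi = _mm_unpackhi_epi8(s, zero);
    __m128i d_lo = _mm_unpacklo_epi8(d, zero), d_hi = _mm_unpackhi_epi8(d, zero);

    // s*a + d*(256-a) <= 255*256, so the unsigned 16-bit sums cannot overflow.
    __m128i lo = _mm_add_epi16(_mm_mullo_epi16(s_lo, va), _mm_mullo_epi16(d_lo, vb));
    __m128i hi = _mm_add_epi16(_mm_mullo_epi16(s_hi, va), _mm_mullo_epi16(d_hi, vb));
    lo = _mm_srli_epi16(lo, 8);
    hi = _mm_srli_epi16(hi, 8);

    // Narrow back to bytes (values are already 0..255) and store 4 pixels.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_packus_epi16(lo, hi));
}
```

Per-pixel alpha works the same way but needs an extra shuffle to copy each pixel's alpha byte into all four of its 16-bit lanes before the multiply.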

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!
I wanted to ask some more stuff about neural networks, but I wondered first if somebody could recommend a good overall theory book on them that's focused on using neural networks, and less on making something from scratch. You'll probably get what I mean from the questions I have. Assuming I am using backwards propagation to train a neural network:

1. What does it mean that the single greatest factor in training the network is the random values with which I start? I only get a few percent reduction in error from my starting value after 500 runs, and the success seems to depend more on the random values.
2. What are some good rule-of-thumb formulas for calculating the number of neurons in my hidden layer? The first one I heard was to have 1.5x as many as the input layer, but I've heard of other formulas based on the number of different combinations I want represented in my output.
3. Have there been any advances in training neural networks that I should be looking for instead? I don't mean things like pruning but more like things that make backwards propagation obsolete.


baquerd
Jul 2, 2007

by FactsAreUseless

Rocko Bonaparte posted:

1. What does it mean that the single greatest factor in training the network is the random values with which I start? I only get a few percent reduction in error from my starting value after 500 runs, and the success seems to depend more on the random values.
2. What are some good rule-of-thumb formulas for calculating the number of neurons in my hidden layer? The first one I heard was to have 1.5x as many as the input layer, but I've heard of other formulas based on the number of different combinations I want represented in my output.
3. Have there been any advances in training neural networks that I should be looking for instead? I don't mean things like pruning but more like things that make backwards propagation obsolete.

1. It could mean many things. Most likely, apart from the possibility that you've programmed it incorrectly, you are converging on local minima in the solution space. This indicates that you should not be using a neural network, or that you need more data points, or that you should add momentum or tweak the amount of it. Though 500 epochs is pitifully low for sizable data.

2. Rule of thumb: you want to minimize the nodes in the hidden layer. Start with one node and add more until you reach satisfactory adaptation. Too many nodes and you'll reduce performance, add noise, etc.

3. If you're doing a neural network, backpropagation is the gold standard. I would go so far as to say that to use anything else is not really a neural network.
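Point 1 mentions momentum. As a sketch of what that term does in the weight update (not a full backpropagation implementation): each step keeps a running velocity, v = mu*v - eta*dE/dw, and moves the weights by v, so steps in a consistent direction build up speed while oscillating ones partly cancel. The learning rate `eta`, momentum `mu`, and the toy quadratic error used with it are arbitrary illustrative choices; in a real network, `grad` would be computed by backpropagation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Gradient descent with classical momentum.
// grad(w) returns dE/dw for every weight; in a network it would come from backprop.
std::vector<double> train_with_momentum(
        std::vector<double> w,
        const std::function<std::vector<double>(const std::vector<double>&)>& grad,
        double eta, double mu, int epochs) {
    assert(mu >= 0.0 && mu < 1.0);
    std::vector<double> v(w.size(), 0.0);   // velocity starts at rest
    for (int epoch = 0; epoch < epochs; ++epoch) {
        std::vector<double> g = grad(w);
        for (std::size_t i = 0; i < w.size(); ++i) {
            v[i] = mu * v[i] - eta * g[i];   // decaying sum of past steps
            w[i] += v[i];
        }
    }
    return w;
}
```

On a toy error E(w) = (w0 - 3)^2 + 10(w1 + 1)^2 with `eta = 0.01`, 500 epochs with `mu = 0.9` land much closer to (3, -1) than the same run with `mu = 0`, which is the kind of difference worth comparing before concluding that more epochs or a different model is needed.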
