|
This is probably more a question for 'Ask General Programming Questions Worth Their Own Doctoral Thesis' but any recommendations on where to start researching ways to compare two sets for equality? (Assuming you can cache whatever information you want as the sets are constructed). Failing that, a proof that you can't do better than iterating over each member of each set?
|
# ? Dec 5, 2009 07:05 |
|
|
# ? May 16, 2024 18:47 |
|
the talent deficit posted:This is probably more a question for 'Ask General Programming Questions Worth Their Own Doctoral Thesis' but any recommendations on where to start researching ways to compare two sets for equality? (Assuming you can cache whatever information you want as the sets are constructed). Failing that, a proof that you can't do better than iterating over each member of each set?

Do you have any more context? There are ways you can design your set representation to make inequalities likely to be detected earlier rather than later in the most common cases for your problem, and there are representations that depend on some degree of observable sharing which can change your decision algorithm from O(n) in the number of elements in the list to O(n) in the number of modifications made to the two lists since the last time you confirmed they were equal, although in practice few people care to use these except for rare cases such as infinite sets.

Dense sets can be checked for equality in constant time. Consequently, if you are dealing with disjoint unions of dense sets, you can reduce to O(n) in the number of dense subsets with the right representation. More practically, it's often possible to derive quick lemmas about what kinds of modifications are possible, and check only restricted subsets for equality.

If your representation is the naive sorted-list representation of a set, and you have no prior information to base your test on, then the proof that testing for equality is O(n) is rather trivial; the complexity of list equality is typically covered in any introductory complexity class in a CS program.
|
# ? Dec 5, 2009 07:36 |
|
the talent deficit posted:This is probably more a question for 'Ask General Programming Questions Worth Their Own Doctoral Thesis' but any recommendations on where to start researching ways to compare two sets for equality? (Assuming you can cache whatever information you want as the sets are constructed). Failing that, a proof that you can't do better than iterating over each member of each set?

Maybe I'm misunderstanding the question but I'm not sure I see the problem? If the sets are sorted that's trivially O(n). If one is sorted it's trivially O(log(n)). If neither is sorted, you can easily get nlog(n) by sorting/comparing. In any situation you can get O(n) with hashing.
|
# ? Dec 5, 2009 07:40 |
|
That Turkey Story posted:If one is sorted it's trivially O(log(n)). Surely you mean O(n log(n)). O(log(n)) is the cost of a single lookup for a sorted-list-set, and in any case it would be somewhat absurd for the cost to compare equality of two sorted lists to be higher than the cost to compare a sorted and an unsorted list.
|
# ? Dec 5, 2009 08:01 |
|
Woops yeah nlog(n).
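To make the corrected complexities concrete, here's a throwaway Python sketch of the three strategies mentioned (already-sorted inputs, sort-then-compare, and hashing); it's illustrative, not tuned:

```python
# Sketch of the three set-equality strategies from the posts above.

def equal_sorted(a, b):
    """Both inputs already sorted: a single O(n) pass with early exit."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))

def equal_by_sorting(a, b):
    """Neither sorted: O(n log n) to sort each side, then the O(n) pass."""
    return equal_sorted(sorted(a), sorted(b))

def equal_by_hashing(a, b):
    """Expected O(n): build a hash set from each side and compare."""
    return len(a) == len(b) and set(a) == set(b)
```

The early exit in the first function is the "detect inequalities earlier rather than later" point from upthread: a mismatch near the front bails out immediately.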
|
# ? Dec 5, 2009 08:11 |
|
Use an associative cryptographic hash function and a balanced binary search tree -- stick hashes at every node describing the subtree beneath that node.
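A rough sketch of that idea, assuming SHA-256 and a balance-by-median build (my choices, not necessarily the poster's): each node carries the hash of its entire subtree, so two trees built the same way from equal sets compare in O(1) at the root, and a mismatch can be localized by descending only into children whose hashes differ.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        # Hash of the whole subtree: left hash + this key's hash + right hash.
        lh = left.hash if left else b""
        rh = right.hash if right else b""
        self.hash = h(lh + h(repr(key).encode()) + rh)

def build(sorted_keys):
    """Balanced tree from sorted keys, so equal sets yield identical shapes
    and therefore identical root hashes."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return Node(sorted_keys[mid],
                build(sorted_keys[:mid]),
                build(sorted_keys[mid + 1:]))
```

The catch, which comes up later in the thread, is that the root hash depends on the tree's shape, so both sides have to agree on exactly how the tree is built.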
|
# ? Dec 5, 2009 21:55 |
|
Let's say we invented some sort of software-based intelligence... If we decompiled said software, would it lead to some weird algorithm we've never seen before that unlocks the mysteries of life? Or would it be pretty normal looking?
|
# ? Dec 7, 2009 19:22 |
|
I was going to type up some smartass answer but if you're actually serious about this as a topic, it's worth its own thread.
|
# ? Dec 7, 2009 19:26 |
|
Triple Tech posted:Let's say we invented some sort of software-based intelligence... Or neither. As Ars Technica put it when talking about the supercomputer allegedly as powerful as a cat brain a few weeks back: quote:In a nutshell, when a simulation of a complex phenomenon (brains, weather systems) reaches a certain level of fidelity, it becomes just as difficult to figure out what's actually going on in the model—how it's organized, or how it will respond to a set of inputs—as it is to answer the same questions about a live version of the phenomenon that the simulation is modeling. So building a highly accurate simulation of a complex, nondeterministic system doesn't mean that you'll immediately understand how that system works—it just means that instead of having one thing you don't understand (at whatever level of abstraction), you now have two things you don't understand: the real system, and a simulation of the system that has all of the complexities of the original plus an additional layer of complexity associated with the model's implementation in hardware and software.
|
# ? Dec 7, 2009 19:36 |
|
quote:If we decompiled said software, would it lead to some weird algorithm we've never seen before that unlocks the mysteries of life? Or would it be pretty normal looking? INSUFFICIENT DATA FOR MEANINGFUL ANSWER.
|
# ? Dec 7, 2009 19:50 |
|
Triple Tech posted:Let's say we invented some sort of software-based intelligence... Read GEB.
|
# ? Dec 9, 2009 01:52 |
|
Another OCaml question: I'm trying to reimplement in OCaml the software (well, the interesting bits) I wrote in C++ for my senior thesis, in order to learn OCaml and hopefully use it or another functional language in my graduate work. Currently, I'm just trying to write a program which echoes a dataset that's been piped into it from the terminal. E.g. code:
Here is the (relevant and closed under itself) code currently: code:
code:
code:
In addition, why does the OCaml interactive interpreter blow so hard, as well as many other interactive interpreters? I can't move around with the arrows, it doesn't have a command history, etc... I mean, it's hard for me to find it useful when I have to backspace out 80 characters because I made a spelling mistake. I'm left with copying and pasting into the terminal, or loading the module when I start the interpreter, or using a #use statement. Just ranting, really. Interactive mode is really nice to have.

Also, is there a way I can make my pathetic head and tail functions better/safer? I thought about something like if <> [] then some_list_expr else [] but I think that would cause more headaches than needed to get rid of a compiler warning, which should not have been taken lightly in the first place.

EDIT: Ah! Perhaps in x::y, where y is some list of objects, x must be another object and not another list of objects? I need list concatenation?

EDIT2: Got that one! I needed the @ operator. Now I'm getting a reference to undefined global `Str'. Need to use or load something, most likely. Maybe not, because I'm compiling natively? I try "ocamlc file.ml str.cma -o file" as well as "ocamlc file.ml Str.cma -o file" and both give me "Reference to undefined global `Str'". I'm linking something wrong. Ah, I got it! str.cma needs to come before file.ml. Makes sense. Correct code for completeness: code:
causticfluids fucked around with this message at 06:40 on Dec 9, 2009 |
# ? Dec 9, 2009 05:50 |
|
causticfluids posted:In addition, why does the ocaml interactive interpreter blow so hard, as well as many other interactive interpreters? Run them inside rlwrap or inside an Emacs buffer.
|
# ? Dec 9, 2009 06:23 |
|
shrughes posted:Run them inside rlwrap or inside an Emacs buffer. Ah, wow, thanks for the great tip on rlwrap. I am surprised I've never encountered it before. I should have thought about readline.
|
# ? Dec 9, 2009 06:37 |
|
ShoulderDaemon posted:Do you have any more context? There are ways you can design your set representation to make inequalities likely to be detected earlier rather than later in the most common cases for your problem, and there are representations that depend on some degree of observable sharing which can change your decision algorithm from O(n) in the number of elements in the list to O(n) in the number of modifications made to the two lists since the last time you confirmed they were equal, although in practice few people care to use these except for rare cases such as infinite sets.

I was criminally vague initially. Sorry. This is what I really want: I've got two systems, completely distinct from one another. They're organized as sets of nodes containing various state. Right now, we're using a tree representation (Merkle trees) to calculate a single hash of the state of each system that we can compare trivially. However, this restricts us to systems with identical internal organization. If any state is moved from one node to another, or the number of nodes changes, the hashes now represent the same state, but are distinct and not comparable. I know there are O(n) algorithms to calculate this state, but O(n) is, unfortunately, too much overhead (as n is very large when discussing the entire set). What I really need is a hash function such that hash(A, hash(B, hash(C, hash(D, E)))) = hash(E, hash(D, hash(C, hash(B, A)))) (for any permutation of A, B, C, D, E). I know these exist, but I'm unable to find anything on practical implementations.
|
# ? Dec 9, 2009 09:49 |
|
Why would someone want an in-memory database? My assumption here is that the data is important and not disposable. I get that moving databases to RAM makes it tons faster, but doesn't that make the data unsafe in case of a power outage or crash? How do these in-memory databases address this issue? Or are there really use cases where people don't care about the longevity of the data? If they did care, how and when would the data be written to disk?
|
# ? Dec 9, 2009 16:47 |
|
Triple Tech posted:Why would someone want an in-memory database? I think the fastest possible database server would use an in-memory database that did full daily backups and kept an incremental change log for recovery purposes. It's also pretty easy to copy data from database to database - 2 statements per table in MySQL to make an exact copy (as far as I know, at least). If you wanted to manipulate a large dataset as fast as possible in order to generate a report, you could copy it to an in-memory database, do whatever you want quickly and discard the database.
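As a toy illustration of that copy-to-memory-and-crunch pattern (using SQLite rather than MySQL, simply because it's in the Python stdlib):

```python
import sqlite3

def copy_to_memory(disk_path: str) -> sqlite3.Connection:
    """Clone an on-disk SQLite database into a fresh in-memory one; query
    the copy at RAM speed, then discard it when the report is done."""
    mem = sqlite3.connect(":memory:")
    disk = sqlite3.connect(disk_path)
    disk.backup(mem)  # Connection.backup is available since Python 3.7
    disk.close()
    return mem
```

The on-disk copy remains the durable source of truth, which is one answer to the crash-safety question above: the in-memory copy is deliberately disposable.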
|
# ? Dec 9, 2009 17:16 |
|
Triple Tech posted:Why would someone want an in-memory database? http://en.wikipedia.org/wiki/In-memory_database
|
# ? Dec 9, 2009 17:17 |
|
the talent deficit posted:What I really need is a hash function such that hash(A, hash(B, hash(C, hash(D, E)))) = hash(E, hash(D, hash(C, hash(B, A)))) (for any permutation of A, B, C, D, E). I know these exist, but I'm unable to find anything on practical implementations.

Assuming you can't have duplicate data elements, you can trivially do the Zobrist thing. For each distinct data element, compute a high-entropy* hash for just that element. Normally you'd simply choose a distinct random number, but because you have multiple systems that you probably don't want to have to synchronize the random numbers between, you may find it simpler to just use a cryptographic hash of the data here. To hash a database with data elements A, B, and C, simply take the XOR of the element hashes. XOR is Abelian, which is the property you want.

*High entropy is not strictly the property you want here, which is really just orthogonality, but it's much simpler to deal with because it's not a global property and tends to result in orthogonal sets.

ShoulderDaemon fucked around with this message at 18:00 on Dec 9, 2009 |
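A minimal sketch of that XOR construction, using SHA-256 as the per-element hash (the repr-based serialization is just an illustrative choice):

```python
import hashlib
from functools import reduce

def element_hash(x) -> int:
    """Per-element 'random number', derived from a cryptographic hash so
    that independent systems agree without sharing a table of randoms."""
    return int.from_bytes(hashlib.sha256(repr(x).encode()).digest(), "big")

def set_hash(elements) -> int:
    # XOR is commutative and associative, so any permutation of the
    # elements produces the same hash.
    return reduce(lambda acc, x: acc ^ element_hash(x), elements, 0)
```

Because XOR is its own inverse, adding or removing an element updates a running hash with a single XOR, so each node can maintain its hash in O(1) per modification. The no-duplicates caveat above is real: a repeated element XORs itself away.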
# ? Dec 9, 2009 17:57 |
How do you ask a senior developer about the formatting of their code without offending them? It's so hard to read when it's 5,000 lines of: code:
|
|
# ? Dec 10, 2009 00:05 |
|
fletcher posted:How do you ask a senior developer about the formatting of their code without offending them? This is what I hate about where I work. Emphasis on senior developer. He's generated three jobs because of the lovely work he does. When push comes to shove, we can't confront him about the code he makes because it will hurt his "feelings".
|
# ? Dec 10, 2009 00:53 |
|
Get everyone on the team to agree on a set of coding standards and then refactor the whole code base with some automated tool.
|
# ? Dec 10, 2009 03:15 |
|
MEAT TREAT posted:Get everyone on the team to agree on a set of coding standards
|
# ? Dec 10, 2009 05:05 |
|
fletcher posted:How do you ask a senior developer about the formatting of their code without offending them? Auto-format as you come across them? As long as everyone agrees on the format auto-format generates, you're golden
|
# ? Dec 10, 2009 09:21 |
|
KARMA! posted:Auto-format as you come across them? You should do this. Do not discuss it with the other developers if there is no current code formatting standard, you just need to get your formatting style into as many pieces of code as possible. If approached, and only if approached, tell them you've heard Google uses similar standards to the ones you've implemented in a few of their groups, and you were just about to bring up the issue for discussion anyhow.
|
# ? Dec 10, 2009 09:33 |
|
Janitor question since I had to janitate today... If I have Cron -> Process A -> rsync running, so I can't see the output of Process A or rsync, is there a way I can get that STDOUT/STDERR to my terminal after it has launched?
|
# ? Dec 12, 2009 01:27 |
|
Triple Tech posted:Janitor question since I had to janitate today... Redirect it to a file then tail that file?
|
# ? Dec 12, 2009 02:34 |
|
You can redirect a process that's already running?
|
# ? Dec 12, 2009 03:41 |
|
Triple Tech posted:Janitor question since I had to janitate today... Cheap answer: wait until cron emails the output to you, then cat it to your terminal. It's very nontrivial to redirect output from an already-started process. The easiest approach is to ptrace the process, reopen the relevant file descriptor to point somewhere else, then resume its execution; note that this may gently caress with any buffers the program is maintaining internally.
|
# ? Dec 12, 2009 04:10 |
|
I've started hacking on Gnome / Nautilus. I've been writing a patch for the god-awful way it handles list view column widths (not very intelligently). For the longest time, I've used gedit, and it's worked great for most things I've written, up to about 5 files of ~200 lines or so. It can actually be extended pretty well through Python and the hundred-odd plugins. But now I'm dealing with something which has thousands of lines of code across hundreds of files. How does one usually approach such a project? What editor / IDE do you use? With a general lack of documentation about which header certain prototypes come from, how does one figure this poo poo out without reading through 10,000 lines of C? Is there some special trick to tackling these bigger codebases, or is it still pretty much "use your editor, get good at git/svn/whatev, read what docs there are, don't be stupid, effort."
|
# ? Dec 12, 2009 07:25 |
|
ctags and cscope are both useful programs for large unfamiliar codebases in C. ctags is designed to be used by a frontend provided by your editor, cscope has its own interface.
|
# ? Dec 12, 2009 07:35 |
|
Triple Tech posted:You can redirect a process that's already running? Missed that bit of your post; I thought you had control over starting/stopping the process. As mentioned, redirecting process output while running is unfun at best.
|
# ? Dec 12, 2009 09:51 |
|
The real answer is we need a more mature logging framework... Probably STDOUT multiplexed to a file and a database... The problem was an rsync that was taking too long. Usually it's supposed to be done almost instantly, but someone dropped a multi-gigabyte file into the sync directory and we took that as one of the servers dying. So we had to kill our main process, slice out the code it got hung up on, and run it from the console to stare at it. So. Ghetto.
|
# ? Dec 12, 2009 16:56 |
|
Triple Tech posted:The real answer is we need a more mature logging framework... Probably STDOUT multiplexed to a file and a database... tee(1)
|
# ? Dec 12, 2009 23:50 |
|
Otto Skorzeny posted:tee(1) I've only seen tee work for file descriptors, not completely different output media like e-mail and database. I was thinking more log4j/log4perl. I find operating a systems group by babysitting e-mails to be an extremely ghetto/inefficient solution.
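For the log4j/log4perl style of multiplexing, Python's stdlib logging does the same thing out of the box: one logger, multiple handlers. A minimal sketch (logger name and format are made up; a database sink would just be another logging.Handler subclass added the same way):

```python
import logging

def make_logger(logfile: str) -> logging.Logger:
    """One logger, two sinks: the terminal and a file."""
    logger = logging.getLogger("sync")
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    # Every record is dispatched to all attached handlers.
    for handler in (logging.StreamHandler(), logging.FileHandler(logfile)):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

Unlike babysitting cron e-mails, this gives you the output live on the console and durably in the file at the same time.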
|
# ? Dec 13, 2009 05:08 |
|
tee works with filenames and with stdout; if it worked on file descriptors you could tee to a socket (I suppose on Plan 9 you could do that, because it uses 9P rather than Berkeley sockets)
|
# ? Dec 13, 2009 05:39 |
|
Otto Skorzeny posted:tee works with filenames and with stdout, if it worked on file descriptors you could tee to a socket (I suppose on plan9 you could do that because it uses 9p rather than berkeley sockets) You can use, for example, /proc/self/fd/5 in Linux. You can do stuff with mkfifo, cat, and nc too.
|
# ? Dec 13, 2009 06:55 |
|
batch question: is it possible to make labels and GOTO commands work within a FOR loop? I'm trying to have a loop perform a task 1-3 times, and after each time this task runs I want to run a timer and check that it has finished properly, repeating until it has. However, when I run the script the FOR loop's token (%%a) is lost the first time it encounters a GOTO command, so the contents of the loop will only run once: code:
|
# ? Dec 15, 2009 17:29 |
|
BizarroAzrael posted:.bat ohgod So what reasons are there for not using a proper scripting language like Python to do this? It seems kinda... counterproductive.
|
# ? Dec 15, 2009 18:39 |
|
|
Or if not Python, how about PowerShell or VBScript? All of your questions are about batch file functionality that you don't seem to be totally clear on and can't find documentation about (this is because batch files suck). I would imagine this would be one hell of a lot easier and faster in a more usual scripting language.
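For what it's worth, the retry-then-poll structure the batch script seems to be going for is only a few lines in Python; `task` and `finished` here are placeholders for whatever the script actually invokes and checks:

```python
import time

def run_with_retry(task, finished, attempts=3, poll_seconds=1.0, timeout=30.0):
    """Run `task` up to `attempts` times; after each run, poll `finished()`
    until it reports success or the per-attempt timeout expires.
    Returns the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        task()
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if finished():
                return attempt
            time.sleep(poll_seconds)
    raise RuntimeError("task never finished after %d attempts" % attempts)
```

No labels or GOTO needed; the loop variable survives because there's nothing to clobber it.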
|
# ? Dec 15, 2009 21:58 |