Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





This is probably more a question for 'Ask General Programming Questions Worth Their Own Doctoral Thesis' but any recommendations on where to start researching ways to compare two sets for equality? (Assuming you can cache whatever information you want as the sets are constructed). Failing that, a proof that you can't do better than iterating over each member of each set?

Adbot
ADBOT LOVES YOU

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

the talent deficit posted:

This is probably more a question for 'Ask General Programming Questions Worth Their Own Doctoral Thesis' but any recommendations on where to start researching ways to compare two sets for equality? (Assuming you can cache whatever information you want as the sets are constructed). Failing that, a proof that you can't do better than iterating over each member of each set?

Do you have any more context? There are ways you can design your set representation to make inequalities likely to be detected earlier rather than later in the most common cases for your problem, and there are representations that depend on some degree of observable sharing which can change your decision algorithm from O(n) in the number of elements in the list to O(n) in the number of modifications made to the two lists since the last time you confirmed they were equal, although in practice few people care to use these except for rare cases such as infinite sets.

Dense sets can be checked for equality in constant time. Consequently, if you are dealing with disjoint unions of dense sets, you can reduce to O(n) in the number of dense subsets with the right representation.

More practically, it's often possible to derive quick lemmas about what kinds of modifications are possible, and check only restricted subsets for equality.

If your representation is the naive sorted-list representation of a list, and you have no prior information to base your test on, then the proof that testing for equality is O(n) is rather trivial; the complexity of list equality is typically taught as part of any introduction to complexity class in a CS program.

That Turkey Story
Mar 30, 2003

the talent deficit posted:

This is probably more a question for 'Ask General Programming Questions Worth Their Own Doctoral Thesis' but any recommendations on where to start researching ways to compare two sets for equality? (Assuming you can cache whatever information you want as the sets are constructed). Failing that, a proof that you can't do better than iterating over each member of each set?

Maybe I'm misunderstanding the question but I'm not sure I see the problem? If the sets are sorted thats trivially O(n). If one is sorted it's trivially O(log(n)). If neither is sorted, you can easily get nlog(n) by sorting/comparing. In any situation you can get O(n) with hashing.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

That Turkey Story posted:

If one is sorted it's trivially O(log(n)).

Surely you mean O(n log(n)). O(log(n)) is the cost of a single lookup for a sorted-list-set, and in any case it would be somewhat absurd for the cost to compare equality of two sorted lists to be higher than the cost to compare a sorted and an unsorted list.

That Turkey Story
Mar 30, 2003

Woops yeah nlog(n).

shrughes
Oct 11, 2008

(call/cc call/cc)
Use an associative crytographic hash function and a balanced binary search tree -- stick hashes at every node describing the subtree beneath at the respective node.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Let's say we invented some sort of software-based intelligence...

If we decompiled said software, would it lead to some weird algorithm we've never seen before that unlocks the mysteries of life? Or would it be pretty normal looking?

csammis
Aug 26, 2003

Mental Institution
I was going to type up some smartass answer but if you're actually serious about this as a topic, it's worth its own thread.

haveblue
Aug 15, 2005



Toilet Rascal

Triple Tech posted:

Let's say we invented some sort of software-based intelligence...

If we decompiled said software, would it lead to some weird algorithm we've never seen before that unlocks the mysteries of life? Or would it be pretty normal looking?

Or neither. As Ars Techinca put it when talking about the supercomputer allegedly as powerful as a cat brain a few weeks back:

quote:

In a nutshell, when a simulation of a complex phenomenon (brains, weather systems) reaches a certain level of fidelity, it becomes just as difficult to figure out what's actually going on in the model—how it's organized, or how it will respond to a set of inputs—as it is to answer the same questions about a live version of the phenomenon that the simulation is modeling. So building a highly accurate simulation of a complex, nondeterministic system doesn't mean that you'll immediately understand how that system works—it just means that instead of having one thing you don't understand (at whatever level of abstraction), you now have two things you don't understand: the real system, and a simulation of the system that has all of the complexities of the original plus an additional layer of complexity associated with the models implementation in hardware and software.

tef
May 30, 2004

-> some l-system crap ->

quote:

If we decompiled said software, would it lead to some weird algorithm we've never seen before that unlocks the mysteries of life? Or would it be pretty normal looking?

INSUFFICIENT DATA FOR MEANINGFUL ANSWER.

more falafel please
Feb 26, 2005

forums poster

Triple Tech posted:

Let's say we invented some sort of software-based intelligence...

If we decompiled said software, would it lead to some weird algorithm we've never seen before that unlocks the mysteries of life? Or would it be pretty normal looking?

Read GEB.

causticfluids
Dec 25, 2006

Congratulations on not getting fit in 2011!
Another OCaml question:

I'm trying to implement the software I've written in C++ for my senior thesis in OCaml, (well, the interesting bits) in order to learn OCaml and hopefully use it or another functional language in my graduate work. Currently, I'm just trying to write a program which echoes a dataset that's been piped into the program from the terminal.

EG
code:
$~ ./main < data_in > data_out
should echo the contents of data_in, and any other things I send to stdout, into data_out. I did it this way so I don't have to worry about file IO yet, and can still focus on learning OCaml syntax and style. But I have a syntax error headache!

Here is the (relevant and closed under itself) code currently:

code:
let rec betterread data_size = 
  if data_size = 0 then []  (* can I do this better??? *)
  else let instring = read_line () in
    match (Str.split (Str.regexp "[ \t]+") instring) with [h;t] ->  
    [float_of_string h;float_of_string t]::(betterread (data_size - 1));; 

let head lst =
  match lst with h::t->h;;
let tail lst =
  match lst with h::t->t;;
let second lst =
  head (tail lst);;

let rec writebuild data_size data =
  Printf.printf "%f %f\n%!" (head data) (second data);
  let next_list = tail (tail data) in
    if next_list <> [] then writebuild (data_size - 1) next_list 
    else 0;; 

Printf.printf"\n --This should echo a data file piped in-- \n%!";;
let data_size = read_int () in
  let data = betterread (data_size) in
    Printf.printf "%d\n%!" data_size;
    writebuild data_size data;;
and the data files look like:

code:
1000
5.2 6.9
0.444 69
...
My probem now is that in the last line of code, data has type float list list when it should have type float list. Obviously, I've screwed up some syntax somewhere and threw a list into another list. I'm guessing it's in my match statement:

code:
match (Str.split (Str.regexp "[ \t]+") instring) with [h;t] -> 
[float_of_string h;float_of_string t]::(betterread (data_size - 1));; 
Str.split (Str.regexp "[ \t]+") instring should operate on instring like "4.3 7.5"->["4.3";"7.5"], then I match ["4.3";"7.5"] to [4.3;7.5]. This I have all got to work in the interpreter. I'm still trying to find my error, and thought I would ask those more experienced than I. I tried putting parentheses in front of the match statement and before the ::, but no such luck.

In addition, why does the ocaml interactive interpreter blow so hard, as well as many other interactive interpreters? I can't move around with the arrows, it doesn't have a command history, etc... I mean it's hard for me to find it useful when I have to backspace out 80 characters because I made a spelling mistake. I'm left with copying and pasting into the terminal, or loading the module when I start the interpreter, or using a #use statement. Just ranting, really. Interactive mode is really nice to have.

Also, is there a way I can make my pathetic head and tail functions better/safer? I thought about something like if <> [] then some_list_expr else [] but I think that would cause more headaches than needed to get rid of a compiler warning, which should not have been taken lightly in the first place.

EDIT: Ah! Perhaps in x::y where y is some list of objects x must be another object and not another list of objects? I need list concatenation?
EDIT2: Got that one! I needed the @ operator. Now I'm getting an reference to undefined global `Str'. Need to use or load something, most likely. Maybe not because I'm compiling natively? I try "ocamlc file.ml str.cma -o file" as well as "ocamlc file.ml Str.cma -o file" and both give me "Reference to undefined global `Str'". I'm linking something wrong.

Ah, I got it! Str.cma needs to come before file.ml. Makes sense. Correct code for completeness:

code:
let rec betterread data_size = 
  if data_size = 0 then []  
  else let instring = read_line () in
    (match (Str.split (Str.regexp "[ \t]+") instring) with [h;t] ->  
    [float_of_string h;float_of_string t])@(betterread (data_size - 1));; 

let head lst =
  match lst with h::t->h;;
let tail lst =
  match lst with h::t->t;;
let second lst =
  head (tail lst);;

let rec writebuild data_size data =
  Printf.printf "%f %f\n%!" (head data) (second data);
  let next_list = tail (tail data) in
    if next_list <> [] then writebuild (data_size - 1) next_list 
    else 0;; 

Printf.printf"\n --This should echo a data file piped in-- \n%!";;
let data_size = read_int () in
  let data = betterread (data_size) in
    Printf.printf "%d\n%!" data_size;
    writebuild data_size data;;

causticfluids fucked around with this message at 06:40 on Dec 9, 2009

shrughes
Oct 11, 2008

(call/cc call/cc)

causticfluids posted:

In addition, why does the ocaml interactive interpreter blow so hard, as well as many other interactive interpreters?

Run them inside rlwrap or inside an Emacs buffer.

causticfluids
Dec 25, 2006

Congratulations on not getting fit in 2011!

shrughes posted:

Run them inside rlwrap or inside an Emacs buffer.

Ah, wow, thanks for the great tip on rlwrap. I am surprised I've never encountered it before. I should have thought about readline. :doh:

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





ShoulderDaemon posted:

Do you have any more context? There are ways you can design your set representation to make inequalities likely to be detected earlier rather than later in the most common cases for your problem, and there are representations that depend on some degree of observable sharing which can change your decision algorithm from O(n) in the number of elements in the list to O(n) in the number of modifications made to the two lists since the last time you confirmed they were equal, although in practice few people care to use these except for rare cases such as infinite sets.

Dense sets can be checked for equality in constant time. Consequently, if you are dealing with disjoint unions of dense sets, you can reduce to O(n) in the number of dense subsets with the right representation.

More practically, it's often possible to derive quick lemmas about what kinds of modifications are possible, and check only restricted subsets for equality.

If your representation is the naive sorted-list representation of a list, and you have no prior information to base your test on, then the proof that testing for equality is O(n) is rather trivial; the complexity of list equality is typically taught as part of any introduction to complexity class in a CS program.

I was criminally vague initially. Sorry. This is what I really want:

I've got two systems, completely distinct from one another. They're organized as sets of nodes containing various state. Right now, we're using a tree representation (merkel trees) to calculate a single hash of the state of each system we can compare trivially. However, this restricts us to having systems with identical internal organization. If any state is moved from one node to another, or the number of nodes changes, the hashes now represent the same state, but are distinct and non comparable.

I know there are O(n) algorithms to calculate this state, but O(n) is, unfortunately, too much overhead (as N is very large when discussing the entire set).

What I really need is a hash function such that hash(A, hash(B, hash(C, hash(D, E)))) = hash(E, hash(D, hash(C, hash(B, A)))) (for any permutation of A, B, C, D, E). I know these exist, but I'm unable to find anything on practical implementations.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Why would someone want an in-memory database?

My assumption here is that the data is important and not disposable. I get that moving databases to RAM makes it tons faster, but doesn't that make the data not safe in case of a power outtage or crash? How do these in-memory databases address this issue? Or are there really use cases where people don't care about the longevity of the data? If they did care, how and when would the data be written to disk?

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Triple Tech posted:

Why would someone want an in-memory database?

My assumption here is that the data is important and not disposable. I get that moving databases to RAM makes it tons faster, but doesn't that make the data not safe in case of a power outtage or crash? How do these in-memory databases address this issue? Or are there really use cases where people don't care about the longevity of the data? If they did care, how and when would the data be written to disk?

I think the fastest possible database server would use an in-memory database that did full daily backups and kept an incremental change log for recovery purposes.

It's also pretty easy to copy data from database to database - 2 statements per table in MySQL to make an exact copy (as far as I know, at least). If you wanted to manipulate a large dataset as fast as possible in order to generate a report, you could copy it to an in-memory database, do whatever you want quickly and discard the database.

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Triple Tech posted:

Why would someone want an in-memory database?

My assumption here is that the data is important and not disposable. I get that moving databases to RAM makes it tons faster, but doesn't that make the data not safe in case of a power outtage or crash? How do these in-memory databases address this issue? Or are there really use cases where people don't care about the longevity of the data? If they did care, how and when would the data be written to disk?

http://en.wikipedia.org/wiki/In-memory_database

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

the talent deficit posted:

What I really need is a hash function such that hash(A, hash(B, hash(C, hash(D, E)))) = hash(E, hash(D, hash(C, hash(B, A)))) (for any permutation of A, B, C, D, E). I know these exist, but I'm unable to find anything on practical implementations.

Assuming you can't have duplicate data elements, you can trivially do the Zubrist thing.

For each distinct data element, compute a high-entropy* hash for just that element. Normally you'd simply choose a distinct random number, but because you have multiple systems that you probably don't want to have to synchronize the random numbers between, you may find it simpler to just use a cryptographic hash of the data here.

To hash a database with data elements A, B, and C, simply take the XOR of the element hashes. XOR is Abelian, which is the property you want.

*High entropy is not strictly the property you want here, which is really just orthogonality, but it's much simpler to deal with because it's not a global property and tends to result in orthogonal sets.

ShoulderDaemon fucked around with this message at 18:00 on Dec 9, 2009

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
How do you ask a senior developer about the formatting of their code without offending them?

It's so hard to read when it's 5,000 lines of:

code:
function getAccRes()
{
if (something) {
              doThis()
}
else
{
      doThat()
}}

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?

fletcher posted:

How do you ask a senior developer about the formatting of their code without offending them?

This is what I hate about where I work. Emphasis on senior developer. He's generated three jobs because of the lovely work he does. When push comes to shove, we can't confront him about the code he makes because it will hurt his "feelings". :airquote:

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
Get everyone on the team to agree on a set of coding standards and then refactor the whole code base with some automated tool.

BigRedDot
Mar 6, 2008

MEAT TREAT posted:

Get everyone on the team to agree on a set of coding standards
This, will never happen. Perhaps you mean: Find someone with the authority to mandate a set of coding standards and convince them to do so.

karms
Jan 22, 2006

by Nyc_Tattoo
Yam Slacker

fletcher posted:

How do you ask a senior developer about the formatting of their code without offending them?

It's so hard to read when it's 5,000 lines of:

code:
function getAccRes()
{
if (something) {
              doThis()
}
else
{
      doThat()
}}

Auto-format as you come across them? As long as everyone agrees on the format auto-format generates, you're golden

yatagan
Aug 31, 2009

by Ozma

KARMA! posted:

Auto-format as you come across them?

You should do this. Do not dicuss it with the other developers if there is no current coding formatting standard, you just need to get your formatting style in as many pieces of code as possible. If approached, and only if approached, tell them you've heard Google uses similar standards to the ones you've implemented in a few of their groups, and you were just about to bring up the issue for discussion anyhow.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
:sigh: Janitor question since I had to janitate today...

If I have Cron -> Process A -> rsync running, so I can't see the output of Process A or rsync, is there a way I can get that STDOUT/STDERR to my terminal after it has launched?

pokeyman
Nov 26, 2006

That elephant ate my entire platoon.

Triple Tech posted:

:sigh: Janitor question since I had to janitate today...

If I have Cron -> Process A -> rsync running, so I can't see the output of Process A or rsync, is there a way I can get that STDOUT/STDERR to my terminal after it has launched?

Redirect it to a file then tail that file?

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
You can redirect a process that's already running?

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Triple Tech posted:

:sigh: Janitor question since I had to janitate today...

If I have Cron -> Process A -> rsync running, so I can't see the output of Process A or rsync, is there a way I can get that STDOUT/STDERR to my terminal after it has launched?

Cheap answer: wait until cron emails the output to you, then cat it to your terminal.

It's very nontrivial to redirect output from an already-started process. The easiest approach is to ptrace the process, reopen the relevant file descriptor to point somewhere else, then resume its execution; note that this may gently caress with any buffers the program is maintaining internally.

twofish
Apr 17, 2006

.
I've started hacking for Gnome / Nautilus. I've been writing a patch for the god-awful way it handles list view column widths (not very intelligently). For the longest time, I've used gedit, and it's worked great for most thing's I've written up to about 5 files of ~200 lines or so. It can actually be extended pretty well through python and the hundred-ought plugins. But now, I'm dealing with something which has thousands of lines of code across hundreds of files. How does one usually approach such a project? What editor / IDE do you use? With a general lack of documentation of which header certain prototypes are coming out of, how does one figure this poo poo out without reading through 10,000 lines of C.

Is there some special trick to tackling these bigger codebases or is it still pretty much "use your editor, get good at git/svn/whatev, read what docs there are, don't be stupid, effort."

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender
ctags and cscope are both useful programs for large unfamiliar codebases in C. ctags is designed to be used by a frontend provided by your editor, cscope has its own interface.

pokeyman
Nov 26, 2006

That elephant ate my entire platoon.

Triple Tech posted:

You can redirect a process that's already running?

Missed that bit of your post; I thought you had control over starting/stopping the process. As mentioned, redirecting process output while running is unfun at best.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
The real answer is we need a more mature logging framework... Probably STDOUT multiplexed to a file and a database...

The problem was an rsync that was taking too long. Usually it's supposed to be done almost instantly but someone dropped a multi gigabyte file into the sync directory and we took that as one of the servers dying. So we had to kill our main process, slice out the code that it gone hung up on, and run it from the console to stare at it.

So. Ghetto. :downsrim:

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

Triple Tech posted:

The real answer is we need a more mature logging framework... Probably STDOUT multiplexed to a file and a database...

tee(1)

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?

Otto Skorzeny posted:

tee(1)

I've only seen tee work for file descriptors, not completely different output media like e-mail and database. I was thinking more log4j/log4perl.

I find operating a systems group by babysitting e-mails to be an extremely ghetto/inefficient solution.

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip
tee works with filenames and with stdout, if it worked on file descriptors you could tee to a socket (I suppose on plan9 you could do that because it uses 9p rather than berkeley sockets)

shrughes
Oct 11, 2008

(call/cc call/cc)

Otto Skorzeny posted:

tee works with filenames and with stdout, if it worked on file descriptors you could tee to a socket (I suppose on plan9 you could do that because it uses 9p rather than berkeley sockets)

You can use, for example, /proc/self/fd/5 in Linux. You can do stuff with mkfifo, cat, and nc too.

BizarroAzrael
Apr 6, 2006

"That must weigh heavily on your soul. Let me purge it for you."
batch question: is it possible to make Labels and GOTO commands work within a FOR loop? I'm trying to have a loop perform a task 1-3 times, and after each time this task runs I want to run a timer and check that it has finished properly, which will repeat until it is. However, when I run the script to FOR loops token (%%a) is lost the first time it encounters a GOTO command, so the contents of the loop will only run once:

code:
FOR %%a IN (%1 %2 %3) DO (
	(
		Runtask.exe %%a
	)
	:checkfinished

        ::determine if Runtask has completed properly
	tasklist /FI "IMAGENAME eq Runtask.exe" 2>NUL | find /I /N "Runtask.exe">NUL

        ::if the process was not found to still be running, go to the end, else echo that it is still going and proceed
	if "%ERRORLEVEL%"=="0" (echo %date% %time% Runtask.exe is still running) else (GOTO continue)

        ::wait 30 seconds
		@ping 127.0.0.1 -n 2 -w 1000 > nul
		@ping 127.0.0.1 -n 30 -w 1000> nul
		goto checkfinished
	:continue
	ECHO finished - %%a 
)
I'm thinking separating the check for the task and it's GOTOs and Labels into a separate script called from this one might work, but it would be preferable to do it all in one place, if possible.

Vinterstum
Jul 30, 2003

BizarroAzrael posted:

.bat ohgod

So what reasons are there for not using a proper scripting language like Python to do this? It seems kinda... counterproductive.

Adbot
ADBOT LOVES YOU

csammis
Aug 26, 2003

Mental Institution
Or if not Python, how about Powershell or VBScript? All of your questions are about batch file functionality that you don't seem to be totally clear on and can't find documentation about (this is because batch files suck). I would imagine this would be executed one hell of a lot easier and faster in a more usual scripting language.

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply