Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
JawnV6
Jul 4, 2004

So hot ...
What's top's interface called? I have a static program that polls the state of a few machines, I'd like to just wrap it up and have it constantly update to the same screen, but I don't even know what to search for to rip off how top prints out to the terminal.

Appa Time! posted:

Need some help with one of the project Euler puzzles. It's the one where you have to find the largest prime factor of 317584931803. I made a method I called isPrime(int x), where you test x % (2 to x^1/2 inclusive) and if any of those is 0, it returns false, otherwise it returns true. Well that helper method worked for the smaller example case (largest prime factor of 13195), but when I tried it for 317584931803, it ran for several minutes and still didn't find the answer, so I killed it. I'm trying the fermat primality test instead for the smaller case, but it doesn't seem very accurate. I'm also worried that I might have problems using that algorithm for the larger case, since there is no nextLong method for the java Random class. Am I on the right track with using the fermat test? Is there another algorithm I should be using?
317584931803 is larger than can be stored in an int.

Adbot
ADBOT LOVES YOU

JawnV6
Jul 4, 2004

So hot ...

more falafel please posted:

I think top just uses ncurses, which is a generic console interface library, and basically what any console program with a UI at all uses.
Thanks.

Appa Time! posted:

Yeah I should have mentioned that I used longs for the larger case. That's not the problem, my algorithm works it's just that it's worst case time is horrible for larger stuff, so I was looking for a better alternative.
Don't check every single integer between 1 and x^.5, build a list of primes and check against just those?

JawnV6
Jul 4, 2004

So hot ...

Triangulum posted:

TI BASIC question. I wrote my first program for my 83 that solves the quadratic formula. The problem is that my program gives decimal approximations instead of exact value when dealing with square roots. How do I program it to show exact values?

I don't think the 83 will show square roots inline like the 89 or 92 does. Just to make sure I understand, you're plugging in values to a solver, it's spitting out "1.732..". and you want it to say "sqrt(3)" instead? Your best bet is to also print out the square of all the numbers given to see if they're integers, otherwise it's going to be a guessing game.

JawnV6
Jul 4, 2004

So hot ...
What's a good resource for learning how to implement a hashing algorithm?

I'm looking at making a Bloom filter in verilog, I can find a few thousand descriptions on the error rate calculation but apparently programmers can just sneeze out random hashing algorithms on command and don't need any training.

JawnV6
Jul 4, 2004

So hot ...

6174 posted:

Have you looked at TAOCP vol 3?

I don't know what that is. "The Art Of C Programming" is coming up with several hits with different authors.

JawnV6
Jul 4, 2004

So hot ...

6174 posted:

TAOCP is The Art of Computer Programming by Knuth.

edit: Since posting in the first place I've read (but not personally verified) that there is some discussion about hashing in the Dragon book as well (Compilers: Principles Techniques and Tools by Aho et al).
I'll look into those two, does CLRS have any clear discussion on hashes? I've got that on my shelf at home, not quite sure why I didn't check it in the first place.

StumblyWumbly posted:

Hey, I'm a hard/firmware guy who only writes in C, Verilog and VHDL.
I came from a hardware/C background and took to perl very easily. We also have a lot of legacy perl lying around, perl API's abound, much more relevant than Ruby or Python would have been. Again, your results may vary.

JawnV6
Jul 4, 2004

So hot ...

StickGuy posted:

There's not really a thread for this so I'll ask it here. Does anyone have experience with doing FFT on the GPU? I've been looking around for a few libraries and I'm having difficulties getting the ones I've found working.
One of the nvidia CUDA example programs does a FFT. I can't say much more than that, but their example code is pretty good in general.

quote:

Also, should we have a GPU programming thread?
I was coming to ask the same thing :v: After finally getting 169.09 drivers installed to make it stop complaining, every example program either seg faults or halts my display.

JawnV6
Jul 4, 2004

So hot ...

more falafel please posted:

The XOR trick is not an optimization, it's a trick. On modern CPUs it's actually slower because of pipelining. It's a "clever" trick undergrads show off to their friends.

Inline the function and don't use pointers. See if there's still a difference.

On the functions you wrote, xor is doing 3 loads and 3 stores, assign is doing 2 loads, 2 stores, and using an extra register. The difference in speed is because of the loads/stores, not 'pipelining'. mov and xor will get through exactly the same.

JawnV6
Jul 4, 2004

So hot ...

Scaevolus posted:

-O3 yields the same assembly, and -O uses "movl $0, %eax" instead of "xorl %eax, %eax".
Ugh what a stupid unreadable hack to zero out a register. I despise this 'xor trick'

JawnV6
Jul 4, 2004

So hot ...

more falafel please posted:

I used to use xor a on the Z80 because it was only 1 byte and like 2 cycles whereas the immediate move was 2 bytes and like 5 cycles or something.

But to be fair that was the Z80.

On EMT64 (lol) using the 64 bit xor takes an extra byte, the 32 bit xor will zero the upper half of the register anyway :v:

JawnV6
Jul 4, 2004

So hot ...

Ugg boots posted:

Ugh, now I started posting on StackOverflow. It's not bad, but they're going to want to make a better "front page" because just listing off a bunch of seemingly random questions is kind of a clusterfuck.

I like the achievement style badges :D


Edit: What the gently caress does reputation represent? Is there some sort of formula based on badges and tags and stuff?

Edit 2: Found the formula.
Top rated article on the front pager, with 33 votes, 48 answers, 562 views:
Recommended Fonts for Programming?

Thank god, with this sort of crucial information I can finally ditch experts exchange.

JawnV6
Jul 4, 2004

So hot ...

:suicide: There is so much wrong with the discussion on that page.

edit: I can't vote up or down without reputation? How do I get that? Do I want that?

edit2: I figured out how to get reputation: http://beta.stackoverflow.com/questions/36906/what-is-the-fastest-way-to-swap-values-in-c

JawnV6 fucked around with this message at 16:13 on Aug 31, 2008

JawnV6
Jul 4, 2004

So hot ...

artificialj posted:

Cool, thanks, I'll look into that one.

I understand why you're providing a solution but honestly gaming the vote on a conservative echo chamber poll will, at best, feed into a persecution complex.

JawnV6
Jul 4, 2004

So hot ...

Scaramouche posted:

Would agree that C is a bad starter language ... the entire makefile process was incredibly unintuitive, and

Yeah it's really easy to make a lovely course with C :v:

My first course started in C and they gave us all the boilerplate and started with updating small functions. Well, we "started" with Karel the Robot and moved to C afterwards, but it's not too hard to have an Intro class use C without all those stupid headaches your course was fraught with.

JawnV6
Jul 4, 2004

So hot ...

baquerd posted:

Making the posts appear newest first (unless there's a forum option I am unaware of) would need some custom code.

The forum option you're looking for is "the reply screen".

JawnV6
Jul 4, 2004

So hot ...
Depending on how much of it there is, that "if/then logic" might tank his performance on a GPU. Branches are still handled by predicate-based execution, right?

JawnV6
Jul 4, 2004

So hot ...

shrughes posted:

Uh, worst case scenario it executes both branches and conditional-moves the result, right? Maybe performance could be divide by two..

The hardware's significantly dumber than you're imagining. Every thread executes every instruction, divergent 'branches' executing are handled by predicates at the instruction level.

For NVidia's CUDA, you have groups of 32 threads called a 'warp' that have identical code. So if you've got something like:
code:
if(cond){
instruction1
instruction2
} else {
instruction3
instruction4
}
it'll be flattened into something like this:
code:
predicate = if(cond)
if(predicate) instruction1
if(predicate) instruction2
if(!predicate) instruction3 
if(!predicate) instruction4
So when this lands on a GPU, if anyone takes that branch then every single execution unit is going through all 5 instructions. 40% of your potential throughput is wasted because of that if/then.

It's easy to assume you couldn't do worse than 40-50%, but it's not hard to get a case where this will really hurt you. Maybe his if/then logic is a thousand instructions long, the 'flattened' code would result in a lot of execution units going idle for a long time if even one thread takes it. On a GPGPU the naive code port would be horrendous performance. You have to think of if/then branches in terms of "Will iterations within %32 of each other all branch the same?" and I don't think that's as plainly obvious as everyone else does.

Just seemed like everyone's glossing over the most relevant detail he said to shout "embarrassingly parallel!" louder.

JawnV6
Jul 4, 2004

So hot ...
I'm really not sure what you're saying. Strictly speaking about GPU's, the 'execution units' are too dumb to control the instruction stream. There's one instruction fetch unit for every group of 32 threads. Every one of those 32 threads has to be considering the same instruction, therefore if any of them diverge on a branch you're basically issuing nops on the other 31. There's no 'interleaving' there's no bubbles, there's just some EU's working and some lying idle while the threads they're held together with work on the other path.

If everyone%32 takes the same direction on each branch this isn't an issue, every EU will be doing 'real' work on every tick.

JawnV6
Jul 4, 2004

So hot ...
I'm honestly in the dark as far as ATI's implementation goes. I was strictly speaking about CUDA's handling of diverging branches. For all I know ATI cards handle branches like a champ and I'm completely off base.

I definitely think you should learn one GPGPU flavor if only to understand the limitations and tradeoffs inherent in that choice. If the same sort of predicated execution is hampering your performance the required refactoring will be a learning experience.

Looking at that code, the only thing the if/then is determining is the signs of the operands. You could compute each term outside, then have the if/then just doing a single computation (apply signs to operands) with everything else independent of it.

So:
code:
t1 = (A1-A2)*X
t2 = (B2-B1)*X*Y
if(A1>(B1*Y*D)){
  // nothing
}elseif(A1<(B1*Y*D)){{
  t1 = -1 * t1;
  t2 = -1 * t2;
}else{
  t1 = t2 = 0;
}
R=t1 + t2;

JawnV6
Jul 4, 2004

So hot ...

1337JiveTurkey posted:

However if it's small enough then the instructions will be predicated instead, which is what I was referring to. In that case everything maintains the same PC and is given the same instruction. However only some threads actually execute that instruction based on their predicate registers while the others do nothing for the cycle.
Yeah this is right. I think we're on the same page wrt the underlying hardware capability, it's just odd for me to think of that as a 'pipeline bubble'.

1337JiveTurkey posted:

I was probably thinking too much of how Itanium handles predicates since it's also a predicated VLIW architecture. In that case, each instruction in the bundle can be predicated separately. So if you've got:
code:
if (foo) {
    x = x * 2;
}
else {
    x = x * 3;
}
It's perfectly fine in Itanium to do both in the same instruction. I guess it doesn't matter that much although it does show how hard it can be to assess the cost of an individual if statement in this context.
Right, that's beyond the capabilities of CUDA. Threads that have foo set are executing while everyone else in the warp is idle and vice versa.

Zombywuf posted:

Is there a sign function? i.e:
code:
s = sign(A1 < (B1 * Y * D));
t1 = s * (A1 - A2) * X;
t2 = s * (B2 - B1) * X * Y;
R = t1 + t2;
No idea. I suspect something branchless is possible, but couldn't figure it out on the spot and just pulled out the most I could.

JawnV6
Jul 4, 2004

So hot ...

Popete posted:

I'll see if I can't throw something together this week. I'm learning it currently and a thread on it would probably help a few others around here as well.

Your first bug will be relying on the compiler to put in a bus for you but it'll only instantiate a single-bit wire.

JawnV6
Jul 4, 2004

So hot ...

pseudorandom name posted:

Sorry, I'm out of date, everybody microcodes LEAVE now. Never use it.

I'm pretty sure I'm just confused by your terminology, but what instructions do you think don't get microcoded?

JawnV6
Jul 4, 2004

So hot ...

pseudorandom name posted:

The ones that aren't listed as "microcoded" in AMD's optimization manual.

So by "everybody" you mean AMD. Gotcha.

JawnV6
Jul 4, 2004

So hot ...
Well, they're defining it this way:
  • FastPath Single - Decodes directly into one macro-op in microprocessor hardware.
  • FastPath Double - Decodes directly into two macro-ops in microprocessor hardware.
  • Microcode - Decodes into one or more (usually three or more) macro-ops using the on-chip microcode-engine ROM (MROM).
But I don't see how you could code LEAVE in less than 3 ops and I don't see why you'd proscribe against 'microcoded' instructions as a blanket recommendation anyway. It's not a very useful distinction at that granularity.

JawnV6
Jul 4, 2004

So hot ...
Hit me up on IRC

JawnV6
Jul 4, 2004

So hot ...

Ariza posted:

This is a very stupid Python question. I haven't done any programming in years and I'm trying to teach myself Python by making up a task and completing with Google and documentation but I've hit a stupid wall that makes me feel very dumb.
I mean normally I wouldn't call this a faulty learning method, but

Ariza posted:

I completely forgot about if statements
:stare:

JawnV6
Jul 4, 2004

So hot ...
Statistics will be a lot more useful than diffeq.

JawnV6
Jul 4, 2004

So hot ...

TasteMyHouse posted:

yeah, I could parse the ascii in a snap. that's not what's been assigned to me though. I've been assigned specifically to read the binary format, for performance reasons apparently.

My boss is talking to someone who he thinks might be able to get me the spec. meanwhile, I'm just trying to "crack" it by comparing ascii / binary output.

So how many weeks of your time are you planning on burning before even checking the performance of regexes over the ASCII?

Neslepaks, I was working on something similar recently for a card game. I had (28 choose 7) = 1,184,040 possible hands and wanted to analyze them. I wanted a way to easily associate a particular hand with a simple integer. So I spent a little time and wrote a brute-force algorithm that would step through every possibility. I spent a few days thinking trying to figure out complicated ways to slice up the million possibilities so that I could make going from a number to a hand really really fast. The best idea I had was something like a jump table: if you just check index 0 you can slice the million up into chunk of at most 296,000, then checking the other indexes for alignment (i.e. entry 1 is the next card after entry 0) slices it up even further. If you're dealing with a huge number of entries that might help.

Then I went and actually checked performance. Turns out the brute force takes under 1/5th of a second if it's just stepping through the indexes, not printing out hands, etc.. So the 'quick' way of doing it is "run the brute force n times then stop". And all that thinking was just wasted time.

JawnV6
Jul 4, 2004

So hot ...
Make a google docs spreadsheet with a formula in it. Have it so other students just put their grades into labeled cells.

JawnV6
Jul 4, 2004

So hot ...
gcc's being lazy about the stack. It's compiling a function and after the args are consumed it just slaps new data into [esp] and [esp+4] before calling another function. Are there any flags that will force it to play nice and actually push the args?

I'm doing some inline assembly that's getting mixed up and couldn't find any way to express "hey this is pushing something on the stack don't touch it". Right now I'm just modifying the machine code once it's done but that's annoying and I'd rather avoid it entirely.

JawnV6
Jul 4, 2004

So hot ...

ToxicFrog posted:

That might be sibling call optimization (which permits it to reuse the stack frame when tail-calling a function with the same stack space requirements as the caller); perhaps -fno-optimize-sibling-calls?
No change with this option.

ToxicFrog posted:

If that doesn't work, I have no suggestions, but you can always try turning down the optimization settings until the problem goes away, then looking at the list of specific optimizations enabled by the setting you just turned down and doing a binary search on them to find the specific one causing the problem.
I haven't specified a -O arg, and -O0 didn't change the output.

The particular call is in a #define but that shouldn't change calling conventions, right? Something like this
code:
#define macroLabel( pointer ) \
if( ( pointer )->property ) \
{ \
functionCall(pointer, arg2) \
}

JawnV6
Jul 4, 2004

So hot ...

rjmccall posted:

The convention is that the argument space is scratch space and can be used by the callee for whatever the hell it wants. If you're relying on the arguments not being touched, you are relying on promises not given.

Great, thanks for a load of theory bullshit that doesn't affect my problem or offer insight towards an answer. Got any thoughts on RISC/CISC while we're out here in the theoretical wastelands?

JawnV6
Jul 4, 2004

So hot ...

Internet Janitor posted:

"I have a square peg and I am inserting it in a round hole. Why is this a pain in the rear end?"
"Round holes are designed for round pegs. A square peg won't work in a general case."
"gently caress you. I HAVE a square peg and I'm putting it in the motherfucker. I don't care about theory."
I'm sorry, I thought my original question was quite clear in what I was trying to do. I'm having difficulty with how gcc is carving out the stack for a call. That really has nothing to do with caller/callee argument conventions and bringing up a basic topic in such a patronizing manner wasn't the least bit helpful.

More to the point, I'm certain you don't have a firm enough grasp on the discussion to be throwing stones.

ShoulderDaemon posted:

As far as I know, GCC never generates PUSH/POP instructions to deal with the stack on x86 architectures. It always prefers to track local values by their absolute offset from the stack frame, because that happened to be a much easier way to write the compiler and it's allowed by the calling convention. I admit it's been a while since I poked around inside GCC, but I think you're just out of luck if you want it to behave that way.

Maybe try putting your assembly stuff in an inline function?
It does generate some push/pop instructions, as well as leave without doing an enter prior with stack manipulation (:argh:). But all that is around the function beginning and ending. You're right, it's just subtracting out the biggest space it will need and using absolute references.

rjmccall posted:

I did miss that this is inline assembly instead of a separate function coded in assembly, so I will graciously cut you some slack. GCC does not make any promises at all about the layout of the frame, including the arguments area. You need to use inline assembly constraints to tell GCC that you're using values from the local context.
I'm familiar with inline constraints to use local variables. But I'm not using anything local in the asm. I just need space on the stack to remain untouched, which it doesn't look to support.

pseudorandom name posted:

Your problem is that you're modifying the stack pointer in your inline assembly which (to my knowledge) gcc doesn't allow.
I'm pretty sure I can work around it regardless of that limitation. It just seemed like there'd be a way to get it to emit the stack-based instructions for function calls as well.

JawnV6
Jul 4, 2004

So hot ...

pseudorandom name posted:

Why? It's the most efficient way to allocate a stack frame.
Yeah, the compiler I wrote did it that way as well. But since then I've been doing a shitload of pure asm coding so I wasn't as quick to pick up on that as I should've been.

pseudorandom name posted:

And there's nothing wrong with using LEAVE without ENTER, I'm not sure why you'd be upset about that either.
The stack frame is manually adjusted, then leave; ret. It's weird to see one and not the other. Checking back in the PRM it's not explicitly warned against as was my impression, but it still doesn't look right.

rjmccall posted:

You've gotten patronized because you're communicating poorly and then being rude to people who misunderstand you, and because your approach reeks of a very rookie mistake with inline asm, which is to try to work around the compiler's code emission patterns instead of adequately expressing your intent to the compiler.
My problem's solved for the moment and this work is more in the proof-of-concept space so the relative cleanliness doesn't matter. This is a third party code base that works perfectly well on ARM and x86 with another compiler but I'll be sure to pass your 'rookie' assessment onto the authors. There is literally no way to adequately express this intent to gcc, but please keep telling me to do that.

I'm seriously not too concerned with your understanding of my problem. You haven't once given my posts an honest reading and now that I've got a working solution my drive to hold your hand through it is gone. Terribly sorry.

JawnV6
Jul 4, 2004

So hot ...

pseudorandom name posted:

You haven't once explained your problem.
I apparently explained it with enough fidelity for others to understand, discuss, and answer. That's what really gets me about this chorus of "you didn't explain it," did y'all just completely miss the discussion I had? I've even got a solution (non binary patching) that's working and doesn't interfere with gcc's stack frame.

pseudorandom name posted:

You have been a rude rear end in a top hat on multiple occasions, though.
:confused: And? That hasn't stopped people from going over my posts with a microscope to reconstruct some fictional version of what I'm working on. Maybe in the future don't reward 'rear end in a top hat' behavior with increased fervor for answering questions??

rjmccall posted:

ETA: this is one of those situations where a little pseudocode goes a long way.
It's also a situation where I have enough background knowledge that specific, targeted questions were enough so that I didn't have to go through the legwork of making toy problems for arrogant jerks who are, for reasons beyond their control, compelled to grok every question in this thread to the finest detail.

JawnV6
Jul 4, 2004

So hot ...

pseudorandom name posted:

You explained your immediate difficulty with the fragile and broken method you use to solve your problem, but have never explained what your actual problem is.

"How do I get gcc to stop being gcc?" isn't your actual problem, your actual problem is whatever your final goal is that your broken inline assembly hack is trying to accomplish.

I'm fine with the "broken inline assembly hack" and really don't care to explain the details. I can assure you, I've read the wiki page for the XY Problem and for this specific instance there's really no other way. If not, I'll just live with the shame of adding 6 asm instructions to a flow that used to be 3 that leave the stack untouched for gcc. I just thought there'd be an easier way.

JawnV6 fucked around with this message at 22:21 on Jan 22, 2012

JawnV6
Jul 4, 2004

So hot ...

Bob Morales posted:

Are there any websites that talk about writing video games for the old 80's and early 90's arcade machines? I realize most of the stuff has been long since thrown away or under NDA, but I love to read some write-ups on their tools chains and methods they used to get the art and code from whatever computers they used, to the arcade boards themselves.

This isn't exactly what you're talking about, but I think you'd find it interesting.

JawnV6
Jul 4, 2004

So hot ...

Sab669 posted:

That's what he showed us, and it really confuses me. Why isn't y 1.0 when we print it? Because of the if statement, I feel like it should be. I also don't understand how d is ever even being incremented. I guess simply just having a recursive method call in the return statement is what's really killing me.

You're going to be calling the power() function multiple times. It seems like you're only focusing on the final instance it's called. You should write out explicitly what the code does, starting from the function call power(5.6,2).

That will return d * power(d, i-1) = 5.6 * power(5.6,2-1) = 5.6 * power(5.6,1)

JawnV6
Jul 4, 2004

So hot ...
I remember my big stumbling block to understanding recursion was the big O cost of addition.

Adbot
ADBOT LOVES YOU

JawnV6
Jul 4, 2004

So hot ...

Strong Sauce posted:

is its taking the first 2^16 (0xffff) bits
It's taking the first 16 bits. Not 2^16 or 65,536.

Strong Sauce posted:

0xF (16)
& 00001111 (the number 16)
0xf = 0b1111 = 15

Not 16.

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply