Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Locked thread
Tan Dumplord
Mar 9, 2005

by FactsAreUseless
I've been struggling to find a way to debug ARM assembly, as the Patchblocks hardware has no debugging interface. I was pricing out simulators and other debugging tools. Then I looked beside my monitor and saw the boxed, unused Raspberry Pi 2. :cripes:

Adbot
ADBOT LOVES YOU

Tan Dumplord
Mar 9, 2005

by FactsAreUseless
Stupid question time!

When do these two statements (in C) produce different results?

(a * b) / 1024
(a * b) >> 10

Where a is a signed 8 bit integer and b is a signed 32-bit integer.

I have a piece of code that uses /1024, and I thought I'd optimize it by switching to >>10, but it produces different results!

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip
It's implementation-defined whether right shift of a signed int is an arithmetic shift or a logical shift. Are these values sometimes negative by chance?

As an aside, strength reduction is a really easy optimization for the compiler to make. Consider inspecting the generated assembly and verifying it isn't happening before doing it yourself.

Tan Dumplord
Mar 9, 2005

by FactsAreUseless
Yeah, a uses the full range of signed 8-bit values.

I always assumed that right shifts were arithmetic. I've written a whole bunch of code for this platform under that assumption. It's GCC. According to a mailing list post, GCC uses ASR when the value is signed.

As to strength reduction, I'm afraid I don't know x86 ASM. This code targets x86 for the emulator, but then targets ARMv7M for the hardware.

Here's /1024:
code:
	leal	1023(%eax), %edx
	testl	%eax, %eax
	cmovs	%edx, %eax
	sarl	$10, %eax
And here's >>10:
code:
	sarl	$10, %eax

Pteretis
Nov 4, 2011

An arithmetic right shift is still not equivalent to division. The issue arises when you round a negative number.

Tan Dumplord
Mar 9, 2005

by FactsAreUseless
Well poo poo, 10 seconds in Python cleared that right up for me. I really hadn't considered what happens when you right shift a 2's complement number. Shifting all your zeroes away leaves you at -1!

Goddamn. This really fucks up my understanding of fixed-point arithmetic.

Does this mean that -0.5 (Q10) >> 10 == -1? Fuuuuuuuuck.....

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

sliderule posted:

Yeah, a uses the full range of signed 8-bit values.

I always assumed that right shifts were arithmetic. I've written a whole bunch of code for this platform under that assumption. It's GCC. According to a mailing list post, GCC uses ASR when the value is signed.

As to strength reduction, I'm afraid I don't know x86 ASM. This code targets x86 for the emulator, but then targets ARMv7M for the hardware.

Here's /1024:
code:
	leal	1023(%eax), %edx
	testl	%eax, %eax
	cmovs	%edx, %eax
	sarl	$10, %eax
And here's >>10:
code:
	sarl	$10, %eax

As you probably suspected, it's performing strength reduction and shifting right by ten in both cases, but in the / 1024 case it's correcting the rounding error that occurs when you do division via an arithmetic right shift on a negative, whereas in the >> 10 case it isn't bothering to because you told it to shift rather than divide.

Specifically, the in the / 1024 case it's doing basically

code:
if (n < 0) {
    n += 1023;
}
n >>= 10;

Tan Dumplord
Mar 9, 2005

by FactsAreUseless
Well, I've learned something today! Thanks!

Now, to find all the places where I assumed that right shifting all the fractional bits away yielded the integer component.

Rescue Toaster
Mar 13, 2003
Anyone know any good resources for patterns and/or data structures in embedded C? I've learned some key things at work and school, but I feel like there must be so much more out there. Especially anything that focuses on situations where you don't have dynamic allocation.

Things like you can do a binary tree inside an array where the child of node n are nodes (n*2) and (n*2)+1, as long as you keep it balanced and establish a max depth. Queues are generally simple with a max array size and head & tail pointers. Those sorts of things.

And for patterns, like 80% of the interesting 'patterns' I've found for C basically devolve into 're-implement dynamic dispatch yourself and then you can make a visitor!' which should just say 'use C++ idiot'.

Jerry Bindle
May 16, 2003
Its really hard to generalize anything about embedded stuff, that goes for programming languages as well as hw. I recommend looking at a few RTOSes targeted to a platform that you're interested in to see how they structure the application framework. There's a book called "Practical UML Statecharts in C/C++" that walks you through the design of a framework that allows you to, well, design (or use) a framework that will execute UML statemachines. It is somewhat a manual for the author's proprietary software. But he does a good job of explaining all the theory/concepts behind choices he's made. After reading it you could put it down and write your own OS/Statemachine framework.

Pan Et Circenses
Nov 2, 2009

Rescue Toaster posted:

Things like you can do a binary tree inside an array where the child of node n are nodes (n*2) and (n*2)+1, as long as you keep it balanced and establish a max depth. Queues are generally simple with a max array size and head & tail pointers. Those sorts of things.

Here's one that maybe everybody knows, but I use it somewhere in basically every embedded system I write. With queues like that, make their size a power of two, then you can wrap them with no branching and just a bitwise and:

head = head & (size - 1)

...equivalent to:

if(head >= size) head = 0;

Especially fast if size-1 is a compile time constant.

Sweevo
Nov 8, 2007

i sometimes throw cables away

i mean straight into the bin without spending 10+ years in the box of might-come-in-handy-someday first

im a fucking monster

A cool trick with x86 assembly lets you do arbitrary-sized queues without branches. Assume BX is the current queue position, and CX is the queue size. It works on other architectures too as long as they have a Subtract With Carry instruction.

code:
inc  bx       ; next position
cmp  cx,bx    ; size - next
sbb  ax,ax    ; if carry set, ax = ffff, else ax = 0
and  bx,ax    ; bx = either next+1 or 0
It's not so useful on modern systems as branches are faster than they used to be. But can be fast on architectures with deep pipelines where branching is often slow.

Sweevo fucked around with this message at 11:15 on Jun 8, 2015

taqueso
Mar 8, 2004


:911:
:wookie: :thermidor: :wookie:
:dehumanize:

:pirate::hf::tinfoil:

I want to store a timestamp along with some data. What is best practice for the format? I'd prefer to keep it compact, but I'd also like to avoid rollover issues. Is 64-bit seconds or milliseconds past some epoch the way to go?

Colonel Taint
Mar 14, 2004


taqueso posted:

I want to store a timestamp along with some data. What is best practice for the format? I'd prefer to keep it compact, but I'd also like to avoid rollover issues. Is 64-bit seconds or milliseconds past some epoch the way to go?

Best practice is to use something that works well for your system, I would think. What kind of data are you tagging with the timestamp? How is it being delivered to the end point? How crucial is it, really, to save a few bits. I'd tend to err on the conservative side in general, but there may be a few cases where it may not really be worth sweating or it would be better to send full timestamps with every packet. Lossy/unreliable networks (udp) would imply that you need to send full information as much as possible, but if the delivery network is considered reliable (eg end-to-end serial), you could send an initial state packet with an absolute 64-bit timestamp, and save some bits by sending offsets. If you suspect the stream may be interrupted, a packet containing a full timestamp and the offset could be sent every n seconds as well.

Most 64 bit time structures I've worked with split the 64 bits into two words, with 32 bits of seconds and 32 bits of microseconds or nanoseconds. Signed 32-bit seconds-since-epoch rolls over in 2038, but you could potentially cut the bits another way depending on the precision you need. So if you only need millisecond precision you can do a 54/10 split to get the most of your seconds word.

yippee cahier
Mar 28, 2005

taqueso posted:

I want to store a timestamp along with some data. What is best practice for the format? I'd prefer to keep it compact, but I'd also like to avoid rollover issues. Is 64-bit seconds or milliseconds past some epoch the way to go?
Depends on the application. Do you need precision or space? In my current project we are using UNIX timestamps paired with a millisecond subsecond component. I've also seen 64-bit 100-ns counts since 1601-01-01, i.e. Windows FILETIMEs, which offered a more than enough precision for the application and plenty of time remaining in a cycle until rollover.

I'd just pick something relatively standard, but if circumstances require your own format then I would write conversion routines to a standard format.

taqueso
Mar 8, 2004


:911:
:wookie: :thermidor: :wookie:
:dehumanize:

:pirate::hf::tinfoil:

Thanks for the input.

The data is just position sensor readings that are being logged to flash. I have no expectation that this project will survive till 2038, but I'd still like to do the right thing in case it does. The data is for internal use only - I have control of the software that is responsible for interpreting the timestamps. If it was to be exported for some external tool, it could be transformed into a different format.

Currently, I am planning on writing a 256-byte flash page at a time, and each page will include a timestamp. I'd like each flash page to be usable without any of the other pages available.

The time/day info will be gathered from a bluetooth-connected cellphone, then tracked by internal RTC. I don't expect/need the timestamps to be super-accurate compared to walltime, but it would be nice to have millisecond resolution to be ensure that the pages can be ordered by time.

64-bit milliseconds since <epoch> seems to be pretty straightforward, unless I have to deal with it on a system that doesn't have support for 64-bit ints.

carticket
Jun 28, 2005

white and gold.

If you can handle it, my preference is text YYYYMMDDHHMMSS.xxxxx

Its not numerically comparable, but it is string comparable, and human readable. You can get to 10ms resolution in 64 bits if you store it BCD, and at that point it would be numerically comparable, and human readable in a hex editor.

Fusion Restaurant
May 20, 2015
What's a solid open source version of Arduino Uno boards? If it's possible to get a cheaper one made by someone else, I'm planning on sticking it into a robot so I'd rather permanently lose a cheaper chip than a more expensive one.

Star War Sex Parrot
Oct 2, 2003

Unless I'm misunderstanding you, just grab an AVR ATMega32 and a small breadboard off of Digikey.

Arcsech
Aug 5, 2008

Fusion Restaurant posted:

What's a solid open source version of Arduino Uno boards? If it's possible to get a cheaper one made by someone else, I'm planning on sticking it into a robot so I'd rather permanently lose a cheaper chip than a more expensive one.

If you're willing to order from and wait for shipping from :china: this clone from Fasttech is probably about the cheapest you'll get at $8. Might be able to get cheaper on eBay, but Fasttech is fairly reliable and you don't have to deal with eBay.

If you'd rather order from a US company, Sparkfun sells their Uno-compatible RedBoard for $20.

Tiger.Bomb
Jan 22, 2012
Does it have to be an UNO? The pro minis are cheap. I got mine for $3 a pop during a sale.

Skunkduster
Jul 15, 2005




I'm very new to microcontrollers and coding. I just got a launchpad and am going through the basics of blinking an LED on and off with this code:

code:
// most launchpads have a red LED
#define LED RED_LED

//see pins_energia.h for more LED definitions
//#define LED GREEN_LED
  
// the setup routine runs once when you press reset:
void setup() {                
  // initialize the digital pin as an output.
  pinMode(LED, OUTPUT);     
}

// the loop routine runs over and over again forever:
void loop() {
  digitalWrite(LED, HIGH);   // turn the LED on (HIGH is the voltage level)
  delay(1000);               // wait for a second
  digitalWrite(LED, LOW);    // turn the LED off by making the voltage LOW
  delay(1000);               // wait for a second
}
What is the most efficient way to modify the code to blink both LEDs? Is there some way to #define LED as RED_LED and GREEN_LED?

edit: also, if the delays were removed from the loop, what frequency would the LED blink at?

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

SkunkDuster posted:

I'm very new to microcontrollers and coding. I just got a launchpad and am going through the basics of blinking an LED on and off with this code:

code:
// most launchpads have a red LED
#define LED RED_LED

//see pins_energia.h for more LED definitions
//#define LED GREEN_LED
  
// the setup routine runs once when you press reset:
void setup() {                
  // initialize the digital pin as an output.
  pinMode(LED, OUTPUT);     
}

// the loop routine runs over and over again forever:
void loop() {
  digitalWrite(LED, HIGH);   // turn the LED on (HIGH is the voltage level)
  delay(1000);               // wait for a second
  digitalWrite(LED, LOW);    // turn the LED off by making the voltage LOW
  delay(1000);               // wait for a second
}
What is the most efficient way to modify the code to blink both LEDs? Is there some way to #define LED as RED_LED and GREEN_LED?

edit: also, if the delays were removed from the loop, what frequency would the LED blink at?

#define just uses the C preprocessor to cut-and-paste definitions in your source code; RED_LED and GREEN_LED don't become undefined or anything after LED is #define'd to one of them. digitalWrite(RED_LED, HIGH) and the same thing but with GREEN_LED should work fine assuming both RED_LED and GREEN_LED are defined in whatever header you're including to the proper values for the pins you have those LEDs attached to.

With regard to taking the delays out, the timing then depends on how fast you're running the micro and what the compiler is doing to that loop. If the delay-less loop is getting compiled down to three instructions (a write to an SFR controlling the pin you have the LED hooked up to to drive it high, another write to drive it low, and an unconditional jump to the start of the loop) and you're running at 8 MHz, the LED would blink at a frequency of 2.67 MHz with a duty cycle of 33%; you should know what frequency you're actually running at and you can take the delays out and inspect the generated asm to determine what the actual timing is in that case.

Skunkduster
Jul 15, 2005




Blotto Skorzany posted:

#define just uses the C preprocessor to cut-and-paste definitions in your source code; RED_LED and GREEN_LED don't become undefined or anything after LED is #define'd to one of them. digitalWrite(RED_LED, HIGH) and the same thing but with GREEN_LED should work fine assuming both RED_LED and GREEN_LED are defined in whatever header you're including to the proper values for the pins you have those LEDs attached to.

With regard to taking the delays out, the timing then depends on how fast you're running the micro and what the compiler is doing to that loop. If the delay-less loop is getting compiled down to three instructions (a write to an SFR controlling the pin you have the LED hooked up to to drive it high, another write to drive it low, and an unconditional jump to the start of the loop) and you're running at 8 MHz, the LED would blink at a frequency of 2.67 MHz with a duty cycle of 33%; you should know what frequency you're actually running at and you can take the delays out and inspect the generated asm to determine what the actual timing is in that case.

I believe I understand what you mean (I am VERY new to this) and modified it with #define RLED 2 and #define GLED 14 and added corresponding pinMode and digitalWrite lines and it did work. I was mainly wondering if I could #define LED to pins 2 and 14 so when set LED to high, it turns them both pins high with one statement.

Thanks for the description of the frequency. A couple of the acronyms are lost on me, but that's good enough to get a general idea.

ante
Apr 9, 2005

SUNSHINE AND RAINBOWS
The VHDL thread is dead, I guess, so this is the next best thing.


I have an Altium license through work. Would it be worthwhile to try FPGA development using that instead of ISE?

Xilinx's tools are stupidly clunky and aren't really well integrated, so it would be nice if Altium sidestepped all those issues.

Aurium
Oct 10, 2010

SkunkDuster posted:

I believe I understand what you mean (I am VERY new to this) and modified it with #define RLED 2 and #define GLED 14 and added corresponding pinMode and digitalWrite lines and it did work. I was mainly wondering if I could #define LED to pins 2 and 14 so when set LED to high, it turns them both pins high with one statement.

Thanks for the description of the frequency. A couple of the acronyms are lost on me, but that's good enough to get a general idea.


Summary first:

Use digitalWrite(). Turn one pin on at a time. If you want the illusion turning them on with the same command, write a function that has 2 digitalWrite calls in it. What you're asking for can be done with port manipulation, but it will expose you to some technical details that will not really be helpful starting out, but will make your life harder. Port manipulation is more efficient/faster (often by a large factor) but faster only matters if the way you are currently doing something is actually too slow.


First off, which launchpad are you using? If you're using one of the older socketed ones, they run at 16mhz. The FRAM MSP-EXP430FR5969 also runs at 16Mhz. The MSP-EXP430F5529LP runs at 25Mhz. The MSP432 kit is 48mhz I think. Without knowing the output assembly, and how many cycles the individual instructions take, I can't tell you how fast your led blinks.

So now that THAT has added some confusion. If you are using the MSP430G2 launchpad, the following will turn on the in built RED_LED (on pin 1.0) and the inbuilt GREEN_LED (pin 1.6) in one command, provided they are already configured properly.

P1OUT |= 0b00100001;

To configure those pins for output, you would use

P1DIR |= 0b00100001;

Or the PinMode() function.

For the 5969 and the 5529, it is not possible for one command to turn them both on. The LEDs on those boards are on different ports. For the 432 it is not possible for the single red LED and the RGB led to be controlled by the same command, but any of the sub leds inside the RGB led CAN be controlled by a single command.

There's also a general electronics thread, and an arduino thread which could also help more with energia.

ante posted:

The VHDL thread is dead, I guess, so this is the next best thing.


I have an Altium license through work. Would it be worthwhile to try FPGA development using that instead of ISE?

Xilinx's tools are stupidly clunky and aren't really well integrated, so it would be nice if Altium sidestepped all those issues.
I'd suggest posting in it anyway, you never know who has a thread bookmarked.

Personally, I have no idea, having never worked with any FPGAs.

taqueso
Mar 8, 2004


:911:
:wookie: :thermidor: :wookie:
:dehumanize:

:pirate::hf::tinfoil:

ante posted:

The VHDL thread is dead, I guess, so this is the next best thing.


I have an Altium license through work. Would it be worthwhile to try FPGA development using that instead of ISE?

Xilinx's tools are stupidly clunky and aren't really well integrated, so it would be nice if Altium sidestepped all those issues.

I played around with it (a couple years ago) and it seemed to work well, if in a bit of a complicated way. I went back to the comfortable vendor toolchain for expediency and never bothered to try again. I think Altium can do what you want (sidestepping xilinx's toolchain) and you will be writing mostly the same HDL so you can bail pretty easily if it doesn't work out. I didn't care for the visual plug-together stuff, but that might have improved.

Ponderous Saxon
Jan 5, 2010
Fallen Rib

ante posted:

The VHDL thread is dead, I guess, so this is the next best thing.


I have an Altium license through work. Would it be worthwhile to try FPGA development using that instead of ISE?

Xilinx's tools are stupidly clunky and aren't really well integrated, so it would be nice if Altium sidestepped all those issues.

What kind of integration are you after?

If you haven't already, it might be worthwhile checking out Vivado - they're still randomly breaking things between releases, but it's faster and the workflow much saner. If you don't like the IDE you avoid it entirely by using tcl scripting.

Ponderous Saxon fucked around with this message at 10:48 on Jul 12, 2015

Skunkduster
Jul 15, 2005




Aurium posted:

write a function that has 2 digitalWrite calls in it.

Perfect! That is exactly what I was looking for. As for the speed, I was more curious about how the speed of the blink was determined and not so much with what the actual speed is and you've answered that very well. Thanks!

ante
Apr 9, 2005

SUNSHINE AND RAINBOWS

Ponderous Saxon posted:

What kind of integration are you after?

If you haven't already, it might be worthwhile checking out Vivado - they're still randomly breaking things between releases, but it's faster and the workflow much saner. If you don't like the IDE you avoid it entirely by using tcl scripting.


taqueso posted:

I played around with it (a couple years ago) and it seemed to work well, if in a bit of a complicated way. I went back to the comfortable vendor toolchain for expediency and never bothered to try again. I think Altium can do what you want (sidestepping xilinx's toolchain) and you will be writing mostly the same HDL so you can bail pretty easily if it doesn't work out. I didn't care for the visual plug-together stuff, but that might have improved.

Honestly, I've been working on something stupidly complicated off and on for the past year, so I'm not even past the design/simulation stage.

I'll eventually be putting it on some higher end Spartan 6s, and my impression is that Vivado is the latest and greatest, but drops all support for non-enterprise chips like Spartans? I dunno, marketing doublespeak seems really prevalent in the FPGA world, so I had a hard time understanding exactly what the deal is.

Never thought of downloading Vivado just to use the tools and then back to ISE for layout and programming, but I guess it could be worth a shot.

Delta-Wye
Sep 29, 2005
Vivado doesn't support the spartan 6 family because gently caress you, that's why.

Anything seems like it would be an improvement over the vendor applied tools when it comes to xilinx and altera, but I'm not sure if I'd go running to altium of an companies expecting an improvement.

Spatial
Nov 15, 2007

When you're dealing with 16-bit fixed point image data, what is the correct way to do a multiply?

You can't just do:
code:
out = (x * y) >> 32
because in this number system 0xFFFF equals 1.0, not 0.9999.

What I do at the moment is:
code:
a = x;
b = (y==0) ? 0 : y+2;
out = (a * b) >> 32
This seems to work, but I'm worried I'm missing something obvious here.

sarehu
Apr 20, 2007

(call/cc call/cc)

Spatial posted:

When you're dealing with 16-bit fixed point image data, what is the correct way to do a multiply?

You can't just do:
code:
out = (x * y) >> 32
because in this number system 0xFFFF equals 1.0, not 0.9999.

You mean >> 16, right? Because x and y are values in [0, 65535], multiplying them produces a 32-bit result that needs to get squashed back down to [0, 65535].

Spatial posted:

What I do at the moment is:
code:
a = x;
b = (y==0) ? 0 : y+2;
out = (a * b) >> 32

Doesn't it produce wrong values? If y is 1 and x is 0xFFFF, you get (0xFFFF * 3) >> 32 which is 0x2FFFD >> 32 which is 2. You should get 1.

Let's let c = 0xFFFF. Then the "true" color values that range in [0.0, 1.0] are the values x/c, and y/c. (x/c) * (y/c) = (x*y)/(c*c). You want z such that z/c = (x*y)/(c*c). Thus z = (x*y)/c.

So you should do

code:
out = (x * y) / 0xFFFF
only maybe you don't want that, exactly -- maybe you want it to round to the nearest integer, instead of rounding down. Adjust accordingly.

Note that dividing by the constant 0xFFFF does not actually result in a division instruction -- look at the disassembly to see.

Spatial
Nov 15, 2007

Yeah sorry I have >> 16 in the code. It's late. :downs:

Thanks for the explanation, it clicked when you expressed it in fractions of the max value. Ahhh, so obvious in retrospect... I know the multiply by reciprocal trick, at least. :)

sarehu
Apr 20, 2007

(call/cc call/cc)
Also, since it's image data, and keep in mind I don't know what I'm talking about:

Blah blah blah think about gamma correction.

Spatial
Nov 15, 2007

Don't worry, I'm well versed in the actual image processing. I'm just used to doing it in floating point.

carticket
Jun 28, 2005

white and gold.

sarehu posted:

Note that dividing by the constant 0xFFFF does not actually result in a division instruction -- look at the disassembly to see.

Wait. I'm tired, so maybe I'm missing something, but what will this produce? It's mathematically different from a shift, since >>16 is equivalent to /65536 and not /65535.

sarehu
Apr 20, 2007

(call/cc call/cc)
If we compile this:
code:
#include <stdio.h>

int main(void) {
  unsigned int x, y;
  scanf("%u %u\n", &x, &y);
  unsigned int z = (x * y) / 0xFFFF;
  printf("z = %u\n", z);
  return 0;
}
We get this:
code:
0000000000400480 <main>:
  400480:	48 83 ec 18          	sub    $0x18,%rsp
  400484:	bf 54 06 40 00       	mov    $0x400654,%edi
  400489:	31 c0                	xor    %eax,%eax
  40048b:	48 8d 54 24 0c       	lea    0xc(%rsp),%rdx
  400490:	48 8d 74 24 08       	lea    0x8(%rsp),%rsi
  400495:	e8 d6 ff ff ff       	callq  400470 <__isoc99_scanf@plt>
  40049a:	8b 44 24 0c          	mov    0xc(%rsp),%eax
  40049e:	bf 5b 06 40 00       	mov    $0x40065b,%edi
  4004a3:	0f af 44 24 08       	imul   0x8(%rsp),%eax
  4004a8:	48 89 c1             	mov    %rax,%rcx
  4004ab:	48 89 c2             	mov    %rax,%rdx
  4004ae:	48 c1 e1 0f          	shl    $0xf,%rcx
  4004b2:	48 c1 e2 1f          	shl    $0x1f,%rdx
  4004b6:	48 01 ca             	add    %rcx,%rdx
  4004b9:	48 8d 34 02          	lea    (%rdx,%rax,1),%rsi
  4004bd:	31 c0                	xor    %eax,%eax
  4004bf:	48 c1 ee 2f          	shr    $0x2f,%rsi
  4004c3:	e8 78 ff ff ff       	callq  400440 <printf@plt>
  4004c8:	31 c0                	xor    %eax,%eax
  4004ca:	48 83 c4 18          	add    $0x18,%rsp
  4004ce:	c3                   	retq   
Edit: Which puts (p * (1 + 2^15 + 2^31)) / (2^47) in rsi, where p is the 32-bit unsigned product of x and y, extended to 64 bits, with the intervening computations being done using 64 bit registers, with truncating division.

Note that 2^47 / (1 + 2^15 + 2^31) = 65534.9999847.

sarehu fucked around with this message at 02:39 on Jul 22, 2015

sarehu
Apr 20, 2007

(call/cc call/cc)
The 32-bit version is:

code:
08048350 <main>:
 8048350:	8d 4c 24 04          	lea    0x4(%esp),%ecx
 8048354:	83 e4 f0             	and    $0xfffffff0,%esp
 8048357:	ff 71 fc             	pushl  -0x4(%ecx)
 804835a:	55                   	push   %ebp
 804835b:	89 e5                	mov    %esp,%ebp
 804835d:	51                   	push   %ecx
 804835e:	8d 45 f4             	lea    -0xc(%ebp),%eax
 8048361:	83 ec 18             	sub    $0x18,%esp
 8048364:	50                   	push   %eax
 8048365:	8d 45 f0             	lea    -0x10(%ebp),%eax
 8048368:	50                   	push   %eax
 8048369:	68 30 85 04 08       	push   $0x8048530
 804836e:	e8 cd ff ff ff       	call   8048340 <__isoc99_scanf@plt>
 8048373:	58                   	pop    %eax
 8048374:	5a                   	pop    %edx
 8048375:	8b 55 f4             	mov    -0xc(%ebp),%edx
 8048378:	b9 01 80 00 80       	mov    $0x80008001,%ecx
 804837d:	0f af 55 f0          	imul   -0x10(%ebp),%edx
 8048381:	89 d0                	mov    %edx,%eax
 8048383:	f7 e1                	mul    %ecx
 8048385:	c1 ea 0f             	shr    $0xf,%edx
 8048388:	52                   	push   %edx
 8048389:	68 37 85 04 08       	push   $0x8048537
 804838e:	e8 7d ff ff ff       	call   8048310 <printf@plt>
 8048393:	8b 4d fc             	mov    -0x4(%ebp),%ecx
 8048396:	83 c4 10             	add    $0x10,%esp
 8048399:	31 c0                	xor    %eax,%eax
 804839b:	c9                   	leave  
 804839c:	8d 61 fc             	lea    -0x4(%ecx),%esp
 804839f:	c3                   	ret    
It computes the same thing (the mul %ecx instruction leaves the 64-bit result of ecx*eax in edx:ecx, and right-shifting edx by 15 is the same as dropping 47 bits off a 64-bit result register).

(Never mind that this is an x86 CPU example in the "embedded programming" thread.)

Edit: If you have no multiply-with-64-bit-result function, and if emulating 64-bit multiply isn't a better option, you could do something akin to this:

code:
uint32_t divFFFF(uint32_t p) {
  uint32_t x = (p & 0xFFFF) + (p >> 16);
  p = x - p;
  p += (p << 16);
  p += ((x + 1) >> 16) + ((x + 2) >> 17);
  return p;
}
Though in this case, where you know that p != 0xFFFFFFFFu, you can drop the "+ ((x + 2) >> 17)".

Edit: If you want to know how the heck that works: Since 0x10000 = 1 (mod 0xFFFF), it follows that if p = q*0x10000 + r and x = q + r, then x = p (mod 0xFFFF). This is the same as adding up the digits (in base 10) to find if something's divisible by 9. Then we know p - x is a multiple of 0xFFFF. Thus, to compute (p - x)/0xFFFF, we can multiply (p - x) by the multiplicative inverse of 0xFFFF (the multiplicative inverse modulo 2^32), which is 0xFFFEFFFF, i.e -1 - (1 << 16). The product (p - x) * (-1 - (1 << 16)) is equal to (x - p) * (1 + (1 << 16)). That gives us (p-x)/0xFFFF, exactly, but we want to compute floor(p/0xFFFF). We have that either x < 0xFFFF, or 0xFFFF <= x < 2 * 0xFFFF, or x == 2 * 0xFFFF. So we adjust the result by 0, 1, or 2, accordingly.

Edit: ARM has umull which gives a hi:lo result, too.

sarehu fucked around with this message at 11:49 on Jul 22, 2015

Adbot
ADBOT LOVES YOU

Spatial
Nov 15, 2007

Now I have a division question :)

I need to get the modulo of two 48-bit values. I have a 32-bit CPU with no divider. However I do have a fast hardware division peripheral which does 64-by-32, giving me a 64-bit quotient and 32-bit remainder. What kind of approach should I use to take advantage of that?

I tried a few things but I had no luck so far.

  • Locked thread