64K is still quite a lot! Embedded Programming Microthread

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > 64K is still quite a lot! Embedded Programming Microthread

«‹›53 »

Tan Dumplord: Mar 9, 2005; by FactsAreUseless

I've been struggling to find a way to debug ARM assembly, as the Patchblocks hardware has no debugging interface. I was pricing out simulators and other debugging tools. Then I looked beside my monitor and saw the boxed, unused Raspberry Pi 2. :cripes:

# ? May 26, 2015 14:17

Adbot: ADBOT LOVES YOU

# ? May 13, 2024 02:52

Tan Dumplord: Mar 9, 2005; by FactsAreUseless

Stupid question time!

When do these two statements (in C) produce different results?

(a * b) / 1024
(a * b) >> 10

Where a is a signed 8 bit integer and b is a signed 32-bit integer.

I have a piece of code that uses /1024, and I thought I'd optimize it by switching to >>10, but it produces different results!

# ? Jun 5, 2015 17:15

Blotto Skorzany: Nov 7, 2008; He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

It's implementation-defined whether right shift of a signed int is an arithmetic shift or a logical shift. Are these values sometimes negative by chance?

As an aside, strength reduction is a really easy optimization for the compiler to make. Consider inspecting the generated assembly and verifying it isn't happening before doing it yourself.

# ? Jun 5, 2015 17:23

Tan Dumplord: Mar 9, 2005; by FactsAreUseless

Yeah, a uses the full range of signed 8-bit values.

I always assumed that right shifts were arithmetic. I've written a whole bunch of code for this platform under that assumption. It's GCC. According to a mailing list post, GCC uses ASR when the value is signed.

As to strength reduction, I'm afraid I don't know x86 ASM. This code targets x86 for the emulator, but then targets ARMv7M for the hardware.

Here's /1024:

code:

	leal	1023(%eax), %edx
	testl	%eax, %eax
	cmovs	%edx, %eax
	sarl	$10, %eax

And here's >>10:

code:

	sarl	$10, %eax

# ? Jun 5, 2015 18:05

Pteretis: Nov 4, 2011

An arithmetic right shift is still not equivalent to division. The issue arises when you round a negative number.

# ? Jun 5, 2015 18:21

Tan Dumplord: Mar 9, 2005; by FactsAreUseless

Well poo poo, 10 seconds in Python cleared that right up for me. I really hadn't considered what happens when you right shift a 2's complement number. Shifting all your zeroes away leaves you at -1!

Goddamn. This really fucks up my understanding of fixed-point arithmetic.

Does this mean that -0.5 (Q10) >> 10 == -1? Fuuuuuuuuck.....

# ? Jun 5, 2015 18:39

Blotto Skorzany: Nov 7, 2008; He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

sliderule posted:

Yeah, a uses the full range of signed 8-bit values.

I always assumed that right shifts were arithmetic. I've written a whole bunch of code for this platform under that assumption. It's GCC. According to a mailing list post, GCC uses ASR when the value is signed.

As to strength reduction, I'm afraid I don't know x86 ASM. This code targets x86 for the emulator, but then targets ARMv7M for the hardware.

Here's /1024:
code:
	leal	1023(%eax), %edx
	testl	%eax, %eax
	cmovs	%edx, %eax
	sarl	$10, %eax
And here's >>10:
code:
	sarl	$10, %eax

As you probably suspected, it's performing strength reduction and shifting right by ten in both cases, but in the / 1024 case it's correcting the rounding error that occurs when you do division via an arithmetic right shift on a negative, whereas in the >> 10 case it isn't bothering to because you told it to shift rather than divide.

Specifically, the in the / 1024 case it's doing basically

code:

if (n < 0) {
    n += 1023;
}
n >>= 10;

# ? Jun 5, 2015 18:44

Tan Dumplord: Mar 9, 2005; by FactsAreUseless

Well, I've learned something today! Thanks!

Now, to find all the places where I assumed that right shifting all the fractional bits away yielded the integer component.

# ? Jun 5, 2015 19:16

Rescue Toaster: Mar 13, 2003

Anyone know any good resources for patterns and/or data structures in embedded C? I've learned some key things at work and school, but I feel like there must be so much more out there. Especially anything that focuses on situations where you don't have dynamic allocation.

Things like you can do a binary tree inside an array where the child of node n are nodes (n*2) and (n*2)+1, as long as you keep it balanced and establish a max depth. Queues are generally simple with a max array size and head & tail pointers. Those sorts of things.

And for patterns, like 80% of the interesting 'patterns' I've found for C basically devolve into 're-implement dynamic dispatch yourself and then you can make a visitor!' which should just say 'use C++ idiot'.

# ? Jun 6, 2015 02:33

Jerry Bindle: May 16, 2003

Its really hard to generalize anything about embedded stuff, that goes for programming languages as well as hw. I recommend looking at a few RTOSes targeted to a platform that you're interested in to see how they structure the application framework. There's a book called "Practical UML Statecharts in C/C++" that walks you through the design of a framework that allows you to, well, design (or use) a framework that will execute UML statemachines. It is somewhat a manual for the author's proprietary software. But he does a good job of explaining all the theory/concepts behind choices he's made. After reading it you could put it down and write your own OS/Statemachine framework.

# ? Jun 6, 2015 03:10

Pan Et Circenses: Nov 2, 2009

Rescue Toaster posted:

Things like you can do a binary tree inside an array where the child of node n are nodes (n*2) and (n*2)+1, as long as you keep it balanced and establish a max depth. Queues are generally simple with a max array size and head & tail pointers. Those sorts of things.

Here's one that maybe everybody knows, but I use it somewhere in basically every embedded system I write. With queues like that, make their size a power of two, then you can wrap them with no branching and just a bitwise and:

head = head & (size - 1)

...equivalent to:

if(head >= size) head = 0;

Especially fast if size-1 is a compile time constant.

# ? Jun 7, 2015 19:31

Sweevo: Nov 8, 2007; i sometimes throw cables away

i mean straight into the bin without spending 10+ years in the box of might-come-in-handy-someday first

im a fucking monster

A cool trick with x86 assembly lets you do arbitrary-sized queues without branches. Assume BX is the current queue position, and CX is the queue size. It works on other architectures too as long as they have a Subtract With Carry instruction.

code:

inc  bx       ; next position
cmp  cx,bx    ; size - next
sbb  ax,ax    ; if carry set, ax = ffff, else ax = 0
and  bx,ax    ; bx = either next+1 or 0

It's not so useful on modern systems as branches are faster than they used to be. But can be fast on architectures with deep pipelines where branching is often slow.

Sweevo fucked around with this message at 11:15 on Jun 8, 2015

# ? Jun 8, 2015 10:29

taqueso: Mar 8, 2004

I want to store a timestamp along with some data. What is best practice for the format? I'd prefer to keep it compact, but I'd also like to avoid rollover issues. Is 64-bit seconds or milliseconds past some epoch the way to go?

# ? Jun 8, 2015 17:13

Colonel Taint: Mar 14, 2004

taqueso posted:

I want to store a timestamp along with some data. What is best practice for the format? I'd prefer to keep it compact, but I'd also like to avoid rollover issues. Is 64-bit seconds or milliseconds past some epoch the way to go?

Best practice is to use something that works well for your system, I would think. What kind of data are you tagging with the timestamp? How is it being delivered to the end point? How crucial is it, really, to save a few bits. I'd tend to err on the conservative side in general, but there may be a few cases where it may not really be worth sweating or it would be better to send full timestamps with every packet. Lossy/unreliable networks (udp) would imply that you need to send full information as much as possible, but if the delivery network is considered reliable (eg end-to-end serial), you could send an initial state packet with an absolute 64-bit timestamp, and save some bits by sending offsets. If you suspect the stream may be interrupted, a packet containing a full timestamp and the offset could be sent every n seconds as well.

Most 64 bit time structures I've worked with split the 64 bits into two words, with 32 bits of seconds and 32 bits of microseconds or nanoseconds. Signed 32-bit seconds-since-epoch rolls over in 2038, but you could potentially cut the bits another way depending on the precision you need. So if you only need millisecond precision you can do a 54/10 split to get the most of your seconds word.

# ? Jun 8, 2015 19:48

yippee cahier: Mar 28, 2005

taqueso posted:

I want to store a timestamp along with some data. What is best practice for the format? I'd prefer to keep it compact, but I'd also like to avoid rollover issues. Is 64-bit seconds or milliseconds past some epoch the way to go?

Depends on the application. Do you need precision or space? In my current project we are using UNIX timestamps paired with a millisecond subsecond component. I've also seen 64-bit 100-ns counts since 1601-01-01, i.e. Windows FILETIMEs, which offered a more than enough precision for the application and plenty of time remaining in a cycle until rollover.

I'd just pick something relatively standard, but if circumstances require your own format then I would write conversion routines to a standard format.

# ? Jun 8, 2015 19:48

taqueso: Mar 8, 2004

Thanks for the input.

The data is just position sensor readings that are being logged to flash. I have no expectation that this project will survive till 2038, but I'd still like to do the right thing in case it does. The data is for internal use only - I have control of the software that is responsible for interpreting the timestamps. If it was to be exported for some external tool, it could be transformed into a different format.

Currently, I am planning on writing a 256-byte flash page at a time, and each page will include a timestamp. I'd like each flash page to be usable without any of the other pages available.

The time/day info will be gathered from a bluetooth-connected cellphone, then tracked by internal RTC. I don't expect/need the timestamps to be super-accurate compared to walltime, but it would be nice to have millisecond resolution to be ensure that the pages can be ordered by time.

64-bit milliseconds since <epoch> seems to be pretty straightforward, unless I have to deal with it on a system that doesn't have support for 64-bit ints.

# ? Jun 8, 2015 20:09

carticket: Jun 28, 2005; white and gold.

If you can handle it, my preference is text YYYYMMDDHHMMSS.xxxxx

Its not numerically comparable, but it is string comparable, and human readable. You can get to 10ms resolution in 64 bits if you store it BCD, and at that point it would be numerically comparable, and human readable in a hex editor.

# ? Jun 12, 2015 19:58

Fusion Restaurant: May 20, 2015

What's a solid open source version of Arduino Uno boards? If it's possible to get a cheaper one made by someone else, I'm planning on sticking it into a robot so I'd rather permanently lose a cheaper chip than a more expensive one.

# ? Jun 24, 2015 20:36

Star War Sex Parrot: Oct 2, 2003

Unless I'm misunderstanding you, just grab an AVR ATMega32 and a small breadboard off of Digikey.

# ? Jun 24, 2015 20:55

Arcsech: Aug 5, 2008

Fusion Restaurant posted:

What's a solid open source version of Arduino Uno boards? If it's possible to get a cheaper one made by someone else, I'm planning on sticking it into a robot so I'd rather permanently lose a cheaper chip than a more expensive one.

If you're willing to order from and wait for shipping from :china:

this clone from Fasttech is probably about the cheapest you'll get at $8. Might be able to get cheaper on eBay, but Fasttech is fairly reliable and you don't have to deal with eBay.

If you'd rather order from a US company, Sparkfun sells their Uno-compatible RedBoard for $20.

# ? Jun 24, 2015 20:58

Tiger.Bomb: Jan 22, 2012

Does it have to be an UNO? The pro minis are cheap. I got mine for $3 a pop during a sale.

# ? Jun 25, 2015 07:32

Skunkduster: Jul 15, 2005

I'm very new to microcontrollers and coding. I just got a launchpad and am going through the basics of blinking an LED on and off with this code:

code:

// most launchpads have a red LED
#define LED RED_LED

//see pins_energia.h for more LED definitions
//#define LED GREEN_LED
  
// the setup routine runs once when you press reset:
void setup() {                
  // initialize the digital pin as an output.
  pinMode(LED, OUTPUT);     
}

// the loop routine runs over and over again forever:
void loop() {
  digitalWrite(LED, HIGH);   // turn the LED on (HIGH is the voltage level)
  delay(1000);               // wait for a second
  digitalWrite(LED, LOW);    // turn the LED off by making the voltage LOW
  delay(1000);               // wait for a second
}

What is the most efficient way to modify the code to blink both LEDs? Is there some way to #define LED as RED_LED and GREEN_LED?

edit: also, if the delays were removed from the loop, what frequency would the LED blink at?

# ? Jul 11, 2015 22:17

Blotto Skorzany: Nov 7, 2008; He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

SkunkDuster posted:

I'm very new to microcontrollers and coding. I just got a launchpad and am going through the basics of blinking an LED on and off with this code:
code:
// most launchpads have a red LED
#define LED RED_LED

//see pins_energia.h for more LED definitions
//#define LED GREEN_LED
  
// the setup routine runs once when you press reset:
void setup() {                
  // initialize the digital pin as an output.
  pinMode(LED, OUTPUT);     
}

// the loop routine runs over and over again forever:
void loop() {
  digitalWrite(LED, HIGH);   // turn the LED on (HIGH is the voltage level)
  delay(1000);               // wait for a second
  digitalWrite(LED, LOW);    // turn the LED off by making the voltage LOW
  delay(1000);               // wait for a second
}
What is the most efficient way to modify the code to blink both LEDs? Is there some way to #define LED as RED_LED and GREEN_LED?

edit: also, if the delays were removed from the loop, what frequency would the LED blink at?

#define just uses the C preprocessor to cut-and-paste definitions in your source code; RED_LED and GREEN_LED don't become undefined or anything after LED is #define'd to one of them. digitalWrite(RED_LED, HIGH) and the same thing but with GREEN_LED should work fine assuming both RED_LED and GREEN_LED are defined in whatever header you're including to the proper values for the pins you have those LEDs attached to.

With regard to taking the delays out, the timing then depends on how fast you're running the micro and what the compiler is doing to that loop. If the delay-less loop is getting compiled down to three instructions (a write to an SFR controlling the pin you have the LED hooked up to to drive it high, another write to drive it low, and an unconditional jump to the start of the loop) and you're running at 8 MHz, the LED would blink at a frequency of 2.67 MHz with a duty cycle of 33%; you should know what frequency you're actually running at and you can take the delays out and inspect the generated asm to determine what the actual timing is in that case.

# ? Jul 11, 2015 22:35

Skunkduster: Jul 15, 2005

Blotto Skorzany posted:

#define just uses the C preprocessor to cut-and-paste definitions in your source code; RED_LED and GREEN_LED don't become undefined or anything after LED is #define'd to one of them. digitalWrite(RED_LED, HIGH) and the same thing but with GREEN_LED should work fine assuming both RED_LED and GREEN_LED are defined in whatever header you're including to the proper values for the pins you have those LEDs attached to.

With regard to taking the delays out, the timing then depends on how fast you're running the micro and what the compiler is doing to that loop. If the delay-less loop is getting compiled down to three instructions (a write to an SFR controlling the pin you have the LED hooked up to to drive it high, another write to drive it low, and an unconditional jump to the start of the loop) and you're running at 8 MHz, the LED would blink at a frequency of 2.67 MHz with a duty cycle of 33%; you should know what frequency you're actually running at and you can take the delays out and inspect the generated asm to determine what the actual timing is in that case.

I believe I understand what you mean (I am VERY new to this) and modified it with #define RLED 2 and #define GLED 14 and added corresponding pinMode and digitalWrite lines and it did work. I was mainly wondering if I could #define LED to pins 2 and 14 so when set LED to high, it turns them both pins high with one statement.

Thanks for the description of the frequency. A couple of the acronyms are lost on me, but that's good enough to get a general idea.

# ? Jul 11, 2015 22:59

ante: Apr 9, 2005; SUNSHINE AND RAINBOWS

The VHDL thread is dead, I guess, so this is the next best thing.

I have an Altium license through work. Would it be worthwhile to try FPGA development using that instead of ISE?

Xilinx's tools are stupidly clunky and aren't really well integrated, so it would be nice if Altium sidestepped all those issues.

# ? Jul 12, 2015 01:51

Aurium: Oct 10, 2010

SkunkDuster posted:

I believe I understand what you mean (I am VERY new to this) and modified it with #define RLED 2 and #define GLED 14 and added corresponding pinMode and digitalWrite lines and it did work. I was mainly wondering if I could #define LED to pins 2 and 14 so when set LED to high, it turns them both pins high with one statement.

Thanks for the description of the frequency. A couple of the acronyms are lost on me, but that's good enough to get a general idea.

Summary first:

Use digitalWrite(). Turn one pin on at a time. If you want the illusion turning them on with the same command, write a function that has 2 digitalWrite calls in it. What you're asking for can be done with port manipulation, but it will expose you to some technical details that will not really be helpful starting out, but will make your life harder. Port manipulation is more efficient/faster (often by a large factor) but faster only matters if the way you are currently doing something is actually too slow.

First off, which launchpad are you using? If you're using one of the older socketed ones, they run at 16mhz. The FRAM MSP-EXP430FR5969 also runs at 16Mhz. The MSP-EXP430F5529LP runs at 25Mhz. The MSP432 kit is 48mhz I think. Without knowing the output assembly, and how many cycles the individual instructions take, I can't tell you how fast your led blinks.

So now that THAT has added some confusion. If you are using the MSP430G2 launchpad, the following will turn on the in built RED_LED (on pin 1.0) and the inbuilt GREEN_LED (pin 1.6) in one command, provided they are already configured properly.

P1OUT |= 0b00100001;

To configure those pins for output, you would use

P1DIR |= 0b00100001;

Or the PinMode() function.

For the 5969 and the 5529, it is not possible for one command to turn them both on. The LEDs on those boards are on different ports. For the 432 it is not possible for the single red LED and the RGB led to be controlled by the same command, but any of the sub leds inside the RGB led CAN be controlled by a single command.

There's also a general electronics thread, and an arduino thread which could also help more with energia.

ante posted:

The VHDL thread is dead, I guess, so this is the next best thing.

I have an Altium license through work. Would it be worthwhile to try FPGA development using that instead of ISE?

Xilinx's tools are stupidly clunky and aren't really well integrated, so it would be nice if Altium sidestepped all those issues.

I'd suggest posting in it anyway, you never know who has a thread bookmarked.

Personally, I have no idea, having never worked with any FPGAs.

# ? Jul 12, 2015 04:09

taqueso: Mar 8, 2004

ante posted:

The VHDL thread is dead, I guess, so this is the next best thing.

I have an Altium license through work. Would it be worthwhile to try FPGA development using that instead of ISE?

Xilinx's tools are stupidly clunky and aren't really well integrated, so it would be nice if Altium sidestepped all those issues.

I played around with it (a couple years ago) and it seemed to work well, if in a bit of a complicated way. I went back to the comfortable vendor toolchain for expediency and never bothered to try again. I think Altium can do what you want (sidestepping xilinx's toolchain) and you will be writing mostly the same HDL so you can bail pretty easily if it doesn't work out. I didn't care for the visual plug-together stuff, but that might have improved.

# ? Jul 12, 2015 04:27

Ponderous Saxon: Jan 5, 2010; Fallen Rib

ante posted:

The VHDL thread is dead, I guess, so this is the next best thing.

I have an Altium license through work. Would it be worthwhile to try FPGA development using that instead of ISE?

Xilinx's tools are stupidly clunky and aren't really well integrated, so it would be nice if Altium sidestepped all those issues.

What kind of integration are you after?

If you haven't already, it might be worthwhile checking out Vivado - they're still randomly breaking things between releases, but it's faster and the workflow much saner. If you don't like the IDE you avoid it entirely by using tcl scripting.

Ponderous Saxon fucked around with this message at 10:48 on Jul 12, 2015

# ? Jul 12, 2015 10:12

Skunkduster: Jul 15, 2005

Aurium posted:

write a function that has 2 digitalWrite calls in it.

Perfect! That is exactly what I was looking for. As for the speed, I was more curious about how the speed of the blink was determined and not so much with what the actual speed is and you've answered that very well. Thanks!

# ? Jul 12, 2015 11:33

ante: Apr 9, 2005; SUNSHINE AND RAINBOWS

Ponderous Saxon posted:

What kind of integration are you after?

If you haven't already, it might be worthwhile checking out Vivado - they're still randomly breaking things between releases, but it's faster and the workflow much saner. If you don't like the IDE you avoid it entirely by using tcl scripting.

taqueso posted:

I played around with it (a couple years ago) and it seemed to work well, if in a bit of a complicated way. I went back to the comfortable vendor toolchain for expediency and never bothered to try again. I think Altium can do what you want (sidestepping xilinx's toolchain) and you will be writing mostly the same HDL so you can bail pretty easily if it doesn't work out. I didn't care for the visual plug-together stuff, but that might have improved.

Honestly, I've been working on something stupidly complicated off and on for the past year, so I'm not even past the design/simulation stage.

I'll eventually be putting it on some higher end Spartan 6s, and my impression is that Vivado is the latest and greatest, but drops all support for non-enterprise chips like Spartans? I dunno, marketing doublespeak seems really prevalent in the FPGA world, so I had a hard time understanding exactly what the deal is.

Never thought of downloading Vivado just to use the tools and then back to ISE for layout and programming, but I guess it could be worth a shot.

# ? Jul 13, 2015 08:08

Delta-Wye: Sep 29, 2005

Vivado doesn't support the spartan 6 family because gently caress you, that's why.

Anything seems like it would be an improvement over the vendor applied tools when it comes to xilinx and altera, but I'm not sure if I'd go running to altium of an companies expecting an improvement.

# ? Jul 13, 2015 18:46

Spatial: Nov 15, 2007

When you're dealing with 16-bit fixed point image data, what is the correct way to do a multiply?

You can't just do:

code:

out = (x * y) >> 32

because in this number system 0xFFFF equals 1.0, not 0.9999.

What I do at the moment is:

code:

a = x;
b = (y==0) ? 0 : y+2;
out = (a * b) >> 32

This seems to work, but I'm worried I'm missing something obvious here.

# ? Jul 19, 2015 03:32

sarehu: Apr 20, 2007; (call/cc call/cc)

Spatial posted:

When you're dealing with 16-bit fixed point image data, what is the correct way to do a multiply?

You can't just do:
code:
out = (x * y) >> 32
because in this number system 0xFFFF equals 1.0, not 0.9999.

You mean >> 16, right? Because x and y are values in [0, 65535], multiplying them produces a 32-bit result that needs to get squashed back down to [0, 65535].

Spatial posted:

What I do at the moment is:
code:
a = x;
b = (y==0) ? 0 : y+2;
out = (a * b) >> 32

Doesn't it produce wrong values? If y is 1 and x is 0xFFFF, you get (0xFFFF * 3) >> 32 which is 0x2FFFD >> 32 which is 2. You should get 1.

Let's let c = 0xFFFF. Then the "true" color values that range in [0.0, 1.0] are the values x/c, and y/c. (x/c) * (y/c) = (x*y)/(c*c). You want z such that z/c = (x*y)/(c*c). Thus z = (x*y)/c.

So you should do

code:

out = (x * y) / 0xFFFF

only maybe you don't want that, exactly -- maybe you want it to round to the nearest integer, instead of rounding down. Adjust accordingly.

Note that dividing by the constant 0xFFFF does not actually result in a division instruction -- look at the disassembly to see.

# ? Jul 19, 2015 04:18

Spatial: Nov 15, 2007

Yeah sorry I have >> 16 in the code. It's late. :downs:

Thanks for the explanation, it clicked when you expressed it in fractions of the max value. Ahhh, so obvious in retrospect... I know the multiply by reciprocal trick, at least.

# ? Jul 19, 2015 05:05

sarehu: Apr 20, 2007; (call/cc call/cc)

Also, since it's image data, and keep in mind I don't know what I'm talking about:

Blah blah blah think about gamma correction.

# ? Jul 19, 2015 06:20

Spatial: Nov 15, 2007

Don't worry, I'm well versed in the actual image processing. I'm just used to doing it in floating point.

# ? Jul 19, 2015 16:39

carticket: Jun 28, 2005; white and gold.

sarehu posted:

Note that dividing by the constant 0xFFFF does not actually result in a division instruction -- look at the disassembly to see.

Wait. I'm tired, so maybe I'm missing something, but what will this produce? It's mathematically different from a shift, since >>16 is equivalent to /65536 and not /65535.

# ? Jul 22, 2015 01:57

sarehu: Apr 20, 2007; (call/cc call/cc)

If we compile this:

code:

#include <stdio.h>

int main(void) {
  unsigned int x, y;
  scanf("%u %u\n", &x, &y);
  unsigned int z = (x * y) / 0xFFFF;
  printf("z = %u\n", z);
  return 0;
}

We get this:

code:

0000000000400480 <main>:
  400480:	48 83 ec 18          	sub    $0x18,%rsp
  400484:	bf 54 06 40 00       	mov    $0x400654,%edi
  400489:	31 c0                	xor    %eax,%eax
  40048b:	48 8d 54 24 0c       	lea    0xc(%rsp),%rdx
  400490:	48 8d 74 24 08       	lea    0x8(%rsp),%rsi
  400495:	e8 d6 ff ff ff       	callq  400470 <__isoc99_scanf@plt>
  40049a:	8b 44 24 0c          	mov    0xc(%rsp),%eax
  40049e:	bf 5b 06 40 00       	mov    $0x40065b,%edi
  4004a3:	0f af 44 24 08       	imul   0x8(%rsp),%eax
  4004a8:	48 89 c1             	mov    %rax,%rcx
  4004ab:	48 89 c2             	mov    %rax,%rdx
  4004ae:	48 c1 e1 0f          	shl    $0xf,%rcx
  4004b2:	48 c1 e2 1f          	shl    $0x1f,%rdx
  4004b6:	48 01 ca             	add    %rcx,%rdx
  4004b9:	48 8d 34 02          	lea    (%rdx,%rax,1),%rsi
  4004bd:	31 c0                	xor    %eax,%eax
  4004bf:	48 c1 ee 2f          	shr    $0x2f,%rsi
  4004c3:	e8 78 ff ff ff       	callq  400440 <printf@plt>
  4004c8:	31 c0                	xor    %eax,%eax
  4004ca:	48 83 c4 18          	add    $0x18,%rsp
  4004ce:	c3                   	retq

Edit: Which puts (p * (1 + 2^15 + 2^31)) / (2^47) in rsi, where p is the 32-bit unsigned product of x and y, extended to 64 bits, with the intervening computations being done using 64 bit registers, with truncating division.

Note that 2^47 / (1 + 2^15 + 2^31) = 65534.9999847.

sarehu fucked around with this message at 02:39 on Jul 22, 2015

# ? Jul 22, 2015 02:26

sarehu: Apr 20, 2007; (call/cc call/cc)

The 32-bit version is:

code:

08048350 <main>:
 8048350:	8d 4c 24 04          	lea    0x4(%esp),%ecx
 8048354:	83 e4 f0             	and    $0xfffffff0,%esp
 8048357:	ff 71 fc             	pushl  -0x4(%ecx)
 804835a:	55                   	push   %ebp
 804835b:	89 e5                	mov    %esp,%ebp
 804835d:	51                   	push   %ecx
 804835e:	8d 45 f4             	lea    -0xc(%ebp),%eax
 8048361:	83 ec 18             	sub    $0x18,%esp
 8048364:	50                   	push   %eax
 8048365:	8d 45 f0             	lea    -0x10(%ebp),%eax
 8048368:	50                   	push   %eax
 8048369:	68 30 85 04 08       	push   $0x8048530
 804836e:	e8 cd ff ff ff       	call   8048340 <__isoc99_scanf@plt>
 8048373:	58                   	pop    %eax
 8048374:	5a                   	pop    %edx
 8048375:	8b 55 f4             	mov    -0xc(%ebp),%edx
 8048378:	b9 01 80 00 80       	mov    $0x80008001,%ecx
 804837d:	0f af 55 f0          	imul   -0x10(%ebp),%edx
 8048381:	89 d0                	mov    %edx,%eax
 8048383:	f7 e1                	mul    %ecx
 8048385:	c1 ea 0f             	shr    $0xf,%edx
 8048388:	52                   	push   %edx
 8048389:	68 37 85 04 08       	push   $0x8048537
 804838e:	e8 7d ff ff ff       	call   8048310 <printf@plt>
 8048393:	8b 4d fc             	mov    -0x4(%ebp),%ecx
 8048396:	83 c4 10             	add    $0x10,%esp
 8048399:	31 c0                	xor    %eax,%eax
 804839b:	c9                   	leave  
 804839c:	8d 61 fc             	lea    -0x4(%ecx),%esp
 804839f:	c3                   	ret

It computes the same thing (the mul %ecx instruction leaves the 64-bit result of ecx*eax in edx:ecx, and right-shifting edx by 15 is the same as dropping 47 bits off a 64-bit result register).

(Never mind that this is an x86 CPU example in the "embedded programming" thread.)

Edit: If you have no multiply-with-64-bit-result function, and if emulating 64-bit multiply isn't a better option, you could do something akin to this:

code:

uint32_t divFFFF(uint32_t p) {
  uint32_t x = (p & 0xFFFF) + (p >> 16);
  p = x - p;
  p += (p << 16);
  p += ((x + 1) >> 16) + ((x + 2) >> 17);
  return p;
}

Though in this case, where you know that p != 0xFFFFFFFFu, you can drop the "+ ((x + 2) >> 17)".

Edit: If you want to know how the heck that works: Since 0x10000 = 1 (mod 0xFFFF), it follows that if p = q*0x10000 + r and x = q + r, then x = p (mod 0xFFFF). This is the same as adding up the digits (in base 10) to find if something's divisible by 9. Then we know p - x is a multiple of 0xFFFF. Thus, to compute (p - x)/0xFFFF, we can multiply (p - x) by the multiplicative inverse of 0xFFFF (the multiplicative inverse modulo 2^32), which is 0xFFFEFFFF, i.e -1 - (1 << 16). The product (p - x) * (-1 - (1 << 16)) is equal to (x - p) * (1 + (1 << 16)). That gives us (p-x)/0xFFFF, exactly, but we want to compute floor(p/0xFFFF). We have that either x < 0xFFFF, or 0xFFFF <= x < 2 * 0xFFFF, or x == 2 * 0xFFFF. So we adjust the result by 0, 1, or 2, accordingly.

Edit: ARM has umull which gives a hi:lo result, too.

sarehu fucked around with this message at 11:49 on Jul 22, 2015

# ? Jul 22, 2015 02:49

Adbot: ADBOT LOVES YOU

# ? May 13, 2024 02:52

Spatial: Nov 15, 2007

Now I have a division question

I need to get the modulo of two 48-bit values. I have a 32-bit CPU with no divider. However I do have a fast hardware division peripheral which does 64-by-32, giving me a 64-bit quotient and 32-bit remainder. What kind of approach should I use to take advantage of that?

I tried a few things but I had no luck so far.

# ? Jul 31, 2015 19:53

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > 64K is still quite a lot! Embedded Programming Microthread

«‹›53 »