Spatial
Nov 15, 2007

My preferred way of interacting with memory mapped peripherals is overlaying nested structures and using bitfields. Autocomplete works, you don't have to care about masking things, it's easy to document and makes for very readable code at the point of use.

For devices with multiple instances of submodules with gaps between them, you literally put the gaps into the structs to match the stride of the instances. In my experience the transistor-touchers pretty much always do this intentionally in a way that allows for an array-of-structs memory overlay because it makes for easy address decoding on their end.

ARM's recommendation is super conservative. If you know what the code actually does you aren't going to have a problem using any kind of struct or bitfields. Using packed structs isn't really a good idea but it's hardly applicable: none of the fields are going to be unaligned on a peripheral anyway. Bitfield accesses are just basic read-modify-writes. The time to be careful is when you've got stuff like multiple bits which all need to be set in one write e.g. when a write triggers some action immediately.

Here's a cut down version of a DMA driver I made with just the structs shown.
code:
struct DmaConfig {
    u32 enable : 1;
};

struct DmaChannelConfig {
    u32 transfers : 12;
    u32 srcInc    : 1;
    u32 dstInc    : 1;
    u32 srcWidth  : 2;
    u32 dstWidth  : 2;
    u32 srcPeriph : 1;
    u32 dstPeriph : 1;
    u32           : 11;
    u32 enable    : 1;
};

struct DmaChannel {
    const void*             src;
    void*                   dst;
    struct DmaChannelConfig cfg;
    u32                     pad;
};

struct DmaController {
    struct DmaConfig  cfg;
    struct DmaChannel channels[ 8 ];
};
Elsewhere:
code:
static volatile struct DmaController * const dma = (void*) 0x40001000;
And then using it goes like this:
code:
void uartTx( const void * const buffer, const u32 bytes ) {
    volatile struct DmaChannel * channel = & dma->channels[ DmaChannelUartTx ];
    channel->src = buffer;
    channel->dst = & uart->tx;
    channel->cfg = (struct DmaChannelConfig) {
        .transfers = bytes,
        .srcInc    = true,
        .srcWidth  = DmaChannelWidthByte,
        .dstWidth  = DmaChannelWidthByte,
        .dstPeriph = DmaPeripheralUartTx,
        .enable    = true
    };
}

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
The careful reader will note that the recommendation in those docs is to avoid using packed structs to make unaligned fields, which is a pretty good idea. Avoiding packed structs entirely is an overly-conservative reading.

Jeffrey of YOSPOS
Dec 22, 2005

GET LOSE, YOU CAN'T COMPARE WITH MY POWERS

Jabor posted:

The careful reader will note that the recommendation in those docs is to avoid using packed structs to make unaligned fields, which is a pretty good idea. Avoiding packed structs entirely is an overly-conservative reading.
Yeah - if your hardware has registers that don't meet the alignment requirements of your cpu, you've got bigger problems than what sort of code your compiler outputs to access those registers.

Volguus
Mar 3, 2009
Is it wrong and how wrong is it to handle an enum like an int? Example:

code:
enum SomeEnum {
    Value0 = 0,
    Value1,
    Value2,
    // .....
    COUNT
};

SomeEnum selectedValue = SomeEnum::Value0;

//then later we want to go  through the values

int val = selectedValue;
++val;
if(val >= SomeEnum::COUNT) val = 0;

selectedValue = (SomeEnum)val;
Are there any issues with this code? On an x86 platform, gcc.

Jeffrey of YOSPOS
Dec 22, 2005

GET LOSE, YOU CAN'T COMPARE WITH MY POWERS

Volguus posted:

Is it wrong and how wrong is it to handle an enum like an int? Example:

code:
enum SomeEnum {
    Value0 = 0,
    Value1,
    Value2,
    // .....
    COUNT
};

SomeEnum selectedValue = SomeEnum::Value0;

//then later we want to go  through the values

int val = selectedValue;
++val;
if(val >= SomeEnum::COUNT) val = 0;

selectedValue = (SomeEnum)val;
Are there any issues with this code? On an x86 platform, gcc.

Is this C or C++? I think this is fine C, though I'd put the incrementing in a helper function, and maybe check for/warn about negative values. In C++ you can use a strongly typed enum to guarantee you don't pass a random integer as an enum and specify what the underlying type is.
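As a sketch of the C++ version (the enum name reuses the one above; the cycling helper is hypothetical, not from any particular codebase):
code:
#include <cstdint>

enum class SomeEnum : std::uint8_t {
    Value0 = 0,
    Value1,
    Value2,
    COUNT
};

// Wraps around to Value0 after the last real value; never yields COUNT.
SomeEnum nextValue( SomeEnum e ) {
    auto v = static_cast<std::uint8_t>( e ) + 1u;
    if ( v >= static_cast<std::uint8_t>( SomeEnum::COUNT ) ) v = 0;
    return static_cast<SomeEnum>( v );
}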

Volguus
Mar 3, 2009

Jeffrey of YOSPOS posted:

Is this C or C++? I think this is fine C, though I'd put the incrementing in a helper function, and maybe check for/warn about negative values. In C++ you can use a strongly typed enum to guarantee you don't pass a random integer as an enum and specify what the underlying type is.

It's C++. I was just wondering if there could be any problems with casting the enum values to an int and vice versa that I'm not aware of.

Qwertycoatl
Dec 31, 2008

Spatial posted:

My preferred way of interacting with memory mapped peripherals is overlaying nested structures and using bitfields. Autocomplete works, you don't have to care about masking things, it's easy to document and makes for very readable code at the point of use.

Depending on the hardware and compiler, using bitfields can go hideously wrong - if hardware is expecting all memory accesses to be four bytes, and the compiler turns your bitfield access into a one byte write (as it's perfectly entitled to do), the hardware might ignore it or do something stupid

e: it can also trip you up if the register makes the hardware do something when you write back a bit you read, or when reading has side effects, and all the other screwy things HW designers like to do

Qwertycoatl fucked around with this message at 18:47 on Mar 7, 2019

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
The right way to do this while still getting the convenience of bit-fields is to make the actual volatile field an int32 or whatever, but provide functions that take and return a more expressive type.

That would, of course, require the slightest amount of thought about code structure instead of producing excuses to wildly rend your garments and curse the sun, so I am not surprised that it is a solution that has never occurred to system programmers in 40 years of programming.
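A minimal sketch of that approach, reusing the DmaChannelConfig idea from upthread as an assumed layout (the field set and bit positions here are illustrative, not from a real datasheet):
code:
#include <stdbool.h>
#include <stdint.h>

struct DmaChannelCfg {
    uint32_t transfers;   /* up to 12 bits */
    bool     srcInc;
    bool     enable;
};

/* Pack the expressive type into the raw register value. */
static uint32_t packDmaChannelCfg( struct DmaChannelCfg c ) {
    return ( ( c.transfers & 0xFFFu ) <<  0 )
         | ( ( c.srcInc ? 1u : 0u )   << 12 )
         | ( ( c.enable ? 1u : 0u )   << 31 );
}

/* The volatile field stays a plain 32-bit word; the write is one full-width store. */
static void writeDmaChannelCfg( volatile uint32_t * reg, struct DmaChannelCfg c ) {
    *reg = packDmaChannelCfg( c );
}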

Qwertycoatl
Dec 31, 2008

The sun doesn't curse itself you know.

feedmegin
Jul 30, 2008

Qwertycoatl posted:

The sun doesn't curse itself you know.

Should have picked OSF/1 instead imo

Spatial
Nov 15, 2007

Qwertycoatl posted:

Depending on the hardware and compiler, using bitfields can go hideously wrong - if hardware is expecting all memory accesses to be four bytes, and the compiler turns your bitfield access into a one byte write (as it's perfectly entitled to do), the hardware might ignore it or do something stupid

e: it can also trip you up if the register makes the hardware do something when you write back a bit you read, or when reading has side effects, and all the other screwy things HW designers like to do
In the compilers I've used, the width of the access is determined by the datatype you declare the bitfield with, although with GCC you do have to enforce this via a flag.
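As an illustration (assuming GCC with its -fstrict-volatile-bitfields option, which ties the access width to the declared type; that part is an assumption about the toolchain, check your own):
code:
#include <stdint.h>

/* Declared with uint32_t: accesses through a volatile pointer are 32-bit. */
struct CtrlWord {
    uint32_t enable : 1;
    uint32_t mode   : 2;
    uint32_t        : 29;
};

/* Declared with uint8_t: the same fields would be accessed as single bytes. */
struct CtrlByte {
    uint8_t enable : 1;
    uint8_t mode   : 2;
    uint8_t        : 5;
};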

Always gotta watch out for the special stuff with side effects though. My favourite is when you get a register that means completely different things when read versus when written.

e: Actually no the absolute worst is bastards who make READING a register kick off some hardware action. Inevitably you will want to debug it and the debugger will constantly read it, and you won't realise it for four hours. :v:

Spatial fucked around with this message at 22:00 on Mar 7, 2019

Spatial
Nov 15, 2007

rjmccall posted:

The right way to do this while still getting the convenience of bit-fields is to make the actual volatile field an int32 or whatever, but provide functions that take and return a more expressive type.

That would, of course, require the slightest amount of thought about code structure instead of producing excuses to wildly rend your garments and curse the sun, so I am not surprised that it is a solution that has never occurred to system programmers in 40 years of programming.
Too slow, gotta go fast!

Qwertycoatl
Dec 31, 2008

Spatial posted:

e: Actually no the absolute worst is bastards who make READING a register kick off some hardware action. Inevitably you will want to debug it and the debugger will constantly read it, and you won't realise it for four hours. :v:

Yeah we have a couple of fifos where you pop them by reading them, and a ton of counter registers that are clear on read

csammis
Aug 26, 2003

Mental Institution

Spatial posted:

e: Actually no the absolute worst is bastards who make READING a register kick off some hardware action. Inevitably you will want to debug it and the debugger will constantly read it, and you won't realise it for four hours. :v:

Christ yes this. This was among my very first professional experiences debugging hardware (a SPI peripheral's FIFO access register which popped off the FIFO when it was read) and also my very most recent experience debugging hardware, just yesterday in fact (a USB peripheral with clear-on-read bits in the interrupt status register).

It keeps happening

qsvui
Aug 23, 2003
some crazy thing
Has anyone tried using Kvasir? It's supposed to be some sort of metaprogramming library for microcontroller I/O and registers. I was considering it but the last commit was over a year ago so I assume the developer just got bored and left.

Qwertycoatl
Dec 31, 2008

qsvui posted:

Has anyone tried using Kvasir? It's supposed to be some sort of metaprogramming library for microcontroller I/O and registers. I was considering it but the last commit was over a year ago so I assume the developer just got bored and left.

It appears to be completely undocumented.

e: Oh, not completely. But "doc" just has dolor sit amet. And there's still very little information on how to use it, and I'd hate to debug it/anything using it.

Qwertycoatl fucked around with this message at 07:45 on Mar 8, 2019

General_Failure
Apr 17, 2005

Jeffrey of YOSPOS posted:

Yeah - if your hardware has registers that don't meet the alignment requirements of your cpu, you've got bigger problems than what sort of code your compiler outputs to access those registers.

You're not wrong. I think it's the DWC HDMI component of the whole video ...thing that has the byte aligned registers. Ie four registers per word. Remember ARM can't do byte accesses.

I had a little time to check the compiler I'm using actually supports bit fields in C. It does. I could use GCC but that's a whole new can of worms.

When I have a little time I might write a better test program to ensure it behaves as expected. All I did last night was write some bits then read and print them.

e:I was concerned about code portability being an issue using bitfields, until my dumb brain realised that this code is tied to the architecture anyway because it's driver code for ARM specific hardware on an ARM specific OS.

General_Failure fucked around with this message at 03:51 on Mar 9, 2019

Dren
Jan 5, 2001

Pillbug
Anyone have any thoughts about state machine libraries? My initial research turned up boost MSM, boost statechart, and an experimental boost lib called SML. SML has C++14 features and supposedly fixes some compile time and other issues that pop up with MSM. SML even has example code to print out the state machine as plantUML markup.

I’ve implemented some simple examples with SML and it seems to work fine but I’m interested if anyone has any thoughts.

fritz
Jul 26, 2003

The 2d graphics guys are talking audio now : http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1386r0.pdf

Jeffrey of YOSPOS
Dec 22, 2005

GET LOSE, YOU CAN'T COMPARE WITH MY POWERS

General_Failure posted:

You're not wrong. I think it's the DWC HDMI component of the whole video ...thing that has the byte aligned registers. Ie four registers per word. Remember ARM can't do byte accesses.

I had a little time to check the compiler I'm using actually supports bit fields in C. It does. I could use GCC but that's a whole new can of worms.

When I have a little time I might write a better test program to ensure it behaves as expected. All I did last night was write some bits then read and print them.

e:I was concerned about code portability being an issue using bitfields, until my dumb brain realised that this code is tied to the architecture anyway because it's driver code for ARM specific hardware on an ARM specific OS.
I think unaligned registers are okay as long as they are byte sized registers? It's not an unaligned access if the alignment is at least the field size. If you had a 4 byte register that wasn't 4-byte aligned it'd be an issue, but for a 1-byte register I'd think it was fine. (It's possible that this is a problem anyway for hardware registers in particular and I don't ever remember using a non-4 byte aligned register in my ARM days.) Your compiler will happily output structs with chars on one-byte boundaries and shorts on 2-byte boundaries even without packing.
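For example (a hypothetical block of byte-wide registers, names made up), the members land on consecutive byte offsets without any packing attribute:
code:
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ByteRegs {
    uint8_t ctrl;     /* offset 0 */
    uint8_t status;   /* offset 1 */
    uint8_t data;     /* offset 2 */
};

static_assert( offsetof( struct ByteRegs, status ) == 1, "unexpected padding" );
static_assert( offsetof( struct ByteRegs, data )   == 2, "unexpected padding" );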

pseudorandom name
May 6, 2007

What do you mean ARM can't do byte accesses, is that only for MMIO?

Spatial
Nov 15, 2007

It is fully possible to support byte writes in a peripheral, it just usually isn't done. It depends entirely on the peripheral and the bus to it. Often the write interface with the MCU is the equivalent of this pseudocode:
code:
busaccess( iswrite, address, data ) {
    if ( iswrite ) {
        switch ( address >> 2 ) {
            case 0: rega = data; break;
            case 1: regb = data; break;
            case 2: regc = data; break;
            ...
        }
    }
}
Under this scheme byte accesses don't make sense. In contrast SRAM hardware has separate byte lane controls which determine which bytes in the word are changed on write.

General_Failure:
For a large unaligned buffer of indeterminate length you will probably have a much easier time if you do the work in an array and then copy it into the peripheral memory when you're finished.
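Something like this, as a rough sketch (the names and word count are made up):
code:
#include <stdint.h>

enum { kFrameWords = 16 };   /* made-up transfer size */

/* Build the data in an ordinary RAM staging buffer, then copy it into the
   peripheral word by word so every bus access is a full 32-bit store. */
static void copyToPeriph( volatile uint32_t * periph, const uint32_t * staging ) {
    for ( int i = 0; i < kFrameWords; i++ ) {
        periph[ i ] = staging[ i ];
    }
}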

xgalaxy
Jan 27, 2004
i write code

It's like the committee has a fetish for the C# / Java standard libraries.

Xerophyte
Mar 17, 2008

This space intentionally left blank

I guess "why not Boost.EquivalentLibrary" is going to be a required section in every standard library proposal from here until the end of time.

Boost is good, don't get me wrong, but there are even more reasons you might not want to use Boost.Thing than there are for not using std::thing, and they often have very little to do with the quality of the libraries and more to do with the qualities of C++. Also, I hear there is a special place in hell for library developers who make boost classes part of their public API so the entire drat thing bleeds into any linking application (looking at you, Pixar...).

General_Failure
Apr 17, 2005

pseudorandom name posted:

What do you mean ARM can't do byte accesses, is that only for MMIO?

It just can't. It has LDRB and STRB which are pretty worthless because they only deal with the LSB.
It's not a whole lot different from going
code:
LDR a2, =SOMEADDRESS
LDR a1, [a2, #0]
AND a1, a1, #&FF
except it skips the AND. Yes, I know I could have thrown the barrel shifter in there to rotate before masking, and yes I know that's the way to effect reads and read/modify/write on bytes. It's just that in practice, needing to do shifts or rotates and bitwise operations just to work with a register is kind of clumsy. Extra bit fiddling is required of course to get a specific byte from the word. But this is the C thread.

ARM (besides very old versions) can only do word aligned accesses. It's up to the programmer or the language to extrapolate the data from there.
Thankfully it's only the DWC HDMI component which has packed byte-sized registers. Fun fact: by default the Sunxi SoCs have the address registers obfuscated and read-disabled for the DWC HDMI component. Thankfully someone worked out how to descramble and write-enable that IO range. Apparently the vendor-released BSP Linux kernel still uses it scrambled for some reason.

Spatial posted:


General_Failure:
For a large unaligned buffer of indeterminate length you will probably have a much easier time if you do the work in an array and then copy it into the peripheral memory when you're finished.
I agree. Although in many cases I don't see why I couldn't just use the pointer as an array in situ.

On the OS side of things (like where I mentioned I'd started writing a struct and realised it can have a variable amount of data on the end), that's the result of a function I need to populate being passed a pointer to a data structure by the OS. Working with it as an array would be by far the easiest method.

pseudorandom name
May 6, 2007

I don't know ARM mnemonics or assembly syntax so that's all meaningless to me, so I'll rephrase: are you saying that ARM pulled a DEC Alpha and it isn't possible to implement the C programming language or multithreading on ARM processors?

hackbunny
Jul 22, 2007

I haven't been on SA for years but the person who gave me my previous av as a joke felt guilty for doing so and decided to get me a non-shitty av

pseudorandom name posted:

it isn't possible to implement the C programming language or multithreading

That seems an exaggeration, to say the least. The C programming language was implemented on far, far more exotic architectures, and two of the most notable operating systems ported to the Alpha (VMS and Windows NT) had a thread-oriented scheduler at a time when process-oriented schedulers were the norm

Spatial
Nov 15, 2007

GF is being misleadingly anal, of course they can do byte accesses. But there's still 32 data wires on the bus. So the simplest way to physically implement it - and therefore cheapest, fastest, smallest way - is to mask off the unwanted high bits from the data the MCU gets from the bus.

In the case of a memory mapped hardware peripheral that doesn't deal with bytes, byte addressing doesn't make any sense under this scheme. It won't even have the lowest 2 bits of the address physically connected to it.

But this is a limitation of the peripheral not the ARM core. The peripheral designer can support byte accesses if they want, but then they'd have to support a more advanced bus standard, it can't pass through certain bus bridges, it uses more power, it's slower, and it makes the device physically larger and more expensive. Hardware designers are absolutely allergic to all of these things which is why it's rarely done.

General_Failure
Apr 17, 2005

Spatial posted:

GF is being misleadingly anal
Am I? It's not intentional.

quote:


, of course they can do byte accesses. But there's still 32 data wires on the bus. So the simplest way to physically implement it - and therefore cheapest, fastest, smallest way - is to mask off the unwanted high bits from the data the MCU gets from the bus.
I totally agree with that.

Random copypaste from the i.MX6 manual. Page 1551. These are a few registers from the section which is also relevant to the hardware I'm using.
I trimmed a few fields out for readability. First field is the absolute address, last field is the register width.
I haven't written anything meaningful for this section of the hardware yet, so if I am misinterpreting it, please tell me.
code:
12_1007  Frame Composer Input Video VBlank Pixels Register        (HDMI_FC_INVBLANK)        8
12_1008  Frame Composer Input Video HSync Front Porch Register 0  (HDMI_FC_HSYNCINDELAY0)   8
12_1009  Frame Composer Input Video HSync Front Porch Register 1  (HDMI_FC_HSYNCINDELAY1)   8
12_100A  Frame Composer Input Video HSync Width Register 0        (HDMI_FC_HSYNCINWIDTH0)   8
12_100B  Frame Composer Input Video HSync Width Register 1        (HDMI_FC_HSYNCINWIDTH1)   8
12_100C  Frame Composer Input Video VSync Front Porch Register    (HDMI_FC_VSYNCINDELAY)    8
12_100D  Frame Composer Input Video VSync Width Register          (HDMI_FC_VSYNCINWIDTH)    8
12_100E  Frame Composer Input Video Refresh Rate Register 0       (HDMI_FC_INFREQ0)         8
12_100F  Frame Composer Input Video Refresh Rate Register 1       (HDMI_FC_INFREQ1)         8
12_1010  Frame Composer Input Video Refresh Rate Register 2       (HDMI_FC_INFREQ2)         8

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
The C and C++ memory models make certain guarantees which amount to requiring stores to even a non-atomic object to only write to that specific object (unless the object is a bit-field). So an architecture which couldn't express an eight-bit store without doing a non-atomic read-modify-write sequence would be unable to implement the C specification with an eight-bit char. But it's hard to imagine why an architecture wouldn't be able to express that unless it literally didn't have an eight-bit store instruction.

I'm not a processor architect, so my understanding here gets a little sketchy (and even if it was accurate once, it might be badly out of date), but I believe it's very common for this all to be managed in units of full cache lines: at most one core can own the right to have unpublished stores to a cache line, so any core which wants to perform a store has to wait for the current owner to publish its stores, which means that the owner can do arbitrarily-complex transformations with perfect atomicity as long as they're internal to that one cache line. Claiming ownership of a whole cache line and then changing only one byte of it is not fundamentally different from claiming ownership and then changing only four or eight bytes of it; very few processors have instructions that could set a whole cache line at once anyway.

Now what happens when you're talking about I/O devices, that I don't really know.

rjmccall fucked around with this message at 04:20 on Mar 10, 2019

pseudorandom name
May 6, 2007

OK, so the answer to my original question is "Yes, of course ARM can do byte accesses, the restrictions are on MMIO."

General_Failure
Apr 17, 2005

rjmccall posted:


Now what happens when you're talking about I/O devices, that I don't really know.
AFAIK whether it's RAM, IO or whatever the smallest atomic unit is 32 bits wide in aarch32/ARMv7. The LDRB / STRB mnemonics are of limited use because they only act on the least significant byte. TBQH I don't know the internal process for STRB. I think I've only used them a couple of times when doing the UART driver, and it was really just for novelty.

If you want to perform operations on bits / bytes you have to use bitwise ops and shifts / rotates. You know, now I have to look under the hood at how C is handling the bitfields. This is interesting. I'm not trying to argue with anyone. I just came here initially because I want to write the driver in C to avoid what is to most a cryptic wall of assembly, but dealing with bits in registers, and blocks of registers with an odd layout, is something that I'm struggling with.

Spatial
Nov 15, 2007

The ARM core and bus support byte granularity, it's just that most peripherals don't.

It's done with byte strobes. These are write enable signals that tell the device being addressed which bytes in the data should be written. On a 32-bit bus there are four of these - for STRB only the least significant is set. For STRH two are set, and for STR all four are set.
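In C terms the effect on the addressed word is roughly this (purely illustrative, not how any particular bus spec describes it):
code:
#include <stdint.h>

/* strobes has one bit per byte lane: bit 0 = bits 7:0, bit 1 = bits 15:8, and so on. */
static uint32_t applyByteStrobes( uint32_t oldWord, uint32_t data, unsigned strobes ) {
    uint32_t mask = 0;
    for ( unsigned lane = 0; lane < 4; lane++ ) {
        if ( strobes & ( 1u << lane ) ) {
            mask |= 0xFFu << ( 8u * lane );
        }
    }
    return ( oldWord & ~mask ) | ( data & mask );   /* only strobed lanes change */
}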

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

This is significantly less insane than the 2d graphics proposal. The API at least resembles something you could write a real program with and it's not as overly-simplified as I expected.

Falcorum
Oct 21, 2010

Call me when they propose a physics engine instead.

General_Failure
Apr 17, 2005

Spatial posted:

The ARM core and bus support byte granularity, it's just that most peripherals don't.

It's done with byte strobes. These are write enable signals that tell the device being addressed which bytes in the data should be written. On a 32-bit bus there are four of these - for STRB only the least significant is set. For STRH two are set, and for STR all four are set.

Interesting.
I went digging quickly because I'd never heard of STRH. Makes sense it would exist. Only found it for aarch64 so far but it makes sense it'd exist for 32 bit.
For people reading in hex that's
STRB xx xx xx nn
STRH xx xx nn nn
STR nn nn nn nn

The irritation is when register maps exist like the one I copypasted an excerpt from a few posts back which has the registers arranged like dd bb cc aa. Totally doable but at odds with how registers are normally arranged for ARM architecture. In that case my instinct tells me to test whether C has predictable results when treating a pointer as an array of uint8_t and then do it that way.
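That overlay could look something like this (the base address and offset are placeholders, not the real i.MX6 values):
code:
#include <stdint.h>

#define HDMI_FC_BASE      0x00121000u   /* placeholder base address */
#define HDMI_FC_INVBLANK  0x07u         /* placeholder offset into the block */

static volatile uint8_t * const hdmiFc = (volatile uint8_t *) HDMI_FC_BASE;

static void setInputVBlank( uint8_t pixels ) {
    hdmiFc[ HDMI_FC_INVBLANK ] = pixels;   /* a single byte store */
}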

Incidentally I stole a few minutes last night to view the disassembly of the test program I did for bitfields with the compiler I'm using. In its own horrible way it was just using bitmasking and the barrel shifter to handle the fields. No surprises there.

Hopefully I'll stop writing dead end programs and actually get on with the driver now. Thanks for the help.

Spatial
Nov 15, 2007

Good luck. I know the pain all too well! :)

Btw if you're fiddling with bitfields in assembly, check out the instructions BFI, BFC and UBFX/SBFX. They have an index/width based interface rather than having to deal with masking and shifting.

feedmegin
Jul 30, 2008

rjmccall posted:

very few processors have instructions that could set a whole cache line at once anyway.

LDM/STM on Arm and the x86 string instructions come to mind...

feedmegin
Jul 30, 2008

hackbunny posted:

The C programming language was implemented on far, far more exotic architectures,

The C programming language was only standardised in 1989 (and the standard has evolved since), though. That there was a language called 'C' on a Symbolics machine does not mean it's compliant with the C standard as of TYOOL 2019.

Also, are we sure the output of Symbolics C actually directly ran on the bare hardware? In their position I'd probably write a compiler targeting a VM of some sort that presented a nice C-ish interface. It's not like you're using it for writing the kernel on a Lisp Machine.

Spatial
Nov 15, 2007

feedmegin posted:

LDM/STM on Arm and the x86 string instructions come to mind...
Yeah but that's still units of words across multiple cycles.

What about AVX-512, that must be whole cache lines right? 64 bytes per load/store? :mrgw:
