
CH32V003 driving WS2812B LEDs with SPI – Part 14

13 March 2025

Continuing my investigation into why the CH32V003 SPI port sometimes just locks up, I have looked at the source code for the two functions that are involved: The SPI_I2S_GetFlagStatus() function and the SPI_I2S_SendData() function. They both do exactly what you would hope that they would do.

What comes up as suspicious is my initialization of the port. Here are the values of the two control registers as well as the status register immediately after being initialized:

SPI_CTLR1 = 0xC154
SPI_CTLR2 = 0x0000
SPI_STATR = 0x0002

This differs from the final value of SPI_CTLR1 that I used in my assembly language version of the diagnostic: 0xC354.

So what’s the exact difference here? Using the SDK function to initialize the SPI port with what was just my best guess at what would be correct, we get the following bits set in CTLR1, versus what I told it to do:

Bit  Field     SDK  Mine  Description
---  --------  ---  ----  -----------
0    CPHA      0    0     clock phase (don't care)
1    CPOL      0    0     clock polarity (don't care)
2    MSTR      1    1     coordinator mode
5-3  BR        2    2     bit rate FCLK/8
6    SPE       1    1     SPI enable
7    LSBFIRST  0    0     not set = MSB first
8    SSI       1    1     select pin level
9    SSM       0    1     select 0=hardware, 1=software control
10   RXONLY    0    0     receive only mode (not used)
11   DFF       0    0     0 = 8 bit data
12   CRCNEXT   0    0     send CRC (not used)
13   CRCEN     0    0     enable hardware CRC (not used)
14   BIDIOE    1    1     enable output, transmit only
15   BIDIMODE  1    1     one line bidirectional mode
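
The comparison in the table can be double-checked on a PC with a couple of shifts and masks. This is only a sketch: the helper names field() and changed_bits() are made up for the illustration, and the only facts used are the bit positions from the table.

```c
#include <stdint.h>

// Extract a bit field of 'width' bits starting at bit 'lsb' from a 16-bit register value.
static inline uint16_t field(uint16_t reg, unsigned lsb, unsigned width) {
    return (uint16_t)((reg >> lsb) & ((1u << width) - 1u));
}

// Return a mask of the bits that differ between two register values.
static inline uint16_t changed_bits(uint16_t a, uint16_t b) {
    return (uint16_t)(a ^ b);
}
```

Feeding it the two values, changed_bits(0xC154, 0xC354) comes back as 0x0200, a single bit, and field(0xC154, 3, 3) gives the BR code of 2.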

The only difference I see is that the SSM bit is cleared in the SDK initialization and set in mine. Since we’re not using the select line to select anything, it shouldn’t matter. It does matter that the NSS line is already set high before enabling the peripheral in coordinator mode. Per the RM, Section 14.2.2. Master Mode, p. 162:

"Configure the NSS pin, for example by setting the SSOE bit and letting the hardware set the NSS.  [I]t is also possible to set the SSM bit and set the SSI bit high.  To set the MSTR bit and the SPE bit, you need to make sure that the NSS is already high at this time."

And I know that it indeed does not work if the NSS is not set high before enabling the peripheral. The peripheral simply locks up with a “mode fault” error.

I added some code to print out the status register when a timeout occurs. I immediately see that it is always 0x0020, which means a “mode fault” has occurred. Here’s a list of things that can cause a mode fault on this peripheral:

When the SPI is operating in NSS pin hardware management mode, an external pull-down of the NSS pin occurs
in NSS pin software management mode, the SSI bit is cleared
the SPE bit is cleared, causing the SPI to be shut down
the MSTR bit is cleared and the SPI enters slave mode
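
The timeout code itself isn’t reproduced here, but a minimal sketch of the idea might look like the following. It assumes TXE is bit 1 and MODF is bit 5 of the status register (consistent with the 0x0002 and 0x0020 readings above); the function name, the result enum and the polling limit are all invented for the example. The status register is passed in as a pointer so the logic can be exercised on a PC.

```c
#include <stdint.h>

#define SPI_STATR_TXE  (1u << 1)   // transmit buffer empty
#define SPI_STATR_MODF (1u << 5)   // mode fault

typedef enum { SPI_OK, SPI_MODE_FAULT, SPI_TIMEOUT } spi_result_t;

// Poll the status register until TXE is set, a mode fault shows up,
// or 'max_polls' reads have gone by without either happening.
spi_result_t spi_wait_txe(const volatile uint16_t *statr, uint32_t max_polls) {
    for (uint32_t n = 0; n < max_polls; n++) {
        uint16_t s = *statr;                 // one read per pass
        if (s & SPI_STATR_MODF) {
            return SPI_MODE_FAULT;           // report the fault instead of spinning
        }
        if (s & SPI_STATR_TXE) {
            return SPI_OK;                   // safe to write the data register
        }
    }
    return SPI_TIMEOUT;                      // neither flag appeared in time
}
```

Checking MODF before TXE means a fault gets reported by name rather than showing up as an anonymous timeout.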

Perhaps noise on the otherwise un-initialized NSS line is triggering an intermittent mode fault? Looking back, I see I have, in my ignorance, not specified which NSS handling strategy (hardware vs software) to use when configuring the peripheral.

Setting the SPI_NSS field to ‘SPI_NSS_Soft’ (1) when performing the SPI initialization, we get the following setup profile when the application starts:

SPI_CTLR1 = 0xC354
SPI_CTLR2 = 0x0000
SPI_STATR = 0x0002

So now it matches my bit-wise initialization of the control register. Now it’s time to let it run on ‘The Gauntlet’, as I have named my alternate test setup, overnight, and see what we shall see.


CH32V003 driving WS2812B LEDs with SPI – Part 13

12 March 2025

Big news! New chips have dropped! Today I was able to order some of the very new and hitherto unobtanium CH32V002 and CH32V006 chips. Very similar to our dear friend the CH32V003, but with more SRAM (4 KB) and in the case of the -006, more pins and 8 KB SRAM. I also spied a new development board for the top-o-the-line CH32V317 chip. It’s got the -WC package, a 68 pin QFN. I’ll grab the -VC package (100 pins of LQFP100 goodness) as soon as they will sell me one.

Testing of the CH32V003 with the SPI-driven WS2812B LED continues, with promising results. So far, no faults have been detected. Having more working examples will help me figure out what is going on in the few cases where it consistently fails.

Let’s go back to some of the unfinished business from the last entry. I had been testing the new SVD files as processed by my Python script and converted into C language header files. I had noticed that some of the defined single-bit field bit masks did not correspond to any of the fields in the defined structures, so I wanted to go back and address that.

The simpler of the obvious things I should do is to include some additional commentary adjacent to these values so I at least know where they’re supposed to go. That worked as well as I would expect it to, and did not seem to introduce any difficulties to the process. Here are the single bit mask values for the PWR peripheral, mentioned previously:

// peripheral register single-bit values

#define PWR_PDDS    (1 << 1) // CTLR PDDS
#define PWR_PVDE    (1 << 4) // CTLR PVDE
#define PWR_PVDO    (1 << 2) // CSR PVDO
#define PWR_AWUEN   (1 << 1) // AWUCSR AWUEN

At least those comments will let me know where those values are supposed to be used.

Now if I can get it to emit bit field structure components for the registers that have only a single field, we should be done here (for now).

Let’s get those missing fields defined. We’re going to need every single one of them!

It seems my very vague recollection about ‘duplicate member’ errors was indeed what had been happening. Luckily for us, there are only a total of twelve name-space collisions in the whole system. Here is an example from the very first peripheral definition in the SVD, our old friend PWR:

union {
    uint32_t        AWUWR;  // 0x0C Automatic wake window comparison value register (PWR_AWUWR)
    struct {
        uint32_t    AWUWR:6;    // 0 AWU window value
    };
};

So both the register and the bit field within the register have the exact same name, as far as the SVD is concerned. I can take advantage of a pattern I have noticed in The Naming of Things here, in that register names often end in ‘R’, which I suppose stands for ‘Register’. That means that I can look for this pattern (register name is identical to field name and ends in ‘R’), and drop the trailing ‘R’ from the field name. Let’s see if that improves things at all.

It took some fiddlin’, as my Python skills are rudimentary at best, but for the one case where the name substitution would be the correct solution to the duplicate member error, it seems to work OK. That leaves seven more cases to deal with.

A single instance where the register name and the single field within it were the same, and yet not ending in ‘R’, was again found in our friend PWR, the Auto-wakeup Crossover Factor Register (PWR_AWUPSC). Here I elected to append the entirely arbitrary tag “_field” to the end of the field name. When referring to this setting, just read or write directly to the register instead of the field. Simpler that way.

That leaves six duplicates remaining. These fall into three groups. The first are the injected data readout registers for the ADC, all named IDATA (the RM names them ‘JDATA’). The second group are the ‘unique ID’ bit fields all named ‘U_ID’, and the third is just, I feel, bad naming in the flash controller, with a name collision between the similar Extended Key Register (FLASH_MODEKEYR) and BOOT Key Register (FLASH_BOOT_MODEKEYP), which both define their single, 32-bit wide fields with the same name: MODEKEYR.

And remember, this is just for the -003 chips. I’m sure there will be more issues like this when I eventually get to the ‘bigger’ chips with more peripherals.

I can only think of a couple of ways to deal with the ADC registers. I think the easiest and hopefully most reliable method would be to look for the known duplicate field ‘IDATA’, then amend it based on its uniquely named register, like this:

Register name   New field name
-------------   --------------
IDATAR1         IDATA1
IDATAR2         IDATA2
IDATAR3         IDATA3
IDATAR4         IDATA4

Hey, waddaya know? It worked! Can I get away with the same trick on the unique ID fields, as well? It seems that I can. Now we’re down to only one duplicate, at least on the -003: the ‘MODEKEYR’ field that somehow appears in both the extended key register and boot key register for the flash control peripheral. I think the only thing for this case is a special test, just for this particular combination.
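
The conversion script itself is Python and isn’t reproduced here. Purely to pin down the renaming rules described so far, here is the same logic sketched in C. The function name fix_field_name() is invented, and the U_ID and MODEKEYR cases are left out because their replacement names aren’t given above.

```c
#include <stdio.h>
#include <string.h>

// Write a de-duplicated field name into 'out' (capacity 'n' bytes).
void fix_field_name(const char *reg, const char *field, char *out, size_t n) {
    size_t len = strlen(field);

    if (strcmp(field, "IDATA") == 0 && strncmp(reg, "IDATAR", 6) == 0) {
        // IDATAR1..IDATAR4 -> IDATA1..IDATA4: borrow the register's trailing digit
        snprintf(out, n, "IDATA%s", reg + 6);
    } else if (strcmp(reg, field) == 0 && len > 1 && field[len - 1] == 'R') {
        // register and field share a name ending in 'R': drop the trailing 'R'
        snprintf(out, n, "%.*s", (int)(len - 1), field);
    } else if (strcmp(reg, field) == 0) {
        // same name, no trailing 'R': append the arbitrary "_field" tag
        snprintf(out, n, "%s_field", field);
    } else {
        // no collision: keep the field name as it is
        snprintf(out, n, "%s", field);
    }
}
```

With these rules, the AWUWR field inside the AWUWR register becomes AWUW, AWUPSC becomes AWUPSC_field, and a field such as PDDS in CTLR passes through untouched.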

And it solves the last remaining problem. However … as I was browsing through the script, I saw plenty of notes to Future Me, screaming and begging for help. There are still many things that need to be added and improved in this process. But for the moment, we can proceed.

Now you understand why I wanted to do the “simpler” assembly language version first. I’m eventually going to burrow even deeper than that, because of How Things Are. But now I should be able to hold up two examples of software, both ostensibly written in C, and pore over their disassembled guts and figure out why one works and the other one doesn’t.


CH32V003 driving WS2812B LEDs with SPI – Part 12

10 March 2025

Here’s an interesting data point that just came to my attention: the on-going experiment with the WCH-official development board for the CH32V003F4P6 device has hung up after 229,552,000,000+ loops. That’s 229 billion with a ‘b’. What caused the hang-up? Unclear.

I was about to shut down the experiment as I thought it was no longer providing any useful data. Well, I was wrong about that. However, it looks more like a testing apparatus failure than the ‘unit under test’ (UUT), as trying to reset the device provided no indication of resumption on the serial console. Unplugging the WCH-LinkE caused the serial terminal to disconnect, as it does, and re-starting the connection showed the recently-reset device counting its millions of loops again. A more self-contained diagnostic setup is certainly worth thinking about at this stage. I’ll just note that here and move on with the other experiments.

I’ll get back to dusting off my C-language framework for these devices now. We’ve got freshly-minted new header files describing all the peripheral registers and all the single-bit-wide settings therein. I’ll take a peek at the -003 support file first and see if everything looks correct.

The first peripheral defined in the SVD file is the PWR power control system. It mostly controls the low power modes, power monitoring facility and ‘automatic wake up’ function. There are only five registers implemented and those are sparsely populated.

There’s going to be a certain amount of “What was I thinking?” involved in this kind of archeology. I see the things that I know for sure should be present, such as the ‘structure of structures’ that I define for each different peripheral, as well as some of the single-bit fields and register addresses.

But a closer look at those single-bit fields has me scratching my head. There seems to be a disconnect between the laid-out format of the structures and the fields. There are values defined that do not appear in the structure at all. Here’s what I’m looking at:

//------------------------------------------------------------------------------
// PWR
//------------------------------------------------------------------------------

typedef volatile struct { // PWR Power control

    union {
        uint32_t        CTLR;   // 0x0 Power control register (PWR_CTRL)
        struct {
            const uint32_t  CTLR_reserved_0:1;  // 0 - reserved
            uint32_t    PDDS:1; // 1 Power Down Deep Sleep
            const uint32_t  CTLR_reserved_2:1;  // 2 - reserved
            const uint32_t  CTLR_reserved_3:1;  // 3 - reserved
            uint32_t    PVDE:1; // 4 Power Voltage Detector Enable
            uint32_t    PLS:3;  // 5 PVD Level Selection
        };
    };
    uint32_t            CSR;    // 0x04 Power control state register (PWR_CSR)
    uint32_t            AWUCSR; // 0x08 Automatic wake-up control state register (PWR_AWUCSR)
    uint32_t            AWUWR;  // 0x0C Automatic wake window comparison value register (PWR_AWUWR)
    uint32_t            AWUPSC; // 0x10 Automatic wake-up prescaler register (PWR_AWUPSC)

} PWR_t;

#define PWR ((PWR_t *) 0x40007000) // peripheral pointer

// peripheral register single-bit values

#define PWR_PDDS    (1 << 1)
#define PWR_PVDE    (1 << 4)
#define PWR_PVDO    (1 << 2)
#define PWR_AWUEN   (1 << 1)

// peripheral register addresses

#define PWR_CTLR (*((volatile uint32_t *) 0x40007000))
#define PWR_CSR (*((volatile uint32_t *) 0x40007004))
#define PWR_AWUCSR (*((volatile uint32_t *) 0x40007008))
#define PWR_AWUWR (*((volatile uint32_t *) 0x4000700c))
#define PWR_AWUPSC (*((volatile uint32_t *) 0x40007010))

I’m specifically talking about the PWR_PVDO and PWR_AWUEN fields. Why are they not broken out as bit fields within the structure for their enclosing registers?

Ah, now I remember. I made the executive decision to not break out fields if there were only one field within a register. This seemed to make sense for registers such as the USART_DATAR register, where the register and the bit field were effectively the same thing.

But in this case, for whatever reason, the two bit fields within the registers do not start at bit position 0. Additionally, there would be no way for me to refer to the field within the register without knowing which register it belonged to – which is something I was hoping to avoid, because it’s possible, if properly encoded.

Were there naming ambiguities between registers and bit fields? That sounds like the kind of problem that this ‘solution’ addresses.

Well, I can always go back into the script and omit the ‘single field omission’ conditional. But before I do that, there’s some more fundamental testing I can do on these new include files. For example, do you even compile, bro? But those single-bit values need comments describing where they belong, for sure.

The simplest test of this is to create a new project that has just one source file in it that includes the device header and has a main() function. As it won’t have any need (yet) for interrupt support or a proper C runtime package, it will only fail because there is no ‘start’ function declared, which is what the linker script says is the ‘entry point’ of the program. But it should compile, if not link properly. Here’s what I think it should look like:

// filename:  F4-test.c
// part of bare-metal C framework test project
// 10 March 2025 - Dale Wheat

#include "ch32v003.h"

void main(void) { // main program function

    while(true) { // an endless loop
    }
}

// F4-test.c [end-of-file]

Whereas, this is all that would actually be required:

#include "ch32v003.h"
void main(void) {}

But you know how I am about these things. Ask a writer to solve a problem and it’s likely that “more writing” will be included near the top of the list.

I borrowed a makefile from another project and made various modifications to it to fit the present needs. And the result is that the header file induces a couple of errors, which is both bad news as well as good news. It could have been so much worse.

It looks like there’s a couple of typos in the WCH-supplied SVD file. Within the definition of the Programmable Fast Interrupt Controller (PFIC), there is a register called PFIC Interrupt Enable Status Register 1 (PFIC_ISR1). It contains fields to indicate which interrupts are enabled. They are called:

INTENSTA2   IRQ2
INTENSTA3   IRQ3
INTENSTA12  IRQ12
INTENSTA14  IRQ14
INTENSTA16_31   IRQ16-IRQ31

Farther down the list is the PFIC Interrupt Pending Status Register 1 (PFIC_IPR1). It contains a similarly named set of fields to indicate which interrupts are currently pending, but where we should see ‘PENDSTA14’ and ‘PENDSTA31_16’, we see ‘INTENSTA14’ and ‘INTENSTA16_31’ repeated.

As a scribbler of codes myself, whose fingers already know how to both copy and paste all by themselves, I think I see how this might have happened. I will let WCH know about this. They were both very prompt and exceedingly polite when I addressed a potential typo in the reference manual. But I will wait until I have made sure there are no other similar issues to report.

So for the moment, I will correct my local copy of the SVD file in question and re-run the conversion script.

This overcomes the compilation error. I get a warning (not an error) about the missing start symbol:

ld: warning: cannot find entry symbol start; defaulting to 0000000000000000

The default of so many zeros will work quite nicely, I think. That’s exactly where I wanted it to go, anyway. And it produces an ELF file! Here’s the meaningful part of the output listing:

Disassembly of section .text:

00000000 <main>:

#include "ch32v003.h"

void main(void) { // main program function

    while(true) { // an endless loop
   0:   a001                    j   0 <main>
   2:   0000                    unimp

Which is perfect. An endless loop, as expressed in C as “while(true) {}” gets translated, as it should, to RISC-V assembly as “0: j 0” or “jump to address 0”. Since it was such a short jump, relatively speaking, the compiler even used the ‘compressed’ version of the ‘j’ instruction, taking up only 16 bits of program memory. So even though the file is reported to be four (4) bytes long, in truth only the first two are doing anything important. I can even flash it to the chip and check it in the debugger. So we’re certainly on track for Great Things at this point.
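
For the curious, the 0xa001 in the listing can be picked apart by hand. In the RISC-V compressed encoding, a c.j instruction has 101 in its top three bits and 01 in its bottom two, with the (scrambled) jump offset packed into bits 12 down to 2. The sketch below only checks those two things; the function names are made up, and it does not attempt to unscramble a non-zero offset.

```c
#include <stdint.h>
#include <stdbool.h>

// True if a 16-bit instruction parcel is a compressed jump (c.j):
// funct3 (bits 15:13) = 101 and quadrant (bits 1:0) = 01.
bool is_c_j(uint16_t insn) {
    return (insn & 0xE003u) == 0xA001u;
}

// True if every offset bit (bits 12:2) is zero, i.e. the jump targets itself.
bool c_j_offset_is_zero(uint16_t insn) {
    return (insn & 0x1FFCu) == 0u;
}
```

Both checks pass for 0xa001, which is exactly a jump-to-self. The 0x0000 that follows it in the listing is not a jump at all, which is why the disassembler labels it ‘unimp’.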

Eliminating the “ENTRY(start)” command in the linker script gets rid of the warning. Per the GNU documentation for the linker found at:

https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_24.html

ENTRY is only one of several ways of choosing the entry point. You may indicate it in any of the following ways (shown in descending order of priority: methods higher in the list override methods lower down).

the `-e' entry command-line option;
the ENTRY(symbol) command in a linker control script;
the value of the symbol start, if present;
the address of the first byte of the .text section, if present;
The address 0.

As far as I know at the moment, the entry point for these devices will always be address 0, so we should be good. I prefer to be specific about these things, when I can, and not trust assumptions that might change in the future, as they often do. I could specify the entry point as a command line parameter to the linker, but I would normally rather have it in a document of some kind, such as the linker script. When we’re done setting up this framework, I’ll have decided one way or the other about this issue.

So do we have enough machinery in place to blink an LED? Let’s find out.

First, we have to enable the peripheral clock for the GPIO port where the LED has been installed. I’m using PA1, which is bit position 1 of GPIO port A. The GPIO ports are all on the PB2 bus, so we just set the IOPAEN clock enable bit in the PB2 Peripheral Clock Enable Register (RCC_APB2PCENR). It should only take this much C code to do this:

RCC->IOPAEN = ENABLE;

Since the definition for the RCC maps out all the fields, and because all the fields have unique names, we, as lazy human programmers, need not keep up with which register is which, and can just wave in the general vicinity of the peripheral in question. I remembered it was in the RCC, and that was enough. I also remembered that the field was called IOPAEN, a mnemonic for “input output port A enable”. Additionally, I have taken the liberty of #define’ing the binary values ENABLE (1) and DISABLE (0) in the generated header file. I find that this family of chips largely uses a 1 to turn things on and a 0 to turn things off. This is, sadly, not universally true with other manufacturers. Good job, WCH! There are a few other goodies packed in there as well, which I’ll describe by and by.

Step two on the journey to blinking an LED is to configure the now-clocked GPIOA, or at least the pin we want. Do you remember my ‘cheat sheet’ of GPIO initialization codes? It comes in very handy for this kind of thing. The code I want is for a push-pull output with a maximum output frequency of 2 MHz. Why that exact frequency? It happens to be the slowest one available, the other choices being 10 MHz and 30 MHz. It’s an LED that we are going to be looking at with our human eyes, not a microwave signal being sent to outer space. The code for that is ‘2’. Now we just place that code in the right position. Each pin gets its own four-bit field (one hex digit) in the register, so PA1 goes in digit position 1, PA2 in digit position 2, etc. Or we could just initialize all eight positions at once, even though we have Scientifically Proven that there are only two bits implemented in this port. The code looks like this:

GPIOA->CFGLR = 0x88888828;

All those 8s represent the setting for all the other bits, which is ‘input with pull-up or pull-down resistors’. This is the setting that uses the least amount of power, which will become more important once we need to put the chip to sleep when it needs to wait for something interesting to happen.
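
If counting hex digits by eye feels error-prone, the same value can be built with a shift and a mask. This is a small sketch that runs on a PC; the helper name cfglr_with_pin() is made up, and the only thing it relies on is the one-hex-digit-per-pin layout.

```c
#include <stdint.h>

// Replace the 4-bit configuration code for one pin (0..7) in a CFGLR value.
uint32_t cfglr_with_pin(uint32_t cfglr, unsigned pin, uint32_t code) {
    unsigned shift = pin * 4u;            // each pin owns one hex digit
    cfglr &= ~(0xFu << shift);            // clear the old code
    cfglr |= (code & 0xFu) << shift;      // drop in the new one
    return cfglr;
}
```

Starting from all inputs (0x88888888), cfglr_with_pin(0x88888888, 1, 0x2) produces the 0x88888828 used above.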

Now another way to do this would be to access the individual MODE and CNF fields for this GPIO pin and assign them their proper values, like this:

GPIOA->MODE1 = GPIO_MODE_OUTPUT_2MHz;
GPIOA->CNF1 = GPIO_CNF_PUSH_PULL;

But this requires some enumerated values that I haven’t bothered to put into the collection just yet, as well as much more code. It looks like it’s just two writes instead of one, as in the previous example, but since the compiler is granting our wish to deal with embedded device registers intelligently by using predefined bit fields, it’s going to involve a read-modify-write cycle on each field, along with all the bit shifting and masking that is required to do that.
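
To make that hidden work a little more concrete, here is a cut-down model of the register that can be compiled on a PC. Everything in it is an assumption for illustration: the type name cfglr_t, the two enumerated values (2 for a 2 MHz output, 0 for push-pull), and the reliance on GCC-style bit-field ordering, where the first field declared lands in the lowest bits.

```c
#include <stdint.h>

// Cut-down model of GPIOx_CFGLR: only pins 0 and 1 are broken out.
typedef union {
    uint32_t CFGLR;                    // the whole register
    struct {
        uint32_t MODE0:2, CNF0:2;      // pin 0
        uint32_t MODE1:2, CNF1:2;      // pin 1
        uint32_t rest:24;              // pins 2..7, not broken out in this sketch
    };
} cfglr_t;

// Hypothetical enumerated values; the generated header does not define these yet.
enum { GPIO_MODE_OUTPUT_2MHz = 2, GPIO_CNF_PUSH_PULL = 0 };

// Configure pin 1 through the bit fields and return the resulting register value.
uint32_t configure_pin1(uint32_t initial) {
    cfglr_t r = { .CFGLR = initial };
    r.MODE1 = GPIO_MODE_OUTPUT_2MHz;   // load, mask, OR, store
    r.CNF1  = GPIO_CNF_PUSH_PULL;      // ...and the same again for the second field
    return r.CFGLR;
}
```

Starting from 0x88888888, the two field writes arrive at the same 0x88888828 as the single whole-register write, just with more instructions along the way.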

Now everything is set up properly and we can just blink that LED all we want to. Add this code inside the inner-most while() loop:

while(true) { // an endless loop
    GPIOA->ODR1 = ENABLE; // LED on
    GPIOA->ODR1 = DISABLE; // LED off
}

‘ODR1’ is the bit field corresponding to pin 1 of the output data register, or OUTDR. Setting it to ENABLE is the same as writing a 1 to it, which turns on the LED, as I have wired it up in the ‘active high’ configuration. Similarly, writing the DISABLE value of 0 turns it off.

You might be surprised to find when you run this program that the LED just comes on and stays on. Well, that’s a bit of an optical illusion. It’s actually blinking so fast you can’t see it. Try running it within the debugger and step through each program statement one at a time and you’ll see the expected behavior.

We can add a little loop in between the LED commands to slow it down. How about counting to a million? How long should that take? Here’s what the code would look like:

while(true) { // an endless loop
    GPIOA->ODR1 = ENABLE; // LED on
    for(uint32_t i = 0; i < 1000000; i++); // short delay
    GPIOA->ODR1 = DISABLE; // LED off
    for(uint32_t i = 0; i < 1000000; i++); // short delay
}

I’m seeing almost one second on and almost one second off. Those for() loops each create a new 32 bit unsigned integer variable called ‘i’, set it to zero initially, then increment it until it is no longer less than one million. Pretty quick!

Again, the compiler is doing some heavy lifting for us in the background here. Using the bit fields for the individual pins within the GPIO port has it reading, masking, OR’ing or perhaps AND’ing, as required, then finally writing for each transition. The chip itself has a more elegant way to address this frequently-occurring need.

In addition to the output data register, OUTDR, each of the GPIO ports has both a ‘bit set and reset’ register as well as a ‘bit clear’ register. Writing a 1 to any of the lower 8 bits of the BSHR register will set those bits, and only those bits, to 1. Writing a zero there does nothing, and leaves alone whatever is already there. Handy! The ‘lower byte of the upper half’ (?) of the BSHR register, bits 16 through 23, does the opposite: Any 1s written there will ‘reset’ that individual bit, and again any zeros written are ignored.

The BCR or ‘bit clear’ register does the same thing as that upper half of BSHR. Writing a 1 to a bit position clears that bit in the OUTDR and leaves the others intact. Why have two registers that do the same thing? You’re asking the wrong person. It’s not a ‘wrong question’; I genuinely don’t know the answer. You’ll find the exact same thing on the STM32 devices, so go figure.
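
The behavior of these two registers is easy to model on a PC, which makes the ‘no read-modify-write needed’ point a bit more tangible. The function names below are invented, and the model assumes that when the same pin is both set and reset in a single BSHR write, the set wins; that detail is an assumption here and should be checked against the RM.

```c
#include <stdint.h>

// Model of writing 'value' to BSHR on an 8-bit port:
// bits 7:0 set output bits, bits 23:16 reset them, zeros are ignored.
uint32_t write_bshr(uint32_t outdr, uint32_t value) {
    uint32_t set   = value & 0xFFu;
    uint32_t reset = (value >> 16) & 0xFFu;
    return (outdr & ~reset) | set;     // assumption: set takes priority over reset
}

// Model of writing 'value' to BCR: every 1 clears the matching output bit.
uint32_t write_bcr(uint32_t outdr, uint32_t value) {
    return outdr & ~(value & 0xFFu);
}
```

On the chip itself each of these is a single store, so turning the LED on PA1 on or off would come down to one write of (1 << 1) to the set half of BSHR or to BCR, with no masking of the other pins required.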

So now we have a blinky example program that takes up all of 100 bytes. This is with the compiler’s ‘optimization’ setting of ‘for debug’, so it could probably go lower.

If we can blink an LED, what keeps us from configuring the SPI port and controlling a WS2812B addressable LED? Not much. Let’s do it.

The system clock was running at ~8 MHz for our very simple LED blinky test program. That’s what you get right after a reset with these chips. The internal HSI is running at ~24 MHz and is being divided by three (3) by the HPRE prescaler.

Here is the code to get it to run at 48 MHz, using the built-in PLL to double the HSI frequency:

RCC->HPRE = RCC_HPRE_1; // disable HCLK prescaler
RCC->PLLSRC = RCC_PLLSRC_HSI; // select HSI as PLL input
RCC->PLLON = ENABLE; // enable PLL
RCC->SW = RCC_SW_PLL; // select PLL as system clock, once it locks

Waiting for the PLL to lock is quite a bit simpler, from a coding standpoint, using this framework:

while(RCC->SWS != RCC_SWS_PLL) {
    // wait for PLL to lock
}

This while() loop just checks the system clock status bits to see when they eventually change over to the PLL, which will happen after the PLL locks. It will wait forever, if necessary, but it’s almost always a microscopically short time. Measure it, if you like. Let me know what you find.
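
The clock arithmetic is simple enough to do in your head, but here it is written down as a sketch anyway. The function names are made up; the numbers (a ~24 MHz HSI, a x2 PLL, and the HPRE divider) are the ones quoted above, and the SPI divider of 8 is the FCLK/8 setting used for the WS2812B.

```c
#include <stdint.h>

// System clock in Hz from the HSI frequency, whether the x2 PLL is selected,
// and the HPRE divider (1 means the prescaler is effectively off).
uint32_t sysclk_hz(uint32_t hsi_hz, int use_pll, uint32_t hpre_div) {
    uint32_t source = use_pll ? hsi_hz * 2u : hsi_hz;
    return source / hpre_div;
}

// SPI bit clock in Hz for a given system clock and baud rate divider.
uint32_t spi_clock_hz(uint32_t sys_hz, uint32_t br_div) {
    return sys_hz / br_div;
}
```

Right after reset that gives 24 MHz / 3 = 8 MHz, and with the PLL selected and the prescaler disabled it gives 48 MHz, which divided by 8 is a 6 MHz SPI bit clock.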

Let’s initialize the SPI peripheral, again. Here’s what we need to get the setup we require for our special application of its unique talents. First, remember to enable GPIOC, which will be hosting our SDO on PC6. We configure it to be ‘output push-pull multiplexed 10 MHz max’:

RCC->IOPCEN = ENABLE; // enable GPIOC peripheral clock
GPIOC->CFGLR = 0x89888888; // PC6/SDO

Next, enable the SPI peripheral clock by setting bit SPI1EN somewhere, we don’t care where, within the RCC:

RCC->SPI1EN = ENABLE; // enable SPI peripheral clock

Then there’s just a short list of bits to flip in the SPI control register, and away we go:

SPI1->BIDIMODE = ENABLE;
SPI1->BIDIOE = ENABLE;
SPI1->SSM = ENABLE;
SPI1->SSI = ENABLE;
SPI1->BR = SPI_BR_8;
// note:  only now can we set these bits
SPI1->MSTR = ENABLE;
SPI1->SPE = ENABLE;

Again, the framework knows which bits are in which register, so we don’t have to.
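
As a cross-check, the same settings can be OR-ed together by hand to see what the control register should contain once all seven writes have landed. The macro names here are invented for the sketch, the bit positions follow the RM’s CTLR1 layout, and the BR code of 2 for FCLK/8 is assumed to be what SPI_BR_8 stands for.

```c
#include <stdint.h>

// Bit positions in SPI_CTLR1 (macro names are made up for this sketch)
#define CTLR1_MSTR      (1u << 2)
#define CTLR1_BR(n)     ((uint32_t)(n) << 3)
#define CTLR1_SPE       (1u << 6)
#define CTLR1_SSI       (1u << 8)
#define CTLR1_SSM       (1u << 9)
#define CTLR1_BIDIOE    (1u << 14)
#define CTLR1_BIDIMODE  (1u << 15)

// The value CTLR1 should hold after the field-by-field initialization above.
uint16_t spi_ctlr1_value(void) {
    return (uint16_t)(CTLR1_BIDIMODE | CTLR1_BIDIOE | CTLR1_SSM | CTLR1_SSI
                      | CTLR1_BR(2) | CTLR1_MSTR | CTLR1_SPE);
}
```

The result is 0xC354. Writing that value in one go would not respect the ‘only now’ ordering noted in the comment, so it is better used as something to compare against in the debugger than as a replacement for the sequence.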

One critical thing, among a list of other critical things that a proper C framework should do for us, that is not being handled (yet) is setting up the stack pointer. My previous code just happened to work because previous programs had set the stack pointer to the end of SRAM, 0x20000800, and there it remained, until I deliberately unplugged it and plugged it back in again to see what the stack pointer would be. And it was some random number, not pointing anywhere near the SRAM area at all. The code actually worked up to the point of the little delay loops, mostly because we were not calling any functions. Once the compiler tried to set up the ‘automatic variable’ i within the scope of each for() loop, it was reading and writing to No Where. This is considered a Bad Thing.

So let’s fix that by setting up the stack pointer. This ‘ought’ to be done in the C-runtime (which doesn’t yet exist) along with things like initializing variables and possibly setting up the system clock.

The C language, by itself, has no way to know how to set up the stack pointer on this chip, has no way to directly access any of the registers or CSRs, or anything like that. It’s intended to be ‘platform agnostic’ as far as that is possible. The GNU C Compiler Collection, on the other hand, has some extensions that let these things happen. We’ll use one now to set up the stack pointer.

    __asm__("la sp, 0x20000800"); // initialize stack pointer to end of SRAM

The ‘la’ instruction is actually a pseudo-instruction meaning ‘load address’. Now that’s a hard-wired ‘magic number’, if I ever saw one. And it will change the second we move to a different chip within the family that has a different amount of SRAM. Let’s fix that by referring to a symbol that’s ‘calculated’ in the linker script, called, so imaginatively, ‘end_of_RAM’:

__asm__("la sp, end_of_RAM"); // initialize stack pointer to end of SRAM

And that works, too, plus it gives us a clue as to where that value came from and what it means.

Now we can call functions! Well, almost, but we’ll fix that in a second. Let’s just splice in the already-working code from way back there.

void spi_send(uint8_t data) { // send 8 bit data via SPI

    while(SPI1->TXE == DISABLE) {
        // wait for transmit register to be empty before transmitting
    }

    SPI1->DATAR = data; // send data
}

void ws2812b_rgb(uint8_t red, uint8_t green, uint8_t blue) { // send RGB data to the WS2812B LED

    uint8_t i; // iterator

    for(i = 0; i < 8; i++) { // send 8 bits, MSB first
        spi_send(green & 0x80 ? 0x7E : 0x60); // send a one or a zero, depending
        green <<= 1; // shift all the bits in the byte
    }

    for(i = 0; i < 8; i++) { // send 8 bits, MSB first
        spi_send(red & 0x80 ? 0x7E : 0x60); // send a one or a zero, depending
        red <<= 1; // shift all the bits in the byte
    }

    for(i = 0; i < 8; i++) { // send 8 bits, MSB first
        spi_send(blue & 0x80 ? 0x7E : 0x60); // send a one or a zero, depending
        blue <<= 1; // shift all the bits in the byte
    }
}

void ws2812b_reset(void) { // hold WS2812B LED data line low for ~ 50 us

    spi_send(0x00); // send a zero
    for(uint32_t i = 0; i < 500; i++); // hold at least 50 us
}
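
Where do the 0x7E and 0x60 patterns come from? Each SPI byte stands in for one WS2812B bit, and the run of 1s at the front of the byte sets how long the data line stays high. The sketch below works out that high time for a given SPI bit clock. The function names are invented, and the 6 MHz figure assumes the 48 MHz system clock divided by 8; compare the results against the LED’s datasheet rather than taking any timing limits from here.

```c
#include <stdint.h>

// Length of the first run of 1 bits in a byte, scanning MSB first.
// This is the number of SPI bit periods the data line is held high.
unsigned high_bits(uint8_t pattern) {
    unsigned count = 0;
    int started = 0;
    for (int bit = 7; bit >= 0; bit--) {
        if (pattern & (1u << bit)) {
            started = 1;               // inside the high pulse
            count++;
        } else if (started) {
            break;                     // pulse has ended
        }
    }
    return count;
}

// Duration of that high pulse in nanoseconds at a given SPI bit clock.
uint32_t high_time_ns(uint8_t pattern, uint32_t spi_hz) {
    return (uint32_t)((uint64_t)high_bits(pattern) * 1000000000u / spi_hz);
}
```

At 6 MHz each SPI bit lasts about 167 ns, so 0x7E holds the line high for six bit periods (1000 ns) and 0x60 for two (333 ns).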

Now the problem with having more than one function in a program, i.e., no longer main() and only main(), is that now the compiler has to guess which one comes first in the binary image. It had better be main()! Well, it wasn’t. So I put back the ‘ENTRY(start)’ in the linker script and created a new function called start(). I moved the stack pointer initialization code to start(), then added some mumbo-jumbo (they are actually called “function attributes”) to make the compiler understand what I was doing. Here’s what it ended up looking like:

void start(void) __attribute__((naked, noreturn, section(".start")));
void start(void) { // what passes for a C-runtime

    __asm__("la sp, end_of_RAM"); // initialize stack pointer to end of SRAM
    __asm__("j main"); // continue to main()
}

So now the compiler knows that the start() function is ever so special, truly it is. It is what is known as ‘naked’, in that it has no procedural prologue or epilogue automatically added to it and has no built-in ‘return’ instruction appended to the end. It is also marked as ‘noreturn’, which simply means that it doesn’t return in any normal way, which it most certainly does not. There’s also that bit about it belonging to section “.start”, which is a special section that I invented and described in the linker script. It comes before the “main” part of the program code, so that’s how the linker knows to put that first in the binary image.

I also added a ‘jump’ instruction to the end of start() to tell it to jump to the main() function.

So now the program boots into the start() function, sets up the stack pointer and then jumps to the main function. I added a call to ws2812b_rgb() that sets the blue LED on at its minimum level when it turns on the other LED (which happens to be blue) and then sets all the internal LEDs to black when it turns off the other LED. And it just works.

I didn’t even bother calling the ws2812b_reset() function as there was enough time between LED togglings for it to get the idea.

So ideally I will move all this extraneous “support” scaffolding into a separate file and add that to the project makefile. I have been thinking about calling it ‘system.c’ and giving it its very own header file, ‘system.h’. My previous scheme had a handful of different files and it ended up making it quite difficult to create a completely new project, so I ended up coding up a bit of automation for that as well. If a little software is good, then a lot more is better, right?

To give it a good test overnight, I took out the human-perceivable delay and replaced it with the ws2812b_reset() function, which we need now. It’s back to just under 12,000 transmissions per second. I also left the other LED “blinking”, but at ~6 kHz it’s just a dim blur. I did add a line to the spi_send() function to turn off the LED while it was waiting for the TXE bit to be set. If it hangs there, I’ll see that there’s no dim blue blur, and be able to check the waveform on the oscilloscope to be doubly sure. Let’s see how well it does after a few million (or billion) cycles.


CH32V003 driving WS2812B LEDs with SPI – Part 11

8 March 2025

Testing proves testing works. And the testing of the new bare-metal WS2812B LED driver overnight proves that the new bare-metal WS2812B LED driver works, as well. Running just over 12,000 updates per second, it has been running for approximately 16 hours with no unexplainable hang-ups. That’s around 700 million error-free transmissions, which is a lot. So that’s not the problem.

I will take this opportunity to bring some closure to one of the original project design goals, which was to have a nice little demo program showing the LED changing colors, as well as being able to adjust the apparent brightness of the LED in real time.

I had already connected a small potentiometer on the little solderless breadboard, configured as a voltage divider, and attached it to one of the analog-to-digital converter (ADC) input pins. I would like to use that potentiometer as a dial to adjust the LED brightness.

There are several steps required to get the ADC initialized and ready to read analog values. First, we have to configure the ADC prescaler in the RCC’s Clock Configuration Register 0, RCC_CFGR0. Right now, it’s being set to “HBCLK divided by 2”, which is both the default value after reset and the value that I write to the register during initial setup (by omission). As the “HBCLK” is currently 48 MHz, or will be once the PLL locks, the ADC clock will be 24 MHz, which we are cautioned is the maximum rate. So I’ll leave it like that for now.

I should note here that I am assigning new enumerated values with their ‘absolute’ values within their respective register, and not the numeric value they would have if we were accessing them through a C-style bit-field within a structure. For example, the SW field starts at bit position 0 and is two bits wide. It has four possible values, three of which are assigned. Being at bit position 0 within the register, the absolute and relative values are the same. But the very next field, SWS, starts at bit position 2 and is also two bits wide. The relative values are the same, but the absolute values are shifted left two places:

# RCC_CFGR0/SW values
RCC_SW_HSI      = 0x00000000 # HSI
RCC_SW_HSE      = 0x00000001 # HSE
RCC_SW_PLL      = 0x00000002 # PLL

# RCC_CFGR0/SWS values
RCC_SWS_HSI     = 0x00000000 # HSI
RCC_SWS_HSE     = 0x00000004 # HSE
RCC_SWS_PLL     = 0x00000008 # PLL

So I can just use them as they are and not have to worry about shifting them around to be in the right place when I need them. I typically bit-wise ‘or’ all the values together to come up with a single value to write to the register in question. That way I can set several bits or bit fields at once with a single write.

The include file generator script also produces bit masks for each of the register bit fields, e.g.:

RCC_SW = 0b00000000000000000000000000000011 # SW - System clock Switch, pos=0, width=2
RCC_SWS = 0b00000000000000000000000000001100 # SWS - System Clock Switch Status, pos=2, width=2

These are useful for masking out everything but the bit field contents, when you need just that information out of a register.
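As a rough illustration of how the pre-shifted values and the field masks work together, here is a minimal C sketch. The numeric values are the ones listed above; the two helper functions are hypothetical names invented for this example, and they operate on a plain register image rather than the real RCC_CFGR0.

```c
#include <stdint.h>

/* Absolute (pre-shifted) values and masks, as listed in the text. */
#define RCC_SW       0x00000003u  /* SW field mask,  pos=0, width=2 */
#define RCC_SWS      0x0000000Cu  /* SWS field mask, pos=2, width=2 */
#define RCC_SW_PLL   0x00000002u  /* SW  = PLL */
#define RCC_SWS_PLL  0x00000008u  /* SWS = PLL */

/* Hypothetical helper: replace the SW field in a copy of the register.
   No shifting needed, because sw_value is already in position. */
static uint32_t rcc_select_clock(uint32_t cfgr0, uint32_t sw_value)
{
    return (cfgr0 & ~RCC_SW) | sw_value;
}

/* Hypothetical helper: nonzero once SWS reports the PLL as clock source.
   The mask strips everything except the SWS field before comparing. */
static int rcc_pll_is_active(uint32_t cfgr0)
{
    return (cfgr0 & RCC_SWS) == RCC_SWS_PLL;
}
```

Because both the value and the mask are already shifted into place, the comparison works on the raw register contents without any further bit arithmetic.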

Next we have to enable the ADC’s peripheral clock. As a PB2 peripheral, I can just add its enable bit to the setup that’s already being done for the other peripherals:

# enable required peripheral clocks

    li x4, RCC_USART1EN | RCC_SPI1EN | RCC_ADC1EN | RCC_IOPCEN
    sw x4, RCC_APB2PCENR(x3)

Then we should configure the input pin that is to be used as the ADC input. PC2 is analog input 2 (A2). I added a comment to the GPIOC initialization code and changed the configuration word being written to the CFGLR configuration register for GPIOC:

# GPIOC

#   PC2/A2 - analog input 2
#   PC6 - SPI data out

la x3, GPIOC_BASE
li x4, 0x89888088
sw x4, GPIO_CFGLR(x3)

That very magic-looking number is actually easy to understand once you see my ‘cheat sheet’ of all the possible configuration values for these pins:

0   analog input
1   output push-pull 10 MHz max
2   output push-pull 2 MHz max
3   output push-pull 30 MHz max
4   floating input mode (no pull up or down) - default
5   output open drain 10 MHz max
6   output open drain 2 MHz max
7   output open drain 30 MHz max
8   input with pull up or down
9   output push-pull multiplexed 10 MHz max
A   output push-pull multiplexed 2 MHz max
B   output push-pull multiplexed 30 MHz max
C   reserved
D   output open drain multiplexed 10 MHz max
E   output open drain multiplexed 2 MHz max
F   output open drain multiplexed 30 MHz max

Each hexadecimal digit represents one pin of the GPIO port, with pin 0 in the rightmost digit. On other device families that offer 16 or more pins per port, there are additional configuration registers, but they use the same format. The ‘0’ digit that you see in the third-to-last position corresponds to the ‘analog input’ configuration from the table and sets up PC2 as an analog input.
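To make the digit-per-pin layout concrete, here is a small C sketch that packs eight 4-bit pin configurations into one configuration word. The function names are made up for this example; the per-pin values are read straight off the 0x89888088 word used above.

```c
#include <stdint.h>

/* Pack eight 4-bit pin configurations into one CFGLR-style word.
   cfg[0] is pin 0 and lands in the least significant hex digit. */
static uint32_t gpio_cfglr_word(const uint8_t cfg[8])
{
    uint32_t word = 0;
    for (int pin = 0; pin < 8; pin++) {
        word |= (uint32_t)(cfg[pin] & 0x0Fu) << (4 * pin);
    }
    return word;
}

/* The GPIOC setup from the text: PC2 analog input (0), PC6 multiplexed
   push-pull output at 10 MHz (9), every other pin input with pull (8). */
static uint32_t gpioc_example(void)
{
    const uint8_t cfg[8] = { 0x8, 0x8, 0x0, 0x8, 0x8, 0x8, 0x9, 0x8 };
    return gpio_cfglr_word(cfg);
}
```

Calling gpioc_example() reproduces the 0x89888088 value from the listing.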

Then we can start to configure the ADC itself. Now the ADC on this little chip is quite versatile and talented. It’s quite similar to the ADCs available on the STM32 devices, if you are familiar with those. It’s way more complex than, for example, the Atmel (now Microchip) AVR devices.

Most of the initialization process will be to tell the ADC peripheral what we don’t want. Like I said, it’s got a lot of features and I’m not even going to scratch the surface of what all it can do. I just want to take a single reading from a single channel every once in a while.

The first step is to power on the ADC module, using the ADON bit in the ADC Control Register 2 (ADC_CTLR2). Then begins a ‘module stabilization time’, t(stab), of typically 7 us. We’ll give it 10 us because we’re generous.

As I mentioned previously, it’s more effort to tell the ADC not to do stuff. Next, we have to tell it that we only want one channel to be converted, and that channel is A2.

After that, there is a calibration routine that mostly runs itself. First, we reset the calibration register by writing a 1 into the RSTCAL bit of the ADC_CTLR2 register, then we wait for it to clear itself, signaling that it has completed the reset. Then we do the exact same thing but with the CAL bit, and then the ADC is done calibrating itself.

After that, the ADC is ready to do its thing. Here is the complete initialization code:

# initialize ADC

    la x3, ADC1_BASE
    li x4, ADC1_ADON
    sw x4, ADC1_CTLR2(x3) # module power on

    li a0, 10
    call delay_us # module stabilization time (>7 us)

    li x4, (0 << 20)
    sw x4, ADC1_RSQR1(x3) # L = 0: total of 1 conversion requested (the L field holds count - 1)
    li x4, 2
    sw x4, ADC1_RSQR3(x3) # 1st regular conversion channel is A2

    li x4, ADC1_RSTCAL | ADC1_ADON
    sw x4, ADC1_CTLR2(x3) # reset calibration register

1:  lw x4, ADC1_CTLR2(x3)
    andi x4, x4, ADC1_RSTCAL
    bnez x4, 1b # wait for RSTCAL to go back to zero when reset is complete

    li x4, ADC1_CAL | ADC1_ADON
    sw x4, ADC1_CTLR2(x3) # start calibration function

1:  lw x4, ADC1_CTLR2(x3)
    andi x4, x4, ADC1_CAL
    bnez x4, 1b # wait for CAL to go back to zero when calibration is complete

And here is a simple function to start a conversion, wait for it to complete, then return the converted value:

adc_convert: # perform single conversion

    # on entry: none
    # on exit: a0[9..0] conversion result

    # register usage:
    #   x3:  pointer to ADC1_BASE
    #   x4:  read status register

    addi sp, sp, -16 # allocate space on stack
    sw ra, 12(sp) # preserve return address
    sw x3, 8(sp) # preserve x3
    sw x4, 4(sp) # preserve x4

    la x3, ADC1_BASE
    li x4, ADC1_ADON
    sw x4, ADC1_CTLR2(x3) # start conversion

1:  lw x4, ADC1_STATR(x3)
    andi x4, x4, ADC1_EOC
    beqz x4, 1b # wait for conversion to complete

    lw a0, ADC1_RDATAR(x3) # read conversion result

    lw ra, 12(sp) # restore return address
    lw x3, 8(sp) # restore x3
    lw x4, 4(sp) # restore x4
    addi sp, sp, 16 # restore stack pointer

    ret # return from function

Bear in mind that this is a 10 bit wide ADC, and we only really want an 8 bit range for the LED brightness. Scaling the value to fit the range just takes a single “shift right logical” instruction. Once we have a scaled value available that represents the position of the “LED brightness” dial, we can use that as the color intensity value that we’re sending to the LED.
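The scaling step is small enough to show in one line of C. This is only a sketch of the arithmetic; the function name is invented here, and the mask is an extra safety measure that assumes only the low 10 bits of the result are meaningful.

```c
#include <stdint.h>

/* Scale a 10-bit ADC reading (0..1023) to an 8-bit brightness (0..255)
   by discarding the two least significant bits. */
static uint8_t adc_to_brightness(uint32_t adc_reading)
{
    return (uint8_t)((adc_reading & 0x3FFu) >> 2);
}
```

A full-scale reading of 1023 maps to 255, and a mid-scale reading of 512 maps to 128.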

So I finally have a blinking, multi-color LED whose brightness I can adjust in real time. Who says dreams don’t come true?


CH32V003 driving WS2812B LEDs with SPI – Part 10

7 March 2025

Today I find the WCH -F4P6 dev board has clocked over 35 billion loops without hanging up.

The STK system timer is available in all the CH32V devices. On the QingKe V2 devices, such as our -003 test subject, it is a 32 bit counter that can count up or down and can trigger an interrupt when it hits a particular value. This makes it very useful for basic timing tasks as well as providing periodic interrupts or a measure of uptime. The STK on the QingKe V4 devices is 64 bits long.

There’s not a lot needed to initialize the STK as there just aren’t that many options. One choice is whether to use the system clock directly as its clock source, or to divide it by eight. We’re only going to be using it to measure an approximately 50 microsecond pulse, and it doesn’t have to be excruciatingly precise. I’ll use the prescaled clock as the timing source.

Since the only other options are to have it trigger an interrupt or compare the current count to a value, which I’m not needing at the moment, that’s the only configuration bit in the STK_CTLR control register that I will need to set, other than the “STE” system timer enable control bit.

Time to add more enumerated values to my collection in my CH32V003.h header file:

# STK - System Timer

STK_STE     = (1 << 0) # STK enable
STK_STIE    = (1 << 1) # interrupt enable
STK_STCLK   = (1 << 2) # clock source selection
STK_STRE    = (1 << 3) # auto-reload counter enable
STK_SWIE    = (1 << 31) # software interrupt trigger

# STK_STCLK values

STK_STCLK_HCLK_8 = (0 << 2) # clock source is HCLK / 8
STK_STCLK_HCLK   = (1 << 2) # clock source is HCLK

The code to initialize the STK is pretty simple:

# initialize STK - clock = HCLK/8 = 6 MHz

    la x3, STK_BASE
    li x4, STK_STCLK_HCLK_8 | STK_STE
    sw x4, STK_CTLR(x3)

Technically, we can omit the STK_STCLK_HCLK_8 parameter, as it is a zero, but I like to include it to make my intention clearer to Future Me.

The delay_us function just needs to take the requested number of microseconds, as passed into it via function argument register a0, multiply it by six, as there are six STK timer clock cycles per microsecond, then add that time duration to the current time, as represented by the value in the STK_CNTL register.

The function then loops until the current timer count is no longer less than the calculated ‘future time’.

I also added a quick exit in the case of the caller asking for a zero microsecond delay. We’ll still be late getting back, but not as late as if we went ahead and preserved all the registers, etc.

Here is the code for the delay_us function:

delay_us: # delay in microseconds

    # on entry: a0 delay time in microseconds
    # on exit: none

    # register usage:
    #   x3:  pointer to STK_BASE
    #   x4:  calculated end time
    #   x5:  read timer count

    STK_TICKS_PER_MICROSECOND = ((HCLK / 1000000) / 8)

    beqz a0, 9f # exit on 0 microsecond request

    addi sp, sp, -16 # allocate space on stack
    sw ra, 12(sp) # preserve return address
    sw x3, 8(sp) # preserve x3
    sw x4, 4(sp) # preserve x4
    sw x5, 0(sp) # preserve x5

    la x3, STK_BASE

    # calculate future end time

    slli x4, a0, 1 # x4 = a0 * 2
    slli x5, a0, 2 # x5 = a0 * 4
    add x4, x4, x5 # x4 = x4 + x5
    lw x5, STK_CNTL(x3) # read current timer count
    add x4, x4, x5

1:  lw x5, STK_CNTL(x3) # read system timer count
    blt x5, x4, 1b # loop if x5 < x4, i.e., end time not yet reached

    lw ra, 12(sp) # restore return address
    lw x3, 8(sp) # restore x3
    lw x4, 4(sp) # restore x4
    lw x5, 0(sp) # restore x5
    addi sp, sp, 16 # restore stack pointer

9:  ret # return from function
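
One thing worth keeping in mind with this approach: the loop compares the counter against an absolute end time using a signed comparison (blt), so as far as I can tell a delay that happens to straddle the 0x7FFFFFFF to 0x80000000 boundary could return early. A common alternative is to compare elapsed ticks instead of absolute times. Here is a minimal sketch of that idea in C, assuming a free-running 32 bit up-counter; the function name is hypothetical and this is not what the assembly above does.

```c
#include <stdint.h>

/* Returns nonzero while fewer than 'ticks' timer ticks have elapsed
   since 'start'. Unsigned subtraction wraps modulo 2^32, so the result
   stays correct even if the counter rolls over between the two reads. */
static int delay_still_waiting(uint32_t start, uint32_t now, uint32_t ticks)
{
    return (uint32_t)(now - start) < ticks;
}
```

In assembly the same idea would be a subtract followed by an unsigned compare (bltu) rather than blt.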

And while I am providing a perfectly mathematical solution to the question of how many STK cycles or ‘ticks’ are in a single microsecond, via the STK_TICKS_PER_MICROSECOND symbol (the answer is six here), the QingKe V2 does not support the ‘mul’ (integer multiply) instruction.

If you put an integer multiply instruction in the code, the assembler assembles it, as assemblers do, but the chip throws an exception when it tries to execute it. But why does the assembler allow it to get that far down the chain?

It’s most likely because I just copy/pasted the makefile from another project and it specifically says that the architecture of the chip is “-march=rv32imac_zicsr”, which it most decidedly is not. Changing the “AS_OPTS” variable in the makefile to “-march=rv32ec_zicsr” fixes this, and the assembler throws the very correct error:

src/F4-WS2812B-SPI-asm.S:10: Error: unrecognized opcode `mul a0,a0,a0'

It now also catches my earlier error when I used the non-existent s2 register. These are powerful tools if you will just let them be so.

So there being no integer multiply instruction, it’s not too terribly difficult to multiply two integers together using shifts and adds. In fact, with a constant multiplier such as six, it’s just a matter of shifting the multiplicand to the left, one time using a single bit shift and then again using two bit shifts, then adding those two numbers together.
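The same shift-and-add trick, written out in C as a quick sanity check of the arithmetic. The function name is made up for this sketch; it mirrors the two slli instructions and the add in delay_us.

```c
#include <stdint.h>

/* Multiply by six without a multiply instruction:
   n * 6 = (n * 2) + (n * 4) = (n << 1) + (n << 2). */
static uint32_t times_six(uint32_t n)
{
    return (n << 1) + (n << 2);
}
```

A 50 microsecond request therefore becomes 300 timer ticks at 6 ticks per microsecond.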

Now we have a reasonably accurate delay function that does nothing but waste time for a reasonably accurate amount of time. We can use that to send the ~50 us reset signal to the WS2812B LEDs by sending out a 0x00 via the SPI and then just waiting it out. It makes for a pretty simple function:

ws2812b_reset: # send 'reset' signal to WS2812B LED

    # on entry: none
    # on exit: none

    # register usage:
    #   x3:  function arguments

    addi sp, sp, -16 # allocate space on stack
    sw ra, 12(sp) # preserve return address
    sw x3, 8(sp) # preserve x3

    li a0, 0x00
    call spi_send # set SDO low

    li a0, 50
    call delay_us # ~ 50 us low level

    lw ra, 12(sp) # restore return address
    lw x3, 8(sp) # restore x3
    addi sp, sp, 16 # restore stack pointer

    ret # return from function

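This is technically not a ‘leaf’ function, as it does ‘branch’ out to other functions (spi_send and delay_us) in the performance of its duties. So the ‘preservation’ of the return address register is required here: each ‘call’ overwrites ra, and without the saved copy the final ‘ret’ would not find its way back to the caller. Preserving x3, on the other hand, could have been skipped, as this function never touches it. But I tend to leave it in as it’s fast and it’s better to have it and not need it than to need it and not have it.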

I would really like to come up with a way to streamline the creation of these assembly language functions as they do contain a moderate quantity of boiler-plate code.

If you’ll recall, I had originally built up a hierarchy of function calls to send the right wave forms to the LEDs, but then de-optimized the code to eliminate perceived overhead. Well, that was in the C programming language, and it tends to encourage that sort of algebraic abstraction. At least, it encourages me to do so. Now we’re in the Wild West of bare-metal assembly language and everything comes at a price. So to keep the complexity of each function to a minimum, I’ll reinvent my cascade of function calls here.

The lowest level function sends out an encoded one or a zero. A zero has a shorter high period and a one has a longer high period. We are using the bit patterns 0x60 and 0x7E as zero and one, respectively. Here is the ws2812b_bit function:

ws2812b_bit: # send an encoded zero or one to the WS2812B LED via SPI

    WS2812B_ZERO    = 0x60
    WS2812B_ONE     = 0x7E

    # on entry: a0[0] bit to transmit
    # on exit: a0[7..0] bit pattern sent

    # register usage:
    #   x3:  function arguments

    addi sp, sp, -16 # allocate space on stack
    sw ra, 12(sp) # preserve return address
    sw x3, 8(sp) # preserve x3

    li x3, WS2812B_ZERO # assume it's a zero
    beqz a0, 1f
    li x3, WS2812B_ONE # well it wasn't

1:  mv a0, x3
    call spi_send

    lw ra, 12(sp) # restore return address
    lw x3, 8(sp) # restore x3
    addi sp, sp, 16 # restore stack pointer

    ret # return from function

I preload the x3 register with a bit pattern for a zero, WS2812B_ZERO or 0x60, assuming that it will be a zero. If it is a zero, it skips the next instruction, which loads the WS2812B_ONE code, or 0x7E. In either case, the contents of x3 are mv’d (moved) over to function argument a0 and the spi_send function is called.
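For comparison, here is the same selection logic as a C sketch, plus a small helper that works out how long each pattern holds the line high. The 6 MHz SPI bit rate used in the usage note is an assumption (48 MHz divided by 8), and both function names are invented for this example.

```c
#include <stdint.h>

#define WS2812B_ZERO 0x60u  /* 01100000: short high period */
#define WS2812B_ONE  0x7Eu  /* 01111110: long high period  */

/* Pick the SPI byte that encodes one WS2812B bit. Any nonzero argument
   counts as a one, just like the beqz test in the assembly version. */
static uint8_t ws2812b_encode_bit(unsigned bit)
{
    return bit ? WS2812B_ONE : WS2812B_ZERO;
}

/* High time in nanoseconds for a pattern at a given SPI bit rate,
   found by counting the one bits (truncated to whole nanoseconds). */
static uint32_t high_time_ns(uint8_t pattern, uint32_t spi_hz)
{
    uint32_t ones = 0;
    for (int i = 0; i < 8; i++) {
        ones += (pattern >> i) & 1u;
    }
    return (uint32_t)((uint64_t)ones * 1000000000u / spi_hz);
}
```

At an assumed 6 MHz bit rate, 0x60 is high for about 333 ns and 0x7E for 1000 ns.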

Now that we can write a bit, let’s write a byte. It’s not too terribly difficult, but I think you’re starting to see why I wanted to split this medium-sized problem up into tiny-problem chunks. Tiny problems I can handle. Here’s the ws2812b_byte function:

ws2812b_byte: # send a byte's worth of encoded ones and zeros to the WS2812B LED

    # on entry: a0[7..0] byte to transmit, MSB first
    # on exit: none

    # register usage:
    #   x3:  argument save
    #   x4:  bit counter

    addi sp, sp, -16 # allocate space on stack
    sw ra, 12(sp) # preserve return address
    sw x3, 8(sp) # preserve x3
    sw x4, 4(sp) # preserve x4
    sw a0, 0(sp) # preserve a0

    mv x3, a0 # save byte argument in x3
    li x4, 8 # initialize bit counter

1:  andi a0, x3, 0x80 # test MSB
    snez a0, a0 # convert 0x00/0x80 to 0/1
    call ws2812b_bit # transmit the bit
    slli x3, x3, 1 # shift all bits one place toward MSB
    addi x4, x4, -1 # decrement bit counter
    bnez x4, 1b # loop if needed

    lw ra, 12(sp) # restore return address
    lw x3, 8(sp) # restore x3
    lw x4, 4(sp) # restore x4
    lw a0, 0(sp) # restore a0
    addi sp, sp, 16 # restore stack pointer

    ret # return from function

I took the extra step of preserving the function argument so that the caller can just load up a single value and call the function three times in a row without having to reload the argument. That’s just to make the debugging easier, as the final form won’t need that.
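The bit loop can also be pictured as a pure data transformation: one input byte becomes eight SPI bytes, most significant bit first. This C sketch fills a buffer instead of transmitting, which makes it easy to check on a desktop machine; the function name and the buffer approach are assumptions for illustration only.

```c
#include <stdint.h>

/* Expand one byte into eight encoded SPI bytes, MSB first.
   0x7E encodes a one and 0x60 encodes a zero, as in ws2812b_bit. */
static void ws2812b_encode_byte(uint8_t value, uint8_t out[8])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (value & 0x80u) ? 0x7Eu : 0x60u; /* test MSB */
        value = (uint8_t)(value << 1);            /* next bit into MSB */
    }
}
```

For example, 0xA5 (binary 10100101) expands to 7E 60 7E 60 60 7E 60 7E.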

And here is the final form: the ws2812b_rgb function, wherein the caller sends the three bytes representing the red, green and blue components of the color they want on the LED:

ws2812b_rgb: # send red, green and blue color components to WS2812B LEDs

    # on entry:
    #   a0[0..7] red data
    #   a1[0..7] green data
    #   a2[0..7] blue data
    # on exit: none

    # register usage:
    #   a0:  function argument
    #   x3:  swap register

    addi sp, sp, -16 # allocate space on stack
    sw ra, 12(sp) # preserve return address
    sw x3, 8(sp) # preserve x3

    mv x3, a0 # save red data
    mv a0, a1 # green data
    call ws2812b_byte # send green data
    mv a0, x3 # return red data
    call ws2812b_byte # send red data
    mv a0, a2
    call ws2812b_byte # send blue data

    lw ra, 12(sp) # restore return address
    lw x3, 8(sp) # restore x3
    addi sp, sp, 16 # restore stack pointer

    ret # return from function

Note the register-swapping shenanigans to be able to state the color data as R, then G, then B, but transmit in GRB order, as the WS2812B thinks proper.
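Stripped of the register juggling, the reordering amounts to this. It is a sketch only, with a hypothetical function name, that writes the three color bytes into a buffer in the order they would go out on the wire.

```c
#include <stdint.h>

/* Accept colors in R, G, B order but store them in the G, R, B
   order that the WS2812B expects to receive. */
static void ws2812b_pack_grb(uint8_t r, uint8_t g, uint8_t b, uint8_t out[3])
{
    out[0] = g; /* green goes out first */
    out[1] = r; /* then red */
    out[2] = b; /* blue last */
}
```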

Now to let the little chip send this sequence a bazillion times and see if it gets confused. I’m actually feeling sort of confident that it won’t at this point, but the proper thing to do is to test it.


CH32V003 driving WS2812B LEDs with SPI – Part 9

6 March 2025

Hmmm. Good news? Well, news. The absolutely simplest test I could envisage ran all night and did not hang up. You saw the code. It was only checking the SPI status register to see if the transmit register was empty (and waiting forever for it to be so) and only then shipping out a 0x55 test pattern to the SDO pin on PC6, and repeat, ad infinitum.

This test was conducted on the most symptom-prone variation of the -003 chips I have on hand, the CH32V003F4U6 QFN20. Now, to be fair, this was done using the upgraded and augmented “robust” prototype, and not the original test platform, a very small solderless breadboard. Should I go back and test the original circuit? Of course! My scientific rigor knows no bounds.

Now I should create a similarly minimalistic diagnostic using the WCH SDK. I copied the same SDK initialization function that I was using in the previous code and added this within the while(1) loop in the main() function:

while(SPI_I2S_GetFlagStatus(SPI1, SPI_I2S_FLAG_TXE) == RESET) {
    // wait for SPI transmit register to be empty
}

SPI_I2S_SendData(SPI1, 0x55); // test pattern 01010101

And it was running along quite nicely, until it wasn’t. Just hung up again. Let’s add a little instrumentation to the code and have it spit out some statistics from time to time. I added some variables to track the counting:

uint32_t loops = 0, millions = 0;

And added this code to occasionally print out a report:

loops++; // count the loops

if(loops == 1000000) { // report only every 1,000,000 loops
    millions++; // count those millions
    loops = 0; // reset loop counter
    printf("Loops (millions): %u\r\n", millions);
}

And off it goes! And stops after “only” 51 million loops. Try again, little machine! OK, 382 million loops this time, but hung up solid again. And this is only taking a few minutes each time.

So the immediate conclusion is that there is something in the SDK that is gunking up the SPI state. I’ll have to look at the SPI_I2S_GetFlagStatus() function in more detail and see how that could be acting up. The SPI_I2S_SendData() function literally only writes the passed value to the SPI data register.

Well, the SPI_I2S_GetFlagStatus() function also is only doing the minimum necessary things to check the status of an individual flag, i.e., read the SPI status register and mask out the status bit of interest, returning either ‘SET’ or ‘RESET’ as appropriate.

Not surprisingly, the WCH development board with the CH32V003F4P6 TSSOP20 package runs flawlessly.

At this point, I see two ways forward with this investigation. I can implement the WS2812B-SPI driver in assembly language and see if that works as expected. The other option is to update my C language framework using the new -003 SVD file and fitting some optimizations into the project wizard, which will take more work than the assembly language framework.

But why choose? Can’t I do both?

I’ll attempt the more full-featured LED demo in assembly first, as the base is already in place for that. But before I forget, there’s something very interesting that I noticed and almost failed to note here. Once I cranked up the system clock from 8 MHz to 48 MHz, the chip still worked. Even though I didn’t configure the flash memory controller to use an additional wait state. Even though the RM says the “prefetch buffer” must be enabled, although it never says how. Even though the CH32X035 yurked and horked all over the place when I did the same exact thing. To be fair, the CH32X is a QingKe V4 and the -003 is a V2. I had noticed this behavior before and had decided to err on the side of caution in the future. But now I want to see if it’s an issue or not. If weird and random things happen over and above the current weird and random things that are happening, we’ll know where to look.

I recall hearing that with great power comes great responsibility. I’m so totally feeling that right now as I struggle to come up with a register usage policy that 1) makes sense and 2) I like. I have 15 registers at my disposal and I can use them as I see fit. The GNU assembler adheres to the ABI (application binary interface) that assigns a few of the registers to specific tasks, such as the stack pointer and return address register. There are some other assumptions made in the implementations of the pseudo-instructions that prove quite useful. But I need a system that is simple to remember.

I may have mentioned it before, but here is a RISC-V reference page that I come back to all the time:

https://projectf.io/posts/riscv-jump-function/#functions

Here are the general-purpose registers that I have access to in this RV32EC architecture:

Register Alias Notes
-------- ----- -----
x0       zero  All the values you want, as long as you want a zero
x1       ra    Return address
x2       sp    Stack pointer
x3       gp    Global variable pointer
x4       tp    Thread pointer
x5       t0    Temporary register 0
x6       t1    Temporary register 1
x7       t2    Temporary register 2
x8       s0    Saved register 0
x9       s1    Saved register 1
x10      a0    Function argument 0
x11      a1    Function argument 1
x12      a2    Function argument 2
x13      a3    Function argument 3
x14      a4    Function argument 4
x15      a5    Function argument 5

There is simultaneously so much and so little you can do with the zero register, x0. It’s really handy when you want to write a zero somewhere, or compare something to zero, or subtract something from zero… you get the idea. But writing anything to it doesn’t do anything.

I’ve become accustomed to using the function argument registers, a0-a5, in their intended manner, so I think I will continue doing so on this project, at least.

The return address register, ra, under the current ABI, is generally x1 but can be x5 in some circumstances. The GNU assembler will assume you want to use x1 as the return address register when it decomposes the pseudo instruction ‘call’ into ‘jal’ or ‘jump and link’. In truth, you can ‘jump and link’ using any register you want. But I’m willing to go along with this idea for the time being. The same thing applies to the ‘ret’ (return from function/subroutine) pseudo instruction.

The stack pointer register, x2/sp, is really more up for grabs. Unlike many other microcontroller architectures that I have used in the past, the RISC-V instruction set does not have a predetermined idea of which of the registers ‘should be’ the stack pointer. Use any one you want. Really.

I already ran into the issue of trying to use s2/x18 on this project. It’s not there. Only x0-x15 are available on the RV32EC platform.

So instead of trying to figure out which of the other ‘suggested usage’ aliases for the remaining registers to use, I think I’ll just use x3-x9 for my random register needs. If I need more, I think it would be OK to use some of the function argument registers as well. Also, if I can’t keep up with x3-x9, I can always rename them to something else more memorable using a macro.

Since I will be needing a reasonably accurate timer function in order to send the ‘reset’ signal to the string of WS2812B LEDs, I’ll need to configure the STK system timer to help with that.


CH32V003 driving WS2812B LEDs with SPI – Part 8

5 March 2025

Now I am going to write a bare-metal diagnostic for this bizarre SPI timeout behavior. This will eliminate the possibility of some odd malfunction in the vendor-supplied SDK. It will also introduce the possibility of some odd malfunction as a result of my own programming.

As I mentioned yesterday, I have had some limited success with writing bare-metal code for these chips, both in the C programming language as well as native RISC-V assembly code. Both of these approaches rely heavily on the vendor-supplied SVD file for these chips. SVD files are ‘system view description’ files containing machine-readable descriptions of the chip’s on-board resources. In the case of the CH32V003 SVD file, this is limited to the peripheral registers and their respective bit-field contents. Alas, no ‘enumerated values’ are included, so I am forced to supply those myself.

Since the release of MounRiver Studio 2, we have an updated version of the SVD for the -003 family of chips. It is contained within the MRS2 app itself, here:

/Applications
/MounRiver Studio 2.app
/Contents
/Resources
/app
/resources
/darwin
/components
/WCH
/SDK
/default
/RISC-V
/CH32V003
/NoneOS/
CH32V003xx.svd

Now that was a deep dive!

The file, 321 KB in length, as distributed by the vendor, is dated 23 December 2024 at 4:59 AM. Within it is a version number of 1.2. The previous version, labelled “1.1” was what I used when I was first starting to get to know these chips.

We’ll need both the register addresses and their bit field information in order to manipulate the chip into doing what we want. What I originally did was use a Python script to examine the SVD file and emit a C header file that created typedef’d structs that encapsulated the needed register information. I then included that header file in a makefile project that used the custom version of GCC supplied by the original MRS toolset. This would allow me to reference individual bits within a given peripheral without having to know which exact register was indicated, like this:

RCC->SW = RCC_SW_HSI; // select HSI as system clock source

Where the “RCC” is the pointer to the base address of the “Reset and Clock Control” peripheral, “SW” is the “system clock source selection” bitfield within the “Clock Configuration Register 0” and “RCC_SW_HSI” was an enumerated value (constant) that I created and #define’d elsewhere. Notice that I didn’t have to keep track of which register it was in. The data structure keeps all that information for me. Now I don’t have to check the reference manual for register addresses or bit positions. I still have to look up specific bitfield values because the manufacturer decided to omit those from the SVD file as defined enumerated values.
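To give a flavor of what such a generated header might look like, here is a heavily trimmed C sketch. It is not the output of my script: the struct only covers the first two RCC registers, the type and function names are invented for this example, and the ‘RCC’ pointer is aimed at an ordinary variable so the code can run on a desktop machine instead of at the real peripheral address. Note that with bit-fields the enumerated values are the relative (unshifted) ones.

```c
#include <stdint.h>

/* Trimmed sketch of a bit-field view of the RCC peripheral. */
typedef struct {
    volatile uint32_t CTLR;       /* clock control register, offset 0x00 */
    /* Clock Configuration Register 0, offset 0x04, as bit-fields */
    volatile uint32_t SW   : 2;   /* system clock source selection */
    volatile uint32_t SWS  : 2;   /* system clock switch status */
    volatile uint32_t      : 28;  /* remaining fields omitted here */
} rcc_sketch_t;

/* Relative (unshifted) enumerated values for the SW field. */
enum { RCC_SW_HSI = 0, RCC_SW_HSE = 1, RCC_SW_PLL = 2 };

/* Stand-in for the real peripheral so the sketch runs on a host. */
static rcc_sketch_t fake_rcc;
#define RCC (&fake_rcc)

/* Hypothetical helper: the compiler generates the read-modify-write
   and the shifting; the caller never names the register. */
static void select_clock_source(unsigned source)
{
    RCC->SW = source & 0x3u;
}
```

Bit-field layout is implementation-defined in C, so a header like this is tied to the compiler it was generated for.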

I also created a handful of boilerplate source files that coordinated some of the other, lower-level necessities of the project. These are sometimes referred to as the “C runtime support” files.

I eventually started wanting to use the same technique with RISC-V assembly language projects. I modified the original Python conversion script to emit an assembly language header file with the peripheral register addresses and a bit mask representing the bitfield assignments.

In both cases, I created a ‘Makefile’ that allowed me to compile or assemble the project from the command line. I also created my own linker script to coalesce the variously-transmogrified source files into an executable binary image. The makefile also added ‘phony’ targets to perform such actions as erase, program or launch the debugger.

Since the resulting projects had multiple but formulaic folder structures as well as project-unique headers and footers where appropriate, I wrote a console application that would create a new project and populate the required files for me. This only worked for the -003 devices, however. Well, in truth, it “worked” for the simplest of 203 or 307 projects, as well, but not as comprehensively as I would have liked.

Most of this effort was spurred by the fact that version 1 of the MounRiver Studio was only supported on Windows or Linux/x86 hosts. A set of command line tools was provided for macOS, and that’s what I used.

Now I’d like to review the process in light of the MRS2 native support of macOS, which includes Apple Silicon. Let’s start with the RISC-V assembly language version first, as that is a little more straight-forward in that it will need to do less for us than the C language version.

First let’s see what my Python script thinks of the new SVD file. I recall that there were a few issues with the original version 1.1 SVD file, but the details escape me.

Well, it didn’t burp. It created a new file called ‘ch32v003xx.svd.inc’, which is simply the input filename with ‘.inc’ appended to the end. It also generated this report to the console:

svd2inc.py - SVD to RISC-V ASM header file converter
SVD filename: ch32v003xx.svd
Parsing ch32v003xx.svd... done
Filename 'ch32v003xx.svd.inc' already exists.  Overwrite? (Y/N) y
Note:  Overwriting existing file 'ch32v003xx.svd.inc'
Creating 'ch32v003xx.svd.inc'

Peripherals
PWR/PWR, 0x40007000, Power control
RCC/RCC, 0x40021000, Reset and clock control
EXTEN/EXTEN, 0x40023800, Extend configuration
GPIO/GPIOA, 0x40010800, General purpose I/O
GPIO/GPIOC, 0x40011000, derived from GPIOA
GPIO/GPIOD, 0x40011400, derived from GPIOA
AFIO/AFIO, 0x40010000, Alternate function I/O
EXTI/EXTI, 0x40010400, EXTI
DMA1/DMA1, 0x40020000, DMA1 controller
IWDG/IWDG, 0x40003000, Independent watchdog
WWDG/WWDG, 0x40002C00, Window watchdog
TIM/TIM1, 0x40012C00, Advanced timer
TIM/TIM2, 0x40000000, General purpose timer
I2C/I2C1, 0x40005400, Inter integrated circuit
SPI/SPI1, 0x40013000, Serial peripheral interface
USART/USART1, 0x40013800, Universal synchronous asynchronous receiver transmitter
ADC1/ADC1, 0x40012400, Analog to digital converter
DBG/DBG, 0xE000D000, Debug support
ESIG/ESIG, 0x1FFFF7E0, Device electronic signature
FLASH/FLASH, 0x40022000, FLASH
PFIC/PFIC, 0xE000E000, Programmable Fast Interrupt Controller

Interrupts
2: NMI - Non-maskable interrupt
3: HardFault - Exception interrupt
5: Ecall_M - Callback interrupt in machine mode
8: Ecall_U - Callback interrupt in user mode
9: BreakPoint - Breakpoint callback interrupt
12: STK - System timer interrupt
14: SW - Software interrupt
16: WWDG - Window Watchdog interrupt
17: PVD - PVD through EXTI line detection interrupt
18: FLASH - Flash global interrupt
19: RCC - Reset and clock control interrupt
20: EXTI7_0 - EXTI Line[7:0] interrupt
21: AWU - AWU global interrupt
22: DMA1_Channel1 - DMA1 Channel 1 global interrupt
23: DMA1_Channel2 - DMA1 Channel 2 global interrupt
24: DMA1_Channel3 - DMA1 Channel 3 global interrupt
25: DMA1_Channel4 - DMA1 Channel 4 global interrupt
26: DMA1_Channel5 - DMA1 Channel 5 global interrupt
27: DMA1_Channel6 - DMA1 Channel 6 global interrupt
28: DMA1_Channel7 - DMA1 Channel 7 global interrupt
29: ADC - ADC global interrupt
30: I2C1_EV - I2C1 event interrupt
31: I2C1_ER - I2C1 error interrupt
32: USART1 - USART1 global interrupt
33: SPI1 - SPI1 global interrupt
34: TIM1BRK - TIM1 Break interrupt
35: TIM1UP - TIM1 Update interrupt
36: TIM1RG - TIM1 Trigger and Commutation interrupts
37: TIM1CC - TIM1 Capture Compare interrupt
38: TIM2 - TIM2 global interrupt

Creating interrupt vectors
2: NMI_handler
3: HardFault_handler
5: Ecall_M_handler
8: Ecall_U_handler
9: BreakPoint_handler
12: STK_handler
14: SW_handler
Created 7 system vectors
16: WWDG_handler
17: PVD_handler
18: FLASH_handler
19: RCC_handler
20: EXTI7_0_handler
21: AWU_handler
22: DMA1_Channel1_handler
23: DMA1_Channel2_handler
24: DMA1_Channel3_handler
25: DMA1_Channel4_handler
26: DMA1_Channel5_handler
27: DMA1_Channel6_handler
28: DMA1_Channel7_handler
29: ADC_handler
30: I2C1_EV_handler
31: I2C1_ER_handler
32: USART1_handler
33: SPI1_handler
34: TIM1BRK_handler
35: TIM1UP_handler
36: TIM1RG_handler
37: TIM1CC_handler
38: TIM2_handler
Created 23 device vectors
Created 30 vectors in total

So it actually saw that there was already a file with the proposed new filename, and very politely asked permission to over-write it. How courteous!

The new file is 103 KB long. There are still some rough edges in the script, as it tends to emit duplicate definitions for some of the repeated registers, such as the various DMA channel configuration registers. But I think they are “true duplicates” in that they all just redefine the same symbol with the same value, which wastes file space and assembly compute cycles but will still “work”.

Instead of adding newly-minted enumerated values directly to each new source file that needed them, I decided to collect them in a more generic include file for each device, and have that include file subsequently include the generated register definition include file. This file I will creatively and bravely name, “ch32v003.inc”. You can’t stop me!

Here is the as-yet empty generic include file:

# filename:  ch32v003.inc
# register definitions for WCH CH32V003 devices
# 5 March 2025 - Dale Wheat

.ifndef CH32V003_INC # prevent recursive inclusion
CH32V003_INC = 0 # arbitrary but required value

.include "ch32v003xx.svd.inc"

# hand-crafted enumerated values go here

.endif # end of include guard conditional CH32V003_INC

# ch32v003.inc [end-of-file]

Now each new assembly source file that we create need only add this line to become fully (or mostly-enoughly) aware of the inner workings of the -003 family:

.include "ch32v003.inc"

This is assuming that your makefile knows where we’ve stashed this master record of all -003 knowledge.

I looked through the archive for a suitably simple project to use as a template, and I found a likely candidate, “J4-blink-asm”. But why is this blinky project source file over 20 K-bytes long?

Ah, it seems that once I got the basic blinky goodness developed, I just kept adding on to it, one little bit at a time. It’s got a lot of stuff in there that I’m not going to immediately need. Here’s what I’m going to start with for this bare-metal diagnostic:

# filename:  F4-WS2812B-SPI-asm.S
# Diagnostic for WS2812B via SPI
# 5 March 2025 - Dale Wheat

.include "CH32V003xx.svd.inc"

.global start
start:

# F4-WS2812B-SPI-asm.S [end-of-file]

Notice that the filename ends with an upper-case “.S”, which tells the GCC driver to run the C preprocessor over the file before assembling it, so any #define macros and #include directives it contains get expanded.

Now we just need a makefile for the project. I will again borrow this from the J4-blink-asm project. The linker script for the -003 devices is already in place.

I need to update the “CC_PATH” variable in the makefile to reflect the newest version of the GCC compiler suite, as provided by the MRS2 application. I also had to put single quotes around the path because it now contains spaces. How modern!

They also changed the names of all the GCC utilities from the ‘riscv-none-elf’ triple to ‘riscv-wch-elf’, so that must be updated in the new makefile, as well.

Now since this very simple example needed no interrupt support, I never defined the symbol that tells the generated include file whether I want a vector table. This is a scenario I hadn’t tested before, and it most definitely does not work. I changed the generated include file by hand from:

.if INTERRUPT_VECTOR_TABLE # use vector table interrupts

to:

.ifdef INTERRUPT_VECTOR_TABLE # use vector table interrupts

and now I have to go back and update the Python script, re-run it and copy the resulting output to the distribution folder. Now I can successfully assemble my little program into a file that has exactly nothing in it. But that’s OK, because that is, after all, what I told it to do.

So like with any other new framework, we have to blink that LED. I’ll start with my own -F4P6 development board and attach a jumper from PA1 to the built-in green LED. It’s set up to be ‘active high’, so writing a 1 to PA1 should turn it on and a zero would turn it off.

But before we can do that, we have to enable the GPIOA peripheral clock and then configure PA1 as an output. Neither of those things is the way we want it to be when the chip first wakes up.

In RISC-V assembly language, to write to a memory location, you first have to load the address into one of the registers and then the value that you want to write into another register. That is, unless you want to write a zero, in which case you can just use the “zero register”, x0, which already has a zero in it. But we want to write a 1, so we’ll use something else.

I won’t bore you with yet another blinking LED code example, but here’s a fun snippet for about a 1/2 second delay, if your clock is running at ~8 MHz, as the CH32V003 does if not otherwise configured:

    li a2, 1000000
1:  addi a2, a2, -1 # decrement a2
    bnez a2, 1b

This sample uses register a2, one of the ‘function argument’ registers, to count down from one million. It could be any other register available on the RV32EC platform. Don’t try, like I did, to use registers like s2, as they are not present here and will just have the chip restart unless you have a HardFault handler set up to catch them.
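
Working backwards from that half-second figure: one million passes in about 0.5 seconds at 8 MHz comes to roughly four clock cycles per pass through the two-instruction loop. I have not checked that number against any timing documentation, so treat the cycles-per-pass value in this sketch as an assumption. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Estimate how long a countdown loop runs, in milliseconds.
// cycles_per_pass is an assumption inferred from observed timing,
// not a documented figure.
uint32_t loop_delay_ms(uint32_t passes, uint32_t cycles_per_pass, uint32_t clock_hz)
{
    assert(clock_hz > 0);
    uint64_t cycles = (uint64_t)passes * cycles_per_pass;   // 64-bit to avoid overflow
    return (uint32_t)(cycles * 1000u / clock_hz);
}
```

With 1,000,000 passes, 4 cycles per pass and an 8,000,000 Hz clock, this gives 500 ms. The same loop would finish about six times sooner at 48 MHz, so the count has to be scaled whenever the clock setup changes.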

I will share with you some working code that spits out a 12 MHz square wave on PC6. It really won’t help you much without the not-supplied header file, but you’ll see some of what I’ve been talking about:

# filename:  F4-WS2812B-SPI-asm.S
# Diagnostic for WS2812B via SPI
# 5 March 2025 - Dale Wheat

.include "CH32V003.inc"

.global start
start:

# set up system clock for HSI * 2 via PLL = 48 MHz

    la a0, RCC_BASE
    lw a1, RCC_CTLR(a0)
    li a2, RCC_PLLON
    or a1, a1, a2
    sw a1, RCC_CTLR(a0) # enable PLL

    li a1, RCC_SW_PLL
    sw a1, RCC_CFGR0(a0) # select PLL as system clock

# enable required peripheral clocks

    li a1, RCC_SPI1EN | RCC_IOPCEN | RCC_IOPAEN
    sw a1, RCC_APB2PCENR(a0)

# initialize GPIO

    # GPIOA

    #   PA1 - LED, active high

    la a0, GPIOA_BASE
    li a1, 0x88888828
    sw a1, GPIO_CFGLR(a0)

    sw zero, GPIO_OUTDR(a0) # LED off

    # GPIOC

    #   PC6 - SPI data out

    la a0, GPIOC_BASE
    li a1, 0x89888888
    sw a1, GPIO_CFGLR(a0)

# initialize SPI

    la a0, SPI1_BASE
    li a1, SPI_BIDIMODE | SPI_BIDIOE | SPI_SSM | SPI_SSI # 0xC300
    sw a1, SPI_CTLR1(a0)
    ori a1, a1, SPI_SPE | SPI_MSTR # 0xC344 enable SPI1 as coordinator
    sw a1, SPI_CTLR1(a0)

# set up for blinking LED, sending SPI data

    la a0, GPIOA_BASE
    li a1, (1 << 1) # PA1

    la a3, SPI1_BASE
    li a4, 0x55

# endless loop

main:

    sw a1, GPIO_OUTDR(a0) # LED on

    sw zero, GPIO_OUTDR(a0) # LED off

1:  lb a5, SPI_STATR(a3) # read SPI status register
    andi a5, a5, SPI_TXE # check for transmit register empty
    beqz a5, 1b # wait for TXE to be set

    sw a4, SPI_DATAR(a3) # SPI data out

    j main # endlessly looping

# F4-WS2812B-SPI-asm.S [end-of-file]
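
The 12 MHz figure can be sanity-checked with a little arithmetic. This sketch assumes the SPI bit clock is FCLK divided by 2^(BR+1), which matches the "BR = 2 means FCLK/8" setting from the SDK initialization, and that bytes go out back to back with no gaps between them. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// SPI bit clock for a given BR field value, assuming FCLK / 2^(BR + 1)
uint32_t spi_bit_clock_hz(uint32_t fclk_hz, unsigned br)
{
    assert(br <= 7);            // BR is a three-bit field
    return fclk_hz >> (br + 1);
}

// Frequency seen on the data pin when the byte sent alternates on every
// bit (0x55 or 0xAA): one full cycle of the square wave spans two bits
uint32_t alternating_pattern_hz(uint32_t fclk_hz, unsigned br)
{
    return spi_bit_clock_hz(fclk_hz, br) / 2u;
}
```

The code above leaves BR at zero, so a 48 MHz system clock gives a 24 MHz bit clock, and the 0x55 pattern toggles the pin at half of that.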

There’s a trick to setting up the SPI peripheral. You have to configure all the communications parameters first, then and only then enable the ‘MSTR’ and ‘SPE’ bits in the control register. Otherwise, it just doesn’t work.
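
The same two-step sequence, sketched in C so the resulting register values are easy to check. The bit masks are built from the bit positions in the reference manual's CTLR1 description. The function names are hypothetical, and nothing here touches hardware: the two functions only compute the words that would be written.

```c
#include <assert.h>
#include <stdint.h>

// SPI_CTLR1 bit masks, by bit position
enum {
    SPI_MSTR     = 1u << 2,     // coordinator mode
    SPI_SPE      = 1u << 6,     // SPI enable
    SPI_SSI      = 1u << 8,     // select pin level
    SPI_SSM      = 1u << 9,     // software control of select
    SPI_BIDIOE   = 1u << 14,    // enable output, transmit only
    SPI_BIDIMODE = 1u << 15     // one line bidirectional mode
};

// Step 1: communication parameters only, peripheral still disabled
uint32_t spi_ctlr1_config(void)
{
    return SPI_BIDIMODE | SPI_BIDIOE | SPI_SSM | SPI_SSI;
}

// Step 2: the same parameters plus MSTR and SPE
uint32_t spi_ctlr1_enable(uint32_t config)
{
    assert((config & (SPI_MSTR | SPI_SPE)) == 0);   // must not be enabled yet
    return config | SPI_SPE | SPI_MSTR;
}
```

The first write comes out as 0xC300 and the second as 0xC344, matching the comments in the assembly listing.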

Now the funny thing is that this code executes flawlessly on both the -F4P6 and -A4M6 packages. I’m going to let it run overnight on the -F4U6 prototype and see if it has managed to hang up at any point. As you can see, there’s no timeout checking or restarting of the peripheral should a timeout occur.

CH32V003 driving WS2812B LEDs with SPI – Part 7

4 March 2025

Today I see that my overnight testing of the -F4P6 packaged CH32V003 chips shows no errors for over 144 million loops on the WCH board and over 91 million loops for my own development board. While I was able to induce some “power glitches” in my own board (enhanced wiggling), the WCH board proves to be more robust.

Part of me thinks that this issue of the SPI locking up has to be a software issue. But how could that be? Am I not running the exact same software across all the various chips?

Well, maybe I am and maybe I’m not. If the factory can program in a different “device identifier” in the Vendor Bytes area of the flash memory, depending on package type, could they also change some other memory contents? There’s a boot loader in there, somewhere, and who knows what else.

A quick search for “0x1FFF” through the code created by the MRS2 new project minion reveals some interesting items. That prefix is the first part of the address for the “System FLASH” section of the memory map.

The device description header file, ch32v00x.h, contains these #define’d values:

#define OB_BASE          ((uint32_t)0x1FFFF800)    /* Flash Option Bytes base address */
#define VENDOR_CFG0_BASE ((uint32_t)0x1FFFF7D4)

We also see the reference to address 0x1FFFF7C4 in the DBGMCU_Get…ID() functions mentioned previously.

A surprising find is the GPIO_IPD_Unused() function in the /Peripheral/src/ch32v00x_gpio.c file. It specifically configures GPIOC and GPIOD’s unused pins as inputs with pull-up resistors, but only for the less-than-twenty-pin packages, the -A4M6 and -J4M6. I can’t find where this function is actually being called in the supplied source code, but that doesn’t mean that it’s not being called from a pre-compiled library.

So what exactly is “VENDOR_CFG0_BASE” used for? It’s only referenced by the very next line in the device definition file, like this:

#define CFG0_PLL_TRIM (VENDOR_CFG0_BASE)

This address is referenced by the RCC_SYSCLKConfig() and SetSysClockTo_48MHZ_HSI() functions, which I assume help to trim the HSI when it is being used as the system clock.

So there’s another “magic number” being stored in the flash, as set up by the factory.

But what about those boot loaders? Let’s grab all those and hold them up to the light.

To get the contents of a memory region from inside the chip and into a file for our examination, I will use the ‘wlink’ utility, available from:

https://github.com/ch32-rs/wlink

It’s a Rust application that speaks directly to the WCH-LinkE and similar devices. The command to read memory contents is called ‘dump’, and the specific syntax to capture the boot loader area of the memory is:

wlink dump 0x1FFFF000 1920 --out bootloader_xxx.bin

where bootloader_xxx.bin will be renamed for each of the four samples we seek.

For the -F4P6 version, the wlink utility responded with:

00:04:04 [INFO] Connected to WCH-Link v2.15(v35) (WCH-LinkE-CH32V305)
00:04:04 [INFO] Attached chip: CH32V003 [CH32V003F4P6] (ChipID: 0x00300500)
00:04:04 [INFO] Read memory from 0x1ffff000 to 0x1ffff780
00:04:04 [INFO] 1920 bytes written to file bootloader_CH32V003F4P6.bin

I’m not 100% sure what the time-stamp at the beginning of each line means, but in any case it happened pretty quickly. Interestingly, the utility did not disturb the little chip too much in the performance of its duties, as it kept right along, and didn’t lose count of its statistics.

Now for the CH32V003A4M6 variant:

00:09:26 [INFO] Connected to WCH-Link v2.15(v35) (WCH-LinkE-CH32V305)
00:09:26 [INFO] Attached chip: CH32V003 [CH32V003A4M6] (ChipID: 0x00320500)
00:09:26 [INFO] Read memory from 0x1ffff000 to 0x1ffff780
00:09:26 [INFO] 1920 bytes written to file bootloader_CH32V003A4P6.bin

These two files are identical. Let’s gather more data. The -J4M6 variant produces this message:

00:14:50 [INFO] Connected to WCH-Link v2.15(v35) (WCH-LinkE-CH32V305)
00:14:50 [INFO] Attached chip: CH32V003 [CH32V003J4M6] (ChipID: 0x00330500)
00:14:50 [INFO] Read memory from 0x1ffff000 to 0x1ffff780
00:14:50 [INFO] 1920 bytes written to file bootloader_CH32V003J4P6.bin

It’s also the same file. Only one more candidate to investigate: the original troublemaker, the CH32V003F4U6:

00:18:21 [INFO] Connected to WCH-Link v2.15(v35) (WCH-LinkE-CH32V305)
00:18:21 [INFO] Attached chip: CH32V003 [CH32V003F4U6] (ChipID: 0x00310510)
00:18:21 [INFO] Read memory from 0x1ffff000 to 0x1ffff780
00:18:21 [INFO] 1920 bytes written to file bootloader_CH32V003F4U6.bin

All boot loaders are identical. Or at least all the boot loaders in the chips before me are identical. It’s also nice to have the ChipID recorded for each of these samples.

Now we should look at the “Vendor Bytes” section of flash, which is 64 bytes long and starts at address 0x1FFFF7C0. The wlink command line would look like this:

wlink dump 0x1FFFF7C0 64 --out vendor_bytes_CH32V003F4P6.bin

As expected, the files differ. The ‘diff’ utility confirms this quite tersely, without going into much detail:

Binary files vendor_bytes_CH32V003F4P6.bin and vendor_bytes_CH32V003A4P6.bin differ

Well, yeah. Let’s gather the rest of the ‘Vendor Bytes’ images from the remaining chips.

Here’s what’s in the ‘Vendor Bytes’ section of the -F4P6’s flash memory:

00000000 34 FE 78 DC 00 05 30 00 09 18 2A 13 03 5A 00 00
00000010 FF FF FF FF FF FF FF FF 00 00 00 00 05 FA AA 55
00000020 10 00 FF FF FF FF FF FF CD AB B3 A5 49 BC C9 0D
00000030 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

We can see the ChipID word at offset 0x04 (address 0x1FFFF7C4), stored in “little endian” fashion as “00 05 30 00”, or 0x00300500 in hexadecimal. I’m guessing that the ‘FF’ data is not being used at this time. That’s the “erased” state of this type of memory.

Now let’s compare that to the -A4M6 data:

00000000 34 FE 78 DC 00 05 32 00 09 18 3F 13 03 5A 00 00
00000010 FF FF FF FF FF FF FF FF 00 00 00 00 05 FA AA 55
00000020 10 00 FF FF FF FF FF FF CD AB D8 A8 05 BC AA 10
00000030 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

The ChipID is different, of course, being 0x00320500 for the -A4M6 package. There’s also a change at offset 0x0A and several starting at offset 0x2A. Those might be the HSI calibration data, but I’m totally speculating here. Of course, there’s a way to find out, but let’s continue with this particular exercise, shall we?
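
Since ‘diff’ won’t say where two binary files differ, here is a minimal sketch that lists the differing offsets. The two arrays are copied from the first two dumps above. The function name is hypothetical, and a real tool would read the .bin files from disk instead of hard-coding them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_BYTES_LEN 64

// Vendor Bytes image of the -F4P6 part, copied from its dump
static const uint8_t vendor_f4p6[VENDOR_BYTES_LEN] = {
    0x34, 0xFE, 0x78, 0xDC, 0x00, 0x05, 0x30, 0x00, 0x09, 0x18, 0x2A, 0x13, 0x03, 0x5A, 0x00, 0x00,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x05, 0xFA, 0xAA, 0x55,
    0x10, 0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xCD, 0xAB, 0xB3, 0xA5, 0x49, 0xBC, 0xC9, 0x0D,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF
};

// Vendor Bytes image of the -A4 part, copied from its dump
static const uint8_t vendor_a4[VENDOR_BYTES_LEN] = {
    0x34, 0xFE, 0x78, 0xDC, 0x00, 0x05, 0x32, 0x00, 0x09, 0x18, 0x3F, 0x13, 0x03, 0x5A, 0x00, 0x00,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x05, 0xFA, 0xAA, 0x55,
    0x10, 0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xCD, 0xAB, 0xD8, 0xA8, 0x05, 0xBC, 0xAA, 0x10,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF
};

// Record the offsets at which two images differ.
// Returns the total number of differing bytes, even if that is more
// than max_offsets; only the first max_offsets are stored.
size_t diff_offsets(const uint8_t *a, const uint8_t *b, size_t len,
                    size_t *offsets, size_t max_offsets)
{
    assert(a != NULL && b != NULL && offsets != NULL);
    size_t found = 0;
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            if (found < max_offsets) {
                offsets[found] = i;
            }
            found++;
        }
    }
    return found;
}
```

The differing bytes fall into three groups: one inside the ChipID word, one at offset 0x0A, and a cluster in the last eight bytes of the third row.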

The -J4M6 SOP8 package has this data:

00000000 34 FE 78 DC 00 05 32 00 0A 18 4C 13 03 5A 00 00
00000010 FF FF FF FF FF FF FF FF 00 00 00 00 05 FA AA 55
00000020 10 00 FF FF FF FF FF FF CD AB DB 87 59 BC 01 F0
00000030 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

This shows the same pattern of similarities and differences.

Lastly, here is the CH32V003F4U6 data:

00000000 34 FE 78 DC 10 05 31 00 09 18 3C 13 03 5A 00 00
00000010 FF FF FF FF 0E 00 00 00 FF FF FF FF 05 FA AA 55
00000020 10 00 FF FF FF FF FF FF CD AB EA 1A F0 BC A7 83
00000030 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

There’s something different here, at offsets 0x14 and 0x18. Would you like to guess? That’s all I can do at the moment.

The RM, p. 169, Table 15-1 “ESIG-related registers list”, does tell us about some of the fields in this range:

Address    Offset Name
---------- ------ -----------------------
0x1FFFF7E0 0x20   Flash capacity register
0x1FFFF7E8 0x28   UID register 1
0x1FFFF7EC 0x2C   UID register 2
0x1FFFF7F0 0x30   UID register 3

It’s interesting to me that in every case, the “unique identification code”, which is specified as 96 bits long, has all ones (FF FF FF FF) as the upper 32 bits, yielding “only” 64 bits of ID.
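
To make the table concrete, here is a minimal sketch that pulls those fields out of the -F4P6 dump shown earlier. It assumes all values are stored little-endian and that the flash capacity register is a 16-bit count of kilobytes. The offsets are relative to the start of the Vendor Bytes area at 0x1FFFF7C0, and the helper and constant names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Vendor Bytes image of the -F4P6 part, starting at address 0x1FFFF7C0
static const uint8_t vendor_f4p6[64] = {
    0x34, 0xFE, 0x78, 0xDC, 0x00, 0x05, 0x30, 0x00, 0x09, 0x18, 0x2A, 0x13, 0x03, 0x5A, 0x00, 0x00,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x05, 0xFA, 0xAA, 0x55,
    0x10, 0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xCD, 0xAB, 0xB3, 0xA5, 0x49, 0xBC, 0xC9, 0x0D,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF
};

// Offsets within the Vendor Bytes area
enum {
    OFF_CHIP_ID  = 0x04,    // 0x1FFFF7C4
    OFF_FLASH_KB = 0x20,    // 0x1FFFF7E0, flash capacity register
    OFF_UID1     = 0x28,    // 0x1FFFF7E8
    OFF_UID2     = 0x2C,    // 0x1FFFF7EC
    OFF_UID3     = 0x30     // 0x1FFFF7F0
};

// Read a 16-bit little-endian value from a byte image
uint32_t read_le16(const uint8_t *img, size_t off)
{
    assert(img != NULL);
    return (uint32_t)img[off] | ((uint32_t)img[off + 1] << 8);
}

// Read a 32-bit little-endian value from a byte image
uint32_t read_le32(const uint8_t *img, size_t off)
{
    return read_le16(img, off) | (read_le16(img, off + 2) << 16);
}
```

Read this way, the bytes “10 00” at offset 0x20 are the value 16, and the third UID word is the all-ones value mentioned above.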

Another way to test this “same or different software” question is to write my own bare-metal code for this testing, instead of relying on the WCH-supplied SDK. I have been exploring a couple of alternatives already, one for C language programming and another for RISC-V assembly language use. I’ll describe these in more detail for you tomorrow.

CH32V003 driving WS2812B LEDs with SPI – Part 6

3 March 2025

After building and testing what I consider to be a more robust prototype for our experiments, I was dismayed to find that it misbehaved in precisely the same way. So today we are back to the “gold standard” of the WCH CH32V003 development board. Let’s see how well it does on a longer-term run, say 100,000,000 loops.

Now this brings up an interesting question: Does the code need to change for different packages of the -003 chip? The WCH dev board has a -F4P6 TSSOP20 package, which is different from the -F4U6 QFN20 package on the dev board I designed. I also have a different dev board of my own design that uses the -F4P6 TSSOP20 package, as well as a -A4M6 SOP16 version. I suppose I should haul those out and see what happens. Unfortunately, the -J4M6 SOP8 of “10 cents!” fame does not bring out the PC6 pin, so it can’t help us in this particular investigation.

Or can it? I’m not actually connecting any WS2812B LEDs in these latest tests. The code fails all by itself. The PC6 pin and associated circuitry is still probably in there, somewhere. Unlike my previous experiment to uncover the hidden pins of GPIOA besides PA1 and PA2, there’s no reason to think the chips in different packages are in any way different themselves. The ‘identifying’ codes are programmed into the flash memory at the factory.

Well, what do you know? The -J4 parts fail just like the other ones: randomly and often enough to be problematic.

One thing I can check in the MRS boiler-plate project code is what, if anything, it does with the “Device” information in the project properties settings, General -> Device.

Which, necessarily, brings up the question of whether any -003 device can determine, through code only, what package it is in. Well, I found the answer to that question first, while looking for the answer to the previous question. Weird, but I’ll take it.

Almost ironically, it’s been giving me this information this whole time. What I misunderstood to be the “unique chip ID” code printed out at the beginning of the example application as “ChipID” was in reality the device revision identifier “003” and package identifier, per this list:

Package         ID
------------    ----------
CH32V003F4P6    0x003005x0
CH32V003F4U6    0x003105x0
CH32V003A4M6    0x003205x0
CH32V003J4M6    0x003305x0

This information is retrieved from memory location 0x1FFFF7C4, which falls in the “Vendor Bytes” area, per RM p. 3.

The code in /Peripheral/src/ch32v00x_dbgmcu.c contains three functions that return some or all of this data:

DBGMCU_GetCHIPID()  returns the entire 32 bits as the "chip identifier"
DBGMCU_GetREVID()   returns the upper 16 bits as the "revision identifier"
DBGMCU_GetDEVID()   returns the lower 16 bits as the "device identifier"

However, this does not match the mapping given in the “ChipID List” comment of the DBGMCU_GetCHIPID() function source code. If correct, it would mean each different package was a different chip revision, and that seems unlikely.

But how am I to pursue these interesting and important questions if I have the whole system running an extended diagnostic? That’s right! Set up an entirely new system! So that only took an unreasonable amount of time, involving putting away the not-playing-with-them-right-now toys and making a nice spot adjacent to my desk to run the extended diagnostic. And almost 2,000,000 loops into it, I’m seeing zero errors or glitches, despite my vigorous wiggling of the cables and other apparatus. Go, go, gadget WCH board! You’re the best!

And immediately upon setting it up, my -A4 dev board exhibits the “behavior”. Well, we’re not here to figure out that particular problem at the moment. I’m only wanting to explore the chip’s internal identifiers and see what I can do with that information.

The console prologue gives me this: “ChipID:00320500”. Now that’s based on the DBGMCU_GetCHIPID() function provided by the manufacturer’s SDK. Remember, that returns the entirety of the “chip identifier” information burned into the chip’s flash memory at the factory.

Let’s see what the other two functions actually return. First, the DBGMCU_GetREVID() function, which returns 0x0032. Next, the DBGMCU_GetDEVID() function, which returns 0x0500.

So it looks like bits 16-19 of the identifier word specify the package. In this case, it’s the -A4, just like the code comments indicated it would be. Note that I haven’t got all the other data points from other packages yet, but it’s a good start.

Re-attaching the original problem board, my little breadboard-based exploratory vehicle, we get this: “ChipID:00310510”, which corresponds to the F4U6 package. This is correct. We also get a 0x0510, where the x in the source code comment is a ‘1’ in this instance.
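
Putting those observations together, here is a minimal sketch of decoding the identifier word. It assumes, based only on the samples I have in front of me, that bits 16-19 select the package in the order given in the list above. The function and enum names are hypothetical and are not part of the SDK.

```c
#include <stdint.h>

// Package codes as observed in bits 16-19 of the identifier word
typedef enum {
    PKG_F4P6 = 0,
    PKG_F4U6 = 1,
    PKG_A4M6 = 2,
    PKG_J4M6 = 3,
    PKG_UNKNOWN = 0xFF
} package_t;

// Upper and lower halves of the identifier word, split the same way
// the SDK's REVID and DEVID functions split it
uint16_t chipid_upper(uint32_t chip_id) { return (uint16_t)(chip_id >> 16); }
uint16_t chipid_lower(uint32_t chip_id) { return (uint16_t)(chip_id & 0xFFFFu); }

// Extract the package code from bits 16-19
package_t package_code(uint32_t chip_id)
{
    unsigned code = (chip_id >> 16) & 0xFu;
    return (code <= PKG_J4M6) ? (package_t)code : PKG_UNKNOWN;
}

// Human-readable part name for a package code
const char *package_name(package_t pkg)
{
    static const char *const names[] = {
        "CH32V003F4P6", "CH32V003F4U6", "CH32V003A4M6", "CH32V003J4M6"
    };
    return (pkg <= PKG_J4M6) ? names[pkg] : "unknown";
}
```

Feeding it 0x00310510 from the breadboard gives the -F4U6 code, with 0x0031 and 0x0510 as the upper and lower halves.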

Now when I’m setting up a -J4 in the SOP8 package for testing, I am reminded that I have to remap the USART1 TX and RX pins due to the very limited number of available pins. But do I want a special version of the software just for J4 packages?

Now that I can just ask the chip itself what sort of package it has, I don’t need to. I can just test for the one exception and do the pin swap then.

The WCH-supplied SDK already has provision for remapping the USART pins in the example application. I just added a test in the debug.c code to re-#define the DEBUG variable:

// swap USART1 TX & RX pins if it's a J4 SOP8 package

volatile uint16_t dev_id;
dev_id = DBGMCU_GetREVID();

if(dev_id == 0x0033) {
    #undef DEBUG
    #define DEBUG DEBUG_UART1_Remap2
}

I also commented-out the call to the USARTx_CFG() function, as it goes and overrides the USART settings, in this one case incorrectly.

All this confirms that the -J4 packaged devices return 0x0033 from the DBGMCU_GetREVID() function.

Now I’ve built up yet another -F4P6-based development board, and it reports 0x0030 and 0x0500, as it should.

Additionally, I have just now discovered that my ever-so-clever device-check to remap only -J4 devices does not work at all, or rather it works all the time and declares every chip a -J4. That’s because “#undef” and “#define” are preprocessor directives: they are carried out at compile time, before the “if” statement even exists, not at run time.
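
A run-time version of the check has to return a value instead of redefining a macro. Here is a minimal sketch of that idea. The enum and function names are hypothetical stand-ins, not the SDK's own identifiers, and on the chip the argument would come from DBGMCU_GetREVID().

```c
#include <stdint.h>

// Hypothetical stand-ins for the pin mapping choices
typedef enum {
    UART1_DEFAULT_PINS = 0,
    UART1_REMAP2_PINS  = 2
} uart1_pins_t;

// Value the -J4 parts returned from DBGMCU_GetREVID() in my testing
#define REVID_J4M6 0x0033u

// Decide at run time which USART1 pin mapping to use
uart1_pins_t choose_uart1_pins(uint16_t rev_id)
{
    return (rev_id == REVID_J4M6) ? UART1_REMAP2_PINS : UART1_DEFAULT_PINS;
}
```

The USART setup code would then branch on the returned value when it configures the pins, so one binary could serve every package.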

Now I’ve been able to cause power glitches on this new board, but no timeout errors yet. Again, I’ll have to leave it running for a while, pretend not to be looking at it and get up and sit down… you know, all the standard and accepted ways of making it misbehave.

Other than in the main() function, where the ChipID is reported at the beginning of the program run, I can’t find any other reference to the DBGMCU_GetCHIPID() function being called anywhere. So at this point it looks like the supplied code does not execute differently if a different package is being used.

This doesn’t help me explain why the CH32V003F4P6 packaged devices work perfectly, but every other variant fails consistently. Your thoughts?

CH32V003 driving WS2812B LEDs with SPI – Part 5

2 March 2025

The test apparatus doesn’t like it when I get up or sit down in front of it. This adds more evidence to the theory that the problem is an intermittent connection, and random vibration from the environment is causing something to conduct either better or worse than it was. There were over 100,000,000 loops and only 1,355 errors overnight, but there was a run of errors just as I came in to view. Should I take this personally?

So I decidedly wish to rebuild a more substantial test platform, but part of me wants to understand exactly what is going wrong with the present system. One of the many un-followed-up-on trouble-shooting ideas was to make the power supply monitor generate an interrupt, as only occasionally glancing at the status bit in code has not revealed a correlation between the failures and the power status.

Adding an interrupt routine to an MRS2 project is not difficult, as most of the required coding gymnastics have already been performed for us. The PVD interrupt is only a little more involved, as it is routed through the external interrupt controller, EXTI.

Here is the code to enable voltage monitoring:

// initialize power monitoring

RCC_APB1PeriphClockCmd(RCC_APB1Periph_PWR, ENABLE); // enable peripheral clock
//PWR_DeInit(); // reset peripheral - hope it doesn't brick the chip! (it does)
//PWR_PVDLevelConfig(PWR_PVDLevel_2V9); // lowest voltage monitoring
PWR_PVDLevelConfig(PWR_PVDLevel_4V4); // highest voltage monitoring

PWR_PVDCmd(ENABLE); // enable programmable voltage detector
//Delay_Ms(100); // short delay for voltage detector to "warm up"
printf("Power is %s\r\n", PWR_GetFlagStatus(PWR_FLAG_PVDO) == SET ? "*** LOW ***" : "OK");

EXTI_InitTypeDef EXTI_InitStruct = { 0 };
EXTI_StructInit(&EXTI_InitStruct); // set default values
EXTI_InitStruct.EXTI_Line = EXTI_Line8; // PVD is connected to EXTI8
EXTI_InitStruct.EXTI_Mode = EXTI_Mode_Interrupt;
EXTI_InitStruct.EXTI_Trigger = EXTI_Trigger_Rising; // rising edge on PVD means voltage is dropping out of specified range
EXTI_InitStruct.EXTI_LineCmd = ENABLE;
EXTI_Init(&EXTI_InitStruct); // initialize EXTI8/PVD

NVIC_EnableIRQ(PVD_IRQn);

And here is the simple interrupt handler I wrote to catch those pesky power glitches:

void PVD_IRQHandler(void) __attribute__((interrupt("WCH-Interrupt-fast")));
void PVD_IRQHandler(void) { // programmable voltage detector interrupt handler

    // the supply voltage has dropped below 4.4 VDC

    power_glitch++; // count this power glitch

    EXTI_ClearITPendingBit(EXTI_Line8); // clear interrupt pending bit

    printf("*** POWER GLITCH! ***\r\n");

    //while(true); // *** debug *** stop here for now

    while(PWR_GetFlagStatus(PWR_FLAG_PVDO) == SET) {
        // wait for power to return to normal values, i.e., > 4.4 VDC
        // (tests the PVDO flag with the same SDK call used during initialization)
    }
}

I originally added a while(true); loop in the interrupt handler to stop and let me see when a power glitch was detected, and sure enough, I was rewarded very quickly. So the power is dipping down enough to confuse the SPI peripheral but not actually reset the core. This is not as surprising as it sounds, as the core power-up and power-down reset voltage levels are set at 2.5V. We’re losing some voltage, somewhere, for only a moment, but not enough to trigger a full system reset.

Again, to me it seems that all this points to a low-quality connection somewhere in the mix. It’s time to build that improved test fixture.