
Notes on RISC-V Assembly Language Programming – Part 16

10 February 2025

The little Python script I wrote last night was able to open the ‘occident’ file of Hershey font descriptions and then import them into a list of lines. I then iterated over the list, line by line, extracting the character number, the number of vertices, and the left- and right-hand extents of each character, then wrote them to the console.

I added some more analysis to the script to get a better feel for the data. Each line can be a different length, as each character can have as many or as few strokes defined as it needs. The number of vertices should give me a clue to the actual length of the line. Each vertex is exactly two bytes long (and, I’m just remembering, the character extent pair counts as the first vertex), and the line header is fixed at 8 bytes, so the formula:

vertices * 2 + 8

gives us the expected length of the line. This is the case except for these character numbers:

                                 Actual  Calculated
Number  Vertices Width           Length  Length
------  -------- -------------   ------  ----------
2,331   3        <-13,20>=[33]   214     14
3,258   4        <-10,11>=[21]   216     16
3,313   28       <-16,16>=[32]   264     64
3,323   43       <-16,17>=[33]   294     94
3,502   10       <-12,12>=[24]   228     28
3,508   12       <-12,12>=[24]   232     32
3,511   15       <-12,12>=[24]   238     38
3,513   7        <-14,14>=[28]   222     22
3,518   8        <-12,12>=[24]   224     24
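
A minimal sketch of that consistency check, assuming the header layout visible in the example line below (five digits of character number in columns 0-4, three digits of vertex count in columns 5-7), might look like:

```python
def expected_length(line):
    """Expected line length: 8 header bytes plus 2 bytes per vertex
    (the extent pair counts as the first vertex)."""
    vertices = int(line[5:8])  # columns 5-7 hold the vertex count
    return vertices * 2 + 8

def check_file(lines):
    """Return (number, actual, calculated) for every line that disagrees."""
    return [(int(l[0:5]), len(l), expected_length(l))
            for l in lines if len(l) != expected_length(l)]
```

Run against the occident lines (with trailing newlines stripped), this produces exactly the table of mismatches above.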

And I see a pattern. Looking at the original data for character number 2,331, we have this very long line:

2331103EfNSOUQVSVUUVSVQUOSNQNOONPMSMVNYP[S\V\Y[[Y\W]T]P\MZJXIUHRHOIMJKLIOHSHXI]KaMcPeTfYf]e`cba RKLJNIRIXJ\L`NbQdUeYe]d_cba RPOTO ROPUP RNQVQ RNRVR RNSVS ROTUT RPUTU RaLaNcNcLaL RbLbN RaMcM RaVaXcXcVaV RbVbX RaWcW

It very clearly declares that there are 103 vertices, but my conversion resulted in a 3. I was obviously not pointing to the right segment of the string when extracting that value, missing out on the hundreds digit for the very small number of characters that have over 100 vertices.

And that’s what it was. I incorrectly specified the ‘slice’ parameters of the vertex segment of the string. I am not very good at the Pythoning yet, but I am getting better.

So now I have some faith in the internal consistency of the data preserved lo these many years. Now I can move on to actually extracting the coordinate pairs from each string, knowing the exact moment that I should stop.

More Python trial and error has produced a working model that will output the coordinate pairs for each character, along with the ‘pen up’ commands.
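
My understanding of the encoding (consistent with the example line above) is that each coordinate byte is stored as an ASCII offset from the letter ‘R’, and the two-byte pair ‘ R’ (space, then capital R) marks a ‘pen up’. A sketch of such a decoder, with the tuple format being my own choice:

```python
def decode_glyph(line):
    """Decode one 'occident' line into (number, left, right, commands).
    Coordinates are ASCII offsets from 'R'; the pair ' R' means 'pen up'."""
    number = int(line[0:5])
    vertices = int(line[5:8])            # count includes the extent pair
    left = ord(line[8]) - ord('R')       # left-hand extent
    right = ord(line[9]) - ord('R')      # right-hand extent
    commands = []                        # ('move'|'draw', x, y) tuples
    pen_down = False
    pos = 10
    for _ in range(vertices - 1):        # extent pair already consumed
        cx, cy = line[pos], line[pos + 1]
        pos += 2
        if cx == ' ' and cy == 'R':      # pen up: next vertex is a move
            pen_down = False
            continue
        x, y = ord(cx) - ord('R'), ord(cy) - ord('R')
        commands.append(('draw' if pen_down else 'move', x, y))
        pen_down = True
    return number, left, right, commands
```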

Now I need to translate that into a series of simple commands that I can send to the OLED device via the serial link and have them drawn on the screen to visualize the characters.

I needed to install PySerial as a module so that Python can talk to the serial port:

python3 -m pip install pyserial

It installed pyserial-3.5.

The serial port available via the WCH-LinkE is found in the /dev folder as:

/dev/cu.usbmodemC0F98F0645CF2

I’ve got a good start on the Python script. It’s pushing out the coordinates both to the console and the serial port. I re-formatted the coding going out the serial port into ‘move to’ commands and ‘draw to’ commands. ‘Move to’ just updates the coordinates and ‘draw to’ actually draws the vector between the points.
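
The wire format I settled on can be sketched roughly as below; the opcode letters, baud rate and three-byte packing here are illustrative choices, not necessarily what the final system uses. The +32 coordinate offset is the one mentioned later for single-byte transmission:

```python
# import serial  # PySerial; uncomment when actually talking to the port

OFFSET = 32  # shift coordinates so they fit in a single unsigned byte

def encode_command(op, x, y):
    """Pack a 'move to' ('M') or 'draw to' ('D') command into three bytes."""
    return bytes([ord(op), x + OFFSET, y + OFFSET])

def send_glyph(port, commands):
    """commands is a list of ('move'|'draw', x, y) tuples."""
    for verb, x, y in commands:
        port.write(encode_command('M' if verb == 'move' else 'D', x, y))

# usage sketch; the device path is the one found in /dev above,
# and the baud rate is an assumption:
# port = serial.Serial('/dev/cu.usbmodemC0F98F0645CF2', 115200)
# send_glyph(port, [('move', -8, 0), ('draw', 8, 0)])
```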

As an intermediate stage, I was totally faking it by just drawing the endpoints of the vectors, and you could tell the overall shape of the character that way. I already had the point() function working, so that was an easy step. Adapting Bresenham’s line algorithm to the code was also straightforward. It’s a delightful thought experiment and has been around longer than I have.
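
The algorithm is small enough to show in full. Here is a sketch in C, drawing through a stand-in point() function into a demonstration buffer; the real version calls the point() described in Part 14 below:

```c
#include <stdint.h>
#include <stdlib.h>

typedef enum { COLOR_OFF = 0, COLOR_ON = 1 } COLOR_t;

/* stand-in frame buffer for demonstration: one byte per pixel */
static uint8_t demo_fb[64][128];

static void point(uint8_t x, uint8_t y, COLOR_t color) {
    if (x >= 128 || y >= 64) return;   /* silently ignore out-of-bounds */
    demo_fb[y][x] = (uint8_t)color;
}

/* Bresenham's line algorithm, integer arithmetic only */
void line(int x0, int y0, int x1, int y1, COLOR_t color) {
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;

    for (;;) {
        point((uint8_t)x0, (uint8_t)y0, color);
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }  /* step in y */
    }
}
```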

There are still some edge cases that bring the whole thing to its knees, such as character 907 and its -41 y-coordinate. I had added a +32 offset to the data points for serial transmission as a single byte, but that just didn’t work for our friend number 907. But I’ve seen enough of the characters drawn on the target OLED now to be sure that I want to go ahead and build these into the project.


Notes on RISC-V Assembly Language Programming – Part 15

8 February 2025

It’s time to rename this series of posts, as I haven’t been using any sort of RISC-V assembly language at all in this project lately.

So now on to bigger and bolder fonts.

9 February 2025

So not a lot of work got done on the project yesterday, but I did have some time to think about it. And it occurred to me that bit-mapped fonts are great when they’re small but start to take up a lot of resources, i.e., memory space when they get bigger.

My mental arithmetic last night suggests that the biggest possible font on a 128 x 64 pixel display would be 64 x 64 pixels per character, giving a total of two characters on a single line of text. They would be perhaps a bit too squarish for my taste, so I could slim them down a bit and have 42 x 64 pixel characters, allowing up to 3 characters, but still only a single line of them. As I have defined 96 glyphs in my first font design for this project, I project that it would take 32K of memory space for just this one font. The target chip at the moment has 62K of memory available, so perhaps we’ve come up with both a good minimum and maximum size for this display. As a point of comparison, the existing font that I have lovingly named font_5x8 takes up 480 bytes of memory.

Pondering further, a font sized to allow two lines of text would be 32 pixels tall, 3 lines would allow up to 21 pixels, and four lines would divide nicely into characters 16 pixels tall. It was at this point that it occurred to me that bit-mapped fonts were not the only way to go, especially on a resource-constrained device such as I’d prefer to use.

Another option is stroke, or vector, fonts. Instead of a predetermined array of ones and zeros mapping out the appearance of the individual characters, a series of lines and perhaps arcs is described for each glyph.

A famous set of vector fonts was developed around 1967 by Dr. Allen Vincent Hershey. Like myself, he struggled with the age-old question of “but which font should I use?” as well as how to do so in an efficient way. These fonts are now referred to collectively as “Hershey fonts”. They use a relatively compact notation to describe a set of strokes between integer coordinates on a Cartesian plane, resulting in very legible characters.

Now while I smile quietly to myself for my efforts to give the world lower case characters with descenders, Dr. Hershey spent untold hours designing and transcribing characters in as many languages as he could find.

I found a copy of the original data file as part of an archive on:

https://media.unpythonic.net/emergent-files/software/hershey/tex-hershey.zip

Within this archive, a file called, simply, ‘occident’ contains a number of lines (1,610, to be exact), each defining the appearance of a single character. They are numbered from 1 to 3,926, as not all the characters are present in this file.

Now I would like to write a simple-ish program to plot these characters to the OLED module and see what they look like. This ‘program’ will be more of a system that has a portion that runs on my laptop and another that is running on the embedded device.

I’ll start writing the big-end of the system in Python and the little-end in C. The big-end will read in the data file in its entirety and convert the provided encoding into a series of ‘move to’ and ‘draw to’ commands for the OLED. So it turns out I’ll be needing those line generating functions, after all.


Notes on RISC-V Assembly Language Programming – Part 14

7 February 2025

Now for some odd reason the display is not working at all today. Ummm, well, no, I’m wrong. It was working just fine. It was just displaying a screen full of zeros, as was right and proper for it to be doing. I was messing around with the screen initialization values, poking various bit patterns in to see where they showed up. Yesterday, the dots would show up in a random-seeming column. As I had not specifically programmed the column address, that was fine and to be expected. But today, oddly, the column pointer was randomly set to one of the ‘invisible’ columns: 0, 1, 130, 131. The SH1106 supports a 132 x 64 display, but this module has a 128 x 64 OLED attached. The designers decided to put it in the middle of the columns, starting with column 2. Again, fine and something that I was already aware of. But disconcerting when you think things are ‘going great’ and suddenly nothing works anymore.

One good thing about this diversion was that I had the opportunity to measure the screen update time to be ~24 ms, which gives an effective frame rate of just over 40 Hz. So that’s not going to be the bottleneck that I thought it might be. I’m really not motivated at this point to try to up the SCL frequency in hopes of a maximized data rate.

Because of the way the SH1106 wraps around from the end of a page to the beginning of the same page, it truly doesn’t matter where you start writing values, as long as you write 132 of them. If it’s all zeros, you can’t see any difference. If it’s a proper image, then it does matter.

The reason I was tinkering with the initialization values is that I had been experimenting yesterday with it and not being happy with the outcome. I eventually added a separate ‘clear screen’ loop that wrote zeros to all the memory and that did the trick. So instead of initializing the data in the frame buffer declaration as ‘{ 0 }’, which I thought would populate all of the elements with zeros, I just specify ‘{ }’, and the compiler treats it as ‘uninitialized’ and writes zeros in there for me.

Having a frame buffer for the display is nice. I no longer have to think about accessing the display’s memory buffer in pages and stacks of pixels. This allows me the freedom to think about designing glyphs in their appropriate sizes, not what is mathematically convenient.

I’d like to be able to use a Cartesian coordinate system to refer to the individual pixels on the display, in furtherance of my graphical ambitions. In one respect, half of the work has already been done for me, as the abscissa, also known as the x coordinate or column, maps directly to the index of an array I set up to represent the frame buffer. The ordinate, or y coordinate or row, has to be broken down into two components: the memory page index and a bitmask.

The frame buffer is built as an array of pages, with each page containing a three-byte header and another array of 132 bytes. The three-byte header contains the secret language of the SH1106 and allows me to just blast the entire 135-byte payload to the module and have it magically go to the right place within the OLED’s memory map.

Each page is defined by this typedef’d structure:

typedef struct { // data structure for holding display data with OLED header
    uint8_t     control_byte_1;
    uint8_t     page_address_command;
    uint8_t     control_byte_2;
    uint8_t     page_data[SH1106_WIDTH];
} SH1106_PAGE_t;

My frame buffer is just an array of these pages:

SH1106_PAGE_t SH1106_frame_buffer[SH1106_PAGES];

where I have previously #define’d various dimensions as:

// specific SH1106 module parameters are defined here

#define SH1106_WIDTH 132
#define SH1106_HEIGHT 64
#define SH1106_PAGES 8
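
The three header bytes can be filled in once at startup. The control byte values below are from the SH1106 I2C protocol as I understand it (0x80 meaning ‘one command byte follows, more control bytes after that’, 0x40 meaning ‘data bytes follow to the end of the transfer’, and 0xB0 | page being the ‘set page address’ command); a sketch, with the helper function name being my own:

```c
#include <stdint.h>

#define SH1106_WIDTH 132
#define SH1106_PAGES 8

typedef struct { // data structure for holding display data with OLED header
    uint8_t     control_byte_1;
    uint8_t     page_address_command;
    uint8_t     control_byte_2;
    uint8_t     page_data[SH1106_WIDTH];
} SH1106_PAGE_t;

SH1106_PAGE_t SH1106_frame_buffer[SH1106_PAGES];

/* fill in the fixed three-byte headers once, at initialization */
void SH1106_init_headers(void) {
    for (uint8_t page = 0; page < SH1106_PAGES; page++) {
        SH1106_frame_buffer[page].control_byte_1 = 0x80;              /* Co=1, D/C#=0: one command follows */
        SH1106_frame_buffer[page].page_address_command = 0xB0 | page; /* set page address 0-7 */
        SH1106_frame_buffer[page].control_byte_2 = 0x40;              /* Co=0, D/C#=1: data to end of transfer */
    }
}
```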

Assuming we stay in Quadrant I of the Cartesian plane, arguably the best quadrant, with the origin (0,0) in the lower left corner, the x coordinate maps directly to the index of the page_data[] array. That part was easy.

The y coordinate is only a bit more complex. Given the range of 0-63 of possible y values, we can represent that with a 6 bit integer. The upper 3 bits determine the page number, which is the index into the frame buffer array, and the lower 3 bits identify a single bit within what I refer to as a ‘stripe’ in the SH1106 memory. It’s a short, vertical space, one bit wide and 8 bits tall. The lowest bit is the top-most spot within the stripe.

Now if we acted like we didn’t care, we could just take the three upper bits of the y coordinate and call that the page number. That would have the consequence of giving us a plane mirrored about the x axis, as page 0 is at the top and page 7 is at the bottom. We just need to subtract the upper 3 bits from 7 to get the right-side-up, happy Quadrant I orientation that I happen to prefer. So a little more complex, but not much.

So having now spelt this out in people jibber-jabber, it’s time to encode this into a series of mathematical transformations and some hopefully readable source code.

My first function will be the point() function. Technically, a point has no dimension, only a location. Our ‘points’ actually have a size of ‘one’ in both dimensions, but they do have a location that can be specified as offsets from the origin of our Cartesian coordinate system.

The parameters of the point() function should include the x and y coordinates as well as a ‘color’ value. Being a display of modest ambition, this OLED supports the binary options of ‘on’ or ‘off’. We can represent that as a one or a zero in the code.

I have taken the liberty of formalizing the available color palette:

typedef enum { // all the colors
    COLOR_OFF = 0,
    COLOR_ON = 1
} COLOR_t;

Now I am making an executive-level decision to have the graphics functions pretend that the display is only 128 x 64 pixels in extent. Perhaps this will save me some time in the future and keep me from looking for ‘invisible’ pixels that are there but hiding just off stage.

I will have to try to remember to update the display after these functions, as they only manipulate the contents of the frame buffer but do not actually communicate with the OLED.

So here is the point() function as it currently stands:

void point(uint8_t x, uint8_t y, COLOR_t color) { // plot a single point of color at (x,y)

    uint8_t page = (SH1106_PAGES - 1) - (y >> 3); // top three bits represent page number, reversed to be in Quadrant I
    uint8_t bit_mask = 1 << (y & 0x07); // bit mask of pixel location within display memory stripe

    x += 2; // move into visible portion of OLED screen

    if(color == COLOR_OFF) { // we'll reset a bit in the memory array
        SH1106_frame_buffer[page].page_data[x] &= ~bit_mask; // clear bit
    } else { // we'll set a bit in the memory array
        SH1106_frame_buffer[page].page_data[x] |= bit_mask; // set bit
    }
}

I realized later that I could just invert the top three bits of the y coordinate instead of subtracting them from ‘one less than the number of pages’. Either way seems equally obtuse.
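
For the record, the two forms really are interchangeable; for a 3-bit value, complement-and-mask is the same as subtracting from 7:

```c
#include <stdint.h>

/* two equivalent ways to flip the page index into Quadrant I */
uint8_t page_by_subtraction(uint8_t y) {
    return 7 - (y >> 3);
}

uint8_t page_by_inversion(uint8_t y) {
    return (~(y >> 3)) & 0x07;  /* invert the top three bits, mask off the rest */
}
```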

And it works! Why am I always so surprised when anything does as expected?

Now to see how performant this little manifestation of my algorithm can be. I’ll write a loop that sets and then clears all the pixels, one by one. If it’s visibly slow, I’ll have to think about spending some time optimizing the process. If not, I’m not going to worry about it.

It’s pretty fast. It causes a brief flash on the screen, and then it goes blank again, all pretty quickly. There is a visible ‘tearing’ artifact across the bottom of the screen in this process.

Looking at the oscilloscope, I measure ~18 ms to write ones to the screen, and ~16.4 ms to write zeros. That’s a surprising difference. Given there are 8,192 individual pixels to be written, the setting function, including the loop overhead, is taking ~2.2 µs per pixel and the clearing function ~2 µs.

So it takes less time to set or clear all the pixels in the frame buffer than it does to send them to the display via I2C. Good to know.

Here is where, historically, I go nuts writing a bunch of optimized graphics primitives, such as vertical, horizontal and ‘other’ lines, filled and unfilled rectangles and circles, etc.

But for now I want to pretend to focus on actually finishing this project and resist the urge to write yet another library of functions that may or may not ever get used.

So now we will proceed to fonts or glyphs, as you prefer. The first one is always the most interesting. I’ve already got one that I like and will start there, but it was designed to be small and permit a larger amount of text on the screen at one time. One of the overall goals of this project is to make it at least somewhat visible and legible at a distance, so larger formats will be needed.

This brings me back to the need for a better font design tool. I’ve spent way too much time typing in ones and zeros and squinting at the screen while transcribing hexadecimal numbers. I have searched for a more appropriate tool that is already in existence but have yet to find anything that works within the constraints of this project. I feel yet another tangent coming.

Well, before embarking on the world’s greatest font design tool tangent, I’ll have to be happy with a tiny side quest. I noticed that to accommodate the discrepancy between the SH1106 memory and the physical OLED screen width, I had hard-coded a “+2” to the x coordinate in the point() function. The solution was to add a couple of new fields to the page structure to align with the ‘invisible’ columns on the left and the right side of the screen.

That part was easy. To give the page_data member a dimension that doesn’t use what looks like (and totally is) another magic number, I used a uint16_t, which is exactly two bytes long, as the requisite padding on each side, then used the friendly-looking (not really) expression:

SH1106_WIDTH - (2 * sizeof(uint16_t))

as the number of elements. So it should still send out all 132 bytes of the frame buffer, but we don’t have to offset the x coordinate every single time. That saves about 20ns per pixel!
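
The reworked structure then looks something like this; the padding field names are my own guesses, and the packed attribute (whose necessity shows up a bit further on) keeps the compiler from aligning the uint16_t members and silently growing the struct:

```c
#include <stdint.h>

#define SH1106_WIDTH 132

/* left_padding / right_padding cover the two invisible columns on
   each side of the 128-pixel OLED glass; packed keeps the layout
   byte-exact so the whole struct can be blasted out over I2C */
typedef struct __attribute__((packed)) {
    uint8_t  control_byte_1;
    uint8_t  page_address_command;
    uint8_t  control_byte_2;
    uint16_t left_padding;
    uint8_t  page_data[SH1106_WIDTH - (2 * sizeof(uint16_t))];
    uint16_t right_padding;
} SH1106_PAGE_t;
```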

Now that’s fixed, I went back and checked the ‘single pixel at the origin’ test and noticed that sometimes the pixel seemed to travel along the bottom edge of the screen. That’s because nowhere was I setting the column address to 0, or to anything else, either. It was going to be whatever it happened to end up being. After a power on reset, the module is supposed to reset the column address to zero, and I’m sure it does. But I have updated the initialization sequence to specifically set the column address to zero. This is done in two steps, as there is a single byte command to set the lower four bits of the column address and another to set the four upper bits. Here is the new sequence:

uint8_t SH1106_init_sequence[] = {
    SH1106_NO_CONTINUATION | SH1106_COMMAND, // control byte
    SH1106_COMMON_REVERSE, // swap column scanning order
    SH1106_SEGMENT_REVERSE, // swap row scanning order
    SH1106_COLUMN_LOWER, // column address lower 4 bits = 0
    SH1106_COLUMN_UPPER, // column address upper 4 bits = 0
    SH1106_DISPLAY_ON, // command to turn on display
};

Now my little pixel is just where it belongs… or is it? Honestly, it’s pretty hard to see. One way to test this is to draw a single line rectangle around the edge or the screen and make sure all edges are visible.

Which reminds me that I am not checking the input arguments to the point() function. I’ll just do a quick test and silently return on out-of-bound values.

So I added a couple of quick argument checks to the point() function that just return on out-of-bounds values. Another option would be to simply mask off the invalid bits and look like we’ve “wrapped around” after passing the edge of the screen.

So the rectangle test shows that there is still room for improvement in my equations. It’s hard to describe, exactly, but it looks like each page starts writing a little to the right of the previous page, so that the ‘vertical’ lines are distinctly leaning.

One thing is for sure, and that’s that my bit mask formula is exactly backwards. In retrospect, I see it now. The larger the y value, the lower the bit position within the stripe should be, not the other way. I replaced:

1 << (y & 0x07)

with:

0x80 >> (y & 0x07)

and the horizontal lines seem to be right on the edge of the screen now.

But each page is still scooched over one pixel to the right after the last page. This could be caused by sending out one too many bytes per page in the update function. As the function uses the reported size of the page structure as the byte count, it occurred to me that the compiler was padding the struct somehow. Adding the modifier ‘__attribute__((packed))’ to the struct declaration fixed the problem. This is not the first time that structure packing issues have created off-by-a-little-bit errors for me, especially in communication protocols.

Now my rectangle looks properly rectangular. Going back, I also check that the origin pixel is very decidedly in the lowest leftest spot. With just the right amount of background light, I can barely see the edge of the OLED grid.

Now I can import my existing, hand-crafted OLED font from another, similar project. The font is contained in a C source code file named ‘font_5x8.c’ from the previously-mentioned C8-SH1106 project for the 203.

Copying the bits out of the font definition array and writing them to the frame buffer works like a charm.

I put that code in a little loop to go through and print all the available characters, and it goes by a bit too quickly to be able to see what is happening. I added a short delay to the loop and it’s quite satisfying to see it working so well. Here is the code:

for(uint8_t glyph = 0x20; glyph < 0x80; glyph++) { // all the characters in the font file

    for(uint8_t x = 0; x < 5; x++) { // columns
        for(uint8_t y = 0; y < 8; y++) { // rows

            if(font_5x8[glyph][x] & 0x80 >> y) {
                point(x, y, COLOR_ON); // draw the pixel
            } else {
                point(x, y, COLOR_OFF); // erase the pixel
            }
        }
    }

    SH1106_update(); // let's see what happened
    Delay_Ms(250); // short delay
}

The Delay_Ms() function is provided by the boilerplate example project generated by the MRS2 software when asked to create a new project.


Notes on RISC-V Assembly Language Programming – Part 13

6 February 2025

Today’s first objective is to capture and measure the SCL signal and see how close it gets to the requested 400 KHz that I specified in the I2C initialization function.

After attaching an extension cable in order to tap into the SCL line going to the OLED module, I measure an SCL signal trying so hard to wiggle at 423 KHz, which is almost 6% over what I specified. Again, it’s not a critical value, as I have successfully run these OLED displays at 1 MHz in the recent past.

Debugging the program, I can look at the I2C registers directly and see what has been set up for me. The CTLR2 has a field named FREQ, and it has been set to 48. This is in line with what the RM indicates should be done. The CLKCFGR has a field called CCR, for the clock division factor field, and it is set to 0x04.

The actual timing calculations are shrouded in mystery, at least from the standpoint of trying to understand what the RM says. My experimentation suggests that the FREQ field has zero effect on the SCL frequency, and that the CCR field alone sets the pace. It’s also dependent on whether or not you are using ‘fast mode’ or not, as well as the selected clock duty cycle.

Also worthy of note is that the waveform has a very slow rise time and quite abrupt fall time, as would be expected from an open-drain output with no pull-up resistor to help. I have a second OLED module set up with 1KΩ pull-up resistors installed, which is considered quite stiff in the I2C community. This module’s SCL line shows much sharper rise times. So I think that in the lab it’s OK to “get away with” no pull-up resistors for testing purposes, but any final product design should certainly incorporate them. Surface mount resistors are not expensive.

The first improvement I would like to make on the existing system is to use interrupts to pace the transmission and reception of data over the bus, instead of spin loops that may or may not be monitored for a time-out condition. There are two interrupts available for each I2C peripheral on these chips, one for ‘events’ and the other for ‘errors’. I’ll need to define a state machine that is set up before any communications and serviced by the two interrupt handlers.

I will also need to come up with a suitable API to be able to hand off various payloads to the display. While the OLED controller chip allows for both reading and writing, I am not immediately seeing a strong case for ever reading anything back. So I’m thinking that the majority of transfers will be writing some combination of commands and data to the display.

The first case is the initialization phase. Ideally, the screen memory needs to be either cleared or preset to a boot splash screen, followed by commands to adjust any operating parameters. The controller chip’s built-in power on reset sequence does almost everything we need as far as setting up its internal timing. We only need to flip the ‘on’ switch to see dots. But as I alluded to yesterday, the screen on this module is mounted upside down and backwards. While there is no single “rotate 180°” command available, there are two other commands that will do effectively the same thing. One reverses the column scanning order and the other reverses the row scanning order. So we’ll need to send those two commands before we turn on the display. There’s also a setting called ‘contrast’ that might more accurately be called ‘brightness’ that defaults to spang in the middle of the range.

Unlike the other popular OLED controller, the SSD1306, the SH1106 does not automatically roll over to the next ‘page’ of memory once it gets to the end of the row. This means that the ‘screen fill’ task must be broken up into eight page fills. Each of these must be preceded with a page address command. So the initialization ‘payload’ begins to take shape:

Address page 7, fill with 132 bytes of some pattern
Address page 6, fill with 132 bytes of some pattern
Address page 5, fill with 132 bytes of some pattern
Address page 4, fill with 132 bytes of some pattern
Address page 3, fill with 132 bytes of some pattern
Address page 2, fill with 132 bytes of some pattern
Address page 1, fill with 132 bytes of some pattern
Address page 0, fill with 132 bytes of some pattern
Reverse column scan
Reverse row scan
Optionally set contrast level
Display on

I fill the pages in ‘reverse’ order so that it ends up addressing page 0, which seems the logical place to start in the next stage. It will save at most one page address command, so this trick might get axed to favor clarity over cleverness.

The SH1106, much like the SSD1306, is a simple matrix display controller and does not offer any sort of built-in text capabilities. We have to supply our own fonts, which translates into “We get to supply our own fonts”.

I had originally used a 8×8 font that was very easy to read, but ultimately went with a 6×8 font that was, to me, much nicer looking. I then spent a lot of time writing what I considered ‘optimized’ routines to place characters on the screen in what seemed a sensible manner. Mostly this had to do with working within the constraints of the memory organization of the controller chip’s display RAM. This resulted in feeling very much blocked in to using either 8×8 or 16×16 fonts.

What I’m thinking about doing now is very different. Instead of writing each character directly to the screen’s memory, I’m going to introduce an intermediate frame buffer within the CH32X035 memory space. It’s only 1,056 bytes if we map every display location, but only 1,024 bytes if we only map the visible 128 columns that are supported by the physical OLED screen of this module. Each byte contains, as you know, 8 bits, and each bit corresponds to a single screen pixel. There are no shades of gray; it’s either on or off.

So my ‘print’ and ‘plot’ functions will actually only write to an internal SRAM-based frame buffer, and when ‘the time is right’, the whole memory will be transferred to the OLED display. This could be aided by DMA and interrupts to help off-load some of this burden from the CPU.

So that’s my plan for completely over-engineering this project and multiplying the amount of effort required to get to the finish line.

A couple of little experiments to try before diving into the big stuff. I noticed in the SDK that the GPIO initialization for the I2C port used two separate calls to the GPIO_Init() function, one for each of the two I2C signals. The library can actually set up as many pins on a single port as you need. You just indicate the pins needing initialization with a bitmap passed in as the GPIO_Pin structure member. So I was able to combine the two calls into one:

// configure SCL/SDA pins

GPIO_InitTypeDef GPIO_InitStructure = {0};

// GPIO_InitStructure.GPIO_Pin = GPIO_Pin_10;
// GPIO_InitStructure.GPIO_Mode = GPIO_Mode_AF_PP;
// GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
// GPIO_Init( GPIOA, &GPIO_InitStructure );

// GPIO_InitStructure.GPIO_Pin = GPIO_Pin_11;
// GPIO_InitStructure.GPIO_Mode = GPIO_Mode_AF_PP;
// GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
// GPIO_Init( GPIOA, &GPIO_InitStructure );

GPIO_InitStructure.GPIO_Pin = GPIO_Pin_10 | GPIO_Pin_11; // PA10/SCL, PA11/SDA
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_AF_PP;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
GPIO_Init(GPIOA, &GPIO_InitStructure);

I also tried bumping up the SCL frequency to 1 MHz, via the I2C_ClockSpeed structure member passed to the I2C_Init() function. No dice. I don’t know why yet, but I might find out in the future. Right now it’s chugging along at over 400 KHz, and that should be fine for now. In theory, I should be able to push almost 38 frames per second to the display at this speed.

And now on to the Great Embiggening of the I2C API. First, I want to enable the available interrupts and get a feel for how and when they are triggered and then work around that.

The SDK provides a function to enable or disable the various combinations of available interrupts. There appears to be an additional type of event interrupt for when either the TXE or RXNE status bits are set, indicating space is now available for more of whatever was going on at the time.

Right now I just want to look at the event interrupts, then I will look into the error interrupts and once I get the DMA configured, I’ll have another look at the buffer interrupts.

Note that the I2C_ITConfig() function only enables the interrupts at the device level. It does not enable any interrupts at the system level.

To do that, we use the SDK function NVIC_EnableIRQ(). The argument to pass is the interrupt number, and it took a bit of sleuthing on my part to track it down. There is an enumerated type IRQn_Type in ch32x035.h that contains the values of all the interrupt numbers. The one we want right now is I2C1_EV_IRQn, which has a value of 30. I was able to find the value in the RM, but I much prefer to have a defined value referenced and not a “magic number”.

There is also a SDK function called NVIC_Init() that will let you either enable or disable an interrupt as well as set the Preemption priority and subpriority.

Note that the system-level global interrupts are enabled in the supplied startup_ch32x035.S file.

The SDK also defines labels for all the interrupts. The I2C interrupts are:

I2C1_EV_IRQHandler
I2C1_ER_IRQHandler

So at this point, I need to define a function for this interrupt handler. It also needs to specify that it is an interrupt handler, so it gets the proper signature and whatever else the compiler wants.

The first thing the interrupt handler needs to do is figure out why it was invoked. Going in order of the things we actually did, the first thing to look for would be the start bit, SB (bit 0 of STAR1), being set, indicating that a START condition was generated.

I have seen the I2C event interrupt being triggered as expected. I added code to examine the status registers and respond accordingly. There are really only three conditions of note:

1.  I2C_EVENT_MASTER_MODE_SELECT
2.  I2C_EVENT_MASTER_TRANSMITTER_MODE_SELECTED
3.  I2C_EVENT_MASTER_BYTE_TRANSMITTED

The first happens after a START condition is set to indicate that the device has entered MASTER mode.

The second happens after the device address and direction bit have been successfully transmitted.

The third happens after each byte has been transmitted.

Additionally, and for no obvious reason, one more interrupt occurs after the STOP condition is set, even though the status registers all read zero. I choose to ignore this.
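Since there are only three events of note (plus the spurious one), the handler logic boils down to a small dispatch. Here is a hardware-free sketch of that logic; the event and action names are my own stand-ins, not the SDK's, and the real handler would derive the event from the STAR1/STAR2 registers rather than take it as a parameter.

```c
#include <assert.h>

/* Stand-in event codes -- the real values come from combining the
 * STAR1/STAR2 status flags, as the SDK's I2C_CheckEvent() does. */
enum i2c_event {
    EV_MASTER_MODE_SELECT,      /* START sent, now in master mode */
    EV_TRANSMITTER_SELECTED,    /* address + direction bit ACKed  */
    EV_BYTE_TRANSMITTED,        /* one data byte finished         */
    EV_SPURIOUS                 /* status registers read all zero */
};

enum i2c_action { SEND_ADDRESS, SEND_NEXT_BYTE_OR_STOP, IGNORE };

/* Decide what the ISR should do for a given event. */
static enum i2c_action i2c_event_action(enum i2c_event ev)
{
    switch (ev) {
    case EV_MASTER_MODE_SELECT:
        return SEND_ADDRESS;
    case EV_TRANSMITTER_SELECTED:  /* fall through: both mean "feed DATAR" */
    case EV_BYTE_TRANSMITTED:
        return SEND_NEXT_BYTE_OR_STOP;
    default:
        return IGNORE;             /* e.g. the post-STOP interrupt */
    }
}
```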

So I replaced the entire SH1106_init() function with a call to the new i2c_write() function, passing the SH1106 device address and both a pointer to and a length of an initialization sequence:

uint8_t SH1106_init_sequence[] = {
    SH1106_NO_CONTINUATION | SH1106_COMMAND, // control byte
    SH1106_COMMON_REVERSE, // swap column scanning order
    SH1106_SEGMENT_REVERSE, // swap row scanning order
    SH1106_DISPLAY_ON, // command to turn on display
};

So now the display should be neither umop episdn (upside down) nor backwards, and it should be on as well. And it works!

Now I need to dive a little deeper into the SH1106 data sheet and try to understand the ways to send data and commands to the controller chip. I’m still a little fuzzy on how the ‘continuation bit’ is supposed to work when sending larger batches of data and commands to the module.

The next communique I would like to send to the module is a ‘page fill’ command. This is composed of a ‘page address’ command, from 0-7, followed by 132 of your favorite numbers.

I added a ‘state’ variable to the I2C API, as it exists now, so that it doesn’t clobber itself. This is possible, as starting the process is quick and the function returns immediately, but the transfer takes a small but non-zero amount of time to complete.

I had a bright idea to break up the page fill routine into sending a ‘preamble’ with the page address command preformatted, then send the data as a separate function call. This doesn’t work, because each call to i2c_write() is a self-contained thing, with its own START and STOP conditions. This does not seem to sit well with the SH1106.

I reformatted the frame buffer to actually have some space between the data rows to fit in the OLED commands, and this seems to be working fine. Right now I’m just zeroing out the memory and it clears the screen. Ultimately, I would like to have a ‘splash’ screen that shows up for a second when the device is first powered on.

So the first of my goals (using interrupts) has been realized. I’m debating the value of pursuing the DMA option at this point. I think I will spend some time trying to get some reasonable looking dots onto the screen, such as text and maybe some geometric graphics.


Notes on RISC-V Assembly Language Programming – Part 12

5 February 2025

I’m not giving up on the CH32X line just yet. I remembered last night that I do, in fact, have a CH32X035C8T6 development board in stock in the lab. This is the LQFP48 package, so no remapping need be done for the I2C lines.

The board is the “official” WCH CH32X035C-R0-1v2. It is largely similar to the other board with a smaller package, except it does have an extra push button mounted, labelled “S1/KEY”. The schematic, however, shows it connected to PA21/RSTN, so perhaps it can be configured as a reset button.

The first thing to do is to attach the WCH-LinkE device programmer. I built a custom cable to connect +5V, ground, SWCLK & SWDIO, TX & RX for USART1 and PA21/RSTN. I also connected LED1 to PA0 using an additional jumper.

Now to see if the default MRS2 application will blink the LED for me. And it does. It also prints a debug message on the serial console, so this verifies most of the wiring on the new programming cable.

Next I want to see if the “reset” button actually acts like a reset button, or if further configuration is required. It works! Well, that’s going to save me a bit of time.

And just to be sure, I’ll run the EVT example HOST_KM, making sure to change the ‘device’ setting to the C8 variant. Good news: it also works. It detected when I plugged in the i8 wireless dongle and responded to keypresses on the i8, as well.

Now on to hooking up the OLED module, which was where things became irksome yesterday. I was able to re-use one of the OLED module test cables from another project, so I didn’t have to build one from scratch.

To test the OLED, I need only send it the “DISPLAY ON” command. To get there, a few things have to happen. First I have to initialize the GPIO pins SCL/PA10 and SDA/PA11 correctly, then enable the I2C1EN peripheral clock, as well as some setup for the I2C peripheral itself.

I have created a new MRS2 project called C8-SH1106 in the CH32X035 folder to get the OLED up and working again. I hope that Future Me does not confuse this “C8-SH1106” with the one I wrote for the 203. I’m leaving most of the supplied software in place.

Initializing the GPIO pins has alerted me to the fact that the GPIO pins of the CH32X035 devices are different from the CH32V parts I have used previously. I had noticed before that only one ‘output speed’ configuration was declared in the EVT examples, that being 50 MHz, which corresponds to the fastest of the options for the other devices. I did not look to see if that was an oversight or omission in the EVT code at the time. As I was going to configure the pins as ‘alternate function, open-drain’ outputs, I see that this is not an option here. There is a mode for ‘alternate function, push-pull mode’, with a note, “I2C automatic open drain”. Well, that’s what we’re here to find out, I guess.

Enabling the peripheral clock for the I2C port using the vendor-supplied SDK is easy enough:

RCC_APB1PeriphClockCmd(RCC_APB1Periph_I2C1, ENABLE);

I only had to refer to the RM to find out which bus the I2C peripheral was on before invoking the correct command.

Now ‘sending a one-byte command to the OLED via I2C’ sounds straight-forward, doesn’t it? Even assuming that both the GPIO pins and the I2C peripheral are all properly initialized, the process is far from straight-forward.

It’s obviously true that I2C, as a protocol, does work. And even though I have very specific examples of my own code that successfully works using these OLED modules, it never ceases to amaze me how complex the actual interaction can be when dealing with the bare metal.

Here is the outline of how to ‘send a one-byte command’ to the SH1106 OLED controller chip (after initialization):

Step 1: Wait for the I2C bus to be ‘not busy’.

It’s not complicated at all. There is a single bit in the STAR2 status register called, not enigmatically, ‘BUSY’. If this bit is set, the bus is busy. Wait your turn. If the bit is clear, the bus is not busy. Do as you will.

The vendor-supplied SDK has a function to get the value of a single status flag. You pass in the pointer to the peripheral and a bit mask identifying the flag you want. Since this chip only has one I2C peripheral, I’m pretty sure I’m sending the right value here: I2C1. Other chips have two I2C peripherals. This one has only one. The function call looks like this:

while(I2C_GetFlagStatus(I2C1, I2C_FLAG_BUSY) == SET) {
    // wait for bus to be not busy
}

I tend to split up while() loops across multiple lines as it makes it easier for the debugger to show me where it is, precisely. Additionally, it reminds me that there probably ought to be a timeout coded in there, so that this “forever loop” doesn’t hang the system. It can also just as easily be coded as:

while(I2C_GetFlagStatus(I2C1, I2C_FLAG_BUSY) == SET);

My home-grown code just reads the BUSY status bit directly:

while(I2C1->BUSY == true); // wait for bus to not be busy
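As mentioned above, a timeout really ought to be coded into these wait loops so a missing device cannot hang the system. Here is a minimal sketch of the idea, with the flag poll abstracted behind a function pointer and a simple iteration-count timeout (a real version might count SysTick ticks instead). The names and the fake flags are mine, for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Spin until poll() returns false (flag cleared) or we give up.
 * Returns true on success, false on timeout. */
static bool wait_while(bool (*poll)(void), uint32_t max_iterations)
{
    while (max_iterations--) {
        if (!poll())
            return true;    /* flag cleared -- bus is free */
    }
    return false;           /* gave up -- caller handles the error */
}

/* Fake 'BUSY' flags for demonstration: one reads busy twice and
 * then clear, the other is stuck busy forever. */
static int fake_reads;
static bool fake_busy(void)   { return ++fake_reads <= 2; }
static bool always_busy(void) { return true; }
```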

Step 2. Generate a START condition on the bus.

Having determined, one way or another, that the I2C bus is ready for some traffic, we begin a transmission by setting the START condition. This creates a special condition on the bus that lets all the attached devices know that something is about to happen. To do this, all that is required is to set the START bit in the CTLR1 control register. The WCH SDK has a function to do this:

I2C_GenerateSTART(I2C1, ENABLE); // generate START condition

It simply sets or clears the START bit in CTLR1 based on the ENABLE or DISABLE parameter passed. My way of doing this is even simpler:

I2C1->START = ENABLE; // set the START condition

Doing this triggers a state change within the I2C peripheral. Before proceeding, you have to wait until the status bits line up in the requisite order. The SDK calls it:

I2C_EVENT_MASTER_MODE_SELECT

My code calls it I2C_STATUS_MASTER_MODE. In either case, it’s simply the combination of the BUSY, MSL and SB flags from the two status registers.

BUSY    STAR2, bit 1:  1 = I2C bus is now officially 'busy'
MSL     STAR2, bit 0:  Master mode (1) vs slave mode (0)
SB      STAR1, bit 0:  1 = start bit has been transmitted on the bus

When we have determined that these values all align, it’s time to move on to the next step.

[Breaking News] I just received an answer to my question about changing the base of the values in the Debug view for the registers. Just hover your cursor over the label (not the value) and a tool tip appears with all of the various formats. Why choose, you ask? Can’t we have them all? Yes! Yes, we can.

This is going to be a big help when I’m trying to identify single bits within registers in the future. It happens more than one might think!

Back to the step-by-step guide to I2C function. We’ve set the START condition on the bus, and we have to wait until this is reflected in the peripheral status bits. We can use the SDK function like this:

// wait for peripheral to enter master mode

while(I2C_CheckEvent(I2C1, I2C_EVENT_MASTER_MODE_SELECT) == NoREADY);

The I2C_CheckEvent() function returns either READY or NoREADY. READY just means that the bit pattern of status flags passed in as the second argument matches the current status bits in the peripheral’s status registers. We can proceed to the next step now.

Step 3. Send the device address and direction bit

Every device on the I2C bus has an address. It can be either 7 bits long or 10 bits long. The OLED module we’re using has a 7 bit address. It seems to be fixed at 0x3C on this module. The controller chip itself has an option for using 0x3D as the address, but the signal that controls that is not brought out to any sort of convenient spot on the module itself, so we’re kinda stuck with 0x3C.

The address that we want to communicate with is sent one bit at a time onto the bus, along with another bit that indicates if we’re wanting to write to (0) or read from (1) the device. The 7 address bits are scooched up to the top of the outgoing byte and the direction bit is tacked onto the end. So if you’re actually looking at the bus using a protocol decoder or logic analyzer, you might see 0x78 going out. That’s perfectly correct.
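The shift-and-combine arithmetic is easy to sanity-check on the host. This tiny helper (my own naming, using the same WRITE/READ values as the defines shown below) reproduces it:

```c
#include <assert.h>
#include <stdint.h>

#define I2C_WRITE 0
#define I2C_READ  1

/* 7-bit address lands in bits 7..1, direction bit in bit 0. */
static uint8_t on_wire_byte(uint8_t addr7, uint8_t dir)
{
    return (uint8_t)((addr7 << 1) | (dir & 1));
}
```

So the 0x78 seen on the analyzer is just 0x3C shifted up one bit with a write bit of zero appended.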

At this point, all we have to do is send the combined device address (shifted left one bit) along with the direction bit out onto the bus by writing to the DATAR register. The SDK has a function for that:

I2C_Send7bitAddress()

But all you need is to shove the shifted address and direction bit out the DATAR register.

// send device address for writing

I2C_SendData(I2C1, (SH1106_I2C_ADDRESS << 1) | I2C_WRITE);

Where I have previously #define’d the following values in the code:

#define I2C_WRITE 0
#define I2C_READ 1

#define SH1106_I2C_ADDRESS 0x3C

This does the needed shifting and combining of bits. My version of the code is predictably simple:

I2C1->DATAR = (SH1106_I2C_ADDRESS << 1) | I2C_WRITE; // send address to write

As before, we now need to wait until the sun and the moon and the status bits at night align in the proper way. The SDK refers to this combination as:

I2C_EVENT_MASTER_TRANSMITTER_MODE_SELECTED

My previous code used:

I2C_STATUS_MASTER_TRANSMITTER_MODE

Either way, you’re looking for the BUSY, MSL, ADDR, TXE and TRA flags:

BUSY    STAR2, bit 1:  1 = I2C bus is now officially 'busy'
MSL     STAR2, bit 0:  Master mode (1) vs slave mode (0)
ADDR    STAR1, bit 1:  Address sent and matched
TXE     STAR1, bit 7:  Transmit register is empty
TRA     STAR2, bit 2:  Data transmitted

And once you’ve seen these bits are all set, it’s time to actually start talking to the now-addressed OLED module.

Note that if, for whatever reason, the OLED module is not powered up and on the bus, then you’re going to wait a long time. The ADDR bit in STAR1 is only set when the addressed device ‘acknowledges’ the address as part of the protocol. No OLED, no ACK. Here is a really good place to put a timeout or some other code to handle the very real possibility of the OLED not being connected properly.

The ADDR bit is also a good way to write a ‘bus scanner’ program that loops through all the valid I2C addresses and sees who responds with an ACK and who doesn’t. The whole point of the I2C bus was to be able to connect several devices together and talk back and forth using a minimum number of wires.
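The scanning loop itself is simple; the interesting part is the probe, which is just the START / address / ACK-check / STOP sequence described above. Here the probe is a hypothetical stand-in backed by a table, so the loop logic can be shown (and tested) on its own:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated bus: which 7-bit addresses would ACK. */
static bool device_present[128];

/* Stand-in for: START, send (addr << 1), check ADDR/ACK, STOP. */
static bool probe_address(uint8_t addr7)
{
    return device_present[addr7];
}

/* Scan the usable 7-bit range, recording responders; returns count.
 * Addresses 0x00-0x07 and 0x78-0x7F are reserved by the I2C spec. */
static int scan_bus(uint8_t found[], int max)
{
    int n = 0;
    for (uint8_t a = 0x08; a <= 0x77 && n < max; a++) {
        if (probe_address(a))
            found[n++] = a;
    }
    return n;
}
```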

So can we send the ‘DISPLAY ON’ command, already? No! We cannot. That’s not how one talks to a SH1106 OLED controller chip.

First, we have to send a ‘control byte’. It only has two interesting bits in it. One is called the ‘continuation bit’ and the other is the ‘D/-C’ bit. If the D/-C bit is cleared (0), then the next byte is a command for the controller chip. If it is set (1), then it is data to be written to the display memory.

I use these values to indicate which bits are set or not in the control byte:

// SH1106 control byte

#define SH1106_NO_CONTINUATION 0x00
#define SH1106_CONTINUATION 0x80
#define SH1106_COMMAND 0x00
#define SH1106_DATA 0x40
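With those defines, the four possible control bytes fall out of a two-bit combination. A quick host-side helper (my own naming) makes the mapping explicit:

```c
#include <assert.h>
#include <stdint.h>

#define SH1106_NO_CONTINUATION 0x00
#define SH1106_CONTINUATION    0x80
#define SH1106_COMMAND         0x00
#define SH1106_DATA            0x40

/* Compose a control byte from the continuation and D/-C choices. */
static uint8_t control_byte(int more_control_bytes, int is_data)
{
    return (uint8_t)((more_control_bytes ? SH1106_CONTINUATION
                                         : SH1106_NO_CONTINUATION)
                   | (is_data ? SH1106_DATA : SH1106_COMMAND));
}
```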

Step 4. Send the SH1106 control byte

Since we only want to send a single command byte, we can populate the control byte as:

I2C_SendData(I2C1, SH1106_NO_CONTINUATION | SH1106_COMMAND); // send control byte

Now we need to wait for this combination of status flags:

// wait for control byte to be transmitted

while(I2C_CheckEvent(I2C1, I2C_EVENT_MASTER_BYTE_TRANSMITTED) == NoREADY);

This is a combination of the TRA, BUSY, MSL, TXE and BTF flags:

TRA     STAR2, bit 2:  Data transmitted
BUSY    STAR2, bit 1:  1 = I2C bus is now officially 'busy'
MSL     STAR2, bit 0:  Master mode (1) vs slave mode (0)
TXE     STAR1, bit 7:  Transmit register is empty
BTF     STAR1, bit 2:  End of byte send flag

Step 5. Send the command

Finally, we can send the eight bits we’ve worked so hard to prepare for.

#define SH1106_COMMAND_DISPLAY_ON 0xAF

The code looks pretty familiar by now:

// send command to turn on display

I2C_SendData(I2C1, SH1106_COMMAND_DISPLAY_ON);

// wait for command to be transmitted

while(I2C_CheckEvent(I2C1, I2C_EVENT_MASTER_BYTE_TRANSMITTED) == NoREADY);

Step 6. Set the STOP condition

To finish a transmission (or end a reception), we set the STOP condition on the bus:

I2C_GenerateSTOP(I2C1, ENABLE); // set STOP condition

Since it’s just a single bit in the control register CTLR1, bit 9, we can just set it directly:

I2C1->STOP = ENABLE; // set STOP condition

Compile & run, and we are rewarded with a screen of random garbage. But it’s infinitely more interesting than what it previously was.

But I wanted to get this all written out so that Future Me does not spend as much time scratching his head and wondering what was I thinking ??? when he looks at this code again next year.

And yes, that’s “all” you need to do to get the OLED turned on and showing tiny dots. Never mind that the screen is both backwards and upside down. There are commands to fix that, too.

What’s really interesting to me at the moment, though, is that this is working at all, since there are no pull-up resistors attached to the bus lines. Also, I’m not sure exactly how fast the SCL line is wiggling at the moment. I told it 400 kHz, but have yet to verify that. I have been able to push these displays up to 1 MHz in the past. Tomorrow will be a good time to explore these ideas.


Notes on RISC-V Assembly Language Programming – Part 11

4 February 2025

Setting up the OLED portion of the project should be easy, as I’ve already done this on the 003, 203 and 307 variants within this family of chips. There will be opportunities for improvement of some things, and perhaps a chance to add a proper frame buffer so as not to be so reliant on the page boundaries of the OLED controller chip.

First I have to connect the target OLED module, a 1.3″ 128×64 module based on the SH1106 controller chip. It’s just power, ground and I2C clock and data.

Now I’ve forgotten what powers the CH32X035 on this development board. The chip can actually run on +5V as well as 3.3V. The “VCC” lines measure 3.6V using my oscilloscope. The supplied schematic is not much help, as it shows VBUS1 (presumably +5V from USB) going into the linear regulator U2 and coming out as VCC/3V3. The chip itself only has a single power pin, but it is designated as VDD. I see no connection shown from VCC or 3V3 to VDD. I was, however, able to measure 3.54V on one side of C1, a 0.1uF decoupling capacitor next to U1.

So I think I’ll power the OLED from 3.3V for the time being. Next I’ll need to decide which of the, let’s see… one I2C ports to use. Well, that narrows things down considerably. There are some remapping options available, but only one port.

The DS shows the default mapping of SCL = PA10 and SDA = PA11. Unfortunately for me, these pins are only brought out on the largest packages, the LQFP48 and LQFP64. So some sort of remapping is going to have to occur.

The first remapping option brings out SCL on PA13 and SDA on PA14. Also not brought out on the QFN28 I’m looking at right now.

The second remapping option brings out SCL on PC16 and SDA on PC17. Now why does PC17 sound familiar? Yes, it’s the pin that the ‘Download’ button is connected to. Which also means that it’s the USB DP pin, with PC16 being the USB DM pin. So that one’s out, if we want to use USB at some point (foreshadowing: we will).

The third remapping option brings out SCL to PC19 (24) and SDA to PC18 (25). These are certainly brought out on the QFN28 package, but unfortunately, they are the SWCLK & SWDIO signals used to program and debug the device.

The fourth remapping option brings the I2C signals out to the USB pins again. The fifth remapping option brings them out to the SWCLK & SWDIO pins again.

Now in truth we will not need the SWCLK & SWDIO pins to be connected to the device programmer in the field. It’s also possible to provide a ‘window of opportunity’ after each device reset or power cycle where the SWD pins are, indeed, SWD pins, and then get re-programmed to be I2C pins. However, I am very reluctant to go that route as I have had, let’s say, unsatisfactory experiences when re-purposing device programming pins.

Now it’s entirely possible if unusually cruel and abusive (to me) to use a ‘software’ I2C implementation by bit-banging the signals. I don’t think I want to do that today.


Notes on RISC-V Assembly Language Programming – Part 10

3 February 2025

I have sent a brief summary of the RM errata to WCH via their technical support submission page. As they are currently celebrating the Lunar New Year, I don’t expect an immediate response.

Referring to Reference Manual v1.8:

On p. 73, Section 8.3.2.2 External Interrupt Configuration Register 1 (AFIO_EXTICR1), the value for assigning PB pins to the EXTI inputs is incorrect.

Per the RM:

00: xth pin of the PA pin.
10: xth pin of the PB pin.
11: xth pin of the PC pin.
Others: Reserved.

But the correct value for PB is 01, not 10.

I have code to demonstrate this issue if you would like to see it.

The value for assigning PA is correct, but I have not tested PC.

So now on to a more informative HardFault handler, in the hopes that I will never need it. The interesting part of this is the formatted hexadecimal printing routine, usart_puthex. Previously, I had a hierarchy of puthex, puthex2, puthex4 and puthex8 routines, but this one does all that and offers an optional ‘0x’ prefix and a variable length of 1-8 characters, depending on what you need. I allocated a little more space on the stack and used that as a string buffer: convert the low 4 bits of the value to an ASCII hexadecimal digit, store it, shift the value right by four bits, and repeat for as many digits as were requested.
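The actual routine is assembly, but the algorithm renders naturally in C. This is my own reconstruction from the description above, not the project code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Format 'value' as 'digits' (1-8) hex characters into buf,
 * optionally prefixed with "0x". Returns the length written;
 * buf is NUL-terminated. Mirrors the stack-buffer approach:
 * convert the low nibble, shift right four, repeat. */
static int format_hex(char *buf, uint32_t value, int digits, int prefix)
{
    static const char hexchars[] = "0123456789ABCDEF";
    char tmp[8];
    int len = 0;

    if (digits < 1) digits = 1;
    if (digits > 8) digits = 8;

    /* Low nibble first, shifting right 4 bits per digit. */
    for (int i = 0; i < digits; i++) {
        tmp[i] = hexchars[value & 0xF];
        value >>= 4;
    }
    if (prefix) { buf[len++] = '0'; buf[len++] = 'x'; }
    while (digits--)
        buf[len++] = tmp[digits];   /* reverse into printing order */
    buf[len] = '\0';
    return len;
}
```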

I haven’t tested the usart_puthex function extensively yet, but it seems to do the trick.

So now the HardFault handler should print out a message in the format “HardFault 0x<mcause> @ 0x<mepc> [HALT]”, where mcause and mepc are the values of the CSRs at that time.

Now I just have to induce a HardFault on purpose to test it. I use the following code:

.word 0 # *** debug *** induce illegal instruction trap

In RISC-V, any instruction of all ones or all zeros is considered to be an ‘illegal operation’. My trap works, and prints this on the console:

HardFault 0x00000002 @ 0x00000140 [HALT]

Which is precisely correct. Since the upper-most bit of the cause is zero, the lower 31 bits constitute the exception code, which in this case is ‘illegal instruction’. The address corresponds exactly with the location of the bad code in the program.
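Decoding mcause this way is easy to capture in a couple of helpers; the field layout (interrupt flag in bit 31, code in the lower 31 bits) is per the RISC-V privileged specification, and the names are mine:

```c
#include <assert.h>
#include <stdint.h>

#define MCAUSE_INTERRUPT_BIT 0x80000000u

/* 1 if mcause describes an interrupt, 0 for an exception. */
static int mcause_is_interrupt(uint32_t mcause)
{
    return (mcause & MCAUSE_INTERRUPT_BIT) != 0;
}

/* The exception/interrupt code lives in the lower 31 bits. */
static uint32_t mcause_code(uint32_t mcause)
{
    return mcause & ~MCAUSE_INTERRUPT_BIT;
}
```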

The only problem with this solution is that since it relies on the USART to transmit a message, it can only be of use for errors that occur after the USART is initialized. One solution would be to check the USART1EN bit to see whether USART1’s peripheral clock has been enabled, and if it has, to then check the UE bit to see whether the peripheral has been initialized, and only then proceed with the messaging. It captures the important CSRs at the beginning of the handler in any case.

Now it should be time to set up the OLED interface and USB controller to get the framework for this project fully underway.


Notes on RISC-V Assembly Language Programming – Part 9

2 February 2025

So I’ve had some time to think about it, and I think I will spend a little more time trying to better understand what exactly is going on with whatever it was that I did yesterday to make the chip reset itself on every other keystroke. It’s a bit ironic, as the whole point of this particular tangent was specifically to reset the device.

I’m going to build in a lot more debugging information into the code to help understand this present weirdness and also to be in a better position in the future to deal with it when it inevitably happens again.

The first thing is to add some code at the very beginning:

# *** debug *** check CSRs for exception information

csrr t0, mcause # cause of exception
csrr t1, mepc # address where violation occurred
csrr t2, mtval # exception value
csrr t3, mstatus # status

Then I did a bunch of other stuff, but it didn’t help. What I eventually found out was that I had neglected to configure the EXTI to use PB0 as the input to EXTI0. The default was PA0. That’s why the system reset itself on the second keystroke, because that’s the one that turned the LED on (active low LED) and toggling PA0 would generate the EXTI interrupt, resetting the system. I was able to confirm this by omitting only the LED toggle function, and all worked as expected.

So I set the appropriate bits in the External Interrupt Configuration Register AFIO_EXTICR1 to indicate that PB0 should be routed to EXTI, and for some reason it still didn’t work. Then I remembered that the AFIO controller has to have its peripheral clock enabled just like everything else. Once I added that to the list of APB2 peripheral clocks to enable, everything works as expected.

Now… where was I? Oh, yes: testing the fake reset button. It still doesn’t work.

This time it is because the RM is wrong. Here are the values it indicates on p. 74:

00: xth pin of the PA pin.
10: xth pin of the PB pin.
11: xth pin of the PC pin.
Others: Reserved.

The default is 00, and this was certainly working when PA0 was toggling the LED. It seemed a bit odd to me, yet certainly within the realm of possibility, that the port numbering would be 0, 2, 3, instead of 0, 1, 2. So I just randomly tried a 01 in there and it works.

So now I have an external reset button attached to my board. It’s a bit twitchy. I might add a debounce timer in there to smooth that out a bit.

As a side note, I already knew this should be possible, because I had already implemented this feature on my re-spin of the tinyCylon using the 003, which also does not make the reset signal available on the smallest package, the -J4. The original tinyCylon is based on the 8 bit Atmel (now Microchip) AVR in an eight pin package. It used the external reset pin as a ‘mode advance’ input. Every time the button was pressed, it would advance the device mode. It was just easier to emulate an external reset pin in software than to re-architect the code.

I’m just a little disappointed that it wasn’t some sort of mysterious HardFault condition that would require Much Deep Thought. I’ve still got a nice HardFault reporting function planned. It is going to feature a much more advanced numeric output routine than I’ve previously used, and will be an excellent opportunity to try out an actual data algorithm (gasp).


Notes on RISC-V Assembly Language Programming – Part 8

1 February 2025

And the i8 battery is still running today. I’ve forgotten how many days it is since I started the experiment! So I guess the battery life is ‘long enough’ at this point.

Removing the USB dongle from the development board allows me to use the PC17 input and the attached push button just like a standard GPIO pin. I wrote a short program to check the status of the pin whenever a character was received from the USART, then print a 1 or 0 depending on if the button was pushed or not, and it delivers the expected results:

# read the status of the 'Download' push button connected to PC17

la s0, GPIOC_BASE
lw s1, GPIO_INDR(s0) # read input pins
PC17 = (1 << 17)
li s2, PC17
and s1, s1, s2 # isolate PC17 input
snez a0, s1
addi a0, a0, '0' # convert to ASCII
call usart_putc # *** debug *** print status of download push button

Note that I was unable to use the ‘andi’ immediate form of the logical AND instruction, because the constant value assigned to PC17 (1 << 17) was too large. So I loaded it into s2 and used that as the other source register for the instruction. I’m using the snez instruction again to give me a one or zero result, then adding the constant value of the ASCII character ‘0’, which is 0x30, so that either an ASCII 1 or 0 is typed out to the console, in addition to whatever character is echoed.
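The snez-plus-addi trick maps directly onto C, which makes it easy to check the arithmetic on the host (function name mine):

```c
#include <assert.h>
#include <stdint.h>

/* What 'snez a0, s1 ; addi a0, a0, '0'' computes:
 * '1' if any masked bit was set, '0' otherwise. */
static char pin_status_char(uint32_t indr, uint32_t mask)
{
    return (char)(((indr & mask) != 0) + '0');
}
```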

Since the overall goal of the project is to operate as a USB host, I’m thinking that I am just going to have to give up on my dream of using the ‘Download’ button as a reset button. However, I still think that it would be a useful facility to add a reset button of some sort to this board. I will use PB0 as the alternate reset pin, as it is brought out next to a ground pin on the board headers. This allows me to easily add a pushbutton already attached to a two pin header. In a previous life, it was the power button for a PC case.

So I should go back to the GPIO initialization code and add PB0 as an input with a pullup resistor enabled. In fact, I should go ahead and fill out all the rest of the GPIO initialization, as we’ve already touched all three of the available GPIO ports in some way. This will also include all the ‘extended’ configuration registers:

# configure GPIO

    # GPIOA:  0 = LED output, active low

la s0, GPIOA_BASE
sw zero, GPIO_OUTDR(s0) # clear all outputs
li s1, 0x88888882
sw s1, GPIO_CFGLR(s0)
li s1, 0x88888888
sw s1, GPIO_CFGHR(s0)
li s1, 0x88888888
sw s1, GPIO_CFGXR(s0)

    # GPIOB:  0 = reset button, 9 = MCO, 10 = USART1_TX, 11 = USART1_RX

la s0, GPIOB_BASE
#sw zero, GPIO_OUTDR(s0) # clear all outputs
PB0 = (1 << 0)
li s1, PB0
sw s1, GPIO_OUTDR(s0) # clear outputs, enable PB0 pull-up resistor
li s1, 0x88888888
sw s1, GPIO_CFGLR(s0)
li s1, 0x88888AB8
sw s1, GPIO_CFGHR(s0)
li s1, 0x88888888
sw s1, GPIO_CFGXR(s0)

    # GPIOC:  17 = Download button input

la s0, GPIOC_BASE
sw zero, GPIO_OUTDR(s0) # clear all outputs
li s1, 0x88888888
sw s1, GPIO_CFGLR(s0)
li s1, 0x88888888
sw s1, GPIO_CFGHR(s0)
li s1, 0x88888888
sw s1, GPIO_CFGXR(s0)

I’m also going to shut down the MCO output for now, as it tends to bleed over into all the other signals. This makes the system clock initialization pretty simple: write a zero to RCC_CFGR0:

la s0, RCC_BASE
sw zero, RCC_CFGR0(s0) # 48 MHz, no MCO

Then I modified the code to read and report on the push button status:

# read the status of the 'reset' push button connected to PB0

la s0, GPIOB_BASE
lw s1, GPIO_INDR(s0) # read input pins
andi s1, s1, PB0 # isolate PB0 signal
snez a0, s1
addi a0, a0, '0' # convert to ASCII
call usart_putc # *** debug *** print status of download push button

Since the bitmap for PB0 (1 << 0) or 0x00000001, is ‘short enough’ to fit in the immediate field of the andi instruction, I don’t have to load it into a separate register to apply the logical function. Just how short is ‘short enough’? Let’s look at the specification again:

https://github.com/riscv/riscv-isa-manual/releases/tag/riscv-isa-release-6f69218-2025-01-30

The answer to my question is 12 bits, from the specification section 2.4.1. Integer Register-Immediate Instructions, p. 26. Since the immediate is sign-extended, values from -2,048 to 2,047 fit directly. Just realize that a value with bit 11 set (2,048 through 4,095) will be sign-extended into something else entirely.
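The sign extension is easy to demonstrate on the host; this helper (mine) extends a 12-bit immediate the same way the hardware does:

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 12-bit immediate: shift it up to the top of the
 * word, then arithmetic-shift it back down (gcc/clang both use
 * arithmetic right shift on signed types). */
static int32_t sext12(uint32_t imm12)
{
    return ((int32_t)(imm12 << 20)) >> 20;
}
```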

So the button works as expected at this point (just a digital input; not yet a system reset) and prints a 1 when it is not pressed and a 0 when it is pressed. That means my input pull-up resistor is working as expected and all the other configuration so far is behaving as I wish it to.

So now to connect the push button and the system_reset code. This will be done by the External Interrupt and Event Controller (EXTI), which is part of the PFIC and described in Section 7.4 of the RM starting on p. 32.

The first thing we need to do is configure EXTI0 for being triggered on a falling edge by setting bit 0 of EXTI_FTENR. Then we enable EXTI0 by setting bit 0 in the EXTI_INTENR register. Since the reset switch is on PB0, it is channeled to EXTI0, as would any signal on PA0 or PC0.

In the same way that the first function is the most fun, so is the first interrupt. There’s so much to prepare to get things going. EXTI inputs 0-7 are grouped into one interrupt, EXTI7_0_interrupt_ID = 20. I’m planning to use the ‘shortcut’ of the vector-table free interrupt mechanism on this device, so we need to put the interrupt ID and the address of the interrupt handler routine in the VTF registers of the PFIC. Then there’s some mumbo-jumbo dealing with CSRs to both configure the chip to use the VTF system as well as enable interrupts on a global basis.

I forgot that you have to also add in the ‘enable’ bit to the handler address when setting up the VTF interrupts.
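As I read the RM, the value written to the VTF address register is just the handler address with bit 0 repurposed as the enable, which works because handler addresses are at least 2-byte aligned. A sketch of the composition (helper name mine):

```c
#include <assert.h>
#include <stdint.h>

#define VTF_ENABLE 1u

/* Compose a VTF address-register value: the handler address with
 * bit 0 (which alignment leaves free) carrying the enable flag. */
static uint32_t vtf_entry(uint32_t handler_addr, int enable)
{
    return (handler_addr & ~1u) | (enable ? VTF_ENABLE : 0u);
}
```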

Now something I’ve done has upset the chip. It starts out OK, but after you type the second character, the system resets. What’s even more strange is that it is setting the ‘illegal instruction’ exception in the mcause CSR. So now I have to figure out what I did and undo it.

Perhaps you recall the short list of possible interrupts I wanted to handle in this application? Well, if not, it’s in the log somewhere, but if you do go back and find it, you’ll see that the very first interrupt in the list is the ‘HardFault’ handler. I have found, from my various tinkering, that when things go too far off the rails, the system usually throws an exception and lets you know what the problem is, if you only know where to look for the clues it’s leaving you. Having an interrupt handler that deals with a HardFault exception is a good way to examine the mcause CSR and print out the interrupt or exception number, as well as the address where it happened. This is not hard to implement, but I’ll need to write a little more code and at this point I think it would be best to do it tomorrow.


Notes on RISC-V Assembly Language Programming – Part 7

31 January 2025

So today I want to revisit the la vs li vs lui confusion that I am experiencing about how I’m initializing the stack pointer. Having thought that the matter was settled, I deleted the other ‘control’ instructions that I was using for comparison. Luckily for me, I jotted it all down in my previous notes, so it was there when I needed it. Here is the reconstructed version:

la sp, END_OF_RAM # initialize the stack pointer
la sp, 0x20005000 # *** debug *** for comparison
lui sp, %hi(END_OF_RAM) # initialize the stack pointer 
la t0, END_OF_RAM # *** debug *** for comparison 
la t0, 0x20005000 # *** debug *** for comparison 

And here is what it gets crunched down to after assembly:

la sp, END_OF_RAM # initialize the stack pointer
0: 20005137 lui sp,0x20005
la sp, 0x20005000 # *** debug *** for comparison
4: 20005137 lui sp,0x20005
lui sp, %hi(END_OF_RAM) # initialize the stack pointer
8: 20005137 lui sp,0x20005
la t0, END_OF_RAM # *** debug *** for comparison
c: 200052b7 lui t0,0x20005
la t0, 0x20005000 # *** debug *** for comparison
10: 200052b7 lui t0,0x20005

So I had noticed that the first three instructions all resulted in the same instruction, 0x20005137, while the last two gave me a puzzling 0x200052b7. And in the clear light of day I see that is not the catastrophe I originally thought it was, because they are not initializing the stack pointer, but the t0 register, for a comparison.
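
The difference between the two machine words is just the destination register field. A few lines of Python to reconstruct the encodings (standard RV32I LUI format: imm[31:12], rd in bits 11:7, opcode 0b0110111; sp is x2 and t0 is x5):

```python
def lui(rd, imm20):
    """Encode a RISC-V LUI instruction: imm[31:12] | rd[11:7] | opcode 0b0110111."""
    return (imm20 << 12) | (rd << 7) | 0b0110111

SP, T0 = 2, 5  # ABI register numbers: sp is x2, t0 is x5

print(hex(lui(SP, 0x20005)))   # 0x20005137 - lui sp, 0x20005
print(hex(lui(T0, 0x20005)))   # 0x200052b7 - lui t0, 0x20005

# The value each instruction leaves in its destination register:
print(0x20005 << 12)           # 536891392, i.e. 0x20005000
```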

Just to be doubly sure, I debug the code a line at a time and watch the registers in real time. Again, I spy the stack pointer holding the mysterious value 0x20002800 before any of the code is executed in this debug session.

The first instruction does what I want, which, in this particular instance was what I told it to do. The second instruction also seems to work properly, but since there was no net observable change of state, I might want to introduce intentional intermediate values just to make sure that the correct value is really being re-written to the stack pointer. The third instruction follows the pattern, leaving sp standing at 0x20005000.

The fourth instruction sets t0 to decimal value 536891392, which translates to 0x20005000, so that’s good. The fifth and final instruction does the same.

So the only mystery remaining is why I forgot that I had intentionally introduced a wrinkle into the testing methodology without following up on it correctly. It helps to write things down.

All this work to understand how to succinctly initialize the stack pointer when I’m not even using it yet. That, I think, is about to change.

Now I can get on to the important business of creating not one but two fake reset buttons for the development board of a chip that has no external reset signal available.

So the plan is to use an external interrupt line to trigger a routine that initiates a self-reset of the chip. So I will start at the very end and write a short routine that resets the chip and then call it when the user types a particular key on the console. And to do that, I need to modify the present code to wait for a character to arrive from the USART, echo it and then check if the reset key has been pressed and jump to the reset routine if it has. It would also be nice if the LED continued to blink, if not all by itself then perhaps every time a character was received.

Right now the USART initialization code spits out a single ‘!’ character to show that the thing is working.

I have just discovered that the GNU assembler treats ‘//’ as the single-line comment pattern, in addition to ‘#’. The old-school /* comment */ format is also supported. I only saw this because MRS2, which evolved from Eclipse, will append ‘// ‘ (including a space) on selected lines of code when you press the command+/ keys, similar to control+/ on other OSes.

So I will create a function that toggles the LED, so that I don’t have to keep track of two peripheral pointers at once.

The first function is always the most interesting. I vaguely remember how to do this in RISC-V assembly. First you decrement the stack pointer by however many bytes of memory your function will need, then save any registers that you need to preserve for the caller. Then you do whatever is appropriate for the function to do and then back out the way you came in: ‘pop’ any preserved values off the stack and then return the stack pointer to its original value.

Now this makes me start thinking about whether my use of t0/t1 in the initialization code was the best choice. Technically, since no functions were called and no interrupts were expected, it didn’t matter which registers I used. But now that we’re entering the grown-up world of proper functions and accountability, perhaps I should switch over to the ‘saved’ registers, s0-s11. They are also known as:

x8  s0/fp   saved register 0, frame pointer
x9  s1  saved register 1
x18 s2  saved register 2
x19 s3  saved register 3
x20 s4  saved register 4
x21 s5  saved register 5
x22 s6  saved register 6
x23 s7  saved register 7
x24 s8  saved register 8
x25 s9  saved register 9
x26 s10 saved register 10
x27 s11 saved register 11

Note that s2-s11 are not present at all on the RV32EC devices, such as the CH32V003.

Again, this is only if I want to stay reasonably aligned with the published ABI, which I am under no obligation whatsoever to observe. I have zero intention at this point of ‘cooperating’ with any other software on this project, so I have the dizzying freedom to do as I think best. And what I think is best does tend to shift a bit over time.

For reference, here are the RISC-V calling conventions, as codified at the source:

https://riscv.org/wp-content/uploads/2024/12/riscv-calling.pdf

There is one caveat to that statement, however. Both the QingKeV2 and QingKeV4 processors have a feature called Hardware Prologue and Epilogue (HPE). This is described in the V2 PM in Section 3.4, p. 14, and in the V4 PM on p. 17. This mechanism is triggered by either an interrupt or an exception and saves and restores either 10 (V2) or 16 (V4) of the ‘caller saved’ registers to and from the stack.

The V2 decrements the stack pointer by 48 before the push and adds it back afterward. The V4 saves the list of registers to an internal stack in a single cycle, then restores them when appropriate.

V2 saved registers:

x1  ra
x5  t0
x6  t1
x7  t2
x10 a0
x11 a1
x12 a2
x13 a3
x14 a4
x15 a5

V4 saved registers:

x1  ra
x5  t0
x6  t1
x7  t2
x10 a0
x11 a1
x12 a2
x13 a3
x14 a4
x15 a5
x16 a6
x17 a7
x28 t3
x29 t4
x30 t5
x31 t6

Updating to the latest Processor Manual (PM) for the QingKeV2, V1.2, I am reminded of questions I have yet to answer, such as: what is the EABI mode on/off option, controlled from the Interrupt System Control Register (INTSYSCR) at CSR address 0x804?

There is a typo in the PM V4 in Table 1-2 RISC-V Registers, where registers x18-27 are referred to as a2-11, when they should be called s2-11. Oopsie.

I’m not 100% sure how I can leverage this hardware capability to my advantage yet, but it’s nice to know it’s there. How much should it affect my choice of working registers?

So I think I’m going to switch over to using s0/s1 for the initialization code and thenceforth into the future. So that seems to work OK, and I can now write a function to toggle the LED and call it from a loop and see what breaks next.

Here’s what the program looks like now with an endless loop that just calls the led_toggle function forever:

loop: # an endless loop

    call led_toggle # toggle the LED

    j loop # do it again

led_toggle: # toggle LED1 on A0

    # on entry: none
    # on exit: none

    addi sp, sp, -16 # allocate space on the stack
    sw s0, 12(sp) # preserve s0
    sw s1, 8(sp) # preserve s1

    la s0, GPIOA_BASE
    lh s1, GPIO_OUTDR(s0) # read present value of GPIO_OUTDR
    xori s1, s1, (1 << 0) # toggle bit 0
    sh s1, GPIO_OUTDR(s0) # write inverted value back

    lw s0, 12(sp) # restore s0
    lw s1, 8(sp) # restore s1
    addi sp, sp, 16 # restore stack pointer

    ret # return from function

It’s nice to leave a comment in the function as to what is expected on entry and exit. In this example, no arguments are passed into the function and no return value is expected.

Alas, TextEdit does not allow me to indent a block of text. I can do it in the MRS2 code editor, copy it to the clipboard and then un-indent it in MRS2, then paste it into TextEdit.

Note that I am allocating four words of space on the stack in preparation for saving the registers, when I only need two. This is because the ABI says to allocate space on the stack in 16-byte (128-bit) blocks. Now the ABI is not the boss of me, but sometimes I find that I need one more register, and having allocated a bigger-than-needed block comes in handy. It doesn’t cost anything in execution time.
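
The rounding rule is simple enough to write down (the arithmetic here is mine; the 16-byte figure is from the ABI):

```python
def frame_size(bytes_needed, align=16):
    """Round a stack-frame request up to the next multiple of the alignment."""
    return (bytes_needed + align - 1) & ~(align - 1)

print(frame_size(8))   # 16 - the two saved words in led_toggle still cost a 16-byte frame
print(frame_size(40))  # 48 - incidentally the figure the V2 HPE uses for its ten registers
```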

Now I need a function to see if the USART has received any characters.

usart_rxne: # return USART1 RXNE receive register not empty status

    # on entry: none
    # on exit: a0[0] = USART1_RXNE

    addi sp, sp, -16 # allocate space on the stack
    sw s0, 12(sp) # preserve s0

    la s0, USART1_BASE
    lh a0, USART_STATR(s0) # read status register
    andi a0, a0, USART_RXNE # isolate RXNE receive register not empty status bit
    snez a0, a0 # set not equal to zero

    lw s0, 12(sp) # restore s0
    addi sp, sp, 16 # restore stack pointer

    ret # return from function

So since I am going to use the s0 register as the peripheral pointer to USART1, I preserve its current value on the stack. Setting s0 to USART1_BASE, I can read in the status register STATR and mask out all the bits except USART_RXNE, leaving only the status bit. Now this particular status bit happens to be in bit position 5, leaving either a zero (receive register is empty) or a 0x20 (receive register not empty). Using the pseudoinstruction snez, I set a0 to a 1 or a 0, depending on the value in the register. This makes it easier for the calling function to interpret the results, i.e., true or false, than expecting it to know which bit means what in every kind of peripheral status register.
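
The masking and snez normalization can be modeled in a couple of lines of Python (the 0x20 bit position is from the description above; the status values fed in are made up for illustration):

```python
USART_RXNE = 1 << 5    # RXNE status bit, bit position 5 of USART_STATR

def rxne(statr):
    """Model of usart_rxne: mask out RXNE, then normalize to 0 or 1 like snez."""
    return 1 if (statr & USART_RXNE) else 0

print(rxne(0x00E0))    # 1 - receive register not empty
print(rxne(0x00C0))    # 0 - nothing received yet
```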

Now I can write another function to actually receive and return a single character from the USART, using the usart_rxne function to tell if there’s anything there yet or not. Now in truth it would also be possible to structure this another way. I prefer to do it this way, as I might want to have a separate routine to tell if a key has been pressed, e.g., kb_hit().

Here is the code to reset the system:

system_reset: # reset the system

    # on entry: none
    # on exit: does not return

    la s0, PFIC_BASE
    li s1, 0xFA050000 # key 1
    sw s1, PFIC_CFGR(s0)
    li s1, 0xBCAF0000 # key 2
    sw s1, PFIC_CFGR(s0)
    li s1, 0xBEEF0000 | PFIC_RESETSYS # key 3 + system reset request bit
    sw s1, PFIC_CFGR(s0)

1:  j 1b # loop here until reset occurs

I don’t really know if the loop at the end does any good or not, but there’s no specification on how long it takes for the system reset to take hold. We don’t want any more code executing after this point.

I added some code to get a character from the USART and echo it back to the console, then check to see if it was a 0x00 character (control+space) that was received. If it was, it just jumps to the system_reset function. The code there just feeds the three key values to the PFIC_CFGR register, with the last key having the SYSRST bit set (bit 7). The SVD calls it PFIC_RESETSYS.
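
Just to double-check the final write, here are the three key values and the composed last word (the SYSRST bit position is per the note above):

```python
# The three 'key' writes that unlock and trigger a PFIC system reset.
PFIC_RESETSYS = 1 << 7   # SYSRST, bit 7 of PFIC_CFGR

KEY1 = 0xFA050000
KEY2 = 0xBCAF0000
KEY3 = 0xBEEF0000

print(hex(KEY3 | PFIC_RESETSYS))  # 0xbeef0080 - the final unlocking write
```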

This seems to work. The chip seems to reset and print a ‘!’ character, then just echoes back anything else typed in, until you type control+space, then you see another ‘!’ appear. Also the LED is toggled after every keystroke, and gets reset to ‘on’ after a reset.

I added a usart_puts function to print a nul-terminated string via USART1. You can declare a constant text string to print like this:

announce_string: .asciz "G8-asm\r\n"

And the code to print it is this:

la a0, announce_string # announce
call usart_puts

So now all I have to do is to wire an external interrupt to trigger the system_reset code.

The first target is the ‘Download’ button on the board. It is connected between Vcc and PC17. So I need to set up PC17 in the GPIO initialization section as an input with a pull-down resistor. I also have to enable the peripheral clock for GPIOC.

Enabling the GPIOC peripheral clock was easy. I just added the RCC_IOPCEN value to the list:

# enable peripheral clocks

li s1, RCC_USART1EN | RCC_IOPCEN | RCC_IOPBEN | RCC_IOPAEN
sw s1, RCC_APB2PCENR(s0)

Configuring PC17 is going to be a little different from all the other GPIO pins I’ve initialized so far. First of all, you might have noticed that its number (17) is out of the ‘normal’ range of 0-15 that most of the GPIO in this series, as well as the STM32 devices, share. So its configuration is handled through the USB PD peripheral. PC17 is also used as the USB-PD ‘PDM’ signal.

Further reading of the RM (always rewarding, even if not right away) reveals that there are a couple of ‘expansion’ registers for the higher-order GPIO bits, like our little friend PC17.

Ah, I have mis-read the documentation. It’s not the USB PD peripheral that’s connected to PC17, it’s the normal, regular, standard USB-DM line of the USB full speed device. As that is presently connected to the i8 wireless dongle, pushing the button is not going to be immediately detectable by my simple software techniques.

I don’t think it will be possible to use the ‘Download’ button as an alternative reset button.
What I do think is that I ought to think about it some more and get back to it tomorrow.

In other news, the i8 battery is still running.