
Harlock – A small language to handle hex and elf files

This is a simple program that blinks an LED on an ATmega644 AVR microcontroller, a classic first application often used as a sort of "hello world" in embedded C:

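#include <avr/io.h>
#include <util/delay.h>

#define LED         0   // the LED is driven by pin PA0
#define HEADER_SIZE 24  // 2 (.text address) + 2 (.text size) + 20 (SHA-1 digest)

// Placeholder for the metadata header: it lives in the .metadata section,
// whose address is fixed at link time, and is filled in after the build.
const uint8_t header[HEADER_SIZE] __attribute__((section(".metadata")));

int main(void) {
    DDRA = (1U << LED);       // configure PA0 as an output
    PORTA = (0U << LED);      // start with PA0 low
    for(;;) {
        PORTA ^= (1U << LED); // toggle the LED
        _delay_ms(2000);      // needs F_CPU to be defined (e.g. -DF_CPU=...)
    }
}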

The needs of the application

I need this application to be updatable through some sort of OTA/OTW update process. Let’s suppose I have a bootloader that can either update or just launch the application, after verifying its integrity. This can be done, for example, by computing the SHA1 hash of the firmware (specifically, of its .text section) and comparing it with a precomputed one obtained at compile time.

Once we have the hash, we need to write it into a known location of program memory, which in the example above is the .metadata section. I can place that section at a fixed address by using a specific linker flag (-Wl,--section-start=.metadata=0xDFE8).
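As a quick sanity check on that address, here is the arithmetic as a small C sketch. It relies on an assumption not stated above: that the ATmega644's 64 KB of flash is configured with the largest boot section (4096 words, i.e. 8 KB), which starts at byte address 0xE000. Under that assumption the 24-byte header ends exactly where the bootloader begins. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the article: the header is linked at 0xDFE8 and is 24 bytes long. */
#define METADATA_ADDR 0xDFE8u
#define METADATA_SIZE 24u

/* Assumption: 64 KB of flash, bootloader in the 4096-word (8 KB) boot section. */
#define FLASH_SIZE        0x10000u
#define BOOT_SECTION_SIZE 0x2000u
#define BOOT_START        (FLASH_SIZE - BOOT_SECTION_SIZE) /* byte address 0xE000 */

/* First byte address after the header. */
uint32_t metadata_end(void) {
    return METADATA_ADDR + METADATA_SIZE;
}

/* True if the header lies entirely in the application area, below the bootloader. */
bool metadata_below_bootloader(void) {
    return metadata_end() <= BOOT_START;
}

/* Bytes of application flash left unused between the header and the bootloader. */
uint32_t gap_before_bootloader(void) {
    return BOOT_START - metadata_end();
}
```

With these values the gap is zero: the header occupies the last 24 bytes of the application area (0xDFE8 to 0xDFFF).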

And you know what? It would also be nice to be able to find this hash value in both the hex and the elf executables produced by the compiler, and to get the address and size of the elf sections without having to hard-code their values.

Even if some existing tool could do all of that, it would probably mean installing an additional application and adding a dependency to my build system, which I would really like to avoid.

To summarise, I want to:

  • Manipulate hex files and elf binaries both in read and write mode;
  • Compute hashes of a certain amount of data extracted from said files;
  • Integrate the tool into a build pipeline without too much hassle;
  • Preferably, avoid introducing a new build dependency, both for me and for whoever will have to build the project.

Harlock – A small DSL

I needed this kind of tool many times while developing firmware for different hardware platforms: in the end, I rolled up my sleeves and took it upon myself to write a small and simple language that could make my life easier.

Harlock is a domain-specific language designed to manipulate the artifacts produced by compiling applications that must run on embedded systems. It comes with its own interpreter and is released under the MIT open source license.

Repository @github.com/Abathargh/harlock
Wiki @github.com/Abathargh/harlock/wiki

Handling hex files and elf binaries in read and write mode

Harlock allows you to manage hex and elf files (and even any other kind of file as binary blobs) by providing a simple and intuitive file API. Let’s suppose we compile the previous application, producing in the process a main.elf binary and a main.hex file.

You can then easily read information from them and edit them:

var section = ".text"

var e = try open("main.elf", "elf")
var addr = try e.section_address(section)
var size = try e.section_size(section)

print("Section", section, "Addr:", addr, "Size:", size)
try e.write_section(section, [0x01, 0x02, 0x03], 0)
save(e)

var h = try open("main.hex", "hex")
var text_hex = try h.read_at(addr, size)
try h.write_at(addr, [0x01, 0x02, 0x03])
save(h)

The open function loads the whole file into memory and exposes different operations depending on the file type:

  • You can programmatically obtain information on single elf sections, read their contents or write data inside of them.
  • You can also read from and write to hex files: these operations are made possible by a hand-written ihex parser, which scans hex records and builds the file layout in the background.
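For readers unfamiliar with the format, this is roughly what scanning a single record involves: every line has the shape :LLAAAATT<data>CC (byte count, 16-bit address, record type, payload, checksum), and all of its bytes must add up to zero modulo 256. The C sketch below only illustrates the format; it is not harlock's actual parser (which is written in Go), and names such as ihex_parse_record are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One decoded Intel HEX record. */
typedef struct {
    uint8_t  count;     /* number of data bytes */
    uint16_t address;   /* 16-bit load offset */
    uint8_t  type;      /* 00 = data, 01 = end of file, ... */
    uint8_t  data[255]; /* payload */
} ihex_record;

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decodes the two hex digits at s into *out; returns 0 on success. */
static int hex_byte(const char *s, uint8_t *out) {
    int hi = hex_nibble(s[0]);
    int lo = hex_nibble(s[1]);
    if (hi < 0 || lo < 0) return -1;
    *out = (uint8_t)((hi << 4) | lo);
    return 0;
}

/* Parses one ":LLAAAATT<data>CC" line. Returns 0 on success, -1 if the line
 * is malformed or the checksum is wrong. Trailing characters (e.g. CR/LF)
 * after the checksum are ignored. */
int ihex_parse_record(const char *line, ihex_record *rec) {
    size_t len = strlen(line);
    uint8_t raw[4 + 255 + 1]; /* count, addr hi, addr lo, type, data, checksum */
    uint8_t sum = 0;

    /* Shortest record: ':' + count + address + type + checksum = 11 chars. */
    if (len < 11 || line[0] != ':') return -1;
    if (hex_byte(line + 1, &raw[0]) != 0) return -1;

    size_t nbytes = 4u + raw[0] + 1u;      /* header + data + checksum */
    if (len < 1 + 2 * nbytes) return -1;

    for (size_t i = 0; i < nbytes; i++) {
        if (hex_byte(line + 1 + 2 * i, &raw[i]) != 0) return -1;
        sum = (uint8_t)(sum + raw[i]);
    }
    if (sum != 0) return -1;               /* checksum mismatch */

    rec->count   = raw[0];
    rec->address = (uint16_t)((raw[1] << 8) | raw[2]);
    rec->type    = raw[3];
    memcpy(rec->data, &raw[4], raw[0]);
    return 0;
}
```

For example, :0300300002337A1E decodes to three data bytes (02 33 7A) at offset 0x0030, and :00000001FF is the end-of-file record.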

Every read/write function works with simple integer “arrays”, internally stored as byte lists. This makes it easy to process the data with the other features of the language that operate on arrays, such as map, reduce and the + operator.

You can then dump your changes back to the original file through the save function.
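Saving goes the other way around: the modified bytes have to be serialized back into records, each with a freshly computed checksum. Here is a minimal C sketch of emitting one data record (type 00); the function name is hypothetical, and it assumes 16-bit addresses only (no extended address records), which is enough for the 64 KB of an ATmega644.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats one Intel HEX data record (type 00) for `count` bytes loaded at
 * `address`. `out` must have room for 11 + 2 * count characters plus the
 * terminating NUL. Returns the number of characters written. */
int ihex_format_data_record(char *out, uint16_t address,
                            const uint8_t *data, uint8_t count) {
    assert(out != NULL && (data != NULL || count == 0));

    /* Byte sum of count, address high/low and type (type 00 adds nothing). */
    uint8_t sum = (uint8_t)(count + (address >> 8) + (address & 0xFF));
    int n = sprintf(out, ":%02X%04X00", (unsigned)count, (unsigned)address);

    for (uint8_t i = 0; i < count; i++) {
        n += sprintf(out + n, "%02X", (unsigned)data[i]);
        sum = (uint8_t)(sum + data[i]);
    }

    /* Checksum: two's complement of the sum, so the whole record adds up to 0. */
    n += sprintf(out + n, "%02X", (unsigned)((0x100 - sum) & 0xFF));
    return n;
}
```

Encoding the bytes 02 33 7A at offset 0x0030 yields :0300300002337A1E, the same record a parser would accept.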

Computing hashes of data extracted from hex and elf files

Computing a hash is trivial, and can be done with the builtin hash function:

var text_addr = 0x0000
var text_size = 0xae
var h    = try open("main.hex", "hex")
var text = try h.read_at(text_addr, text_size)
var cks  = hash(text, "sha1")
print(cks)

The function expects a byte array and a string naming the algorithm to use (as of harlock v0.5.0, sha1, sha256 and md5 are supported), and returns the digest of the passed array as a byte array.
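To show what sha1 actually computes, here is a self-contained SHA-1 in C following FIPS 180-4, with a streaming init/update/final interface similar in spirit to the SHA1Reset/SHA1Input/SHA1Result calls used by the bootloader below. It is a host-side illustration with hypothetical names (sha1_init, sha1_update, sha1_final), not the code harlock uses internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Streaming SHA-1 state. */
typedef struct {
    uint32_t h[5];      /* intermediate hash value */
    uint8_t  block[64]; /* buffered input, one 512-bit block */
    size_t   used;      /* bytes currently buffered */
    uint64_t total;     /* message length in bytes */
} sha1_ctx;

static uint32_t rol32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32 - n));
}

/* Compresses one 64-byte block into the state. */
static void sha1_block(sha1_ctx *s, const uint8_t *p) {
    uint32_t w[80];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
               (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
    for (int i = 16; i < 80; i++)
        w[i] = rol32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    uint32_t a = s->h[0], b = s->h[1], c = s->h[2], d = s->h[3], e = s->h[4];
    for (int i = 0; i < 80; i++) {
        uint32_t f, k;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        uint32_t t = rol32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rol32(b, 30); b = a; a = t;
    }
    s->h[0] += a; s->h[1] += b; s->h[2] += c; s->h[3] += d; s->h[4] += e;
}

void sha1_init(sha1_ctx *s) {
    s->h[0] = 0x67452301; s->h[1] = 0xEFCDAB89; s->h[2] = 0x98BADCFE;
    s->h[3] = 0x10325476; s->h[4] = 0xC3D2E1F0;
    s->used = 0;
    s->total = 0;
}

void sha1_update(sha1_ctx *s, const uint8_t *data, size_t len) {
    s->total += len;
    while (len--) {
        s->block[s->used++] = *data++;
        if (s->used == 64) {
            sha1_block(s, s->block);
            s->used = 0;
        }
    }
}

void sha1_final(sha1_ctx *s, uint8_t out[20]) {
    uint64_t bits = s->total * 8;            /* message length before padding */
    const uint8_t one = 0x80, zero = 0x00;
    uint8_t len_be[8];

    sha1_update(s, &one, 1);                 /* append the '1' bit */
    while (s->used != 56) sha1_update(s, &zero, 1);
    for (int i = 0; i < 8; i++) len_be[i] = (uint8_t)(bits >> (56 - 8 * i));
    sha1_update(s, len_be, 8);               /* fills and compresses the last block */

    for (int i = 0; i < 5; i++) {
        out[4 * i]     = (uint8_t)(s->h[i] >> 24);
        out[4 * i + 1] = (uint8_t)(s->h[i] >> 16);
        out[4 * i + 2] = (uint8_t)(s->h[i] >> 8);
        out[4 * i + 3] = (uint8_t)s->h[i];
    }
}
```

Feeding the data one byte at a time, as the bootloader does with pgm_read_byte, produces the same digest as hashing the whole buffer in a single call.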

Launching a script

To launch a script, just install the harlock tool/interpreter (as explained here) and pass the script as its first input argument:

[~]$ echo "var input = if len(args) > 1 { args[1] } else { \"world\" }
dquote> print(\"Hello\", input)" > test.hlk
[~]$ harlock test.hlk                                                 
Hello world
[~]$ harlock test.hlk antima                                          
Hello antima

You can also start a REPL interactive session, to test something on the fly:

[~]$ harlock
Harlock v0.4.1-31-g8cfed2a - amd64 on linux
>>> print("test")
test
>>> [1, 2, 3, 4].reduce(fun(x, y) {           
... 	x+y
... })           
... 
10
>>> hash([1, 2, 3, 4].map(fun(x) { x*2 }), "sha1")
[49, 171, 192, 184, 240, 37, 141, 6, 123, 130, 103, 242, 99, 184, 217, 31, 61, 239, 176, 107]
>>> 

No dependencies?

Adding a tool to a build pipeline is not always easy or feasible, especially if you are working in heterogeneous teams. To simplify things, you can generate a statically linked binary that embeds the whole harlock runtime together with the script, by launching the harlock tool with the -embed flag:

[~]$ harlock -embed test.hlk
go: creating new go.mod: module embedded_harlock
go: to add module requirements and sums:
	go mod tidy
go: finding module for package github.com/Abathargh/harlock/pkg/interpreter
go: found github.com/Abathargh/harlock/pkg/interpreter in github.com/Abathargh/harlock v0.4.1
Generated "test"
[~]$ file test
test: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=wpIyZxP0rIUSryTz7m8c/yZ-BQwygAlOEyecHh2K6/7BYmQrxn4xKInS-zJ0fT/hj3ijmHCG1KzI3NAtCSq, not stripped
[~]$ ./test
Hello world
[~]$ ./test antima
Hello antima

This process generates a native Go binary that can be sent to anyone who wants to use the tool, with no need to install the interpreter. Note that a Go 1.18+ installation is required on the machine where you generate the executable.

Harlock in action

A more realistic application of this tool is to use it to dynamically generate metadata during the post-build phase of a build pipeline.

Let’s consider a simple bootloader for an AVR chip that checks the integrity of the application I introduced earlier. If everything is OK, it launches the application; otherwise, it blinks an LED at 2Hz.

The board that I’m using to run this demo is similar to the one I showed in the Embedded Logs – From scratch series.

(Complete project available @github.com/Abathargh/avr-harlock-demo)

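#include <avr/io.h>
#include <avr/pgmspace.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include "sha1.h"  // RFC 3174-style SHA-1 (SHA1Context, SHA1Reset, SHA1Input, SHA1Result)

#define LED 0

#define METADATA ((uint8_t *)(0xdfe8))
#define METADATA_SIZE 24

#define OFFSET_TEXT_ADDR (METADATA + 0)
#define OFFSET_TEXT_SIZE (METADATA + 2)
#define OFFSET_SHA1HASH (METADATA + 4)
#define SHA1HASH_SIZE (20)

// Helpers implemented in the complete project (signatures assumed here)
void hello_blink(void);
void error_blink(void);
bool is_app_valid(const uint8_t *digest);

int main(void) {
    uint8_t digest[SHA1HASH_SIZE];
    SHA1Context ctx;

    DDRA = (1U << LED);
    PORTA = (0U << LED);

    // @1Hz blink to indicate the bootloader has started
    hello_blink();

    // Read the .text address and size stored in the header
    SHA1Reset(&ctx);
    uint16_t text_addr = pgm_read_word(OFFSET_TEXT_ADDR);
    uint16_t text_size = pgm_read_word(OFFSET_TEXT_SIZE);

    // Hash the application one flash byte at a time
    for(size_t i = 0; i < text_size; i++) {
        uint8_t curr_byte = pgm_read_byte(text_addr + i);
        SHA1Input(&ctx, &curr_byte, 1);
    }
    SHA1Result(&ctx, digest);

    if(is_app_valid(digest)) {
        // Valid SHA1 check: start the application
        __asm__("jmp 0000");
    }

    // Invalid SHA1 check: blink @2Hz
    error_blink();
}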
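The bootloader trusts whatever it finds in the header. Below is a host-side C sketch of the same check over a copy of the flash contents, with hypothetical names: it decodes the little-endian .text address and size (as pgm_read_word does on AVR) and compares a computed digest with the stored one, which is presumably what is_app_valid does. The range check is an extra of my own, not something the bootloader above performs. It assumes .text sits below the header, as in this layout, and rejects headers that are erased (all 0xFF) or would run into the header itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header layout, matching the bootloader's offsets. */
#define METADATA_SIZE 24
#define OFF_TEXT_ADDR 0
#define OFF_TEXT_SIZE 2
#define OFF_SHA1HASH  4
#define SHA1HASH_SIZE 20

/* Little-endian 16-bit read, like pgm_read_word on AVR. */
static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decodes the .text address and size from the header at `meta_addr` inside a
 * flash image of `flash_len` bytes. Returns false if the header does not fit
 * in the image or the .text range would overlap the header (e.g. erased
 * flash, where both fields read 0xFFFF). */
bool header_text_range(const uint8_t *flash, size_t flash_len, size_t meta_addr,
                       uint16_t *text_addr, uint16_t *text_size) {
    if (meta_addr + METADATA_SIZE > flash_len) return false;
    *text_addr = read_le16(flash + meta_addr + OFF_TEXT_ADDR);
    *text_size = read_le16(flash + meta_addr + OFF_TEXT_SIZE);
    return (size_t)*text_addr + *text_size <= meta_addr;
}

/* Compares a freshly computed digest with the one stored in the header. */
bool app_digest_matches(const uint8_t *meta, const uint8_t digest[SHA1HASH_SIZE]) {
    return memcmp(meta + OFF_SHA1HASH, digest, SHA1HASH_SIZE) == 0;
}
```

Splitting the range decoding from the digest comparison keeps the sketch independent of the hash implementation: any SHA-1 routine can be run over flash + text_addr between the two calls.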


The only things the bootloader knows beforehand are the size and position of the header in program memory, plus the offsets of the individual fields stored within it. These fields are embedded in the .metadata section of the hex and elf files by the following harlock script:

// Helper/builder that creates a header in one go
var make_header = fun(text_addr, text_size, cks) {
	var word_size = 2
	var t_base_arr  = try as_array(text_addr, word_size, "little")
	var t_base_size = try as_array(text_size, word_size, "little")
	ret t_base_arr + t_base_size + cks
}

// Get the project name
var project = try args[1]

// Default header section = '.metadata'; if the user passed a third argument, use that instead
var text = ".text"
var section = if len(args) == 3 { args[2] } else { ".metadata" }


var hex_file = "build/" + project + ".hex"
var elf_file = "build/" + project + ".elf"


// Open the .elf file and extract section info from there
var e = try open(elf_file, "elf")

var text_addr = try e.section_address(text)
var text_size = try e.section_size(text)

var meta_addr = try e.section_address(section)
var meta_size = try e.section_size(section)


// Compute and write the application header
var h = try open(hex_file, "hex")

var text_hex = try h.read_at(text_addr, text_size)
var cks_hex  = try hash(text_hex, "sha1")

var header = try make_header(text_addr, text_size, cks_hex)
try h.write_at(meta_addr, header)
try save(h)

// Do the same for the elf binary
var text_elf = try e.read_section(text)
var cks_elf  = try hash(text_elf, "sha1")

var header2 = try make_header(text_addr, text_size, cks_elf)
try e.write_section(section, header2, 0)
try save(e)

// Print some info for the user
print(text, "          -- addr: ", hex(text_addr), "  size: ", hex(text_size))
print(section, "      -- addr: ", hex(meta_addr), "size: ", hex(meta_size))

print("Digest (.hex): ", hex(cks_hex), "Length: ", len(cks_hex))
print("Digest (.elf): ", hex(cks_elf), "Length: ", len(cks_elf))
print("Application section: @" + hex(meta_addr) + "\n")
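The header built by make_header is just two little-endian 16-bit words followed by the 20-byte digest, which is exactly what the bootloader reads at offsets 0, 2 and 4. The same packing in C, as a sketch with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHA1HASH_SIZE 20
#define HEADER_SIZE   (2 + 2 + SHA1HASH_SIZE) /* 24 bytes */

/* Packs the header like the script's make_header: .text address and size as
 * little-endian 16-bit words (offsets 0 and 2), then the digest (offset 4). */
void make_header(uint8_t out[HEADER_SIZE], uint16_t text_addr,
                 uint16_t text_size, const uint8_t digest[SHA1HASH_SIZE]) {
    out[0] = (uint8_t)(text_addr & 0xFF);
    out[1] = (uint8_t)(text_addr >> 8);
    out[2] = (uint8_t)(text_size & 0xFF);
    out[3] = (uint8_t)(text_size >> 8);
    memcpy(out + 4, digest, SHA1HASH_SIZE);
}
```

With the .text values from the earlier hash example (address 0x0000, size 0xae), the header starts with the bytes 00 00 ae 00.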

Running make, we can check out the build output:

Compiler output for the example in this article, including the output of the harlock script execution.

A quick hexdump reveals how the data is actually embedded within the specified memory location (dfe8-dfff):

The results


That’s all for now! I hope you enjoyed the article; feel free to leave a comment below if you have any questions or want me to elaborate on any detail.
