Profiling osbuild-composer with Go pprof

One of the projects that I work on is osbuild-composer . This is an operating system image builder, written in Go, and is the replacement for the lorax-composer project. I’m going to skim over the details since this post is about profiling things, but you can read more about it at and the [GitHub repository].

Behind the scenes it uses DNF to handle depsolving and package queries. A few weeks back I wrote a patch to cache the results from the dnf-json program in order to speed up the response time when listing all the packages (over 63000 of them in Fedora 35!) and when searching for details of package lists. Cache entries expire after 60s so that there is minimal chance of the data being out of sync with the DNF repository metadata.

One of the questions that came up was ‘how much memory does this cache use?’ and whether it needed some way, other than the 60s expiration time I implemented, to remove old entries. So I decided to gather some data and see how much RAM was actually being used.

Go has support for profiling via the runtime/pprof module. You can profile your program using go test or via calls to pprof helper functions. This is fine for a short running cmdline program, but for a service like osbuild-composer you need to run it for a while, make multiple queries, and examine the results. Go has a http module that exposes the pprof data on a socket. Documentation is in the net/http/pprof module.

Adding support to osbuild-composer was simple . Import the module and launch the server on port 6060. After rebuilding osbuild-composer with make scratch, I installed the new rpms in the test VM and accessed the pprof data from the cmdline like this:

pprof http://localhost:6060/debug/pprof/heap

If you want to use a browser instead of the interactive cmdline mode you can run this:

pprof -http=:8080 http://localhost:6060/debug/pprof/heap

And a server will be launched on port 8080. You can now use your browser to view the details of the heap data. Note that this is a snapshot in time that is made when you run pprof. To refresh it you exit (Ctrl-C) and run it again. Now we can explore the memory usage of various parts of the running service. I have a script that uses echo 3 > /proc/sys/vm/drop_caches to drop the kernel caches in order to try and get more consistent results inbetween calls to the API. The script looks like this:

function run_curl() {
        echo 3 > /proc/sys/vm/drop_caches
        echo "time for $1"
        time curl --silent --unix-socket /var/run/weldr/api.socket http://localhost/$1 > /tmp/debug.log

run_curl 'api/v0/modules/list/**'
run_curl 'api/v0/modules/list/'
run_curl 'api/v0/projects/list/'
run_curl 'api/v0/projects/info/0ad,0ad-data,0xFFFF,2048-cli,2048-cli-nocurses,2048-cli-sdl,2ping,389-ds-base,389-ds-base-devel,389-ds-base-libs,389-ds-base-snmp,3Depict,3dprinter-udev-rules,3mux,3proxy,4Pane,4diac-forte,4th,4th-devel,4ti2,4ti2-devel,4ti2-libs,5minute,64tass,7kaa,7kaa-data,90-Second-Portraits,9pfs,9wm,AGReader,AcetoneISO,Add64,Agda,Agda-common,Agda-stdlib,Agda-stdlib-docs,AllegroOGG,AllegroOGG-devel,ArpON,AtomicParsley,AusweisApp2,AusweisApp2-data,AusweisApp2-doc,BackupPC,BackupPC-XS,BareBonesBrowserLaunch,BareBonesBrowserLaunch-javadoc,BibTool,BitchX,BlockOutII'
run_curl 'api/v0/modules/list/golang-*'
run_curl 'api/v0/modules/list/*goat*'

After running this I launched a pprof http server and use the `top’ view to examine memory allocation:


The allocation data is related to the functions that allocate the data - there is no way that I know of to look at a specific memory structure. The important functions in this case are SearchMetadata and FetchMetadata. They add dnf-json results to the resultCache which is part of BaseSolver . We can see that it is using 86.66MB, and you can then make more requests and rerun pprof to see how large this number grows.

If you have graphviz installed pprof can also generate the call graph, select the ‘Graph’ option below the ‘VIEW’ dropdown in the browser and you will see something similar to this:

SearchMetadata Call Graph

I also wrote a script to query the info on all available packages in parallel. But it ends up that dnf-json limits how many requests can be processed. The conclusion that I came to was that dnf-json is locking access to the metadata, essentially serializing parallel runs of dnf-json so the cache in osbuild-composer cannot grow large due to the slow response time of dnf-json combined with the 60s expiration time. So there is no need to add extra complexity to try and limit memory use of the cache.

In addition to using pprof I added a /debug route to the API to extract real-time estimate of the usage . This is based on a technique described in a Stack Overflow post that uses the gob module to encode the map as a byte stream. It has some overhead, but gives a general idea of how much space is being used.

For example, after running the above script the output looks like this:

curl --unix-socket /var/run/weldr/api.socket http://localhost/api/debug

Which is fairly close to what I am seeing with pprof.

runtime/pprof can do much more than what I have described here, see the Go documentation for more details. And here are a couple of other blog posts on profiling Go: one , two .