C++ debug/print custom type with GDB : the case of nlohmann json library


I found my own answer by reading further about GDB's capabilities and the Stack Overflow questions about printing std::string. The short path is the easiest. The other path was hard, but I'm glad I managed it. There is lots of room for improvement.

Short path v3.1.2

I simply defined a gdb command as follows:

# this is a gdb script
# can be loaded from gdb using
# source my_script.txt (or .gdb or whatever you like)
define pjson
# use nlohmann's builtin dump method, indent 4 and use space separator
printf "%s\n", $arg0.dump(4, ' ', true).c_str()
end
# configure command helper (text displayed when typing 'help pjson' in gdb)
document pjson
Prints a nlohmann JSON C++ variable as a human-readable JSON string
end

Using it in gdb:

(gdb) source my_custom_script.gdb
(gdb) pjson foo
{
    "flex" : 0.2,
    "awesome_str": "bleh",
    "nested": {
        "bar": "barz"
    }
}

Short path v3.7.0

[EDIT] 2019-nov-06: One may also use the new to_string() method, but I could not get it to work within GDB with a live inferior process. The method below still works.

# this is a gdb script
# can be loaded from gdb using
# source my_script.txt (or .gdb or whatever you like)
define pjson
# use nlohmann's builtin dump method, indent 4 and use space separator
printf "%s\n", $arg0.dump(4, ' ', true, json::error_handler_t::strict).c_str()
end
# configure command helper (text displayed when typing 'help pjson' in gdb)
document pjson
Prints a nlohmann JSON C++ variable as a human-readable JSON string
end

April 18th 2020: WORKING FULL PYTHON GDB (with live inferior process and debug symbols)

Edit 2020-april-26: the offsets used in the code below were found by trial and error and are NOT valid for all platforms or JSON library builds. The GitHub project is much more mature in this respect (3 platforms tested so far). The code is left here as is, since I won't maintain two codebases.
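To give an idea of where such offsets come from, here is a minimal sketch that mirrors the layout of a libstdc++ std::map header with ctypes. The class names, the field list and the 1-byte slot for the empty comparator are my assumptions about the libstdc++ layout, not something read from the debugged binary, so treat the numbers as an illustration only.

```python
import ctypes


class RbTreeNodeBase(ctypes.Structure):
    """Hypothetical mirror of libstdc++'s std::_Rb_tree_node_base."""
    _fields_ = [
        ("_M_color", ctypes.c_int),       # enum _Rb_tree_color, assumed int-sized
        ("_M_parent", ctypes.c_void_p),
        ("_M_left", ctypes.c_void_p),
        ("_M_right", ctypes.c_void_p),
    ]


class RbTreeImpl(ctypes.Structure):
    """Hypothetical mirror of the _M_impl member of the tree inside std::map."""
    _fields_ = [
        ("_M_key_compare", ctypes.c_char),   # empty std::less<>, assumed to take 1 byte
        ("_M_header", RbTreeNodeBase),
        ("_M_node_count", ctypes.c_size_t),
    ]


def map_offsets():
    """Return byte offsets computed for the platform running this script."""
    header = RbTreeImpl._M_header.offset
    return {
        "header": header,
        "header._M_left": header + RbTreeNodeBase._M_left.offset,
        "node_count": RbTreeImpl._M_node_count.offset,
        "node_base_size": ctypes.sizeof(RbTreeNodeBase),
    }


if __name__ == "__main__":
    for name, value in map_offsets().items():
        print("{:<16} {}".format(name, value))
```

On a typical 64-bit build this should give 8 + 16 for the leftmost-node pointer, 8 + 16 + 16 for the node count and 32 for the node base, which are the byte offsets hard-coded in the script below. Inside GDB it is probably safer to read such sizes from the debug info (for example through the sizeof attribute of a gdb.Type) than to hard-code them.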

versions:

  • https://github.com/nlohmann/json version 3.7.3
  • GNU gdb (GDB) 8.3 for GNAT Community 2019 [rev=gdb-8.3-ref-194-g3fc1095]
  • c++ project built with GPRBUILD/ GNAT Community 2019 (20190517) (x86_64-pc-mingw32)

The following python code shall be loaded within gdb. I use a .gdbinit file sourced in gdb.

Github repo: https://github.com/LoneWanderer-GH/nlohmann-json-gdb

GDB script

Feel free to adopt the loading method of your choice (auto, or not, or IDE plugin, whatever)

set print pretty
# if you like the good work done with those STL containers GDB parsers:
# source stl_parser.gdb
# the python file is given below
source printer.py
python gdb.printing.register_pretty_printer(gdb.current_objfile(), build_pretty_printer())

Python script

import gdb
import gdb.printing
import platform
import sys
import traceback

# adapted from https://github.com/hugsy/gef/blob/dev/gef.py
# their rights are theirs
HORIZONTAL_LINE = "_"  # u"\u2500"
LEFT_ARROW = "<-"  # "\u2190 "
RIGHT_ARROW = "->"  # " \u2192 "
DOWN_ARROW = "|"  # "\u21b3"

nlohmann_json_type_namespace = \
    r"nlohmann::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, " \
    r"std::allocator<char> >, bool, long long, unsigned long long, double, std::allocator, nlohmann::adl_serializer>"

# STD black magic (byte offsets)
MAGIC_STD_VECTOR_OFFSET = 16  # win 10 x64 values, beware on your platform
MAGIC_OFFSET_STD_MAP = 32  # win 10 x64 values, beware on your platform

# GDB black magic
nlohmann_json_type = gdb.lookup_type(nlohmann_json_type_namespace).pointer()
# for in-memory direct jumps. A cast to a type is still necessary to obtain values,
# but this could be changed by changing the types to simpler ones ?
std_rb_tree_node_type = gdb.lookup_type("std::_Rb_tree_node_base::_Base_ptr").pointer()
std_rb_tree_size_type = gdb.lookup_type("std::size_t").pointer()

# nlohmann_json reminder. any interface change should be reflected here
# enum class value_t : std::uint8_t
# {
#     null,             ///< null value
#     object,           ///< object (unordered set of name/value pairs)
#     array,            ///< array (ordered collection of values)
#     string,           ///< string value
#     boolean,          ///< boolean value
#     number_integer,   ///< number value (signed integer)
#     number_unsigned,  ///< number value (unsigned integer)
#     number_float,     ///< number value (floating-point)
#     discarded         ///< discarded by the the parser callback function
# };

enum_literals_namespace = ["nlohmann::detail::value_t::null",
                           "nlohmann::detail::value_t::object",
                           "nlohmann::detail::value_t::array",
                           "nlohmann::detail::value_t::string",
                           "nlohmann::detail::value_t::boolean",
                           "nlohmann::detail::value_t::number_integer",
                           "nlohmann::detail::value_t::number_unsigned",
                           "nlohmann::detail::value_t::number_float",
                           "nlohmann::detail::value_t::discarded"]

enum_literal_namespace_to_literal = dict([(e, e.split("::")[-1]) for e in enum_literals_namespace])

INDENT = 4  # beautiful isn't it ?


def std_stl_item_to_int_address(node):
    return int(str(node), 0)


def parse_std_str_from_hexa_address(hexa_str):
    # https://stackoverflow.com/questions/6776961/how-to-inspect-stdstring-in-gdb-with-no-source-code
    return '"{}"'.format(gdb.parse_and_eval("*(char**){}".format(hexa_str)).string())


class LohmannJSONPrinter(object):
    """Print a nlohmann::json in GDB python

    BEWARE :
     - Contains shitty string formatting (defining lists and playing with ",".join(...) could be better;
       indent management is stone-age style)
     - Parsing barely tested, and only with a live inferior process.
     - It could possibly work with a core dump + debug symbols. TODO: read that stuff
       https://doc.ecoscentric.com/gnutools/doc/gdb/Core-File-Generation.html
     - No idea what happens with no symbols available; lots of fields are retrieved by name
       and should be changed to offsets if possible
     - NO LIB VERSION MANAGEMENT. TODO: determine if there are serious variants in nlohmann data structures
       that would justify working with structures
     - PLATFORM DEPENDENT. TODO: remove the black magic offsets or handle them in a nicer way

    NB: If you are python-kaizer-style-guru, please consider helping or teaching how to improve all that mess
    """

    def __init__(self, val, indent_level=0):
        self.val = val
        self.field_type_full_namespace = None
        self.field_type_short = None
        self.indent_level = indent_level

        self.function_map = {"nlohmann::detail::value_t::null": self.parse_as_leaf,
                             "nlohmann::detail::value_t::object": self.parse_as_object,
                             "nlohmann::detail::value_t::array": self.parse_as_array,
                             "nlohmann::detail::value_t::string": self.parse_as_str,
                             "nlohmann::detail::value_t::boolean": self.parse_as_leaf,
                             "nlohmann::detail::value_t::number_integer": self.parse_as_leaf,
                             "nlohmann::detail::value_t::number_unsigned": self.parse_as_leaf,
                             "nlohmann::detail::value_t::number_float": self.parse_as_leaf,
                             "nlohmann::detail::value_t::discarded": self.parse_as_leaf}

    def parse_as_object(self):
        assert (self.field_type_short == "object")

        o = self.val["m_value"][self.field_type_short]

        # traversing the tree is an adapted copy pasta from the STL gdb parser
        # (http://www.yolinux.com/TUTORIALS/src/dbinit_stl_views-1.03.txt and similar links)
        #   Simple GDB Macros written by Dan Marinescu (H-PhD) - License GPL
        #   Inspired by initial work of Tom Malnar,
        #     Tony Novac (PhD) / Cornell / Stanford,
        #     Gilad Mishne (PhD) and Many Many Others.
        #   Contact: dan_c_marinescu@yahoo.com (Subject: STL)
        #
        #   Modified to work with g++ 4.3 by Anders Elton
        #   Also added _member functions, that instead of printing the entire class in map, prints a member.

        node = o["_M_t"]["_M_impl"]["_M_header"]["_M_left"]
        # end = o["_M_t"]["_M_impl"]["_M_header"]
        tree_size = o["_M_t"]["_M_impl"]["_M_node_count"]

        # in-memory alternatives (these overwrite the by-name lookups above):
        _M_t = std_stl_item_to_int_address(o.referenced_value().address)
        _M_t_M_impl_M_header_M_left = _M_t + 8 + 16  # adding bytes
        _M_t_M_impl_M_node_count = _M_t + 8 + 16 + 16  # adding bytes

        # int() instead of long(): works with both Python 2 and Python 3 builds of GDB
        node = gdb.Value(int(_M_t_M_impl_M_header_M_left)).cast(std_rb_tree_node_type).referenced_value()
        tree_size = gdb.Value(int(_M_t_M_impl_M_node_count)).cast(std_rb_tree_size_type).referenced_value()

        i = 0

        if tree_size == 0:
            return "{}"
        else:
            s = "{\n"
            self.indent_level += 1
            while i < tree_size:
                # STL GDB scripts write "+1" which in my w10 x64 GDB makes a +32 bytes move ...
                # may be platform dependent and should be taken with caution
                key_address = std_stl_item_to_int_address(node) + MAGIC_OFFSET_STD_MAP

                # print(key_object['_M_dataplus']['_M_p'])

                k_str = parse_std_str_from_hexa_address(hex(key_address))

                # offset = MAGIC_OFFSET_STD_MAP
                value_address = key_address + MAGIC_OFFSET_STD_MAP
                value_object = gdb.Value(int(value_address)).cast(nlohmann_json_type)

                v_str = LohmannJSONPrinter(value_object, self.indent_level + 1).to_string()

                k_v_str = "{} : {}".format(k_str, v_str)
                # NB: i never equals tree_size inside this loop, so a trailing comma
                # is emitted after the last item as soon as there are 2 items or more
                end_of_line = "\n" if tree_size <= 1 or i == tree_size else ",\n"

                s = s + (" " * (self.indent_level * INDENT)) + k_v_str + end_of_line  # ",\n"

                # move to the in-order successor of the current node
                if std_stl_item_to_int_address(node["_M_right"]) != 0:
                    node = node["_M_right"]
                    while std_stl_item_to_int_address(node["_M_left"]) != 0:
                        node = node["_M_left"]
                else:
                    tmp_node = node["_M_parent"]
                    while std_stl_item_to_int_address(node) == std_stl_item_to_int_address(tmp_node["_M_right"]):
                        node = tmp_node
                        tmp_node = tmp_node["_M_parent"]

                    if std_stl_item_to_int_address(node["_M_right"]) != std_stl_item_to_int_address(tmp_node):
                        node = tmp_node
                i += 1
            self.indent_level -= 2
            s = s + (" " * (self.indent_level * INDENT)) + "}"
            return s

    def parse_as_str(self):
        return parse_std_str_from_hexa_address(str(self.val["m_value"][self.field_type_short]))

    def parse_as_leaf(self):
        s = "WTFBBQ !"
        if self.field_type_short == "null" or self.field_type_short == "discarded":
            s = self.field_type_short
        elif self.field_type_short == "string":
            s = self.parse_as_str()
        else:
            s = str(self.val["m_value"][self.field_type_short])
        return s

    def parse_as_array(self):
        assert (self.field_type_short == "array")
        o = self.val["m_value"][self.field_type_short]
        start = o["_M_impl"]["_M_start"]
        size = o["_M_impl"]["_M_finish"] - start
        # capacity = o["_M_impl"]["_M_end_of_storage"] - start
        # size_max = size - 1
        i = 0
        start_address = std_stl_item_to_int_address(start)
        if size == 0:
            s = "[]"
        else:
            self.indent_level += 1
            s = "[\n"
            while i < size:
                # STL GDB scripts write "+1" which in my w10 x64 GDB makes a +16 bytes move ...
                offset = i * MAGIC_STD_VECTOR_OFFSET
                i_address = start_address + offset
                value_object = gdb.Value(int(i_address)).cast(nlohmann_json_type)
                v_str = LohmannJSONPrinter(value_object, self.indent_level + 1).to_string()
                # NB: same remark as in parse_as_object, i never equals size here
                end_of_line = "\n" if size <= 1 or i == size else ",\n"
                s = s + (" " * (self.indent_level * INDENT)) + v_str + end_of_line
                i += 1
            self.indent_level -= 2
            s = s + (" " * (self.indent_level * INDENT)) + "]"
        return s

    def is_leaf(self):
        return self.field_type_short != "object" and self.field_type_short != "array"

    def parse_as_aggregate(self):
        if self.field_type_short == "object":
            s = self.parse_as_object()
        elif self.field_type_short == "array":
            s = self.parse_as_array()
        else:
            s = "WTFBBQ !"
        return s

    def parse(self):
        # s = "WTFBBQ !"
        if self.is_leaf():
            s = self.parse_as_leaf()
        else:
            s = self.parse_as_aggregate()
        return s

    def to_string(self):
        try:
            self.field_type_full_namespace = self.val["m_type"]
            str_val = str(self.field_type_full_namespace)
            if str_val not in enum_literal_namespace_to_literal:
                return "TIMMY !"
            self.field_type_short = enum_literal_namespace_to_literal[str_val]
            return self.function_map[str_val]()
            # return self.parse()
        except Exception:
            show_last_exception()
            return "NOT A JSON OBJECT // CORRUPTED ?"

    def display_hint(self):
        return self.val.type


# adapted from https://github.com/hugsy/gef/blob/dev/gef.py
# inspired by https://stackoverflow.com/questions/44733195/gdb-python-api-getting-the-python-api-of-gdb-to-print-the-offending-line-numbe
def show_last_exception():
    """Display the last Python exception."""
    print("")
    exc_type, exc_value, exc_traceback = sys.exc_info()

    print(" Exception raised ".center(80, HORIZONTAL_LINE))
    print("{}: {}".format(exc_type.__name__, exc_value))
    print(" Detailed stacktrace ".center(80, HORIZONTAL_LINE))
    for (filename, lineno, method, code) in traceback.extract_tb(exc_traceback)[::-1]:
        print("""{} File "{}", line {:d}, in {}()""".format(DOWN_ARROW, filename, lineno, method))
        print("   {}    {}".format(RIGHT_ARROW, code))
    print(" Last 10 GDB commands ".center(80, HORIZONTAL_LINE))
    gdb.execute("show commands")
    print(" Runtime environment ".center(80, HORIZONTAL_LINE))
    print("* GDB: {}".format(gdb.VERSION))
    print("* Python: {:d}.{:d}.{:d} - {:s}".format(sys.version_info.major, sys.version_info.minor,
                                                   sys.version_info.micro, sys.version_info.releaselevel))
    # platform.dist() was removed in Python 3.8, so the distribution name is no longer printed
    print("* OS: {:s} - {:s} ({:s})".format(platform.system(), platform.release(),
                                            platform.architecture()[0]))
    print(HORIZONTAL_LINE * 80)
    print("")
    # no exit() here: the caller returns a placeholder string instead of killing the session


def build_pretty_printer():
    pp = gdb.printing.RegexpCollectionPrettyPrinter("nlohmann_json")
    pp.add_printer(nlohmann_json_type_namespace, "^{}$".format(nlohmann_json_type_namespace), LohmannJSONPrinter)
    return pp


######
# executed at autoload (or to be executed by hand in GDB)
# gdb.printing.register_pretty_printer(gdb.current_objfile(), build_pretty_printer())
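The least obvious part of the script is the walk over the std::map nodes in parse_as_object. Here is a small pure-Python sketch of the same in-order walk, runnable without GDB. The Node class, build_tree and in_order_keys are hypothetical helpers written for this illustration; only increment mirrors the logic of the printer, and the tree built here is a plain unbalanced binary search tree rather than a real red-black tree.

```python
class Node(object):
    """Hypothetical stand-in for std::_Rb_tree_node_base plus a key."""

    def __init__(self, key=None):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None


def increment(node):
    """Return the in-order successor, same walk as in parse_as_object."""
    if node.right is not None:
        # successor is the leftmost node of the right subtree
        node = node.right
        while node.left is not None:
            node = node.left
    else:
        # climb until we arrive from a left child
        parent = node.parent
        while node is parent.right:
            node = parent
            parent = parent.parent
        if node.right is not parent:
            node = parent
    return node


def build_tree(keys):
    """Build an unbalanced search tree with a libstdc++-style header node."""
    header = Node()  # header.parent = root, header.left = leftmost, header.right = rightmost
    count = 0
    for key in keys:
        new = Node(key)
        if header.parent is None:
            header.parent = new
            new.parent = header
        else:
            cur = header.parent
            while True:
                if key < cur.key:
                    if cur.left is None:
                        cur.left = new
                        break
                    cur = cur.left
                else:
                    if cur.right is None:
                        cur.right = new
                        break
                    cur = cur.right
            new.parent = cur
        count += 1
    if header.parent is not None:
        node = header.parent
        while node.left is not None:
            node = node.left
        header.left = node
        node = header.parent
        while node.right is not None:
            node = node.right
        header.right = node
    return header, count


def in_order_keys(header, count):
    """Visit exactly `count` nodes starting from the leftmost one."""
    keys = []
    node = header.left
    for _ in range(count):
        keys.append(node.key)
        node = increment(node)
    return keys


if __name__ == "__main__":
    header, count = build_tree(["Jean", "Baptiste", "Emmanuel", "Zorg"])
    print(in_order_keys(header, count))
```

As in the printer, the loop is driven by the node count and starts from the leftmost node stored in the header, so it never has to compare against an end marker.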

Some (light) tests:

gpr file:

project Debug_Printer is
   for Source_Dirs use ("src", "include");
   for Object_Dir use "obj";
   for Main use ("main.cpp");
   for Languages use ("C++");

   package Naming is
      for Spec_Suffix ("c++") use ".hpp";
   end Naming;

   package Compiler is
      for Switches ("c++") use ("-O3", "-Wall", "-Woverloaded-virtual", "-g");
   end Compiler;

   package Linker is
      for Switches ("c++") use ("-g");
   end Linker;
end Debug_Printer;

main.cpp

#include <json.hpp>  // i am using the standalone json.hpp from the repo release
#include <iostream>

using json = nlohmann::json;

int main() {
  json fooz;
  fooz = 0.7;

  json arr = {3, "25", 0.5};

  json one;
  one["first"] = "second";

  json foo;
  foo["flex"] = 0.2;
  foo["bool"] = true;
  foo["int"] = 5;
  foo["float"] = 5.22;
  foo["trap "] = "you fell";
  foo["awesome_str"] = "bleh";
  foo["nested"] = {{"bar", "barz"}};
  foo["array"] = { 1, 0, 2 };

  std::cout << "fooz" << std::endl;
  std::cout << fooz.dump(4) << std::endl << std::endl;

  std::cout << "arr" << std::endl;
  std::cout << arr.dump(4) << std::endl << std::endl;

  std::cout << "one" << std::endl;
  std::cout << one.dump(4) << std::endl << std::endl;

  std::cout << "foo" << std::endl;
  std::cout << foo.dump(4) << std::endl << std::endl;

  json mixed_nested;
  mixed_nested["Jean"] = fooz;
  mixed_nested["Baptiste"] = one;
  mixed_nested["Emmanuel"] = arr;
  mixed_nested["Zorg"] = foo;

  std::cout << "5th element" << std::endl;
  std::cout << mixed_nested.dump(4) << std::endl << std::endl;

  return 0;
}

outputs:

(gdb) source .gdbinit

Breakpoint 1, main () at F:\DEV\Projets\nlohmann.json\src\main.cpp:45
(gdb) p mixed_nested
$1 = {
    "Baptiste" : {
            "first" : "second"
    },
    "Emmanuel" : [
            3,
            "25",
            0.5,
    ],
    "Jean" : 0.69999999999999996,
    "Zorg" : {
            "array" : [
                    1,
                    0,
                    2,
            ],
            "awesome_str" : "bleh",
            "bool" : true,
            "flex" : 0.20000000000000001,
            "float" : 5.2199999999999998,
            "int" : 5,
            "nested" : {
                    "bar" : "barz"
            },
            "trap " : "you fell",
    },
}
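The trailing commas and the uneven indentation in the output above come from the hand-made string concatenation in the printer. A possible cleanup, following the ",".join(...) idea mentioned in the docstring, is sketched below on plain Python values so that it can be tried outside GDB. The render function is a hypothetical helper, not part of the printer or of the GitHub project, and it does not escape special characters in strings.

```python
INDENT = 4


def render(value, level=0):
    """Format nested dicts/lists in the printer's '"key" : value' style."""
    pad = " " * (level * INDENT)
    child_pad = " " * ((level + 1) * INDENT)

    if isinstance(value, dict):
        if not value:
            return "{}"
        # std::map keeps its keys sorted, so sort here as well
        items = ['{}"{}" : {}'.format(child_pad, key, render(val, level + 1))
                 for key, val in sorted(value.items())]
        # join puts the separator between items only: no trailing comma
        return "{\n" + ",\n".join(items) + "\n" + pad + "}"

    if isinstance(value, (list, tuple)):
        if not value:
            return "[]"
        items = [child_pad + render(val, level + 1) for val in value]
        return "[\n" + ",\n".join(items) + "\n" + pad + "]"

    if isinstance(value, str):
        return '"{}"'.format(value)
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


if __name__ == "__main__":
    print(render({"nested": {"bar": "barz"}, "array": [1, 0, 2]}))
```

In the real printer the recursive calls would go through LohmannJSONPrinter(...).to_string() instead of render, with the indentation level passed down once rather than incremented and decremented around the loop.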

Edit 2019-march-24 : add the clarification given by Employed Russian.

Edit 2020-april-18 : after a long night of struggling with python/gdb/stl, I got something working by following the GDB documentation for Python pretty printers. Please forgive any mistakes or misconceptions; I banged my head on this for a whole night and everything is a blur now.

Edit 2020-april-18 (2): rb tree node and tree_size could be traversed in a more "in-memory" way (see above)

Edit 2020-april-26: add warning concerning the GDB python pretty printer.


My solution was to edit the ~/.gdbinit file.

define jsontostring
    printf "%s\n", $arg0.dump(2, ' ', true, nlohmann::detail::error_handler_t::strict).c_str()
end

This makes the "jsontostring" command available on every gdb session without the need of sourcing any files.

(gdb) jsontostring object