# My Journey with LLVM (GSoC’20 Phase 1)

## 2020/06/30

It has been one month since my proposal was accepted by GSoC’20. I have learned a lot and had a wonderful time, and we’ve also made some progress towards our goal. Hence, it’s a good time to review what I’ve done and what I’ve learned in the first coding period.

### The Project

In LLVM, we use yaml2obj to handcraft simple binaries of various formats, e.g., ELF, Mach-O, and COFF, from YAML descriptions. My project is to add DWARF support to yaml2obj, which should make it easier for people to handcraft debug sections in these kinds of binaries. This project is supervised by James Henderson.

### The Progress

We’ve already ported the existing DWARF implementation to yaml2elf as planned, so people can now handcraft DWARF sections at a low level. I have to admit that the current implementation is hard to use, since we have to specify nearly every field of those sections, e.g., the length, the version, and the address or offset of the associated DWARF section. That’s because the sections are isolated from each other in the current implementation, and DWARFYAML lacks a strategy for interlinking them properly. This is what we are going to address next, and I believe usability will improve as a result. We also keep a spreadsheet that records our progress against the expected timeline.
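To illustrate why a field like Length could be derived instead of handwritten, here is a minimal Python sketch (not the yaml2obj implementation) that serializes one DWARFv5 .debug_addr contribution and fills in unit_length from the size of the content that follows it. It assumes little-endian output, and the helper name `encode_debug_addr` is hypothetical.

```python
def _uint(value: int, size: int) -> bytes:
    """Encode an unsigned integer in `size` bytes, little-endian."""
    return value.to_bytes(size, "little")


def encode_debug_addr(entries, version=5, address_size=8,
                      segment_selector_size=0, dwarf64=False):
    """Hypothetical helper: build a .debug_addr contribution with a computed unit_length.

    `entries` is a list of (segment, address) pairs; the segment is only
    emitted when segment_selector_size is non-zero.
    """
    # Everything after unit_length: version (2 bytes), address_size (1),
    # segment_selector_size (1), then the segment/address pairs.
    body = (_uint(version, 2) + _uint(address_size, 1)
            + _uint(segment_selector_size, 1))
    for segment, address in entries:
        if segment_selector_size:
            body += _uint(segment, segment_selector_size)
        body += _uint(address, address_size)
    # unit_length counts the bytes that follow it, not the field itself.
    if dwarf64:
        # DWARF64: the 0xffffffff escape is followed by an 8-byte length.
        length_field = _uint(0xFFFFFFFF, 4) + _uint(len(body), 8)
    else:
        length_field = _uint(len(body), 4)
    return length_field + body


contribution = encode_debug_addr([(0, 0x1234)])
print(contribution.hex())  # 0c000000050008003412000000000000
```

With a single 8-byte address, the computed unit_length is 12: four header bytes after the length field plus the address itself. A user who changes AddressSize or adds an entry would otherwise have to redo this arithmetic by hand.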

### The Implementation Status

The syntax of the supported DWARF sections and their known issues are listed below. I’m not going to resolve all of the issues, since some DWARF sections are deprecated in the DWARFv5 specification and rarely used.

Note: Fields wrapped in “[[ ]]” are optional.

```yaml
debug_abbrev:
  - [[Code: 1]]
    Tag: DW_TAG_compile_unit
    Children: DW_CHILDREN_yes
    Attributes:
      - Attribute: DW_AT_producer
        Form: DW_FORM_strp
```

* Doesn’t support emitting multiple abbrev tables (D83116).

```yaml
debug_addr:
  - [[Format: DWARF32/DWARF64]]
    [[Length: 0x1234]]
    Version: 5
    [[AddressSize: 8]]
    [[SegmentSelectorSize: 0]]
    Entries:
      - Address: 0x1234
        [[Segment: 0x1234]]
```

* yaml2macho doesn’t support emitting the .debug_addr section.
* dwarf2yaml doesn’t support parsing the .debug_addr section.

```yaml
debug_aranges:
  - [[Format: DWARF32/DWARF64]]
    Length: 0x1234
    CuOffset: 0x1234
    AddrSize: 0x08
    SegSize: 0x00
    Descriptors:
      - Address: 0x1234
        Length: 0x00
```

* The Length, AddrSize and SegSize fields should be optional.
* Rename CuOffset to DebugInfoOffset.
* Rename AddrSize to AddressSize.
* Rename SegSize to SegmentSelectorSize.

```yaml
debug_info:
  - [[Format: DWARF32/DWARF64]]
    Length: 0x1234
    Version: 5
    UnitType: DW_UT_compile
    AbbrOffset: 0x00
    AddrSize: 0x08
    Entries:
      - AbbrCode: 1
        Values:
          - Value: 0x1234
          - BlockData: [ 0x12, 0x34 ]
          - CStr: 'abcd'
```

* Rename AbbrOffset to DebugAbbrevOffset.
* Rename AddrSize to AddressSize.
* Rename AbbrCode to AbbrevCode or Code.

```yaml
debug_line:
  - [[Format: DWARF32/DWARF64]]
    Length: 0x1234
    Version: 4
    PrologueLength: 0x1234
    MinInstLength: 1
    DefaultIsStmt: 1
    LineBase: 251
    LineRange: 14
    OpcodeBase: 3
    StandardOpcodeLengths: [ 0, 1, 1 ]
    IncludeDirs:
      - a.dir
    Files:
      - Name: hello.c
        DirIndex: 0
        ModTime: 0
        Length: 0
    Opcodes:
      - Opcode: DW_LNS_extended_op
        ExtLen: 9
        SubOpcode: DW_LNE_set_address
        Data: 0x1234
```

* The DWARFv5 .debug_line section isn’t tested.

```yaml
debug_pubnames:  # or debug_pubtypes
  Length:
    TotalLength: 0xffffffff
    TotalLength64: 0x0c
  Version: 2
  UnitOffset: 0x1234
  UnitSize: 0x4321
  Entries:
    - DieOffset: 0x1234
      Name: abcd
```

* Doesn’t support emitting multiple pub tables.
* Replace Length with Format and Length.

```yaml
debug_ranges:
  - AddrSize: 0x04
    Entries:
      - LowOffset: 0x10
        HighOffset: 0x20
```

```yaml
debug_str:
  - abc
  - def
```
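The debug_str entry has the simplest mapping: each string becomes a null-terminated byte sequence, concatenated in order. The minimal Python sketch below (the helper name is hypothetical) builds those bytes and records each string’s offset, which is the value a DW_FORM_strp attribute, such as the DW_AT_producer in the debug_abbrev example, would have to carry in .debug_info. Keeping such offsets in sync by hand is one consequence of the sections being isolated.

```python
def encode_debug_str(strings):
    """Hypothetical helper: concatenate null-terminated strings and record offsets."""
    data = bytearray()
    offsets = {}
    for s in strings:
        # Record where this string starts; setdefault keeps the first offset
        # if a string is listed twice (its bytes are still emitted again).
        offsets.setdefault(s, len(data))
        data += s.encode("utf-8") + b"\0"
    return bytes(data), offsets


section, offsets = encode_debug_str(["abc", "def"])
print(section)  # b'abc\x00def\x00'
print(offsets)  # {'abc': 0, 'def': 4}
```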

### Accomplishments

I’m very happy that I roughly reached the goals of the first period. During the first coding period, I learned how debug information is represented at a low level in object files and how errors are handled in the LLVM libraries. I was also able to dig into some related core libraries, such as DebugInfo and CodeGen.

### Areas in Need of Improvements

However, there are still some areas where I didn’t do well. When porting DWARF support to yaml2elf, I found that some DWARF sections were not well-formed, e.g., the .debug_pub* sections don’t support emitting multiple pub tables, the .debug_abbrev section doesn’t support emitting multiple abbreviation tables, and the .debug_pub* and .debug_abbrev sections lack terminating entries. My approach was to port these sections to yaml2elf first and fix the issues afterwards, which was the wrong order. I should have fixed the issues first and then ported the sections to yaml2elf. That way, I wouldn’t have had to update test cases in many places, and ill-formed test cases wouldn’t have spread everywhere.
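To make the terminator problem concrete, here is a minimal Python sketch of the .debug_abbrev layout described in the DWARF specification (not LLVM’s code, and not the D83116 change). Each declaration is a ULEB128 code, a ULEB128 tag, and a children byte, followed by (attribute, form) pairs that end with a 0, 0 pair; each abbreviation table ends with a single 0 code. Several tables can sit back to back, and a unit selects one by its offset. The helper names are hypothetical, and only the few DW_* constants used here are hard-coded.

```python
# A few constant values from the DWARF specification.
DW_TAG_compile_unit = 0x11
DW_CHILDREN_no, DW_CHILDREN_yes = 0x00, 0x01
DW_AT_name, DW_AT_producer = 0x03, 0x25
DW_FORM_string, DW_FORM_strp = 0x08, 0x0E


def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_abbrev_table(abbrevs):
    """Hypothetical helper: abbrevs is a list of (code, tag, has_children, [(attr, form), ...])."""
    out = bytearray()
    for code, tag, has_children, attrs in abbrevs:
        out += uleb128(code) + uleb128(tag)
        out.append(DW_CHILDREN_yes if has_children else DW_CHILDREN_no)
        for attr, form in attrs:
            out += uleb128(attr) + uleb128(form)
        out += b"\x00\x00"  # 0, 0 pair ends this declaration's attribute list
    out += b"\x00"  # a null abbreviation code ends the table
    return bytes(out)


def encode_debug_abbrev(tables):
    """Concatenate several tables and return the bytes plus each table's offset."""
    data = bytearray()
    offsets = []
    for table in tables:
        offsets.append(len(data))
        data += encode_abbrev_table(table)
    return bytes(data), offsets


data, offsets = encode_debug_abbrev([
    [(1, DW_TAG_compile_unit, True, [(DW_AT_producer, DW_FORM_strp)])],
    [(1, DW_TAG_compile_unit, False, [(DW_AT_name, DW_FORM_string)])],
])
print(data.hex())  # 011101250e0000000111000308000000
print(offsets)     # [0, 8]
```

The second table begins at offset 8, which is the value the second unit’s AbbrOffset would have to hold. Without the trailing null byte, a consumer reading the first table would run straight into the second one.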

Besides, my life would have been easier if I had first made elf2yaml able to convert DWARF sections back to YAML. After porting some sections to yaml2elf, I realized that such a tool would spare me from handcrafting so many sections.
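Converting in the other direction mostly means reading the same fields back out. As a rough illustration (a Python sketch, not elf2yaml’s code; the function name is hypothetical and the dictionary keys simply mirror the debug_addr syntax listed earlier), this parses one little-endian .debug_addr contribution into a mapping shaped like its YAML description.

```python
def decode_debug_addr(data: bytes) -> dict:
    """Hypothetical helper: parse one little-endian .debug_addr contribution."""
    def read(offset: int, size: int) -> int:
        if offset + size > len(data):
            raise ValueError("truncated .debug_addr contribution")
        return int.from_bytes(data[offset:offset + size], "little")

    fmt = "DWARF32"
    length = read(0, 4)
    offset = 4
    if length == 0xFFFFFFFF:
        # DWARF64: the real length follows the escape value as 8 bytes.
        fmt = "DWARF64"
        length = read(4, 8)
        offset = 12
    end = offset + length  # unit_length excludes the length field itself
    if end > len(data):
        raise ValueError("unit_length runs past the end of the section")

    version = read(offset, 2)
    address_size = read(offset + 2, 1)
    seg_size = read(offset + 3, 1)
    offset += 4
    if address_size == 0:
        raise ValueError("address_size must be non-zero")

    entries = []
    # Each entry is an optional segment selector followed by an address.
    while offset + seg_size + address_size <= end:
        entry = {}
        if seg_size:
            entry["Segment"] = read(offset, seg_size)
            offset += seg_size
        entry["Address"] = read(offset, address_size)
        offset += address_size
        entries.append(entry)

    return {"Format": fmt, "Length": length, "Version": version,
            "AddressSize": address_size, "SegmentSelectorSize": seg_size,
            "Entries": entries}


raw = bytes.fromhex("0c000000050008003412000000000000")
print(decode_debug_addr(raw))
```

For this input, the result records a DWARF32 unit with Length 12, Version 5, AddressSize 8, and a single entry whose Address is 0x1234, which is the information a handwritten debug_addr description would otherwise have to spell out.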

### Acknowledgements

I would love to express my sincere gratitude to James Henderson for mentoring me during this project, and to everyone who reviewed my patches and gave many useful suggestions on my proposal!

### Accepted Patches

In case these patches are useful for evaluation.