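1. Data structures in a dex file

1.1 Data types

Type	Meaning
u1	1-byte unsigned integer
u2	2-byte unsigned integer
u4	4-byte unsigned integer
u8	8-byte unsigned integer
sleb128	Signed LEB128, variable length of 1 to 5 bytes
uleb128	Unsigned LEB128, variable length of 1 to 5 bytes
uleb128p1	Unsigned LEB128 plus 1 (the encoded number is the value + 1, so the decoded value is the encoded number minus 1), variable length of 1 to 5 bytes

The prefix u means unsigned and s means signed. LEB128 is a variable-length encoding: in a dex file each LEB128 value takes 1 to 5 bytes, which together represent one 32-bit value. Only the low 7 bits of each byte carry data; the most significant bit says whether another byte follows (1 means yes, 0 means no). Because a LEB128 value is at most 5 bytes long, a fifth byte whose most significant bit is 1 means the dex file is invalid and fails Dalvik VM verification.

The LEB128 implementation can be found in /dalvik/libdex/Leb128.h in the Android source tree. (Note that the data is stored little-endian: the least significant 7-bit group comes first.)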

/*
 * Reads an unsigned LEB128 value, updating the given pointer to point
 * just past the end of the read value. This function tolerates
 * non-zero high-order bits in the fifth encoded byte.
 */
DEX_INLINE int readUnsignedLeb128(const u1** pStream) {
    const u1* ptr = *pStream;
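    int result = *(ptr++); // take the lowest byte

    if (result > 0x7f) { // above 0x7f means the top bit is 1, so the next byte must be combined in
        int cur = *(ptr++);
        result = (result & 0x7f) | ((cur & 0x7f) << 7); // when reading uleb128, the top bit of each byte is not part of the value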
        if (cur > 0x7f) {
            cur = *(ptr++);
            result |= (cur & 0x7f) << 14;
            if (cur > 0x7f) {
                cur = *(ptr++);
                result |= (cur & 0x7f) << 21;
                if (cur > 0x7f) {
                    /*
                     * Note: We don't check to see if cur is out of
                     * range here, meaning we tolerate garbage in the
                     * high four-order bits.
                     */
                    cur = *(ptr++);
                    result |= cur << 28;
                }
            }
        }
    }

    *pStream = ptr;
    return result;
}

/*
 * Reads a signed LEB128 value, updating the given pointer to point
 * just past the end of the read value. This function tolerates
 * non-zero high-order bits in the fifth encoded byte.
 */
DEX_INLINE int readSignedLeb128(const u1** pStream) {
    const u1* ptr = *pStream;
    int result = *(ptr++);

    if (result <= 0x7f) { // no following byte to combine
        result = (result << 25) >> 25; // int is 32 bits but only 7 bits are payload, so << 25 then >> 25 sign-extends from the top payload bit
    } else {
        int cur = *(ptr++);
        result = (result & 0x7f) | ((cur & 0x7f) << 7);
        if (cur <= 0x7f) {
            result = (result << 18) >> 18;
        } else {
            cur = *(ptr++);
            result |= (cur & 0x7f) << 14;
            if (cur <= 0x7f) {
                result = (result << 11) >> 11;
            } else {
                cur = *(ptr++);
                result |= (cur & 0x7f) << 21;
                if (cur <= 0x7f) {
                    result = (result << 4) >> 4;
                } else {
                    /*
                     * Note: We don't check to see if cur is out of
                     * range here, meaning we tolerate garbage in the
                     * high four-order bits.
                     */
                    cur = *(ptr++);
                    result |= cur << 28;
                }
            }
        }
    }

    *pStream = ptr;
    return result;
}
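
To see the encoding on concrete bytes, here is a minimal C sketch of a loop-based decoder. It is written for illustration and is not taken from Leb128.h; the names uleb128_decode and uleb128p1_decode are hypothetical. Unlike readUnsignedLeb128 above, it reports failure when a fifth byte still has its top bit set.

```c
#include <stdint.h>
#include <stddef.h>

/* Decodes one unsigned LEB128 value from buf, reading at most 5 bytes.
 * Stores the number of bytes consumed in *len.
 * Returns 0 and sets *len to 0 if the fifth byte still has its top bit set. */
static uint32_t uleb128_decode(const uint8_t *buf, size_t *len) {
    uint32_t result = 0;
    for (size_t i = 0; i < 5; i++) {
        uint8_t byte = buf[i];
        /* the low 7 bits are payload; groups arrive least significant first */
        result |= (uint32_t)(byte & 0x7f) << (7 * i);
        if ((byte & 0x80) == 0) {
            *len = i + 1;
            return result;
        }
    }
    *len = 0;
    return 0;
}

/* uleb128p1 stores value + 1, so subtract 1 after decoding;
 * an encoded 0 therefore represents -1. */
static int32_t uleb128p1_decode(const uint8_t *buf, size_t *len) {
    return (int32_t)uleb128_decode(buf, len) - 1;
}
```

For example, the three bytes 0xE5 0x8E 0x26 decode to 0x65 | (0x0E << 7) | (0x26 << 14) = 624485, and a uleb128p1 byte of 0x00 decodes to -1.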

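1.2 encoded_value encoding

The encoded_value encoding is used in annotation_element and encoded_array_item. An encoded_value is an encoded piece of (nearly) arbitrary hierarchically structured data. The encoding is compact and easy to parse.

Name	Format	Description
(value_arg << 5) \| value_type	ubyte	The high 3 bits hold value_arg and the low 5 bits hold value_type; value_type determines the format of value.
value	ubyte[]	Bytes representing the value; the length and interpretation vary with `value_type`, but the byte order is always little-endian.

The value formats are described next.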

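1.2.1 Value formats

Type name	value_type	value_arg format	value format	Description
VALUE_BYTE	0x00	(none; must be `0`)	ubyte[1]	Signed one-byte integer value
VALUE_SHORT	0x02	size - 1 (0…1)	ubyte[size]	Signed two-byte integer value, sign-extended
VALUE_CHAR	0x03	size - 1 (0…1)	ubyte[size]	Unsigned two-byte integer value, zero-extended
VALUE_INT	0x04	size - 1 (0…3)	ubyte[size]	Signed four-byte integer value, sign-extended
VALUE_LONG	0x06	size - 1 (0…7)	ubyte[size]	Signed eight-byte integer value, sign-extended
VALUE_FLOAT	0x10	size - 1 (0…3)	ubyte[size]	Four-byte bit pattern, zero-extended to the right, interpreted as an IEEE754 32-bit floating point value
VALUE_DOUBLE	0x11	size - 1 (0…7)	ubyte[size]	Eight-byte bit pattern, zero-extended to the right, interpreted as an IEEE754 64-bit floating point value
VALUE_METHOD_TYPE	0x15	size - 1 (0…3)	ubyte[size]	Unsigned (zero-extended) four-byte integer value, interpreted as an index into the `proto_ids` section; represents a method type value
VALUE_METHOD_HANDLE	0x16	size - 1 (0…3)	ubyte[size]	Unsigned (zero-extended) four-byte integer value, interpreted as an index into the `method_handles` section; represents a method handle value
VALUE_STRING	0x17	size - 1 (0…3)	ubyte[size]	Unsigned (zero-extended) four-byte integer value, interpreted as an index into the `string_ids` section; represents a string value
VALUE_TYPE	0x18	size - 1 (0…3)	ubyte[size]	Unsigned (zero-extended) four-byte integer value, interpreted as an index into the `type_ids` section; represents a reflective type/class value
VALUE_FIELD	0x19	size - 1 (0…3)	ubyte[size]	Unsigned (zero-extended) four-byte integer value, interpreted as an index into the `field_ids` section; represents a reflective field value
VALUE_METHOD	0x1a	size - 1 (0…3)	ubyte[size]	Unsigned (zero-extended) four-byte integer value, interpreted as an index into the `method_ids` section; represents a reflective method value
VALUE_ENUM	0x1b	size - 1 (0…3)	ubyte[size]	Unsigned (zero-extended) four-byte integer value, interpreted as an index into the `field_ids` section; represents the value of an enumerated type constant
VALUE_ARRAY	0x1c	(none; must be `0`)	encoded_array	An array of values, in the format specified by "`encoded_array` format" below. The size of `value` is implicit in the encoding.
VALUE_ANNOTATION	0x1d	(none; must be `0`)	encoded_annotation	A sub-annotation, in the format specified by "`encoded_annotation` format" below. The size of `value` is implicit in the encoding.
VALUE_NULL	0x1e	(none; must be `0`)	(none)	`null` reference value
VALUE_BOOLEAN	0x1f	boolean (0…1)	(none)	One-bit value; `0` for `false` and `1` for `true`. The bit is represented in `value_arg`.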
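
The header byte and the size - 1 convention in the table can be sketched in C as follows. This is an illustrative helper set, not code from libdex; the function names are hypothetical, and only values of up to four bytes are handled.

```c
#include <stdint.h>

/* Splits the leading byte of an encoded_value into its two fields. */
static void encoded_value_header(uint8_t header, uint8_t *value_type,
                                 uint8_t *value_arg) {
    *value_type = header & 0x1f; /* low 5 bits */
    *value_arg = header >> 5;    /* high 3 bits */
}

/* Reads `size` little-endian bytes (1..4) and zero-extends them, as for
 * VALUE_CHAR and the index types such as VALUE_STRING or VALUE_TYPE. */
static uint32_t read_zero_extended(const uint8_t *p, int size) {
    uint32_t v = 0;
    for (int i = 0; i < size; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Reads `size` little-endian bytes (1..4) and sign-extends them, as for
 * VALUE_SHORT and VALUE_INT. */
static int32_t read_sign_extended(const uint8_t *p, int size) {
    uint32_t v = read_zero_extended(p, size);
    uint32_t sign = 1u << (8 * size - 1); /* sign bit of the stored value */
    /* flipping the sign bit and subtracting it propagates it upward */
    return (int32_t)((v ^ sign) - sign);
}
```

A header byte of 0x38, for instance, splits into value_type 0x18 (VALUE_TYPE) and value_arg 1, so size is 2 and the two bytes 0x1D 0x01 that follow give the index 0x11D.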

These formats are analyzed together with concrete examples further below, where they are used.

2. Dex file structure

2.1 Overall structure of a dex file

The overall structure is shown in the figure below:

The dex file struct is defined in /dalvik/libdex/DexFile.h in the Android source tree:

struct DexFile {
    /* directly-mapped "opt" header */
    const DexOptHeader* pOptHeader;

    /* pointers to directly-mapped structs and arrays in base DEX */
    const DexHeader*    pHeader;
    const DexStringId*  pStringIds;
    const DexTypeId*    pTypeIds;
    const DexFieldId*   pFieldIds;
    const DexMethodId*  pMethodIds;
    const DexProtoId*   pProtoIds;
    const DexClassDef*  pClassDefs;
    const DexLink*      pLinkData;

    /*
     * These are mapped out of the "auxillary" section, and may not be
     * included in the file.
     */
    const DexClassLookup* pClassLookup;
    const void*         pRegisterMapPool;       // RegisterMapClassPool

    /* points to start of DEX file data */
    const u1*           baseAddr;

    /* track memory overhead for auxillary structures */
    int                 overhead;

    /* additional app-specific data structures associated with the DEX */
    //void*               auxData;
};

Each of these parts is analyzed in detail below.

2.2 dex_header (DexHeader)

dex_header corresponds to the DexHeader struct, defined as follows:

struct DexHeader {
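    u1  magic[8];                   // dex identifier + version number
    u4  checksum;                   // 32-bit checksum
    u1  signature[kSHA1DigestLen];  // SHA-1 hash
    u4  fileSize;                   // size of the dex file
    u4  headerSize;                 // size of the dex header: 0x5c for version 009, 0x70 for version 035
    u4  endianTag;                  // constant identifying the byte order

    u4  linkSize;                   // size of the link section
    u4  linkOff;                    // offset of the link section

    u4  mapOff;                     // offset of the dex map list

    u4  stringIdsSize;              // number of dex string ids
    u4  stringIdsOff;               // offset of the dex string ids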
    
    u4  typeIdsSize;
    u4  typeIdsOff;
    
    u4  protoIdsSize;
    u4  protoIdsOff;
    
    u4  fieldIdsSize;
    u4  fieldIdsOff;
    
    u4  methodIdsSize;
    u4  methodIdsOff;
    
    u4  classDefsSize;
    u4  classDefsOff;
    
    u4  dataSize;
    u4  dataOff;
};

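The fields are explained below:

Field	Offset	Length	Description
magic	0x0	8	Magic value of the form "dex\n035\0", where 035 is the version number of the dex format
checksum	0x8	4	Adler-32 checksum of the rest of the file (everything but `magic` and this field), used to detect whether the dex file is corrupted or has been tampered with
signature	0xC	20	SHA-1 signature (hash) of the rest of the file (everything but `magic`, `checksum`, and this field); used to uniquely identify the file
fileSize	0x20	4	Size of the entire dex file, in bytes
headerSize	0x24	4	Size of dex_header (the DexHeader struct)
endianTag	0x28	4	Byte-order tag (little- or big-endian). There are two values: little-endian (ENDIAN_CONSTANT = 0x12345678) and big-endian (REVERSE_ENDIAN_CONSTANT = 0x78563412).
linkSize	0x2C	4	Size of the link section
linkOff	0x30	4	File offset of the link section
mapOff	0x34	4	File offset of dex_map_list (the DexMapList struct)
stringIdsSize	0x38	4	Number of string indexes in the string_ids section
stringIdsOff	0x3C	4	File offset of the string_ids section (usually equal to headerSize)
typeIdsSize	0x40	4	Number of type indexes in the type_ids section
typeIdsOff	0x44	4	File offset of the type_ids section
protoIdsSize	0x48	4	Number of method prototype indexes in the proto_ids section
protoIdsOff	0x4C	4	File offset of the proto_ids section
fieldIdsSize	0x50	4	Number of field indexes in the field_ids section
fieldIdsOff	0x54	4	File offset of the field_ids section
methodIdsSize	0x58	4	Number of method indexes in the method_ids section
methodIdsOff	0x5C	4	File offset of the method_ids section
classDefsSize	0x60	4	Number of classes in the class_defs section
classDefsOff	0x64	4	File offset of the class_defs section
dataSize	0x68	4	Size of the data section; must be a multiple of 4 bytes
dataOff	0x6C	4	File offset of the data section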
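
The offsets in this table are enough for a first sanity check of a file. Below is a minimal C sketch, assuming a little-endian dex file of version 035 or later (where headerSize is 0x70) that has already been read into memory; read_u4 and dex_header_looks_valid are hypothetical names, and the checksum and signature are not verified here.

```c
#include <stdint.h>
#include <string.h>

/* Reads a little-endian u4 at the given file offset. */
static uint32_t read_u4(const uint8_t *file, uint32_t off) {
    return (uint32_t)file[off] | (uint32_t)file[off + 1] << 8 |
           (uint32_t)file[off + 2] << 16 | (uint32_t)file[off + 3] << 24;
}

/* Returns 1 if the buffer starts with a plausible dex header, else 0. */
static int dex_header_looks_valid(const uint8_t *file, uint32_t len) {
    if (len < 0x70) return 0;
    /* magic is "dex\n", three version digits, then a 0 byte */
    if (memcmp(file, "dex\n", 4) != 0 || file[7] != 0) return 0;
    if (read_u4(file, 0x28) != 0x12345678u) return 0; /* endianTag */
    if (read_u4(file, 0x24) != 0x70) return 0;        /* headerSize */
    if (read_u4(file, 0x20) != len) return 0;         /* fileSize */
    return 1;
}
```

A fuller check would also recompute the checksum and signature fields over the ranges described in the table.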

The figure below shows the dex_header of a concrete example:

2.3 string_ids (list of DexStringId)

Each entry in string_ids is a DexStringId struct, defined as follows:

/*
 * Direct-mapped "string_id_item".
 */
struct DexStringId {
    u4 stringDataOff;      /* file offset of the string data */
};

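DexStringId has a single 4-byte field, stringDataOff, which stores the file offset of the string data. Note that strings are encoded in MUTF-8 (Modified UTF-8), which differs from UTF-8 as follows:

Only the one-, two-, and three-byte encodings are used.
Code points beyond 16 bits, U+10000 to U+10ffff, are encoded as a surrogate pair, each half of which is encoded as a three-byte value.
U+0000 is encoded in two-byte form.
A plain null byte (0x00), like the null character in C, marks the end of the string.

The string data begins with a uleb128 value giving the length of the string in UTF-16 code units, not in bytes. For a short ASCII string this prefix is a single byte equal to the byte count.

A concrete example makes this clearer. In the figure below, the stringDataOff of the fourth DexStringId entry is 0x615E7. The first byte at that offset is 0xC, so the string is 12 UTF-16 code units long (which is 12 bytes when every character is ASCII). The string bytes follow immediately and end with the 0x00 terminator.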
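
The lookup just described can be sketched in C as follows. This is an illustration under the assumption that the whole little-endian dex file is already in memory; dex_string and read_u4 are hypothetical names, and no bounds checking is done.

```c
#include <stdint.h>

/* Reads a little-endian u4 at the given file offset. */
static uint32_t read_u4(const uint8_t *file, uint32_t off) {
    return (uint32_t)file[off] | (uint32_t)file[off + 1] << 8 |
           (uint32_t)file[off + 2] << 16 | (uint32_t)file[off + 3] << 24;
}

/* Returns a pointer to the MUTF-8 bytes of string_ids[idx] and stores the
 * declared length, in UTF-16 code units, in *utf16_len. */
static const char *dex_string(const uint8_t *file, uint32_t string_ids_off,
                              uint32_t idx, uint32_t *utf16_len) {
    /* each string_id_item is a single u4, stringDataOff */
    const uint8_t *p = file + read_u4(file, string_ids_off + 4 * idx);
    /* decode the uleb128 length prefix (at most 5 bytes) */
    uint32_t len = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = *p++;
        len |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);
    *utf16_len = len;
    return (const char *)p; /* null-terminated MUTF-8 data */
}
```

Passing the stringIdsOff value from the header as string_ids_off and an index below stringIdsSize returns the raw MUTF-8 bytes; converting them to another encoding is a separate step.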

2.4 type_ids (list of DexTypeId)

Each entry in type_ids is a DexTypeId struct, defined as follows:

/*
 * Direct-mapped "type_id_item".
 */
struct DexTypeId {
    u4  descriptorIdx;      /* index into the DexStringId list */
};

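DexTypeId has a single 4-byte field, descriptorIdx, an index into the DexStringId list. The string it refers to is the type descriptor (the same notation smali uses for types). For example:

If the first entry in type_ids has the value 0x4C7 (1223 in decimal), the string to look up is string_ids[1223].

The string lookup itself was covered in 2.3 string_ids and is not repeated here.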

2.5 proto_ids (list of DexProtoId)

Each entry in proto_ids is a DexProtoId struct, defined as follows:

/*
 * Direct-mapped "proto_id_item".
 */
struct DexProtoId {
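    u4  shortyIdx;          /* short-form descriptor of the prototype, index into the DexStringId list */
    u4  returnTypeIdx;      /* return type of the method, index into the DexTypeId list */
    u4  parametersOff;      /* parameter type list, file offset of a type_list (DexTypeList struct) */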
};

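DexProtoId is the struct for a method prototype (method signature). It has three fields: shortyIdx leads to the short-form descriptor string (made up of the return type followed by the parameter types), returnTypeIdx leads to the return type string, and parametersOff points to a DexTypeList struct holding the list of parameter types (it is 0 if the method has no parameters).

2.5.1 DexTypeList

The struct is defined as follows: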

/*
 * Direct-mapped "type_list".
 */
struct DexTypeList {
    u4  size;               /* number of DexTypeItem entries */
    DexTypeItem list[1];    /* the first DexTypeItem itself, not an offset */
};

/*
 * Direct-mapped "type_item".
 */
struct DexTypeItem {
    u2  typeIdx;            /* index into the DexTypeId list */
};

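Now for a concrete walk through proto_ids, using proto_id[7] in the figure below. shortyIdx is 0x5f2, which points to string_ids[1522], and the resulting string is "CL". returnTypeIdx is 0x1, which points to type_ids[1]; that entry holds 0x5b6, which points to string_ids[1462], and the resulting string is "C". parametersOff is 0xE1A08 and points to a DexTypeList struct whose size field is 1, so there is a single parameter. The DexTypeItem that follows has a typeIdx of 0x7B9, which points to type_ids[1977]; that entry holds 0x14B3, which points to string_ids[5299], and the resulting string is "Ljava/lang/String;".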
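
The relationship between the full type descriptors and the shorty string in this example can be sketched in C. This is an illustrative helper that assumes, as the dex format documentation describes, that every reference type (class or array) is written as L in the shorty; shorty_char and build_shorty are hypothetical names, not libdex functions.

```c
/* Maps a full type descriptor to its one-character shorty form:
 * primitives and void keep their letter, classes and arrays become 'L'. */
static char shorty_char(const char *descriptor) {
    return (descriptor[0] == 'L' || descriptor[0] == '[') ? 'L' : descriptor[0];
}

/* Builds the shorty string: return type first, then each parameter.
 * `out` must have room for n + 2 characters. */
static void build_shorty(const char *ret, const char *const *params, int n,
                         char *out) {
    out[0] = shorty_char(ret);
    for (int i = 0; i < n; i++)
        out[i + 1] = shorty_char(params[i]);
    out[n + 1] = '\0';
}
```

With the return type "C" and the single parameter "Ljava/lang/String;", build_shorty produces "CL", matching the string found through shortyIdx above.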

2.6 field_ids (list of DexFieldId)

Each entry in field_ids is a DexFieldId struct, defined as follows:

/*
 * Direct-mapped "field_id_item".
 */
struct DexFieldId {
    u2  classIdx;           /* type of the defining class, index into the DexTypeId list */
    u2  typeIdx;            /* type of the field, index into the DexTypeId list */
    u4  nameIdx;            /* name of the field, index into the DexStringId list */
};

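DexFieldId identifies the class a member variable belongs to, its type, and its name.

Here is a figure of the dex_field part to help; the lookup process is the same as before and is not repeated.

2.7 method_ids (list of DexMethodId)

Each entry in method_ids is a DexMethodId struct, defined as follows: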

/*
 * Direct-mapped "method_id_item".
 */
struct DexMethodId {
    u2  classIdx;           /* class the method belongs to, index into the DexTypeId list */
    u2  protoIdx;           /* prototype of the method, index into the DexProtoId list */
    u4  nameIdx;            /* name of the method, index into the DexStringId list */
};

DexMethodId identifies the class a method belongs to, its prototype (signature), and its name.

A figure of the method_ids part is included as well.
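
Since field_id_item and method_id_item share the same 8-byte layout (two u2 values followed by a u4), reading one entry is short. The C sketch below is only an illustration for a little-endian file held in memory; MethodId and read_method_id are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical in-memory copy of one method_id_item. */
typedef struct {
    uint16_t class_idx; /* index into type_ids */
    uint16_t proto_idx; /* index into proto_ids */
    uint32_t name_idx;  /* index into string_ids */
} MethodId;

/* Reads method_ids[idx]; each item occupies 8 bytes. */
static MethodId read_method_id(const uint8_t *file, uint32_t method_ids_off,
                               uint32_t idx) {
    const uint8_t *p = file + method_ids_off + 8 * idx;
    MethodId m;
    m.class_idx = (uint16_t)(p[0] | p[1] << 8);
    m.proto_idx = (uint16_t)(p[2] | p[3] << 8);
    m.name_idx = (uint32_t)p[4] | (uint32_t)p[5] << 8 |
                 (uint32_t)p[6] << 16 | (uint32_t)p[7] << 24;
    return m;
}
```

The three indexes are then resolved through type_ids, proto_ids, and string_ids exactly as in the earlier sections.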

2.8 class_def (list of DexClassDef)

/*
 * Direct-mapped "class_def_item".
 */
struct DexClassDef {
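    u4  classIdx;           /* type of the class (its fully qualified name), index into the DexTypeId list */
    u4  accessFlags;        /* access flags, a bitmask of ACC_ constants such as ACC_PUBLIC (0x1) and ACC_PRIVATE (0x2) */
    u4  superclassIdx;      /* superclass type, index into the DexTypeId list */
    u4  interfacesOff;      /* interfaces, file offset of a DexTypeList; 0 if the class declares or implements no interfaces */
    u4  sourceFileIdx;      /* name of the source file containing the class, index into the DexStringId list */
    u4  annotationsOff;     /* annotations, file offset of a DexAnnotationsDirectoryItem struct, which covers class, method, field, and parameter annotations; 0 if the class has no annotations */
    u4  classDataOff;       /* file offset of the DexClassData structure, the data portion of the class */
    u4  staticValuesOff;    /* file offset of a DexEncodedArray structure, which records the static data of the class */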
};

The structs that have not appeared yet are analyzed in turn.

2.8.1 DexAnnotationsDirectoryItem

The struct is defined as follows:

/*
 * Direct-mapped "annotations_directory_item".
 */
struct DexAnnotationsDirectoryItem {
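    u4  classAnnotationsOff;  /* class annotations, file offset of a DexAnnotationSetItem */
    u4  fieldsSize;           /* field annotations, number of DexFieldAnnotationsItem entries */
    u4  methodsSize;          /* method annotations, number of DexMethodAnnotationsItem entries */
    u4  parametersSize;       /* parameter annotations, number of DexParameterAnnotationsItem entries */
    /* if any of the last three counts is non-zero, the following data is appended in this order */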
    /* followed by DexFieldAnnotationsItem[fieldsSize] */
    /* followed by DexMethodAnnotationsItem[methodsSize] */
    /* followed by DexParameterAnnotationsItem[parametersSize] */
};

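classAnnotationsOff points to a DexAnnotationSetItem struct. The DexFieldAnnotationsItem, DexMethodAnnotationsItem, and DexParameterAnnotationsItem entries, if present, follow the parametersSize field in that order.

The structures that have not appeared yet are analyzed next.

2.8.1.1 DexAnnotationSetItem

The struct is defined as follows: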

/*
 * Direct-mapped "annotation_set_item".
 */
struct DexAnnotationSetItem {
    u4  size;                       /* number of DexAnnotationItem entries */
    u4  entries[1];                 /* file offset of the first DexAnnotationItem */
};

DexAnnotationItem is defined as follows:

/*
 * Direct-mapped "annotation_item".
 *
 * NOTE: this structure is byte-aligned.
 */
struct DexAnnotationItem {
    u1  visibility;                 /* intended visibility of this annotation */
    u1  annotation[1];              /* annotation content in encoded_annotation format */
};

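The first field, visibility, gives the visibility of the annotation. The possible values are:

Name	Value	Description
VISIBILITY_BUILD	0x00	Intended to be visible only at build time (for example, while compiling other code)
VISIBILITY_RUNTIME	0x01	Intended to be visible at runtime
VISIBILITY_SYSTEM	0x02	Intended to be visible at runtime, but only to the underlying system (and not to regular user code)

The second field, annotation, holds the annotation content in encoded_annotation format, which is laid out as follows:

Name	Format	Description
type_idx	uleb128	Type of the annotation, an index into the DexTypeId list
size	uleb128	Number of name-value mappings in this annotation
elements	annotation_element[size]	Elements of the annotation, represented directly inline (not as offsets). Elements must be sorted in ascending order by `string_id` index.

The annotation_element format is as follows:

Name	Format	Description
name_idx	uleb128	Element name, an index into the DexStringId list
value	encoded_value	Element value

Now for a concrete example. In the figure below, the red box marks the DexAnnotationSetItem struct (together with its offset). Following the struct definition, the first four bytes give the size, 0x3, and the next four bytes give the file offset of a DexAnnotationItem, 0xE922A. That leads to the DexAnnotationItem array, of which only annotation_item[0] is analyzed here. By its definition the first byte is the visibility, 0x2, which corresponds to VISIBILITY_SYSTEM, and the encoded_annotation data follows. In encoded_annotation the first field, type_idx, is uleb128-encoded and has the value 0x0775, which is then looked up by index as before. The second field, size, is also uleb128-encoded and has the value 0x01, so there is a single name-value pair. Next comes the elements field in annotation_element format: the uleb128-encoded name_idx is 0x5185, and the value that follows is an encoded_value. As described in section 1.2, the high 3 bits of its first byte are value_arg (0x1) and the low 5 bits are value_type (0x18). Looking up value_type in the table shows that the value is an unsigned (zero-extended) four-byte integer used as an index into the DexTypeId list. Because value_arg is size - 1, only two bytes actually follow; zero-extending them gives the value 0x11D, which is again looked up by index.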

2.8.1.2 DexFieldAnnotationsItem

/*
 * Direct-mapped "field_annotations_item".
 */
struct DexFieldAnnotationsItem {
    u4  fieldIdx;                   /* index into the DexFieldId list */
    u4  annotationsOff;             /* file offset of a DexAnnotationSetItem */
};

The DexAnnotationSetItem structure was covered in 2.8.1.1 and is not repeated here.

2.8.1.3 DexMethodAnnotationsItem

/*
 * Direct-mapped "method_annotations_item".
 */
struct DexMethodAnnotationsItem {
    u4  methodIdx;                  /* index into the DexMethodId list */
    u4  annotationsOff;             /* file offset of a DexAnnotationSetItem */
};

The DexAnnotationSetItem structure was covered in 2.8.1.1 and is not repeated here.

2.8.1.4 DexParameterAnnotationsItem

/*
 * Direct-mapped "parameter_annotations_item".
 */
struct DexParameterAnnotationsItem {
    u4  methodIdx;                  /* index into the DexMethodId list */
    u4  annotationsOff;             /* file offset of a DexAnnotationSetRefList */
};

DexAnnotationSetRefList is defined as follows:

/*
 * Direct-mapped "annotation_set_ref_list".
 */
struct DexAnnotationSetRefList {
    u4  size;                           /* number of elements in the list, that is, of DexAnnotationSetRefItem entries */
    DexAnnotationSetRefItem list[1];    /* the first DexAnnotationSetRefItem itself, not an offset */
};

/*
 * Direct-mapped "annotation_set_ref_item".
 */
struct DexAnnotationSetRefItem {
    u4  annotationsOff;             /* offset of a DexAnnotationSetItem */
};

In the end this leads back to the DexAnnotationSetItem struct.

2.8.2 DexClassData

The DexClassData struct is declared in DexClass.h and is defined as follows:

/* expanded form of class_data_item. Note: If a particular item is
 * absent (e.g., no static fields), then the corresponding pointer
 * is set to NULL. */
struct DexClassData {
    DexClassDataHeader header;          /* DexClassDataHeader struct giving the numbers of fields and methods */
    DexField*          staticFields;    /* static fields, DexField structs */
    DexField*          instanceFields;  /* instance fields, DexField structs */
    DexMethod*         directMethods;   /* direct methods, DexMethod structs */
    DexMethod*         virtualMethods;  /* virtual methods, DexMethod structs */
};

The DexClassDataHeader, DexField, and DexMethod structs are analyzed next.

2.8.2.1 DexClassDataHeader

The struct is defined as follows:

/* expanded form of a class_data_item header */
struct DexClassDataHeader {
    u4 staticFieldsSize;                /* number of static fields */
    u4 instanceFieldsSize;              /* number of instance fields */
    u4 directMethodsSize;               /* number of direct methods */
    u4 virtualMethodsSize;              /* number of virtual methods */
};

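These four counts describe the four arrays that follow in DexClassData, which makes them easy to locate and walk.

Note that in the file itself, every field shown as u4 in the DexClass.h structs is actually stored as a uleb128 value. The structs are the expanded, in-memory form.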
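
To make the uleb128 point concrete, here is a minimal C sketch that expands the four counts at the start of a class_data_item. It is an illustration, not the libdex code; ClassDataHeader, read_uleb128, and read_class_data_header are hypothetical names that mirror DexClassDataHeader.

```c
#include <stdint.h>

/* Hypothetical expanded form, mirroring DexClassDataHeader. */
typedef struct {
    uint32_t static_fields_size;
    uint32_t instance_fields_size;
    uint32_t direct_methods_size;
    uint32_t virtual_methods_size;
} ClassDataHeader;

/* Reads one uleb128 value (at most 5 bytes) and advances *pp past it. */
static uint32_t read_uleb128(const uint8_t **pp) {
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = *(*pp)++;
        result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);
    return result;
}

/* Expands the four uleb128 counts; *pp ends up at the first encoded field. */
static ClassDataHeader read_class_data_header(const uint8_t **pp) {
    ClassDataHeader h;
    h.static_fields_size = read_uleb128(pp);
    h.instance_fields_size = read_uleb128(pp);
    h.direct_methods_size = read_uleb128(pp);
    h.virtual_methods_size = read_uleb128(pp);
    return h;
}
```

The header therefore has no fixed size: the bytes 0x01 0x02 0x80 0x01 0x00 hold the counts 1, 2, 128, and 0 in five bytes, where four u4 fields would take sixteen.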

2.8.2.2 DexField

The struct is defined as follows:

/* expanded form of encoded_field */
struct DexField {
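    u4 fieldIdx;                /* index into the DexFieldId list */
    u4 accessFlags;             /* access flags */
};

The accessFlags field has the same type as the corresponding field in DexClassDef.

2.8.2.3 DexMethod

DexMethod describes a method's prototype, name, access flags, and code block. The struct is defined as follows: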

/* expanded form of encoded_method */
struct DexMethod {
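    u4 methodIdx;               /* index into the DexMethodId list */
    u4 accessFlags;             /* access flags */
    u4 codeOff;                 /* file offset of the DexCode structure */
};

codeOff points to a DexCode struct, which describes the method in more detail and holds its instructions. The struct is in DexFile.h and is defined as follows: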

/*
 * Direct-mapped "code_item".
 *
 * The "catches" table is used when throwing an exception,
 * "debugInfo" is used when displaying an exception stack trace or
 * debugging. An offset of zero indicates that there are no entries.
 */
struct DexCode {
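    u2  registersSize;      /* number of registers used */
    u2  insSize;            /* number of words of incoming arguments */
    u2  outsSize;           /* number of registers used for arguments when calling other methods */
    u2  triesSize;          /* number of try_item entries */
    u4  debugInfoOff;       /* file offset of the debug info */
    u4  insnsSize;          /* size of the instruction list, in 2-byte units */
    u2  insns[1];           /* the instructions; the format of the code in the insns array is specified by the companion document Dalvik bytecode */
    /* the following are present only if triesSize is non-zero */
    /* two bytes of padding (only when insnsSize is odd), so that the try_item array below is 4-byte aligned */
    /* followed by try_item[triesSize], an array indicating where in the code exceptions are caught and how to handle them */
    /* followed by uleb128 handlersSize */
    /* followed by catch_handler_item[handlersSize], bytes representing a list of lists of catch types and associated handler addresses */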
};
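
The padding rule in the comments can be checked with a little arithmetic. The C sketch below assumes that the code_item itself starts at a 4-byte-aligned file offset, as the dex format requires; tries_offset is a hypothetical helper name.

```c
#include <stdint.h>

/* Returns the offset, relative to the start of a code_item, of its first
 * try_item, or 0 if the method has no try blocks. */
static uint32_t tries_offset(uint16_t tries_size, uint32_t insns_size) {
    if (tries_size == 0) return 0;
    /* fixed fields: 4 * u2 + 2 * u4 = 16 bytes, then insns_size 2-byte units */
    uint32_t off = 16 + 2 * insns_size;
    /* round up to a 4-byte boundary; this adds 2 bytes exactly when
     * insns_size is odd */
    return (off + 3) & ~3u;
}
```

With three code units the instructions end at offset 22, so two padding bytes push the first try_item to offset 24; with four code units the instructions already end at 24 and no padding is needed.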

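There are finer-grained structures beyond this, but I will not go deeper for now and will pause here. (The nesting really is deep! 🥲)

2.8.3 DexEncodedArray

The struct is defined as follows: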

/*
 * Direct-mapped "encoded_array".
 *
 * NOTE: this structure is byte-aligned.
 */
struct DexEncodedArray {
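    u1  array[1];                   /* data in encoded_array format */
};

The encoded_array format is explained next.

2.8.3.1 encoded_array format

The format is defined as follows:

Name	Format	Description
size	uleb128	Number of elements in the array
values	encoded_value[size]	Data encoded as encoded_value items

The DexEncodedArray part is fairly simple, and the encoded_value encoding was already worked through in the example in section 2.8.1.1, so it is not repeated here.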

2.9 DexMapList

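Once the Dalvik VM has parsed a dex file, the file's contents are ultimately mapped into the DexMapList data structure, which lists the sections of the file. DexMapList is located through the mapOff field of DexHeader. The struct is defined in DexFile.h as follows: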

/*
 * Direct-mapped "map_list".
 */
struct DexMapList {
    u4  size;               /* number of DexMapItem entries */
    DexMapItem list[1];     /* array of DexMapItem */
};

The DexMapItem struct is as follows:

/*
 * Direct-mapped "map_item".
 */
struct DexMapItem {
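    u2 type;              /* one of the kDexType constants */
    u2 unused;            /* unused, for alignment */
    u4 size;              /* number of items of this type */
    u4 offset;            /* file offset of the data of this type */
};

The first field, type, is an enum constant, listed below. The concrete type is easy to tell from the name.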

/* map item type codes */
enum {
    kDexTypeHeaderItem               = 0x0000,
    kDexTypeStringIdItem             = 0x0001,
    kDexTypeTypeIdItem               = 0x0002,
    kDexTypeProtoIdItem              = 0x0003,
    kDexTypeFieldIdItem              = 0x0004,
    kDexTypeMethodIdItem             = 0x0005,
    kDexTypeClassDefItem             = 0x0006,
    kDexTypeCallSiteIdItem           = 0x0007,
    kDexTypeMethodHandleItem         = 0x0008,
    kDexTypeMapList                  = 0x1000,
    kDexTypeTypeList                 = 0x1001,
    kDexTypeAnnotationSetRefList     = 0x1002,
    kDexTypeAnnotationSetItem        = 0x1003,
    kDexTypeClassDataItem            = 0x2000,
    kDexTypeCodeItem                 = 0x2001,
    kDexTypeStringDataItem           = 0x2002,
    kDexTypeDebugInfoItem            = 0x2003,
    kDexTypeAnnotationItem           = 0x2004,
    kDexTypeEncodedArrayItem         = 0x2005,
    kDexTypeAnnotationsDirectoryItem = 0x2006,
};

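The size field gives the number of items of the given type; they are stored contiguously in the dex file. The offset field is the file offset at which the items of that type start.

Now for a concrete example. The large red box marks the DexMapList structure, which contains 11 DexMapItem entries. Take map_item[1]: its type is 0x0001, which corresponds to kDexTypeStringIdItem, that is, the string_id_item type. Its size is 0x52F4 (21236 in decimal), which matches the number of string_id_item entries (shown in the small red box in the figure; indexes start at 0). Finally, its offset is 0x70, the file offset of the string_ids section.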
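
Looking up one entry of the map list, as done by hand above, can be sketched in C. This is an illustration for a little-endian file held in memory, with no bounds checking; find_map_item and read_u4 are hypothetical names.

```c
#include <stdint.h>

/* Reads a little-endian u4 at the given file offset. */
static uint32_t read_u4(const uint8_t *file, uint32_t off) {
    return (uint32_t)file[off] | (uint32_t)file[off + 1] << 8 |
           (uint32_t)file[off + 2] << 16 | (uint32_t)file[off + 3] << 24;
}

/* Searches the map_list at map_off for an item of the given type.
 * On success stores its size and offset and returns 1; otherwise returns 0. */
static int find_map_item(const uint8_t *file, uint32_t map_off, uint16_t type,
                         uint32_t *size, uint32_t *offset) {
    uint32_t count = read_u4(file, map_off);
    for (uint32_t i = 0; i < count; i++) {
        /* items follow the u4 count; each is 12 bytes:
         * u2 type, u2 unused, u4 size, u4 offset */
        uint32_t item = map_off + 4 + 12 * i;
        uint16_t item_type = (uint16_t)(file[item] | file[item + 1] << 8);
        if (item_type == type) {
            *size = read_u4(file, item + 4);
            *offset = read_u4(file, item + 8);
            return 1;
        }
    }
    return 0;
}
```

Called with the mapOff value from the header and type 0x0001 (kDexTypeStringIdItem), it would return the size 0x52F4 and offset 0x70 for the file in the example above.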

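3. Summary

The dex file as a whole is divided into eight parts: dex_header, string_ids, type_ids, proto_ids, field_ids, method_ids, class_defs, and map_list. Although there are many parts, they are closely linked, and map_list records the size and file offset of each one. All strings, including names and string constants, are reached through string_ids, so the other parts have to look up indexes in string_ids to obtain the actual data.

The most complex part of the dex file is class_defs: it contains many structures and they are deeply nested. I need to dig further into it later and fill in the gaps.

The most worthwhile next step is to write dex file parsing code to consolidate this knowledge!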

Dex file format analysis

http://example.com/2023/11/10/Android安全/dex文件格式解析/
