JVM -- class file structure (1)

2020-12-07 17:49:42 osc_ s7fsyuo1

Have you ever been curious about the .class file that is produced when a .java file is compiled?

We all know that a .class file in Java is what the Java compiler produces when it compiles a Java class. I suspect that more than a few C programmers, after picking up Java, loosely assume that the .out file produced by compiling a C program and a .class file are roughly the same thing in every respect. I was confused in exactly this way at first, but as we learn more, we have to figure out what a .class file really is.

So today I'll share what a .class file in Java is, and what exactly is stored in it~

By the way, if you open a .class file in an integrated development environment such as IDEA, you'll find that it looks no different from the source code, because the IDE decompiles the .class file for you along the way.


Differences between .class files and .out files

To understand the difference between the two files, we first need to understand how each one is defined.

.class file

When the Java compiler compiles a Java class, it translates the original text file (.java) into binary bytecode and stores that bytecode in a .class file.

In other words, the fields, methods, and constant information of a Java class are each stored in the .class file.

The key points from this passage: a .class file is binary bytecode, and it is recognized, parsed, and executed by the JVM.

Being able to read bytecode directly is also a necessary tool and basic skill for analyzing the semantics of Java code at work.


.out file

C source code (a .c file) is compiled by the compiler into machine instructions, which, together with some descriptive information, are saved in a .out file (an executable file). An executable file can be loaded and run by the operating system, and the computer executes the machine instructions in the file.

The key points from this passage: a .out file is binary machine instructions, and it is loaded and run by the operating system.

At this point the difference between the two files is clear. First, although both files are binary, what they store is completely different: a .class file holds bytecode, while a .out file holds machine instructions. Second, they run on different platforms: a .out file runs on the operating system, while a .class file runs on the virtual machine.


The significance of .class files

With the section above understood, we know how the two files differ in essence, but in use they still feel the same: both can be executed, so what is the real difference between bytecode and machine instructions?

From the very first computer class, teachers kept telling us: “Computers only understand binary data; inside the computer, everything it runs is just a string of 010101010101……”. That string of 010101…… is machine instructions, which is why the operating system can load and run a .out file.

So what is bytecode? To answer this, think about Java's advantages. Do you remember the saying in the Java world, “Write once, run anywhere”? That's right: bytecode is the cornerstone of platform independence.

Java programs compiled on different platforms all produce the same bytecode, which is then loaded and run by the JVM. This unified program storage format is what makes Java cross-platform.

As an aside, another kind of neutrality of the virtual machine is getting more and more attention from developers: language independence. That is, the Java virtual machine can run more than just Java programs; other languages such as JRuby and Groovy can run on it as well.


The overall structure of a .class file

To know what exactly is stored in a .class file, we first need a full picture of its storage structure. Before that, of course, let's give the .class file a precise definition.

Every class or interface corresponds to exactly one .class file, as shown in the figure below (borrowed from someone else):

 Picture description here

A Class file is a binary stream whose basic unit is the 8-bit byte; the data items are packed tightly into the file with no separators. Multi-byte data items are stored high-order byte first, i.e. in big-endian order. If you are not familiar with big-endian order, look it up.

There are only two kinds of data in the Class file structure: unsigned numbers and tables. So it's not complicated.

Unsigned numbers are the basic data types; u1, u2, u4, and u8 denote 1, 2, 4, and 8 bytes respectively. Unsigned numbers can describe numbers, index references, quantity values, or UTF-8 encoded strings. If that sounds abstract, don't worry: by the end, these doubts will have resolved themselves.
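To make the big-endian layout of u1, u2, and u4 concrete, here is a minimal Java sketch. The class name BigEndian and its helper methods are made up for this illustration; they are not part of the JDK.

```java
// Hypothetical helper for illustration only; not a JDK class.
final class BigEndian {
    // u1: one unsigned byte (the mask undoes Java's sign extension).
    static int u1(byte[] b, int off) {
        return b[off] & 0xFF;
    }

    // u2: two bytes, high-order byte first.
    static int u2(byte[] b, int off) {
        return (u1(b, off) << 8) | u1(b, off + 1);
    }

    // u4: four bytes, high-order byte first; returned as long so the value stays unsigned.
    static long u4(byte[] b, int off) {
        return ((long) u2(b, off) << 16) | u2(b, off + 2);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x13};
        System.out.println(u2(bytes, 0)); // prints 19
    }
}
```

In real code, java.io.DataInputStream or java.nio.ByteBuffer would do the same job, since both read multi-byte values in big-endian order by default.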

A table is a composite data structure made up of several unsigned numbers or other tables as data items; table names conventionally end in “_info”. The whole Class file is essentially one table.

The Class file format is shown below:

 Picture description here

OK, that completes the basic introduction to the Class file. Next, let's see which aspects to start from in order to understand the Class file structure in depth.

Following the figure above, we need to focus on the important parts of the Class file: the magic number, the Class file version, the constant pool, the class index, parent class index, and interface index set, the field table set, the method table set, the attribute table set, and so on. I'll cover the attribute table set in the second article.


Magic number and Class file version

To study the structure of the Class file, we need a Class file to analyze. So let's take a simple piece of Java code and compile it. This code and the resulting Class file will be used throughout, so don't forget them~~

public class TestClass {
    private int m;

    public int inc() {
        return m+1;
    }
}

On Ubuntu 16.04, we use GHex, a hexadecimal editor, to view the compiled .class file, as shown below:

 Picture description here

The magic number is the first four bytes of every Class file. Its only purpose is to determine whether the file is a Class file that the virtual machine can accept. Its value is easy to remember and rather romantic: CAFEBABE (coffee baby?), which seems to be connected with Java's logo~~
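The magic number check can be sketched in a few lines of Java. The class MagicCheck and its method are hypothetical names used only for this example; the byte values are the ones discussed in the text.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only.
final class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;

    // Returns true if the data starts with the four magic bytes CA FE BA BE.
    static boolean hasClassMagic(byte[] data) {
        if (data.length < 4) {
            return false;
        }
        // ByteBuffer reads big-endian by default, matching the Class file byte order.
        return ByteBuffer.wrap(data).getInt() == MAGIC;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        System.out.println(hasClassMagic(header)); // prints true
    }
}
```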

The 4 bytes immediately after the magic number store the Class file version: bytes 5 and 6 are the minor version number, and bytes 7 and 8 are the major version number. These four bytes let us identify the JDK version the file was compiled for. A higher JDK version is backward compatible with Class files from lower versions, but not the other way around.

Generally we don't need to care much about the minor version number. The steps for converting the version number to a JDK version are roughly:

- Convert the major version number to decimal
- Subtract 45 from it (JDK version numbers start at 45), then add 1

For example, in the figure above my minor version number is 0x0000 and my major version number is 0x0034, which is 52 in decimal, so the JDK version is 52-45+1, that is, 8. My current JDK is JDK 1.8, so this checks out.
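The two conversion steps above can be written as a one-line function. This is only a sketch of the rule as stated in this article; ClassVersion and toJdkVersion are names invented for the example.

```java
// Hypothetical helper for illustration only.
final class ClassVersion {
    // Applies the rule from the text: major version - 45 + 1.
    static int toJdkVersion(int major) {
        return major - 45 + 1;
    }

    public static void main(String[] args) {
        int major = 0x34; // bytes 7 and 8 of the example Class file
        System.out.println(toJdkVersion(major)); // prints 8
    }
}
```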


Constant pool

If you ask which parts of the Class file are the most important, I'd say the constant pool (the item that the most other items are associated with, one of the data items that take up the most space in the Class file, and a table-type data item) and the attribute table. The attribute table will be covered next time.

The runtime constant pool in the method area of the Java virtual machine is the constant pool of the .class file after the class has been loaded into memory.


Constant pool capacity count

The entry of the constant pool is a u2 (unsigned, 2-byte) data item that represents the constant pool capacity count. This value exists because the number of constants in the pool is not fixed. In the figure above you can see that the constant pool capacity of this Class file is 0x0013.

It is worth noting that the constant pool capacity count starts from 1 rather than 0; in the Class file, only the constant pool's capacity count starts from 1. The point of this design is that a data item holding a constant pool index can use the value 0 to express “does not reference any constant pool item” when it needs to.

So the constant pool capacity of my Class file is 19 in decimal, which means there are 18 constants, with indexes ranging from 1 to 18.
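As a small sketch of this arithmetic, the code below derives the number of index slots from the count field and checks whether an index is usable. The class and method names are assumptions made for this example.

```java
// Hypothetical helper for illustration only.
final class ConstantPoolCount {
    // Number of index slots: the count field minus the reserved index 0.
    static int slots(int constantPoolCount) {
        return constantPoolCount - 1;
    }

    // Index 0 means "no constant pool item"; usable indexes are 1 .. count-1.
    static boolean isValidIndex(int constantPoolCount, int index) {
        return index >= 1 && index < constantPoolCount;
    }

    public static void main(String[] args) {
        int count = 0x0013; // value read from the example Class file
        System.out.println(slots(count)); // prints 18
    }
}
```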


Types of items stored in the constant pool

Having covered the constant pool capacity, we next need to analyze the contents of the constant pool. Before analyzing what is stored there, we need to introduce the types of items it stores.

The constant pool stores two major kinds of constants: literals and symbolic references.

Literals are close to the concept of constants at the Java language level, such as text strings and the values of constants declared final.

Symbolic references include the following three kinds of constants (fully qualified names and descriptors are discussed later):

  • Fully qualified names of classes and interfaces
  • Names and descriptors of fields
  • Names and descriptors of methods

The javap command

Some will say: wouldn't reading the bytecode in a Class file by hand be exhausting? Don't worry, this is an age of efficiency, no longer the age when programmers had to punch holes one by one to write programs.

We can use the javap command to have the computer print the constant table for us.

Let's look at the usage:

# TestClass is the Class file produced by compiling TestClass.java above
javap -verbose TestClass

Here is the output (information outside the constant pool is omitted):

Classfile /home/hg_yi/ In depth understanding of Java virtual machine / Class file structure /TestClass.class
  Last modified 2017-10-20; size 275 bytes
  MD5 checksum 4bb559d0c40918dfedd533c18bd75add
  Compiled from "TestClass.java"
public class TestClass
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #4.#15         // java/lang/Object."<init>":()V
   #2 = Fieldref           #3.#16         // TestClass.m:I
   #3 = Class              #17            // TestClass
   #4 = Class              #18            // java/lang/Object
   #5 = Utf8               m
   #6 = Utf8               I
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               inc
  #12 = Utf8               ()I
  #13 = Utf8               SourceFile
  #14 = Utf8               TestClass.java
  #15 = NameAndType        #7:#8          // "<init>":()V
  #16 = NameAndType        #5:#6          // m:I
  #17 = Utf8               TestClass
  #18 = Utf8               java/lang/Object

As mentioned earlier, many data items in the Class file reference constants in the constant pool, so the output above will be used again later. Don't forget it~~

Much of the information in the output above agrees with what we just worked out from the Class file:

1. Minor and major version numbers

minor version: 0
major version: 52

2. Index value range

#1 to #18

3. Constant pool item type analysis

#1 = Methodref          #4.#15         // java/lang/Object."<init>":()V

The line above says that constant #1 points to constants #4 and #15.

As for java/lang/Object."<init>":()V, it involves the fully qualified class names and descriptors mentioned earlier; we'll get to it shortly.
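To see how constant #1 reaches its final text by following indexes, here is a rough Java sketch that models just the entries involved. The maps and the class ConstantPoolResolve are a hypothetical in-memory model built for this example, not a real parser; the index values are copied from the javap output above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a few constant pool entries; not a real parser.
final class ConstantPoolResolve {
    static final Map<Integer, String> UTF8 = new HashMap<>();
    static final Map<Integer, Integer> CLASS = new HashMap<>();       // Class -> name index
    static final Map<Integer, int[]> NAME_AND_TYPE = new HashMap<>(); // -> {name, descriptor}
    static final Map<Integer, int[]> METHODREF = new HashMap<>();     // -> {class, nameAndType}

    static {
        // Entries copied from the javap output above.
        METHODREF.put(1, new int[] {4, 15});
        CLASS.put(4, 18);
        UTF8.put(7, "<init>");
        UTF8.put(8, "()V");
        NAME_AND_TYPE.put(15, new int[] {7, 8});
        UTF8.put(18, "java/lang/Object");
    }

    // Follows Methodref -> Class -> Utf8 and Methodref -> NameAndType -> Utf8.
    static String resolveMethodref(int index) {
        int[] ref = METHODREF.get(index);
        String owner = UTF8.get(CLASS.get(ref[0]));
        int[] nameAndType = NAME_AND_TYPE.get(ref[1]);
        return owner + "." + UTF8.get(nameAndType[0]) + ":" + UTF8.get(nameAndType[1]);
    }

    public static void main(String[] args) {
        System.out.println(resolveMethodref(1)); // prints java/lang/Object.<init>:()V
    }
}
```

javap additionally wraps <init> in double quotes in its comment column; the sketch leaves them out.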


Access flags

Right after the constant pool, the next two bytes are the access flags, which identify access information at the class or interface level, including: whether this Class is a class or an interface; whether it is declared public; whether it is declared abstract; if it is a class, whether it is declared final; and so on.

The specific flags and their meanings are as follows:

 Picture description here

The javap output above contains this line:

flags: ACC_PUBLIC, ACC_SUPER

Checking against the access flag table, the description of the class in flags is correct, so the value of access_flags should be 0x0001|0x0020=0x0021. If you go back to the raw bytecode at the top, you will find that this matches the two bytes that follow the constant pool.
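The flag arithmetic can be checked with a short sketch that decodes an access_flags value into flag names. The class ClassAccessFlags is a made-up name for this example; the flag values are the class-level ones defined by the Class file format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for illustration only.
final class ClassAccessFlags {
    static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };
    static final int[] VALUES = {
        0x0001, 0x0010, 0x0020, 0x0200,
        0x0400, 0x1000, 0x2000, 0x4000
    };

    // Returns the names of all flags whose bit is set in the given value.
    static List<String> decode(int accessFlags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < VALUES.length; i++) {
            if ((accessFlags & VALUES[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0001 | 0x0020)); // prints [ACC_PUBLIC, ACC_SUPER]
    }
}
```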


Class index, parent class index, and interface index set

After the access flags come the class index, the parent class index, and the interface index set. The class index and the parent class index are each a u2 data item, and the interface index set is a group of u2 data items. The Class file uses these three items to determine the inheritance relationships of the class.

  • The class index determines the fully qualified name of this class;
  • The parent class index determines the fully qualified name of this class's parent class;
  • The interface index set describes which interfaces the class implements; the implemented interfaces are arranged in the interface index set from left to right, in the order they appear after the implements keyword.

Tip: because Java is single-inheritance and every Java class except java.lang.Object itself has a parent class, the parent class index of every Java class other than Object is nonzero.

Let's continue with the example we started with:

 Picture description here

The bytes I marked in the figure above are 0x0003, 0x0004, and 0x0000. That is, the class index is constant #3 in the constant pool, and the parent class index is constant #4. The interface index set is a little different: its first item, a u2, is the interface counter, which gives the size of the index table. Here the interface counter is 0, so the interface index table that would follow takes up no bytes at all.

Comparing with the javap output from earlier:

  #3 = Class              #17            // TestClass
  #4 = Class              #18            // java/lang/Object

  #17 = Utf8               TestClass
  #18 = Utf8               java/lang/Object

You can see that constants #3 and #4 point to constants #17 and #18, whose values are the UTF-8 strings TestClass and java/lang/Object.
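The layout of these three items can be sketched as follows: two u2 indexes, a u2 counter, then as many u2 interface indexes as the counter says. ClassIndexes is a hypothetical class written for this example, and the input bytes are the six bytes discussed above.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for illustration only.
final class ClassIndexes {
    final int thisClass;
    final int superClass;
    final int[] interfaces;
    final int bytesRead;

    ClassIndexes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);  // big-endian by default
        thisClass = buf.getShort() & 0xFFFF;     // class index
        superClass = buf.getShort() & 0xFFFF;    // parent class index
        int count = buf.getShort() & 0xFFFF;     // interface counter
        interfaces = new int[count];
        for (int i = 0; i < count; i++) {
            interfaces[i] = buf.getShort() & 0xFFFF;
        }
        bytesRead = buf.position();              // 6 when there are no interfaces
    }

    public static void main(String[] args) {
        ClassIndexes c = new ClassIndexes(new byte[] {0x00, 0x03, 0x00, 0x04, 0x00, 0x00});
        System.out.println("#" + c.thisClass + " #" + c.superClass
                + " interfaces=" + c.interfaces.length); // prints #3 #4 interfaces=0
    }
}
```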

This part of the analysis is finished .


Field table set

Introduction to the field table

The field table describes the variables declared in an interface or class.

Fields include class-level variables (static) and instance-level variables, but not local variables.

What information does a field contain?

The field's scope (public, private, protected), whether it is static, final, volatile, or transient (whether it is serialized), the field's data type, and the field's name.

The field's data type and name have no fixed byte length, so they have to refer to constants in the constant pool; all the other modifiers are well suited to being represented as flag bits.

Therefore, the field table mainly stores the following information:

Name index, descriptor index, access flags, and attribute table

The trailing attributes_info holds additional information describing a field. For example, for final static int m = 123;, the field table will contain a ConstantValue attribute whose value points to the constant 123.

Next, I'll explain what “simple name”, “fully qualified name”, and “descriptor” mean.


Simple names, fully qualified names, and descriptors

Let's return to the other data in the field table structure: the name index and the descriptor index. Both are references into the constant pool, representing the simple name of the field and the descriptor of the field or method.

A simple name is a method or field name without any type or parameter decoration. In this class, the simple names of the inc() method and the m field are “inc” and “m”.

Descriptors are more complicated, and the question we left open above is about descriptors. A descriptor describes the data type of a field, or the parameter list (number, types, and order) and return value of a method.

Primitive types and void are each represented by a single uppercase character, while an object type is represented by the character L followed by the fully qualified name of the object.

 Picture description here

For array types, each dimension is described by a leading “[” character. For example, a two-dimensional array of type “java.lang.String[][]” is recorded as “[[Ljava/lang/String;”. This brings us to how fully qualified names are written.

A fully qualified name is, as the name suggests, the complete name, but it is written differently from what we usually write. For example, for a test class declared as org.fenixsoft.clazz.TestClass, the fully qualified name is “org/fenixsoft/clazz/TestClass;”: the “.” characters in the full class name are replaced with “/”, and a “;” is appended to mark the end of the fully qualified name.
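The “.” to “/” replacement is easy to try out in Java. In the sketch below, InternalNames and toInternalName are names invented for the example; the only JDK behavior relied on is that Class.getName() for an array class already uses the “[” and “L...;” notation, just with dots.

```java
// Hypothetical helper for illustration only.
final class InternalNames {
    // Converts a dotted class name to the "/" form used inside Class files.
    static String toInternalName(String className) {
        return className.replace('.', '/');
    }

    public static void main(String[] args) {
        // prints org/fenixsoft/clazz/TestClass
        System.out.println(toInternalName("org.fenixsoft.clazz.TestClass"));
        // String[][].class.getName() is "[[Ljava.lang.String;", so this
        // prints [[Ljava/lang/String;
        System.out.println(toInternalName(String[][].class.getName()));
    }
}
```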


Using descriptors

Parameter list first, then return value.

Not sure what that means? Let's go straight to examples:

void inc()
java.lang.String toString()
int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex)

Their descriptors:

()V
()Ljava/lang/String;
([CII[CIII)I
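If you want to check descriptors like these without writing them by hand, one option is java.lang.invoke.MethodType, whose toMethodDescriptorString() method returns the descriptor for a given return type and parameter list. The wrapper class DescriptorDemo below is a hypothetical name for this sketch.

```java
import java.lang.invoke.MethodType;

// Hypothetical wrapper for illustration only.
final class DescriptorDemo {
    // Builds a method descriptor from a return type and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // void inc()
        System.out.println(descriptor(void.class));   // prints ()V
        // java.lang.String toString()
        System.out.println(descriptor(String.class)); // prints ()Ljava/lang/String;
        // int indexOf(char[], int, int, char[], int, int, int)
        System.out.println(descriptor(int.class,
                char[].class, int.class, int.class,
                char[].class, int.class, int.class, int.class)); // prints ([CII[CIII)I
    }
}
```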

Example analysis

After the class index, parent class index, and interface index set comes the field table. Its first u2 data item is the capacity counter fields_count. From the figure above we can see that its value is 0x0001, i.e. there is only one field. The next u2 is the access_flags flag …… and so on, which completes the analysis of the fixed data items of the field table.

Note: the field table set does not list fields inherited from a parent class, but it may list fields that do not exist in the original Java code. For example, so that an inner class can access its outer class, the compiler automatically adds a field pointing to the outer class instance.

Also, at the bytecode level, two fields with the same name are legal as long as their descriptors differ. In the Java language this is obviously impossible.


Method table set

The method table is very similar to the field table. Here are the access flags for the method table:
 Picture description here

I won't repeat the analysis for the method table, since it goes basically the same way as the analysis above. Let's look at a few points in the method table that deserve attention:

  • The code inside a method is stored in the “Code” attribute, which we'll discuss in the next article.
  • The method table set also begins with a u2 capacity counter.
  • Methods added automatically by the compiler may also appear in the method table, such as the <clinit> method.
  • In a Class file, methods with different descriptors can coexist; that is, in a Class file, two methods that differ only in their return value still count as overloads.
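One way to observe two methods that differ only in return type inside a single Class file is a covariant return type: the sketch below assumes, as javac does in practice, that the compiler adds a bridge method to the subclass with the parent's return type. BridgeDemo, Base, and Derived are names invented for this example.

```java
import java.lang.reflect.Method;

// Hypothetical demo for illustration only.
final class BridgeDemo {
    static class Base {
        public Object get() { return "base"; }
    }

    static class Derived extends Base {
        @Override
        public String get() { return "derived"; } // covariant return type
    }

    // Counts the methods named "get" that the Derived Class file itself declares.
    static int countGet() {
        int n = 0;
        for (Method m : Derived.class.getDeclaredMethods()) {
            if (m.getName().equals("get")) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        for (Method m : Derived.class.getDeclaredMethods()) {
            // Shows each method's return type and whether it is a compiler-generated bridge.
            System.out.println(m.getReturnType().getName() + " " + m.getName()
                    + " bridge=" + m.isBridge());
        }
    }
}
```

With javac, Derived ends up with both String get() and a synthetic Object get(), which the Java language itself would not let you declare side by side.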

Copyright notice
This article was written by [osc_ s7fsyuo1]. Please include a link to the original when reposting. Thanks.
https://chowdera.com/2020/12/20201207174810370v.html