Learning experience of handwritten JSON parser

2020-12-08 08:51:38 Bacteriostasis

Oh, it starts with "{", so it looks like an object!

One. Introduction

A week ago, my old classmate Ali sent me a Zhihu answer. It said that a good test of whether you have mastered a language is to implement a JSON parser, and that one of NetEase Games' Python onboarding assignments used to be implementing a JSON parser in five days.

Zhihu answer --- link

That answer refers to an open-source "JSON library tutorial from scratch". I happened to be just starting to learn Go, and my understanding of JSON went no further than "the data format exchanged between front end and back end", so I worked through the tutorial in Go. I benefited a lot; at the very least it is very helpful for someone with little programming experience like me. Here is what I learned.

JSON library tutorial from scratch --- link

My own implementation --- link

Two. Overall gains

1. Testing and refactoring

In fact, ever since I first came into contact with programming, I kept hearing that I should write tests for my own code. But I had never learned the methodology and did not know how to put it into practice, so I did not appreciate the importance of testing until I interned at a company. Every time I finished a feature, my mentor asked me to construct data to test it. As I understood it then, the purpose of writing test cases was to cover as many user behaviors as possible and keep the system running stably.

But after working through this JSON parser tutorial, my understanding of test cases went further. The tutorial introduces in detail a development approach called TDD (test-driven development) and practices it from the first unit onward.

In my opinion, writing the tests first and then developing helps us pin down the features we want to build and makes detours less likely. But sometimes it is hard to plan ahead and the tests are hard to write up front; in that case it is easier to develop the feature first. The author also recommends using both styles in real development to strike a balance.

To be honest, at first, seeing my code pass all the tests simply gave me a sense of accomplishment. But as the tutorial went on, I found that a complete test suite gives me more than that: it gives me a sense of security.

As the parser gains features, common modules emerge in the code, and we need to refactor to make them more general. A complete set of unit tests is what lets us refactor boldly.

In addition, because I used a different language from the tutorial, some parts had to be written from my own understanding rather than copied, and many of them were not well written at first. What impressed me most was this: at the beginning we parse null, false, true, numbers and strings. Each is a single function, and passing their separate test cases is not very hard. But an array holds multiple values and may contain nested arrays, so when parsing arrays we have to make sure that parsing a single value does not break the parsing of the whole.

I ran into many problems while implementing array parsing, and most of them came from flaws in the single-value parsing code. Fortunately, following the tutorial I had written enough test cases to carry me through writing the whole array parser correctly. From then on, I fell in love with writing unit tests.

2. The charm of the C language

The tutorial is written in standard C. The author is a C/C++ expert with remarkable craftsmanship. Although I do not know much C, I had no great trouble reading the C code by following the tutorial's explanations.

The tutorial touches on many points of C, such as macro definitions, memory allocation and release, and memory leak detection. What I admire most is the author's delicate use of pointers. Go has pointers too, but while working through this tutorial I mostly used them just to take addresses. And because Go pointers are less powerful (there is no pointer arithmetic outside the unsafe package), it was not easy for me to implement a general-purpose stack like the author's for staging parsed JSON content. But Go has powerful slices, which are also great to use and very convenient.
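As a rough illustration of how a slice can stand in for the tutorial's pointer-based stack, here is a minimal sketch. The `byteStack` type and its `push`/`pop` methods are hypothetical names made up for this example, not taken from the tutorial or from my implementation.

```go
package main

import "fmt"

// byteStack is a hypothetical helper: a growable stack of bytes backed by a slice.
type byteStack struct {
	buf []byte
}

// push appends bytes to the top of the stack; append grows the slice as needed.
func (s *byteStack) push(b ...byte) {
	s.buf = append(s.buf, b...)
}

// pop removes the top n bytes and returns a copy of them.
// It returns nil if n is negative or larger than the stack.
func (s *byteStack) pop(n int) []byte {
	if n < 0 || n > len(s.buf) {
		return nil
	}
	top := len(s.buf) - n
	out := append([]byte(nil), s.buf[top:]...)
	s.buf = s.buf[:top]
	return out
}

func main() {
	var s byteStack
	s.push('a', 'b', 'c')
	popped := s.pop(2)
	fmt.Println(string(popped), len(s.buf))
}
```

Growth and bounds are handled by `append` and reslicing, so no manual reallocation is needed, which is roughly what made slices feel convenient here.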

Since the same functionality is being implemented in a different language, we should play to the strengths of our own language. This is also the key to picking up a language through a project.

Three. What I gained at each stage of the project

1. Getting started

In the first chapter, my biggest gain was figuring out the structure of the parser.

At the start of the project, we first build a simple testing framework that prints, for example, the number of tests passed, the number failed, and the error messages, so that we can easily see how the tests are going.
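A minimal sketch of such a framework in Go might look like the following. The names `expectEq`, `testCount` and `testPass` are assumptions for this illustration; in practice Go's standard `testing` package would be the usual choice.

```go
package main

import "fmt"

// Counters for the hypothetical mini framework.
var (
	testCount int
	testPass  int
)

// expectEq records one check and prints a message when it fails.
// It compares with ==, so it is only meant for comparable values
// such as ints, floats and strings (not slices or maps).
func expectEq(name string, expect, actual interface{}) {
	testCount++
	if expect == actual {
		testPass++
		return
	}
	fmt.Printf("%s: expect: %v actual: %v\n", name, expect, actual)
}

func main() {
	expectEq("one plus one", 2, 1+1)
	expectEq("string length", 3, len("abc"))
	expectEq("deliberate failure", "null", "true")
	fmt.Printf("%d/%d (%.2f%%) passed\n",
		testPass, testCount, float64(testPass)*100/float64(testCount))
}
```

The last check fails on purpose so that the failure message and the final pass ratio are both visible.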

Next we define the parser's data structure; once the data structure is defined, the software is half done. Here we use a tree to organize the parsed data: each piece of data is stored in a node, so what we need to do is define that node.

According to the JSON specification, JSON has 7 data types in total (counting true and false separately):

object, array, string, number, "true", "false", "null"

To tell which data type a node holds, we add a type field to the node; its possible values can be maintained as an enumeration. We also prepare a field to receive the data of each type (for convenience, true/false/null get no field of their own).

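type EasyValue struct {
	vType int         // node data type (one of the enumerated type tags)
	num   float64     // value when the node is a number
	str   []byte      // bytes when the node is a string
	len   int         // length, presumably of str
	e     []EasyValue // elements when the node is an array
	o     []EasyObj   // members when the node is an object (EasyObj is defined elsewhere)
}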
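The type enumeration mentioned above could be sketched with `iota` as follows. The constant names and the `typeName` helper are hypothetical; the original implementation may name and order them differently.

```go
package main

import "fmt"

// Hypothetical type tags, one per JSON data type listed above.
const (
	EasyNull = iota
	EasyFalse
	EasyTrue
	EasyNumber
	EasyString
	EasyArray
	EasyObject
)

// typeName returns a readable name for a type tag, or "unknown".
func typeName(t int) string {
	names := [...]string{"null", "false", "true", "number", "string", "array", "object"}
	if t < 0 || t >= len(names) {
		return "unknown"
	}
	return names[t]
}

func main() {
	fmt.Println(EasyNumber, typeName(EasyNumber)) // prints "3 number"
}
```

Because the constants are untyped integers, they can be stored directly in an `int` field such as `vType`.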

With the data structure in place, the framework of our parser is clear:

  1. Pass in a JSON string, create a root node, and parse the string with the parser, character by character.
  2. If the text is a number, for example, set the root node's type to number and store the parsed number in the node's num field.
  3. To get the parsing result, read the field that corresponds to the node's data type.

That is the whole idea of the JSON parser. With this foundation laid in the first chapter, I had a view of the whole; the following chapters parse the individual data types one by one.
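The three steps above can be sketched as follows for the simplest cases. This is a reduced illustration, not the tutorial's code: the `value` type, the `type...` constants and the `parse`/`parseLiteral` functions are hypothetical, only null, true, false and numbers are handled, and `strconv.ParseFloat` accepts some inputs (such as "inf" or "0x1p3") that JSON does not, so a real parser would validate the number first.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical type tags for this reduced sketch.
const (
	typeNull = iota
	typeFalse
	typeTrue
	typeNumber
)

// value is a reduced node: only the type tag and the number field.
type value struct {
	vType int
	num   float64
}

// parseLiteral sets the node type if s is exactly the expected literal.
func parseLiteral(root *value, s, lit string, t int) error {
	if s != lit {
		return fmt.Errorf("invalid value %q", s)
	}
	root.vType = t
	return nil
}

// parse fills root from a text holding a single null, true, false or number.
// It dispatches on the first character after trimming JSON whitespace.
func parse(root *value, text string) error {
	s := strings.Trim(text, " \t\n\r")
	if s == "" {
		return errors.New("expect value")
	}
	switch s[0] {
	case 'n':
		return parseLiteral(root, s, "null", typeNull)
	case 't':
		return parseLiteral(root, s, "true", typeTrue)
	case 'f':
		return parseLiteral(root, s, "false", typeFalse)
	default:
		f, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return fmt.Errorf("invalid value %q", s)
		}
		root.vType = typeNumber
		root.num = f
		return nil
	}
}

func main() {
	var root value // step 1: create the root node and parse into it
	if err := parse(&root, " 1.5 "); err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	if root.vType == typeNumber { // step 3: read the field matching the type
		fmt.Println(root.num)
	}
}
```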

2. Parsing numbers

When parsing numbers, the author chose to call the library function that converts a string to a number directly. Because the library function accepts a wider range of input than JSON allows, some invalid cases have to be rejected beforehand. On the whole, it is easy to implement.

But along the way I ran into something I found awkward in Go. The string-to-number library call can fail, usually in one of two ways: the text is not a valid number, or the number overflows. In other languages it is easy to tell error types apart and return the corresponding error code to the caller. Go's error handling is more minimal: it provides just the error interface, whose only method, Error(), returns the message as a string. As I understood it at the time, this meant that if one function can fail in two ways, I had to tell them apart by the error message, checking whether one string contains another, which feels a bit crude.

f, err := strconv.ParseFloat(convStr, 64)
if err != nil {
	if strings.Contains(err.Error(), strconv.ErrRange.Error()) {
		return EASY_PARSE_NUMBER_TOO_BIG
	}
	return EASY_PARSE_INVALID_VALUE
}

Searching Google did not turn up a particularly good solution either. The existing open-source and official approaches basically wrap errors, but that did not seem to help with library functions. Or perhaps it is just that I have only started using Go and lack experience; I will keep an eye on this problem in future use.
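For what it is worth, strconv reports conversion failures as a `*strconv.NumError`, whose `Err` field holds a sentinel such as `strconv.ErrRange` or `strconv.ErrSyntax`. So one possible alternative to matching the message text is to inspect that field, as in the sketch below. The `parseNumber` function and the lowercase result codes are hypothetical stand-ins for the `EASY_PARSE_...` constants above.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Hypothetical result codes, modeled on the EASY_PARSE_... constants above.
const (
	parseOK = iota
	parseInvalidValue
	parseNumberTooBig
)

// parseNumber converts s and maps a strconv failure to a result code
// by inspecting the error value instead of its message text.
func parseNumber(s string) (float64, int) {
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		var numErr *strconv.NumError
		if errors.As(err, &numErr) && numErr.Err == strconv.ErrRange {
			return 0, parseNumberTooBig
		}
		return 0, parseInvalidValue
	}
	return f, parseOK
}

func main() {
	for _, s := range []string{"1.5", "1e999", "abc"} {
		f, code := parseNumber(s)
		fmt.Println(s, f, code)
	}
}
```

Since Go 1.14, `NumError` also has an `Unwrap` method, so `errors.Is(err, strconv.ErrRange)` should express the same check more briefly.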

3. Parsing strings - 4. Unicode

Next comes parsing strings. In this chapter I was won over by the author's pointer manipulation, but when I implemented it myself I found it quite simple to do with Go slices; I do not know whether the performance suffers.

The biggest gain in this chapter was getting started with Unicode. I used to write code in one go, skip past encoding topics whenever they came up, and Google a fix whenever an encoding problem appeared, without thinking about the knowledge behind it. But here we have to handle character conversion, and since our goal is to store strings as UTF-8, the relationships behind it have to be clear.

The first encoding I met was ASCII. ASCII has only 7 bits, that is, 128 characters. But the world has far more characters than 128 can cover, which is where Unicode comes in. Unicode defines well over a hundred thousand characters, which also means it needs more storage space. UTF is short for Unicode Transformation Format, and UTF-8 stores Unicode code points in 8-bit units.

With this background, we need to convert the Unicode escapes in a string. The process is to convert the \uXXXX escape into its code point (a hexadecimal number), then encode that code point as UTF-8.
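The two steps can be sketched as below: read the four hex digits of a \uXXXX escape into a code point, then encode the code point as UTF-8 by hand to show the bit layout. The function names `parseHex4` and `encodeUTF8` are assumptions for this example, and the sketch deliberately skips surrogate pairs and range validation; Go's `unicode/utf8` package provides a ready-made encoder.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseHex4 reads exactly four hexadecimal digits into a code point.
func parseHex4(s string) (uint32, error) {
	if len(s) != 4 {
		return 0, fmt.Errorf("need 4 hex digits, got %q", s)
	}
	n, err := strconv.ParseUint(s, 16, 32)
	if err != nil {
		return 0, err
	}
	return uint32(n), nil
}

// encodeUTF8 encodes one code point as 1 to 4 UTF-8 bytes.
// It does not reject surrogates or values above U+10FFFF.
func encodeUTF8(u uint32) []byte {
	switch {
	case u <= 0x7F: // 0xxxxxxx
		return []byte{byte(u)}
	case u <= 0x7FF: // 110xxxxx 10xxxxxx
		return []byte{0xC0 | byte(u>>6), 0x80 | byte(u&0x3F)}
	case u <= 0xFFFF: // 1110xxxx 10xxxxxx 10xxxxxx
		return []byte{
			0xE0 | byte(u>>12),
			0x80 | byte((u>>6)&0x3F),
			0x80 | byte(u&0x3F),
		}
	default: // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
		return []byte{
			0xF0 | byte(u>>18),
			0x80 | byte((u>>12)&0x3F),
			0x80 | byte((u>>6)&0x3F),
			0x80 | byte(u&0x3F),
		}
	}
}

func main() {
	u, err := parseHex4("20AC") // the digits of the escape \u20AC (euro sign)
	if err != nil {
		fmt.Println("bad escape:", err)
		return
	}
	fmt.Printf("% X\n", encodeUTF8(u)) // prints "E2 82 AC"
}
```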

Following the tutorial gave me a first understanding of encodings, and it felt good. This is probably the pleasure of knowledge ^_^

5. Parsing arrays - 6. Parsing objects

When it came to parsing arrays and objects, I felt the power of recursion. This may be why the author calls it a recursive descent parser.

But in this part my biggest gain was a deep appreciation of unit testing. When parsing an array, we will probably have to parse values of several types, so the parsing functions implemented separately earlier are now strung together.

For example, such a string :

"[123,null,\"abc\",[1,2,3]]"

First we parse 123, then null. Once 123 has been parsed, the pointer should sit at the comma; after consuming the comma separator, we go on to parse the next value. I remember not handling the pointer position properly when parsing single values, which made the whole array parse fail. But it also deepened my understanding of the whole JSON string parsing process.
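To make the position handling concrete, here is a reduced recursive-descent sketch in which every parse function returns the position just past the value it consumed. It is an illustration under simplifying assumptions, not my actual implementation: the `node` type and the `parseValue`/`parseArray` functions are hypothetical, and only numbers, null and nested arrays are supported, with no whitespace and no strings.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// node is a reduced value: a number, null, or an array of nodes.
type node struct {
	kind  string // "null", "number" or "array"
	num   float64
	elems []node
}

// parseValue parses one value starting at s[pos] and returns the node
// together with the position just past the value.
func parseValue(s string, pos int) (node, int, error) {
	if pos >= len(s) {
		return node{}, pos, errors.New("expect value")
	}
	switch {
	case s[pos] == '[':
		return parseArray(s, pos)
	case len(s)-pos >= 4 && s[pos:pos+4] == "null":
		return node{kind: "null"}, pos + 4, nil
	default:
		// Treat everything up to the next ',' or ']' as a number.
		end := pos
		for end < len(s) && s[end] != ',' && s[end] != ']' {
			end++
		}
		f, err := strconv.ParseFloat(s[pos:end], 64)
		if err != nil {
			return node{}, pos, fmt.Errorf("invalid value at %d", pos)
		}
		return node{kind: "number", num: f}, end, nil
	}
}

// parseArray parses "[v,v,...]" where s[pos] is '['. It calls parseValue
// for each element, so nested arrays are handled by recursion.
func parseArray(s string, pos int) (node, int, error) {
	arr := node{kind: "array"}
	pos++ // skip '['
	if pos < len(s) && s[pos] == ']' {
		return arr, pos + 1, nil
	}
	for {
		v, next, err := parseValue(s, pos)
		if err != nil {
			return node{}, pos, err
		}
		arr.elems = append(arr.elems, v)
		pos = next // must sit exactly on ',' or ']'
		if pos >= len(s) {
			return node{}, pos, errors.New("missing comma or bracket")
		}
		switch s[pos] {
		case ',':
			pos++
		case ']':
			return arr, pos + 1, nil
		default:
			return node{}, pos, errors.New("missing comma or bracket")
		}
	}
}

func main() {
	v, end, err := parseValue("[123,null,[1,2,3]]", 0)
	fmt.Println(len(v.elems), end, err)
}
```

If a single-value parser returned the wrong position, the check for ',' or ']' in `parseArray` would fail, which is essentially the kind of bug described above.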

At this point the JSON parser is basically complete. The next two chapters cover the generator and functions such as accessing parsed objects.

Four. Summary

The tutorial is written in C, and the author uses many C features that improve performance. I have only just started with Go and know little about its features, so in some places I may not have handled things the idiomatic Go way.

In day-to-day development, JSON is usually used like this: convert a custom data structure into a JSON string, or convert a JSON string into a custom structure. I have not implemented this yet. Go provides native support for it.

I took a look at the source of Go's native JSON parsing, and its parsing approach has much in common with the tutorial. The big difference is this: in our handwritten parser, we store the parsed data in our own node structure. In Go, because JSON is so often used together with structs, the library assigns the parsed values directly to the corresponding struct through reflection, which skips the step of building a data structure of our own.
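A short example of that native support, using the standard `encoding/json` package. The `player` struct, its fields and the `decodePlayer` helper are hypothetical and exist only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// player is a hypothetical struct; the tags map JSON keys to fields.
type player struct {
	Name  string `json:"name"`
	Level int    `json:"level"`
}

// decodePlayer fills a player straight from JSON text via json.Unmarshal.
func decodePlayer(data []byte) (player, error) {
	var p player
	err := json.Unmarshal(data, &p)
	return p, err
}

func main() {
	p, err := decodePlayer([]byte(`{"name":"gopher","level":3}`))
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Println(p.Name, p.Level) // prints "gopher 3"

	out, err := json.Marshal(p)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out)) // prints {"name":"gopher","level":3}
}
```

No intermediate node tree appears in user code: the values land directly in the struct fields, which matches the difference described above.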

Finally, many thanks to this tutorial. It gave me a first understanding of JSON and a deeper understanding of testing and refactoring, and I achieved my original goal of being able to write branches, loops and conditionals in Go fluently. But I know that is not where Go's charm lies; many features are still waiting for me to learn. Keep going ~

Copyright notice
This article was written by [Bacteriostasis]. Please include a link to the original when reposting. Thanks.
https://chowdera.com/2020/12/202012080851151585.html