
Learn Python in secret, then amaze everyone (Day 10)

2020-11-10 17:23:16 Python a meow

 

Contents

  • Preface
  • Welcome to our circle
  • The little crawler hits a wall
  • Task
  • Requirements analysis
  • 1.
  • 2.
  • 3.
  • Option 1:
  • Option 2:
  • Data type
  • Regular expressions
  • What is a regular expression?
  • Basic syntax
  • Ordinary characters
  • Demonstration
  • Quantifiers
  • Anchors
  • Alternation
  • Regular expressions in Python
  • Solving the problem

Preface

Previous article: Learn Python in secret, then amaze everyone (Day 9)

The usual notes still apply:

 This series assumes you have some background in C or C++, because I picked up Python after learning only a little C++.
 This series assumes you know how to search the web (Baidu). When it comes to learning modules, I still recommend having your own editor and compiler; the previous articles already made some recommendations.

 As for the table of contents of this series, to be honest I prefer the two Primer Plus books, so I follow their chapter structure.

 This series also focuses on building your hands-on skills. I can't teach you everything, so being able to solve your own problems matters a lot. When I leave gaps in an article, please don't treat them as traps. They are exercises I left for you, so show what you can do and work them out yourself.

What are we doing today? Did you think I was going to write about cookies? Not quite. The day before yesterday I hit a wall: my crawler brought back a pile of messy text. I asked some more experienced people, and their answer was to use regular expressions.

Often the problem is not that you lack ability. It is that you have not seen enough yet to know which tools exist.
So we need to look into different fields and ask experienced classmates, teachers, and seniors for advice.

That is why I want to mention our study group again here.

 

Welcome to our circle

If you run into difficulties while learning and are looking for a place to discuss Python, you can join our Python circle, QQ group number 947618024, where you can also get Python learning materials. It can save a lot of time and avoid a lot of problems.


The little crawler hits a wall

Here is what happened: yesterday my little crawler came crawling back looking pitiful, as if it had been wronged. How could I put up with that? Time to throw everything at it, so I went to see what it had run into.

Task

Crawl the lyrics of a song by Lin Zhixuan. Which song? Don't ask me; you can decide a small thing like that yourself.

Requirements analysis

1.

First, assume that the lyrics cannot be scraped directly from the page HTML, so we open the browser's Network panel. Why can't they? Because I tried it and failed.

2.

There are two pages that show the lyrics.
One is the song page, before the song is played: https://y.qq.com/n/yqq/song/001PGGQ81Xxw9l.html
The other is the player page, where the lyrics are shown during playback: https://y.qq.com/portal/player.html

I tried the second page and it worked less well than the first. You can try scraping it as an exercise.
So we go with the first page:

 

3.

I'll skip the steps for locating the request here; see Day 9 for details.

Straight to the result:

import requests
import json
from bs4 import BeautifulSoup


# Disguise the request with browser-like headers
headers = {
    # Origin of the request; not needed in this case, shown only for demonstration
    'origin': 'https://y.qq.com',
    # Referring page; carries more information than "origin"; also not needed here
    'referer': 'https://y.qq.com/n/yqq/song/004Z8Ihr0JIu5s.html',
    # Which device and browser the request claims to come from
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
}

# url = 'https://y.qq.com/n/yqq/song/001PGGQ81Xxw9l.html'
url = 'https://c.y.qq.com/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg?nobase64=1&musicid=106678944&-=jsonp1&g_tk_new_20200303=5381&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0'
res_song = requests.get(url, headers=headers)

# Option 1: parse the response text with BeautifulSoup
soup = BeautifulSoup(res_song.text, 'html.parser')
print(soup)

# Option 2: parse the response as JSON and read the 'lyric' field
# json_res = json.loads(res_song.text)
# print(json_res['lyric'])

Good. Let's look at the results of the two options:

Option 1:

 

Option 2:

 


What do we do with that? Fortunately, both results can be handled as strings, and for strings we have regular expressions.
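Option 2 relies on the response body being JSON with a 'lyric' field. As a minimal offline sketch, assuming a made-up body (the sample text below is hypothetical and is not the real API output), the parsed value is an ordinary Python string:

```python
import json

# Hypothetical response body, invented for illustration only.
# The real response may contain more fields and different text.
sample_body = '{"lyric": "[00:01.00]first line\\n[00:02.00]second line"}'

data = json.loads(sample_body)   # JSON text -> Python dict
lyric = data["lyric"]            # the value is a plain str

print(type(lyric).__name__)
lines = lyric.splitlines()       # ordinary string methods work on it
print(lines)
```

Because the result is a plain string, any string tool applies to it, including the re module covered in the rest of this article.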

Data type

 

 

 

Regular expressions

Right, now let's look at regular expressions.

What is a regular expression?

A regular expression is a text pattern made up of ordinary characters (for example, the letters a to z) and special characters (called "metacharacters").
A regular expression uses a single string to describe and match a set of strings that follow a given syntax rule.
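As a small illustration of the definition above, here is a sketch in Python. The pattern and the sample text are made up for this example:

```python
import re

# "py" is made of ordinary characters; "\w" and "+" are metacharacters.
pattern = r"py\w+"

text = "I like python and pytest"

# findall returns every non-overlapping match as a list of strings.
matches = re.findall(pattern, text)
print(matches)
```

One short pattern string describes a whole family of strings: anything that starts with "py" and continues with word characters.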

Basic syntax

Ordinary characters

Ordinary characters include all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some other symbols.

Character: description

  • [ABC]: matches any character listed inside [...]. For example, [aeiou] matches every e, o, u, and a in the string "google runoob taobao".
  • [^ABC]: matches any character not listed inside [...]. For example, [^aeiou] matches every character in "google runoob taobao" except e, o, u, and a.
  • [A-Z]: a range; matches any uppercase letter. [a-z] matches any lowercase letter.
  • . (dot): matches any single character except line breaks (\n, \r); equivalent to [^\n\r]. In Python's re module, the dot excludes only \n by default.
  • [\s\S]: matches any character at all. \s matches any whitespace character, including line breaks; \S matches any non-whitespace character, so it does not match line breaks.
  • \w: matches letters, digits, and the underscore; equivalent to [A-Za-z0-9_]. In Python 3, \w on str patterns also matches non-ASCII word characters unless the re.ASCII flag is set.
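The character classes above can be tried out directly. This is a minimal sketch using the "google runoob taobao" string from the table; the other sample strings are made up:

```python
import re

text = "google runoob taobao"

# [aeiou] matches each vowel; [^aeiou] matches everything else, spaces included.
vowels = re.findall(r"[aeiou]", text)
non_vowels = re.findall(r"[^aeiou]", text)

# The dot skips the newline, while [\s\S] matches every character.
dot_chars = re.findall(r".", "a\nb")
all_chars = re.findall(r"[\s\S]", "a\nb")

# \w+ matches runs of letters, digits, and underscores.
words = re.findall(r"\w+", "user_1 ok!")

print("".join(vowels))
print(non_vowels)
print(dot_chars, all_chars)
print(words)
```

Note that [^aeiou] also picks up the two spaces, since a space is simply another character that is not in the list.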

Demonstration

 

 

 

 

 

 


Quantifiers

Quantifiers specify how many times a given component of a regular expression must occur for the match to succeed. There are six of them: *, +, ?, {n}, {n,}, and {n,m}.

The quantifiers for regular expressions are:

Quantifier: description

  • *: matches the preceding subexpression zero or more times. For example, zo* matches "z" as well as "zoo". * is equivalent to {0,}.
  • +: matches the preceding subexpression one or more times. For example, zo+ matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
  • ?: matches the preceding subexpression zero or one time. For example, do(es)? matches "do", the "does" in "does", and the "do" in "doxy". ? is equivalent to {0,1}.
  • {n}: n is a non-negative integer. Matches exactly n times. For example, o{2} does not match the "o" in "Bob", but it matches the two o's in "food".
  • {n,}: n is a non-negative integer. Matches at least n times. For example, o{2,} does not match the "o" in "Bob", but it matches all the o's in "foooood". o{1,} is equivalent to o+, and o{0,} is equivalent to o*.
  • {n,m}: m and n are non-negative integers with n <= m. Matches at least n times and at most m times. For example, o{1,3} matches the first three o's in "fooooood". o{0,1} is equivalent to o?. Note that there must be no spaces between the comma and the two numbers.
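The examples from the list above can be checked with Python's re module. This is a minimal sketch; the helper name `first_match` is made up for this example:

```python
import re

def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group() if m else None

# * allows zero occurrences, + requires at least one.
star_z = re.fullmatch(r"zo*", "z") is not None
plus_z = re.fullmatch(r"zo+", "z") is not None

# ? makes the group optional.
do_in_doxy = first_match(r"do(es)?", "doxy")
does = first_match(r"do(es)?", "does")

# {n}, {n,} and {n,m} count repetitions.
bob = first_match(r"o{2}", "Bob")
food = first_match(r"o{2}", "food")
at_least_two = first_match(r"o{2,}", "foooood")
up_to_three = first_match(r"o{1,3}", "fooooood")

print(star_z, plus_z)
print(do_in_doxy, does)
print(bob, food, at_least_two, up_to_three)
```

The quantifiers are greedy by default, which is why o{2,} takes every o it can find and o{1,3} stops only after three.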

 

 

 

 

 

 

 

 

 

Anchors

Anchors let you pin a regular expression to the beginning or end of a line. They also let you write regular expressions that match inside a word, at the beginning of a word, or at the end of a word.

Anchors describe the boundaries of strings or words: ^ and $ refer to the beginning and end of a string, \b describes the leading or trailing boundary of a word, and \B denotes a non-word boundary.

The anchors for regular expressions are:

Character: description

  • ^: matches the position where the input string starts. If the RegExp object's Multiline property is set, ^ also matches the position after \n or \r. In Python the corresponding switch is the re.MULTILINE flag, which makes ^ also match right after each \n.
  • $: matches the position where the input string ends. If the RegExp object's Multiline property is set, $ also matches the position before \n or \r. With re.MULTILINE in Python, $ also matches right before each \n.
  • \b: matches a word boundary, that is, the position between a word and a space.
  • \B: matches a non-word boundary.

Note: quantifiers cannot be applied to anchors. Because there cannot be more than one position immediately before or after a line break or word boundary, expressions such as ^* are not allowed.
To match text at the beginning of a line, use the ^ character at the start of the regular expression. Do not confuse this use of ^ with its use inside a bracket expression.
To match text at the end of a line, use the $ character at the end of the regular expression.
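Here is a minimal sketch of these position markers in Python, with made-up sample strings. It uses re.MULTILINE, which is Python's flag for the multiline behaviour described above:

```python
import re

text = "never say never"

# ^ and $ pin the match to the start and end of the string.
starts = re.findall(r"^never", text)
end_pos = re.search(r"never$", text).start()

# \b requires a word boundary, \B requires the absence of one.
words = "say essay saying"
whole_word = re.findall(r"\bsay\b", words)   # only the standalone "say"
inside_word = re.findall(r"\Bsay", words)    # only the "say" inside "essay"

# Without re.MULTILINE, ^ matches only at the very start of the string.
lines = "one\ntwo"
first_only = re.findall(r"^\w+", lines)
each_line = re.findall(r"^\w+", lines, re.MULTILINE)

print(starts, end_pos)
print(whole_word, inside_word)
print(first_only, each_line)
```

These markers match positions, not characters, so they never consume any text themselves.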

Alternation

Enclose all the alternatives in parentheses () and separate adjacent alternatives with |.
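A short sketch of the | operator inside parentheses, using made-up sample strings:

```python
import re

# (?:...) groups the alternatives without capturing them,
# so findall returns the whole match.
colours = re.findall(r"gr(?:a|e)y", "gray grey griy")

# Plain parentheses also capture which alternative matched.
m = re.search(r"(cat|dog)s", "hotdogs")
whole = m.group()      # the full match
animal = m.group(1)    # the alternative that was chosen

print(colours)
print(whole, animal)
```

With plain capturing parentheses, re.findall returns the captured groups instead of the full matches, which is why the first pattern uses the non-capturing form.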

Regular expressions in Python

A regular expression is a special sequence of characters that lets you easily check whether a string matches a given pattern.
Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.
The re module gives the Python language full regular expression support.
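As a quick orientation, here is a minimal sketch of the re functions used most often. The sample strings are made up; the functions themselves are part of the standard re module:

```python
import re

# Compile once, reuse many times.
number = re.compile(r"\d+")

# match() only succeeds at the start of the string.
at_start = number.match("42 apples")
not_at_start = number.match("apples 42")

# search() scans the whole string for the first match.
anywhere = number.search("apples 42")

# findall() returns every match; sub() replaces every match.
all_numbers = number.findall("3 apples, 14 pears")
masked = number.sub("#", "3 apples, 14 pears")

print(at_start.group(), not_at_start, anywhere.group())
print(all_numbers)
print(masked)
```

The difference between match() and search() is a common stumbling block: match() returns None unless the pattern matches at position 0.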



I'll come back and practice this some more; I'm not very fluent with it yet...


Solving the problem

I have a headache today, so I'll just post the code. It has a small flaw: the "lyrics by" label and the lyricist's name should be separated by a colon, and the same goes for the "music by" label.

import re
import requests
from bs4 import BeautifulSoup

# Disguise the request with browser-like headers
headers = {
    # Origin of the request; not needed in this case, shown only for demonstration
    'origin': 'https://y.qq.com',
    # Referring page; carries more information than "origin"; also not needed here
    'referer': 'https://y.qq.com/n/yqq/song/004Z8Ihr0JIu5s.html',
    # Which device and browser the request claims to come from
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
}

url = 'https://c.y.qq.com/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg?nobase64=1&musicid=106678944&-=jsonp1&g_tk_new_20200303=5381&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0'
res_song = requests.get(url, headers=headers)

# Option 1: parse the response text with BeautifulSoup
soup = BeautifulSoup(res_song.text, 'html.parser')
# print(soup.text)

# Match runs of Chinese characters (code points U+4E00 to U+9FA5)
pat = re.compile(r'[\u4e00-\u9fa5]+')
result = pat.findall(soup.text)
# Skip the first five matches and print the rest, one per line
result = '\n'.join(result[5:])
print(result)
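The extraction step, and the flaw mentioned above, can be reproduced offline. This is only a sketch: the sample text, song title, and names below are invented, and the real response may be formatted differently. The second pattern is one possible way to keep a colon between a label and a name, not the author's method:

```python
import re

# Hypothetical lyric text, invented for illustration.
sample = "[ti:示例歌曲]\n[00:01.00]词：张三\n[00:02.00]曲：李四\n[00:03.00]第一句歌词\n"

# Pattern from the article: runs of Chinese characters only.
pat = re.compile(r"[\u4e00-\u9fa5]+")
plain = pat.findall(sample)

# Assumed variant: optionally keep one colon (ASCII or full-width)
# followed by more Chinese characters, so label and name stay together.
pat_with_colon = re.compile(r"[\u4e00-\u9fa5]+(?:[:：][\u4e00-\u9fa5]+)?")
joined = pat_with_colon.findall(sample)

print("\n".join(plain))
print("\n".join(joined))
```

With the first pattern the label and the name come out as separate items because the colon is not a Chinese character. The variant keeps them as one item, at the cost of a slightly longer pattern.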


The material in this article comes from the Internet. If anything here infringes your rights, please contact us to have it removed.

Copyright notice
This article was written by [Python a meow]. Please include a link to the original when reposting. Thank you.