
I'm a programmer, and I remember history in this way

2020-12-07 10:20:25 kokohuang

War Of Resistance Live: recording every day and night of the 14-year War of Resistance against Japan

Source code: https://github.com/kokohuang/WarOfResistanceLive

Live site: https://kokohuang.github.io/WarOfResistanceLive

Preface

In today's restless Internet environment, doing one good thing is not hard. The hard part is doing something meaningful for 8 years straight.

There is a blogger on Weibo who did exactly that. From July 7, 2012 to September 2, 2020, the account @Live Broadcast of the Anti-Japanese War recorded, in text and images, the history of the Chinese nation's full-scale War of Resistance against Japan from July 7, 1937 to August 15, 1945: 2,980 days without interruption, an average of 12 posts a day, 35,214 posts in total.

At 7:07 on September 18, 2020, after half a month of silence, @Live Broadcast of the Anti-Japanese War resumed updating. It will go on to record, in text and images, the six years of the War of Resistance from September 18, 1931 to July 7, 1937.

For the next 6 years, they are already on their way.

History must not be forgotten.

As a programmer, what can I do in the face of history?

Besides admiring @Live Broadcast of the Anti-Japanese War for so many years of persistence, I wanted to do something meaningful that was within my own abilities.

With the permission and support of the blogger @Live Broadcast of the Anti-Japanese War, this project came into being.

War Of Resistance Live

├── .github/workflows # Workflow configuration files
├── resources # Weibo data
├── site # Blog source code
└── spider # Weibo crawler

WarOfResistanceLive is an open-source project made up of a Python crawler, a Hexo blog, and GitHub Actions continuous integration. It is hosted on GitHub and deployed to GitHub Pages. It currently provides the following features:

  • Automatically updates the data every day
  • Shows all of the blogger's current Weibo posts
  • Supports RSS subscription
  • Runs on GitHub Actions continuous integration
  • ...

Next, I will briefly introduce the core logic and implementation of the project.

Python Crawler

The crawler used in this project is a simplified and modified version of the weibo-crawler project (for research purposes only). Thanks to its author, dataabc.

How it works

  • By visiting the mobile version of Weibo, you can bypass login verification and view most of a blogger's posts, for example: https://m.weibo.cn/u/2896390104
  • Inspecting the traffic with the browser's developer tools shows that the list of posts can be fetched from the JSON API https://m.weibo.cn/api/container/getIndex:

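    import requests

    def get_json(self, params):
        """Fetch the JSON data returned by the mobile Weibo API."""
        # Method of the crawler class; self.headers holds the request headers.
        url = 'https://m.weibo.cn/api/container/getIndex'
        r = requests.get(url,
                         params=params,         # query-string parameters
                         headers=self.headers,
                         verify=False)          # skips TLS certificate checks, as in the original
        return r.json()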
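For context, here is a minimal sketch of the two pieces around get_json: building the query parameters and pulling posts out of the response. The field names are assumptions based on how weibo-crawler reads this API (a containerid of "107603" followed by the user id, an ok flag, and posts in data.cards with card_type 9). Weibo does not document them, and they may change. build_params and extract_posts are hypothetical helper names.

```python
from __future__ import annotations

from typing import Any


def build_params(user_id: str, page: int) -> dict[str, Any]:
    """Query parameters for one page of a user's timeline (assumed format)."""
    # ASSUMPTION: the timeline containerid is "107603" + user id, as in weibo-crawler.
    return {"containerid": "107603" + str(user_id), "page": page}


def extract_posts(js: dict[str, Any]) -> list[dict[str, Any]]:
    """Pull post objects out of a getIndex JSON response (assumed shape)."""
    # ASSUMPTION: a falsy "ok" means no data; posts are cards with card_type == 9.
    if not js.get("ok"):
        return []
    cards = js.get("data", {}).get("cards", [])
    return [c["mblog"] for c in cards if c.get("card_type") == 9 and "mblog" in c]
```

Inside the crawler these would be combined with get_json, for example extract_posts(self.get_json(build_params(user_id, page))).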

How to use

Install dependencies:

pip3 install -r requirements.txt

Run:

python weibo.py

Notes

  • Requesting too fast easily triggers Weibo's rate limiting: adding a random wait between requests reduces the risk;
  • Not all posts are available without logging in: adding cookie handling makes it possible to fetch all of the data;

See weibo-crawler for more details.
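Both notes can be sketched in a few lines of Python. crawl_pages and make_headers are hypothetical helpers, not weibo-crawler's own functions. crawl_pages sleeps a random interval between page requests, and make_headers adds a Cookie header, copied from a logged-in browser session, when one is supplied. The 1 to 3 second delay range and the User-Agent value are arbitrary assumptions.

```python
import random
import time
from typing import Callable, Iterable, Optional


def make_headers(cookie: Optional[str] = None) -> dict:
    """Request headers; the Cookie header is only added when one is supplied."""
    # ASSUMPTION: a generic User-Agent; the real crawler may send a different one.
    headers = {"User-Agent": "Mozilla/5.0"}
    if cookie:
        headers["Cookie"] = cookie  # a logged-in session can see more posts
    return headers


def crawl_pages(fetch: Callable[[int], list],
                pages: Iterable[int],
                min_delay: float = 1.0,
                max_delay: float = 3.0,
                sleep: Callable[[float], None] = time.sleep) -> list:
    """Fetch each page in turn, pausing a random interval between requests."""
    results = []
    for i, page in enumerate(pages):
        if i > 0:
            # Random pause so requests do not arrive at a fixed, bot-like rate.
            sleep(random.uniform(min_delay, max_delay))
        results.extend(fetch(page))
    return results
```

The sleep function is a parameter so the pacing can be checked without actually waiting; in real use the default time.sleep applies.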

Hexo

After weighing the options, I chose Hexo with the NexT theme as the blog framework for this project.

Hexo is a static blog framework based on Node.js. It has few dependencies and is easy to install and use. It can easily generate static web pages to host on GitHub Pages, and there are plenty of themes to choose from. For details on installing and using Hexo, see the official documentation: https://hexo.io/zh-cn/docs/.

So how do we implement RSS subscription?

Thanks to Hexo's rich plugin ecosystem, hexo-generator-feed makes this very easy.

First, install the plugin in the blog's root directory:

$ npm install hexo-generator-feed --save

Next, add the relevant configuration to the _config.yml file in the blog's root directory:

feed:
  enable: true # Whether the plugin is enabled
  type: atom # Feed type: atom or rss2 (default: atom)
  path: atom.xml # Path of the generated file
  limit: 30 # Maximum number of posts in the feed; 0 or false includes all posts
  content: true # If true, include the full content of each post
  content_limit: # Length of the post summary; only takes effect when content is false
  order_by: -date # Sort by date, newest first
  template: # Path to a custom template
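To make order_by: -date and limit concrete, here is a small Python model of how those two settings pick the posts that end up in the feed. It only mirrors the documented semantics (descending date order, with a limit of 0 or false meaning every post). It is not hexo-generator-feed's actual code, and the Post type is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Union


@dataclass
class Post:
    # Hypothetical stand-in for a Hexo post, with only the fields used here.
    title: str
    published: date


def select_feed_posts(posts: List[Post], limit: Union[int, bool] = 30) -> List[Post]:
    """Sort newest first (order_by: -date) and keep at most `limit` posts."""
    ordered = sorted(posts, key=lambda p: p.published, reverse=True)  # "-" means descending
    if not limit:
        return ordered  # limit 0 or false: include every post
    return ordered[:limit]
```

With limit: 30, a subscriber's reader therefore sees the 30 most recent posts.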

Finally, add an RSS entry to the menu in the theme's _config.yml file:

menu:
  RSS: /atom.xml || fa fa-rss # Path of atom.xml and the menu icon

With that, RSS subscription is in place. The subscription address of WarOfResistanceLive is:

https://kokohuang.github.io/WarOfResistanceLive/atom.xml
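To check what subscribers actually receive, the generated feed can be read with Python's standard library. This sketch lists the title and link of each entry. It assumes the Atom output configured above (type: atom), not rss2, and atom_entries is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def atom_entries(xml_text) -> list:
    """Return (title, link) pairs for every <entry> in an Atom document (str or bytes)."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link_el = entry.find("atom:link", ATOM_NS)
        link = link_el.get("href", "") if link_el is not None else ""
        entries.append((title, link))
    return entries
```

Against the live site, the input would be the downloaded feed, for example atom_entries(urllib.request.urlopen(url).read()) with the subscription address above.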

GitHub Actions continuous integration

GitHub Actions is a continuous integration service that GitHub launched in October 2018. Before that, many of us used Travis CI for continuous integration. In my experience, GitHub Actions is very powerful and more flexible than Travis CI: its Marketplace offers a wealth of actions, and by combining them we can do many interesting things with very little effort.

Let's first look at some basic concepts of GitHub Actions:

  • workflow: Workflow . That is, the process of continuous integration into one operation . The document is stored in the warehouse .github/workflows Directory , Can contain more than one ;
  • job: Mission . One workflow May contain one or more jobs, It represents an integrated operation , It can accomplish one or more tasks ;
  • step: step . One job By multiple step form , It represents the steps needed to complete a task ;
  • action: action . Every step It can contain one or more action, That means within a step , Can execute multiple action action .

With these basic concepts in mind, let's see how continuous integration is implemented for WarOfResistanceLive. Here is the complete workflow used in this project:

# Name of the workflow
name: Spider Bot

# Set the time zone
env:
  TZ: Asia/Shanghai

# Events that trigger the workflow
on:
  # Scheduled trigger: every 2 hours from 8:00 to 24:00 Beijing time (https://crontab.guru)
  # cron times are in UTC, so add 8 hours to get Beijing time
  schedule:
    - cron: "0 0-16/2 * * *"

  # Allow the workflow to be triggered manually
  workflow_dispatch:

jobs:
  build:
    # Use ubuntu-latest as the runner environment
    runs-on: ubuntu-latest

    # Steps to run, in order
    steps:
      # Check out the repository
      - name: Checkout Repository
        uses: actions/checkout@v2

      # Set up the Python environment
      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.x"

      # Cache pip dependencies
      - name: Cache Pip Dependencies
        id: pip-cache
        uses: actions/cache@v2
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('./spider/requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-
      
      # Install pip dependencies
      - name: Install Pip Dependencies
        working-directory: ./spider
        run: |
          python -m pip install --upgrade pip
          pip install flake8 pytest
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi

      # Run the crawler script
      - name: Run Spider Bot
        working-directory: ./spider  # Working directory; only applies to run commands
        run: python weibo.py

      # Get the current system time (written to GITHUB_OUTPUT, which replaces the deprecated ::set-output command)
      - name: Get Current Date
        id: date
        run: echo "date=$(date +'%Y-%m-%d %H:%M')" >> "$GITHUB_OUTPUT"

      # Commit the changes
      - name: Commit Changes
        uses: EndBug/add-and-commit@v5
        with:
          author_name: Koko Huang
          author_email: huangjianke@vip.163.com
          message: "Synced the latest data (${{ steps.date.outputs.date }})"
          add: "./"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      # Push to the remote repository
      - name: Push Changes
        uses: ad-m/github-push-action@master
        with:
          branch: main
          github_token: ${{ secrets.GITHUB_TOKEN }}

      # Set up the Node.js environment
      - name: Use Node.js 12.x
        uses: actions/setup-node@v1
        with:
          node-version: "12.x"

      # Cache NPM dependencies
      - name: Cache NPM Dependencies
        id: npm-cache
        uses: actions/cache@v2
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('./site/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-

      # Install NPM dependencies
      - name: Install NPM Dependencies
        working-directory: ./site
        run: npm install

      # Build Hexo
      - name: Build Hexo
        working-directory: ./site # Working directory; only applies to run commands
        run: npm run build

      # Publish to GitHub Pages
      - name: Deploy Github Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./site/public # Directory to publish
          publish_branch: gh-pages # Remote branch to publish to

The workflow file contains many configuration fields, each explained in its comments. Next, let's focus on the most important ones:

How the workflow is triggered

# Events that trigger the workflow
on:
  # Scheduled trigger: every 2 hours from 8:00 to 24:00 Beijing time (https://crontab.guru)
  # cron times are in UTC, so add 8 hours to get Beijing time
  schedule:
    - cron: "0 0-16/2 * * *"

  # Allow the workflow to be triggered manually
  workflow_dispatch:

The on keyword configures a workflow to run on one or more events; both automatic and manual triggers are supported. The schedule event triggers the workflow at planned times, using POSIX cron syntax to run it at specific times.

Cron syntax has five fields separated by spaces, and each field represents a unit of time:

┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of the month (1 - 31)
│ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
│ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
│ │ │ │ │                                   
│ │ │ │ │
│ │ │ │ │
* * * * *

You can use https://crontab.guru to generate cron expressions, and the crontab guru examples page has many more.

In addition, workflow_dispatch lets us trigger a workflow manually (from the Actions tab or through the REST API), and repository_dispatch lets us trigger it from outside GitHub by sending an event through the REST API.

The on field can also be set to push, so that the workflow runs whenever someone pushes to the repository. For all trigger options, see the GitHub documentation on configuring workflow events.
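To make the schedule concrete, here is a minimal Python sketch that expands a single numeric cron field and converts the UTC hours used by this workflow to Beijing time. It supports only *, single numbers, a-b ranges, comma lists, and /step on * or ranges (no names such as JAN or SUN). It is an illustration, not how GitHub itself parses cron.

```python
def expand_field(spec: str, lo: int, hi: int) -> list:
    """Expand one numeric cron field such as "*", "5", "1,3" or "0-16/2"."""
    values = set()
    for part in spec.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = lo, hi  # every value in the field's range
        elif "-" in rng:
            start_text, end_text = rng.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(rng)  # a single value
        values.update(range(start, end + 1, step))
    return sorted(values)


def utc_hours_to_beijing(hours: list) -> list:
    """Shift UTC hours to Beijing time (UTC+8, no daylight saving)."""
    return [(h + 8) % 24 for h in hours]


# Hour field of the workflow's schedule "0 0-16/2 * * *"
UTC_HOURS = expand_field("0-16/2", 0, 23)
BEIJING_HOURS = utc_hours_to_beijing(UTC_HOURS)
```

UTC_HOURS comes out as 0, 2, ..., 16, so the last run, 16:00 UTC, lands at 0:00 Beijing time: the "24:00" mentioned in the workflow comment.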
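Besides the Run workflow button in the Actions tab, workflow_dispatch can be triggered through GitHub's REST endpoint POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches. The sketch below only builds that request with the standard library and does not send it. The workflow file name spider.yml is hypothetical (the article does not name the file in .github/workflows). The token is assumed to be a personal access token with permission to run workflows.

```python
import json
import urllib.request


def workflow_dispatch_request(owner: str, repo: str, workflow_file: str,
                              ref: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the REST call that starts a workflow_dispatch run."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow_file}/dispatches")
    body = json.dumps({"ref": ref}).encode("utf-8")  # branch or tag to run on
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it with urllib.request.urlopen(req) should return 204 No Content when the run is queued.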

Sequence of steps

As the configuration file shows, a single CI run of the project consists of these steps:


Check out the repository --> Set up the Python environment --> Cache pip dependencies --> Install pip dependencies --> Run the crawler script --> Get the current time --> Commit changes --> Push to the remote repository --> Set up the Node.js environment --> Cache NPM dependencies --> Install NPM dependencies --> Build Hexo --> Publish to GitHub Pages

Key points of this project's workflow:

  • Runner environment: the whole workflow runs in the ubuntu-latest virtual environment. Other environments, such as Windows Server or macOS, can also be specified;
  • Caching dependencies: caching dependencies speeds up their installation. For details, see "Caching dependencies to speed up workflows";
  • Getting the current time: the current time is fetched for use in the commit message of the later Commit Changes step. This relies on the steps context: once a step is given an id, later steps can read its outputs through steps.<step id>.outputs;
  • Building Hexo: this step runs the hexo generate command to generate the static pages;
  • Authentication in the workflow: the commit, push, and publish steps need to authenticate. GitHub automatically creates a GITHUB_TOKEN secret for every workflow run, so no token has to be created by hand; a step simply references it as ${{ secrets.GITHUB_TOKEN }}. The token needs write access to the repository contents, which can be granted under the repository's Settings --> Actions --> General --> Workflow permissions, or with a permissions: contents: write block in the workflow. A personal access token (Settings --> Developer settings --> Personal access tokens --> Generate new token) is only needed when the built-in token is not sufficient, and it must then be saved as a repository secret under a name that does not start with GITHUB_.
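The Get Current Date step publishes its output with a shell one-liner. As an alternative sketch, a Python step could do the same by appending name=value to the file whose path GitHub Actions provides in the GITHUB_OUTPUT environment variable, the documented replacement for the deprecated ::set-output command. The helper names below are hypothetical, and the fixed UTC+8 offset stands in for TZ: Asia/Shanghai.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Optional

BEIJING = timezone(timedelta(hours=8))  # fixed UTC+8, matching TZ: Asia/Shanghai


def set_step_output(name: str, value: str) -> None:
    """Append name=value to the step-output file named by GITHUB_OUTPUT."""
    # Single-line values only; multi-line values need GitHub's delimiter syntax.
    with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


def current_date(now: Optional[datetime] = None) -> str:
    """Same format as date +'%Y-%m-%d %H:%M' in the workflow."""
    if now is None:
        now = datetime.now(BEIJING)
    return now.strftime("%Y-%m-%d %H:%M")
```

In a step with id: date, calling set_step_output("date", current_date()) makes the value readable later as ${{ steps.date.outputs.date }}.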

More actions can be found in GitHub's official Marketplace.

Conclusion

Finally, a passage from the blogger @Live Broadcast of the Anti-Japanese War:

“We broadcast the War of Resistance live not to stir up hatred or other negative emotions, but to gently reawaken what has been forgotten. When we always remember the suffering, fear, and humiliation our ancestors endured; when we appreciate how, with the nation in peril, they set aside past grievances and achieved national reconciliation; when we see how calmly and generously they went to their deaths and sacrificed themselves for this country, I believe we will think about the present more maturely and rationally.”

Remember history, and strive to grow stronger.

Never forget national humiliation; strive ceaselessly for self-improvement.

Copyright notice
This article was written by [kokohuang]. Please include a link to the original when reposting. Thanks.
https://chowdera.com/2020/12/20201207101726081z.html