
[Natural Language Processing] Introduction to PyTorch (Essential Basic Knowledge)

2021-10-14 06:36:24 ZSYL

PyTorch Basics

In this book, we make extensive use of PyTorch to implement our deep learning models. PyTorch is an open-source, community-driven deep learning framework. Unlike Theano, Caffe, and TensorFlow, PyTorch implements a tape-based automatic differentiation method that allows us to define and execute computational graphs dynamically. This is hugely helpful for debugging and for constructing complex models with minimal effort.


Dynamic versus static computational graphs: Static frameworks like Theano, Caffe, and TensorFlow require the computational graph to be first declared, compiled, and then executed. Although this leads to extremely efficient implementations (useful in production and mobile settings), it can become quite cumbersome during research and development.

Modern frameworks like Chainer, DyNet, and PyTorch implement dynamic computational graphs to allow for a more flexible, imperative style of development, without needing to compile the models before every execution.

Dynamic computational graphs are especially useful in modeling NLP tasks, for which each input could potentially result in a different graph structure.
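To make this concrete, here is a minimal sketch (the RNNCell, the sizes, and the two sequence lengths below are illustrative assumptions, not taken from the text): with a dynamic framework, ordinary Python control flow builds the graph anew for each input, so sequences of different lengths simply unroll into graphs of different depths.

import torch

rnn_cell = torch.nn.RNNCell(input_size=3, hidden_size=4)

# Two inputs of different lengths; each unrolls the computation to a different depth.
for seq_len in (2, 5):
    sequence = torch.randn(seq_len, 3)   # (time steps, features)
    hidden = torch.zeros(1, 4)
    for step in sequence:                # a plain Python loop builds the graph on the fly
        hidden = rnn_cell(step.unsqueeze(0), hidden)
    print(seq_len, hidden.shape)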


PyTorch is an optimized tensor manipulation library that offers an array of packages for deep learning.

At the core of the library is the tensor, a mathematical object holding some multidimensional data.

A tensor of order zero is just a number, or a scalar.

A tensor of order one (a first-order tensor) is an array of numbers, or a vector. Similarly, a second-order tensor is an array of vectors, or a matrix.

Therefore, a tensor can be generalized as an n-dimensional array of scalars.
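As a small, purely illustrative check (this snippet is not one of the book's numbered examples), the .dim() method reports a tensor's order:

import torch

scalar = torch.tensor(3.14)              # 0th-order tensor: a scalar
vector = torch.tensor([1.0, 2.0, 3.0])   # 1st-order tensor: a vector
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])      # 2nd-order tensor: a matrix

print(scalar.dim(), vector.dim(), matrix.dim())  # prints: 0 1 2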

In the following sections, we will use PyTorch to learn the following:

  • Creating tensors
  • Operations with tensors
  • Indexing, slicing, and joining with tensors
  • Computing gradients with tensors
  • Using CUDA tensors with GPUs

In the rest of this section, we begin by using PyTorch to get familiar with various PyTorch operations. We recommend that you have PyTorch installed and a Python 3.5+ notebook ready so you can follow along with the examples in this section. We also recommend that you work through the exercises later in the section.

Installing PyTorch

The first step is to install PyTorch on your machine by choosing your system preferences at pytorch.org. Choose your operating system, then the package manager (we recommend conda or pip), then the version of Python you are using (we recommend 3.5+). That will generate the command for you to execute to install PyTorch. As of this writing, the install command for the conda environment is:

conda install pytorch torchvision -c pytorch


Note: if you have a CUDA-enabled graphics processor unit (GPU), you should also choose the appropriate version of CUDA. For additional details, follow the installation instructions on pytorch.org.


See also: the latest PyTorch installation tutorial (2021-07-27).

Creating Tensors

First, we define a helper function, describe(x), that summarizes various properties of a tensor x, such as the type of the tensor, the dimensions of the tensor, and the contents of the tensor:

Input[0]:
def describe(x):
  print("Type: {}".format(x.type()))
  print("Shape/size: {}".format(x.shape))
  print("Values: \n{}".format(x))

PyTorch allows us to create tensors in many different ways using the torch package. One way to create a tensor is to initialize a random one by specifying its dimensions, as shown in Example 1-3.

Example 1-3: Creating a tensor in PyTorch with torch.Tensor

Input[0]:
import torch
describe(torch.Tensor(2, 3))
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 3.2018e-05,  4.5747e-41,  2.5058e+25],
        [ 3.0813e-41,  4.4842e-44,  0.0000e+00]])

We can also create a tensor by randomly initializing it with values from a uniform distribution on the interval [0, 1) or from the standard normal distribution (random initialization from the uniform distribution is important, as you will see in Chapters 3 and 4); see Example 1-4.

Example 1-4: Creating randomly initialized tensors

Input[0]:
import torch
describe(torch.rand(2, 3))   # uniform random
describe(torch.randn(2, 3))  # random normal
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0242,  0.6630,  0.9787],
        [ 0.1037,  0.3920,  0.6084]])

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[-0.1330, -2.9222, -1.3649],
        [ 2.3648,  1.1561,  1.5042]])

We can also create tensors all filled with the same scalar. For creating tensors of zeros or ones we have built-in functions, and for filling a tensor with a specific value we can use the fill_() method.

Any PyTorch method with an underscore (_) suffix refers to an in-place operation; that is, it modifies the content in place without creating a new object, as shown in Example 1-5.

Example 1-5: Creating a filled tensor

Input[0]:
import torch
describe(torch.zeros(2, 3))
x = torch.ones(2, 3)
describe(x)
x.fill_(5)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 5.,  5.,  5.],
        [ 5.,  5.,  5.]])

Example 1-6 demonstrates how to create a tensor declaratively by using a Python list.

Example 1-6: Creating and initializing a tensor from a list

Input[0]:
x = torch.Tensor([[1, 2, 3],  
                  [4, 5, 6]])
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1.,  2., 3.],
        [ 4.,  5., 6.]])

The values can either come from a list, as in the preceding example, or from a NumPy array. And, of course, we can always go from a PyTorch tensor to a NumPy array as well.

Notice that the type of this tensor is DoubleTensor instead of the default FloatTensor. This corresponds with the data type of the NumPy random matrix, float64, as presented in Example 1-7.

Example 1-7: Creating and initializing a tensor from NumPy

Input[0]:
import torch
import numpy as np
npy = np.random.rand(2, 3)
describe(torch.from_numpy(npy))
Output[0]:
Type: torch.DoubleTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.8360,  0.8836,  0.0545],
        [ 0.6928,  0.2333,  0.7984]], dtype=torch.float64)

The ability to convert between NumPy arrays and PyTorch tensors becomes important when working with legacy libraries that use NumPy-formatted numerical values.
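As a brief sketch of going in the other direction (the array below is made up for illustration), the .numpy() method converts a CPU tensor back to a NumPy array. Note that torch.from_numpy() and .numpy() share the underlying memory, so modifying one modifies the other:

import numpy as np
import torch

npy = np.random.rand(2, 3)
t = torch.from_numpy(npy)   # shares memory with npy
back = t.numpy()            # back to a NumPy array (still shared)

npy[0, 0] = -1.0
print(t[0, 0].item())       # -1.0, because the memory is shared
print(type(back))           # <class 'numpy.ndarray'>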

Tensor Types and Size

Each tensor has an associated type and size. The default tensor type when you use the torch.Tensor constructor is torch.FloatTensor. However, a tensor can be cast to a different type (float, long, double, etc.) at initialization or later using one of the type-casting methods. There are two ways to specify the initialization type: either by directly calling the constructor of a specific tensor type, such as FloatTensor or LongTensor, or by using the special method torch.tensor() and providing the dtype, as shown in Example 1-8.

Example 1-8: Tensor properties

Input[0]:
x = torch.FloatTensor([[1, 2, 3],  
                       [4, 5, 6]])
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.]])
Input[1]:
x = x.long()
describe(x)
Output[1]:
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1,  2,  3],
        [ 4,  5,  6]])
Input[2]:
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype=torch.int64)
describe(x)
Output[2]:
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1,  2,  3],
        [ 4,  5,  6]])
Input[3]:
x = x.float()
describe(x)
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.]])

We use the shape property and the size() method of a tensor object to access the measurements of its size. The two ways of accessing these measurements are mostly synonymous. Inspecting the shape of a tensor becomes an indispensable tool when debugging PyTorch code.
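A quick sketch of the two equivalent ways to inspect a tensor's measurements (the 2x3 tensor is just an example):

import torch

x = torch.rand(2, 3)
print(x.shape)    # torch.Size([2, 3]) -- the shape property
print(x.size())   # torch.Size([2, 3]) -- the size() method, equivalent
print(x.size(0))  # 2 -- the size of a single dimension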

Tensor Operations

After you have created your tensors, you can operate on them as you would with traditional programming language types, using operators such as +, -, *, and /. Instead of the operators, you can also use functions like .add(), as shown in Example 1-9, which correspond to the symbolic operators.

Example 1-9: Tensor operations: addition

Input[0]:
import torch
x = torch.randn(2, 3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0461,  0.4024, -1.0115],
        [ 0.2167, -0.6123,  0.5036]])
Input[1]:
describe(torch.add(x, x))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0923,  0.8048, -2.0231],
        [ 0.4335, -1.2245,  1.0072]])
Input[2]:
describe(x + x)
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0923,  0.8048, -2.0231],
        [ 0.4335, -1.2245,  1.0072]])

There are also operations that you can apply to a specific dimension of a tensor. As you may have already noticed, for a 2D tensor we represent rows as dimension 0 and columns as dimension 1, as illustrated in Example 1-10.

Example 1-10: Dimension-based tensor operations

Input[0]:
import torch
x = torch.arange(6)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([6])
Values:
tensor([ 0.,  1.,  2.,  3.,  4.,  5.])
Input[1]:
x = x.view(2, 3)
describe(x)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[2]:
describe(torch.sum(x, dim=0))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([3])
Values:
tensor([ 3.,  5.,  7.])
Input[3]:
describe(torch.sum(x, dim=1))
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2])
Values:
tensor([  3.,  12.])
Input[4]:
describe(torch.transpose(x, 0, 1))
Output[4]:
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values:
tensor([[ 0.,  3.],
        [ 1.,  4.],
        [ 2.,  5.]])

Often, we need to do more complex operations that involve a combination of indexing, slicing, joining, and mutation. Like NumPy and other numeric libraries, PyTorch has built-in functions that make such tensor manipulations very simple.

Indexing, Slicing, and Joining

If you are a NumPy user, PyTorch's indexing and slicing scheme, shown in Example 1-11, may look very familiar to you.

Example 1-11: Slicing and indexing a tensor

Input[0]:
import torch
x = torch.arange(6).view(2, 3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[1]:
describe(x[:1, :2])
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([1, 2])
Values:
tensor([[ 0.,  1.]])
Input[2]:
describe(x[0, 1])
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values:
1.0

Example 1-12 demonstrates that PyTorch also has functions for complex indexing and slicing operations, which you may find useful for efficiently accessing noncontiguous locations of a tensor.

Example 1-12: Complex indexing: noncontiguous indexing of a tensor

Input[0]:
indices = torch.LongTensor([0, 2])
describe(torch.index_select(x, dim=1, index=indices))
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 0.,  2.],
        [ 3.,  5.]])
Input[1]:
indices = torch.LongTensor([0, 0])
describe(torch.index_select(x, dim=0, index=indices))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 0.,  1.,  2.]])
Input[2]:
row_indices = torch.arange(2).long()
col_indices = torch.LongTensor([0, 1])
describe(x[row_indices, col_indices])
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2])
Values:
tensor([ 0.,  4.])

Notice that the indices are a LongTensor; this is a requirement for indexing with PyTorch functions. We can also join tensors using built-in concatenation functions, as shown in Example 1-13, by specifying the tensors and the dimension.

Example 1-13: Concatenating tensors

Input[0]:
import torch
x = torch.arange(6).view(2,3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[1]:
describe(torch.cat([x, x], dim=0))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([4, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.],
        [ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[2]:
describe(torch.cat([x, x], dim=1))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 6])
Values:
tensor([[ 0.,  1.,  2.,  0.,  1.,  2.],
        [ 3.,  4.,  5.,  3.,  4.,  5.]])
Input[3]:
describe(torch.stack([x, x]))
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2, 3])
Values:
tensor([[[ 0.,  1.,  2.],
         [ 3.,  4.,  5.]],

        [[ 0.,  1.,  2.],
         [ 3.,  4.,  5.]]])

PyTorch also implements highly efficient linear algebra operations on tensors, such as multiplication, inverse, and trace, as shown in Example 1-14.

Example 1-14: Linear algebra on tensors: multiplication

Input[0]:
import torch
x1 = torch.arange(6).view(2, 3)
describe(x1)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[1]:
x2 = torch.ones(3, 2)
x2[:, 1] += 1
describe(x2)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values:
tensor([[ 1.,  2.],
        [ 1.,  2.],
        [ 1.,  2.]])
Input[2]:
describe(torch.mm(x1, x2))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[  3.,   6.],
        [ 12.,  24.]])

So far, we have looked at ways to create and manipulate constant PyTorch tensor objects. Just as a programming language such as Python has variables that encapsulate a piece of data along with additional information about that data (the memory address where it is stored, for example), PyTorch tensors handle the bookkeeping needed for building computational graphs for machine learning simply by enabling a boolean flag at instantiation time.

Tensors and Computational Graphs

The PyTorch tensor class encapsulates the data (the tensor itself) and a range of operations, such as algebraic operations, indexing, and reshaping operations.

However, as shown in Example 1-15, when the requires_grad boolean flag is set to True on a tensor, bookkeeping operations are enabled that can track the gradient at the tensor as well as the gradient function, both of which are needed for the gradient-based learning discussed in the section on the supervised learning paradigm.

Example 1-15: Creating tensors for gradient bookkeeping

Input[0]:
import torch
x = torch.ones(2, 2, requires_grad=True)
describe(x)
print(x.grad is None)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
True
Input[1]:
y = (x + 2) * (x + 5) + 3
describe(y)
print(x.grad is None)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 21.,  21.],
        [ 21.,  21.]])
True
Input[2]:
z = y.mean()
describe(z)
z.backward()
print(x.grad is None)
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values:
21.0
False

When you create a tensor with requires_grad=True, you are requiring PyTorch to manage the bookkeeping information needed to compute gradients.

First, PyTorch keeps track of the values of the forward pass. Then, at the end of the computations, a single scalar is used to compute a backward pass.

The backward pass is initiated by calling the backward() method on a tensor resulting from the evaluation of a loss function. The backward pass computes a gradient value for each tensor object that participated in the forward pass.

In general, a gradient is a value that represents the slope of a function's output with respect to its input.

In the computational-graph setting, gradients exist for each parameter in the model and can be thought of as the parameter's contribution to the error signal. In PyTorch, you can access the gradients of the nodes in the computational graph via the .grad member variable. Optimizers use the .grad variable to update the values of the parameters.
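Here is a minimal sketch of how an optimizer consumes .grad; the parameter, toy loss, and learning rate below are made-up illustrations, and a real training loop would use torch.optim rather than this manual update:

import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # a stand-in parameter
loss = (w ** 2).sum()                              # a toy loss function
loss.backward()                                    # populates w.grad

print(w.grad)             # tensor([2., 4.]), i.e. d(loss)/dw = 2*w

learning_rate = 0.1
with torch.no_grad():     # update outside the graph, as optimizers do
    w -= learning_rate * w.grad
w.grad.zero_()            # clear the gradient before the next backward pass
print(w)                  # tensor([0.8000, 1.6000], requires_grad=True)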

So far, we have been allocating our tensors on CPU memory. When doing linear algebra operations, it might make sense to utilize a GPU, if you have one.

To use a GPU, you first need to allocate the tensor on the GPU's memory. Access to the GPUs is via a specialized API called CUDA.

The CUDA API was created by NVIDIA and can only be used on NVIDIA GPUs. PyTorch offers CUDA tensor objects that in use are indistinguishable from the regular CPU-bound tensors, except for the way they are allocated internally.

CUDA Tensors

PyTorch makes it very easy to create these CUDA tensors (Example 1-16), transferring the tensor from the CPU to the GPU while maintaining its underlying type. The preferred method in PyTorch is to be device agnostic and write code that works whether it is running on the GPU or the CPU.

In the following code snippet, we first check whether a GPU is available by using torch.cuda.is_available(), and retrieve the device name with torch.device(). Then, all future tensors are instantiated and moved to the target device by using the .to(device) method.

Example 1-16: Creating CUDA tensors

Input[0]:
import torch
print (torch.cuda.is_available())
Output[0]:
True
Input[1]:
# preferred method: device agnostic tensor instantiation
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print (device)
Output[1]:
cuda
Input[2]:
x = torch.rand(3, 3).to(device)
describe(x)
Output[2]:
Type: torch.cuda.FloatTensor
Shape/size: torch.Size([3, 3])
Values:
tensor([[ 0.9149,  0.3993,  0.1100],
        [ 0.2541,  0.4333,  0.4451],
        [ 0.4966,  0.7865,  0.6604]], device='cuda:0')

To operate on CUDA and non-CUDA objects, we need to ensure that they are on the same device. If we don't, the computations will break, as shown in the following code snippet.

This situation can arise, for example, when computing monitoring metrics that are not part of the computational graph. When operating on two tensor objects, make sure they are both on the same device, as Example 1-17 shows.

Example 1-17: Mixing CUDA tensors with CPU-bound tensors

Input[0]:
y = torch.rand(3, 3)
x + y
Output[0]:
----------------------------------------------------------------------
RuntimeError                         Traceback (most recent call last)
      1 y = torch.rand(3, 3)
----> 2 x + y

RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #3 'other'
Input[1]:
cpu_device = torch.device("cpu")
y = y.to(cpu_device)
x = x.to(cpu_device)
x + y
Output[1]:
tensor([[ 0.7159,  1.0685,  1.3509],
        [ 0.3912,  0.2838,  1.3202],
        [ 0.2967,  0.0420,  0.6559]])

Keep in mind that it is expensive to move data back and forth from the GPU. The typical procedure therefore involves doing most of the parallelizable computations on the GPU and then transferring just the final result back to the CPU. This will allow you to fully utilize the GPUs. If you have several CUDA-visible devices (i.e., multiple GPUs), the best practice is to use the CUDA_VISIBLE_DEVICES environment variable when executing the program, as shown here:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py

We do not cover parallelism and multi-GPU training in this book, but they are essential for scaling experiments, and sometimes even for training large models. We recommend the PyTorch documentation and discussion forums for additional help and support on this topic.

Exercises

The best way to master a topic is to solve problems. Here are some warm-up exercises. Many of the problems involve going through the official documentation[1] and finding helpful functions.

  1. Create a 2D tensor and then add a dimension of size 1 inserted at dimension 0.

  2. Remove the extra dimension you just added to the previous tensor.

  3. Create a random tensor of shape 5x3 in the interval [3, 7)

  4. Create a tensor with values from a normal distribution (mean=0, std=1).

  5. Retrieve the indexes of all the nonzero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).

  6. Create a random tensor of size (3,1) and then horizontally stack 4 copies together.

  7. Return the batch matrix-matrix product of two 3-dimensional matrices (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).

  8. Return the batch matrix-matrix product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).

Solutions

  1. a = torch.rand(3, 3)
     a.unsqueeze(0)

  2. a.squeeze(0)

  3. 3 + torch.rand(5, 3) * (7 - 3)

  4. a = torch.rand(3, 3)
     a.normal_()

  5. a = torch.Tensor([1, 1, 1, 0, 1])
     torch.nonzero(a)

  6. a = torch.rand(3, 1)
     a.expand(3, 4)

  7. a = torch.rand(3, 4, 5)
     b = torch.rand(3, 5, 4)
     torch.bmm(a, b)

  8. a = torch.rand(3, 4, 5)
     b = torch.rand(5, 4)
     torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))

Summary

In this chapter, we introduced the main topics of this book, natural language processing (NLP) and deep learning, and developed a detailed understanding of the supervised learning paradigm.

By the end of this chapter, you should be familiar with, or at least aware of, various terms such as observations, targets, models, parameters, predictions, loss functions, representations, learning/training, and inference. You also saw how to encode inputs (observations and targets) for learning tasks using one-hot encoding.

We also examined count-based representations such as TF and TF-IDF, and we began looking at what computational graphs are, at static versus dynamic computational graphs, and at PyTorch's tensor manipulation operations. Chapter 2 provides an overview of traditional NLP. Together, these two chapters should lay the necessary foundation if you are new to the book's subject area and prepare you for the rest of the chapters.

Key point: TF-IDF

Term frequency (TF) = (number of times a word appears in a document) / (total number of words in the document)

Inverse document frequency (IDF) = log(total number of documents in the corpus / (number of documents containing the word + 1))

TF is easy to understand: it simply measures word frequency. IDF measures how common a word is. To compute IDF we need a corpus prepared in advance to simulate the language environment: the more common a word is, the larger the denominator in the formula, and the closer the inverse document frequency is to 0. The +1 in the denominator avoids division by zero.

TF-IDF = term frequency (TF) × inverse document frequency (IDF)

TF-IDF works very well for extracting keywords from a document.
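Below is a tiny sketch of these formulas on a made-up three-document corpus (purely illustrative; libraries such as scikit-learn provide production-ready TF-IDF implementations):

import math

corpus = ["the cat sat on the mat",
          "the dog ate my homework",
          "the dog chased the ball"]
documents = [doc.split() for doc in corpus]
n_docs = len(documents)

def tf_idf(word, doc):
    tf = doc.count(word) / len(doc)                        # term frequency
    n_containing = sum(1 for d in documents if word in d)  # document frequency
    idf = math.log(n_docs / (n_containing + 1))            # inverse document frequency
    return tf * idf

print(tf_idf("cat", documents[0]))  # positive: "cat" is relatively rare in this corpus
print(tf_idf("the", documents[0]))  # negative: "the" appears in every document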

This material is quoted from the book for learning purposes and non-commercial use. I recommend reading the book itself and learning together!


Keep it up!

Thank you!

Keep working hard!

Copyright notice
This article was created by [ZSYL]. Please include a link to the original when reposting. Thank you.
https://chowdera.com/2021/10/20211002145858557L.html
