
Large file upload optimization: chunked upload and resumable upload

2020-12-08 09:53:39 Master of Wannian Da Ye Yi

Overall approach

The first step is to research an optimization plan in light of the project background. File upload failure is an old problem, and a slicing scheme is the usual answer: split the file into multiple chunks, upload them through parallel requests, and once all requests have returned, merge the chunk files on the server. When a chunk fails to upload, the next upload attempt can detect this and re-upload only the chunks that failed, reducing user waiting time and relieving server pressure. This is chunked file upload.

Large file upload

So how do we implement chunked upload of a large file?

The flow chart is as follows:

Large file upload flow chart
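
In code, the steps below tie together roughly like this minimal sketch. The glue code is not from the article: services.merge and onError are assumed names; the individual methods are the ones shown in the following sections.

// Minimal sketch of the overall flow (assumed glue code, not from the source)
async uploadLargeFile (file) {
  // 1. Compute the file's MD5 from its content
  const fileMd5Value = await this.md5File(file)
  // 2. Ask the server which chunks it already has
  const data = await this.checkFileMD5(file, file.name, fileMd5Value, this.onError)
  // 3 & 4. Slice the file and upload only the chunks that are missing
  const allUploaded = await this.checkAndUploadChunk(file, fileMd5Value, data.chunkList)
  // 6. Once every chunk is on the server, ask it to merge them
  if (allUploaded) {
    await services.merge({ fileName: file.name, md5: fileMd5Value, size: file.size })
  }
}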

The implementation breaks down into the following steps:

1. File MD5 hashing

The MD5 hash uniquely identifies a file, so it can be used to query the file's upload status.

The file's MD5 is generated from its content with spark-md5[2]. Note that a large file must be read in slices: append the contents of each slice read to the spark-md5[3] hash computation until the whole file has been read, then return the final hash to the callback. A progress bar for the file-reading step can be added here as needed.

The MD5 hashing process

The implementation is as follows:

// Compute the file's MD5 from its content, reading the file in 100 slices
md5File (file) {
  return new Promise((resolve, reject) => {
    const blobSlice =
      File.prototype.slice ||
      File.prototype.mozSlice ||
      File.prototype.webkitSlice
    const chunks = 100
    const chunkSize = Math.ceil(file.size / chunks)
    let currentChunk = 0
    const spark = new SparkMD5.ArrayBuffer()
    const fileReader = new FileReader()
    fileReader.onload = function (e) {
      console.log('read chunk nr', currentChunk + 1, 'of', chunks)
      spark.append(e.target.result) // Append the slice's ArrayBuffer to the hash
      currentChunk++
      if (currentChunk < chunks) {
        loadNext()
      } else {
        console.log('finished loading')
        resolve(spark.end()) // Resolve with the final hash
      }
    }
    fileReader.onerror = function (err) {
      console.warn('oops, something went wrong.')
      reject(err)
    }
    function loadNext () {
      const start = currentChunk * chunkSize
      const end =
        start + chunkSize >= file.size ? file.size : start + chunkSize
      fileReader.readAsArrayBuffer(blobSlice.call(file, start, end))
    }
    loadNext()
  })
}

2. Query file status

After the frontend obtains the file's MD5, it asks the backend whether a folder named after that MD5 exists. If it does, all files inside it are listed to produce the list of already-uploaded chunks; if it does not, the uploaded-chunk list is empty.

// Check the file's MD5 against the server to get its upload status
async checkFileMD5 (file, fileName, fileMd5Value, onError) {
  const fileSize = file.size
  const { chunkSize, uploadProgress } = this
  this.chunks = Math.ceil(fileSize / chunkSize)
  const params = {
    fileName: fileName,
    fileMd5Value: fileMd5Value,
  }
  const { ok, data } = await services.checkFile(params)
  if (!ok) {
    onError()
    throw new Error('checkFile request failed')
  }
  this.hasUploaded = data.chunkList.length // Chunks already on the server
  uploadProgress(file)
  return data
}
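
The server side of this check is not shown in the article. A minimal Node sketch, assuming fs-extra and a Koa-style handler (checkFile and uploadDir are hypothetical names), might look like this:

// Hypothetical Node-side status check: the folder named after the file's MD5
// holds one file per uploaded chunk, so its directory listing is the chunk list
const fse = require('fs-extra')
const path = require('path')

async function checkFile (ctx) {
  const { fileMd5Value } = ctx.request.query
  const chunkDir = path.join(uploadDir, fileMd5Value)
  const chunkList = (await fse.pathExists(chunkDir))
    ? await fse.readdir(chunkDir)
    : []
  ctx.body = { ok: true, data: { chunkList } }
}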

3. File slicing

The core of large file upload optimization is file slicing. The Blob object's slice method can cut a file into pieces, and since File inherits from Blob, File objects have the slice method too.

Define chunkSize as the size of each chunk; dividing the file size fileSize by chunkSize gives the number of chunks, chunks. Use a for loop with file.slice() to split the file into chunks numbered 0 to chunks - 1, compare each number against the uploaded-chunk list, and push every chunk that has not yet been uploaded onto the request list requestList.

File slicing

// Build upload requests for the chunks the server doesn't have yet
async checkAndUploadChunk (file, fileMd5Value, chunkList) {
  const { chunks } = this
  const requestList = []
  for (let i = 0; i < chunks; i++) {
    const exists = chunkList.indexOf(i + '') > -1
    // If the chunk already exists on the server, skip it
    if (!exists) {
      requestList.push(this.upload(i, fileMd5Value, file))
    }
  }
  console.log({ requestList })
  const result =
    requestList.length > 0
      ? await Promise.all(requestList)
        .then(result => {
          console.log({ result })
          return result.every(i => i.ok)
        })
        .catch(err => err)
      : true
  console.log({ result })
  return result === true
}

4. Upload the chunks

Call Promise.all to upload all the chunks concurrently, passing each chunk's index, the chunk data, and the file's MD5 to the backend.

When the backend receives an upload request, it first checks whether a folder named after the file's MD5 exists, creating it if not, then uses fs-extra's rename method to move the chunk from its temporary path into the chunk folder. The result looks like this:

Chunk upload

When all chunks upload successfully, the frontend notifies the server to merge them; if any chunk fails, it shows an "upload failed" message. On the next attempt, the file's MD5 is used to fetch its upload status: any chunk the server already has for that MD5 has been uploaded and is skipped, while any chunk the server cannot find still needs to be uploaded. The user only has to upload this remaining portion to complete the whole file. This is resumable upload (breakpoint continuation).

Resumable upload diagram

// Upload a single chunk
upload (i, fileMd5Value, file) {
  const { uploadProgress, chunks, chunkSize } = this
  // Build a multipart form (FormData was introduced in HTML5)
  const end =
    (i + 1) * chunkSize >= file.size ? file.size : (i + 1) * chunkSize
  const form = new FormData()
  form.append('data', file.slice(i * chunkSize, end)) // File's slice method cuts out one chunk
  form.append('total', chunks) // Total number of chunks
  form.append('index', i) // Index of the current chunk
  form.append('fileMd5Value', fileMd5Value)
  return services
    .uploadLarge(form)
    .then(data => {
      if (data.ok) {
        this.hasUploaded++
        uploadProgress(file)
      }
      console.log({ data })
      return data
    })
}
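
The backend side of this step is only described above, not shown. A minimal sketch with fs-extra could look like the following; the handler shape, uploadDir, and the ctx.request.files.data location of the parsed multipart upload are all assumptions:

// Hypothetical Node-side chunk receiver (shape assumed; the source only
// describes the fs-extra rename step)
const fse = require('fs-extra')
const path = require('path')

async function uploadChunk (ctx) {
  const { index, fileMd5Value } = ctx.request.body
  const chunk = ctx.request.files.data // Parsed multipart upload (assumed)
  const chunkDir = path.join(uploadDir, fileMd5Value)
  await fse.ensureDir(chunkDir) // Create the MD5-named folder if it doesn't exist
  // Move the chunk from its temporary path into the folder, named by its index
  await fse.rename(chunk.path, path.join(chunkDir, String(index)))
  ctx.body = { ok: true }
}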

5. Upload progress

Although uploading a large file in chunks is much faster than uploading it in one piece, there is still a noticeable wait, so a progress indicator should be added to show the file's upload progress in real time.

Native JavaScript's XMLHttpRequest provides a progress event that reports how much of the request has been uploaded and its total size. The project wraps ajax with axios[4], so an onUploadProgress handler can be added to the config to monitor upload progress.

Upload progress

const config = {
  onUploadProgress: progressEvent => {
    // Percentage of this request's body that has been uploaded so far
    const complete = ((progressEvent.loaded / progressEvent.total) * 100 | 0) + '%'
    console.log(complete)
  }
}
services.uploadChunk(form, config)
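
The uploadProgress method referenced throughout is not shown in the article. One plausible sketch, assuming it simply turns the finished-chunk counter hasUploaded into a whole-file percentage, is:

// Hypothetical whole-file progress helper based on the hasUploaded counter
uploadProgress (file) {
  const percent = Math.min(100, Math.round((this.hasUploaded / this.chunks) * 100))
  console.log(`${file.name}: ${percent}% uploaded`)
}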

6. Merge the chunks

After all the chunks have been uploaded, the frontend actively notifies the server to merge them. On receiving the request, the server finds the folder named after the file's MD5 under its upload path. As noted above, the chunk files are named by chunk index, and because the chunk upload interface is asynchronous, there is no guarantee the server received the chunks in request order. So before merging, the chunk files in the folder must be sorted by file name, then concatenated with concat-files to reconstruct the file the user uploaded. At this point the large file upload is complete.


Chunk merging diagram

Node-side code:

// Merge the chunk files
exports.merge = {
  validate: {
    query: {
      fileName: Joi.string()
        .trim()
        .required()
        .description('file name'),
      md5: Joi.string()
        .trim()
        .required()
        .description('file md5'),
      size: Joi.string()
        .trim()
        .required()
        .description('file size'),
    },
  },
  permission: {
    roles: ['user'],
  },
  async handler (ctx) {
    const { fileName, md5, size } = ctx.request.query
    const { name, base: filename, ext } = path.parse(fileName)
    const newFileName = randomFilename(name, ext)
    await mergeFiles(path.join(uploadDir, md5), uploadDir, newFileName, size)
      .then(async () => {
        const file = {
          key: newFileName,
          name: filename,
          mime_type: mime.getType(`${uploadDir}/${newFileName}`),
          ext,
          path: `${uploadDir}/${newFileName}`,
          provider: 'oss',
          size,
          owner: ctx.state.user.id,
        }
        const key = encodeURIComponent(file.key)
          .replace(/%/g, '')
          .slice(-100)
        file.url = await uploadLocalFileToOss(file.path, key)
        file.url = getFileUrl(file)
        const f = await File.create(omit(file, 'path'))
        const files = []
        files.push(f)
        ctx.body = invokeMap(files, 'toJSON')
      })
      .catch(() => {
        throw Boom.badData('Failed to merge the file chunks, please try again later~')
      })
  },
}
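
The mergeFiles helper called by the handler is not shown in the article. A minimal sketch using concat-files, sorting the chunk names numerically as described above, might be:

// Hypothetical mergeFiles helper: sort chunks by index, then concatenate
const fse = require('fs-extra')
const path = require('path')
const concatFiles = require('concat-files')

async function mergeFiles (chunkDir, destDir, destName, size) {
  // size is accepted to match the handler's call but is unused in this sketch
  const chunkNames = await fse.readdir(chunkDir)
  // Chunk files are named by their index, so sort numerically before merging
  chunkNames.sort((a, b) => a - b)
  const chunkPaths = chunkNames.map(n => path.join(chunkDir, n))
  await new Promise((resolve, reject) => {
    concatFiles(chunkPaths, path.join(destDir, destName), err =>
      err ? reject(err) : resolve()
    )
  })
}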

Summary

This article has described some methods for optimizing large file uploads, summarized in the following 4 points:

  1. Use Blob.slice to split the file into chunks, upload the chunks concurrently, and notify the server to merge them once all chunks have been uploaded, implementing chunked upload of large files;
  2. Use the native XMLHttpRequest onprogress event to monitor the upload progress of each chunk and report the file's upload progress in real time;
  3. Use spark-md5 to compute the file's MD5 from its content, obtaining a unique file identifier bound to the file's upload status;
  4. Before uploading, query the list of already-uploaded chunks by the file's MD5, and upload only the chunks that are missing, implementing resumable upload.

Copyright notice
This article was written by [Master of Wannian Da Ye Yi]. Please include a link to the original when reposting, thanks.
https://chowdera.com/2020/12/20201208095322468f.html