Understanding XPath | lxml | markup | markdown

2020-12-07 16:01:58 Quant_ Learner

  • XML

    Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

    The design goals of XML emphasize simplicity, generality, and usability across the Internet.

    Markup language

    A markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. When the document is processed for display, the markup itself is not shown; it is only used to structure and format the text.

    Markdown

    Markdown is a lightweight markup language with plain-text-formatting syntax, created in 2004 by John Gruber and Aaron Swartz.

  • XPath

    XPath (XML Path Language) is a query language for selecting nodes from an XML document. XPath was defined by the World Wide Web Consortium (W3C).

    The XPath language is based on a tree representation of the XML document. An XPath expression is often referred to simply as “an XPath”.

  • XPath syntax & tutorial

    XML Path Language (XPath) 3.0

    XPath Tutorial

  • Terminology

    XPath has seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document nodes.

    XML documents are treated as trees of nodes. The topmost element of the tree is called the root element.

    Atomic values are simple values, such as strings or numbers, that have no children or parent.

    Items are atomic values or nodes.

    An XPath axis represents a relationship to the context (current) node, and is used to locate nodes relative to that node in the tree.
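
    The node kinds above can be seen with a short lxml sketch. The sample document and variable names are hypothetical, chosen only for illustration; lxml is assumed to be installed.

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
XML = b"""<?xml version="1.0"?>
<bookstore>
  <!-- catalogue -->
  <book lang="en"><title>Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = etree.fromstring(XML)              # root element: <bookstore>

elements = root.xpath("//book")           # element nodes
attributes = root.xpath("//book/@lang")   # attribute nodes, returned as strings
texts = root.xpath("//title/text()")      # text nodes, returned as strings
comments = root.xpath("//comment()")      # comment nodes
count = root.xpath("count(//book)")       # an atomic value (a float), not a node

print(root.tag, elements[0].tag, attributes, texts, comments[0].text, count)
```

    Node-set results come back as Python lists, while atomic values such as the count come back as plain Python values.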

  • Relationship of Nodes
    1. Parent

      Each element and attribute has one parent.

    2. Children

      Element nodes may have zero, one or more children.

    3. Siblings

      Nodes that have the same parent.

    4. Ancestors

      A node’s parent, parent’s parent, etc.

    5. Descendants

      A node’s children, children’s children, etc.
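
    As a rough illustration, the five relationships can be walked with lxml's element API. The sample document is hypothetical, and this is only one way to navigate the tree.

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
root = etree.fromstring(
    "<bookstore>"
    "<book><title>A</title><price>10</price></book>"
    "<book><title>B</title><price>20</price></book>"
    "</bookstore>"
)

title = root[0][0]                                     # first <title> element
parent = title.getparent()                             # its parent, the first <book>
children = [child.tag for child in parent]             # children of that <book>
siblings = [s.tag for s in title.itersiblings()]       # siblings that follow <title>
ancestors = [a.tag for a in title.iterancestors()]     # parent, then grandparent
descendants = [d.tag for d in root.iterdescendants()]  # everything below <bookstore>

print(parent.tag, children, siblings, ancestors, descendants)
```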

  • XPath Syntax

    XPath uses path expressions to select nodes or node-sets in an XML document.

    Expression            Description
    nodename              Selects all nodes with the name “nodename”
    /                     Selects from the root node
    //                    Selects nodes in the document from the current node that match the selection, no matter where they are
    .                     Selects the current node
    ..                    Selects the parent of the current node
    @                     Selects attributes
    /bookstore/book[1]    Predicate, used to find a specific node (here, the first book child of bookstore)
    *                     Matches any element node
    @*                    Matches any attribute node
    node()                Matches any node of any kind
    |                     Union: combines the nodes selected by two path expressions
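
    The expressions in the table can be tried with lxml's xpath() method. This is a minimal sketch; the bookstore document and variable names are assumptions made for illustration.

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
XML = """<bookstore>
  <book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
  <book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>
</bookstore>"""
root = etree.fromstring(XML)

first_title = root.xpath("/bookstore/book[1]/title/text()")  # absolute path plus predicate
all_titles = root.xpath("//title/text()")                    # // matches at any depth
categories = root.xpath("//@category")                       # @ selects attributes

title = root.xpath("//title")[0]
current = title.xpath(".")[0]                                 # . is the current node
parent = title.xpath("..")[0]                                 # .. is its parent

any_children = [e.tag for e in root.xpath("//book[1]/*")]     # * matches any element
title_attrs = title.xpath("@*")                               # @* matches any attribute
union = root.xpath("//title | //price")                       # | unions two node-sets

print(first_title, all_titles, categories, parent.tag, any_children, title_attrs, len(union))
```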
  • Axes
    Syntax:  axisname::nodetest[predicate]
    Example: child::book

    AxisName              Result
    ancestor              Selects all ancestors (parent, grandparent, etc.) of the current node
    ancestor-or-self      Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself
    attribute             Selects all attributes of the current node
    child                 Selects all children of the current node
    descendant            Selects all descendants (children, grandchildren, etc.) of the current node
    descendant-or-self    Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself
    following             Selects everything in the document after the closing tag of the current node
    following-sibling     Selects all siblings after the current node
    namespace             Selects all namespace nodes of the current node
    parent                Selects the parent of the current node
    preceding             Selects all nodes that appear before the current node in the document, except ancestors, attribute nodes and namespace nodes
    preceding-sibling     Selects all siblings before the current node
    self                  Selects the current node
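
    A few of these axes, evaluated relative to one context node with lxml. The document, the id values, and the variable names are hypothetical.

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
root = etree.fromstring(
    "<bookstore>"
    "<book id='b1'><title>A</title><price>10</price></book>"
    "<book id='b2'><title>B</title><price>20</price></book>"
    "</bookstore>"
)

# Context node: the <price> of the first book.
first_price = root.xpath("//book[1]/price")[0]

ancestors = first_price.xpath("ancestor::*")                # <bookstore> and <book>
preceding_sibs = first_price.xpath("preceding-sibling::*")  # the <title> before it
following = first_price.xpath("following::*")               # everything after </price>
parent_id = first_price.xpath("parent::book/attribute::id") # id attribute of the parent
children = root.xpath("child::book")                        # same as the short form "book"
subtree = root[0].xpath("descendant-or-self::*")            # first <book> and all below it

print([e.tag for e in following], parent_id, len(children), len(subtree))
```

    The abbreviated forms from the syntax table are shorthand for axes: "book" means child::book, "@id" means attribute::id, and ".." means parent::node().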
  • XPath Operators

    An XPath expression returns either a node-set, a string, a Boolean, or a number.

    Operator    Description                       Example
    |           Union of two node-sets            //book | //cd
    +           Addition                          6 + 4
    -           Subtraction                       6 - 4
    *           Multiplication                    6 * 4
    div         Division                          8 div 4
    =           Equal                             price=9.80
    !=          Not equal                         price!=9.80
    <           Less than                         price<9.80
    <=          Less than or equal to             price<=9.80
    >           Greater than                      price>9.80
    >=          Greater than or equal to          price>=9.80
    or          Logical or                        price=9.80 or price=9.70
    and         Logical and                       price>9.00 and price<9.90
    mod         Modulus (division remainder)      5 mod 2
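
    The operators and the four result types can be checked with lxml, which maps XPath numbers to float, Booleans to bool, and node-sets to lists. The document and prices below are made up for the example.

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
root = etree.fromstring(
    "<bookstore>"
    "<book><title>A</title><price>9.80</price></book>"
    "<book><title>B</title><price>12.50</price></book>"
    "<cd><title>C</title><price>9.70</price></cd>"
    "</bookstore>"
)

total = root.xpath("6 + 4")          # number, returned as a float
quotient = root.xpath("8 div 4")     # "div" is division; "/" is the path separator
remainder = root.xpath("5 mod 2")    # division remainder

equal = root.xpath("//book[price=9.80]/title/text()")
in_range = root.xpath("//*[price>9.00 and price<9.90]/title/text()")
either = root.xpath("//*[price=9.80 or price=9.70]/title/text()")
union = root.xpath("//book | //cd")            # node-set, returned as a list
two_books = root.xpath("count(//book) = 2")    # Boolean, returned as a bool

print(total, quotient, remainder, equal, in_range, either, len(union), two_books)
```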
  • XSLT

    XPath is a major element in the XSLT standard.

    XSLT (Extensible Stylesheet Language Transformations) is a language for transforming XML documents into other XML documents, or into other formats such as HTML for web pages, plain text, or XSL Formatting Objects, which may subsequently be converted to formats such as PDF, PostScript, and PNG.
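
    To show where XPath sits inside XSLT, here is a minimal sketch using lxml's etree.XSLT. The stylesheet and source document are hypothetical; the match and select attributes hold XPath expressions.

```python
from lxml import etree

# Hypothetical stylesheet: lists the titles of books cheaper than 30 as HTML.
stylesheet = etree.XML("""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/bookstore">
    <ul>
      <xsl:for-each select="book[price &lt; 30]">
        <li><xsl:value-of select="title"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>""")

# Hypothetical source document, used only for illustration.
source = etree.XML(
    "<bookstore>"
    "<book><title>A</title><price>10</price></book>"
    "<book><title>B</title><price>40</price></book>"
    "</bookstore>"
)

transform = etree.XSLT(stylesheet)   # compile the stylesheet once
result = transform(source)           # apply it to the source tree
html = str(result)                   # serialize using the xsl:output settings

print(html)
```

    Note that "<" must be written as "&lt;" inside the select attribute, because the stylesheet is itself an XML document.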

  • Python module : lxml

    See “Understand lxml module in Python”.

Copyright notice
This article was written by [Quant_ Learner]. Please include a link to the original when reposting. Thank you.
https://chowdera.com/2020/12/202012071547394313.html