Qt Performance and Tools Update Part 1
July 05, 2024 by Veli-Pekka Heinonen
Performance optimisation matters when you are trying to get your application working in a resource-constrained environment. This is typically the case on embedded devices, but you may also run short of resources in some desktop scenarios, so it matters on desktop as well.
What we mean by performance here is the ability to get the application running to fulfill its purpose, in practice typically meaning sufficient fps in the UI and meeting other nonfunctional requirements, such as startup time, memory consumption and CPU/GPU load.
There have been a number of discussions on Qt performance, and as we have been working on several related items, we thought now would be a good time to summarise the activities and tools we have. You can use them both to optimise the performance of your application and in testing. We have been improving existing performance tools, adding new ones and providing guidelines, so let's look at the latest additions. This post starts a series of blog posts to help you with performance optimisation and to give a view of our activities in this area.
Qt Lite
The Qt framework consists of over fifty modules that you can easily select to be deployed with the application as needed. We have been working on enhancing Qt6 configurability so that you can more easily remove functionality you do not need.
Qt Configure Options, often also referred to as features, allow developers to optimise their applications for better performance and efficiency. With Qt Configure Options, applications can be delivered in smaller packages, fitted into smaller RAM footprints, and launched faster. Combined with link-time code generation (LTCG), Qt Configure Options can also improve runtime performance.
Please stay tuned for separate blog posts on this topic in the near future.
Application Trace Events
We have previously blogged about application trace events in conjunction with the QML profiler and our events (Q_TRACE):
https://www.qt.io/blog/qtquick3d-qml-profiler-events
We are continuing this work by adding more events, and a separate blog post is coming out in a few weeks to cover the latest aspects, in particular using your own events.
Application trace events allow you to see low-level C++ code tracing information without building kernel or debug frames in an OS that does not support tracing. They give you full stack tracing, from the top-level QML or JavaScript down to C++ and all the way to kernel space. This enables you, for instance, to measure the performance of an application and to check whether it is CPU bound, I/O bound, or influenced by other applications running on the same system.
Common Trace Format (CTF) support was also added for trace events in Qt 6.5. It can be used in cases that LTTng does not support, for instance on Windows, and allows you to get a full view of your system. It also works on some RTOSs. You can open traces using Trace Compass or convert them to text using babeltrace.
LTTng-based tracing can be enabled on Linux as long as Qt has been built with LTTng support enabled.
Qt Creator Performance Tools
Many performance-related tools are also available in Qt Creator, such as QML Profiler, a debugging tool for finding the root causes of typical performance issues. The full list of Qt Creator tools is available here: Analyzing Code | Qt Creator Manual
QML Profiler provides QML or JavaScript stack traces by recording every single function call with exact timestamps. Viewing the collected data can be done separately in Qt Creator.
The main difference between QML profiling and application trace events is that application trace events also support tracing on the C++ level.
Please see the link below for more information on QML profiling:
Profiling QML Applications | Qt Creator Manual
There is also a profiler available for CMake from Qt Creator 12 onwards. This allows you to see where CMake is spending time configuring your project: https://doc.qt.io/qtcreator/creator-how-to-profile-cmake-code.html
Additionally, there is a tool for analysing CPU usage that we have found handy:
Analyzing CPU Usage | Qt Creator Manual
Please also see the Qt Creator documentation link below for more information on visualising full stack traces using Chrome Trace Events. This is especially useful for large trace files that are difficult to visualise with the built-in trace viewer:
Visualizing Chrome Trace Events | Qt Creator Manual
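
To make the trace file format concrete, here is a minimal sketch, in standard C++ with no Qt dependency, of writing timed events as Chrome Trace Event JSON. The `TraceEvent` struct and the `toChromeTraceJson` function are hypothetical names made up for this illustration, not Qt or Qt Creator API, and the sketch assumes that event names contain no characters that need JSON escaping.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type for illustration only; not part of Qt.
struct TraceEvent {
    std::string name;    // label for the event
    std::int64_t tsUs;   // start time in microseconds
    std::int64_t durUs;  // duration in microseconds
};

// Serialises the events as "complete" events (ph == "X") in a JSON array.
// Assumption: names need no JSON escaping.
std::string toChromeTraceJson(const std::vector<TraceEvent> &events,
                              int pid = 1, int tid = 1)
{
    std::ostringstream out;
    out << "[";
    for (std::size_t i = 0; i < events.size(); ++i) {
        const TraceEvent &e = events[i];
        if (i > 0)
            out << ",";
        out << "{\"name\":\"" << e.name << "\",\"ph\":\"X\",\"ts\":" << e.tsUs
            << ",\"dur\":" << e.durUs << ",\"pid\":" << pid
            << ",\"tid\":" << tid << "}";
    }
    out << "]";
    return out.str();
}
```

Writing the returned string to a `.json` file gives a trace in this format. Which event types a particular viewer supports is worth checking in its documentation.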
Qt Quick Compilers
We have previously blogged about the Qt Quick compilers for QML and related performance enhancements:
https://www.qt.io/blog/qt-6.6-and-6.7-make-qml-faster-than-ever-a-new-benchmark-and-analysis
https://www.qt.io/blog/compiling-qml-to-c-a-4x-speedup
By compiling QML to C++, the Qt Quick compiler offers a significant performance improvement over interpreting it. The posts linked above measure this with a non-UI benchmarking app that works with QObjects, which is a typical use case.
The performance numbers for dealing with QObjects and calling typed functions on them have improved massively in Qt 6.6 and Qt 6.7, and startup time has improved as well.
The next step here is restructuring the type information in the compiler so that our type inference can be extended again.
The existing documentation covers more details for the QML compilers:
https://doc.qt.io/qt-6/qtqml-qtquick-compiler-tech.html
ROM Reduction in Qt for MCUs
Qt for MCUs is a complete graphics framework and toolkit that supports QML while fitting into a few hundred kilobytes of memory. It is intended in particular for microcontrollers, where processing capacity and memory are limited, but you can run it on MPUs as well. Please see the product page for more details on Qt for MCUs:
https://www.qt.io/product/develop-software-microcontrollers-mcu
We have an ongoing activity to reduce the ROM footprint even further. This is a constant effort, and in Qt for MCUs 2.8 LTS we were able to reduce the amount of C++ code generated from QML by 4-10% compared with the previous 2.5 LTS release.
Embedded Performance Evaluation Application
The embedded performance evaluation application is a new application for embedded 2D use cases. It offers a minimalistic UI that can be expanded to see how performance evolves on your hardware as you add more and more UI elements. It provides log output for fps, CPU load and memory consumption, and it supports command-line usage, so it can be used in continuous testing. It is currently in beta and can be provided to early users separately later in the fall.
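
As an illustration of how an fps figure like the one in such a log could be derived, here is a minimal sketch in standard C++. It is not the evaluation application's actual code. The function names `averageFps` and `worstFrameMs` are hypothetical, and the input is assumed to be a list of frame timestamps in milliseconds collected by the caller.

```cpp
#include <cstddef>
#include <vector>

// Average frames per second over a run of frame timestamps (milliseconds).
// N timestamps delimit N-1 frame intervals. Returns 0 when fewer than two
// timestamps are given or no time has elapsed.
double averageFps(const std::vector<double> &frameTimesMs)
{
    if (frameTimesMs.size() < 2)
        return 0.0;
    const double elapsedMs = frameTimesMs.back() - frameTimesMs.front();
    if (elapsedMs <= 0.0)
        return 0.0;
    return (frameTimesMs.size() - 1) * 1000.0 / elapsedMs;
}

// Longest gap between consecutive frames, in milliseconds. A single long
// frame can hide behind a healthy-looking average.
double worstFrameMs(const std::vector<double> &frameTimesMs)
{
    double worst = 0.0;
    for (std::size_t i = 1; i < frameTimesMs.size(); ++i) {
        const double gap = frameTimesMs[i] - frameTimesMs[i - 1];
        if (gap > worst)
            worst = gap;
    }
    return worst;
}
```

Reporting the worst frame time next to the average is a design choice of this sketch: a UI can average a sufficient fps and still stutter visibly.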
Qt5 vs Qt6
Measuring performance can be complicated, and it's easy to end up measuring things that are not directly comparable. For instance, in Qt6 we introduced the RHI APIs, which change the software architecture to better support different backends such as Vulkan in addition to OpenGL. RHI slightly changes the way your app uses Qt, but it significantly changes the way Qt uses the hardware and the OS. This makes direct comparisons between Qt5 and Qt6 much less straightforward.
As another example, we have enhanced multi-threading support for certain operations in the software rasterizer (QPainter). Utilising multiple cores leads to a higher initial peak in CPU consumption, but the work finishes sooner than with a single-core model.
Similarly, configuration may also play a role. For instance, the Yocto configuration differs between Qt5 and Qt6, so just upgrading the Yocto setup in Boot to Qt from Qt5 to Qt6 without configuring, for example, the ICU library used for internationalisation will cause a significant memory increase, which can be solved by reconfiguring the libraries.
Additionally, different versions of the OS, 3rd party libraries and drivers make direct comparisons more difficult, and these should be taken into account in the test setup.
We have additional measurement work ongoing, comparing Qt5 to Qt6 on desktop among other things, and we can provide details of these measurements in the coming weeks. The key thing, however, is how Qt is being used, as that can have a major impact on the results seen.
In general, Qt6 has more functionality and code, so it may well consume a little more memory than Qt5 in many scenarios, but with some planning and guidelines there should be no major difference in overall performance. QML compilers and Qt Lite feature configuration are examples of additional ways to improve CPU/GPU consumption, memory utilisation and startup time.
Qt Regression Testing
We use a number of tools for regression testing, but a key one from a performance perspective is QmlBench: https://code.qt.io/cgit/qt-labs/qmlbench.git/
QmlBench is a tool for benchmarking Qt, QML and QtQuick as a whole stack rather than in isolation. The benchmarks it provides exercise a very large part of Quick, QML, Gui, Core, and as a result can be considered a decent metric for overall Qt performance.
We have been using QmlBench to test different Qt versions on various desktop and embedded platforms to detect regressions. Now we are extending the embedded hardware coverage to new boards and enhancing the test procedure itself to detect regressions better.
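
To illustrate the general idea of detecting a regression from benchmark runs, here is a minimal sketch in standard C++. It is not how QmlBench or our test procedure works internally. The function names, the use of the median, and the convention that higher scores are better are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Median of a sample set. Takes the vector by value so it can sort a copy.
double median(std::vector<double> samples)
{
    if (samples.empty())
        return 0.0;
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 == 1
               ? samples[mid]
               : (samples[mid - 1] + samples[mid]) / 2.0;
}

// Flags a regression when the current median score falls more than
// `tolerance` (a fraction, e.g. 0.05 for 5%) below the baseline median.
// Assumption: higher scores are better.
bool isRegression(const std::vector<double> &baseline,
                  const std::vector<double> &current,
                  double tolerance)
{
    const double base = median(baseline);
    if (base <= 0.0)
        return false; // no usable baseline to compare against
    return median(current) < base * (1.0 - tolerance);
}
```

Comparing medians rather than single runs makes the check less sensitive to one noisy measurement. The tolerance would need tuning per board and per benchmark.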
DebugView QML Type for Quick 3D
Sometimes you may need to debug your existing application in various scenarios. One option for easily seeing performance-related data in Qt Quick 3D is the DebugView QML type. It creates a dialog in the top left-hand corner of your application with a view of 3D fps, sync and render times, as well as detailed statistics: draw calls, render passes, and the textures and meshes used by the scene's assets.
This can be enabled by adding a QML snippet to your code.
Qt Performance Guidelines
The Qt performance guidelines have been a bit scattered across the Qt documentation, but we are assembling them in one location that is easier to find. Guidelines have been instrumental in getting the best out of individual hardware boards. Qt is a comprehensive framework with many bells and whistles for a vast number of purposes, and while that allows you to do many different things, you may inadvertently end up selecting unoptimised software constructs. The list of guideline documents currently includes:
- QML and Qt Quick performance - QML Performance Considerations And Suggestions | Qt 6.7
- Best Practices for Qt Quick - Best Practices for QML and Qt Quick | Qt Quick 6.7.2
- Embedded Qt Performance - Qt for Embedded Linux | Qt 6.7
- Qt Quick Ultralite performance - Qt Quick Ultralite Performance Guide 2.8.0
We are also preparing example reference applications to give concrete code examples of performance optimisation with Qt. Additionally, there is an existing application for Quick 3D performance benchmarking for both Qt5 and Qt6:
https://www.qt.io/blog/introducing-qtquick3d-benchmarking-application
Summary
The high-level key points of this blog post are:
- Qt already provides a wide variety of tools for reaching good performance, especially on embedded, and the list of tools and features is expanding.
- We are expanding our regression testing activities to detect performance-related anomalies even better.
- We have a number of performance guidelines and recommendations to help you get over performance hurdles in your Qt application.