This post was last edited by 雨の日曜日 on 2021-10-2 09:41
Introduction
In this article we look at what happens behind the scenes when Unity renders each frame, which problems can cause rendering to stutter, and how to fix them.
Before reading on, it is worth accepting that there is no one-size-fits-all trick for improving rendering performance. Too many factors are involved, including the type of game, the device hardware and the operating system. What matters is that we collect data through observation and experiment, then analyze that data so we can fix the actual problem. This article covers some common rendering performance problems and their solutions, along with references for readers who want to understand the underlying principles in more depth. Your game may well have performance problems that this article does not mention. It is still worth reading, because understanding how things work underneath makes problems much easier to solve.
Introduction to rendering
Before we start, let's quickly go over what happens when Unity renders a frame. Understanding the overall flow, and when each step happens, helps when tracking down performance problems. At a high level, rendering works as follows:
- The CPU (central processing unit) works out which objects need to be rendered and sets up the render state for them
- The CPU sends graphics rendering commands to the GPU (graphics processing unit)
- The GPU renders the objects
The term "rendering pipeline" is often used to describe this process, and it is an apt one: efficient rendering is like an assembly line in a workshop, with no stalls and high throughput.
During each frame, the CPU does the following:
- It checks every object in the scene to decide whether it needs to be rendered. An object is rendered only if it meets certain conditions. For example, it must be within the camera's view (note: even if only part of the object is in view, the whole object is still rendered). Objects outside the view are culled. For more on culling, see: Understanding the View Frustum
- It gathers and computes the information for each object that will be rendered and sends commands to the GPU (this includes transferring meshes and textures to video memory, setting the render state and calling the graphics API). This process is called a draw call
- The data package the CPU creates for each draw call is called a batch. A batch sometimes contains data other than draw calls, but that data has little effect on performance, so this article does not discuss it
Every batch contains at least one draw call. For each batch, the CPU does the following:
- It may issue a command that tells the GPU to change the render state. This command is called a SetPass call. A SetPass call tells the GPU which settings to use to render the next mesh. It is issued only when the settings for the next mesh differ from the settings used for the previous mesh
- It issues a draw call, which tells the GPU to render the specified mesh using the settings from the most recent SetPass call
- In some cases a batch needs more than one pass. A pass is a section of shader code, and a new pass requires a change of render state. For each pass in the batch, the CPU must send a new SetPass call and then send the draw call again, so that the GPU renders the mesh with the newly set render state
Meanwhile, the GPU does the following:
- It processes commands in the order in which the CPU placed them in the command buffer
- If the current command is a SetPass call, the GPU updates the render state
- If the current command is a draw call, the GPU renders the mesh using the most recently set render state. Rendering a mesh happens in several stages that we will not cover here, but it helps to know that the vertex shader processes the vertices of the mesh, while the fragment shader processes the individual pixels
- This repeats until every command in the command buffer has been executed
Now that we have a rough picture of how Unity renders a frame, let's consider what can go wrong along the way.
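The CPU and GPU roles described in this section can be sketched as a toy model. The Python below is not Unity code and does not call any Unity API. The function names, the tuple layout and the mesh and state names are all assumptions made for illustration. It only shows the rule that a SetPass command is emitted when the render state changes, and that the GPU draws each mesh with the last state it was given.

```python
from typing import List, Optional, Tuple

# A draw request is (mesh_name, render_state). Both are made-up labels.
Draw = Tuple[str, str]
Command = Tuple[str, str]

def build_command_buffer(draws: List[Draw]) -> List[Command]:
    """CPU side: emit SetPass only when the render state changes, then Draw."""
    commands: List[Command] = []
    current_state: Optional[str] = None
    for mesh, state in draws:
        if state != current_state:
            commands.append(("SetPass", state))
            current_state = state
        commands.append(("Draw", mesh))
    return commands

def execute(commands: List[Command]) -> List[Draw]:
    """GPU side: walk the buffer in order, drawing with the last state set."""
    rendered: List[Draw] = []
    state: Optional[str] = None
    for kind, arg in commands:
        if kind == "SetPass":
            state = arg
        else:
            rendered.append((arg, state))
    return rendered

frame = [("rock", "stone"), ("wall", "stone"), ("tree", "bark")]
buffer = build_command_buffer(frame)
print(buffer)
print(execute(buffer))
```

In this sketch "rock" and "wall" share a state, so three draws need only two SetPass commands.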
Rendering problems
The most important point about rendering is this: within every frame, both the CPU and the GPU must finish their work on time. If either one overruns, rendering suffers. Rendering problems generally have one of two root causes:
1. The game is CPU bound. The CPU takes too long to prepare the data for each frame, which creates a bottleneck
2. The game is GPU bound. There is too much to render, so the GPU takes too long to render a frame
When performance goes wrong, the first job is to find out what is causing it, because different problems call for different solutions. Fixing performance problems is also a balancing act, for example trading memory for CPU time, or trading image quality to relieve a rendering bottleneck. We will use two tools that ship with Unity to locate problems: the Profiler and the Frame Debugger.
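As a rough illustration of "finishing on time", the sketch below turns a target frame rate into a per-frame time budget and compares hypothetical CPU and GPU timings against it. The function names and the sample timings are assumptions. In practice the numbers would be read from the Profiler on the target device, and the simple "larger time wins" rule is only a first guess at where to look.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available for one frame, in milliseconds."""
    return 1000.0 / target_fps

def classify_frame(cpu_ms: float, gpu_ms: float, target_fps: float) -> str:
    """Report whether a frame fits its budget and, if not, which side is slower."""
    budget = frame_budget_ms(target_fps)
    if cpu_ms <= budget and gpu_ms <= budget:
        return "within budget"
    return "CPU bound" if cpu_ms >= gpu_ms else "GPU bound"

print(frame_budget_ms(60))            # about 16.7 ms per frame
print(classify_frame(10.0, 12.0, 60))
print(classify_frame(25.0, 12.0, 60))
print(classify_frame(10.0, 40.0, 30))
```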
CPU bottleneck
Broadly speaking, the CPU does three things while rendering each frame:
1. It decides which objects to render
2. It prepares the data for rendering and sets the render state
3. It sends graphics API calls
Each of these categories contains many separate tasks, and those tasks may be spread across several threads. When they are, we call it multithreaded rendering.
Three kinds of thread take part in Unity's rendering process: the main thread, the render thread and worker threads. The main thread handles most of the CPU work in our game, including some rendering tasks. The render thread is mainly responsible for sending commands to the GPU. Worker threads each handle a single task, such as culling or mesh skinning. Which tasks run on which thread depends on the hardware the game runs on and on the game's settings. For example, the more CPU cores the device has, the more worker threads can be created. This is why profiling on the target device matters so much: the same game can behave very differently on different devices.
Because multithreaded rendering is complex and depends heavily on hardware, we need to understand which tasks are causing the CPU bottleneck before we try to improve performance. If the game stutters because culling work is overrunning on its thread, reducing draw calls will do nothing for the player's experience.
Note: not every platform supports multithreaded rendering. WebGL, for example, does not. On platforms without it, all tasks run on the same thread. If you hit a CPU bottleneck on such a platform, try every optimization that could help CPU performance.
Graphics Jobs
The Project Settings -> Player -> Other Settings -> Rendering -> Graphics Jobs option lets Unity hand rendering tasks that the main thread would otherwise handle to worker threads. On platforms where the feature is available it can bring a significant performance improvement. Turn it on and off and compare the results. At the time of writing the feature is still in preview.
Sending commands to the GPU
Sending commands to the GPU is usually the most common cause of a CPU bottleneck. On most platforms this work is done by the render thread. On a few platforms (PS4, for example) it is done by worker threads.
The most expensive operation is the SetPass call. If the CPU problem is caused by sending commands to the GPU, reducing the number of SetPass calls is usually the most effective fix. The Profiler shows the number of SetPass calls and batches. How many SetPass calls it takes to cause trouble depends heavily on the hardware: a high-end PC can handle far more SetPass calls than a mobile device.
The number of SetPass calls, and how it relates to the number of batches, depends on several factors that are covered in more detail later. In general, though:
- Reducing the number of batches, or letting more objects share the same render state, usually reduces the number of SetPass calls
- Reducing the number of SetPass calls usually improves CPU performance
Reducing batches can improve CPU performance even when it does not reduce the number of SetPass calls, because the CPU handles one batch more efficiently than several. Broadly, there are three ways to reduce batches and SetPass calls:
- Reducing the number of objects to render reduces both batches and SetPass calls
- Reducing the number of times each object is rendered reduces SetPass calls
- Combining the objects that must be rendered reduces batches
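To see why sharing render state reduces SetPass calls, here is a small counting sketch in Python. It is a model, not Unity's actual sorting logic. The state names are made up, and a real renderer cannot always reorder freely (transparent objects, for instance, are normally drawn in depth order).

```python
from typing import Iterable, Optional

def count_setpass_calls(states: Iterable[str]) -> int:
    """Count how often the render state changes between consecutive draws."""
    count = 0
    previous: Optional[str] = None
    for state in states:
        if state != previous:
            count += 1
            previous = state
    return count

# Hypothetical draw order where two states alternate.
draw_order = ["stone", "bark", "stone", "bark", "stone"]

interleaved = count_setpass_calls(draw_order)
grouped = count_setpass_calls(sorted(draw_order))
print(interleaved, grouped)
```

The same five draws need five state changes when interleaved but only two once draws with the same state sit next to each other.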
Reducing the number of objects to render
This is the simplest way to reduce batches and SetPass calls. The following techniques reduce the number of rendered objects:
- Reduce the number of visible objects in the scene directly. For example, if the scene contains a large crowd of characters, we can choose to render only some of them
- Use the camera's Far Clipping Plane property to shorten the camera's draw distance. This property sets the distance beyond which objects are no longer rendered
- Use occlusion culling to disable objects that are completely hidden behind other objects, so that hidden objects are not rendered. Note that this feature does not suit every scene, but in some scenes it gives a very worthwhile improvement. We can also implement our own form of occlusion culling by disabling objects manually. For example, if the scene contains objects that only appear during a cutscene, we should hide them manually before the cutscene plays and after it ends. Manual culling is often more efficient than the dynamic culling Unity provides
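The effect of shortening the far clipping distance can be illustrated with a simplified depth test. This Python sketch is an assumption-laden model: it treats each object as a single point, measures depth along a unit-length view direction, and ignores the side planes of the view frustum and object bounds. The object names and positions are invented.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def depth_along_view(camera_pos: Vec3, forward: Vec3, point: Vec3) -> float:
    """Distance of a point from the camera measured along the view direction."""
    return sum((p - c) * f for p, c, f in zip(point, camera_pos, forward))

def within_clip_range(camera_pos: Vec3, forward: Vec3,
                      objects: Sequence[Tuple[str, Vec3]],
                      near: float, far: float) -> List[str]:
    """Names of objects whose depth lies between the near and far distances."""
    return [name for name, pos in objects
            if near <= depth_along_view(camera_pos, forward, pos) <= far]

camera = (0.0, 0.0, 0.0)
forward = (0.0, 0.0, 1.0)  # assumed to be a unit vector
objects = [("crate", (0.0, 0.0, 5.0)),
           ("tower", (0.0, 0.0, 80.0)),
           ("mountain", (0.0, 0.0, 900.0))]

print(within_clip_range(camera, forward, objects, 0.3, 1000.0))
print(within_clip_range(camera, forward, objects, 0.3, 100.0))
```

Pulling the far distance in from 1000 to 100 drops the most distant object from the list of things to render.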
Reducing the number of times each object is rendered
Real-time lights, shadows and reflections add a great deal of realism to a game, but they are expensive because they can cause objects to be rendered several times.
The rendering path we choose for the game also has a real effect on the cost of these features. The rendering path is the order in which rendering calculations are performed when the scene is drawn. The main difference between rendering paths is how they handle real-time lights, shadows and reflections. As a general rule, if the game runs on higher-end devices and uses real-time lights, shadows and reflections, Deferred Rendering is the better choice. Forward Rendering suits lower-end devices that do not use these features. Real-time lights, shadows and reflections are complex features, so it is best to study the relevant material and understand how they are implemented. See: Lighting and rendering (2019.3)
Whichever rendering path we choose, real-time lights, shadows and reflections have a large impact on performance, so optimizing them is essential:
- Dynamic lighting is a complex topic that is beyond the scope of this article. For an in-depth treatment, see: Shadow troubleshooting
- Dynamic lighting is expensive. When the scene contains many static objects, we can use baking to precompute the scene's lighting and generate lightmaps. For details, see: Lighting
- If we want real-time shadows, we can improve their performance with the help of: Shadow cascades. It describes the shadow settings and how they affect performance
- Reflection probes create realistic reflections, but they affect batches, so it is best to use as few as possible. For optimizing reflection probes, see: Reflection probe performance
Batching objects
A batch can contain the data for several objects. For objects to be combined into one batch, they must:
- Share the same material
- Have the same render state (for example texture, shader and so on)
Combining objects does improve performance. We should still check that the gain from combining them is not outweighed by a larger cost somewhere else. For more on batching, see my earlier article on static batching, dynamic batching and GPU instancing.
Texture atlasing combines a large number of small textures into one large texture. It is commonly used in 2D games and UI systems. Unity has a built-in atlasing tool: Sprite Packer
We can also manually combine meshes that share a material and texture, either in the editor or through the API at runtime. We must be aware, though, that objects which would otherwise have been culled may now have to be rendered because they are part of the same combined mesh, and those objects may also carry the cost of lighting, shadows and other expensive operations. Weigh the trade-off carefully.
In scripts, we must be careful with Renderer.material. This property copies the material and returns a reference to the copy. That breaks batching, because the renderer no longer references the same material as the other objects. If we need to access the material of a batched object, we should use Renderer.sharedMaterial instead.
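The effect of copying a material can be modelled without Unity. In the Python sketch below, the `Material` and `Renderer` classes are toy stand-ins written for this example, not Unity's classes. The `material` property copies on first access, mirroring the behaviour described above, and the number of distinct material objects stands in for how many groups the renderers could be batched into.

```python
import copy

class Material:
    """Toy material; the fields are invented for this example."""
    def __init__(self, name: str, color: str) -> None:
        self.name = name
        self.color = color

class Renderer:
    """Toy stand-in for a renderer component."""
    def __init__(self, shared_material: Material) -> None:
        self.shared_material = shared_material
        self._has_own_copy = False

    @property
    def material(self) -> Material:
        # First access copies the material; the renderer then keeps its copy.
        if not self._has_own_copy:
            self.shared_material = copy.copy(self.shared_material)
            self._has_own_copy = True
        return self.shared_material

def distinct_materials(renderers) -> int:
    """Number of different material objects referenced by the renderers."""
    return len({id(r.shared_material) for r in renderers})

stone = Material("stone", "grey")
renderers = [Renderer(stone) for _ in range(3)]
before = distinct_materials(renderers)

renderers[0].material.color = "red"  # copies the material, then edits the copy
after = distinct_materials(renderers)
print(before, after)
```

After the edit, the first renderer references its own private copy, so the three renderers no longer share one material. Editing through `shared_material` instead would change the look of all three and keep them on a single material.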
Skinned meshes
SkinnedMeshRenderers are typically used for animated meshes. Skinning work usually runs on the main thread or on a separate worker thread, depending on the game's settings and the target hardware. Skinning is expensive. If the Profiler shows that skinning is causing a CPU bottleneck, there are several things we can try:
- Consider whether the SkinnedMeshRenderer component is really needed. If the object never needs to move, use a regular MeshRenderer wherever possible
- If the object only moves some of the time, we should use a less detailed mesh (the SkinnedMeshRenderer component has a BakeMesh function that creates a mesh in a matching pose)
- For more on optimizing skinned meshes, see: Skinned Mesh Renderer. (The cost of skinning is paid per vertex, so using models with fewer vertices and reducing the number of bones also improves performance)
- On some platforms, skinning can be handled quickly by the GPU. If the target device has a capable GPU, enable GPU Skinning for that platform
GPU bottleneck
If the game is limited by GPU performance, the first step is to find the cause of the GPU bottleneck. The most common GPU problem is being limited by fill rate, especially on mobile platforms. Memory bandwidth and vertex processing can also be the cause.
Fill rate
Fill rate is the number of pixels the GPU can render to the screen per second. If the game's GPU problem is caused by fill rate, the game is trying to draw more pixels per frame than the GPU can handle. Checking whether fill rate is the cause is straightforward:
- Open the Profiler and note the GPU time
- Lower the rendering resolution with Screen.SetResolution(width, height, true)
- Open the Profiler again. If GPU performance has improved, fill rate is very likely the cause
If fill rate is the problem, there are several ways to address it:
- The fragment shader is the shader code that tells the GPU how to draw each pixel. If that code is inefficient, performance problems follow easily. Complex fragment shaders are a very common cause of fill rate problems
- If the game uses Unity's built-in shaders, use the mobile shaders intended for mobile platforms
- If the game uses Unity's Standard Shader, bear in mind that Unity compiles this shader based on the current material settings, and only the features in use are compiled. This means that removing features such as detail maps reduces the complexity of the fragment shader
- If the game uses custom shaders, optimize them as far as possible. Shader optimization is a large topic. See: Optimizations
- Overdraw means that the pixel at a given position is drawn more than once. It usually happens when one object is drawn on top of another. To understand overdraw, we need to understand the order in which Unity draws the objects in a scene. An object's shader determines its draw order, usually through its render queue property. The most common causes of overdraw are transparent materials, unoptimized particles and overlapping UI elements. For overdraw optimization, see: Optimizing Unity UI
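Some back-of-the-envelope arithmetic shows why lowering the resolution is a useful fill rate test and why overdraw matters. The resolution, the average overdraw factor and the frame rate in this Python sketch are hypothetical inputs, and the function name is made up. Real GPUs have further costs that this simple product does not capture.

```python
def pixels_per_second(width: int, height: int, overdraw: float, fps: int) -> float:
    """Approximate pixels shaded per second: screen pixels x overdraw x frames."""
    return width * height * overdraw * fps

full = pixels_per_second(1920, 1080, 3.0, 60)
half = pixels_per_second(960, 540, 3.0, 60)       # half the width and height
no_overdraw = pixels_per_second(1920, 1080, 1.0, 60)

print(full)               # 373248000.0
print(full / half)        # 4.0
print(full / no_overdraw) # 3.0
```

Halving the resolution in each dimension cuts the pixel work to a quarter, which is why a fill-rate-bound game speeds up noticeably in that test. An average overdraw of 3 triples the work at any resolution.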
Memory bandwidth
Memory bandwidth is the rate at which the GPU can read from and write to its dedicated memory. If the game is limited by memory bandwidth, it usually means the textures are too large. We can check whether memory bandwidth is the problem as follows:
- Open the Profiler and note the GPU figures
- Go to Project Settings -> Quality -> Texture Quality and lower the texture quality for the current platform
- Open the Profiler again and review the GPU figures. If performance has improved, memory bandwidth is the problem
If memory bandwidth is the problem, we need to reduce the memory that textures use. The right approach varies from game to game. Here are a few ways to optimize textures:
- Texture compression can greatly reduce the memory that textures occupy. Texture Import Settings describes the compression formats and the various settings in detail
- Mipmaps are lower-resolution versions of a texture that Unity uses for distant objects. The Mipmaps draw mode in Unity's Scene view lets us see which objects would benefit from mipmaps. The main purpose of mipmaps is actually to improve quality: when a large texture without mipmaps is sampled at a low frequency, the model looks poor (the texels sampled by two adjacent screen pixels lie far apart, which also sharply reduces the cache hit rate)
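To get a feel for the numbers, the Python sketch below estimates texture memory for a given size and bit depth, with and without a mipmap chain. The function names are invented. The 32 bits-per-pixel figure corresponds to uncompressed RGBA with 8 bits per channel, and 4 bits per pixel is used as an example rate for a compressed format. Block-size padding and platform-specific details are ignored.

```python
from typing import List, Tuple

def mip_chain(width: int, height: int) -> List[Tuple[int, int]]:
    """Sizes of every mip level, halving each dimension down to 1x1."""
    sizes = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        sizes.append((width, height))
    return sizes

def texture_bytes(width: int, height: int, bits_per_pixel: int, mipmaps: bool) -> int:
    """Approximate memory footprint of a texture in bytes."""
    levels = mip_chain(width, height) if mipmaps else [(width, height)]
    total_bits = sum(w * h * bits_per_pixel for w, h in levels)
    return total_bits // 8

uncompressed = texture_bytes(1024, 1024, 32, mipmaps=False)
with_mips = texture_bytes(1024, 1024, 32, mipmaps=True)
compressed = texture_bytes(1024, 1024, 4, mipmaps=False)

print(uncompressed)  # 4194304 bytes (4 MiB)
print(with_mips)     # 5592404 bytes, about one third more
print(compressed)    # 524288 bytes (0.5 MiB)
```

A full mipmap chain adds roughly a third to the stored size, but distant objects then sample the small levels, which is what eases the bandwidth pressure. Compression at 4 bits per pixel shrinks the same texture by a factor of eight.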
Vertex processing
Vertex processing is the work the GPU does on each vertex of a mesh. Its cost depends mainly on two factors: the number of vertices and the complexity of the operations performed on each one. There are several ways to optimize it:
- Reduce mesh complexity
- Use normal maps to simulate meshes of higher geometric complexity. For normal maps, see: Normal map
- If the game does not use normal maps, vertex tangents can be turned off in the mesh import settings, which reduces the amount of data per vertex
- LOD (level of detail) reduces the complexity of an object's mesh when the object is far from the camera. It effectively reduces the number of vertices the GPU has to render without affecting how the game looks. For details, see: LODGroup
- The vertex shader is a piece of shader code. Reducing its complexity improves performance
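The idea behind LOD can be sketched as a lookup from camera distance to a mesh of suitable detail. This Python example is a simplified model. The function name, the distance thresholds and the vertex counts are all assumptions, and it switches on plain distance, whereas an engine may use a different measure such as the object's size on screen.

```python
from typing import Optional, Sequence

def select_lod(distance: float, max_distances: Sequence[float]) -> Optional[int]:
    """Return the index of the first LOD whose range covers the distance.

    max_distances must be ascending. None means the object is beyond the
    last range and is not drawn at all.
    """
    for level, max_distance in enumerate(max_distances):
        if distance <= max_distance:
            return level
    return None

lod_ranges = [10.0, 30.0, 80.0]      # hypothetical switch distances
lod_vertices = [12000, 3000, 600]    # hypothetical vertex counts per level

for d in (5.0, 30.0, 200.0):
    level = select_lod(d, lod_ranges)
    vertices = lod_vertices[level] if level is not None else 0
    print(d, level, vertices)
```

With these made-up numbers, an object at distance 30 costs 3000 vertices instead of 12000, and one at distance 200 costs nothing.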