Google VP8 Code: the first in-depth technical analysis

Source: Baidu Wenku; editor: Jiuxiang News; 2024/04/29 02:03:01

Chinese translation by Tang Fulin: http://blog.fulin.org/2010/05/vp8_first_in_depth_tech_analysis.html

Note 1: The original article is from http://x264dev.multimedia.cx/?p=377, an in-depth analysis of VP8 by an H.264 developer.

Note 2: Translated with the help of Google Translate; corrections and suggestions are welcome.

The first in-depth technical analysis of VP8

Back in my original post about Internet video, I made some initial comments on the hope that VP8 would solve the problems of web video by providing a supposed patent-free video format with significantly better compression than the current options of Theora and Dirac.

Fortunately, it seems I was able to acquire access to the VP8 spec, software, and source a good few days before the official release and so was able to perform a detailed technical analysis in time for the official release.

The questions I will try to answer here are:

1. How good is VP8? Is the file format actually better than H.264 in terms of compression, and could a good VP8 encoder beat x264?

On2 claimed 50% better than H.264, but On2 has always made absurd claims that they were never able to back up with results, so such a number is almost surely wrong.

VP7, for example, was claimed to be 15% better than H.264 while being much faster, but was in reality neither faster nor higher quality.

2. How good is On2’s VP8 implementation? Irrespective of how good the spec is, is the implementation good, or is this going to be just like VP3, where On2 releases an unusably bad implementation with the hope that the community will fix it for them?

Let’s hope not; it took 6 years to fix Theora!

3. How likely is VP8 to actually be free of patents? Even if VP8 is worse than H.264, being patent-free is still a useful attribute for obvious reasons.

But as noted in my previous post, merely being published by Google doesn’t guarantee that it is.

Microsoft did something similar a few years ago with the release of VC-1, which was claimed to be patent-free — but within mere months after release, a whole bunch of companies claimed patents on it and soon enough a patent pool was formed.

We’ll start by going through the core features of VP8.

We’ll primarily analyze them by comparing to existing video formats. Keep in mind that an encoder and a spec are two different things: it’s possible for a good encoder to be written for a bad spec or vice versa! Hence why a really good MPEG-1 encoder can beat a horrific H.264 encoder.

But first, a comment on the spec itself.

The spec consists largely of C code copy-pasted from the VP8 source code — up to and including TODOs, “optimizations”, and even C-specific hacks, such as workarounds for the undefined behavior of signed right shift on negative numbers.

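For readers unfamiliar with the issue: right-shifting a negative signed integer is implementation-defined in C, which is why such workarounds appear at all. A minimal sketch of the usual workaround (the helper name is mine, not the spec's):

```c
/* Portable arithmetic right shift: signed right shift of a negative
 * value is implementation-defined in C, so a branch emulates the
 * two's-complement arithmetic shift explicitly. */
static int arith_shift_right(int v, int n) {
    return v < 0 ? ~(~v >> n) : v >> n;
}
```

On virtually all real hardware the compiler reduces this to a plain arithmetic shift, but the branch makes the result well-defined per the C standard.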
In many places it is simply outright opaque. Copy-pasted C code is not a spec. I may have complained about the H.264 spec being overly verbose, but at least it’s precise.

The VP8 spec, by comparison, is imprecise, unclear, and overly short, leaving many portions of the format very vaguely explained.

Some parts even explicitly refuse to fully explain a particular feature, pointing to highly-optimized, nigh-impossible-to-understand reference code for an explanation.

There’s no way in hell anyone could write a decoder with this spec alone.

Now that I’ve gotten that out of my system, let’s get back to VP8 itself.

To begin with, to get a general sense for where all this fits in, basically all modern video formats work via some variation on the following chain of steps:

Encode: Predict -> Transform + Quant -> Entropy Code -> Loopfilter

Decode: Entropy Decode -> Predict -> Dequant + Inverse Transform -> Loopfilter

If you’re looking to just get to the results and skip the gritty technical details, make sure to check out the “overall verdict” section and the “visual results” section. Or at least skip to the “summary for the lazy”.

Prediction

Prediction is any step which attempts to guess the content of an area of the frame. This could include functions based on already-known pixels in the same frame (e.g. inpainting) or motion compensation from a previous frame.

Prediction usually involves side data, such as a signal telling the decoder a motion vector to use for said motion compensation.

Intra Prediction

Intra prediction is used to guess the content of a block without referring to other frames. VP8’s intra prediction is basically ripped off wholesale from H.264: the “subblock” prediction modes are almost exactly identical (they even have the same names!) to H.264’s i4×4 mode, and the whole block prediction mode is basically identical to i16×16.

Chroma prediction modes are practically identical as well.

i8×8, from H.264 High Profile, is not present.

An additional difference is that the planar prediction mode has been replaced with TM_PRED, a very vaguely similar analogue.

The specific prediction modes are internally slightly different, but have the same names as in H.264.

Honestly, I’m very disappointed here.

While H.264’s intra prediction is good, it has certainly been improved on quite a bit over the past 7 years, and I thought that blatantly ripping it off was the domain of companies like Real (see RV40 ).

I expected at least something slightly more creative out of On2.

But more important than any of that: this is a patent time-bomb waiting to happen.

H.264’s spatial intra prediction is covered in patents and I don’t think that On2 will be able to just get away with changing the rounding in the prediction modes.

Verdict on Intra Prediction: Slightly modified ripoff of H.264.

Somewhat worse than H.264 due to omission of i8×8.

Inter Prediction

Inter prediction is used to guess the content of a block by referring to past frames. There are two primary components to inter prediction: reference frames and motion vectors.

The reference frame is a past frame from which to grab pixels from and the motion vectors index an offset into that frame.

VP8 supports a total of 3 reference frames: the previous frame, the “alt ref” frame, and the “golden frame”.

For motion vectors, VP8 supports variable-size partitions much like H.264.

For subpixel precision, it supports quarter-pel motion vectors with a 6-tap interpolation filter.

In short:

VP8 reference frames: up to 3
H.264 reference frames: up to 16
VP8 partition types: 16×16, 16×8, 8×16, 8×8, 4×4
H.264 partition types: 16×16, 16×8, 8×16, flexible subpartitions (each 8×8 can be 8×8, 8×4, 4×8, or 4×4).
VP8 chroma MV derivation: each 4×4 chroma block uses the average of colocated luma MVs (same as MPEG-4 ASP)
H.264 chroma MV derivation: chroma uses luma MVs directly
VP8 interpolation filter: qpel, 6-tap luma, mixed 4/6-tap chroma
H.264 interpolation filter: qpel, 6-tap luma (staged filter), bilinear chroma
H.264 has but VP8 doesn’t: B-frames, weighted prediction
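The MPEG-4 ASP-style chroma MV derivation in the list above can be sketched as follows; the exact rounding VP8 uses may differ, so treat the tie-breaking here as an assumption:

```c
typedef struct { int x, y; } MV;

/* Round-to-nearest average of four values, ties away from zero
 * (illustrative rounding; the bit-exact VP8 rule may differ). */
static int round_avg4(int a, int b, int c, int d) {
    int s = a + b + c + d;
    return s >= 0 ? (s + 2) / 4 : -((-s + 2) / 4);
}

/* Each 4x4 chroma block's MV as the average of its four colocated
 * luma MVs, MPEG-4 ASP-style. */
static MV chroma_mv(const MV luma[4]) {
    MV out = { round_avg4(luma[0].x, luma[1].x, luma[2].x, luma[3].x),
               round_avg4(luma[0].y, luma[1].y, luma[2].y, luma[3].y) };
    return out;
}
```

H.264 skips this averaging entirely and reuses the luma MVs for chroma, which is what makes its derivation cheaper but coarser-grained.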

H.264 has a significantly better and more flexible referencing structure.

Sub-8×8 partitions are mostly unnecessary, so VP8’s omission of the H.264-style subpartitions has little consequence.

The chroma MV derivation is more accurate in H.264 but slightly slower; in practice the difference is probably near-zero both speed and compression-wise, since sub-8×8 luma partitions are rarely used (and I would suspect the same carries over to VP8).

The VP8 interpolation filter is likely slightly better, but will definitely be slower to implement, both encoder and decoder-side.

A staged filter allows the encoder to precalculate all possible halfpel positions and then quickly calculate qpel positions when necessary: an unstaged filter does not, making subpel motion estimation much slower.

Not that unstaged filters are bad — staged filters have basically been abandoned for all of the H.265 proposals — it’s just an inherent disadvantage performance-wise.

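To illustrate why staging matters: with H.264's separable 6-tap luma filter (taps 1, −5, 20, 20, −5, 1), an encoder can compute each halfpel sample once and derive quarterpel samples by cheap averaging. A hedged sketch (function names are mine):

```c
static int clamp255(int v) { return v < 0 ? 0 : v > 255 ? 255 : v; }

/* One halfpel sample from 6 neighboring pixels, H.264's tap weights
 * (1,-5,20,20,-5,1) with round-to-nearest and a divide by 32. */
static int halfpel6(const unsigned char p[6]) {
    int v = p[0] - 5*p[1] + 20*p[2] + 20*p[3] - 5*p[4] + p[5];
    return clamp255((v + 16) >> 5);
}

/* Quarterpel positions are then just rounded averages of fullpel and
 * precomputed halfpel samples: cheap to evaluate during refinement. */
static int qpel_avg(int a, int b) { return (a + b + 1) >> 1; }
```

With an unstaged filter like VP8's, every qpel candidate requires rerunning the full interpolation, which is what makes subpel motion estimation slower encoder-side.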
Additionally, having as high as 6 taps on chroma is, IMO, completely unnecessary and wasteful.

The lack of B-frames in VP8 is a killer. B-frames can give 10-20% (or more) compression benefit for minimal speed cost; their omission in VP8 probably costs more compression than all other problems noted in this post combined. This was not unexpected, however; On2 has never used B-frames in any of their video formats. They also likely present serious patent problems, though On2 doesn’t seem to have a problem ripping off other heavily patented features from H.264. Lack of weighted prediction is also going to hurt a bit, especially in fades.

Verdict on Inter Prediction: Similar partitioning structure to H.264.

Much weaker referencing structure.

More complex, slightly better interpolation filter.

Mostly a wash — except for the lack of B-frames, which is seriously going to hurt compression.

Transform and Quantization

After prediction, the encoder takes the difference between the prediction and the actual source pixels (the residual ), transforms it, and quantizes it. The transform step is designed to make the data more amenable to compression by decorrelating it.

The quantization step is the actual information-losing step where compression occurs; the output values of transform are rounded, mostly to zero, leaving only a few integer coefficients.

Transform

For transform, VP8 again uses a very H.264-reminiscent scheme. Each 16×16 macroblock is divided into 16 4×4 DCT blocks, each of which is transformed by a bit-exact DCT approximation.

Then, the DC coefficients of each block are collected into another 4×4 group, which is then Hadamard-transformed.

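A Walsh-Hadamard transform of those 16 DC coefficients looks roughly like the following butterfly structure; this is an illustrative version, not VP8's bit-exact one (which adds its own scaling and rounding):

```c
/* 4x4 Walsh-Hadamard transform: rows, then columns. Every output is a
 * +/-1 combination of the 16 inputs, which decorrelates the DC values
 * of the sixteen 4x4 sub-blocks. Illustrative, not bit-exact VP8. */
static void wht4x4(const int in[16], int out[16]) {
    int tmp[16];
    for (int i = 0; i < 4; i++) {          /* transform rows */
        int a = in[i*4+0] + in[i*4+3];
        int b = in[i*4+1] + in[i*4+2];
        int c = in[i*4+1] - in[i*4+2];
        int d = in[i*4+0] - in[i*4+3];
        tmp[i*4+0] = a + b; tmp[i*4+1] = d + c;
        tmp[i*4+2] = a - b; tmp[i*4+3] = d - c;
    }
    for (int i = 0; i < 4; i++) {          /* transform columns */
        int a = tmp[0*4+i] + tmp[3*4+i];
        int b = tmp[1*4+i] + tmp[2*4+i];
        int c = tmp[1*4+i] - tmp[2*4+i];
        int d = tmp[0*4+i] - tmp[3*4+i];
        out[0*4+i] = a + b; out[1*4+i] = d + c;
        out[2*4+i] = a - b; out[3*4+i] = d - c;
    }
}
```

A flat block of DC values collapses into a single coefficient, which is exactly the decorrelation the hierarchical transform is after.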
OK, so this isn’t reminiscent of H.264, this is H.264.

There are, however, 3 differences between VP8’s scheme and H.264’s.

The first is that the 8×8 transform is omitted entirely (fitting with the omission of the i8×8 intra mode).

The second is the specifics of the transform itself.

H.264 uses an extremely simplified “DCT” which is so un-DCT-like that it often referred to as the HCT (H.264 Cosine Transform) instead.

This simplified transform results in roughly 1% worse compression, but greatly simplifies the transform itself, which can be implemented entirely with adds, subtracts, and right shifts by 1.

VC-1 uses a more accurate version that relies on a few small multiplies (numbers like 17, 22, 10, etc).

VP8 uses an extremely, needlessly accurate version that uses very large multiplies (20091 and 35468).

This in retrospect is not surprising, as it is very similar to what VP3 used.

The third difference is that the Hadamard hierarchical transform is applied for some inter blocks, not merely i16×16.

In particular, it also runs for p16×16 blocks.

While this is definitely a good idea, especially given the small transform size (and the need to decorrelate the DC value between the small transforms), I’m not quite sure I agree with the decision to limit it to p16×16 blocks; it seems that perhaps with a small amount of modification this could also be useful for other motion partitions.

Also, note that unlike H.264, the hierarchical transform is luma-only and not applied to chroma.

Overall, the transform scheme in VP8 is definitely weaker than in H.264.

The lack of an 8×8 transform is going to have a significant impact on detail retention, especially at high resolutions.

The transform is also slower than it needs to be, though a shift-based transform might be out of the question due to patents.

The one good new idea here is applying the hierarchical DC transform to inter blocks.

Verdict on Transform: Similar to H.264.

Slower, slightly more accurate 4×4 transform.

Improved DC transform for luma (but not on chroma).

No 8×8 transform.

Overall, worse.

Quantization

For quantization, the core process is basically the same among all MPEG-like video formats, and VP8 is no exception. The primary ways that video formats tend to differentiate themselves here is by varying quantization scaling factors.

There are two ways in which this is primarily done: frame-based offsets that apply to all coefficients or just some portion of them, and macroblock-level offsets.

VP8 primarily uses the former; in a scheme much less flexible than H.264’s custom quantization matrices, it allows for adjusting the quantizer of luma DC, luma AC, chroma DC, and so forth, separately.

The latter (macroblock-level quantizer choice) can, in theory, be done using its “segmentation map” features, albeit very hackily and not very efficiently.

The killer mistake that VP8 has made here is not making macroblock-level quantization a core feature of VP8.

Algorithms that take advantage of macroblock-level quantization are known as “adaptive quantization” and are absolutely critical to competitive visual quality.

My implementation of variance-based adaptive quantization ( before , after ) in x264 still stands to this day as the single largest visual quality gain in x264 history.

Encoder comparisons have showed over and over that encoders without adaptive quantization simply cannot compete.

Thus, while adaptive quantization is possible in VP8, the only way to implement it is to define one segment map for every single quantizer that one wants and to code the segment map index for every macroblock.

This is inefficient and cumbersome; even the relatively suboptimal MPEG-style delta quantizer system would be a better option. Furthermore, only 4 segment maps are allowed, for a maximum of 4 quantizers per frame.

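The segment-map workaround described above boils down to something like this sketch (names and layout are hypothetical, for illustration only): at most four quantizers per frame, selected indirectly through a per-macroblock segment index.

```c
/* Hypothetical sketch of per-macroblock quantizer selection through
 * VP8's segment maps: each macroblock codes a segment index, and the
 * segment selects one of at most four frame-level quantizers. */
enum { MAX_SEGMENTS = 4 };

static int mb_quantizer(const int seg_q[MAX_SEGMENTS],
                        const unsigned char *seg_map, int mb_idx) {
    /* mask keeps the index in range; real code would validate instead */
    return seg_q[seg_map[mb_idx] & (MAX_SEGMENTS - 1)];
}
```

Contrast this with an MPEG-style delta-QP scheme, where each macroblock can nudge the quantizer directly and the number of distinct quantizers per frame is effectively unlimited.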
Verdict on Quantization: Lack of well-integrated adaptive quantization is going to be a killer when the time comes to implement psy optimizations.

Overall, much worse.
Entropy Coding

Entropy coding is the process of taking all the information from all the other processes: DCT coefficients, prediction modes, motion vectors, and so forth — and compressing them losslessly into the final output file. VP8 uses an arithmetic coder somewhat similar to H.264’s, but with a few critical differences.

First, it omits the range/probability table in favor of a multiplication.

Second, it is entirely non-adaptive: unlike H.264’s, which adapts after every bit decoded, probability values are constant over the course of the frame.

Accordingly, the encoder may periodically send updated probability values in frame headers for some syntax elements.

Keyframes reset the probability values to the defaults.

This approach isn’t surprising; VP5 and VP6 (and probably VP7) also used non-adaptive arithmetic coders.

How much of a penalty this actually means compression-wise is unknown; it’s not easy to measure given the design of either H.264 or VP8.

More importantly, I question the reason for this: making it adaptive would add just one single table lookup to the arithmetic decoding function — hardly a very large performance impact.

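For context, the multiplication-based split at the heart of a VP8-style boolean decoder looks roughly like this; renormalization and bit refill are omitted, and the 8-bit state layout is my simplification of the real decoder:

```c
/* Simplified sketch of one step of a VP8-style boolean (arithmetic)
 * decoder: the interval [0, range) is split by a multiply instead of a
 * range/probability table lookup. Renormalization is omitted. */
typedef struct { unsigned range, value; } BoolDec;

static int bool_decode(BoolDec *d, unsigned prob /* 0..255 */) {
    unsigned split = 1 + (((d->range - 1) * prob) >> 8);
    if (d->value < split) {   /* took the probability-'prob' branch: 0 */
        d->range = split;
        return 0;
    }
    d->value -= split;        /* the complementary branch: 1 */
    d->range -= split;
    return 1;
}
```

An adaptive coder would update `prob` from previously decoded symbols at this point; VP8 instead keeps it fixed across the frame until the encoder signals new values in a frame header.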
Of course, the arithmetic coder is not the only part of entropy coding: an arithmetic coder merely turns 0s and 1s into an output bitstream.

The process of creating those 0s and 1s and selecting the probabilities for the encoder to use is an equally interesting problem.

Since this is a very complicated part of the video format, I’ll just comment on the parts that I found particularly notable.

Motion vector coding consists of two parts: prediction based on neighboring motion vectors and the actual compression of the resulting delta between that and the actual motion vector.

The prediction scheme in VP8 is a bit odd — worse, the section of the spec covering this contains no English explanation, just confusingly-written C code.

As far as I can tell, it chooses an arithmetic coding context based on the neighboring MVs, then decides which of the predicted motion vectors to use, or whether to code a delta instead.

The downside of this scheme is that, like in VP3/Theora (though not nearly as badly), it biases heavily towards the re-use of previous motion vectors. This is dangerous because, as the Theora devs have recently found (and fixed to some extent in Theora 1.2 aka Ptalabvorm), any situation in which the encoder picks a motion vector which isn’t the “real” motion vector in order to save bits can potentially have negative visual consequences. In terms of raw efficiency, I’m not sure whether VP8 or H.264’s prediction is better here.

The compression of the resulting delta is similar to H.264, except for the coding of very large deltas, which is slightly better (similar to FFV1’s Golomb -like arithmetic codes).

Intra prediction mode coding is done using arithmetic coding contexts based on the modes of the neighboring blocks.

This is probably a good bit better than the hackneyed method that H.264 uses, which always struck me as being poorly designed.

Residual coding is even more difficult to understand than motion vector coding, as the only full reference is a bunch of highly optimized, highly obfuscated C code.

Like H.264’s CAVLC, it bases contexts on the number of nonzero coefficients in the top and left blocks relative to the current block.

In addition, it also considers the magnitude of those coefficients and, like H.264’s CABAC, updates as coefficients are decoded.

One more thing to note is the data partitioning scheme used by VP8. This scheme is much like VP3/Theora’s and involves putting each syntax element in its own component of the bitstream. The unfortunate problem with this is that it’s a nightmare for hardware implementations, greatly increasing memory bandwidth requirements. I have already received a complaint from a hardware developer about this specific feature with regard to VP8.

Verdict on Entropy Coding: I’m not quite sure here.

It’s better in some ways, worse in some ways, and just plain weird in others.

My hunch is that it’s probably a very slight win for H.264; non-adaptive arithmetic coding has to have some serious penalties. It may also be a hardware implementation problem.

Loop Filter

The loop filter is run after decoding or encoding a frame and serves to perform extra processing on a frame, usually to remove blockiness in DCT-based video formats. Unlike postprocessing, this is not only for visual reasons, but also to improve prediction for future frames.

Thus, it has to be done identically in both the encoder and decoder.

VP8’s loop filter is vaguely similar to H.264’s, but with a few differences.

First, it has two modes (which can be chosen by the encoder): a fast mode and a normal mode.

The fast mode is somewhat simpler than H.264’s, while the normal mode is somewhat more complex.

Secondly, when filtering between macroblocks, VP8’s filter has wider range than the in-macroblock filter — H.264 did this, but only for intra edges.

Third, VP8’s filter omits most of the adaptive strength mechanics inherent in H.264’s filter.

Its only adaptation is that it skips filtering on p16×16 blocks with no coefficients.

This may be responsible for the high blurriness of VP8’s loop filter: it will run over and over and over again on all parts of a macroblock even if they are unchanged between frames (as long as some other part of the macroblock is changed).

H.264’s, by comparison, is strength-adaptive based on whether DCT coefficients exist on either side of a given edge and based on the motion vector delta and reference frame delta across said edge.

Of course, skipping this strength calculation saves some decoding time as well.

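As a sketch of the kind of per-edge strength adaptivity H.264 performs and VP8 omits (the fields and threshold here are illustrative, not the actual H.264 boundary-strength rules):

```c
#include <stdlib.h>

/* Illustrative per-edge filter decision in the spirit of H.264's
 * boundary-strength logic: filter only when residual coefficients
 * exist on either side, or the blocks reference different frames,
 * or their motion vectors differ significantly. */
typedef struct {
    int nnz;          /* nonzero coefficients present? */
    int mv_x, mv_y;   /* motion vector of the block */
    int ref;          /* reference frame index */
} Blk;

static int edge_needs_filter(const Blk *a, const Blk *b, int mv_thresh) {
    if (a->nnz || b->nnz) return 1;
    if (a->ref != b->ref) return 1;
    if (abs(a->mv_x - b->mv_x) >= mv_thresh) return 1;
    if (abs(a->mv_y - b->mv_y) >= mv_thresh) return 1;
    return 0;
}
```

The point of such a check is that an unchanged edge between two skipped, identically-predicted blocks is never re-blurred, which is precisely the case VP8's coarser skip condition misses.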
Update:
05:28 < derf> Gumboot: You’ll be disappointed to know they got the loop filter ordering wrong again.
05:29 < derf> Dark_Shikari: They ordered it such that you have to process each macroblock in full before processing the next one.

Verdict on Loop Filter: Definitely worse compression-wise than H.264’s due to the lack of adaptive strength.

Especially with the “fast” mode, it might be significantly faster.

I worry about it being too blurry.
Overall verdict on the VP8 video format

Overall, VP8 appears to be significantly weaker than H.264 compression-wise. The primary weaknesses mentioned above are the lack of proper adaptive quantization, lack of B-frames, lack of an 8×8 transform, and non-adaptive loop filter.

With this in mind, I expect VP8 to be more comparable to VC-1 or H.264 Baseline Profile than to H.264.

Of course, this is still significantly better than Theora, and in my tests it beats Dirac quite handily as well.

Supposedly Google is open to improving the bitstream format — but this seems to conflict with the fact that they got so many different companies to announce VP8 support. The more software that supports a file format, the harder it is to change said format, so I’m dubious of any claim that we will be able to spend the next 6-12 months revising VP8. In short, it seems to have been released too early: it would have been better off to have an initial period during which revisions could be submitted and then a big announcement later when it’s completed.

Update: it seems that Google is not open to changing the spec: it is apparently “final”, complete with all its flaws.

In terms of decoding speed I’m not quite sure; the current implementation appears to be about 16% slower than ffmpeg’s H.264 decoder (and thus probably about 25-35% slower than state-of-the-art decoders like CoreAVC).

Of course, this doesn’t necessarily say too much about what a fully optimized implementation will reach, but the current one seems to be reasonably well-optimized and has SIMD assembly code for almost all major DSP functions, so I doubt it will get that much faster.

I would expect, with equally optimized implementations, VP8 and H.264 to be relatively comparable in terms of decoding speed.

This, of course, is not really a plus for VP8: H.264 has a great deal of hardware support, while VP8 largely has to rely on software decoders, so being “just as fast” is in many ways not good enough.

By comparison, Theora decodes almost 35% faster than H.264 using ffmpeg’s decoder.

Finally, the problem of patents appears to be rearing its ugly head again. VP8 is simply way too similar to H.264: a pithy, if slightly inaccurate, description of VP8 would be “H.264 Baseline Profile with a better entropy coder” .

Though I am not a lawyer, I simply cannot believe that they will be able to get away with this, especially in today’s overly litigious day and age. Even VC-1 differed more from H.264 than VP8 does, and even VC-1 didn’t manage to escape the clutches of software patents.

Until we get some hard evidence that VP8 is safe, I would be extremely cautious. Since Google is not indemnifying users of VP8 from patent lawsuits, this is even more of a potential problem.

But if luck is on Google’s side and VP8 does pass through the patent gauntlet unscathed, it will undoubtedly be a major upgrade as compared to Theora.

Addendum A: On2’s VP8 Encoder and Decoder

This post is primarily aimed at discussing issues relating to the VP8 video format. But from a practical perspective, while software can be rewritten and improved, to someone looking to use VP8 in the near future, the quality (both code-wise, compression-wise, and speed-wise) of the official VP8 encoder and decoder is more important than anything I’ve said above. Thus, after reading through most of the code, here’s my thoughts on the software.

Initially I was intending to go easy on On2 here; I assumed that this encoder was in fact new for VP8 and thus they wouldn’t necessarily have time to make the code high-quality and improve its algorithms. However, as I read through the encoder, it became clear that this was not at all true; there were comments describing bugfixes dating as far back as early 2004 . That’s right: this software is even older than x264! I’m guessing that the current VP8 software simply evolved from the original VP7 software. Anyways, this means that I’m not going to go easy on On2; they’ve had (at least) 6 years to work on VP8, and a much larger dev team than x264’s to boot.

Before I tear the encoder apart, keep in mind that it isn’t bad. In fact, compression-wise, I don’t think they’re going to be able to get it that much better using standard methods. I would guess that the encoder, on slowest settings, is within 5-10% of the maximum PSNR that they’ll ever get out of it. There’s definitely a whole lot more to be had using unusual algorithms like MB-tree, not to mention the complete lack of psy optimizations — but at what it tries to do, it does pretty decently. This is in contrast to the VP3 encoder, which was a pile of garbage (just ask any Theora dev).

在我撕开这个编码器之前,请记住,它并不坏。事实上,从压缩方面看,我不认为用常规方法他们还能把它提高多少。我猜想,在最慢的设置下,这个编码器距离它所能达到的最大PSNR也就差5-10%。当然,使用MB树这类非常规算法肯定还有很大的提升空间,更不用说它完全缺乏心理视觉优化。但就它想做的事情而言,它做得相当体面。相比之下,VP3编码器简直就是一堆垃圾(问问任何一个Theora开发者就知道了)。

Before I go into specific components, a general note on code quality. The code quality is much better than VP3, though there’s still tons of typos in the comments.

在讨论具体组件之前,先对代码质量做个总体说明:代码质量比VP3好得多,虽然注释里仍然有一堆笔误。

They also appear to be using comments as a form of version control system, which is a bit bizarre. The assembly code is much worse, with staggering levels of copy-paste coding, some completely useless instructions that do nothing at all, unaligned loads/stores to what-should-be aligned data structures, and a few functions that are simply written in unfathomably roundabout (and slower) ways. While the C code isn’t half bad, the assembly is clearly written by retarded monkeys.

他们似乎还把注释当作某种版本控制系统在用,这有点怪异。汇编代码要糟糕得多:复制粘贴式编程的程度令人咋舌,有一些什么也不做的完全无用的指令,对本应对齐的数据结构使用了非对齐的加载/存储,还有几个函数用迂回得令人费解(而且更慢)的方式实现。C代码还算看得过去,但汇编代码显然是一群智障的猴子写的。

Motion estimation: Diamond, hex, and exhaustive (full) searches available. All are pretty naively implemented: hexagon, for example, performs a staggering amount of redundant work (almost half of the locations it searches are repeated!). Full is even worse in terms of inefficiency, but it’s useless for all but placebo-level speeds, so I’m not really going to complain about that.

运动估计:提供菱形、六边形以及穷举(全)搜索。实现方法都相当幼稚:例如六边形搜索,执行了数量惊人的冗余工作(它搜索的位置几乎一半是重复的!)。全搜索在效率上更糟,但除了placebo级别的慢速档以外它毫无用处,所以我就不抱怨它了。
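To illustrate the kind of redundancy being criticized here, consider a toy hexagon search (a hypothetical Python sketch, not libvpx code): without remembering which positions it has already scored, each step re-evaluates points that a previous step already visited.

为了说明这里批评的冗余是哪一类,下面是一个玩具级的六边形搜索(Python写的假想示意,并非libvpx的代码):如果不记录哪些位置已经算过代价,每一步都会重新评估上一步已经访问过的点。

```python
# Hypothetical sketch (not libvpx code): a naive hexagon motion search.
# Without a cache of already-scored positions, each step re-evaluates
# points that a previous step already visited: the redundancy criticized above.
def hexagon_search(cost, start=(0, 0), iterations=10):
    # Large hexagon pattern: 6 candidate offsets around the current best.
    pattern = [(-2, 0), (-1, -2), (1, -2), (2, 0), (1, 2), (-1, 2)]
    best = start
    probes = []  # every probe, including repeats
    for _ in range(iterations):
        candidates = [(best[0] + dx, best[1] + dy) for dx, dy in pattern]
        probes.extend(candidates)
        new_best = min(candidates + [best], key=cost)
        if new_best == best:  # converged: no neighbor is better
            break
        best = new_best
    repeats = len(probes) - len(set(probes))
    return best, len(probes), repeats
```

With a simple quadratic cost function, over a third of all probes in this sketch are repeats; a careful implementation caches or skips them.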

Subpixel motion estimation: Straightforward iterative diamond and square searches. Nothing particularly interesting here.

亚像素运动估计:直接迭代菱形和方形搜索。这里没有什么特别有趣的。

Quantization: Primary quantization has two modes: a fast mode and a slightly slower mode. The former is just straightforward deadzone quant, while the latter has a bias based on zero-run length (not quite sure how much this helps, but I like the idea). After this they have “coefficient optimization” with two modes. One mode simply tries moving each nonzero coefficient towards zero; the slow mode tries all 2^16 possible DCT coefficient rounding permutations. Whoever wrote this needs to learn what trellis quantization (the dynamic programming solution to the problem) is and stop using exponential-time algorithms in encoders.

量化:主量化有两种模式:快速模式和稍慢的模式。前者只是直接的死区量化,而后者有一个基于零游程长度的偏置(不太清楚这有多大帮助,但我喜欢这个主意)。在此之后,他们还有两种模式的“系数优化”。一种模式简单地尝试把每个非零系数向零移动;慢速模式则尝试全部2^16种可能的DCT系数舍入组合。写这段代码的人需要学习一下什么是网格量化(trellis quantization,即这个问题的动态规划解法),并停止在编码器里使用指数时间的算法。
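As a rough illustration of the fast mode described above, here is a minimal deadzone quantizer sketch (hypothetical Python, not On2’s code): the rounding offset is below 0.5, so small coefficients fall into the “deadzone” around zero and quantize to 0.

作为对上面所说快速模式的粗略示意,下面是一个极简的死区量化器(假想的Python代码,并非On2的实现):舍入偏置小于0.5,因此小系数会落入零附近的“死区”并被量化为0。

```python
# Hypothetical sketch (not On2's code): deadzone scalar quantization.
# The rounding offset is below 0.5, so small coefficients fall into the
# "deadzone" around zero and quantize to 0.
def deadzone_quantize(coeffs, q, offset=0.375):
    out = []
    for c in coeffs:
        sign = -1 if c < 0 else 1
        level = int(abs(c) / q + offset)  # offset < 0.5 biases toward zero
        out.append(sign * level)
    return out
```

Setting offset=0.5 recovers ordinary rounding; trellis quantization would instead choose the roundings jointly by dynamic programming, rather than trying all 2^16 combinations.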

Ratecontrol (frame type handling): Relies on “boosting” the quality of golden frames and “alt-ref” frames — a concept I find extraordinarily dubious because it means that the video will periodically “jump” to a higher quality level, which looks utterly terrible in practice.

码率控制(帧类型处理):依靠“提升”黄金帧(golden frame)和替代参考帧(alt-ref)的质量。这个概念我觉得非常可疑,因为这意味着视频会周期性地“跳”到更高的质量水平,而这在实践中看起来糟糕透顶。

You can see the effect in this graph of PSNR; every dozen frames or so, the quality “jumps”. This cannot possibly look good in motion.

你可以在这张PSNR曲线图中看到这种效果:每隔十几帧,质量就会“跳”一次。这在动态播放时不可能好看。
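Since the whole discussion here is in terms of PSNR, it may help to recall how it is computed; a minimal sketch:

由于这里的讨论全部以PSNR为单位,回顾一下它的计算方式可能有帮助;一个极简示意:

```python
import math

# PSNR in dB between a reference and a reconstructed signal:
# 10 * log10(peak^2 / MSE); higher is better, infinite for a perfect match.
def psnr(ref, rec, peak=255.0):
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(peak ** 2 / mse)
```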

Ratecontrol (overall): Relies on a purely reactive ratecontrol algorithm, which probably will not do very well in difficult situations such as hard-CBR and tight buffer constraints. Furthermore, it does no adaptation of the quantizer within the frame (eg in the case that the frame overshot the size limitations ratecontrol put on it). Instead, it relies on re-encoding the frame repeatedly to reach the target size — which in practice is simply not a usable option for two reasons. In low-latency situations where one can’t have a large delay, re-encoding repeatedly may send the encoder way behind time-wise. In any other situation, one can afford to use frame-based threading, a much faster algorithm for multithreaded encoding than the typical slice-based threading — which makes re-encoding impossible.

码率控制(总体):依靠一个纯反应式的码率控制算法,在硬CBR和严格缓冲区限制等困难情况下表现可能不会很好。此外,它不在帧内部调整量化参数(例如当帧超出码率控制给它的尺寸限制时)。相反,它依赖反复重新编码整帧来达到目标大小。这在实践中根本不可用,原因有二:在不能容忍大延迟的低延迟场景下,反复重新编码可能让编码器严重落后于时间线;而在其他任何场景下,人们会采用基于帧的多线程(一种比典型的基于条带的多线程快得多的多线程编码算法),而这使得重新编码成为不可能。
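The cost of the “re-encode until it fits” strategy can be sketched as follows (hypothetical Python; frame_size stands in for a full encode, and none of these names come from the VP8 code):

“反复重新编码直到符合大小”这一策略的代价可以示意如下(假想的Python代码,frame_size代表一次完整的编码;这些名字都不是VP8代码里的):

```python
# Hypothetical sketch (no real encoder involved): the "re-encode the whole
# frame until it fits" strategy. frame_size(q) stands in for one full,
# expensive encode at quantizer q; a larger q yields a smaller frame.
def encode_until_fit(frame_size, budget, q_start=20, q_max=63):
    encodes = 0
    for q in range(q_start, q_max + 1):
        encodes += 1           # one complete frame encode per attempt
        if frame_size(q) <= budget:
            return q, encodes  # quantizer used, total encodes spent
    return q_max, encodes
```

With a toy size model frame_size(q) = 100000 // q and a 3000-byte budget, starting at q=20 costs 15 full encodes before landing on q=34; an encoder that adapted the quantizer within the frame would hit the budget in a single pass.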

Loop filter: The encoder attempts to optimize the loop filter parameters for maximum PSNR. I’m not quite sure how good an idea this is; every example I’ve seen of this with H.264 ends up creating very bad (often blurry) visual results.

环路滤波器:编码器试图优化环路滤波器(loop filter)的参数以获得最大PSNR。我不太确定这是不是个好主意:我在H.264里见过的每一个这么做的例子,最终都产生非常糟糕(通常是模糊的)的视觉效果。

Overall performance: Even on the absolute fastest settings with multithreading, their encoder is slow. On my 1.6GHz Core i7 it gets barely 26fps encoding 1080p; not even enough to reliably do real-time compression. x264, by comparison, gets 101fps at its fastest preset “ultrafast”. Now, sure, I don’t expect On2’s encoder to be anywhere near as fast as x264, but being unable to stream HD video on a modern quad-core system is simply not reasonable in 2010. Additionally, the speed options are extraordinarily confusing and counterintuitive and don’t always seem to work properly; for example, fast encoding mode (–rt) seems to be ignored completely in 2-pass.

性能总体评价:即使在开多线程的绝对最快设置下,这个编码器仍然很慢。在我的1.6GHz Core i7机器上,它编码1080p只勉强达到26fps,甚至不足以可靠地做实时压缩。相比之下,x264在它最快的预设“ultrafast”下达到了101fps。当然,我并不指望On2的编码器能有x264那么快,但在一台现代四核系统上连高清视频流都编不动,这在2010年是完全说不过去的。此外,速度选项极其混乱、违反直觉,而且并不总能正常工作;例如,快速编码模式(–rt)在2-pass下似乎被完全忽略了。

Overall compression: As said before, compression-wise the encoder does a pretty good job with the spec that it’s given. The slower algorithms in the encoder are clearly horrifically unoptimized (see the comments on motion search and quantization in particular), but they still work.

压缩总体评价:正如前面所说,在压缩方面,编码器把给定的规范用得不错。编码器中较慢的那些算法显然严重缺乏优化(尤其参见前面关于运动搜索和量化的评论),但它们确实能工作。

Decoder: Seems to be straightforward enough. Nothing jumped out at me as particularly bad, slow, or otherwise, besides the code quality issues mentioned above.

解码器:看起来足够直白。除了上面提到的代码质量问题之外,没有什么让我觉得特别糟糕、特别慢或有其它问题的地方。

Practical problems: The encoder and decoder share a staggering amount of code. This means that any bug in the common code will affect both , and thus won’t be spotted because it will affect them both in a matching fashion. This is the inherent problem with any file format that doesn’t have independent implementations and is defined by a piece of software instead of a spec: there are always bugs. RV40 had a hilarious example of this, where a typo of “22″ instead of “33″ resulted in quarter-pixel motion compensation being broken. Accordingly, I am very dubious of any file format defined by software instead of a specification. Google should wait until independent implementations have been created before setting the spec in stone.

实际问题:编码器和解码器共享了数量惊人的代码。这意味着公共代码中的任何错误都会同时影响两者,而且因为影响总是成对匹配地出现,所以不会被发现。任何没有独立实现、由一个软件而不是一份规范来定义的文件格式都有这个内在问题:错误总是存在的。RV40有一个滑稽的例子:一处本该是“33”却被打成“22”的笔误,导致四分之一像素运动补偿整个是坏的。因此,我非常怀疑任何用软件而不是用规范来定义的文件格式。谷歌应该等到出现独立实现之后,再把规范定死。

Update: it seems that what I foresaw is already coming true:  更新:看来我所预见的已经变为现实:

gmaxwell: It survives it with a patch that causes artifacts because their encoder doesn’t clamp MVs properly.

gmaxwell:打上一个补丁后它能通过,但这个补丁会造成画面瑕疵,因为他们的编码器没有正确地钳制运动矢量(MV)。
::cries:: ::哭::
So they reverted my decoder patch, instead of fixing the encoder.

他们回滚了我的解码器补丁,而不是修正编码器。
“but we have many files encoded with this!” “可是我们已经有很多用它编码的文件了!”
so great.. 真是太好了..

single implementation and it depends on its own bugs.

单一的实现,它依赖于它自己的错误。 :(

This is just like Internet Explorer 6 all over again — bugs in the software become part of the “spec”!

这就跟Internet Explorer 6的历史重演一样:软件中的错误成了“规范”的一部分!

Hard PSNR numbers:   确切的PSNR数字:
(Source/target bitrate are the same as in my upcoming comparison.) (源文件/目标比特率与我即将发布的对比评测中所用的相同。)
x264, slowest mode, High Profile: 29.76103db (~28% better than VP8)

x264,最慢模式,High Profile:29.76103分贝(比VP8好约28%)
VP8, slowest mode: 28.37708db (~8.5% better than x264 baseline)

VP8,最慢模式:28.37708分贝(比x264 Baseline Profile好约8.5%)
x264, slowest mode, Baseline Profile: 27.95594db

x264,最慢模式,Baseline Profile:27.95594分贝

Note that these numbers are a “best-case” situation: we’re testing all three optimized for PSNR, which is what the current VP8 encoder specializes in as well.

请注意,这些数字是“最佳情况”:我们测试的三个编码器都为PSNR做了优化,而这也正是当前VP8编码器所专门针对的。

This is not too different from my expectations above as estimated from the spec itself; it’s relatively close to x264’s Baseline Profile.

这与我前面根据规范本身所做的预估相差不大:它与x264的Baseline Profile相对接近。

Keep in mind that this is not representative of what you can get out of VP8 now , but rather what could be gotten out of VP8. PSNR is meaningless for real-world encoding — what matters is visual quality — so hopefully if problems like the adaptive quantization issue mentioned previously can be overcome, the VP8 encoder could be improved to have x264-level psy optimizations. However, as things stand…

请记住,这并不代表你现在能从VP8得到什么,而是将来有可能从VP8得到什么。PSNR对现实世界的编码毫无意义,重要的是视觉质量。所以,如果前面提到的自适应量化等问题能够克服,VP8编码器有望被改进到拥有x264级别的心理视觉优化。然而,就目前的状况来看…

Visual results: Unfortunately, since the current VP8 encoder optimizes entirely for PSNR, the visual results are less than impressive. Here’s a sampling of how it compares with some other encoders.

视觉效果:不幸的是,由于目前的VP8编码器完全为PSNR做优化,视觉效果远谈不上令人印象深刻。下面是它与其它一些编码器的对比样例。

Source and bitrate are the same as above; all encoders are optimized for optimal visual quality wherever possible:

源文件和比特率与上面相同;所有编码器都尽可能为最佳视觉质量做了优化:

Update: I got completely slashdotted and my few hundred gigs of bandwidth ran out in mere hours. The images below have been rehosted, so if you’ve pasted the link somewhere else, check below for the new one.

更新:我被Slashdot了,我的数百G的带宽在仅仅几个小时里就被消耗完了。下面的图片已被迁移了,所以如果你在其它地方粘贴了链接,请使用下面这个新的链接地址。

VP8 (On2 VP8 rc8) (source) (Note: I recently realized that the official encoder doesn’t output MKV, so despite the name, this file is actually a raw VP8 bitstream.)

VP8(On2公司VP8 rc8) (源) (注:我最近才发现,官方编码器不会输出MKV,所以尽管使用这个名字,这个文件实际上是一个原始VP8比特流。)
H.264 (Recent x264) (source)  H.264(最近的x264)(源)
H.264 Baseline Profile (Recent x264) (source)  H.264 Baseline Profile(最近的x264)(源)
Theora (Recent ptalabvorm nightly) (source)  Theora(最近的ptalabvorm每日构建版)(源)
Dirac (Schroedinger 1.0.9) (source)  Dirac(Schroedinger 1.0.9)(源)
VC-1 (Microsoft VC-1 SDK) (source)  VC-1(微软VC-1 SDK)(源)
MPEG-4 ASP (Xvid 1.2.2) (source)  MPEG-4 ASP(Xvid 1.2.2)(源)

The quality generated by On2’s VP8 encoder will probably not improve significantly without serious psy optimizations.

如果没有认真的心理视觉优化,On2的VP8编码器产生的质量大概不会有明显提升。

One further note about the encoder: currently it will drop frames by default, which is incredibly aggravating and may cause serious problems. I strongly suggest anyone using it to turn the frame-dropping feature off in the options.

关于编码器的另一个注解:目前它默认会丢帧,这一点令人非常恼火,并可能导致严重问题。我强烈建议所有使用者在选项中关闭丢帧功能。
Addendum B: Google’s choice of container and audio format for HTML5

附录B:谷歌为HTML5选择的容器和音频格式

Google has chosen Matroska for their container format. This isn’t particularly surprising: Matroska is one of the most widely used “modern” container formats and is in many ways best-suited to the task. MP4 (aka ISOmedia) is probably a better-designed format, but is not very flexible; while in theory it can stick anything in a private stream, a standardization process is technically necessary to “officially” support any new video or audio formats. Patents are probably a non-issue; the MP4 patent pool was recently disbanded, largely because nobody used any of the features that were patented.

谷歌选择了Matroska作为他们的容器格式。这并不特别令人惊讶:Matroska是使用最广泛的“现代”容器格式之一,并且在许多方面最适合这项工作。MP4(又名ISOmedia)可能是设计得更好的格式,但不够灵活:虽然理论上它可以把任何东西塞进一个私有流,但要“正式”支持任何新的视频或音频格式,技术上都需要一个标准化过程。专利很可能不是问题;MP4专利池最近解散了,主要是因为没有人使用其中任何受专利保护的功能。

Another advantage of Matroska is that it can be used for streaming video: while it isn’t typically, the spec allows it. Note that I do not mean progressive download (a’la Youtube), but rather actual streaming, where the encoder is working in real-time. The only way to do this with MP4 is by sending “segments” of video, a very hacky approach in which one is effectively sending a bunch of small MP4 files in sequence. This approach is used by Microsoft’s Silverlight “Smooth Streaming”. Not only is this an ugly hack, but it’s unsuitable for low-latency video. This kind of hack is unnecessary for Matroska. One possible problem is that since almost nobody currently uses Matroska for live streaming purposes, very few existing Matroska implementations support what is necessary to play streamed Matroska files.

Matroska的另一个优点是它可用于流式视频:虽然这不是它的典型用法,但规范允许这么做。注意,我说的不是渐进式下载(类似YouTube),而是真正的流式传输,即编码器实时工作。用MP4做到这一点的唯一方式是发送视频“分段”,一种非常取巧(hacky)的做法,实际上就是按顺序发送一堆小MP4文件。微软Silverlight的“Smooth Streaming”就用了这种办法。这不仅是一个丑陋的hack,而且不适合低延迟视频。对Matroska来说,这种hack是不必要的。一个可能的问题是,目前几乎没有人用Matroska做流媒体直播,所以现有的Matroska实现中,很少有支持播放流式Matroska文件所需功能的。

I’m not quite sure why Google chose to rebrand Matroska; “WebM” is a stupid name.

我不明白为何谷歌要更改 Matroska 的名字,“WebM”是一个愚蠢的名字。

The choice of Vorbis for audio is practically a no-brainer. Even ignoring the issue of patents, libvorbis is still the best general-purpose open source audio encoder. While AAC is generally better at very low bitrates, there aren’t any good open source AAC encoders : faac is worse than LAME and ffmpeg’s AAC encoder is even worse. Furthermore, faac is not free software; it contains code from the non-free reference encoder. Combined with the patent issue, nobody expected Google to pick anything else.

选择Vorbis作为音频格式几乎是不用想的。即使抛开专利问题,libvorbis仍然是最好的通用开源音频编码器。虽然AAC在极低比特率下通常更好,但并没有好的开源AAC编码器:faac比LAME还差,而ffmpeg的AAC编码器更糟。此外,faac不是自由软件;它包含来自非自由的参考编码器的代码。再加上专利问题,没有人预计谷歌会选别的。
Addendum C: Summary for the lazy

附录C:给懒人的摘要

VP8, as a spec, should be a bit better than H.264 Baseline Profile and VC-1. It’s not even close to competitive with H.264 Main or High Profile. If Google is willing to revise the spec, this can probably be improved.

VP8,作为一个规范,应该比H.264 Baseline Profile和VC-1稍好一点。它完全没有与H.264 Main Profile或High Profile竞争的实力。如果谷歌愿意修改规范,这一点或许还能改善。

VP8, as an encoder, is somewhere between Xvid and Microsoft’s VC-1 in terms of visual quality. This can definitely be improved a lot, but not via conventional means.

VP8,作为一个编码器,在视觉质量上介于Xvid和微软的VC-1之间。这肯定还有很大的改进空间,但无法通过常规手段实现。

VP8, as a decoder, decodes even slower than ffmpeg’s H.264. This probably can’t be improved that much.

VP8,作为一个解码器,解码速度甚至比ffmpeg的H.264解码器还要慢。这一点大概改善不了多少。

With regard to patents, VP8 copies way too much from H.264 for anyone sane to be comfortable with it, no matter whose word is behind the claim of being patent-free.

关于专利,VP8复制了太多H.264的方法,任何理智的人都不会觉得舒服,不论是谁说的免专利费。

VP8 is definitely better compression-wise than Theora and Dirac, so if its claim to being patent-free does stand up, it’s an upgrade with regard to patent-free video formats.

VP8 肯定比Theora和 Dirac 有更好的压缩效率,如果它自称的免专利费能够成立,这是免专利费的视频格式的一次升级。

VP8 is not ready for prime-time; the spec is a pile of copy-pasted C code and the encoder’s interface is lacking in features and buggy. They aren’t even ready to finalize the bitstream format, let alone switch the world over to VP8.

VP8还没有准备好进入主流;它的规范就是一堆复制粘贴的C代码,编码器的接口功能缺失且充满bug。他们甚至还没准备好最终确定码流格式,更不用说让全世界转向VP8了。

With the lack of a real spec, the VP8 software basically is the spec; and with the spec being “final”, any bugs are now set in stone. Such bugs have already been found and Google has rejected fixes.

由于缺乏一份真正的规范,VP8软件本身基本上就是规范;而既然规范已经“定稿”,任何错误现在都已板上钉钉。这样的错误已经被发现,而谷歌拒绝了修复。

Google made the right decision to pick Matroska and Vorbis for its HTML5 video proposal.

谷歌做出了正确的决定,挑选了 Matroska 和 Vorbis 来达到 HTML5 视频目的。