Diving into the depths of Widevine L3
Intro
Widevine is a DRM scheme by Google that is used to securely deliver content to the end user; either by secure hardware or obfuscated software. In this post, I will explain how to use the Qiling emulation framework and apply different techniques to break the software-only DRM. In particular, we will cover how to load Android libraries into Qiling, apply Differential Fault Analysis (DFA) on a real-world target, and how emulation can assist in deobfuscating code.
Let’s start with a quick overview of Widevine.
Widevine requires three servers: a Provisioning Server, a License Server, and a Content Server.
The root of trust is called the keybox. It’s a binary blob with the following structure:
| Keybox | Size (bytes) |
|---|---|
| Device ID | 0x20 |
| Device Key | 0x10 |
| Data (version specific) | 0x48 |
| Magic (“kbox”) | 0x4 |
| Checksum | 0x4 |
| Total | 0x80 |
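The layout above can be turned into a small parser. The following is a minimal sketch, not Widevine's own code: the field order and sizes come from the table, while the function name `find_keyboxes`, the dictionary keys, and the synthetic test data are made up for illustration. The checksum is returned as raw bytes because the table does not say how it is computed.

```python
import struct

# Field order and sizes come from the table above; the names are illustrative.
KEYBOX_FORMAT = "32s16s72s4s4s"
KEYBOX_SIZE = struct.calcsize(KEYBOX_FORMAT)  # 0x80 bytes
MAGIC = b"kbox"
MAGIC_OFFSET = 0x20 + 0x10 + 0x48  # the magic follows the first three fields


def find_keyboxes(dump: bytes) -> list[dict]:
    """Return every 0x80-byte window in `dump` whose magic field reads 'kbox'."""
    found = []
    pos = dump.find(MAGIC)
    while pos != -1:
        start = pos - MAGIC_OFFSET
        if start >= 0 and start + KEYBOX_SIZE <= len(dump):
            device_id, device_key, data, _magic, checksum = struct.unpack_from(
                KEYBOX_FORMAT, dump, start
            )
            found.append(
                {
                    "offset": start,
                    "device_id": device_id,
                    "device_key": device_key,
                    "data": data,
                    "checksum": checksum,  # raw bytes, left uninterpreted
                }
            )
        pos = dump.find(MAGIC, pos + 1)
    return found


# Synthetic example data, not a real keybox.
fake_keybox = bytes(range(0x20)) + b"K" * 0x10 + b"D" * 0x48 + MAGIC + b"\x01\x02\x03\x04"
dump = b"\xaa" * 10 + fake_keybox + b"\xbb" * 5
hits = find_keyboxes(dump)
print(hex(KEYBOX_SIZE), hits[0]["offset"], hits[0]["device_key"].hex())
```

Searching for the magic and stepping back 0x78 bytes is enough to locate a candidate keybox in any flat memory dump.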
There are three provisioning models:
- Request a new license directly with a token from the keybox
- Use the keybox to request a device certificate that can be used to request a license
- Let the OEM handle the certificates (which can be used to provision L1 as an OTA update; we’ll come back to what L1 means in just a moment)
Google operates the Provisioning Server and License Server, which handle the creation and management of device certificates. A third-party license proxy connects to the License Server and checks whether a client can request a license for specific content. The Content Server then provides the encrypted content.
Widevine has three security levels, L1, L2, and L3. They all have different requirements.
L1 ensures that Widevine DRM keys and decrypted content are handled exclusively by secure hardware, preventing exposure to the host CPU.
L2 only ensures that the Widevine DRM keys are stored by secure hardware, and the host CPU handles the decrypted content.
L3 runs everything on the host CPU; the keys must be reasonably protected.
The content provider can restrict access to specific content (such as high-definition streams) depending on the security level.
Why did I want to break Widevine
I wanted to break Widevine mostly out of curiosity. Widevine L3 has been broken a lot in recent years, but the projects I knew of all use Frida hooks to dump the device key from a running Widevine session. While this only requires root access to an Android device, it felt like there has to be another way which requires fewer privileges. I was encouraged by reading the paper Exploring Widevine for Fun and Profit, which provides a really nice overview on how the architecture works. They claim that the keybox itself is not really protected and can just be dumped from memory. The paper also mentions that the actual cryptography has been broken twice: once by David Buchanan, who used Differential Fault Analysis to break the white-box AES used to protect the keybox of Chrome’s Widevine version in 2019, and a year later by Tomer Hadad, who broke an updated version that used RSA and released a Chrome extension that could be used to decrypt the content. However, it seems controversial who was involved in it, according to this post.
How did I want to break Widevine
At this point, I knew nearly nothing about the inner workings of the L3 version I was planning to work on. I just followed a blog post that used the Android Studio Emulator and dumped the device key using the Frida script mentioned in the blog post in order to understand the flow a bit better and later verify whether my approach works or not.
To understand how OEMCrypto (the component used for DRM on Android) works, I did some dynamic analysis using Frida to figure out how the different functions for the L3 interface behave and which arguments they need. The Exploring Widevine for Fun and Profit paper contains a reverse-engineered symbol mapping. For example, oecc01 initializes the DRM context, and the _lccXX functions are the corresponding L3 functions.
While Frida is a powerful dynamic analysis tool, I prefer a more isolated and reproducible environment to play around with. When solving and writing CTF challenges, I got familiar with Qiling, a Python project that uses Unicorn under the hood to emulate the userspace and reimplement the OS layer. It is quite powerful and easily hackable if something doesn’t work as intended (which happens often, but that’s fine). So, my first goal was to get Widevine to run inside Qiling. This is also where I spent most of my time during the project.
Getting Widevine to run inside Qiling for easier analysis
To be able to run Widevine, I first had to load it in the emulator.
The library implementing all Widevine L3 operations is libwvhidl.so. I just wanted to instrument this library, not the whole program used on a real Android device (android.hardware.drm@1.1-service.widevine, the Android DRM service). Therefore, I tried to trick the loader into thinking the library was just an executable so that it loads all libraries it depends on for me. I have done this a few times, and it is usually really easy using LIEF. There is even a short post on how to do it in their docs, but for some reason, the patched library ended up in a corrupted state, which caused problems when applying relocations. After hours of debugging, I figured out that I could get a working ELF like this:
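    import lief

    lib = lief.parse(src)
    # 0x6000000F has no named tag here, so the dynamic entry is looked up by
    # its raw value and its address is bumped by one page (0x1000).
    lib[lief.ELF.DynamicEntry.TAG.from_value(0x6000000F)].value = (
        lib[lief.ELF.DynamicEntry.TAG.from_value(0x6000000F)].value + 0x1000
    )
    lib.interpreter = b"/system/bin/linker"
    lib.write(dst)

The loader happily loads the library, and it magically works ¯\_(ツ)_/¯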
With this, I could manually execute the _lcc01 function to initialize the DRM, which creates /data/vendor/mediadrm/IDM1013/L3/ay64.dat, the encrypted keybox.
To verify if everything works as intended, I just copied all files of /data/vendor/mediadrm/ into my Qiling root filesystem and initialized the DRM. However, it didn’t “accept” my keybox and always created a new one, so I wasn’t sure whether it was a proper key or contained a “garbage” key.
The other thing I noticed was that if I ran it multiple times, the encrypted keybox content changed, even if I cleaned the filesystem between runs. This is when I learned that Qiling is not reproducible. After linking /dev/random to /dev/zero, hooking the getrandom syscall to always write zeroes, and hooking the clock_gettime and gettimeofday syscalls to return a static time, I was at least able to get a reproducible result within Qiling. But after hooking the same functions with Frida on the Android emulator, I still got different encrypted keyboxes.
    from math import floor
    from qiling import Qiling
    from qiling.os.linux.syscall import __get_timespec_struct
    from qiling.const import *
    from qiling.os.const import *
    import random

    random.seed(0)

    FAKE_TIME = 170000000

    def get_faketime():
        tv_sec = floor(FAKE_TIME)
        tv_nsec = floor((FAKE_TIME - tv_sec) * 1e6)
        ts_cls = __get_timespec_struct(32)
        return ts_cls(tv_sec=tv_sec, tv_nsec=tv_nsec)

    def hook_clock_gettime(ql: Qiling, clock_id: int, tp: int):
        ql.mem.write(tp, b"\x00" * 8)
        return 0

    def hook_gettimeofday(ql: Qiling, tv: int, tz: int):
        if tv:
            ql.mem.write(tv, bytes(get_faketime()))
        if tz:
            ql.mem.write(tz, b"\x00" * 8)
        return 0

    def hook_getrandom(ql: Qiling, buf: int, buflen: int, flags: int):
        ql.mem.write(buf, b"\x00" * buflen)
        return buflen

    def make_deterministic(ql: Qiling):
        ql.os.set_syscall("getrandom", hook_getrandom, QL_INTERCEPT.CALL)
        ql.os.set_syscall("clock_gettime", hook_clock_gettime, QL_INTERCEPT.CALL)
        ql.os.set_syscall("gettimeofday", hook_gettimeofday, QL_INTERCEPT.CALL)
        ql.add_fs_mapper("/dev/urandom", "/dev/zero")

At this point, I had two theories. Either there was still a difference in some syscall handling, or the obfuscated part somehow detected that this was not a real device. I traced all syscalls, and there was nothing suspicious. At this point, I was certain that they had some fancy detection in place, which I was hoping that they were doing before the DRM initialization.
Thus, I hooked the keybox read in Frida and wrote a script that would dump the entire memory and CPU state to later be loaded into Qiling. For some reason, the hook could not read all memory regions the process had mapped. Therefore, I created a shell script that used dd to dump it externally from procfs. Some registers were missing from Frida’s CPU state, but they didn’t matter. After writing a simple loader and patching out the modifications Frida made to the process memory for their hooks, I got it to run inside the emulator. Unfortunately, it still produced a different key.
At that point, I was frustrated and started tracing and diffing all executed instructions. While I provided my emulator with the device properties from the filesystem, they weren’t resolved correctly. I noticed that Widevine checks ro.serialno after hooking __system_property_get. Once I had implemented a hook that returns the Android emulator serial number in my Qiling-based emulator, it finally accepted the keybox and didn’t create a new one.
Using the Qiling emulator, I verified the paper’s claim that the keybox can be dumped by searching the memory for kbox when hooking munmap. At this point, the only thing left was to break the crypto used to protect the keybox.
When initializing Widevine, it logged the following to Logcat: WVCdm : [(0):] Level3 Library 4464 Apr 20 2018 14:54:35. I had set up the emulator with Android 9 because the blog post I followed for setting up the Android emulator instructed me to do so, so the Widevine version is really old. Still, with this knowledge, I was certain that the keybox was protected using white-box AES, as David Buchanan noted in this tweet. Since I had no idea how DFA worked in practice, I read this great blog post by Quarkslab from 2018, which explains in depth how it works. For our purposes, we have to do the following:
- Identify the AES operations
- Inject a fault at a precise location that corrupts the internal state a bit
- Generate many faulted outputs for the same input
- Throw the result into phoenixAES
- Profit?
Well, that sounds easy (and it is, as you will see later); the hardest part is actually identifying the AES operations. They wrote a tracing framework called TraceGraph that visualizes execution and memory accesses. Their framework only works for Valgrind and PIN, but not Qiling. I wrote a small Python script replicating their idea. It created an image from my execution, but the output was too large to analyze. Hence, I reduced the resolution slightly, resulting in promising-looking pictures at first glance. By logging all memory reads and writes, I also narrowed down the area I had to record by looking at when I could identify the encrypted keybox and the decrypted keybox in memory. The resulting image looked like this:
Execution of Widevine L3 inside Qiling: the x-axis is the memory address, and the y-axis is time. Green pixels represent reads, blue pixels represent writes, and black pixels represent the bytes executed.
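The idea behind such a picture can be sketched without any imaging library. This is a minimal illustration and not the script used for the post: the `(step, address, kind)` tuple format, the function name `render_trace`, and the sample accesses are assumptions. It downsamples the trace into a small grid in which columns are address buckets and rows are time buckets.

```python
def render_trace(accesses, addr_min, addr_max, width, height):
    """Bucket (step, address, kind) events into a width x height character grid.

    kind is 'r' (read), 'w' (write) or 'x' (executed); later events overwrite
    earlier ones that fall into the same cell.
    """
    grid = [[" "] * width for _ in range(height)]
    last_step = max(step for step, _, _ in accesses)
    span = addr_max - addr_min
    for step, addr, kind in accesses:
        if not addr_min <= addr < addr_max:
            continue  # outside the recorded address window
        x = (addr - addr_min) * width // span
        y = step * height // (last_step + 1)
        grid[y][x] = kind
    return ["".join(row) for row in grid]


# Hypothetical trace: one executed address, one read, one write.
sample = [(0, 0x1000, "x"), (1, 0x2000, "r"), (2, 0x2FF0, "w")]
rows = render_trace(sample, addr_min=0x1000, addr_max=0x3000, width=8, height=3)
for row in rows:
    print("|" + row + "|")
```

Shrinking `width` and `height` is the same resolution trade-off described above: coarser buckets make the picture manageable but merge neighbouring accesses into one cell.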
Generating traces and visualizing them using TraceGraph
I quickly realized that while the results looked promising, it was really hard to work with the image. Most viewers wouldn’t open it at all, and while GIMP could open it, navigating it was really slow due to the image’s size. I ended up creating a TraceGraph-compatible database. TraceGraph is really slow and lags a lot when using large databases. Still, it performed significantly better than my GIMP approach, so I looked around for any alternatives. I was even thinking about rewriting it, but I concluded it would be more effort than I was willing to invest. In the end, I wrote a simple Python module. It has some bugs when resolving memory accesses to instructions, but it’s useful for recognizing patterns.
Finding the AES operations in obfuscated code
After loading a trace of the L3 initialization, it can be inspected visually. Here is my approach to finding the AES.
TraceGraph visualization of L3 initialization: the x-axis (left➛right) is the memory address, and the y-axis (top➛bottom) is time. Green represents reads, red represents writes, and black pixels represent the bytes executed.
It turned out that the AES operations were surprisingly easy to find despite the code’s obfuscation.
TraceGraph visualization of L3 initialization highlighting the AES operations: the x-axis is the memory address, and the y-axis is time. Green represents reads, red represents writes, and black pixels represent the bytes executed.
TraceGraph visualization showing four large and one small clusters of read operations and a repeating structure of execution operations
The four boxes on the left look like lookup tables, and the one on the right contains key-related stuff (more on that later). The internal state is managed on the stack. Let’s focus on the code flow for a bit.
This image shows that the instructions are executed in three groups.
Zooming in on one group: There are two distinct blocks of instructions. Both blocks look similar.
The high-level AES algorithm looks as follows:
    KeyExpansion
    AddRoundKey

    for i in range(rounds-1):
        Substitution
        ShiftRows
        MixColumns
        AddRoundKey

    Substitution
    ShiftRows
    AddRoundKey

We have 9 “identical” rounds, so it could be AES-128 (which has 10 rounds in total). I wonder why they use different code for odd and even rounds.
The first round starts with some extra code.
Visualization of the first block, which has some extra code at the beginning
The last round ends with some additional code.
The last block is never used before and continues after the second last block
My initial assumption was the following:
The first round should start with additional KeyExpansion and AddRoundKey and have some extra code. The last round should not contain the MixColumns operation. Since it is between the other operations, having this as “different” code can make sense when inlined.
However, a player on my CTF team later pointed out that it looks like a t-table implementation.
Here is how it could have been spotted from the trace. Remember the four boxes from earlier?
Four blocks showing the t-tables
These boxes are the T-tables.
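For readers who have not met T-tables before, here is a minimal sketch of the textbook construction for AES encryption. It is not taken from the library: the names `TE0` to `TE3` and the helper functions are illustrative, and the tables in libwvhidl.so are numbered and ordered differently. Each table entry packs one S-box output multiplied by a column of the MixColumns matrix, so one round column becomes four lookups and three XORs.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox() -> list[int]:
    """Derive the AES S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def rotr8(word: int) -> int:
    """Rotate a 32-bit word right by one byte."""
    return ((word >> 8) | (word << 24)) & 0xFFFFFFFF


SBOX = build_sbox()
# TE0[x] packs (2*S[x], S[x], S[x], 3*S[x]); the other tables are byte rotations.
TE0 = [(gf_mul(s, 2) << 24) | (s << 16) | (s << 8) | gf_mul(s, 3) for s in SBOX]
TE1 = [rotr8(w) for w in TE0]
TE2 = [rotr8(w) for w in TE1]
TE3 = [rotr8(w) for w in TE2]


def column_with_tables(a0, a1, a2, a3):
    """SubBytes + MixColumns for one column using four table lookups."""
    return TE0[a0] ^ TE1[a1] ^ TE2[a2] ^ TE3[a3]


def column_reference(a0, a1, a2, a3):
    """The same column computed step by step, for comparison."""
    s0, s1, s2, s3 = (SBOX[a] for a in (a0, a1, a2, a3))
    b0 = gf_mul(s0, 2) ^ gf_mul(s1, 3) ^ s2 ^ s3
    b1 = s0 ^ gf_mul(s1, 2) ^ gf_mul(s2, 3) ^ s3
    b2 = s0 ^ s1 ^ gf_mul(s2, 2) ^ gf_mul(s3, 3)
    b3 = gf_mul(s0, 3) ^ s1 ^ s2 ^ gf_mul(s3, 2)
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3


print(hex(TE0[0]), column_with_tables(1, 2, 3, 4) == column_reference(1, 2, 3, 4))
```

ShiftRows does not appear explicitly because it only decides which state bytes are fed into the four lookups, which is why it shows up as index shuffling in a T-table implementation.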
Here is how they implemented it:
    # generate round keys (rk)
    # generate t-tables (T0-T3)

    s0 = rk[0] ^ IN[0:4]
    s1 = rk[1] ^ IN[4:8]
    s2 = rk[2] ^ IN[8:12]
    s3 = rk[3] ^ IN[12:16]

    for i in range(rounds-1):
        t0 = rk[(i*8)+ 4] ^ T3[s0 >> 0x18] ^ T2[s1 >> 0x10 & 0xff] ^ T1[s2 >> 8 & 0xff] ^ T0[s3 & 0xff]
        t1 = rk[(i*8)+ 5] ^ T3[s1 >> 0x18] ^ T2[s2 >> 0x10 & 0xff] ^ T1[s3 >> 8 & 0xff] ^ T0[s0 & 0xff]
        t2 = rk[(i*8)+ 6] ^ T3[s2 >> 0x18] ^ T2[s3 >> 0x10 & 0xff] ^ T1[s0 >> 8 & 0xff] ^ T0[s1 & 0xff]
        t3 = rk[(i*8)+ 7] ^ T3[s3 >> 0x18] ^ T2[s0 >> 0x10 & 0xff] ^ T1[s1 >> 8 & 0xff] ^ T0[s2 & 0xff]
        s0 = rk[(i*8)+ 8] ^ T3[t0 >> 0x18] ^ T2[t1 >> 0x10 & 0xff] ^ T1[t2 >> 8 & 0xff] ^ T0[t3 & 0xff]
        s1 = rk[(i*8)+ 9] ^ T3[t1 >> 0x18] ^ T2[t2 >> 0x10 & 0xff] ^ T1[t3 >> 8 & 0xff] ^ T0[t0 & 0xff]
        s2 = rk[(i*8)+10] ^ T3[t2 >> 0x18] ^ T2[t3 >> 0x10 & 0xff] ^ T1[t0 >> 8 & 0xff] ^ T0[t1 & 0xff]
        s3 = rk[(i*8)+11] ^ T3[t3 >> 0x18] ^ T2[t0 >> 0x10 & 0xff] ^ T1[t1 >> 8 & 0xff] ^ T0[t2 & 0xff]
    i+=1

    t0 = rk[(i*8)+ 4] ^ T3[s0 >> 0x18] ^ T2[s1 >> 0x10 & 0xff] ^ T1[s2 >> 8 & 0xff] ^ T0[s3 & 0xff]
    t1 = rk[(i*8)+ 5] ^ T3[s1 >> 0x18] ^ T2[s2 >> 0x10 & 0xff] ^ T1[s3 >> 8 & 0xff] ^ T0[s0 & 0xff]
    t2 = rk[(i*8)+ 6] ^ T3[s2 >> 0x18] ^ T2[s3 >> 0x10 & 0xff] ^ T1[s0 >> 8 & 0xff] ^ T0[s1 & 0xff]
    t3 = rk[(i*8)+ 7] ^ T3[s3 >> 0x18] ^ T2[s0 >> 0x10 & 0xff] ^ T1[s1 >> 8 & 0xff] ^ T0[s2 & 0xff]

    OUT[0:4]   = rk[(i*8)+ 8] ^ T1[t0 >> 0x18] & 0xff000000 ^ T0[t1 >> 0x10 & 0xff] & 0xff0000 ^ T3[t2 >> 8 & 0xff] & 0xff00 ^ T2[t3 & 0xff] & 0xff
    OUT[4:8]   = rk[(i*8)+ 9] ^ T1[t1 >> 0x18] & 0xff000000 ^ T0[t2 >> 0x10 & 0xff] & 0xff0000 ^ T3[t3 >> 8 & 0xff] & 0xff00 ^ T2[t0 & 0xff] & 0xff
    OUT[8:12]  = rk[(i*8)+10] ^ T1[t2 >> 0x18] & 0xff000000 ^ T0[t3 >> 0x10 & 0xff] & 0xff0000 ^ T3[t0 >> 8 & 0xff] & 0xff00 ^ T2[t1 & 0xff] & 0xff
    OUT[12:16] = rk[(i*8)+11] ^ T1[t3 >> 0x18] & 0xff000000 ^ T0[t0 >> 0x10 & 0xff] & 0xff0000 ^ T3[t1 >> 8 & 0xff] & 0xff00 ^ T2[t2 & 0xff] & 0xff

Perform DFA to get the key
Now that we have identified the different rounds and know the AES key size, we must corrupt a single byte of the AES state between the last two MixColumns operations.
Somewhere at the beginning of this blob of code should be fine.
The second-to-last block containing the execution between the last two MixColumns AES operations
I decided to introduce a fault by just skipping an instruction. Other methods could involve corrupting registers or directly writing to the internal state on the stack.
I instructed Qiling to take a snapshot directly before the decryption and hooked the first instruction where I could identify the decrypted result in the memory dump.
Then, I started skipping instructions in the area that should have been inside this time frame until I noticed that exactly four bytes had changed in the output.
Trace showing that four AES output bytes have changed after having introduced a fault
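The check for a "good" fault can be automated. Below is a minimal sketch; the helper names `changed_positions` and `is_useful_fault` are made up, and the two hex strings are the reference and faulty outputs that appear in the trace file later in this post. A fault is kept only if exactly four bytes differ and they fall into four different 32-bit words.

```python
def changed_positions(reference: bytes, candidate: bytes) -> list[int]:
    """Indices of the bytes that differ between the two outputs."""
    return [i for i, (a, b) in enumerate(zip(reference, candidate)) if a != b]


def is_useful_fault(reference: bytes, candidate: bytes) -> bool:
    """True if exactly four bytes changed, one in each 32-bit word."""
    positions = changed_positions(reference, candidate)
    return len(positions) == 4 and sorted(p // 4 for p in positions) == [0, 1, 2, 3]


reference = bytes.fromhex("4f78655756565656516471744a7a4e67")
faulty = bytes.fromhex("b878655756d85656516418744a7a4e0c")
print(changed_positions(reference, faulty), is_useful_fault(reference, faulty))
```

Outputs with no change (the skipped instruction did not matter) or with many changed bytes (the fault was injected too early) can simply be dropped before the analysis.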
Side track: How does AES DFA work?
Now that we have a working setup, let’s work backwards to figure out why this works. We have a correct fault if exactly four output bytes are different. This is what happens when we corrupt t3 and how it affects the output:
`t3_` is the faulted variable `t3`
    OUT[0:4]   = [...] ^ T2[t3_ & 0xff] & 0xff
    OUT[4:8]   = [...] ^ T3[t3_ >> 8 & 0xff] & 0xff00
    OUT[8:12]  = [...] ^ T0[t3_ >> 0x10 & 0xff] & 0xff0000
    OUT[12:16] = [...] ^ T1[t3_ >> 0x18] & 0xff000000

The output mostly stays the same except for 4 bytes, one in each dword.
But how can we recover the key from this?
The round keys are derived from the AES key. All involved operations are invertible, so the AES key can be calculated from any round key.
Hence, we only need to somehow recover the round key.
Let’s assume we know the value of t3, then the first output byte is calculated like this:
    OUT[0] = rk[(i*8)+ 8] ^ T2[t3 & 0xff] & 0xff

T2 is no secret; calculate OUT[0] ^ T2[t3 & 0xff] & 0xff to get a byte of the last round key.
We don’t know the value of t3, but maybe we can guess it! Actually, we only need to know the value of t3 & 0xff, which is a single byte, so there are just 256 possibilities. But how can we find the correct one?
If we XOR the output byte of the fault and the original value (this is the Differential in DFA), we get the following:
    OUT[0]  = rk[(i*8)+ 8] ^ T2[t3 & 0xff] & 0xff
    OUT_[0] = rk[(i*8)+ 8] ^ T2[t3_ & 0xff] & 0xff

    OUT[0] ^ OUT_[0] = T2[t3 & 0xff] & 0xff ^ T2[t3_ & 0xff] & 0xff

We can now use the last equation as an oracle. If we try out all combinations of t3 & 0xff and t3_ & 0xff and only note down the values for which this equation holds, we get a list of possible values for t3 & 0xff. By using more and more different faults, we can narrow down t3 & 0xff to a single value and recover the first byte of the last round key.
Do it for all t0-t3 and all output bytes, and you will get the whole round key.
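The narrowing step can be tried on a toy model. The sketch below is not phoenixAES and not the library's code; it works in the encryption direction on a single column, ignores ShiftRows, and uses made-up state and key bytes. It also spells out one detail the byte-wise view glosses over: when a single byte is corrupted before the last MixColumns, the four changed bytes are tied together as multiples of one unknown fault value (2, 1, 1, 3 for a fault in the first row, which is assumed here), and that tie is what lets wrong guesses be discarded.

```python
from itertools import product


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox() -> list[int]:
    """Derive the AES S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
COEFFS = (2, 1, 1, 3)  # MixColumns multipliers for a fault in row 0 (assumption)


def final_round(state_bytes, key_bytes):
    """Toy final round for one column: SubBytes, then AddRoundKey."""
    return [SBOX[x] ^ k for x, k in zip(state_bytes, key_bytes)]


def key_candidates(good, bad):
    """All 4-byte round-key guesses that explain one good/faulty output pair."""
    diffs = [g ^ b for g, b in zip(good, bad)]
    guesses = set()
    for fault in range(1, 256):
        options = []
        for coeff, diff in zip(COEFFS, diffs):
            delta = gf_mul(fault, coeff)
            options.append([x for x in range(256) if SBOX[x] ^ SBOX[x ^ delta] == diff])
        if all(options):  # every byte must be explainable by the same fault value
            for xs in product(*options):
                guesses.add(tuple(g ^ SBOX[x] for g, x in zip(good, xs)))
    return guesses


# Made-up secret values for the simulation.
STATE = [0x12, 0x34, 0x56, 0x78]
ROUND_KEY = [0xDE, 0xAD, 0xBE, 0xEF]
good = final_round(STATE, ROUND_KEY)

remaining = None
for fault in (0x01, 0x5A, 0xC3, 0x7F):
    faulty_state = [x ^ gf_mul(fault, c) for x, c in zip(STATE, COEFFS)]
    bad = final_round(faulty_state, ROUND_KEY)
    guesses = key_candidates(good, bad)
    remaining = guesses if remaining is None else remaining & guesses
    print(f"fault {fault:#04x}: {len(remaining)} candidate(s) left")
```

Each additional fault intersects the candidate set with a fresh one, so the true round-key bytes always survive while the wrong guesses drop out after a few faults.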
~ Side Track END!
Luckily, we don’t need to implement this ourselves: phoenixAES is a Python module that does the heavy lifting.
We only have to produce a few faults and create a file with them. The first line should be the output without a fault.
So, in my case, the file would look like this:
    4f78655756565656516471744a7a4e67
    b878655756d85656516418744a7a4e0c
    ...

    import phoenixAES

    print(phoenixAES.crack_file("tracefile", encrypt=False))

Having generated a few faults (fewer than 10), the script started printing parts of the key.
Few bytes of AES key cracked…
Having generated enough faults for every output byte, we can finally compute the key.
Full AES key cracked
Verification
I then verified that the key was correct by decrypting the keybox stored on the device in /data/vendor/mediadrm/IDM1013/L3/ay64.dat.
    import struct
    from Crypto.Cipher import AES  # PyCryptodome

    ROOT_KEY = bytes.fromhex("67B2963950E3ED2E3DC49D5740982BAC")

    with open("<rootfs>/data/vendor/mediadrm/IDM1013/L3/ay64.dat", "rb") as f:
        keybox = f.read()

    decrypted_keybox = AES.new(ROOT_KEY, AES.MODE_CBC, iv=b"\x00" * 16).decrypt(keybox)
    device_id, device_key, device_data, magic, crc = struct.unpack(
        "32s16s72s4s4s", decrypted_keybox
    )
    assert magic == b"kbox"
    print("Device id:", device_id.hex())
    print("Device key:", device_key.hex())
    print("Device data:", device_data.hex())
Decrypted keybox using the cracked key
Impact
With the device key, it is now possible to decrypt the wrapped RSA key of the device certificate and, after following the key ladder, finally decrypt the content key. The content key can then be used to decrypt the media. The keybox can also be used to request new device certificates.
Initially, this was where my journey with Widevine was intended to end. But while writing this blog post, something got me hooked. When I took screenshots of the traces, I noticed more AES operations before the keybox encryption. From just looking at the traces, I could not figure out what they were used for, so I decided it was time to break the obfuscation.
Deobfuscating Widevine L3
To break the obfuscation, I tried to first understand how the initialization code works. The function names and global symbols were obfuscated. For this blog post, I left them as they were so that it is possible to follow along with the same version of libwvhidl.so.
The initialization function is called _lcc01. After logging the Widevine version string, it fills a 16-byte buffer with static, seemingly random values and constructs a large array structure (fgqcrnsl). Each entry consists of five 32-bit integers. Here is the structure definition:
    struct wv_vm_entry {
        uint32_t offset;
        uint32_t size;
        uint32_t always_zero;
        uint32_t checksum;
        uint32_t static_def;
    };

Here is the decompiler output. Note that the listing uses slightly different field names (offet, always_static) than the struct definition above:

      *piVar2 = (int)(piVar2 + 1);
      pthread_mutex_lock((pthread_mutex_t *)(piVar2 + 3));
      wvcdm::Log("","",0,2,"Level3 Library 4464 Apr 20 2018 14:54:35");
      CHAR_ARRAY_0038b9e0[0] = 'M';
      CHAR_ARRAY_0038b9e0[1] = -0x20;
      CHAR_ARRAY_0038b9e0[2] = '<';
      CHAR_ARRAY_0038b9e0[3] = 'j';
      CHAR_ARRAY_0038b9e0[4] = -0x75;
      CHAR_ARRAY_0038b9e0[5] = '\t';
      CHAR_ARRAY_0038b9e0[6] = 'f';
      CHAR_ARRAY_0038b9e0[7] = -0x5e;
      CHAR_ARRAY_0038b9e0[8] = -8;
      CHAR_ARRAY_0038b9e0[9] = -0x14;
      CHAR_ARRAY_0038b9e0[10] = 'W';
      CHAR_ARRAY_0038b9e0[0xb] = -0x47;
      CHAR_ARRAY_0038b9e0[0xc] = -3;
      CHAR_ARRAY_0038b9e0[0xd] = -0x55;
      CHAR_ARRAY_0038b9e0[0xe] = '\0';
      CHAR_ARRAY_0038b9e0[0xf] = '\"';
      fgqcrnsl[0x2ed].offet = 0;
      fgqcrnsl[0x2ed].size = 0x214;
      fgqcrnsl[0x2ed].always_zero = 0;
      fgqcrnsl[0x2ed].checksum = 0;
      fgqcrnsl[0x2ed].always_static = uRam0038b9c0;
      fgqcrnsl[0x2cb].always_static = uRam0038b9c0;
      fgqcrnsl[0x2cb].offet = 0x214;
      fgqcrnsl[0x2cb].size = 0x214;
      fgqcrnsl[0x2cb].always_zero = 0;
      fgqcrnsl[0x2cb].checksum = 0;
      fgqcrnsl[0x21a].offet = 0x428;
      fgqcrnsl[0x21a].size = 0x214;
      fgqcrnsl[0x21a].always_zero = 0;
      fgqcrnsl[0x21a].checksum = 0;
      fgqcrnsl[0x21a].always_static = uRam0038b9c0;
      fgqcrnsl[0x363].offet = 0x63c;
      fgqcrnsl[0x363].size = 0x8dd;
      fgqcrnsl[0x363].always_zero = 0;

Most of the function’s code sets up the fgqcrnsl array. Later, some functions are assigned to global memory. These turned out to be handlers/syscalls for a virtual machine running the obfuscated code. Since Widevine is deployed on many different platforms, having some abstraction/interface layer makes sense. That way, the device vendor can write platform-specific code, like how to generate a device’s ID, without the need to modify the Widevine L3 logic.
      fgqcrnsl[0x259].always_static = uRam0038b9c0;
      fgqcrnsl[0x259].offet = 0xf9900;
      fgqcrnsl[0x259].size = 0x214;
      fgqcrnsl[0x259].always_zero = 0;
      fgqcrnsl[0x259].checksum = 0;
      DAT_0038bbb0 = FUN_0008c100();
      piVar2 = (int *)DAT_0038bbc4;
      uRam0038b988 = 0;
      *(int *)((int)DAT_0038bbc4 + 0x10) = 0;
      DAT_0038b920 = &DAT_0038ba40;
      pcRam0038b990 = wvoec3::clear_cache_function;
      DAT_0038b8f8 = &DAT_0038ba40;
      pcRam0038baa8 = memcmp;
      uStack_1c = 0;
      pcRam0038baac = memset;
      _DAT_0038ba40 = cmiqoqlf;
      pcRam0038ba44 = rbovbloj;
      pcRam0038ba48 = edxbgfhs;
      pcRam0038ba4c = wvoec3::clear_cache_function;
      pcRam0038ba58 = adpveuve;
      pcRam0038ba50 = aykilcti;
      pcRam0038ba54 = rfdncxfe;
      pcRam0038ba5c = pnvgwxew;
      pcRam0038ba60 = htxvewae;

Once everything has been initialized, the function starts the VM by calling the cwkfcplc function.
      VM_CONTEXT[0x2c] = clock_gettime;
      VM_CONTEXT[0x2d] = wvoec3::generate_entropy;
      VM_CONTEXT[0x2e] = tfdlgwrh;
      VM_CONTEXT[0x2f] = ogfecvcl;
      piVar2[4] = 0;
      cwkfcplc(0x18c,0x198,VM_CONTEXT,&uStack_1c);
      pthread_mutex_unlock((pthread_mutex_t *)((int)DAT_0038bbc4 + 0xc));
      if (___stack_chk_guard == local_18) {
        return uStack_1c;
      }
                        /* WARNING: Subroutine does not return */
      __stack_chk_fail();

The first argument is the VM internal function ID to start at, and the second is the VM internal function ID where the VM should exit. The third argument is near the destination of some of the VM function assignments. I called it VM_CONTEXT. The remaining arguments appear to be variable.
    /* WARNING: Globals starting with '_' overlap smaller symbols at the same address */

    void cwkfcplc(int start_function_id,int end_function_id,undefined4 VM_CONTEXT,...)

    {
      char *pcVar1;
      int iVar2;
      int iVar3;
      int iVar4;
      undefined4 uVar5;
      int *piVar6;

      iVar2 = ___stack_chk_guard;
      uVar5 = 0x20e4d1;
      piVar6 = (int *)&__stack_chk_guard;
      if (start_function_id != end_function_id) {
        iVar4 = DAT_0038bbb8;
        do {
          // ...
          start_function_id = zmpczfhk(start_function_id,VM_CONTEXT,&stack0x00000010,uVar5,piVar6,&stack0x00000010);
          // ...
        } while (start_function_id != end_function_id);
      }
      if (*piVar6 == iVar2) {
        return;
      }
                        /* WARNING: Subroutine does not return */
      __stack_chk_fail();
    }

Here, we can see the function-call indirection in action. We have a while-loop which always calls the same function: zmpczfhk (it probably somehow executes the VM code). The first argument holds a reference to the VM function that should be executed. The return value is compared against the end_function_id. If it matches, the loop exits; if it doesn’t, zmpczfhk gets called with the new start_function_id. This technique is often used to flatten the control flow, making it harder to analyze the program dynamically since there is no call stack.
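The dispatcher pattern is easy to model. This is a minimal sketch with hypothetical function IDs and handlers, not a reconstruction of the Widevine VM: every handler does its work and returns the ID of the next handler, and a single loop keeps dispatching until the exit ID comes back.

```python
def run_flattened(start_id, end_id, handlers, context):
    """Dispatch handlers by ID until one of them returns end_id."""
    executed = []
    current = start_id
    while current != end_id:
        executed.append(current)
        current = handlers[current](context)
    return executed


# Hypothetical handlers; each returns the ID of the handler to run next.
def step_init(ctx):
    ctx.append("init")
    return 3


def step_work(ctx):
    ctx.append("work")
    return 2


def step_cleanup(ctx):
    ctx.append("cleanup")
    return 9  # 9 plays the role of end_function_id


handlers = {1: step_init, 3: step_work, 2: step_cleanup}
log = []
order = run_flattened(1, 9, handlers, log)
print(order, log)
```

At any point only the loop and one handler are on the stack, which is why a debugger backtrace says nothing about how execution got there; logging the IDs, as `executed` does here, recovers the control flow.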
Let’s take a look at zmpczfhk now.
    /* WARNING: Globals starting with '_' overlap smaller symbols at the same address */

    undefined4 zmpczfhk(int function_id,undefined4 *VM_CONTEXT,undefined4 param_3)

    {
      byte bVar1;
      int iVar2;
      uint uVar3;
      code *pcVar4;
      undefined4 uVar5;
      int iVar6;
      uint uVar7;
      byte abStack_38 [16];
      int local_28 [4];
      int local_18 [2];

      local_18[0] = ___stack_chk_guard;
      iVar2 = -0x10;
      iVar6 = function_id;
      do {
        bVar1 = *(byte *)((int)VM_CONTEXT + iVar2 + 0x58);
        iVar6 = iVar6 * 0x19660d + 0x3c6ef35f;
        *(char *)((int)local_28 + iVar2) = (char)((uint)iVar6 >> 8);
        *(byte *)((int)local_18 + iVar2) = bVar1 ^ (byte)iVar6;
        iVar2 = iVar2 + 1;
      } while (iVar2 != 0);
      uVar3 = (*(code *)VM_CONTEXT[4])(function_id);
      iVar6 = (*(code *)*VM_CONTEXT)(uVar3);
      iVar2 = (*(code *)VM_CONTEXT[6])(function_id);
      if (uVar3 != 0) {
        uVar7 = 0;
        do {
          local_28[0] = local_28[0] * 0x19660d + 0x3c6ef35f;
          *(byte *)(iVar6 + uVar7) =
               abStack_38[uVar7 & 0xf] ^ *(byte *)(iVar2 + uVar7) ^ (byte)((uint)local_28[0] >> 0x10);
          uVar7 = uVar7 + 1;
        } while (uVar3 != uVar7);
      }
      pcVar4 = (code *)(*(code *)VM_CONTEXT[7])(function_id,iVar6);
      (*(code *)VM_CONTEXT[2])(iVar6,uVar3,VM_CONTEXT);
      (*(code *)VM_CONTEXT[5])(function_id,iVar6,uVar3);
      uVar5 = (*pcVar4)(function_id,VM_CONTEXT,param_3);
      (*(code *)VM_CONTEXT[1])(iVar6,uVar3);
      if (___stack_chk_guard == local_18[0]) {
        return uVar5;
      }
                        /* WARNING: Subroutine does not return */
      __stack_chk_fail();
    }

The code uses a linear congruential generator seeded with the function_id. This is used to fill two 16-byte buffers. The decompiler shows the writes as going to local_28 and local_18, but because the loop index runs from -0x10 to -1, the bytes actually land in abStack_38 and local_28.
I started mapping the VM context to a struct.
    /* WARNING: Variable defined which should be unmapped: local_18 */
    /* WARNING: Globals starting with '_' overlap smaller symbols at the same address */

    undefined4 zmpczfhk(int function_id,VM_CTX_STRUCT *VM_CONTEXT,undefined4 param_3)

    {
      byte bVar1;
      int i;
      uint size;
      int iVar2;
      code *pcVar2;
      undefined4 uVar3;
      int iVar4;
      uint j;
      byte abStack_38 [16];
      byte local_28 [16];
      byte local_18 [16];

      local_18._0_4_ = ___stack_chk_guard;
      i = -0x10;
      iVar4 = function_id;
      do {
        bVar1 = VM_CONTEXT->encryption_key[i + 0x10];
        iVar4 = iVar4 * 0x19660d + 0x3c6ef35f;
        local_28[i] = (byte)((uint)iVar4 >> 8);
        local_18[i] = bVar1 ^ (byte)iVar4;
        i = i + 1;
      } while (i != 0);
      size = (*(code *)VM_CONTEXT->aykilcti)(function_id);
      iVar4 = (*(code *)VM_CONTEXT->cmiqoqlf)(size);
      iVar2 = (*(code *)VM_CONTEXT->adpveuve)(function_id);
      if (size != 0) {
        j = 0;
        do {
          local_28._0_4_ = local_28._0_4_ * 0x19660d + 0x3c6ef35f;
          *(byte *)(iVar4 + j) =
               abStack_38[j & 0xf] ^ *(byte *)(iVar2 + j) ^ (byte)((uint)local_28._0_4_ >> 0x10);
          j = j + 1;
        } while (size != j);
      }
      pcVar2 = (code *)(*(code *)VM_CONTEXT->pnvgwxew)(function_id,iVar4);
      (*(code *)VM_CONTEXT->edxbgfhs)(iVar4,size,VM_CONTEXT);
      (*(code *)VM_CONTEXT->rfdncxfe)(function_id,iVar4,size);
      uVar3 = (*pcVar2)(function_id,VM_CONTEXT,param_3);
      (*(code *)VM_CONTEXT->rbovbloj)(iVar4,size);
      if (___stack_chk_guard == local_18._0_4_) {
        return uVar3;
      }
                        /* WARNING: Subroutine does not return */
      __stack_chk_fail();
    }

VM Functions
Now we can start examining the relevant VM functions.
aykilcti
    uint aykilcti(int param_1)

    {
      DAT_0038b9d8 = fgqcrnsl;
      return fgqcrnsl[param_1].size;
    }

This function returns the size field of the fgqcrnsl entry at index function_id. This means the large array (fgqcrnsl) that was initialized in _lcc01 contains some metadata about the VM functions.
cmiqoqlf
    undefined * cmiqoqlf(uint param_1)

    {
      DAT_0038b688 = getpagesize();
      DAT_0038b690 = (param_1 / DAT_0038b688 + 1) * DAT_0038b688;
      ltbgvbfc = ltbgvbfc | 1;
      DAT_0038b8d0 = 1;
      DAT_0038b698 = (undefined *)mmap((void *)0x0,DAT_0038b690,2,0x22,-1,0);
      ltbgvbfc = ltbgvbfc & 0xfe;
      DAT_0038b8e0 = DAT_0038b698 == (undefined *)0xffffffff;
      DAT_0038b8d8 = 1;
      if (!(bool)DAT_0038b8e0) {
        return DAT_0038b698;
      }
                        /* WARNING: Subroutine does not return */
      abort();
    }

This function creates a new memory area that fits the length specified by the first argument. The protection is 2, which is PROT_WRITE.
adpveuve
    undefined * adpveuve(int param_1)

    {
      DAT_0038b8e8 = &DAT_00288124;
      DAT_0038b9d8 = fgqcrnsl;
      return &DAT_00288124 + fgqcrnsl[param_1].offset;
    }

This function returns a pointer into DAT_00288124 at the offset stored in the fgqcrnsl entry for the given function ID. DAT_00288124 seems to hold the encrypted data for the VM.
pnvgwxew
    undefined * pnvgwxew(int param_1,int param_2)

    {
      DAT_0038b604 = 0;
      DAT_0038b600 = param_2;
      DAT_0038b9d8 = fgqcrnsl;
      DAT_0038b64c = 0;
      DAT_0038b648 = fgqcrnsl[param_1].always_zero;
      DAT_0038b698 = (undefined *)(fgqcrnsl[param_1].always_zero + param_2);
      return DAT_0038b698;
    }

This sets up some global memory. The always_zero entry is added to the address of the mapped memory area. This could be used on architectures like ARM to switch to Thumb mode.
edxbgfhs
    void edxbgfhs(void *param_1,undefined4 param_2,int param_3)

    {
      longlong lVar1;
      longlong lVar2;

      DAT_0038b630 = 0;
      (**(code **)(param_3 + 0xc))(param_1,param_2);
      DAT_0038b600 = getpagesize();
      DAT_0038b604 = DAT_0038b600 >> 0x1f;
      lVar1 = (longlong)DAT_0038b600;
      lVar2 = __udivdi3(param_2,0,DAT_0038b600,DAT_0038b604);
      lVar1 = (lVar2 + 1) * lVar1;
      DAT_0038b648 = (size_t)lVar1;
      DAT_0038b64c = (undefined4)((ulonglong)lVar1 >> 0x20);
      DAT_0038b8d0 = 4;
      ltbgvbfc = ltbgvbfc | 4;
      DAT_0038b630 = mprotect(param_1,DAT_0038b648,5);
      ltbgvbfc = ltbgvbfc & 0xfb;
      DAT_0038b868 = DAT_0038b630 != 0;
      DAT_0038b8d8 = 4;
      if (!(bool)DAT_0038b868) {
        return;
      }
                        /* WARNING: Subroutine does not return */
      abort();
    }

This function first calls the handler stored at offset 0xc of the VM context (the clear-cache function assigned in _lcc01) and then sets the protection to 5, which is PROT_READ | PROT_EXEC.
rfdncxfe
    void rfdncxfe(int param_1,int param_2,uint param_3)

    {
      bool bVar1;

      DAT_0038b8a8 = param_3 != 0;
      DAT_0038b938 = param_1;
      DAT_0038b9d8 = fgqcrnsl;
      DAT_0038b678 = fgqcrnsl[param_1].checksum;
      DAT_0038b8d0 = DAT_0038b678;
      DAT_0038b668 = 0;
      DAT_0038b700 = param_2;
      DAT_0038b61c = 0;
      DAT_0038b618 = 0;
      if ((bool)DAT_0038b8a8) {
        DAT_0038b668 = 0;
        DAT_0038b618 = 0;
        DAT_0038b61c = 0;
        do {
          DAT_0038b668 = DAT_0038b668 + *(byte *)(param_2 + DAT_0038b618);
          bVar1 = 0xfffffffe < DAT_0038b618;
          DAT_0038b618 = DAT_0038b618 + 1;
          DAT_0038b61c = DAT_0038b61c + bVar1;
          DAT_0038b8a8 = DAT_0038b61c < (DAT_0038b618 < param_3);
        } while ((bool)DAT_0038b8a8);
      }
      DAT_0038b870 = DAT_0038b678 != DAT_0038b668;
      if ((bool)DAT_0038b870) {
        wvcdm::Log("vendor/widevine/libwvdrmengine/level3/x86/libl3oemcrypto.cpp","rfdncxfe",0x152e7,0,
                   "// XXX ERROR: checksum for %zd is %d not %d.\n",param_1,DAT_0038b668,DAT_0038b678);
                        /* WARNING: Subroutine does not return */
        exit(1);
      }
      return;
    }

The log message spoils what this function is doing. It calculates the checksum of the memory area and compares it against the checksum stored inside the fgqcrnsl array. The checksum is just the sum of all bytes.
rbovbloj
void rbovbloj(void *param_1,undefined4 param_2)
{
  longlong lVar1;
  longlong lVar2;

  DAT_0038b618 = getpagesize();
  DAT_0038b61c = DAT_0038b618 >> 0x1f;
  lVar1 = (longlong)DAT_0038b618;
  lVar2 = __udivdi3(param_2,0,DAT_0038b618,DAT_0038b61c);
  lVar1 = (lVar2 + 1) * lVar1;
  DAT_0038b640 = (size_t)lVar1;
  DAT_0038b644 = (undefined4)((ulonglong)lVar1 >> 0x20);
  DAT_0038b8d0 = 2;
  ltbgvbfc = ltbgvbfc | 2;
  munmap(param_1,DAT_0038b640);
  ltbgvbfc = ltbgvbfc & 0xfd;
  DAT_0038b8d8 = 2;
  return;
}

This function unmaps the memory area again.
From a high level, the zmpczfhk function does the following:
- Derive an encryption key based on the function_id
- Get the size of the function
- Allocate a new writable memory area that fits the code
- Get a pointer to the encrypted memory
- Decrypt the function and store it in the allocated memory
- Get a function pointer to where the code starts in the decrypted area
- Remap the memory area as readable and executable
- Calculate and check the checksum of the decrypted memory
- Execute the code (returns the next function_id)
- Unmap the memory area
We can reimplement the logic to dump the decrypted code or, for lazy people, hook the function that calculates the checksum (rfdncxfe) to dump all decrypted memory before it is executed. Hooking the function also has the benefit that we only decrypt what is used and directly see the control flow.
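The hook itself can stay emulator-agnostic. The sketch below assumes it runs at the very first instruction of rfdncxfe on 32-bit x86 with the cdecl calling convention, so the return address sits at `[esp]` and the three arguments follow it. `dump_on_checksum`, `read_mem`, and the output directory are hypothetical names; in Qiling, `ql.mem.read` could plausibly serve as `read_mem`, but I have not tied the sketch to a specific emulator API.

```python
import struct
from pathlib import Path
from typing import Callable


def dump_on_checksum(
    esp: int,
    read_mem: Callable[[int, int], bytes],
    out_dir: str = "decrypted_functions",
) -> int:
    """Dump the buffer that rfdncxfe is about to checksum.

    Stack layout at function entry (cdecl, 32-bit):
      [esp]      return address
      [esp+4]    param_1: function_id
      [esp+8]    param_2: pointer to the decrypted memory
      [esp+0xc]  param_3: size in bytes
    """
    _ret, function_id, buf, size = struct.unpack("<4I", bytes(read_mem(esp, 16)))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # A function that is decrypted several times simply overwrites its dump.
    (out / f"{function_id}.bin").write_bytes(bytes(read_mem(buf, size)))
    print(f"checksum function_id: {function_id}")
    return function_id
```

Writing one file per function_id keeps the dumps easy to post-process later on.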
Here is the output from my emulator:
checksum function_id: 396
checksum function_id: 876
checksum function_id: 16
checksum function_id: 16
checksum function_id: 354
checksum function_id: 354
checksum function_id: 353
checksum function_id: 353
checksum function_id: 456
checksum function_id: 936
checksum function_id: 452
checksum function_id: 932
checksum function_id: 15
checksum function_id: 365
checksum function_id: 845
checksum function_id: 2
checksum function_id: 482
checksum function_id: 3
checksum function_id: 483
checksum function_id: 4
checksum function_id: 484
checksum function_id: 5
checksum function_id: 485
checksum function_id: 6
checksum function_id: 486
checksum function_id: 7
checksum function_id: 487
checksum function_id: 8
checksum function_id: 488
checksum function_id: 9
checksum function_id: 489
checksum function_id: 10
checksum function_id: 490
checksum function_id: 11
checksum function_id: 491
checksum function_id: 359
checksum function_id: 839
checksum function_id: 360
checksum function_id: 840
checksum function_id: 364
checksum function_id: 844
checksum function_id: 2
checksum function_id: 3
checksum function_id: 4
checksum function_id: 5
checksum function_id: 6
checksum function_id: 7
checksum function_id: 8
checksum function_id: 9
checksum function_id: 10
checksum function_id: 11
checksum function_id: 366
checksum function_id: 846
checksum function_id: 2
checksum function_id: 482
checksum function_id: 3
checksum function_id: 483
checksum function_id: 4
checksum function_id: 484
checksum function_id: 5
checksum function_id: 485
checksum function_id: 11
checksum function_id: 491
checksum function_id: 359
checksum function_id: 839
checksum function_id: 363
checksum function_id: 843
checksum function_id: 2
checksum function_id: 3
checksum function_id: 4
checksum function_id: 5
checksum function_id: 11
checksum function_id: 15
checksum function_id: 452
checksum function_id: 17
checksum function_id: 17
checksum function_id: 18
checksum function_id: 18
checksum function_id: 19
checksum function_id: 19
checksum function_id: 20
checksum function_id: 20
checksum function_id: 21
checksum function_id: 21
checksum function_id: 22
checksum function_id: 22
checksum function_id: 23
checksum function_id: 23
checksum function_id: 24
checksum function_id: 24
checksum function_id: 25
checksum function_id: 25
checksum function_id: 26
checksum function_id: 26
checksum function_id: 27
checksum function_id: 27
checksum function_id: 28
checksum function_id: 28
checksum function_id: 29
checksum function_id: 29
checksum function_id: 30
checksum function_id: 30
checksum function_id: 31
checksum function_id: 31
checksum function_id: 32
checksum function_id: 32
checksum function_id: 397
checksum function_id: 877
checksum function_id: 15
checksum function_id: 395
checksum function_id: 875
checksum function_id: 15
checksum function_id: 395
checksum function_id: 400
checksum function_id: 880
checksum function_id: 15
checksum function_id: 435
checksum function_id: 915
checksum function_id: 436
checksum function_id: 916
checksum function_id: 437
checksum function_id: 917
checksum function_id: 438
checksum function_id: 918
checksum function_id: 440
checksum function_id: 920
checksum function_id: 441
checksum function_id: 921
checksum function_id: 442
checksum function_id: 922
checksum function_id: 443
checksum function_id: 923
checksum function_id: 444
checksum function_id: 924
checksum function_id: 445
checksum function_id: 925
checksum function_id: 446
checksum function_id: 926
checksum function_id: 447
checksum function_id: 927
checksum function_id: 365
checksum function_id: 845
checksum function_id: 2
checksum function_id: 482
checksum function_id: 3
checksum function_id: 483
checksum function_id: 4
checksum function_id: 484
checksum function_id: 5
checksum function_id: 485
checksum function_id: 6
checksum function_id: 486
checksum function_id: 7
checksum function_id: 487
checksum function_id: 8
checksum function_id: 488
checksum function_id: 9
checksum function_id: 489
checksum function_id: 10
checksum function_id: 490
checksum function_id: 11
checksum function_id: 491
checksum function_id: 359
checksum function_id: 839
checksum function_id: 360
checksum function_id: 840
checksum function_id: 364
checksum function_id: 844
checksum function_id: 2
checksum function_id: 3
checksum function_id: 4
checksum function_id: 5
checksum function_id: 6
checksum function_id: 7
checksum function_id: 8
checksum function_id: 9
checksum function_id: 10
checksum function_id: 11
checksum function_id: 15
checksum function_id: 402
checksum function_id: 882
checksum function_id: 15
checksum function_id: 15

I just dumped every decrypted function into a file. The decrypted content of 875 is just ay64.dat, so there is not just code but also data in the encrypted section.
I then used this Python snippet to build a single binary containing all decrypted functions:
from pathlib import Path
import os

prefix = ""
suffix = ""
dataset = Path("decrypted_functions")
# Emit one global symbol (_<function_id>) per dumped file.
for p in dataset.glob("*.bin"):
    name = p.name.removesuffix(".bin")
    with open(p, "rb") as f:
        data = f.read()
    prefix += f".globl _{name}\n"
    suffix += f"""_{name}:
.byte {", ".join(f"0x{x:02X}" for x in data)}
"""

with open("out.S", "w") as f:
    f.write(prefix+suffix)

# Assemble everything into a single 32-bit object file.
os.system("as out.S --32 -o out")

We can open the binary in a decompiler. Here is the output of the first function.
undefined4 _396(int function_id,VM_CTX_STRUCT *VM_CONTEXT,undefined4 param_3)
{
  uint uVar1;
  int iVar2;
  byte *pbVar3;
  undefined *puVar4;
  int iVar5;
  code *pcVar6;
  undefined4 uVar7;
  uint local_14;
  int local_10;

  uVar1 = (*(code *)VM_CONTEXT->aykilcti)(function_id + 0x1e0);
  iVar2 = (*(code *)VM_CONTEXT->adpveuve)(function_id + 0x1e0);
  pbVar3 = (byte *)(*(code *)VM_CONTEXT->adpveuve)(0);
  puVar4 = (undefined *)(*(code *)VM_CONTEXT->adpveuve)(1);
  *pbVar3 = VM_CONTEXT->encryption_key[0] ^ 0xf9;
  *puVar4 = 0x72;
  pbVar3[1] = ~VM_CONTEXT->encryption_key[1] * -0x27 + 0x55;
  puVar4[1] = 0xa8;
  pbVar3[2] = VM_CONTEXT->encryption_key[2] ^ 0xc1;
  puVar4[2] = 0x1b;
  pbVar3[3] = VM_CONTEXT->encryption_key[3] & 0x28 | 0xc2;
  puVar4[3] = 0x1b;
  pbVar3[4] = VM_CONTEXT->encryption_key[4] ^ 0xb2;
  puVar4[4] = 9;
  pbVar3[5] = ~VM_CONTEXT->encryption_key[5] * -0x7d + 0x55;
  puVar4[5] = 0xe3;
  pbVar3[6] = 0xea;
  puVar4[6] = 0x68;
  pbVar3[7] = VM_CONTEXT->encryption_key[7] & 0x2a | 0x90;
  puVar4[7] = 2;
  pbVar3[8] = VM_CONTEXT->encryption_key[8] ^ 0xb1;
  puVar4[8] = 0xa2;
  pbVar3[9] = VM_CONTEXT->encryption_key[9] ^ 0x23;
  puVar4[9] = 0xef;
  pbVar3[10] = VM_CONTEXT->encryption_key[10] & 0x80 | 0x7a;
  puVar4[10] = 0x1f;
  pbVar3[0xb] = VM_CONTEXT->encryption_key[0xb] ^ 0x23;
  puVar4[0xb] = 99;
  pbVar3[0xc] = VM_CONTEXT->encryption_key[0xc] ^ 0x3d;
  puVar4[0xc] = 0x31;
  pbVar3[0xd] = ~VM_CONTEXT->encryption_key[0xd] * -0x49 + 0x55;
  puVar4[0xd] = 0x7e;
  pbVar3[0xe] = VM_CONTEXT->encryption_key[0xe] ^ 0xba;
  puVar4[0xe] = 1;
  pbVar3[0xf] = VM_CONTEXT->encryption_key[0xf] ^ 0xf9;
  puVar4[0xf] = 0xbe;
  iVar5 = (*(code *)VM_CONTEXT->cmiqoqlf)(uVar1);
  local_10 = *(int *)pbVar3;
  for (local_14 = 0; local_14 < uVar1; local_14 = local_14 + 1) {
    local_10 = local_10 * 0x19660d + 0x3c6ef35f;
    *(byte *)(local_14 + iVar5) =
         puVar4[local_14 & 0xf] ^ *(byte *)(local_14 + iVar2) ^ (byte)((uint)local_10 >> 0x10);
  }
  (*(code *)VM_CONTEXT->edxbgfhs)(iVar5,uVar1,VM_CONTEXT);
  (*(code *)VM_CONTEXT->rfdncxfe)(function_id + 0x1e0,iVar5,uVar1);
  pcVar6 = (code *)(*(code *)VM_CONTEXT->pnvgwxew)(function_id + 0x1e0,iVar5);
  uVar7 = (*pcVar6)(VM_CONTEXT,param_3);
  (*(code *)VM_CONTEXT->rbovbloj)(iVar5,uVar1);
  return uVar7;
}

Yay, another layer of encryption!
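The decryption loop of _396 is short enough to mirror in Python. The sketch below follows the listing: a 32-bit linear congruential generator seeded with the first four bytes of the derived key buffer (read as a little-endian integer on x86), combined with a 16-byte XOR table. The function and constant names are mine, and the seed derivation only covers the first four bytes because, as far as this listing shows, the loop never touches the rest of that buffer.

```python
# 16-byte table written to puVar4 in function 396 (values from the listing).
XOR_TABLE_396 = bytes([
    0x72, 0xA8, 0x1B, 0x1B, 0x09, 0xE3, 0x68, 0x02,
    0xA2, 0xEF, 0x1F, 0x63, 0x31, 0x7E, 0x01, 0xBE,
])


def derive_seed_396(encryption_key: bytes) -> bytes:
    """First four bytes of pbVar3 as computed in function 396."""
    k = encryption_key
    return bytes([
        k[0] ^ 0xF9,
        ((~k[1] & 0xFF) * -0x27 + 0x55) & 0xFF,
        k[2] ^ 0xC1,
        (k[3] & 0x28) | 0xC2,
    ])


def lcg_xor(data: bytes, seed: bytes, table: bytes = XOR_TABLE_396) -> bytes:
    """XOR data with table[i & 0xf] and bits 16..23 of an LCG state."""
    state = int.from_bytes(seed[:4], "little")
    out = bytearray(len(data))
    for i, byte in enumerate(data):
        # The state is advanced before it is used, as in the decompiled loop.
        state = (state * 0x19660D + 0x3C6EF35F) & 0xFFFFFFFF
        out[i] = table[i & 0xF] ^ byte ^ ((state >> 16) & 0xFF)
    return bytes(out)
```

Since the keystream is only XORed in, the same routine encrypts and decrypts. Other wrapper functions may well use different constants, so treat the table and seed derivation as specific to 396.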
But we can see that rfdncxfe is called again, so we should already have the decrypted function in our binary. (396+0x1e0 = 876)
undefined4 _876(VM_CTX_STRUCT *VM_CONTTEXT,undefined4 *param_2)
{
  undefined local_44 [4];
  uint local_40;
  int local_3c;
  uint local_38;
  int local_34;
  uint *local_30;
  int local_2c;
  undefined4 *local_28;
  undefined4 *local_24;
  VM_CTX_STRUCT *local_20;
  uint local_1c;
  undefined local_15;
  uint local_14;
  uint local_10;

  local_20 = VM_CONTTEXT;
  local_24 = (undefined4 *)*param_2;
  *local_24 = 0x1c;
  local_28 = (undefined4 *)(*(code *)VM_CONTTEXT->rfowqsjn)(VM_CONTTEXT,0x10,0x6a4);
  *local_28 = 0;
  for (local_10 = 0; local_10 < 0x140; local_10 = local_10 + 1) {
    local_28[local_10 + 1] = local_10 + 1;
    *(undefined *)((int)local_28 + local_10 + 0x504) = 1;
  }
  local_28[0x191] = 0;
  for (local_14 = 0; local_14 < 0x10; local_14 = local_14 + 1) {
    local_28[local_14 + 0x192] = local_14 + 1;
    *(undefined *)((int)local_28 + local_14 + 0x688) = 1;
  }
  local_28[0x1a6] = 0;
  local_28[0x1a7] = 0;
  *(undefined *)(local_28 + 0x1a8) = 0;
  (*(code *)local_20->ydzgdvlz)(VM_CONTTEXT,local_28,0x10);
  local_2c = (*(code *)local_20->rfowqsjn)(VM_CONTTEXT,0x162,0xd);
  *(undefined *)(local_2c + 0xc) = 0;
  *(undefined4 *)(local_2c + 8) = 0;
  (*(code *)local_20->ydzgdvlz)(VM_CONTTEXT,local_2c,0x162);
  local_30 = (uint *)(*(code *)local_20->rfowqsjn)(VM_CONTTEXT,0x161,0x10);
  local_15 = 1;
  local_34 = (*(code *)local_20->clock_gettime)(1,local_44);
  local_38 = (*(code *)local_20->generate_entropy)();
  if ((local_34 == -1) || (local_38 == 0)) {
    local_15 = 0;
  }
  *local_30 = local_40 ^ local_38;
  local_30[1] = (int)(local_40 ^ local_38) >> 0x1f;
  local_34 = (*(code *)local_20->clock_gettime)(1,local_44);
  local_38 = (*(code *)local_20->generate_entropy)();
  if ((local_34 == -1) || (local_38 == 0)) {
    local_15 = 0;
  }
  local_30[2] = local_40 ^ local_38;
  local_30[3] = (int)(local_40 ^ local_38) >> 0x1f;
  *local_30 = *local_30 | 1;
  local_30[1] = local_30[1];
  local_30[2] = local_30[2] | 1;
  local_30[3] = local_30[3];
  (*(code *)local_20->ydzgdvlz)(VM_CONTTEXT,local_30,0x161);
  (*(code *)local_20->tfdlgwrh)();
  for (local_1c = 0; local_1c < 0x10; local_1c = local_1c + 1) {
    local_3c = (*(code *)local_20->rfowqsjn)(VM_CONTTEXT,local_1c + 0x11,0xd40);
    *(undefined *)(local_3c + 0xd3c) = 0;
    (*(code *)local_20->ydzgdvlz)(VM_CONTTEXT,local_3c,local_1c + 0x11);
  }
  return 0x18d;
}

This looks a bit more like logic :)
That is all there is to deobfuscating the code. We can now use GhidraFindcrypt to find AES constants.
AES inverse sbox in Ghidra
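If you don't want to fire up Ghidra for this, a crude scan over the dumped files can give a first hint. The sketch below searches for the first 16 bytes of the standard AES S-box and inverse S-box; the helper names and the `decrypted_functions` directory layout are assumptions. It only finds tables stored as plain byte arrays, so merged or masked lookup tables will slip through.

```python
from pathlib import Path

# First row of the AES S-box and inverse S-box (FIPS 197).
AES_SBOX_PREFIX = bytes.fromhex("637c777bf26b6fc53001672bfed7ab76")
AES_INV_SBOX_PREFIX = bytes.fromhex("52096ad53036a538bf40a39e81f3d7fb")


def find_aes_tables(blob: bytes) -> list[tuple[str, int]]:
    """Return (table name, offset) for every S-box prefix found in blob."""
    hits = []
    needles = (("sbox", AES_SBOX_PREFIX), ("inv_sbox", AES_INV_SBOX_PREFIX))
    for name, needle in needles:
        offset = blob.find(needle)
        while offset != -1:
            hits.append((name, offset))
            offset = blob.find(needle, offset + 1)
    return hits


def scan_directory(directory: str = "decrypted_functions") -> None:
    """Print every hit in the dumped *.bin files (hypothetical layout)."""
    for path in sorted(Path(directory).glob("*.bin")):
        for name, offset in find_aes_tables(path.read_bytes()):
            print(f"{path.name}: {name} at offset {offset:#x}")
```

Calling `scan_directory()` next to the dumps lists the file names, which double as function IDs, so a hit can be looked up in the trace directly.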
In the trace, we can locate the use of 490 and look at the function using it. Before that, function 10 is called, and this function appears to be called from function 845.
undefined4 _845(int param_1,undefined4 *param_2)
{
  undefined local_140 [244];
  undefined4 local_4c;
  undefined4 local_48;
  undefined4 local_44;
  undefined4 local_40;
  undefined4 local_3c;
  undefined4 local_38;
  undefined4 local_34;
  undefined4 local_30;
  undefined4 local_2c;
  undefined4 local_28;
  undefined4 local_24;
  undefined4 local_20;
  undefined4 local_1c;
  undefined4 local_18;
  undefined4 local_14;
  int local_10;

  local_10 = param_1;
  local_14 = *param_2;
  local_18 = param_2[1];
  local_1c = param_2[2];
  local_20 = param_2[3];
  local_24 = param_2[4];
  local_28 = (**(code **)(param_1 + 0x28))(param_1,2,0x400);
  local_2c = (**(code **)(local_10 + 0x28))(param_1,3,0x400);
  local_30 = (**(code **)(local_10 + 0x28))(param_1,4,0x400);
  local_34 = (**(code **)(local_10 + 0x28))(param_1,5,0x400);
  local_38 = (**(code **)(local_10 + 0x28))(param_1,6,0x400);
  local_3c = (**(code **)(local_10 + 0x28))(param_1,7,0x400);
  local_40 = (**(code **)(local_10 + 0x28))(param_1,8,0x400);
  local_44 = (**(code **)(local_10 + 0x28))(param_1,9,0x400);
  local_48 = (**(code **)(local_10 + 0x28))(param_1,10,0x100);
  local_4c = (**(code **)(local_10 + 0x28))(param_1,0xb,0x28);
  (**(code **)(local_10 + 0x24))
            (0x167,0x198,local_10,local_140,local_14,local_28,local_2c,local_30,local_34,local_4c);
  (**(code **)(local_10 + 0x24))
            (0x168,0x198,local_10,local_140,local_2c,local_38,local_3c,local_40,local_44);
  (**(code **)(local_10 + 0x24))
            (0x16c,0x198,local_10,local_1c,local_24,local_20,local_140,local_18,local_38,local_3c,
             local_40,local_44,local_48);
  (**(code **)(local_10 + 0x2c))(param_1,local_28,2);
  (**(code **)(local_10 + 0x2c))(param_1,local_2c,3);
  (**(code **)(local_10 + 0x2c))(param_1,local_30,4);
  (**(code **)(local_10 + 0x2c))(param_1,local_34,5);
  (**(code **)(local_10 + 0x2c))(param_1,local_38,6);
  (**(code **)(local_10 + 0x2c))(param_1,local_3c,7);
  (**(code **)(local_10 + 0x2c))(param_1,local_40,8);
  (**(code **)(local_10 + 0x2c))(param_1,local_44,9);
  (**(code **)(local_10 + 0x2c))(param_1,local_48,10);
  (**(code **)(local_10 + 0x2c))(param_1,local_4c,0xb);
  return 0x198;
}

This code sets things up: it appears to request eight 0x400-byte buffers (IDs 2 to 9), a 0x100-byte buffer (ID 10), and a 0x28-byte buffer (ID 11), calls functions 0x167, 0x168, and 0x16c (359, 360, and 364) through the context, and hands the buffers back afterwards. Function 364 decrypts to 844 (364+0x1e0), which looks like the function doing the encryption.
After some reverse engineering, I reimplemented the logic in Python (I redacted all secrets for obvious reasons).
from Crypto.Hash import SHA1
from Crypto.Cipher import AES
from Crypto.Util import Padding
import struct
import base64
from dataclasses import dataclass, field
from typing import Optional
from functools import partial
import binascii

# Version-specific data field of the keybox (0x48 bytes, big-endian).
struct_keybox_data = ">II16s48s"
# Device ID, device key, data, and magic; the 4-byte CRC is appended separately.
struct_keybox = "<32s16s72s4s"


@dataclass
class KeyboxData:
    version_major: Optional[int] = None
    level3_version: Optional[int] = None
    c: Optional[bytes] = None
    d: Optional[bytes] = None


@dataclass
class Keybox:
    device_id: Optional[bytes] = None
    device_key: Optional[bytes] = None
    data: KeyboxData = field(default_factory=KeyboxData)
    magic: bytes = b"kbox"
    crc: Optional[int] = None


# 64-bit rotate left
rol = lambda v, n: (v << n | v >> (64 - n)) & ((1 << 64) - 1)


def xorshift_next(state: list[int], length):
    s0 = state[0]
    s1 = state[1]
    # These constants match the original xoroshiro128+ generator.
    a, b, c = 55, 14, 36
    ret = b""
    while length > 0:
        result = s1 + s0
        result &= (1 << 64) - 1
        s1 ^= s0
        s0 = rol(s0, a) ^ s1 ^ (s1 << b)
        s0 &= (1 << 64) - 1
        s1 = rol(s1, c)
        ret += struct.pack("<Q", result)[:length]
        length -= 8
    state[0] = s0
    state[1] = s1
    return ret


def prng_init():
    return [1, 1]


def encode_deviceid(device_id):
    return (
        "".join(
            chr(ord("a") + x) if x < 0x1B else chr(ord("'") + x)
            for x in map(lambda x: x % 0x34, device_id[:-1])
        ).encode()
        + b"\x00"
    )


def create_table():
    a = []
    for i in range(256):
        k = i << 24
        for _ in range(8):
            k = (k << 1) ^ 0x4C11DB7 if k & 0x80000000 else k << 1
        a.append(k & 0xFFFFFFFF)
    return a


def crc32_mpeg2(bytestream):
    crc_table = create_table()
    crc = 0xFFFFFFFF
    for byte in bytestream:
        lookup_index = ((crc >> 24) ^ byte) & 0xFF
        crc = ((crc & 0xFFFFFF) << 8) ^ crc_table[lookup_index]
    return crc


key_mask = [
    # redacted
]

vendor_key = [
    # redacted
]

prng_state = prng_init()
prng_next = partial(xorshift_next, prng_state)

device_id = prng_next(0x20)

keybox = Keybox()
keybox.device_id = encode_deviceid(device_id)

keybox.data.version_major = 2
keybox.data.level3_version = 4464

keybox.device_key = prng_next(0x10)
keybox.data.c = prng_next(0x10)

h = SHA1.new()
h.update(keybox.device_key)

# Device key, followed by its SHA-1 digest and a constant byte.
device_seed = bytearray(0x30)

device_seed[0:0x10] = keybox.device_key
digest = h.digest()
device_seed[0x10 : 0x10 + len(digest)] = digest
device_seed[0x24] = 3

aes_key = bytes([x[0] ^ x[1] for x in zip(vendor_key[:0x10], key_mask[:0x10])])

cipher = AES.new(aes_key, AES.MODE_CBC, iv=b"\x00" * 16)

some_other_aes_key = cipher.decrypt(keybox.data.c)

cipher = AES.new(some_other_aes_key, AES.MODE_CBC, iv=b"\x00" * 16)
keybox.data.d = cipher.encrypt(device_seed)

keybox_data = struct.pack(
    struct_keybox_data,
    keybox.data.version_major,
    keybox.data.level3_version,
    keybox.data.c,
    keybox.data.d,
)

data = struct.pack(
    struct_keybox, keybox.device_id, keybox.device_key, keybox_data, keybox.magic
)
data += struct.pack(">I", crc32_mpeg2(data))

h = SHA1.new()

h.update(b"0123456789abc")  # wvoec3::getUniqueID

ay64_encryption_key = h.digest()[:16]

cipher = AES.new(ay64_encryption_key, AES.MODE_CBC, iv=b"\x00" * 16)

ay64_dat = cipher.encrypt(data)

Now we can see that the keybox encryption key is just the first 16 bytes of the SHA-1 hash of the device's unique ID (not to be confused with the Device ID field inside the keybox), which is determined like this:
undefined * wvoec3::getUniqueID(uint *param_1)
{
  uint uVar1;

  uVar1 = property_get("ro.serialno",&DAT_0038bbc8,0);
  if (((int)uVar1 < 1) && (uVar1 = property_get("net.hostname",&DAT_0038bbc8,0), (int)uVar1 < 1)) {
    __strncpy_chk2(&DAT_0038bbc8,"0123456789abc",0x5c,0xb8,0xe);
  }
  *param_1 = uVar1;
  return &DAT_0038bbc8;
}

The function reads the ro.serialno property, falls back to net.hostname, and, if both are empty, to the hard-coded string 0123456789abc. While the underlying cryptography of the keybox has changed in later revisions of Widevine L3, the keybox encryption appears to still be the same (I just checked against the Android 16 emulator).
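Put together, the key derivation for the keybox file fits in a few lines using only the standard library. This is a sketch: `unique_id` and `keybox_file_key` are names I made up, the two parameters stand in for the Android system properties, and I assume the property value is hashed as-is, the same way the fallback string is hashed in the script above.

```python
import hashlib

# Hard-coded fallback from wvoec3::getUniqueID.
FALLBACK_ID = b"0123456789abc"


def unique_id(serialno: bytes = b"", hostname: bytes = b"") -> bytes:
    """Mimic the fallback chain: ro.serialno, then net.hostname, then a constant.

    serialno and hostname are hypothetical stand-ins for the values that
    property_get would return on a device.
    """
    return serialno or hostname or FALLBACK_ID


def keybox_file_key(uid: bytes) -> bytes:
    """AES-128 key for the keybox file: the first 16 bytes of SHA-1(uid)."""
    return hashlib.sha1(uid).digest()[:16]
```

On an emulator without a serial number or hostname, `keybox_file_key(unique_id())` yields the same key the script above uses as `ay64_encryption_key`.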
Profit?
With this knowledge, it is possible to create custom keyboxes for different vendors, as long as we obtain the vendor key, which doesn’t require deobfuscation. There is no real mitigation for the problem since one can always “just look at the code” to figure out how newer versions work. This is by design and, in reality, not even a problem. There are already some web services that hand you the decryption keys for L3 media, so publishing this research has no practical impact. In theory, Google would need to revoke all vendor keys using this old L3 implementation, but that would result in people being unable to watch protected content on older devices. Just blocking specific keyboxes doesn’t help since we can create new ones.

For me personally, this was a fun challenge, and I learned a lot. I did not expect the obfuscation to be that easy to break when figuring out the DFA attack.

Most content providers restrict their high-quality content to Widevine L1, so for pirating the newest series on your favorite streaming site, an L3 keybox is not enough. I looked a bit into Widevine L1 implementations during some TrustZone research and actually have been able to obtain an L1 keybox, though not the way one might think ;)
If you want to research other DRMs, I recommend Spotify’s PlayPlay DRM. It’s even included in their bug bounty program.
Feel free to play around with Widevine L3 at the playground on GitHub!