DSP hacking — Reversing G-Pay’s ultrasound signal (part 2)

Nihal Pasham
13 min read · Apr 12, 2020
Recording cash mode’s ‘near-ultrasound’ signal.

I finally managed to give this project my undivided attention. All it took was a world-stopping event: a stupid virus.

To recap from Part1

Google Pay ‘Tez-mode’ (also called ‘cash-mode’) uses near-ultrasound to discover and pair parties. ‘Cash mode’ does not rely on RF-based technologies (Bluetooth, Wi-Fi, NFC etc.) for data transfer and leverages ‘sound’ instead, which has certain advantages (for example, it doesn’t pass through walls).

More precisely, ‘cash-mode’ tries to establish physical co-presence by transmitting a short 8-digit token as inaudible sound.

Put simply, if you want to transfer some money to a friend, this feature lets you transfer cash without having to give out your phone-number or bank-details, assuming both of you are in ‘relative close proximity’.

Google pay’s Tez mode feature in action.

Note: You can only transmit a tiny amount of data (like an 8-digit token) with this mode i.e. names, phone numbers or bank details still use a different data channel.

In Part1, we were able to record the ‘near-ultrasound signal’ with an audio recording tool called Audacity and analyze it in Gnu-Radio, a signals analysis platform.

Initial analysis indicated that the transmitted signal is ‘likely’ a combination of 3 distinct signals. Something that looks like this:

  • Transmitted signal: y(t) = sin(2πFt) * c(t) * d(t), where F = carrier frequency, c(t) = code signal, d(t) = data signal

We then filtered the transmitted signal to remove the carrier and extract the base-band signal (i.e. the signal we’re interested in).

  • Base-band signal — b(t) = c(t) * d(t) after frequency translation and filtering

Lastly, we happened to discover (with some OS-INT) that the extracted base-band signal is a product of 2 different types of modulation, i.e. MFSK followed by DSSS. DSSS or direct sequence spread spectrum is a digital signal processing technique used to make signals resilient to noise, interference and multi-path effects, i.e. with DSSS, you can get a signal out in the harshest of transmission environments.

We ended Part1 with a brief intro to DSSS and how we can reverse (de-modulate) it.

In Part2, the objective is to retrieve the 8-digit token ‘04755658’ (as seen in the video above). We’ll do just that and more. We’ve a lot to cover:

  1. Loading the extracted base-band signal into python.
  2. DSSS Code sequence generation
  3. Construct DSSS code-signal
  4. DSSS code synchronization
  5. Find the first correlation peak
  6. Code tracking and alignment
  7. Signal de-spreading and visualization via a Fourier transform
  8. Symbol Extraction*
  9. Symbol decoding*
  10. Token recovery*

* Steps 8 to 10 are covered in Part3.

Before we get started, let’s walk through my set-up.

Tools for the job:

  • An Android/iOS smartphone which has Google pay installed.
  • Audacity — Audio capturing software for signal capture
  • Gnu-Radio — a signals analysis tool for signal filtering
  • Jupyter-notebooks running scipy and numpy python modules — for all of our digital signal processing work.

Let’s begin:

1. Loading the extracted base-band signal into python:

In part1, we managed to extract the base-band signal and perform some initial analysis. We use the same extraction process with a few modifications i.e. tweak the signal capture and filter settings.

  • Modified the signal capture sampling rate in ‘Audacity’ to 48 kHz
  • Made a few changes to the Gnu-Radio flow-graph (in terms of filtering).
  • You’ll find these changes in my GitHub repo (download the Jupyter-notebook and you should have everything.)

As in part1, the Gnu-Radio flow-graph produces a ‘.wav file’. This .wav file is our filtered base-band. You pass the .wav file to the code snippet below and it separates the {real, imaginary} components of the signal into a python list [re, im] and prints the number of base-band samples along with a plot of the signal.

The number of base-band samples depends on the length of the audio capture (this one is around 7 seconds) and from the output below, we get the total number of samples (499488 or ~500k samples).

Note: The 0–10k range didn’t record any audio in the 16.5–18 kHz spectrum as I started recording before activating Tez-mode.

Filtered base-band signal (1.5 kHz)
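A minimal sketch of the loading step, assuming the flow-graph writes a two-channel .wav with the real part in channel 0 and the imaginary part in channel 1 (an assumption about the file layout, not something confirmed here). The helper name `load_baseband` and the synthetic demo file are hypothetical; point the function at your own .wav instead.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile


def load_baseband(path):
    """Read a two-channel .wav and return (sample_rate, [re, im])."""
    sample_rate, data = wavfile.read(path)
    if data.ndim != 2 or data.shape[1] != 2:
        raise ValueError("expected a two-channel (real/imaginary) .wav file")
    re = data[:, 0].astype(np.float64)
    im = data[:, 1].astype(np.float64)
    return sample_rate, [re, im]


# Demo on a synthetic file so the sketch runs without the real capture.
fs = 48_000
t = np.arange(fs) / fs  # one second of samples
iq = np.stack([np.cos(2 * np.pi * 300 * t), np.sin(2 * np.pi * 300 * t)], axis=1)
demo_path = os.path.join(tempfile.mkdtemp(), "baseband_demo.wav")
wavfile.write(demo_path, fs, iq.astype(np.float32))

rate, (re, im) = load_baseband(demo_path)
baseband = re + 1j * im  # complex base-band used by the later steps
print("sample rate:", rate, "base-band samples:", len(baseband))
```

Keeping the two components as one complex array makes the later multiply-and-sum steps a single numpy expression.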

2. DSSS code sequence generation:

DSSS or direct sequence spread spectrum is a signal-modulation technique. In DSSS, data bits are modulated by a pseudorandom bit sequence known as a spreading code (which is nothing more than a sequence of binary digits). Additionally, in part1, we discovered (via some OS-INT) that Google-pay uses a 127-bit binary sequence.

DSSS is not exactly an easy-to-understand modulation scheme. In Part1, I attempted to explain it in some detail, but I think if you stick around till the end, it’ll make a lot more sense as we’re going to get hands-on in reversing DSSS.

Put simply, all of this just tells us that we should be looking for a particular sequence of binary digits, one that’s 127 bits in length.

Turns out LFSRs (or linear feedback shift registers) are good at generating pseudo-random binary sequences. That’s great but we need a very specific sequence of 1’s and 0’s. One way to solve this problem is with some good old-fashioned ‘brute force’: write an LFSR and cycle through as many possible sequences as you can till you find the one you’re looking for. Although this method is akin to ‘shooting in the dark’, it works for a short sequence-length of 127 bits.

You may be thinking: how do I know if I have the right sequence? Well, if it’s the right one, step 4 will work, i.e. produce a correlation match. We’ll see what that means shortly.
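The brute-force idea can be sketched as below, assuming a 7-bit Fibonacci LFSR (2^7 - 1 = 127) that outputs its low bit. The function names, the bit ordering and the demo seed are illustrative assumptions, and the target sequence is generated by the sketch itself; it is not G-Pay's actual code-sequence.

```python
def step(state, mask, nbits=7):
    """Advance a Fibonacci LFSR by one shift and return the new state."""
    feedback = bin(state & mask).count("1") & 1  # parity of the tapped bits
    return (state >> 1) | (feedback << (nbits - 1))


def lfsr_sequence(mask, seed, nbits=7, length=127):
    """Return `length` output bits (the low bit of each successive state)."""
    state, out = seed, []
    for _ in range(length):
        out.append(state & 1)
        state = step(state, mask, nbits)
    return out


def is_maximal(mask, nbits=7):
    """True if the register visits all 2**nbits - 1 non-zero states."""
    period = 2**nbits - 1
    state = 1
    for count in range(1, period + 1):
        state = step(state, mask, nbits)
        if state == 1:
            return count == period
    return False


def find_lfsr(target, nbits=7):
    """Brute-force every (mask, seed) pair until one reproduces `target`."""
    target = list(target)
    for mask in range(1, 2**nbits):
        for seed in range(1, 2**nbits):
            if lfsr_sequence(mask, seed, nbits, len(target)) == target:
                return mask, seed
    return None


maximal_masks = [m for m in range(1, 2**7) if is_maximal(m)]
demo_mask, demo_seed = maximal_masks[0], 0b1010101  # arbitrary demo values
target = lfsr_sequence(demo_mask, demo_seed)         # stand-in for the real code
print("maximal-length masks:", len(maximal_masks))
print("recovered (mask, seed):", find_lfsr(target))
```

With only 127 masks and 127 non-zero seeds to try, the search space is small enough that exhaustive search finishes almost instantly, which is why the ‘shooting in the dark’ approach is workable here.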

The actual 127-bit binary sequence is shown below along with some code that performs an extra check. The code snippet below takes a sequence as input and outputs the corresponding bit-mask and initial seed value.

Note: I cross-verified the output from my LFSR implementation with that of Gnu-Radio’s implementation and I can confirm that they match.

G-pay’s DSSS binary code-sequence

Great, we have our code-sequence. Sequences are best visualized as plots rather than a sequence of numbers. So, let’s plot this sequence with matplotlib (a python module).

Plot of the binary code sequence with zeros flipped to ‘-1’

3. Construct DSSS code-signal:

Our code sequence is a series of 127 bits but we need a digital signal.

There are a couple of things we’re going to need if we want to construct a digital signal out of our bits.

  1. First — determine the sampling rate or frequency and
  2. Second—figure out the interpolation method/function

Interpolation can be loosely described as the process of connecting the dots given a sampled signal.

This may look like a straightforward problem where we just up-sample to match the base-band’s sampling frequency (which in our case is 48 kilo-samples per second). That is the right answer, but it would be prudent to dig a bit deeper. In part1, we were able to gather information about the signal’s symbol rate, which is 23.6 symbols/second.

So, if our base-band sample rate is 48 kilo-samples per second and we know the symbol rate, we can figure out the number of samples/symbol, which is 48000/23.6 ≈ 2033.9 samples/symbol. We can also calculate the symbol period, which is the reciprocal of the symbol rate (1/23.6 ≈ 42.37 milliseconds).

All of this simply means that 1 period of the code signal is about 42.37 milliseconds (equal to one symbol period) and it has the same number of samples per period (i.e. equal to that of one symbol, or about 2033.9).
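The arithmetic above can be checked in a few lines. The variable names are only illustrative; the numbers (48 kHz, 23.6 symbols/second, 127 code bits) come from the text.

```python
sample_rate = 48_000   # base-band samples per second
symbol_rate = 23.6     # symbols per second (from part1)
code_length = 127      # bits in the spreading code

samples_per_symbol = sample_rate / symbol_rate            # about 2033.9
symbol_period_ms = 1000 / symbol_rate                     # about 42.37 ms
samples_per_code_bit = samples_per_symbol / code_length   # about 16.01

print(f"samples/symbol   : {samples_per_symbol:.1f}")
print(f"symbol period    : {symbol_period_ms:.2f} ms")
print(f"samples/code bit : {samples_per_code_bit:.2f}")
```

The last figure is what suggests an integer up-sampling factor of 16 for the 127-bit sequence.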

Symbol rate is also referred to as baud rate.

So, we need to up-sample our 127-bit binary sequence by a factor of at least 16 (i.e. 127 * 16 = 2032) and interpolate it with some function. For interpolation, I used the zero-order hold method but that didn't work (was stuck here for some time). So, I went back to OS-INT and tried to comb through the patent document again for more information.

Face-palm moment! It contained a description of parts of the construction flow with ‘sinc’ interpolation labelled clearly. (PS to self: ‘focus’.)

With this information, we can now construct our code-signal. Here’s a plot of the code-sequence and its ‘sinc’ interpolated version.

1 period of the binary code sequence and its ‘sinc’ interpolated version
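One way to sketch ‘sinc’ interpolation is to zero-stuff the ±1 sequence by the up-sampling factor and convolve it with a truncated sinc kernel. The factor of 16 comes from the text; the kernel half-width of 8 code bits, the function name and the random placeholder bits are assumptions made for illustration, and the exact filter described in the patent is not reproduced here.

```python
import numpy as np


def sinc_interpolate(chips, factor=16, half_width_chips=8):
    """Up-sample a +/-1 sequence by `factor` using a truncated sinc kernel."""
    chips = np.asarray(chips, dtype=float)
    stuffed = np.zeros(len(chips) * factor)
    stuffed[::factor] = chips                      # one value every `factor` samples
    n = np.arange(-half_width_chips * factor, half_width_chips * factor + 1)
    kernel = np.sinc(n / factor)                   # zero at every other chip position
    return np.convolve(stuffed, kernel, mode="same")


rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=127)   # placeholder bits, NOT the real code-sequence
chips = 2 * bits - 1                  # flip zeros to -1
code_signal = sinc_interpolate(chips)
print("code-signal samples:", len(code_signal))
```

Because the kernel is zero at every multiple of 16 samples except its centre, the interpolated signal still passes through the original ±1 values at the chip positions and only the samples in between are smoothed.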

4. DSSS code synchronization (i.e. cross-correlation of code-signal with our base-band signal):

So, we’ve extracted our base-band signal and constructed a code-signal.

In this step, we attempt to correlate these 2 signals with each other. In mathematical terms, correlation is a multiply and accumulate operation. In English, that means it’s a way to assess the degree of similarity between signals and going one step further, you can think of it as a technique used to look for the presence of one signal in another.

In our case, we’re going to cross-correlate the code-signal (we constructed) with the extracted signal (the one that we extracted/sniffed out of thin air).

  • Assume we have 2 signals we want cross-correlated. We slide one signal forward relative to the other and perform correlation at each instant of time.
  • When you do this, at some point the sliding signal aligns with the other signal and at that point we get a nice spike. (See example below.)

An example of cross-correlation. You see a spike in correlation magnitude when both signals are perfectly aligned and it drops as they go out of alignment.

That looks nice but in the digital world, signals are just a sequence of samples. So, we simply multiply our sequence of 2032 code-signal samples with a similar number of extracted base-band samples and just sum them up. This is one correlation operation.

In the next round, we slide the code-signal forward by 1 base-band sample and perform the same operation. We’re essentially correlating the code-signal with slices of the base-band signal, looking for slices that are similar to the code-signal.

We repeat this operation over a little less than the length of base-band signal. Here’s what that looks like in code

Cross-correlation of code signal with the base-band signal
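A minimal sketch of the sliding multiply-and-accumulate described above, run on synthetic data so it is self-contained. The function name, the 64-sample code, the noise level and the offset are all made up for the demo; with the real capture you would pass the complex base-band and the 2032-sample code-signal instead.

```python
import numpy as np


def correlation_magnitude(baseband, code_signal):
    """Slide code_signal over baseband one sample at a time (multiply, sum, abs)."""
    n = len(code_signal)
    scores = np.empty(len(baseband) - n + 1)
    for start in range(len(scores)):
        window = baseband[start:start + n]
        scores[start] = np.abs(np.sum(window * code_signal))
    return scores


# Synthetic demo: a short +/-1 code buried in complex noise at a known offset.
rng = np.random.default_rng(0)
code_signal = 2.0 * rng.integers(0, 2, size=64) - 1.0
offset = 300
baseband = 0.3 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
baseband[offset:offset + 64] += code_signal * np.exp(1j * 0.7)  # unknown phase

scores = correlation_magnitude(baseband, code_signal)
# For a real-valued code, numpy's built-in gives the same scores without the loop.
fast = np.abs(np.correlate(baseband, code_signal, mode="valid"))
print("strongest alignment at sample", int(np.argmax(scores)))
```

Taking the magnitude of the complex sum means the peak shows up regardless of the phase of the received signal.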

And when you plot the correlation-magnitude scores, here’s what you get. The plot below is a slice of the correlation-magnitude scores (for samples in the 100k-150k range). In simple terms, this means we’ve found points where the code-signal aligns with the base-band signal in this 50k sample-range. The large spikes, spaced approximately every 2032 samples, indicate synchronization.

Plot of cross-correlation between interpolated code-signal and base-band

5. Find the first correlation peak

Great, so we managed to synchronize the code-signal with our base-band and we can see peaks in our cross-correlation plot. The next step is to find the first correlation peak. The plot above displays a slice of correlation magnitude i.e. correlation magnitude for 50k samples (between the range of 100k-150k) but we have ~500k samples in our base-band.

So, we’ll need to inspect all correlations and find the first of the large peaks.

For the DSP gurus out there who may be reading this, I’d love to hear inputs on this part.

  • This step still needs a human eye for now as the magnitude of correlation keeps changing with the strength of the received signal. So, we can’t use a common threshold between 2 different audio recordings.
  • Normalization of cross-correlation scores is something I’ve thought about but I have not been very successful.

So, for now, we’ll do this the old-fashioned way: visual inspection of the cross-correlation scores. We can do this with a simple list expression in python.

And here’s the output. Our first correlation peak is at sample number 3844 with a correlation peak of 138.33677656985827
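A sketch of what such a list expression could look like, on toy scores so it runs on its own. The relative threshold of 0.5 and the helper name are assumptions; as noted above, a usable threshold depends on the recording, so the result still deserves a visual check.

```python
import numpy as np


def find_first_peak(scores, period, rel_threshold=0.5):
    """Index of the first strong peak: first threshold crossing, then local max."""
    threshold = rel_threshold * np.max(scores)
    candidates = [i for i, s in enumerate(scores) if s > threshold]
    first_cross = candidates[0]
    window = scores[first_cross:first_cross + period]
    return first_cross + int(np.argmax(window))


# Toy scores: a flat floor with three peaks one "period" (100 samples) apart.
scores = np.ones(400)
for centre in (40, 140, 240):
    scores[centre - 1:centre + 2] = [6.0, 10.0, 6.0]

first_peak = find_first_peak(scores, period=100)
print("first correlation peak at", first_peak, "with magnitude", scores[first_peak])
```

Searching one period past the first crossing guards against picking a sample on the rising edge of the peak rather than its top.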

6. DSSS code tracking

So, we have synchronization and we know the location of the first correlation peak (i.e. sample number 3844).

This means that if we took every consecutive sequence of 2032 samples of our base-band, starting at sample index 3844, they (parts of the base-band) should (in theory) contain a signal that’s similar to the code-signal.

Mentally picture this (PS -may drive you mental):

Base-band samples|…………..………….…………..…………..……|and so on

first_peak starts__ |..2032..|..2032..|..2032..|..2032..|..2032..| and so on

That’s true — in theory but if we were to just pick the next 2032 samples after our first highest peak and assume that’s the next peak, we’d quickly notice that after a few thousand samples, we end up selecting correlation peaks that aren’t the highest but are in close proximity to the highest peak.

In digital signal processing terms, this is referred to as ‘code tracking’, where we need to continuously check for the highest correlation peak at each peak-sample or else we fall out of alignment.

  • The reason: in a perfect world where neither interference nor noise exists, we should just be able to take the next len(code_signal) samples and be on our way, but sadly that world doesn’t exist.
  • So, we have to account for this by checking for the next highest peak at every probable alignment boundary.

In our case, we look for the highest peak around the next probable alignment boundary (2032 ± 200 samples).
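A minimal sketch of this tracking loop. The defaults of 2032 and 200 come from the text; the function name and the toy scores (period 100, search window 10) are made up so the example runs without the capture.

```python
import numpy as np


def track_peaks(scores, first_peak, period=2032, search=200):
    """Follow correlation peaks, re-centring on the local maximum each period."""
    peaks = [first_peak]
    while True:
        centre = peaks[-1] + period
        lo, hi = centre - search, centre + search + 1
        if hi > len(scores):
            break  # not enough samples left for a full search window
        peaks.append(lo + int(np.argmax(scores[lo:hi])))
    return peaks


# Toy scores with peaks that drift by a sample or two per period.
scores = np.zeros(500)
true_peaks = [40, 141, 243, 342, 441]
scores[true_peaks] = 10.0

peaks = track_peaks(scores, first_peak=40, period=100, search=10)
for index in peaks:
    print(index, scores[index])
```

Each new search window is centred on the previous peak plus one period, so small drifts are absorbed one step at a time instead of accumulating.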

The above snippet of code prints the list of correlation peak indices and the corresponding magnitude of correlation.

Notes:

  • This is not the full output, only a part of it.
  • The difference between some consecutive indices is not exactly 2032.

7. Signal de-spreading and visualization via a Fourier transform

What was the point of finding parts of the base-band signal that are similar to the code-signal?

Well, we started out with the presumption that our base-band signal is a product of the code and data signals. i.e.

  • Base-band signal — b(t) = c(t) * d(t)

Given the above, it should be possible to recover the data signal (i.e. de-spread the base-band signal) by multiplying base-band and code signals but before that — we needed both signals to be perfectly aligned (or synchronized).

We went through all that just so that we don’t get our multiplications wrong!

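Note: If you’re wondering ‘why another multiply’ and not a divide operation: the code signal is essentially a binary sequence of +1s and -1s (albeit interpolated), and if you multiply such a sequence by itself, every element becomes +1. So multiplying the mixed signal by the code a second time has the effect of cancelling the code out, leaving the data signal.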
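The cancelling effect is easy to check numerically. This sketch uses placeholder ±1 values rather than the real code-sequence, and an arbitrary sine as the ‘data’.

```python
import numpy as np

rng = np.random.default_rng(1)
code = 2 * rng.integers(0, 2, size=127) - 1           # placeholder +/-1 sequence
data = np.sin(2 * np.pi * 3 * np.arange(127) / 127)   # any data signal will do

mixed = data * code      # spreading: multiply by the code
unmixed = mixed * code   # de-spreading: multiply by the same code again

print("code * code is all ones:", bool(np.all(code * code == 1)))
print("data recovered:", bool(np.allclose(unmixed, data)))
```

The identity is exact only for pure ±1 values; with an interpolated code-signal the samples between chip positions are no longer ±1, so the cancellation is approximate.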

Another way to visualize this is with real python plots.

This snippet of code should produce these plots

  • The first one is a plot of a single frequency/tone (or you can think of it as the data signal).
  • The second one is mixed with our actual code-signal (the one we constructed)
  • The third is the unmixed version of the same signal.

Notice, there’s some symmetry in the third plot. In real life, this is not the case as you’ll have to deal with noise (as we’ll soon see).

DSSS — Visual comparison of spread Vs de-spread signals

The above plots offer a way to visually compare signals (data, mixed and unmixed) but let’s go a step further and plot their ‘Fast Fourier transforms’.

A Fast Fourier Transform is a mathematical operation used to convert a signal from its time domain representation to a representation in the frequency domain. (This description is good enough for our needs but not complete.)

Put simply, if you pass one of our signals as input to an FFT function, it’ll output all frequencies contained in that signal. The example code performs an FFT of the mixed and unmixed waves:

  • You can clearly see 2 distinct spikes in the unmixed FFT. That’s the original tone that was mixed in with the code-signal.
  • And a completely different FFT plot for the Mixed FFT.

In the second FFT plot, bandwidth usage (amount of spectrum in use) is much higher than the one above. That’s DSSS at-play.

DSSS — FFT plot of spread Vs de-spread signals
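A self-contained sketch of the same comparison, without the plotting. It assumes a placeholder ±1 code held for 16 samples per bit (zero-order hold, chosen only to keep the demo short) and a test tone placed on an exact FFT bin; neither is taken from the real capture.

```python
import numpy as np

fs = 48_000
n = 2032                          # one code period at 16 samples per code bit
t = np.arange(n) / fs
tone_hz = 10 * fs / n             # about 236.2 Hz, exactly on FFT bin 10
tone = np.sin(2 * np.pi * tone_hz * t)

rng = np.random.default_rng(2)
chips = 2 * rng.integers(0, 2, size=127) - 1   # placeholder code, not G-Pay's
code = np.repeat(chips, 16)

mixed = tone * code               # spread signal
unmixed = mixed * code            # de-spread signal

freqs = np.fft.fftfreq(n, d=1 / fs)
mixed_fft = np.abs(np.fft.fft(mixed))
unmixed_fft = np.abs(np.fft.fft(unmixed))


def occupied_bins(spectrum):
    """Count bins holding more than 10% of the largest magnitude."""
    return int(np.sum(spectrum > 0.1 * spectrum.max()))


print("bins in use, unmixed:", occupied_bins(unmixed_fft))
print("bins in use, mixed  :", occupied_bins(mixed_fft))
print("unmixed spikes at Hz:", sorted(freqs[np.argsort(unmixed_fft)[-2:]]))
```

The unmixed spectrum has its two spikes at plus and minus the tone frequency, while the mixed spectrum smears the same energy over many bins.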

To sum up, de-spreading our base-band and performing an FFT on the output should reveal the actual data signal. Let’s do that then.

The above code-snippet gives us a frequency plot of a slice of the data signal, starting at sample index number 38389 and if we plot more of them at different indices, we should observe that (pretty much) every frequency plot has spikes in the 100 to 500 hertz range.

De-spreading the signal and visualizing data via a Fourier transform.
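To look at more than one slice, the de-spread-then-FFT step can be wrapped in a helper that reports the strongest frequency at each tracked peak. This is a sketch under assumptions: the function name is hypothetical, and the base-band here is synthetic (two made-up tones spread with a placeholder code), not the recording.

```python
import numpy as np


def dominant_frequencies(baseband, code_signal, peak_indices, fs=48_000):
    """De-spread one code period at each peak and return its strongest frequency."""
    n = len(code_signal)
    freqs = np.fft.fftfreq(n, d=1 / fs)
    found = []
    for start in peak_indices:
        segment = baseband[start:start + n]
        if len(segment) < n:
            break  # ran off the end of the capture
        spectrum = np.abs(np.fft.fft(segment * code_signal))
        found.append(freqs[np.argmax(spectrum)])
    return found


# Synthetic check: two symbols carrying different tones, both spread by the code.
fs, n = 48_000, 2032
rng = np.random.default_rng(3)
code = np.repeat(2 * rng.integers(0, 2, size=127) - 1, 16).astype(float)
bin_hz = fs / n                      # about 23.62 Hz per FFT bin
tones = [5 * bin_hz, 12 * bin_hz]    # made-up tone frequencies
t = np.arange(n) / fs
baseband = np.concatenate([np.exp(2j * np.pi * f * t) * code for f in tones])

found = dominant_frequencies(baseband, code, [0, n], fs)
print([round(f, 1) for f in found])
```

Note that an FFT over one 2032-sample code period at 48 kHz has bins about 23.6 Hz apart, so tones spaced at the symbol rate land on neighbouring bins.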

That’s interesting! It adds up.

We made another presumption in the beginning about the data signal being MFSK modulated.

Multiple frequency shift keying (MFSK) modulation is similar to binary FSK (or 2-FSK) with 2 important differences: MFSK uses multiple tones/frequencies to transmit data, and the frequencies are spaced at an equal distance from each other.

You could verify this using scipy but that just means a bit more matplotlib code. I found an easier option: I passed the de-spread signal through a professional signal analysis tool called ‘baudline’ (essentially the tool offers an infinitely customizable version of the FFT function) and it confirmed that the signal contains

  • multiple frequencies
  • spaced at an equal distance (by ~23.6 hertz) from each other
  • Surprise, Surprise — that’s the same as our symbol_rate of 23.6 symbols/sec and
  • Additionally, if you look closely at baudline’s output, you can see the frequency spectrum is in the 100 to 500 Hz range.

Baudline FFT view of the de-spread signal. Each one of those squiggly lines is a frequency and is spaced 23.6 Hz apart

I think we have enough information to assume it’s (the de-spread signal) an MFSK modulated signal.

  • If that’s true, the exact type of MFSK modulation must be 16-MFSK, i.e. (500 − 100)/23.6 ≈ 16.9
  • I rounded it to 16 (an even number of tones). Apparently it’s odd to have an odd number of frequencies in MFSK and 16-MFSK is a well-known MFSK modulation.
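The tone-count estimate is simple arithmetic. The tone grid at the end is a guess for illustration only: it assumes the lowest tone sits at exactly 100 Hz, which the analysis so far does not establish.

```python
import math

low_hz, high_hz = 100.0, 500.0   # observed frequency range
spacing_hz = 23.6                # tone spacing (same as the symbol rate)

slots = (high_hz - low_hz) / spacing_hz        # about 16.95
num_tones = 16                                 # rounded to 16-MFSK
bits_per_symbol = int(math.log2(num_tones))    # each tone can carry 4 bits

# Hypothetical tone grid, assuming the first tone is at low_hz.
tone_grid = [low_hz + k * spacing_hz for k in range(num_tones)]

print(f"slots: {slots:.2f}, tones: {num_tones}, bits/symbol: {bits_per_symbol}")
print(f"hypothetical grid: {tone_grid[0]:.1f} Hz to {tone_grid[-1]:.1f} Hz")
```

With 16 tones, each symbol selects one of 16 frequencies, which is why 16-MFSK maps naturally onto 4-bit values.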

OK, I just realized this post is tooooooooooooo long! We’ll cover the rest (symbol extraction, decoding, token generation along with vulnerabilities, learnings and conclusions) in Part3.

Nihal Pasham

Product Security | IoT Edge & Cloud Security | Security Strategist | Adversarial Resilience & Neural Networks