Quick Peek: Demodulating FM Sound Signals in a Noisy Environment

Audio, Data Analysis, Quick Peeks

Premature optimisation is a trap you can easily fall into whenever you are making or designing something. In some ways, it feels wrong to do something badly when you know you could do it better with a little more time. However, it is almost always best to get something working (even if it’s not perfect) rather than striving for an elegant solution that doesn’t yet work. You can always implement the faster, more accurate, or more elegant method later—and often you don’t need to. It is also far easier to improve something that already works than to build something from scratch.

Over the last few days, I have been working on a project that involves decoding a high-frequency, frequency-modulated (FM) audio signal. I recorded this signal on my phone, which has stereo microphones, and spent some time writing a clever beam-forming algorithm that adjusts the amplitude and phase of the signals received by the two microphones to increase the FM signal strength while rejecting background noise. The algorithm used simulated annealing to optimise the amplitude and phase adjustments, and it worked very well in a set of simulated examples that I used for testing.
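I haven't included the real beam-forming code here, but the sketch below shows the general shape of the idea: a per-channel amplitude gain and phase shift (approximated crudely as a whole-sample delay at the carrier frequency) applied before summing, with simulated annealing searching for the combination that maximises carrier-band power. All function names, parameter values, and the cooling schedule are illustrative rather than what I actually used.

```python
import numpy as np

def carrier_band_power(x, fs, f_lo, f_hi):
    """Power of x in the [f_lo, f_hi] band, estimated from the FFT."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return np.sum(np.abs(spectrum[band]) ** 2) / len(x)

def combine(left, right, gain, phase, fs, f_carrier):
    """Apply an amplitude gain and a phase shift (approximated as a
    whole-sample delay at the carrier frequency) to one channel, then sum."""
    delay_samples = int(round(phase / (2 * np.pi * f_carrier) * fs))
    return left + gain * np.roll(right, delay_samples)

def anneal_weights(left, right, fs, f_carrier, bw=5000.0, n_iter=2000, t0=1.0):
    """Simulated annealing over (gain, phase), maximising carrier-band power."""
    rng = np.random.default_rng(0)
    params = np.array([1.0, 0.0])                      # initial gain and phase
    score = carrier_band_power(
        combine(left, right, params[0], params[1], fs, f_carrier),
        fs, f_carrier - bw, f_carrier + bw)
    for i in range(n_iter):
        temp = t0 * (1.0 - i / n_iter) + 1e-9          # linear cooling schedule
        trial = params + rng.normal(scale=[0.1, 0.2])  # perturb gain and phase
        trial_score = carrier_band_power(
            combine(left, right, trial[0], trial[1], fs, f_carrier),
            fs, f_carrier - bw, f_carrier + bw)
        # Metropolis rule: always accept improvements, occasionally accept worse moves.
        if trial_score > score or rng.random() < np.exp((trial_score - score) / temp):
            params, score = trial, trial_score
    return params
```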

However, it did not work well on the actual data I collected. I spent quite a while fixing and tuning parameters but could not get it to perform properly. Eventually, I did what I should have done from the beginning: I conducted a real experiment and carried out some simple data analysis to understand what was happening.

I moved the source around different parts of my phone and plotted the power in the FM carrier band. After looking at the plot, it became immediately obvious why the beam-forming algorithm never worked: the sound was far too directional. (This makes complete sense, given that it was a high-frequency sound.) There was never a point where the sound was picked up strongly by both microphones; it was only ever picked up by one at a time. I should have used a much simpler approach: just take the output from whichever channel had the highest power in the carrier band—a single line of code—rather than hundreds.
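For comparison, the "single line" approach amounts to something like this, reusing the hypothetical carrier_band_power helper from the sketch above:

```python
def pick_channel(left, right, fs, f_carrier, bw=5000.0):
    """Return whichever channel has more power near the FM carrier."""
    p_l = carrier_band_power(left, fs, f_carrier - bw, f_carrier + bw)
    p_r = carrier_band_power(right, fs, f_carrier - bw, f_carrier + bw)
    return left if p_l >= p_r else right
```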

In fact, the plot of the ratio of signal power to total power implies that in this example, the signal typically makes up the vast majority of the received power. However, the carrier band power is only a proxy for the actual signal, since it also includes FM-encoded noise that is not truly signal. Therefore, the true ratio of signal power to total power is somewhat lower.
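Using the same hypothetical helper, that proxy ratio is simply the carrier-band power divided by the total power, computed in the same spectral domain so the normalisations cancel:

```python
def band_power_ratio(x, fs, f_carrier, bw=5000.0):
    """Fraction of received power lying in the carrier band. As noted above,
    this is only an upper bound on the true signal fraction, because the band
    also contains FM-encoded noise."""
    band = carrier_band_power(x, fs, f_carrier - bw, f_carrier + bw)
    total = carrier_band_power(x, fs, 0.0, fs / 2.0)
    return band / total
```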

JPEG artifact removal with Convolutional Neural Network (AI)

Uncategorized

At high compression ratios, JPEG images often have blocky artefacts that look particularly unpleasant, especially around the sharp edges of objects.

There are already ways to reduce these artefacts; however, they tend not to use very sophisticated techniques, typically only using information from adjacent pixels to smooth out the transition between blocks. An example is Kitbi and Dawood's work (https://ieeexplore.ieee.org/document/4743789/), which gave me the original inspiration for this project.

An alternative approach is to use a convolutional neural network (CNN) to estimate an optimal block given the original block and its eight surrounding blocks, and then tile over the image so that each block gets its own estimate; the tiling step is sketched below.
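I haven't published the exact pipeline, but the tiling step looks roughly like this, where `model` stands in for the trained network (assumed here to map a 24x24x3 neighbourhood to an 8x8x3 block, with the image a float array whose sides are multiples of 8):

```python
import numpy as np

BLOCK = 8                # JPEG block size
CONTEXT = 3 * BLOCK      # the block plus its eight neighbours: 24x24

def deblock(image, model):
    """Tile over a decoded JPEG image (H, W, 3), feeding each 24x24x3
    neighbourhood to the network and writing its 8x8x3 prediction back
    into the output image. Edge blocks are handled by edge padding."""
    h, w, _ = image.shape
    padded = np.pad(image, ((BLOCK, BLOCK), (BLOCK, BLOCK), (0, 0)), mode="edge")
    out = np.empty_like(image)
    for y in range(0, h, BLOCK):
        for x in range(0, w, BLOCK):
            neighbourhood = padded[y:y + CONTEXT, x:x + CONTEXT, :]
            out[y:y + BLOCK, x:x + BLOCK, :] = model(neighbourhood[None])[0]
    return out
```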

The network design was a five-layer, fully convolutional one using only small filters. Several different architectures were tried, all of which gave largely similar results. A good compromise between effectiveness and speed (which goes roughly as the inverse of the network's size) was a small network with only 57,987 parameters. Training was surprisingly fast, taking only a few hours without a GPU.
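I won't reproduce the exact architecture here, but a five-layer fully convolutional network of this general kind can be written in a few lines of Keras. The filter counts and kernel sizes below are guesses for illustration, not the configuration that gives 57,987 parameters.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_deblocker():
    """A small five-layer fully convolutional network mapping a 24x24x3
    neighbourhood to an 8x8x3 block, using only small filters."""
    inp = keras.Input(shape=(24, 24, 3))
    x = layers.Conv2D(32, 3, activation="relu")(inp)              # 24x24 -> 22x22
    x = layers.Conv2D(32, 3, activation="relu")(x)                # 22x22 -> 20x20
    x = layers.Conv2D(32, 5, strides=2, activation="relu")(x)     # 20x20 -> 8x8
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    out = layers.Conv2D(3, 3, padding="same")(x)                  # 8x8x3 output
    return keras.Model(inp, out)
```

A model like this could then stand in for `model` in the tiling sketch above.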

The network takes in full colour information and outputs full colour too, so that all of the available information is used. The colour channels are highly correlated with one another; it would be possible to train the network on monochrome images, but that would throw away the relationship that naturally exists between the channels.

So, does it work?

In my opinion, yes, it does. I think my method performs best where there are complicated edges, such as around glasses or on hairs that are resolved as partial blocks. Compared with the method Photoshop currently uses, it works least well where there are large smooth areas.

Text was not present in the training data set, so the network's poor performance on text is not a meaningful point of comparison. The above images are quite small; a larger example is below, along with a zoomed-in video comparing the original, the JPEG, Photoshop's artefact removal, and my method.

At one point I did calculate root mean squared error values, comparing both the network output and the JPEG image against the original. In some cases the network was reliably outperforming the JPEG, which is impressive but not too surprising, as this is how it was trained. I don't think those sorts of values are too important here, though: the aim isn't to restore the original image, but to reduce the visual annoyance of the artefacts.
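For completeness, the RMSE comparison is just the following (the variable names in the usage comment are hypothetical):

```python
import numpy as np

def rmse(a, b):
    """Root mean squared error between two images of the same shape."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    return np.sqrt(np.mean((a - b) ** 2))

# e.g. rmse(jpeg_image, original) versus rmse(network_output, original)
```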

If you really want to reduce the annoyance of the artefacts, you should use JPEG2000, or else have a read of this paper: https://arxiv.org/pdf/1611.01704.pdf