The ultimate guide to code obfuscation for security professionals

Code is money. It is a business asset with a measurable marketplace value in current cashflow and future earnings. It may contain business secrets and sensitive information. As a form of intellectual capital, code is created out of the time, knowledge, skills, research, innovation, and systems of your organization—everything that gives you a competitive advantage.

This is why code needs protection, as with any other business asset. Loss of code or control over it may result in loss of intellectual property, loss of revenue, loss of trust, and loss of reputation. These are disasters for any commercial enterprise. You need code security, now.

Code obfuscation helps protect your valuable code. In this guide, we will examine the purpose and uses of code obfuscation. We trace its place in application security and the software lifecycle. We provide guidance on best practice and success measurement. Finally, we finish with buyer advice and a look at future developments.

What code obfuscation is…and isn't

Obfuscation is a method of securing code recommended by OWASP. It protects source code by making it challenging for both humans and AI to parse or comprehend that code. The function of the code remains the same after the obfuscation process. The code still works, while appearing more complex and less comprehensible than before it was obfuscated.

What obfuscation isn't

Obfuscation is sometimes confused with other techniques. Distinguishing these techniques from obfuscation will help clarify what it means.

Data obfuscation vs code obfuscation

Data obfuscation is an attempt to hide sensitive data items. Code obfuscation is the narrower, more specialized attempt to hide the function of the code, e.g., through control flow obfuscation (see below). Here are examples of data obfuscation that are employed in cybersecurity.

Masking (or anonymization, tokenization): This is replacing data with random characters or codes to conceal sensitive information.
Redaction: Redaction is the process of removing sensitive information from a file or program.
Encryption: This converts information into a secret code that hides its true meaning.

Code obfuscation vs other techniques

Code obfuscation is often mentioned along with other security techniques and sometimes confused with them.

Tamper-proofing: This is a general name for any measure that makes it harder for an attacker to change the code, e.g., checking code integrity at runtime to ensure it has not been modified.
Watermarking (or forensic watermarking): This is embedding a unique identifier into a program for the purposes of traceability.
Runtime controls: This includes root and jailbreak detection, anti-debugging, and anti-hooking techniques.

The problems obfuscation is designed to solve

Code obfuscation reduces the readability and understandability of the source code. By making the code more complex, obfuscation slows down serious attackers while acting as a psychological barrier to prevent attacks from casual hackers and "script kiddies."

The problem of reverse engineering

A major aim of code obfuscation is to make the malicious reverse engineering (RE) of program logic and source code more difficult. Reverse engineering attacks against applications may aim at IP theft, malicious changes, or data breaches. The first step in such an attack is to understand the application involved and then formulate an attack plan. Code obfuscation makes the code harder to understand, which makes reverse engineering a more time-consuming, frustrating, and painful process.

The prevention of common attacks

Code obfuscation has the potential security benefit of making vulnerabilities harder to find and exploit by attackers. Exploits often depend on minute details about program internals. Obfuscation deliberately introduces diversity and variability into deployed applications. This can make it more of a challenge for malicious actors to understand the code or construct exploits that reliably work against multiple targets on a grand scale.

The prevention of unauthorized software modifications

Code obfuscation does not actively prevent modification. But it does give the indirect benefit of helping make software more impervious to unauthorized access and modification attempts. If the code is difficult to understand, that means it is more difficult to modify.

Obfuscation techniques differ from tamper-proofing mechanisms such as checksumming, which verifies the integrity of the code to detect modification. But the two types of techniques are often layered together to preserve the intended operation of an application’s code.
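To make the contrast with checksumming concrete, here is a minimal Python sketch of an integrity check. It compares a SHA-256 digest of a code blob against a value recorded at build time. The names `protected_code`, `EXPECTED_DIGEST`, and `verify_integrity` are hypothetical and chosen only for illustration; commercial tamper-proofing tools embed and protect such checks far more carefully.

```python
import hashlib
import hmac

# Hypothetical protected code region, represented here as plain bytes.
protected_code = b"def transfer(amount): return amount * 1"

# Digest recorded at build time (assumption: stored where an attacker
# cannot easily replace it).
EXPECTED_DIGEST = hashlib.sha256(protected_code).hexdigest()


def verify_integrity(code: bytes, expected_digest: str) -> bool:
    """Return True if the code still matches the digest recorded at build time."""
    actual = hashlib.sha256(code).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(actual, expected_digest)


# Simulate an attacker patching the code.
tampered_code = protected_code.replace(b"* 1", b"* 100")

print(verify_integrity(protected_code, EXPECTED_DIGEST))  # True
print(verify_integrity(tampered_code, EXPECTED_DIGEST))   # False
```

A check like this detects modification but does nothing to hide the code, which is why it is usually layered with obfuscation rather than used in place of it.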

Main types of code obfuscation technique

To gain a high-level understanding of how code obfuscation works, it is useful to look at how individual techniques are grouped together. Each obfuscation technique transforms one of these aspects of code.

Control or control flow obfuscations: These are obfuscations that alter and obscure the flow of a program so that it is difficult for an attacker to analyze it. Control flow techniques are some of the most numerous and commonly employed obfuscation methods.
Data obfuscations: These work by concealing and obscuring the data structure of a program. This can be done by targeting arrays, obfuscating variables, and transforming classes.
Layout or lexical obfuscations: These target the program’s layout structure by renaming identifiers and removing source code formatting. These are one-way techniques: once the information is gone, there is no way to recover the original names or formatting.
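As an illustration of the layout category, the following Python sketch renames the identifiers in a small function using the standard `ast` module (`ast.unparse` requires Python 3.9 or later). The sample function and the helper names `collect_defined_names`, `Renamer`, and `obfuscate_layout` are our own assumptions. It is a toy: a production obfuscator must also handle scopes, attributes, keyword arguments, and reflection.

```python
import ast

SOURCE = '''
def calculate_discount(price, loyalty_years):
    rate = 0.05 * loyalty_years
    capped_rate = min(rate, 0.25)
    return price * (1 - capped_rate)
'''


def collect_defined_names(tree: ast.AST) -> dict:
    """Map every name defined in the source to a meaningless identifier."""
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            names.append(node.name)
        elif isinstance(node, ast.arg):
            names.append(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.append(node.id)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return {name: f"_{i}" for i, name in enumerate(dict.fromkeys(names))}


class Renamer(ast.NodeTransformer):
    """Rename only names found in the mapping, so builtins such as min() survive."""

    def __init__(self, mapping: dict):
        self.mapping = mapping

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # continue into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def obfuscate_layout(source: str):
    tree = ast.parse(source)
    mapping = collect_defined_names(tree)
    renamed = Renamer(mapping).visit(tree)
    # Unparsing also discards comments and the original formatting.
    return ast.unparse(renamed), mapping


obfuscated, mapping = obfuscate_layout(SOURCE)
print(obfuscated)
```

The printed function behaves like the original, but names such as `calculate_discount` and `capped_rate` are gone. The `mapping` dictionary is the only record of what they were.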

[Figure: Three code obfuscation types with different technique examples]

Limitations and potential problems with the code obfuscation process

Code obfuscation is widely used by high-security apps in multiple industries, including banking, gaming and streaming. It offers a cost-effective security boost that can be applied quickly and easily with suitable tools. When applied, code obfuscation is difficult to overcome and protects against a wide range of attacks.

But despite these strengths, there are limitations to the effectiveness of code obfuscation. Many of these weaknesses occur when obfuscation is used in isolation, overused, or used naively. The good news is that these drawbacks are substantially diminished if obfuscation is combined with other security techniques, and the least interfering obfuscation techniques are employed in a smart way. Good quality obfuscation tools should strive to achieve this.

Problems with performance

Code obfuscation is often valued because it strikes a practical balance between security effectiveness and performance. But heavy obfuscation can negatively impact application performance, leading to poor UX. It can even lead to rejection from app stores.

One way to deal with this problem of overload is to prioritize obfuscating critical components that could be exploited rather than obfuscating non-critical code. Decide which code does (e.g., sensitive code) and does not (e.g., performance critical code, non-critical code) require obfuscation.

Problems with protection

Code obfuscation provides little defence against runtime attacks, e.g., dynamic instrumentation. It only deals with code behavior and doesn't specifically protect stored data (e.g., API keys, user credentials, tokens, local databases). And it is not a foolproof method against motivated, skilled, and time-rich attackers. It acts as a speed bump, not a wall. This is why it is often combined with runtime controls in the context of protecting mobile applications.

Impact on developer experience

Obfuscation makes the code difficult to understand but it should not make it difficult to develop or debug. It is important to consider how the obfuscation tool fits into the software development lifecycle (SDLC) by evaluating its impact on testability and debuggability. A key feature of suitable tools is the generation of mapping files which allow debug stack traces to be mapped back to the original source code.

Ethical issues

Obfuscation techniques can be used by attackers to hide malicious code. Unethical uses of obfuscation techniques include the hiding of data collection, malicious features, back doors, and concealed vulnerabilities in code.

The place of code obfuscation in application security

According to Gartner’s report on Avoiding Mobile Application Security Pitfalls, there is a five-stage path to building a secure app: threat modeling, secure development, testing, hardening, and anti-tampering. Code obfuscation has a central place in the last two of these stages, which are the main areas of application security concerned with in-app protection. In-app protection involves security features that are built into mobile apps, or integrated with them, to prevent threats.

Obfuscation as application hardening

Application hardening is the process of strengthening an app’s security by reducing the size of its vulnerable attack surface. It achieves this by modifying an app’s source code, binary code or bytecode to make it more resistant to tampering and RE.

Some application hardening techniques predict, monitor, and detect attacks. Others - like code obfuscation and data encryption - help prevent and block them by increasing the difficulty for a malicious actor to run a static analysis or execute an attack.

Obfuscation as application shielding

Unlike application hardening, which is largely a preventative measure, application shielding can provide comprehensive protection for mobile apps against attacks and threats in real time. Along with code obfuscation, it includes data obfuscation techniques (like white-box cryptography), as well as runtime protection technology to protect the app’s entire ecosystem at multiple levels.

Runtime app shielding controls and monitors the process of an app running on a device to ensure it is clean and safe. This software security mechanism is called runtime application self-protection (RASP). Integrating static code obfuscation with dynamic RASP checks creates a more comprehensive security posture.

[Figure: Gartner's 5-stage path]

The position of code obfuscation in application development

There are two ways of viewing where code obfuscation takes place in the application development process.

Code obfuscation in the SDLC

The software development lifecycle (SDLC) consists of the basic stages in building a software application: analysing, planning, designing, building, testing, deploying, and maintaining. Code obfuscation, along with other code hardening techniques, is crucial to the later part of this cycle.

Code obfuscation happens towards the end of the build stage to ensure that it does not impact code development. But code obfuscation must occur before the testing of software and performance. This is to ensure that the application’s function is acceptable post obfuscation, and that the obfuscation process hasn’t introduced any defects.

Within this crucial build phase, obfuscation can be carried out before, during or after the compilation process (see below). After it, in the maintenance phase, obfuscation may be reapplied to new builds and updated to maintain security. This is where finding the balance between security and performance is crucial.

Code obfuscation and compilation

In a typical compilation process for a compiled language, there are three places where obfuscation can happen. These are the three levels of code obfuscation. It is important to consider the advantages and disadvantages of each option.

Source code level

Source level obfuscators change the text source files before they are compiled. The disadvantages with code obfuscation here are:

It can be tricky to integrate.
It can cause usability issues.

Intermediate code level

Intermediate level obfuscators and most application security tools work at this point. The disadvantages with code obfuscation here are:

It may slow compilation down.
It requires patching into the toolchain.
It needs access to the source code in order to obfuscate.

Binary/machine code

Post-build obfuscation doesn't need to take place during the development part of the application life cycle. Rather, it can take place after the product has been built and compilation is complete. This is the best obfuscation level. There are many advantages to post-build, post-compilation obfuscation:

It is easier and simpler to use, integrate, and maintain.
It is not tied to the source code or to a specific compiler.
It has no impact on the integrated development environment (IDE).
There is no need to integrate with the toolchain.
It is developer tool neutral and so doesn't have to integrate with any specific development tools.
It means that security teams aren't dependent on the development teams/tools and can apply security policies independently from the developers.

[Figure: Types of obfuscator]

Layering as best practice in code obfuscation

There are many best practice recommendations for code obfuscation. For example, code obfuscation practice should be kept up-to-date with the latest tools, tests, and attempts. Attention needs to be paid to the performance of obfuscated code in terms of speed and size. And it is smart to employ variability and randomness so each app build has a unique binary.

But the best in best practice for obfuscation lies in layering.

Obfuscation is an important element of app security, but it is not sufficient by itself, and isn't supposed to be. Each individual obfuscation technique typically offers relatively little benefit by itself. Employing multiple obfuscation types, techniques, and layers alongside other security measures is the best way to establish a solid protection shield.

Layered obfuscation techniques

Using a mixture of code obfuscation techniques creates layers of code complexity so that RE is as difficult and time-consuming as possible. A layered defence works by combining several different obfuscation techniques multiple times to maximize impact. The more techniques used in a smart way, the better code is protected.

Here are some obfuscation techniques that are frequently employed together in a layered manner:

Symbol and function renaming: This alters the names of variables and methods to meaningless identifiers.
String obfuscation: This hides the content of strings in the code (sequences of characters in programming).
Dummy or junk code insertion: This adds code that is non-functional and ancillary to make it harder to read.
Operator removal: This replaces operators in a program (like +, -, *, /, =, etc.) with functional but less readable equivalents.
Opaque predicates: This introduces conditional statements whose outcome is known to the obfuscator in advance but is difficult for static analysis tools to determine.
Irreducible control flow: This introduces complexity into control flow graphs that standard decompilers find difficult to simplify or optimize.
Polymorphic obfuscation: This generates different variations of obfuscated code for every app build, so hackers have to create new strategies each time.
Control flow flattening: This obfuscates the sequential structure of program flow by flattening code blocks.

Layering obfuscation with other security techniques

Code obfuscation complements other security techniques, enhancing rather than replacing them. This is especially important in areas that obfuscation doesn’t cover. For example, obfuscation alone doesn’t stop runtime attacks, API abuse, or data theft, so it must be combined with other defenses for maximum protection.

Here are some other security techniques that are commonly employed alongside code obfuscation:

Anti-tampering solutions: This is a general name for any cybersecurity measure that protects software and systems from app tampering attempts.
Anti-debugging techniques: These prevent malicious hackers from employing debugging tools to conduct an analysis of the application.
Jailbreaking and root detection: This detects whether the device an application runs on has been jailbroken or rooted, which are signs of potential tampering.

Ways to measure code obfuscation success

The strength and quality of code obfuscation can be measured by a mix of technical metrics and performance testing. The first can provide some quantitative assurance for particular functions. However, in practical terms, measuring code obfuscation often comes down to a cost-effectiveness analysis. Reconstructing the original code should be much more costly and difficult for attackers than the initial obfuscation attempt.

Quality of an obfuscation transformation

Cost

The cost of mobile security must be balanced against the true cost of insecurity. However, just like any other security measure, code obfuscation introduces cost and "computational overhead" to the system. The higher the level of obfuscation, the greater the cost. In software terms, this cost can be measured by increases in execution time, resource consumption, and code size. In human terms, it can lead to increases in development costs.

Effectiveness

The effectiveness of code obfuscation programs can be measured in different ways. These are two of the standard measurement criteria:

Potency: This is the degree to which a human reader is confused by obfuscation attempts, or how much harder it is to comprehend the obfuscated code. It is often measured by a ‘before’ and ‘after’ comparison of the original code in terms of complexity and added obscurity.
Resiliency: Resilience against RE and tampering is a standard security metric. When specifically applied to code obfuscation, it is a measure of how robustly obfuscation resists automatic deobfuscation attempts. A highly resilient obfuscation technique remains effective even after being attacked by tools designed to bypass or reverse obfuscation.

These are two other criteria that deserve attention.

Stealth: This is a measure of how seamlessly obfuscated code integrates with the rest of the program. A high stealth measurement means an obfuscation technique doesn't deviate noticeably from the original code structure and blends in well with other parts of the program, making it harder to detect.
Coverage: This is a consideration of how much of the program is covered by obfuscation. Factors like ease of use and performance degradation could reduce effective coverage.
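The "before and after" comparison behind potency can be sketched in a few lines of Python. The example below counts branching constructs as a crude complexity score and reports the relative increase after obfuscation. The choice of metric, the two sample functions, and the names `branch_complexity` and `potency` are our own assumptions. Real evaluations combine several metrics and also test resistance against actual tools.

```python
import ast

# Node types treated as decision points in this simplified metric.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.BoolOp, ast.ExceptHandler)

ORIGINAL = '''
def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"
'''

# Hypothetical obfuscator output: renamed and flattened, same behavior.
OBFUSCATED = '''
def _0(_1):
    _2 = 0
    while _2 != 3:
        if _2 == 0:
            _2 = 1 if _1 < 0 else 2
        elif _2 == 1:
            return "negative"
        elif _2 == 2:
            return "non-negative"
'''


def branch_complexity(source: str) -> int:
    """Crude complexity score: one plus the number of branching constructs."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))


def potency(original: str, obfuscated: str) -> float:
    """Relative increase in complexity caused by the transformation."""
    return branch_complexity(obfuscated) / branch_complexity(original) - 1


print(branch_complexity(ORIGINAL))    # 2
print(branch_complexity(OBFUSCATED))  # 6
print(potency(ORIGINAL, OBFUSCATED))  # 2.0
```

A score like this says nothing about resiliency or stealth. A transformation can triple the branch count and still be undone by an automated tool in seconds.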

Code obfuscation as a buyer

Many obfuscation products provide similar capabilities for app security. It is best to focus on the practicality of implementing the tool and the implications for ongoing code maintenance.

So, the question is this: How practical is the package to use and how sophisticated are the tools supporting it? The whole package is what matters, not just the low-level features. For example, if the tool is difficult to use, coverage is likely to be sparse.

The general rules are these. An obfuscation product should be easy to operate to the degree that it does not need an expert user. It should be able to perform straight out-of-the-box. And any configuration work that is required should be made explicit.

Usability

When evaluating an application hardening tool or product that provides code obfuscation as one of its main offerings, ease of use is important. Some tools seem sophisticated in their configuration settings but are hard to use. This introduces higher performance costs and will likely lead to sparser coverage.

Consider which configuration options you need. Don’t disable options and weaken app security due to unused or difficult to use obfuscation settings.
How quickly and seamlessly you want your tool to start working is an important purchasing factor.
Research what the product provides in terms of automation and installability. No or low code options might suit you best.
Employing a post-build obfuscation tool means you don't have to be a developer to use it.
A sign of high-level usability is when a tool is able to apply a base level of code obfuscation to everything with little to no effort. But then, in areas of particular concern, tailored configurations are possible. This flexibility of options is important for performance considerations.

Performance

When choosing a tool, try to balance its security strengths with its performance limits. The benefit of obfuscation is that it adds protective value. But what is the impact on execution?

Performance is more important in practice than whether a product employs a particular obfuscation technique. Whether you can use those techniques depends on how easy they are to configure and how practical they are to use.
Select a tool that performs the analysis up front so that the impact of configuring obfuscation is minimized.
If a tool requires large amounts of trial and error to configure, the possibility of a frictionless setup is low.

Debuggability

You might launch an application onto the App Store and receive reports that it is crashing on certain devices. Developers discover a bug, which they want to fix, so you can get another release out. But once the code is obfuscated, it is difficult to know what is happening.

A report can tell you that code is crashing but not why.
To help with debugging applications, select software that provides a mapping file that unscrambles the crash report. The mapping file stays with your team, so attackers can't access it.
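The role of a mapping file can be sketched as follows. This Python example translates obfuscated frames in a crash report back to their original names. The mapping entries, the package names, and the stack trace layout are invented for illustration and do not reproduce the file format of any particular obfuscation tool.

```python
import re

# Hypothetical mapping produced at build time and kept private:
# obfuscated method name -> original method name.
MAPPING = {
    "a.a.b": "com.example.checkout.PaymentForm.validateCard",
    "a.c.a": "com.example.checkout.CheckoutActivity.onSubmit",
}

# Crash report as it might arrive from a device running the obfuscated build.
CRASH_REPORT = """java.lang.NullPointerException
    at a.a.b(Unknown Source)
    at a.c.a(Unknown Source)"""

# Matches frames of the form "    at <name>(<location>)".
FRAME = re.compile(r"^(\s*at )([\w.$]+)(\(.*\))$")


def retrace(report: str, mapping: dict) -> str:
    """Replace obfuscated frame names with the original names where known."""
    restored = []
    for line in report.splitlines():
        match = FRAME.match(line)
        if match and match.group(2) in mapping:
            prefix, name, location = match.groups()
            line = f"{prefix}{mapping[name]}{location}"
        restored.append(line)
    return "\n".join(restored)


print(retrace(CRASH_REPORT, MAPPING))
```

Without the mapping, the report still shows that a crash happened in `a.a.b`. With it, developers can see that the failure is in card validation and start debugging there.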

The next generation of code obfuscation

Although obfuscation has a relatively long history, it continues to evolve in the face of new cybersecurity challenges.

Dynamic obfuscation

The techniques mentioned so far are examples of static obfuscation. They transform the code at protection time and then the code is shipped in its transformed state. The code doesn’t alter again after this. Dynamic obfuscation is when some transformations are applied to the code at protection time, but the code also moves, mutates, and even self-repairs at runtime. This makes it a moving target to attack, which helps defend against AI attacks.

These two obfuscation approaches are related to two types of analysis used when attacking a program. Static analysis examines the program at rest, either loaded off disk or as an app file. An analyst would use a decompiler or disassembler to analyse the code. Dynamic analysis observes the code in an execution environment to see how it functions at runtime.

Code obfuscation provides sound protection for applications against static analysis. But runtime controls are necessary to protect against dynamic analysis, because dynamic analysis interferes with the process of running the application. Most static techniques are well understood by pen testers and are increasingly vulnerable to automated machine learning (ML) attacks. New dynamic approaches help deal with this challenge, and are also better for customer ease of use and maintenance.

Virtual machine-based obfuscation

Virtual machine (VM) obfuscation works by transforming code into another code format, such as another bytecode or instruction type. VM instruction encoding uses real-time code generation with diversity to prevent scripting attacks. A hacker who stumbles across this code in an RE attempt cannot readily interpret its meaning or function because it is different from anything previously encountered. It runs in an interpreter at runtime, and so is considered a dynamic procedure.
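The idea can be illustrated with a toy stack machine in Python. The sketch below compiles the expression `(x + 3) * y` into a custom bytecode whose opcode values are shuffled per build, then runs it in a small interpreter. The instruction set, the seed-based encoding, and all function names are assumptions made for this example. Commercial VM obfuscators use far richer instruction sets and protect the interpreter itself.

```python
import random

OPS = ["LOAD", "PUSH", "ADD", "MUL", "RET"]


def make_encoding(build_seed: int) -> dict:
    """Per-build opcode encoding: each build picks different byte values."""
    rng = random.Random(build_seed)
    codes = rng.sample(range(256), len(OPS))
    return dict(zip(OPS, codes))


def compile_program(encoding: dict) -> bytes:
    """Hand-compiled bytecode for (x + 3) * y under the given encoding."""
    return bytes([
        encoding["LOAD"], 0,   # push argument 0 (x)
        encoding["PUSH"], 3,   # push constant 3
        encoding["ADD"],
        encoding["LOAD"], 1,   # push argument 1 (y)
        encoding["MUL"],
        encoding["RET"],
    ])


def run(bytecode: bytes, encoding: dict, args: list) -> int:
    """Interpret the custom bytecode on a simple stack machine."""
    decode = {code: op for op, code in encoding.items()}
    stack, pc = [], 0
    while pc < len(bytecode):
        op = decode[bytecode[pc]]
        pc += 1
        if op == "LOAD":
            stack.append(args[bytecode[pc]])
            pc += 1
        elif op == "PUSH":
            stack.append(bytecode[pc])
            pc += 1
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "RET":
            return stack.pop()
    raise ValueError("program ended without RET")


# Two builds of the same logic, each with its own opcode encoding.
build_a = make_encoding(build_seed=1)
build_b = make_encoding(build_seed=2)
program_a = compile_program(build_a)
program_b = compile_program(build_b)

print(program_a.hex(), run(program_a, build_a, [4, 5]))
print(program_b.hex(), run(program_b, build_b, [4, 5]))
```

Both builds compute the same result, but the byte sequences will usually differ, so a script written to decode one build does not automatically work on the next.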

Progressive decryption

Progressive decryption (PD) is another variant of dynamic code obfuscation. With PD, only the part of the application that is running gets decrypted. The remainder stays encrypted, making it difficult for an attacker to observe the entire code. Both code and data are decrypted ‘just-in-time’ for use as functions are entered, and are then destroyed after use.

Obfuscation and AI

The future of Artificial Intelligence (AI) in cybersecurity involves many aspects. AI has been used in code deobfuscation trials. AI is moderately successful when dealing with each technique and level of obfuscation in isolation, but less so when they are layered together. Runtime dynamic techniques would further help against AI attacks.

Your next steps in code obfuscation

Learn more about code obfuscation techniques or examples of how code obfuscation works in practice.

Protect your IP and app’s data by obfuscating the purpose of its code with Promon IP Protection Pro.

Book a meeting to ask us about Promon’s powerful RASP and code obfuscation that defends mobile apps against tampering, reverse engineering, and unauthorized access.

Let us protect you and your valuable code, starting today.
