Wednesday 23 March 2016

Busted

At the end of the post "2015, the year in a Breach" (bit.ly/1RqqvG6) we mentioned that the next wave of cyber attacks will not steal information, but alter it. This remark was based on a statement made by America’s intelligence chiefs (http://bit.ly/1VS8Pb8).
We think that this is actually going to happen because on one side there is a profit to be made (don’t rob an ATM, just alter the bank account balance) and, on the other, attackers may be motivated by political or military reasons.
Information may be altered when at rest or when in transit. This means that an attacker can either alter information when it is sitting on your DB or when you are transmitting information from one point to another.

When at rest, it is possible to detect unauthorized data modifications by introducing a hashing “safeguard”. Each time an operation changes the data in any way, a new hash should be calculated and saved. Then, each time a read is required, changes can be detected by hashing the information and comparing the result to the saved hash.
Although this may seem feasible, it is not! Hashing functions are very fast, but they still need to go through every byte of information to produce the hash, so at the rate that data grows it would not be practical to recompute a hash of the data each time an alteration is made (just try hashing 100 MB and then 10 GB of data and compare the time it takes).
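For reference, the hashing safeguard described above could look like the following minimal sketch. The `HashedStore` class and its in-memory dictionaries are hypothetical stand-ins for a real database, and the sketch assumes the saved hashes live somewhere the attacker cannot rewrite them.

```python
import hashlib


def digest(record: bytes) -> str:
    """Return the SHA-256 hex digest of a record (this reads every byte)."""
    return hashlib.sha256(record).hexdigest()


class HashedStore:
    """Hypothetical in-memory store that saves a hash next to each value."""

    def __init__(self):
        self._data = {}
        self._hashes = {}

    def write(self, key, value: bytes):
        # Every legitimate write recomputes and saves the hash.
        self._data[key] = value
        self._hashes[key] = digest(value)

    def read(self, key) -> bytes:
        # Every read re-hashes the value and compares it to the saved hash.
        value = self._data[key]
        if digest(value) != self._hashes[key]:
            raise ValueError(f"record {key!r} was modified outside write()")
        return value


store = HashedStore()
store.write("balance:alice", b"100.00")
print(store.read("balance:alice"))

# Simulate an attacker changing the data without going through write().
store._data["balance:alice"] = b"999999.00"
try:
    store.read("balance:alice")
except ValueError as err:
    print("tampering detected:", err)
```

Note that both `write` and `read` hash the full value, which is exactly the cost that makes this approach impractical for large data sets.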

So how exactly can we detect data changes? The answer is both simple and complex at the same time. Assuming that there are replication policies in place, it’s simple because you just have to evolve your fault model and change your replication from passive to active, and it’s complex because you have to evolve your fault model and change your replication from passive to active!

To understand what follows, keep in mind the following definitions:

  1. Fault: occurs when the system stops working properly (commonly known as a crash);
  2. State: represents the status of the system and is the result of all the operations performed on the system;
  3. Write: any operation that changes the state of the system;
  4. Read: returns information according to the current state of the system.

To sustain faults, systems can be designed to offer high availability (HA). HA is built on replication: as long as at least one replica is functioning, the system will continue to operate. To ensure HA, systems are designed to sustain at most f simultaneous faults. This means that to actually resist f faults, there need to be at least f+1 replicas in total, so even if f simultaneous faults occur, the system will still function. There are two basic replication models, passive and active.

Most current systems rely on passive data replication. In this model there’s an active and a passive replica. The active replica is responsible for backing up writes (operations that change data on the system) asynchronously to the passive replica on a regular basis. When the active replica fails, the passive replica takes over and becomes the active one.
The asynchronous replication model has two major problems:

  • If data gets changed on the active replica, the change will not be detected and will eventually get copied to the passive replica;
  • When the active replica fails, the data that was not copied to the passive replica will be lost.

In this model, it is possible to force the active replica to write all changes to the passive replica either before or immediately after writing to itself. Doing this adds extra time to each write, since the active replica needs to contact the passive one synchronously before reporting the write as completed to the client.
Figure: Passive replication (f=1), synchronous writes
If we consider that each write takes X time and that the time that a message takes to be propagated on the network is Y, we can compare both synchronous and asynchronous writes for the passive model in terms of time from a client’s perspective:

  • Asynchronous = X + Y + Y = X + 2Y
  • Synchronous = Asynchronous + (X + Y + Y) = (X + 2Y) + (X + 2Y) = 2(X + 2Y) = 2 × Asynchronous

So in practice, the client needs to wait twice as long for the synchronous write (assuming equal network hops, to keep it simple). Even worse, switching from asynchronous to synchronous writes only guarantees that data will not be lost in case of failure; it will not detect data changes!
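As a quick worked example of the comparison above, the two formulas can be evaluated for concrete numbers. The values of X and Y below are assumptions picked purely for illustration, not measurements.

```python
def async_write_time(x: float, y: float) -> float:
    """Client -> active replica (Y), write (X), active replica -> client (Y)."""
    return y + x + y


def sync_write_time(x: float, y: float) -> float:
    """The asynchronous path plus a round trip and a write on the passive replica."""
    return async_write_time(x, y) + (y + x + y)


X_MS = 5.0   # assumed time for one write, in milliseconds
Y_MS = 20.0  # assumed time for one network hop, in milliseconds

print(async_write_time(X_MS, Y_MS))  # 45.0
print(sync_write_time(X_MS, Y_MS))   # 90.0
```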

In active replication, writes are performed on each replica at the “same time”. When the client sends a message, it sends it to all replicas and waits for the first write to complete. The client just has to wait for the first write to complete since we assume that the system is working properly if there is at least one replica working (remember f+1 replicas with at least one working properly).

Figure: Active replication (f=1)

By direct comparison with the passive replication model, if we consider the same network propagation time Y and the same write time X, we see that each write takes the same amount of time as the asynchronous write, X+2Y. We can also see that it shares the property of the synchronous write that, even if one replica goes down, the system will not lose data. Another trait it shares with the passive replication model is that we still aren’t detecting data changes. There is also a new problem with this model: concurrent writes.
Concurrent writes can occur when multiple clients send write operations to the replicas but these operations arrive at different replicas in different orders (for example, due to different network distances). To ensure that all replicas execute the writes in the same order, a special intra-replica protocol is required to order the writes.
I will not describe the intra-replica protocol in these posts. If you have a concurrent write problem, I suggest that you contact either me or the company for which I work.
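To see why the order matters, here is a minimal sketch in which two replicas receive the same two writes in different orders. It only shows the problem, not the ordering protocol; the `apply_writes` helper and the key/value writes are hypothetical.

```python
def apply_writes(initial: dict, writes: list) -> dict:
    """Apply (key, value) writes in the order given and return the final state."""
    state = dict(initial)
    for key, value in writes:
        state[key] = value
    return state


w1 = ("balance", 100)  # write sent by client 1
w2 = ("balance", 250)  # write sent by client 2

# The same two writes reach each replica in a different order.
replica_a = apply_writes({}, [w1, w2])
replica_b = apply_writes({}, [w2, w1])

print(replica_a)               # {'balance': 250}
print(replica_b)               # {'balance': 100}
print(replica_a == replica_b)  # False
```

Both replicas executed every write, yet they disagree on the final state, which is why the writes need to be ordered before they are applied.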

In this post we explained the key differences between the active and passive replication models. In the next one, we will evolve the fault model to detect unauthorized data changes.

Friday 12 February 2016

Breach this!


We ended the last post with a promise to come back to the subject of data breaches and how to protect your data even if it gets leaked. For the sake of simplicity, we will not go into how to protect your system from being breached, since the diversity of systems makes that impossible to cover here. In this post, we will show how you can protect your data in case it gets leaked.
Since there is a lot of data that may need to be protected, we will focus on just user passwords and credit card numbers.


User Passwords

When storing a user password, it goes without saying that you should not store the password in clear text. What you should do is apply a good one-way function (for example SHA256) to the password and store the resulting hash value.
Due to the characteristics of one way functions (bit.ly/1T3Aaai), when you apply them to the same content, you always get the same result. This means that if you store the hashed password, when the user authenticates, you will be able to apply the one way function to the password and compare the result with the one that you have stored.
With this, you are now able to store the hashed passwords. Although hashed passwords may seem enough, they are not! What happens if you have two users with the same password? Both will yield the same hash value. To avoid this, you should introduce entropy into the hash.
Introducing entropy into the hashed values can be done using a “salt”. Salts are large, randomly generated values that can be prepended to the password before applying the one-way function. After doing so, you can then store the hashed password and the salt in the database.
Figure: Store password
To authenticate the user, you can retrieve both the hash and the salt, prepend the salt to the password, apply the one way function and compare both hashes. If they are equal, the password is correct.
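The store and authenticate steps described above can be sketched as follows. This is a minimal illustration of the salted SHA256 scheme from the text; the function names and the 16-byte salt size are assumptions. Production systems commonly prefer a deliberately slow function such as PBKDF2, bcrypt, scrypt or Argon2 over a single SHA256 pass.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def store_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, hash) to be saved in the database for a new password."""
    salt = secrets.token_bytes(16)  # assumed salt size, random per user
    hashed = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, hashed


def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Prepend the stored salt, hash again, and compare with the stored hash."""
    candidate = hashlib.sha256(salt + password.encode("utf-8")).digest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, stored_hash)


salt_a, hash_a = store_password("correct horse")
salt_b, hash_b = store_password("correct horse")

print(hash_a != hash_b)                                   # same password, different hashes
print(verify_password("correct horse", salt_a, hash_a))   # True
print(verify_password("wrong guess", salt_a, hash_a))     # False
```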

By employing the method described above, even if your password database gets leaked for some reason, it becomes far harder to recover the passwords. This is due to the fact that, by introducing both the one-way function and the salt, traditional attacks won’t work (bit.ly/1b6eFMZ).

Credit Card Numbers

If you are thinking of storing credit card numbers, DON’T! If you have a service where clients are required to pay for goods (or something similar), you should rely on Payment Service Providers (bit.ly/1PzICJE). These are PCI DSS (bit.ly/1ONAycB) compliant, meaning that they are qualified to store credit card numbers, and they use a tokenization (bit.ly/1ooxGYC) process.
The tokenization process is a way of generating a random token (which may contain 4 digits from the credit card) that links the client’s credit card number to the merchant’s bank account. This means that a token can only be used to transfer money from the client to the merchant. Tokens may be stored for recurring payments and, since they may contain at most 4 digits from the credit card number, you may show them to the client as a means of differentiating credit cards.

If you really need to store credit card numbers, you should encrypt them before storing them. DO NOT, at any time, store the CVV, since it’s only required to verify that the card holder has the card.

When encrypting credit card numbers for storage, you should use a very strong key and a robust encryption algorithm like AES-256 (bit.ly/1MTTbKu). However, you are now presented with the problem of storing the key. The answer to this is simple: DO NOT STORE THE KEY. Instead, use the user’s password and a generated salt as a seed for a deterministic key generation algorithm. Once the number is encrypted, you can store it. If required, you can also store four digits of the card so that, when presenting cards to a user, they can identify which is which.

Figure: Store credit card numbers flow

When you are required to retrieve the credit card number for any reason, just ask the user to reauthenticate over a secure channel, get the salt from the database and recalculate the key from the password.
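The key handling described above can be sketched as follows. The text does not name a specific deterministic key generation algorithm, so PBKDF2 (through Python’s `hashlib.pbkdf2_hmac`) is assumed here; the iteration count and salt size are illustrative values. The AES-256 encryption step itself is left out, since Python’s standard library does not ship an AES implementation.

```python
import hashlib
import secrets


def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key (the size AES-256 needs) from a password and a salt.

    PBKDF2-HMAC-SHA256 is an assumed choice of deterministic key generator.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# When the card is stored: generate a salt, derive the key, encrypt, keep only the salt.
salt = secrets.token_bytes(16)
key_at_store_time = derive_key("user password", salt)

# When the card is retrieved: the user reauthenticates and the key is recalculated.
key_at_retrieve_time = derive_key("user password", salt)

print(len(key_at_store_time))                     # 32
print(key_at_store_time == key_at_retrieve_time)  # True
```

Only the salt and the encrypted number are persisted; the key exists in memory just for as long as it takes to encrypt or decrypt.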

Employing either of the described methods will protect the credit card numbers. When using tokenization, the token links the credit card number to the merchant. This means that, even if a token gets used by an attacker, it will only transfer money from the client to the merchant. Upon detection, the merchant can cancel the tokens and return the funds to the clients.
By encrypting the credit card numbers as described, since the key is never stored, it will not be possible to decrypt the credit card numbers in case of data leakage.

In light of what you have read in this post, how do you feel about your stored information?

Tuesday 29 December 2015

2015, the year in a Breach

2015 is at an end and, as such, we look back and see what has happened.
This year was without a doubt marked by data breaches. Over the year, millions and millions of customer and employee records were stolen.
So let’s look at just some of the data breaches, in chronological order.


February – Anthem (http://bit.ly/1Z4SXVe)
The year started with the insurance company being hacked. Information capable of identifying 78.8 million customers was stolen. The data was stolen during an attack in January, and the attack remained undiscovered for a month.

May – US IRS (http://cnnmon.ie/1NU7k6N)
A hack that exploited a tool intended to make life easier for the American people led to a major data breach. In May, it was reported that the tax information of 110,000 Americans was stolen. Later, in November, this number was revised to 330,000 records.

June – LastPass (http://bit.ly/1R1pPrc)
The cloud-based password management company was hacked in June. Although they use very strong encryption algorithms, after the hack the company advised every user to change their master password. In this case, if users were using a weak password, it is possible that the attackers recovered it. There are no official records of how many master passwords were stolen.

June – US Office of Personnel Management (http://bit.ly/1mgqeOv)
In April, OPM noticed a data breach on their systems. Upon deeper investigation, it was concluded that the hack had started in April 2014 (maybe even earlier). In this hack, the personal information of 21.5 million federal employees (including previous or retired personnel) was leaked.

July – Hacking Team (http://bit.ly/1KLhbgo)
In July, the hackers got hacked. The Italian surveillance and hacking company saw its network breached and 400 gigabytes of information exposed. Among it, there were several zero-day exploits used by the company.

July – UCLA Health (http://cnnmon.ie/1GruuME)
UCLA Health announced that they had suffered a major data breach in September 2014. Upon later investigation, in May they realised that the attackers had gained access to computers that stored sensitive records.
This breach has exposed the medical and personal information of 4.5 million people.

July – CVS Pharmacy photo service (http://cnnmon.ie/1MF2nwP)
CVS Pharmacy had a service, CVSphoto, where a user could upload a picture to their servers and then pick up a printed copy at a pharmacy. On July 10th, the company shut down the service due to a massive data breach.
The breach revealed the credit card information plus personal information of customers. The number of exposed records is not known, but it is believed to have surpassed the million mark.

August – Carphone Warehouse (http://bit.ly/1IxPcza)
The phone retailer saw the data of as many as 2.4 million of its customers exposed on August the 5th. The exposure included bank and personal information.
This is considered the biggest hack done in the UK. By way of comparison, it exposed the information of 4% of the UK’s population.

August - Ashley Madison (http://bit.ly/1M9qPJN)
The popular cheaters’ site was hacked in August. Due to the site’s policy of not deleting customer records, the attack exposed the names, addresses, email accounts and hashed passwords of 37 million customers.

September - Excellus BlueCross BlueShield (http://bit.ly/1UJOpP7)
In September, Excellus announced that it had discovered a 2-year-old hack that had stolen the information of 10.5 million customers. The stolen information contained personal and payment records (including credit card information).

October – Scottrade (http://cnnmon.ie/1QXZ9Mv)
The stock trading service revealed that they had suffered a data breach 2 years prior. The breach exposed the email accounts and social security numbers of 4.5 million users.

October – Experian (http://bit.ly/1FJnpOs)
The credit company was breached and as many as 15 million customer records (from T-Mobile) were stolen. Among the stolen information, it was possible to find the clients’ personal information. Interestingly, although the social security numbers were encrypted, the company said that the encryption may have been compromised.

October – Patreon (http://bit.ly/1YSE2gH)
15 GB of data was stolen from the crowdfunding company. Among the stolen information, it was possible to find user information like donations, names and more. The hack exploited the debug mode of the site, thus revealing the stolen information.

October - Trump Hotel Collection (http://bit.ly/1OXdCCx)
THC has reported that they detected malware activity on their network. The activity started in May 2014 and lasted until July 2015 (yes, more than a year). During the campaign, the malware accessed the credit card information of the hotels’ clients. THC reports that the malware accessed the information in “real-time” while the clients were entering it on the terminals. The number of stolen credit cards is unclear.

November – FBI portal (http://zd.net/1SDCFyb)
A group of hackers breached a system that is used to share sensitive information between law enforcement agencies in the US. During the attack, the group was able to steal arrest records and access the policy chat and file transfer services. There is no account of how much information was stolen.

November - Securus Technologies (http://bit.ly/1MnZNMS)
The company that provides US prison phone lines may now be facing charges itself. The source of these charges is a data breach that revealed over 70 million phone call records on their servers. Since these records come from prison lines, they may violate attorney-client confidentiality.

November – VTech (http://bit.ly/1Tkpzpn)
The breach suffered by the toy manufacturer has exposed the personal information of 4.9 million parents and 6.4 million children. In the stolen data it is possible to find credit card numbers, personal information, login credentials, password retrieval questions and photos of the children.

December – MacKeeper (http://bit.ly/1OyYRpi)
A bad database configuration that exposed it to the internet without any authentication has led to the exposure of 13 million personal records of the anti-virus software’s users. Although the passwords are hashed, according to the source of the discovery, they are using the deprecated MD5 algorithm, which can be broken in a practical amount of time.

December – Voters’ personal information (http://bit.ly/1PsxSi9)
Well, the year started out big, so it had to end even bigger. On December 20th, a misconfiguration on the US voters database exposed the personal information (including full name, home address, voter ID, etc.) of 191 million voters.
The shocking part about this breach is that no one is taking responsibility for ownership of the compromised database, meaning that the information may still be exposed at the time of writing.

So how can you minimize further exposures on your systems? The correct answer is that you need to introduce security at every step of the software life cycle: from the requirements to production and even at the end of life of the software.

Another concern for the future is that, although these exposures represent a very real danger, there will be an even bigger one: the changing of data. According to America’s intelligence chiefs, the next wave of attacks will not steal information but change it (http://bit.ly/1VS8Pb8).

Make sure to check the blog in the coming month, as we will add a set of posts about securing your information to prevent major problems if data gets leaked, and another set on how to detect and recover from data being changed on your systems.

Tuesday 15 September 2015

Email - From, To Who? (part 3)

We ended the last post stating that although the content of a message is encrypted, the headers are still in clear text and, as such, it is possible to say that a user A has sent an email message to user B. To break this connection, we can separate the sender and the receiver fields and use a mix network to hide both email trails.
When we send an email, for the protocol itself, the only thing that really matters is the receiver address. With it, the email system is able to identify the MTA that handles the receiver’s email messages and deliver the message. In the end, the sender address is only required for the receiver to know who sent the email (actually it may be used for things like spam filters and so on).
Now, imagine that the SMTP relays act like nodes of a mix network (we are not going to go into the details of how mix networks work; for that, we point the reader to the Wikipedia article at http://bit.ly/1XOI9db). To be able to send an email message through a mix network, we would have to create an envelope for the message that is capable of identifying the path to traverse the mix network.
The email message itself would also have to be a little different, being composed of:
  • A normal email header that only identifies the receiver
  • An encrypted body (PGP or S/MIME) that is the actual email message and contains:
    • Header information (sender and receiver)
    • Body of the message
This message is then enveloped like any other mix network packet. The envelope would look something like this:
Km(R1, Kb(R0, M), B)
Where:
  • Km: the key for the mix node;
  • R1: random value;
  • Kb: the next hop key;
  • R0: random value;
  • M: email message;
  • B: the next hop address.
Note that this is just an example; it is possible to create envelopes that contain several nodes of the mix network.
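The layering of the envelope above can be sketched as follows. This is a structural illustration only: `toy_seal` and `toy_open` are hypothetical stand-ins for real encryption (an XOR with a hash-derived keystream, which is NOT secure), the keys are symmetric for simplicity whereas mix networks typically use each node’s public key, and the addresses are made up.

```python
import hashlib
import json
import secrets


def _keystream(key: bytes, length: int) -> bytes:
    """Stretch a key into `length` bytes by hashing key + counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_seal(key: bytes, payload: dict) -> str:
    """TOY stand-in for encryption. Do not use for real data."""
    raw = json.dumps(payload).encode("utf-8")
    return bytes(a ^ b for a, b in zip(raw, _keystream(key, len(raw)))).hex()


def toy_open(key: bytes, sealed: str) -> dict:
    """Reverse toy_seal with the same key."""
    raw = bytes.fromhex(sealed)
    plain = bytes(a ^ b for a, b in zip(raw, _keystream(key, len(raw))))
    return json.loads(plain.decode("utf-8"))


def build_envelope(km: bytes, kb: bytes, message: str, next_hop: str) -> str:
    # Inner layer: Kb(R0, M)
    inner = toy_seal(kb, {"r0": secrets.token_hex(8), "m": message})
    # Outer layer: Km(R1, Kb(R0, M), B)
    return toy_seal(km, {"r1": secrets.token_hex(8), "inner": inner, "b": next_hop})


km = secrets.token_bytes(32)  # key for the mix node
kb = secrets.token_bytes(32)  # key for the next hop
message = "To: b@example.org\n\n<encrypted body with the full headers>"
envelope = build_envelope(km, kb, message, "exit.example.net")

# The mix node removes its layer, drops R1 and learns only the next hop B.
layer = toy_open(km, envelope)
print(layer["b"])

# The next hop removes its own layer, drops R0 and obtains M.
print(toy_open(kb, layer["inner"])["m"] == message)
```

Each node can read only its own layer, so the mix node learns where to forward the packet but not what M contains.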
The sender would then send the enveloped message to the first node in the mix (the entry node). It would then traverse the network until it reached the exit node. Once there, M would be delivered to the receiver’s SMTP server.
Figure: Email relay through mix networks
M is then treated like a normal email message, being delivered to the receiver’s email queue. When the receiver retrieves the message and decrypts it using their private key, they obtain a complete email message containing the sender address.
So how does this evade the metadata problem? Well, if someone is observing the communication between A and B, they would only see that the MTA of A has sent a message to an entry node, without a receiver address. In the same way, it would only be possible to say that the SMTP server of B received a message from an exit node and, since the full header is encrypted and only the receiver can decrypt it, it would only be possible to prove that A sent a message to B if the private key of B is compromised.
In this three-part post we have addressed the main issues that exist in the current email protocols. There have been several attempts over the last few years to create email services that offer end-to-end encryption and more secure routing protocols (the latest one is ProtonMail: http://bit.ly/1gXt15N ); however, for some reason, none of them was able to endure.
As email use rises, so will the interest in monitoring more and more email messages. So how do you feel about your email privacy?