|Scrambling Data||KEVIN RESCHENBERG|
Do you ever have a need to hide personal information from users
of your PeopleSoft database? This would be the case, for example,
if certain developers should not have access to salary or other
personal information, or if you want to clone a database for
training purposes. As another example, I once wrote a program
to "scramble" the data in a database for PeopleSoft's internal HR department,
so that it could be used for a large-scale user test. How is this
Many organizations today recognize
the risk of giving too much access to personal information.
However, this must be balanced with the need to customize, support and test
the system. The overall direction you take may be dictated by an internal
audit department or by the IT, Legal, or HR department's policy.
Work closely with them to define the scope of the project.
If you decide to "scramble" sensitive data, you must
carefully determine which types of data to change. Scramble too little,
and you have accomplished nothing. Scramble too much, and the system
could be nearly unusable for any support and testing.
Also, an HRMS/HCM database is more appropriate for this than, say, a
First, determine the needs of your developers and users. A developer
who supports payroll processing, for example, needs full
access to real salary and pay period data. At the other end of the scale, a temporary
consultant working on a training module project may not need any access
to salary at all. Can this requirement be met through the normal
If you decide to continue, the next step is to determine the
fields to be scrambled. Take the social security number (national ID),
for example. There is rarely a need to know an actual SSN during
development and initial testing. We could simply update the PERS_NID
table, setting all SSNs to the same thing. However, this can interfere
with testing of interfaces that use SSN as a "key". It may be better
to set the SSN to a unique value that includes the EMPLID.
Salary-related fields are difficult to scramble, especially if you are
running payroll or Payroll Interface. You could simply set each
employee's annual salary to some arbitrary number, such as $50,000.
However, the system also stores monthly and hourly pay; pay components;
earnings, deduction and tax balances; and many other amouonts that should
combine in a reasonable way. I've come to the conclusion that it's
just about impossible to scramble salary figures meaningfully, so I
generally just set everything to $1. It is then obvious to any user
that these are scrambled numbers and the user should not expect them
to "tie" to each other.
Note that there is nothing to prevent a user from "hiring" a new
employee and giving that employee a reasonable salary for more realistic
When scrambling salary, don't forget about related fields such
as salary grade. Knowing someone's grade is often about the same as
knowing the salary itself.
An alternative method is to hide the identity of an employee
rather than other personal information about the employee. To do this,
you could leave the salary information alone and instead change
the EMPLID, name, address, job title, and other identifying information. This is
what I did for PeopleSoft.
Of course, if you decide to use this method, you must change every
instance of EMPLID throughout the system. Otherwise, tables won't
join accurately. Remember also that an employee ID field is not
always called "EMPLID". You could find employee IDs in other fields—SUPERVISOR_ID,
for example. How to find all instances of a particular field
is another topic that I'll deal with here soon.
Once you have determined your approach and the fields to be changed,
write a combination of programs and/or SQL scripts to accomplish the
scrambling. Keep these in a safe place, since you will need to rerun
them after every database refresh.
I believe that we should do everything we can to protect our data
and to minimize unnecessary access, but scrambling data effectively
is a significant effort, so plan carefully.