Have you ever wondered how does a company know when to send your birthday card, what item would suit you based on your purchase history or how can they get so much on someone just by inputting their name? Well, the answer is simple, databases! When you do activities you are saved in a so-called database and anything you do will be added under your name, like in a big file.
I will be making an easy step-by-step guide on how to create your own database, for both professional and personal use. The program we will be making this database in is called Microsoft Office Access, there are other programs for database creating but this one is easy to use and understand. Additional skills required: basic English and patience.
Before we start the guide there is an important, if not the most important part of a database, namely database normalization. Database normalization is the process of organizing the columns/attributes and tables/relations of a relational database to minimize data redundancy.
It involves decomposing a database table into several less redundant ones without information loss or inputting the same data multiple times. It is only a matter of defining foreign keys in the old tables linking them to the primary keys in the new ones. The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then sent along through the rest of the database using the defined foreign keys.
The first person to think of normalization is Edgar F. Codd. He introduced what we now know as the First normal form (1NF – 1970), the second and third normal form (2NF/3NF – 1971). If the database meets all the normalization forms it is considered “normalized” or “rationalized”. Most 3NF tables are free of insertion, update, and deletion anomalies.
An example of normalization is that an entity’s name is stored in only one table, while the other tables just have its unique ID. It makes it easier to update the name, because instead of searching for each apparition of that particular piece of information you only have to update it in its specific table. A good example is what the RIM Company did, changing their name to BlackBerry. The update was done in one place and the BlackBerry name was immediately replaced in the whole system.
Stacking and manipulating the data in a data structure that has not even been normalized with the 1st NF may involve more complexity than needed. To make it crystal-clear, we will be taking a look at the following example of an association’s provider bills:
Each provider has a correspondent number of bills. The automated evaluation of any query relating to providers’ bills therefore would broadly involve two stages:
- Unpack one or several providers’ invoices allowing individual bills of a group to be examined;
- Derivation query results based on results of the first stage.
Think of it this way, an assignment is to get a sum of all the bills that the said association has conducted in 2015, for an accounting issue, the database will work double-time to unpack the Bills information from each provider, then make a sum of all the Amounts that occurred on the dates between the 1st of January 2015 and the 31st of December.
One of the most impactful concepts in Codd’s thinking was that the structural complexity could be removed with no harm, meaning that the queries could be more flexible, easier to formulate and evaluate. The normalized equivalent for this particular example would look like this:
The first step is to create the database sheet, so open Microsoft Office Access and create a new blank database, name and save it.
It is needed to create some tables so that we can have where to save our precious data and to link the connections between them. To do that, on the top left there are some tabs, click Create and hit Table Design and add the table’s content (the fields must be relevant to each other but not all the data collected will go under one table), a field needs to be checked as Primary Key and set to AutoNumber (Data Type). To do that, press the field for the desired primary key and select the Primary Key button from the upper left. The Primary Key is what we will be using to link the tables. For the sake of the tutorial, we will use a “Provider”, a “Bill” and a “Warehouse” table with all their implicit data.
Note: When creating a Date field, Input the phrase “Now()” in the Default Value in the Field Properties>General (under the Data lists).
Now that the tables are ready, we can link them, go to Database Tools (top left) and hit Relationships, now add the tables that must be linked, in our case all three of them, highlight all of them then hit Add.
Something like this will be shown, then just drag and drop the primary key onto the correspondent one in the other table to link them.
Note: Always check “Enforce Referential Integrity” when linking a primary key to a non-primary one.
In the end, it should look like this, keep an eye out for the links, they have to be 1 and ∞ if a primary/non-primary link is required.
That’s all! The database is created, normalized and finished. This is the core of all the databases out there. The database is ready to start collecting data and keep track of any relevant information. It is basic, but it is a great start for the more complex ones.
- Database Normalization Basics by Mike Chapple (About.com)
- Database Normalization Intro, Part 2