Tuesday, October 27, 2009

What's so Great About a Relational Database?

Take a look at the spreadsheet below. This is for an actual business selling fashion items like shoes and purses. This spreadsheet is every item in this client's inventory. While it might be a little overwhelming at first, it makes sense. There is a column for the item title, inventory number, price, etc. It also has columns (if you scroll way to the right) like classification, supplier, warehouse, ad template name, posting name, etc. that has a lot of repeated data.





Let's go over the problems with this method of keeping a database.
  • There is an incredible amount of data that is repeated. The Classification, Warehouse, Supplier, Ad Template Name, Posting Template Name, and many other fields all have data that is repeated over and over again. This is bad for the following reasons:
    1. Repeated information is hard to change easily. What if this client had to change all of the products that said "custom sandals" to "custom summer wear?" She would have to go through a very messy find-and-replace, which would be much harder than it sounds, considering this particular spreadsheet contains over 2000 rows (which I had to cut down to upload). This problem has occurred, and because it is an inconvenience to change everything in the spreadsheet, the client resorts to changing all of the items "from here on." Hopefully we can all see why this is a bad idea. We then have to write programs in order to interpret whether the cell says one thing OR the other, in every situation interpretation occurs. Obviously not a fun time. (This can be changed with a relational database because you only have to change the text in one table, because all of the occurrences in the large table would be just numbers referring to whatever value is in the other table. Change the value in that one place, it changes throughout.)
    2. Mistakes are easily made when data input is all text rather than, say, a dropdown. While a dropdown can be created when a program has values to choose from (like a table of values), if it's just one big spreadsheet all input has to be straight text. This is very conducive to errors. Considering this spreadsheet is on her website and is subject to searching, this causes many problems. Let's say she accidentally puts "Custom Booots" instead of "Custom Boots." That means when someone searches for "boots" that won't come up. Also considering that copy and paste is commonly used in data entry, this could make for all of the entries in a session to be left out of a very important search. (This doesn't occur in a relational database because rather than everything being text, you have the options that appear in the table that connects with the integers in your main table.)
  • There is very little focus on data type with spreadsheets. For example, if you scroll over to column AM (Attribute13Value), you will see the apparent height of the shoe. Let's say I'm a female that is looking for some shoes, but I absolutely cannot do shoes over 2". I can search for shoes that are 0 to 1/2", I can search for shoes that are 1 3/4" to 2 3/4", but I can't compare values. I can't "sort by shoe height." While in this case it's not that big of a deal considering there are only three categories, comparisons can't be made when all of the data type is text. Data types are simply the classification of what type of data appears in the cell. If all of the information is of the data type text, which is often the case with spreadsheets, when you "order by" a column it puts the numbers in alphabetical order. So, essentially, higher-level queries (a query being essentially the way a DBMS searches a database for information) are much more difficult and often impossible. (This isn't the case in Relational Databases because data types are much more easily assigned in databases and you can essentially do any query you want, like I want shoes that are between the height of 2" and 3.5" and are between the sizes of 8.5 and 9.5, which would never happen if you are just dealing with text.)
  • While creating a database can initially make the size much larger, once you get to databases of this size you would save a lot more space with a database rather than a spreadsheet type of database which uses a lot of overhead. Overhead is when you basically say "I don't want to assign the data type of this piece of data, so basically just make sure it has enough room to be whatever it wants." This isn't a good idea. When you know that you are just going to be putting shoe size in a column, there is no need to leave it the data type of "text." This data type takes up a lot of space, because it has to make sure that "just in case the user puts in a lot of data, the room is available," even if the user knows that more than a certain amount will never be used. The increased overhead and ultimately size of the database results in slower search queries, as well as inserting or deleting from the database. (This isn't the case with Relational Databases because typically when you create the database you are very specific with which data type is going to be in that field, and how much space it will need.)
Overall, while it might take a little bit of time to go through the normalization procedure, especially with a project like this, it would be well worth it for the advantages of having multiple tables as a Relational Database rather than just one table in a database, which is essentially a spreadsheet.

(While I only took the time to thoroughly explain Relational Databases, there are other types of databases, including flat, hierarchical, and network databases. You can find out more about these types of databases at techFAQ.com.)

No comments:

Post a Comment