{"id":1729,"date":"2023-10-16T07:27:58","date_gmt":"2023-10-16T07:27:58","guid":{"rendered":"https:\/\/matob.web.id\/en\/?p=1729"},"modified":"2023-10-16T07:27:58","modified_gmt":"2023-10-16T07:27:58","slug":"what-is-normalization-of-data-in-database","status":"publish","type":"post","link":"https:\/\/matob.web.id\/en\/what-is-normalization-of-data-in-database\/","title":{"rendered":"What is Normalization of Data in Database?"},"content":{"rendered":"<p>Data transformation is one of the basic steps in the <a href=\"https:\/\/matob.web.id\/en\/a-deep-dive-into-data-processing-cycles-and-types\/\">data preprocessing<\/a> section. When we first learn feature scaling techniques, we will deal with the terms scale, standardization, and normalization a lot.<\/p>\n<p>Normalization is one of the most frequently used data preparation techniques. In<a href=\"https:\/\/matob.web.id\/en\/10-machine-learning-projects-to-get-you-started-in-2022\/\"> machine learning<\/a> and data mining, this process helps us convert the values of <a href=\"https:\/\/matob.web.id\/en\/fundamentals-of-confidence-interval-in-statistics\/\">numeric fields<\/a> in a dataset to use a common scale.<\/p>\n<p>If you have ever dealt with databases, you are probably familiar with the term &#8220;data normalization.&#8221; One of the challenges in databases is having attributes with different units, ranges, and scales.<\/p>\n<p>Applying data mining or machine learning algorithms to data with drastic ranges can produce inaccurate results. Because of that, a data normalization process is needed.<\/p>\n<h2><strong>Normalization of Data<\/strong><\/h2>\n<p>Normalization is a logical design technique in a database that groups attributes from various entities. It aims to form a good relationship structure (without data redundancy\/repetition), and most of the ambiguity can be eliminated.<\/p>\n<p>In a nutshell, Database Normalization is the process of grouping data attributes. 
The goal is entities that are simple, non-redundant, flexible, and adaptable, so that the databases created can be guaranteed to be of good quality.<\/p>\n<p>Database normalization comes in many forms. In database theory, at least nine normal forms exist, namely 1NF, 2NF, 3NF, EKNF, BCNF, 4NF, 5NF, DKNF, and 6NF.<\/p>\n<p>When creating an optimal database, you will most often encounter 1NF, 2NF, and 3NF. To become a Database Administrator (DBA), you must know how to apply database normalization properly.<\/p>\n<p>For example, if a website you built experiences a drop in performance, you may be asked whether its database has been normalized correctly.<\/p>\n<h2><strong>Normalization of Data Stages<\/strong><\/h2>\n<p>Several stages of database normalization need to be carried out so that the results are correct and well-structured:<\/p>\n<h4><strong>1. Unnormalized Form (UNF)<\/strong><\/h4>\n<p>UNF is data in an abnormal form containing repeating groups, which makes the data problematic to manipulate.<\/p>\n<h4><strong>2. First Normal Form (1NF)<\/strong><\/h4>\n<p>1NF groups similar data to overcome anomaly problems. A data model is in first normal form if each of its attributes has one and only one value.<\/p>\n<p>If an attribute has more than one value, it is a candidate to become a separate entity.<\/p>\n<h4><strong>3. Second Normal Form (2NF)<\/strong><\/h4>\n<p>2NF decomposes tables to establish the primary key of each table. A data model meets the second normal form if it satisfies the first normal form and every non-identifier attribute of an entity depends entirely on all of the entity&#8217;s identifiers.<\/p>\n<h4><strong>4. 
Third Normal Form (3NF)<\/strong><\/h4>\n<p>3NF is the form of normalization in which no attribute may depend on another non-key field rather than on the primary key; any such attribute must be separated into a new table.<\/p>\n<p>A <a href=\"https:\/\/matob.web.id\/en\/data-modeling-mastery-the-top-5-tools-every-data-modeler-should-know\/\">data model<\/a> is in third normal form if it is in second normal form and none of its non-identifying attributes (attributes that are not unique identifiers) depend on other non-identifying attributes. If such a dependency exists, split the attribute off into a new entity, and the attributes that depend on it become attributes of the new entity.<\/p>\n<h4><strong>5. Boyce-Codd Normal Form (BCNF)<\/strong><\/h4>\n<p>BCNF is a stricter form of normalization that aims to resolve anomalies and redundancy that cannot be eliminated in 3NF.<\/p>\n<h4><strong>6. Fifth Normal Form (5NF)<\/strong><\/h4>\n<p>5NF is the stage that resolves join dependencies by decomposing a relation into smaller relations.<\/p>\n<h2><strong>Why Is Data Normalization Needed in Data Mining?<\/strong><\/h2>\n<p>Data normalization is especially important when dealing with large datasets, to ensure data consistency and quality.<\/p>\n<p>Normalization is generally needed when the dataset&#8217;s attributes have different scales or ranges, for example when some values are very high and others very low.<\/p>\n<p>If normalization is not carried out, it can dilute the influence of attributes with a smaller scale.
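<\/p>
<p>As a small, hedged illustration (the feature names, values, and ranges below are made up), the following Python sketch shows how an attribute with a large range dominates a Euclidean distance, and how min-max scaling to [0, 1] lets both attributes contribute:<\/p>

```python
# Two records described by (salary, age); salary's range dwarfs age's.
a = (90_000.0, 25.0)
b = (60_000.0, 60.0)

def euclid(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

# On raw values the distance is driven almost entirely by salary;
# the 35-year age gap barely registers.
raw = euclid(a, b)

# Min-max normalization: v' = (v - min) / (max - min), here assuming
# observed ranges of [30_000, 120_000] for salary and [20, 70] for age.
def min_max(v, lo, hi):
    return (v - lo) / (hi - lo)

a_n = (min_max(a[0], 30_000, 120_000), min_max(a[1], 20, 70))
b_n = (min_max(b[0], 30_000, 120_000), min_max(b[1], 20, 70))

print(round(raw, 2))               # 30000.02 -- salary dominates
print(round(euclid(a_n, b_n), 2))  # both attributes now contribute comparably
```

<p>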
This happens because other attributes have values on a larger scale, even though all attributes may carry the same degree of importance.<\/p>\n<p>In conclusion, when there are many attributes whose values lie on different scales, performing <a href=\"https:\/\/matob.web.id\/en\/data-mining-and-machine-learning-differences\/\">data mining operations<\/a> on them can lead to a bad data model.<\/p>\n<p>The dataset therefore needs to be normalized to bring all attributes onto the same scale. In addition, data normalization techniques help keep the data consistent.<\/p>\n<h2><strong>Methods of Data Normalization<\/strong><\/h2>\n<p>There are several data normalization methods, but this article discusses the three most frequently used techniques: Z-score normalization, min-max normalization, and decimal scaling normalization.<\/p>\n<h3><strong>1. Z-score normalization<\/strong><\/h3>\n<p>Z-score normalization, also known as standardization, is a technique in which an attribute&#8217;s values are normalized based on their mean &#956; and standard deviation &#963;: v&#8217; = (v - &#956;) \/ &#963;.<\/p>\n<p>The essence of this technique is to transform the data to a common scale where the mean equals zero and the standard deviation equals one.<\/p>\n<p>Z-score normalization is useful in data mining when the analysis requires comparing values against the average value.<\/p>\n<h3><strong>2. Min-max normalization<\/strong><\/h3>\n<p>Which is easier to grasp: the difference between 500 and 1,000,000, or between 0.5 and 1? Data is easier to understand when the range between the minimum and maximum values is smaller.<\/p>\n<p>The min-max method applies a linear transformation that rescales a dataset to the range from 0 (min) to 1 (max): v&#8217; = (v - min) \/ (max - min).<\/p>\n<h3><strong>3. 
Decimal Scaling Normalization<\/strong><\/h3>\n<p>In <a href=\"https:\/\/matob.web.id\/en\/what-is-data-mining-a-complete-guide\/\">data mining<\/a>, decimal scaling is another way to normalize. This method normalizes the data by shifting the decimal point of each value: v is normalized to v&#8217; = v \/ 10^j, where j is the smallest integer such that the maximum absolute value of v&#8217; is less than 1.<\/p>\n<h2><strong>Importance of Normalization<\/strong><\/h2>\n<p>A database design is bad if:<\/p>\n<ul>\n<li>The same data is stored in several places (files or records).<\/li>\n<li>Certain information cannot be generated.<\/li>\n<li>Information has been lost.<\/li>\n<li>There is redundancy (repetition) or duplication of data, which wastes storage space and makes it difficult to update data.<\/li>\n<li>NULL values appear frequently.<\/li>\n<li>Information can be lost while designing the database (through an incorrect decomposition process).<\/li>\n<\/ul>\n<p>The forms of normalization most often used are 1NF, 2NF, 3NF, and BCNF.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data transformation is one of the basic steps in the data preprocessing section. When we first learn feature scaling techniques, we will deal with the terms scale, standardization, and normalization a lot. Normalization is one of the most frequently used data preparation techniques. 
In machine learning and data mining, this process helps us convert the [&hellip;]<\/p>\n","protected":false},"author":10,"featured_media":1789,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13],"tags":[],"class_list":["post-1729","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-business"],"_links":{"self":[{"href":"https:\/\/matob.web.id\/en\/wp-json\/wp\/v2\/posts\/1729","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/matob.web.id\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/matob.web.id\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/matob.web.id\/en\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/matob.web.id\/en\/wp-json\/wp\/v2\/comments?post=1729"}],"version-history":[{"count":0,"href":"https:\/\/matob.web.id\/en\/wp-json\/wp\/v2\/posts\/1729\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/matob.web.id\/en\/wp-json\/wp\/v2\/media\/1789"}],"wp:attachment":[{"href":"https:\/\/matob.web.id\/en\/wp-json\/wp\/v2\/media?parent=1729"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/matob.web.id\/en\/wp-json\/wp\/v2\/categories?post=1729"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/matob.web.id\/en\/wp-json\/wp\/v2\/tags?post=1729"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}