What Are The Problems Encountered While Cleaning Data

High-quality data is a prerequisite for making valuable business decisions. Yet the quality of a dataset often turns out to be poor owing to inconsistencies, errors, and missing information, among other reasons. Data inconsistency has multiple causes, including incorrect manual entry, misspellings, missing data, and redundant information stored in different representations. Leaving this erroneous data uncorrected can lead to major problems during downstream data processing, which in turn leads to incorrect business decisions that can be extremely costly for the organization. It is important for data managers to ensure that data cleansing procedures are in place. A data entry outsourcing service expert would have systematic data cleansing and scrubbing procedures in place.
Data cleansing, or scrubbing, is the process of detecting and removing inconsistencies and errors from data to improve its quality. The need for data cleansing increases significantly when multiple data sources are integrated. This process of making data accurate and consistent is riddled with many issues, a few of which are described below:
-
High Volume of Data:
Applications such as data warehouses continuously load huge amounts of data from a variety of sources, and they also carry a significant amount of dirty data (data errors). In such cases the job of data cleansing becomes both significant and formidable.
-
Misspellings:
Misspellings occur mostly due to typing errors. Wrong spellings can be detected and corrected for common words and grammatical errors; however, because databases contain huge amounts of data that is unique, it is difficult to detect spelling mistakes at input level. Further, spelling mistakes in data such as names and addresses are always difficult to identify and correct.
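One way to catch misspellings in a field with a known set of valid values is fuzzy matching against a reference list. The sketch below uses Python's standard `difflib` module; the `KNOWN_CITIES` list, the `suggest_correction` helper, and the 0.8 cutoff are illustrative assumptions, not part of any particular cleansing tool.

```python
import difflib

# Hypothetical reference list; in practice this would come from a trusted lookup table.
KNOWN_CITIES = ["Berlin", "Munich", "Hamburg", "Cologne"]

def suggest_correction(value, known_values=KNOWN_CITIES, cutoff=0.8):
    """Return the closest known value, or None if nothing is similar enough."""
    # get_close_matches is case-sensitive and ranks candidates by similarity ratio.
    matches = difflib.get_close_matches(value, known_values, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

This only helps for fields that have a reference list. Unique values such as personal names usually still need manual review.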
-
Lexical Errors:
Lexical errors occur in data due to discrepancies between the structure of the data items and the specified format. For example, a particular database records attributes for name, age, sex and height. When an individual does not enter an intermediate value, say age, the data for the following attributes shifts to the wrong fields. In the above case, when the individual does not enter a value for age, the value for sex, say male, is read as age and the value of height is read as sex.
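A shifted record like the one described above can often be flagged by checking the field count and the type of each field. This is a minimal sketch that assumes records arrive as lists of strings; `EXPECTED_FIELDS` and `check_record` are hypothetical names used only for illustration.

```python
EXPECTED_FIELDS = ["name", "age", "sex", "height"]

def check_record(values):
    """Return a list of problems found in one raw record (empty if none)."""
    problems = []
    if len(values) != len(EXPECTED_FIELDS):
        problems.append(f"expected {len(EXPECTED_FIELDS)} fields, got {len(values)}")
    # Pair values with field names positionally, as a naive loader would.
    record = dict(zip(EXPECTED_FIELDS, values))
    age = record.get("age")
    if age is not None and not str(age).isdigit():
        problems.append(f"age is not a whole number: {age!r}")
    return problems
```

For the record `["Anna", "male", "172"]`, both checks fire: the field count is wrong and the value read as age is not numeric.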
-
Misfielded Value:
Misfielded value problems occur when the values entered are correct as far as format is concerned but do not belong to the field. For example, in the field for city, the value recorded is Germany.
-
Domain Format Errors:
Domain format errors occur when the value for a particular attribute is correct but does not comply with the format of the domain. For example, a particular NAME database requires first name and surname to be separated with a comma, but the input is without a comma. In this case the input may be correct, but it does not comply with the domain format.
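A format rule like the comma-separated name above can be checked before the value is stored. The helper below is a sketch under the assumption that a valid name is exactly two non-empty parts separated by one comma; `has_valid_name_format` is a made-up name.

```python
def has_valid_name_format(value):
    """True if value looks like 'first, surname': two non-empty comma-separated parts."""
    parts = [part.strip() for part in value.split(",")]
    return len(parts) == 2 and all(parts)
```

Here `"John, Smith"` passes, while `"John Smith"` and `"John,"` are rejected.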
-
Irregularities:
Irregularities deal with inconsistent use of units or values. For example, when employee salaries are entered, they are recorded in different currencies. This kind of data requires subjective interpretation and can often lead to wrong results.
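If the currency is recorded alongside each amount, mixed-currency salaries can be converted to one unit before comparison. The sketch below assumes that is the case; the exchange rates in `RATES_TO_USD` are placeholder numbers for illustration only, not real rates, and `to_usd` is a hypothetical helper.

```python
from decimal import Decimal

# Placeholder rates for illustration only; real rates must come from a trusted source.
RATES_TO_USD = {
    "USD": Decimal("1"),
    "EUR": Decimal("1.10"),
    "INR": Decimal("0.012"),
}

def to_usd(amount, currency):
    """Convert an amount in the given currency code to USD using the table above."""
    try:
        rate = RATES_TO_USD[currency.upper()]
    except KeyError:
        raise ValueError(f"unknown currency: {currency}")
    # Go through str() so float inputs do not carry binary rounding noise.
    return Decimal(str(amount)) * rate
```

Raising an error for unknown currencies, rather than guessing, keeps ambiguous rows visible for manual review.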
-
Missing Values:
Missing values occur as a result of omissions that happen while collecting the data. They signify that values were unavailable during data entry. Both dummy values and null values count as missing values. For example, 000-0000 and 999-9999 in the telephone number field.
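Dummy placeholders such as the phone numbers above are easier to handle once they are mapped to a single explicit "missing" marker. A minimal sketch, assuming phone numbers are stored as strings; `DUMMY_PHONES` and `normalize_phone` are illustrative names.

```python
# Placeholder values that mean "no phone number was provided".
DUMMY_PHONES = {"000-0000", "999-9999"}

def normalize_phone(value):
    """Return the cleaned phone number, or None if it is null, empty, or a dummy value."""
    if value is None:
        return None
    cleaned = value.strip()
    if cleaned == "" or cleaned in DUMMY_PHONES:
        return None
    return cleaned
```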
-
Contradiction:
Contradiction errors occur when the same real-world entity is described by two different values in the data. For example, in a personal database there are two records for the same person with two different dates of birth, although the other values and the entity are the same.
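Contradictions of this kind can be surfaced by grouping records on an identifier and collecting the distinct values of the field in question. The sketch below assumes each record is a dictionary with a `person_id` and a `date_of_birth` key; both key names and the `find_contradictions` function are assumptions made for the example.

```python
from collections import defaultdict

def find_contradictions(records, key="person_id", field="date_of_birth"):
    """Map each key to its conflicting field values, for keys with more than one value."""
    seen = defaultdict(set)
    for record in records:
        seen[record[key]].add(record[field])
    return {k: sorted(values) for k, values in seen.items() if len(values) > 1}
```

The function only reports the conflict. Deciding which value is correct usually needs another source.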
-
Duplication:
Duplication signifies a situation where the same information is represented multiple times on account of some data entry error. For example, there can be two records of the same person with everything the same except a minor difference in name, such as the middle name being omitted in one of the entries. No data is wrong, but the person is represented twice because no duplicate check was made.
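For the middle-name case above, one simple heuristic is to compare records on a normalized key built from the first and last name tokens plus another stable attribute. The sketch below uses date of birth as that attribute; the record layout and the function names are assumptions, and the heuristic will miss or over-match in many real datasets.

```python
def duplicate_key(record):
    """Build a comparison key that ignores case and any middle names."""
    tokens = record["name"].lower().split()
    first_last = (tokens[0], tokens[-1]) if tokens else ("", "")
    return first_last + (record["date_of_birth"],)

def find_duplicates(records):
    """Return groups of records that share the same comparison key."""
    groups = {}
    for record in records:
        groups.setdefault(duplicate_key(record), []).append(record)
    return [group for group in groups.values() if len(group) > 1]
```

With this key, "John Michael Smith" and "John Smith" with the same date of birth fall into one group and can be reviewed together.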
-
Integrity Constraint Violations or Illegal Values:
Integrity constraint violations describe values that do not satisfy integrity constraints. They occur when an input value is outside the limits of the values allowed for a particular attribute.
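A range constraint of this kind is straightforward to check in code. The sketch below assumes an age attribute with allowed limits of 0 to 120; the limits and the `violates_age_constraint` name are illustrative choices, not taken from any specific schema.

```python
AGE_RANGE = (0, 120)  # assumed allowed limits, inclusive

def violates_age_constraint(age):
    """True if age falls outside the allowed inclusive range."""
    low, high = AGE_RANGE
    return not (low <= age <= high)
```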
-
Cryptic Values & Abbreviations:
These include the use of ambiguous values and abbreviations in fields. For example, using only the initials instead of the full college name. Errors of this kind increase the chances of duplication and reduce sorting ability.
-
Violated Attribute Dependencies:
These errors occur when the value for a secondary attribute does not match the primary attribute. For example, when the listed city does not lie within the listed country, or when the postal zip code does not coincide with the mentioned city.
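A dependency such as zip code and city can be validated against a lookup table. The sketch below uses a tiny hand-written `ZIP_TO_CITY` table purely for illustration; a real check would need a complete postal dataset, and `city_matches_zip` is a hypothetical helper.

```python
# Tiny illustrative lookup table; a real check needs a full postal dataset.
ZIP_TO_CITY = {"10115": "Berlin", "80331": "Munich"}

def city_matches_zip(zip_code, city):
    """True/False if the zip code is known, None if it cannot be checked."""
    expected = ZIP_TO_CITY.get(zip_code)
    if expected is None:
        return None  # unknown zip code, so no verdict either way
    return expected.lower() == city.strip().lower()
```

Returning `None` for unknown zip codes separates "cannot verify" from "definitely inconsistent".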
-
Wrong References:
Errors related to wrong references inhibit data validation and result in data mismatches. For example, in the department field an individual enters the wrong reference value for the department. The subsequent data validation process then results in a mismatch.
-
Embedded Values:
This type of error occurs when multiple values are entered in the same field. This practice seriously restricts data indexing and sorting. For example, the values for name, age and sex are all entered in the name field itself.
Data cleansing is an integral part of data management. It is necessary to make data accurate and consistent and to avoid duplication of data. This article highlights the common problems one encounters while doing data cleansing and aims to serve as a guideline for data quality improvement and the data cleansing process. Each of the above issues can be avoided if proper procedures are followed during the design and execution of the cleansing job. Outsourcing the data scrubbing task to an expert in data cleansing outsourcing services can considerably speed up the task and ensure that your data gets and remains clean.
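When embedded values follow a predictable pattern, they can sometimes be split back into separate fields. The sketch below assumes the name field holds text of the form "name, age, sex"; the pattern and the `split_embedded_name` helper are assumptions for this example, and values that do not match are left untouched.

```python
import re

# Assumed layout: "<name>, <age as digits>, <sex as letters>"
EMBEDDED = re.compile(
    r"^\s*(?P<name>[^,]+?)\s*,\s*(?P<age>\d+)\s*,\s*(?P<sex>[A-Za-z]+)\s*$"
)

def split_embedded_name(value):
    """Split 'name, age, sex' into fields; return the name alone if the pattern does not match."""
    match = EMBEDDED.match(value)
    if match is None:
        return {"name": value.strip(), "age": None, "sex": None}
    return {
        "name": match.group("name"),
        "age": int(match.group("age")),
        "sex": match.group("sex"),
    }
```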
Source: https://www.invensis.net/blog/14-key-data-cleansing-pitfalls/
Posted by: huntandess.blogspot.com