Data Preprocessing

Discussion 1:

In today's world, data is generated from many sources and in many formats. Internet use is growing rapidly across devices such as sensors, CCTV cameras, laptops, workstations, and tablets, and much of the data available online is unstructured: text files, PDF files, images, videos, tweets, and other formats (García, Luengo & Herrera, 2015). Collected data is typically not clean or normalized, and it is often incomplete and unprocessed. Using raw data directly produces misleading results and is of little use for analytics.

For data to be usable in analytics, its quality depends on three factors: accuracy, completeness, and consistency. First, the data needs to be accurate. Inaccuracy arises when people enter random or erroneous values, and incorrect or duplicated records carry that inaccuracy into processing. Second, the data needs to be complete. Incomplete data results from data being unavailable or from the deletion of records that were inconsistent with other data. Third, the data needs to be consistent, because analytical results can only be trusted when the underlying data does not contradict itself. Processed data can then be used to generate the graphs and tables that support decision making.

Preprocessing consists of four stages: data cleaning, data integration, data reduction, and data transformation (Kamiran & Calders, 2012). The first stage, data cleaning, involves handling missing values and removing noisy data. Techniques for removing noise include binning, regression, and outlier analysis. The second stage is data integration: because data is collected from various sources, it must be integrated so that related or correlated data can be identified. The third stage is data reduction, in which various techniques eliminate duplicate data and shrink large volumes of data. The final stage is data transformation, which converts data into a form suitable for algorithms and analytic techniques.

References

García, S., Luengo, J., & Herrera, F. (2015). Data preprocessing in data mining (pp. 195-243). Cham, Switzerland: Springer International Publishing.

Kamiran, F., & Calders, T. (2012). Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1), 1-33.

Discussion 2:

Why are the original/raw data not readily usable by analytics tasks?

Raw data is usually dirty, inaccurate, and misaligned, so it cannot be used in its raw format (Sharda et al., 2020). Raw data can also be unstructured and overly complicated. Data preprocessing therefore has to be performed to transform raw data into refined data before analytics can begin (Sharda et al., 2020).

What are the main data preprocessing steps?

The process starts with data consolidation, which collects, selects, and integrates data. It may involve filtering out unnecessary data before the rest is used. The next step is data cleaning, which removes errors from the data (Sharda et al., 2020). In this step, missing values are usually imputed and duplicate records are eliminated. The third step, data transformation, involves normalization, where values are rescaled to a range bounded by the smallest and largest values. It also includes discretization, which groups data into categories (Alasadi & Bhaya, 2017), and the construction of new attributes. The last step is data reduction, which reduces dimensionality, reduces volume, and balances the data (Alasadi & Bhaya, 2017). This step ensures that there is not too much data, which would be difficult to handle.

List and explain their importance in analytics.

Data consolidation, the first step, is essential because it covers data collection, selection, and integration. In this step, unnecessary data is eliminated so that only appropriate data remains (Losarwar & Joshi, 2012). In data cleaning, data scrubbing is vital because it removes data that contains errors. The step also reduces duplication, which removes data redundancy. Data transformation makes data easier to categorize (Alasadi & Bhaya, 2017). This matters because data organized into categories can be used efficiently, which would not be possible with unstructured data (Sharda et al., 2020). Data reduction balances the data so that no part of it is over- or under-sampled. Preprocessing is therefore a necessary part of data analytics.
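
The cleaning and transformation steps described above (removing duplicates, imputing missing values, rescaling values to a range, and grouping values into categories) can be illustrated with a minimal Python sketch. It uses only the standard library. The records, column meanings, bin edges, and function names are hypothetical and chosen for illustration; mean imputation and min-max rescaling are just one common choice for each step, not a method prescribed by the cited sources.

```python
from statistics import mean

# Hypothetical raw records: (customer_id, age, income). None marks a missing value.
raw_rows = [
    (1, 25, 30000.0),
    (2, None, 52000.0),
    (2, None, 52000.0),  # exact duplicate of the previous record
    (3, 40, None),
    (4, 61, 90000.0),
]


def drop_duplicates(rows):
    """Data cleaning: keep only the first occurrence of each identical record."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def impute_mean(values):
    """Data cleaning: replace missing values (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly so min maps to 0 and max maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, edges, labels):
    """Data transformation: map each value to the label of the first edge it is below.

    Expects len(labels) == len(edges) + 1; values at or above the last edge
    receive the last label.
    """
    result = []
    for v in values:
        for edge, label in zip(edges, labels):
            if v < edge:
                result.append(label)
                break
        else:
            result.append(labels[-1])
    return result


clean_rows = drop_duplicates(raw_rows)
ages = impute_mean([row[1] for row in clean_rows])
incomes = impute_mean([row[2] for row in clean_rows])

normalized_incomes = min_max_normalize(incomes)
age_groups = discretize(ages, edges=[30, 60], labels=["young", "middle", "senior"])

for row, income, group in zip(clean_rows, normalized_incomes, age_groups):
    print(row[0], round(income, 3), group)
```

Removing the duplicate record also reduces the data volume slightly, which hints at the aim of data reduction. In practice, libraries such as pandas or scikit-learn would usually handle these steps on larger data sets.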
