{"id":6156,"date":"2023-09-04T12:00:00","date_gmt":"2023-09-04T12:00:00","guid":{"rendered":"https:\/\/businessyield.com\/tech\/?p=6156"},"modified":"2023-09-03T18:34:51","modified_gmt":"2023-09-03T18:34:51","slug":"data-profiling-what-it-is-tools-best-practices","status":"publish","type":"post","link":"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/","title":{"rendered":"Data Profiling: What It Is, Tools &amp; Best Practices","gt_translate_keys":[{"key":"rendered","format":"text"}]},"content":{"rendered":"\n<p><span style=\", Arial, sans-serif;font-size: 16px\">Data profiling, or data archeology, is the process of reviewing and cleansing data to better understand its structure and maintain data quality standards within an organization.<\/span>&nbsp;It involves examining, analyzing, and creating useful summaries of data. <\/p>\n\n\n\n<p>This process yields a high-level overview that aids in the discovery of&nbsp;data quality&nbsp;issues, risks, and overall trends. Data profiling produces critical insights into data that companies can then leverage to their advantage.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-basics-of-data-profiling\"><span id=\"basics-of-data-profiling\"><strong>Basics of Data Profiling<\/strong><\/span><\/h2>\n\n\n\n<p>Data profiling is the process of reviewing source data to understand its structure, content, and interrelationships. It also identifies the data\u2019s potential for data projects.&nbsp;<\/p>\n\n\n\n<p>Data profiling evaluates data based on factors such as accuracy, consistency, and timeliness to show whether the data lacks consistency or accuracy or has null values. Depending on the data set, a result could be something as simple as statistics, such as numbers or values in the form of a column. 
<\/p>\n\n\n\n<p>Data profiling is a crucial part of:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data warehouse and business intelligence (DW\/BI) projects<\/strong>: Data profiling can uncover data quality issues in data sources and show what needs to be corrected in ETL.<\/li>\n\n\n\n<li><strong>Data conversion and migration projects<\/strong>: Data profiling can identify data quality issues, which you can handle in the scripts and data integration tools that copy data from source to target. It can also uncover new requirements for the target system.<\/li>\n\n\n\n<li><strong>Source system data quality projects<\/strong>: Data profiling can highlight data suffering from serious or numerous quality issues. It can also highlight the source of the issues (e.g., user inputs, errors in interfaces, or data corruption).<\/li>\n<\/ul>\n\n\n\n<p>Specifically, data profiling sifts through data to determine its legitimacy and quality. Analytical algorithms detect dataset characteristics such as mean, minimum, maximum, percentile, and frequency to examine data in minute detail. Profiling then performs analyses to uncover metadata, including frequency distributions, key relationships, foreign key candidates, and functional dependencies. <\/p>\n\n\n\n<p>Finally, it uses all of this information to expose how those factors align with your business\u2019s standards and goals.<\/p>\n\n\n\n<p>Data profiling can help eliminate costly errors that are common in customer databases. These errors include null values (unknown or missing values) and values that should not be included. 
This also includes values with unusually high or low frequency, values that don\u2019t follow expected patterns, and values outside the normal range.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"types-of-data-profiling\"><strong>Types of data profiling<\/strong><\/h2>\n\n\n\n<p>There are three main types of data profiling:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"content-discovery\"><strong>Content discovery<\/strong><\/h3>\n\n\n\n<p>This looks into individual data records to discover errors. Content discovery identifies which specific rows in a table contain problems, and which systemic issues occur in the data (for example, phone numbers with no area code).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"relationship-discovery\"><strong>Relationship discovery<\/strong><\/h3>\n\n\n\n<p>This discovers how parts of the data are interrelated. For example, the key relationships between database tables, and the references between cells or tables in a spreadsheet. Understanding relationships is crucial to reusing data; related data sources should be united into one or imported in a way that preserves important relationships.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"structure-discovery\"><strong>Structure discovery<\/strong><\/h3>\n\n\n\n<p>Validating that data is consistent and formatted correctly, and performing mathematical checks on the data (e.g. sum, minimum or maximum). 
Structure discovery helps you understand how well data is structured, for example, what percentage of phone numbers do not have the correct number of digits.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"data-profiling-stepsan-efficient-process-for-data-profiling\"><span id=\"data-profiling-steps\"><strong>Data profiling steps<\/strong><\/span><\/h2>\n\n\n\n<p>Ralph Kimball, a data warehouse architecture expert, suggests a four-step process for data profiling:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Use data profiling at the project start to discover whether the data is suitable for analysis, and make a \u201cgo\/no go\u201d decision on the project.<\/li>\n\n\n\n<li>Identify and correct data quality issues in source data, even before starting to move it into the target database.<\/li>\n\n\n\n<li>Identify data quality issues that can be corrected by Extract-Transform-Load (ETL) while data is moved from source to target. Data profiling can uncover whether additional manual processing is needed.<\/li>\n\n\n\n<li>Identify unanticipated business rules, hierarchical structures, and foreign key\/primary key relationships. Use them to fine-tune the ETL process.<\/li>\n<\/ol>\n\n\n\n<h2 id=\"benefits-of-data-profiling\" class=\"wp-block-heading\"><strong>Benefits of data profiling<\/strong><\/h2>\n\n\n\n<p>Bad data&nbsp;<a href=\"https:\/\/www.entrepreneur.com\/article\/332238\" target=\"_blank\" rel=\"noreferrer noopener\">can cost businesses 30% or more of their revenue<\/a>. For most companies, that means millions of dollars wasted, strategies recalculated, and tarnished reputations. And often, the culprit is oversight. <\/p>\n\n\n\n<p>Companies can become so busy collecting data and managing operations that they compromise on the efficacy and quality of data. That could mean lost productivity, missed sales opportunities, and missed chances to improve the bottom line. 
That is where a data profiling tool comes in.<\/p>\n\n\n\n<p>Once a data profiling application is engaged, it continually analyzes,&nbsp;cleans, and updates data in order to provide critical insights that are available right from your laptop. Specifically, data profiling provides:<\/p>\n\n\n\n<h3 id=\"better-data-quality-and-credibility\" class=\"wp-block-heading\"><strong>Better data quality and credibility<\/strong><\/h3>\n\n\n\n<p>Once data has been analyzed, the application can help eliminate duplications or anomalies. It can determine useful information that could affect business choices, identify quality problems that exist within an organization\u2019s system, and be used to draw certain conclusions about the future health of a company.<\/p>\n\n\n\n<h3 id=\"organized-sorting\" class=\"wp-block-heading\"><strong>Organized sorting<\/strong><\/h3>\n\n\n\n<p>Most databases interact with a diverse set of data that could include blogs, social media, and other big data markets. Profiling can trace back to the original data source and ensure proper encryption for safety. A data profiler can then analyze those different databases, source applications, or tables, and ensure that the data meets standard statistical measures and specific business rules.<\/p>\n\n\n\n<p>Understanding the relationship between available data, missing data, and required data helps an organization chart its future strategy and determine long-term goals. Access to a data profiling application can streamline these efforts.<\/p>\n\n\n\n<h3 id=\"predictive-decision-making\" class=\"wp-block-heading\"><strong>Predictive decision making<\/strong><\/h3>\n\n\n\n<p>Profiled information can be used to stop small mistakes from becoming big problems. It can also reveal possible outcomes for new scenarios. 
Data profiling helps create an accurate snapshot of a company\u2019s health to better inform the decision-making process.<\/p>\n\n\n\n<h3 id=\"proactive-crisis-management\" class=\"wp-block-heading\"><strong>Proactive crisis management<\/strong><\/h3>\n\n\n\n<p>Data profiling can help quickly identify and address problems, often before they escalate.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-challenges-of-data-profiling\"><span id=\"challenges-of-data-profiling\"><strong>Challenges of Data Profiling<\/strong><\/span><\/h2>\n\n\n\n<p>Data profiling challenges typically stem from the complexity of the work involved. More specifically, you can expect:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-expensive-and-time-consuming\"><span id=\"expensive-and-time-consuming\"><strong>Expensive and time-consuming<\/strong><\/span><\/h3>\n\n\n\n<p>Implementing a successful data profiling program can become very complex because of the sheer volume of data a typical organization collects. Without the correct tools, hiring trained experts to analyze the results and then make decisions can become very expensive and time-consuming.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-inadequate-resources\"><span id=\"inadequate-resources\"><strong>Inadequate resources<\/strong><\/span><\/h3>\n\n\n\n<p>To start the data profiling process, a company needs its data all in one place, which is often not the case. If the data lives across different departments and there is no trained data professional in place, it can become very difficult to profile the company\u2019s data as a whole.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-data-profiling-vs-data-mining\"><span id=\"data-profiling-vs-data-mining\"><strong>Data profiling vs. 
data mining<\/strong><\/span><\/h2>\n\n\n\n<p>While there is overlap with<a href=\"https:\/\/www.ibm.com\/topics\/data-mining\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">&nbsp;data mining<\/a>, data profiling has a different goal in mind. What is the difference?<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data profiling helps in the understanding of data and its characteristics, whereas data mining is the process of discovering patterns or trends by analyzing the data.<\/li>\n\n\n\n<li>Data profiling focuses on collecting metadata and then analyzing it to support&nbsp;data management.<\/li>\n\n\n\n<li>Data profiling, unlike data mining, produces a summary of the data\u2019s characteristics and enables use of the data.<\/li>\n<\/ul>\n\n\n\n<p>In other words, data profiling is the first of the tools you use to ensure the data is accurate.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"data-profiling-and-data-quality-analysis-best-practices\"><strong>Data profiling and data quality analysis best practices<\/strong><\/h2>\n\n\n\n<p><strong>Basic data profiling techniques:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Distinct count and percent<\/strong>: Identifies natural keys, the distinct values in each column that can help process inserts and updates. Handy for tables without headers.<\/li>\n\n\n\n<li><strong>Percent of zero\/blank\/null values<\/strong>: Identifies missing or unknown data. Helps ETL architects set up appropriate default values.<\/li>\n\n\n\n<li><strong>Minimum\/maximum\/average string length<\/strong>: Helps select appropriate data types and sizes in the target database. 
Enables setting column widths just wide enough for the data, to improve performance.<\/li>\n<\/ul>\n\n\n\n<p><strong>Advanced data profiling techniques:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Key integrity<\/strong>: Ensures keys are always present in the data, using zero\/blank\/null analysis. Also, helps identify orphan keys, which are problematic for ETL and future analysis.<\/li>\n\n\n\n<li><strong>Cardinality<\/strong>: Checks relationships like one-to-one, one-to-many, many-to-many, between related data sets. This helps BI tools perform inner or outer joins correctly.<\/li>\n\n\n\n<li><strong>Pattern and frequency distributions<\/strong>: Check if data fields are formatted correctly, e.g., if emails are in a valid format. Extremely important for data fields used for outbound communications (emails, phone numbers, addresses).<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"6-data-profiling-toolsopen-source-and-commercial\"><span id=\"data-profiling-tools-open-source-and-commercial\"><strong>Data profiling tools: Open source and commercial<\/strong><\/span><\/h2>\n\n\n\n<p>Data profiling, a tedious and labor-intensive activity, can be automated with tools, to make huge data projects more feasible. These are essential to your data analytics stack.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"open-source-data-profiling-tools\"><strong>Open-source data profiling tools<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"2-aggregate-profiler-open-source-data-quality-and-profilingkey-features-include\"><span id=\"1-aggregate-profiler-open-source-data-quality-and-profiling\"><strong>1. 
<a href=\"https:\/\/sourceforge.net\/projects\/dataquality\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Aggregate Profiler<\/a>&nbsp;(Open Source Data Quality and Profiling)<\/strong><\/span><\/h4>\n\n\n\n<p><strong>Key features include:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data profiling, filtering, and governance<\/li>\n\n\n\n<li>Similarity checks<\/li>\n\n\n\n<li>Data enrichment<\/li>\n\n\n\n<li>Real-time alerting for data issues or changes<\/li>\n\n\n\n<li>Basket analysis with bubble chart validation<\/li>\n\n\n\n<li>Single customer view<\/li>\n\n\n\n<li>Dummy data creation<\/li>\n\n\n\n<li>Metadata discovery<\/li>\n\n\n\n<li>Anomaly discovery and data cleansing tool<\/li>\n\n\n\n<li>Hadoop integration<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"1-quadient-datacleanerkey-features-include\"><span id=\"2-quadient-datacleaner\"><strong>2. <a href=\"https:\/\/datacleaner.org\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Quadient DataCleaner<\/a><\/strong><\/span><\/h4>\n\n\n\n<p><strong>Key features include:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data quality, data profiling and data wrangling<\/li>\n\n\n\n<li>Detect and merge duplicates<\/li>\n\n\n\n<li>Boolean analysis<\/li>\n\n\n\n<li>Completeness analysis<\/li>\n\n\n\n<li>Character set distribution<\/li>\n\n\n\n<li>Date gap analysis<\/li>\n\n\n\n<li>Reference data matching<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"3-talend-open-studioa-suite-of-open-source-tools-data-quality-features-include\"><span id=\"3-talend-open-studio-a-suite-of-open-source-tools\"><strong>3.&nbsp;<a href=\"https:\/\/www.talend.com\/products\/talend-open-studio\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Talend Open Studio<\/a> (a suite of open-source tools) <\/strong><\/span><\/h4>\n\n\n\n<p><strong>Data quality features include:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Customizable data assessment<\/li>\n\n\n\n<li>A 
pattern library<\/li>\n\n\n\n<li>Analytics with graphical charts<\/li>\n\n\n\n<li>Fraud pattern detection<\/li>\n\n\n\n<li>Column set analysis<\/li>\n\n\n\n<li>Advanced matching<\/li>\n\n\n\n<li>Time column correlation<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"commercial-data-profiling-tools\"><strong>Commercial data profiling tools<\/strong><\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"4-data-profiling-in-informaticakey-features-include\"><span id=\"4-data-profiling-in-informatica\"><strong>4.&nbsp;<a href=\"https:\/\/www.informatica.com\/products\/data-quality.html\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Data Profiling in Informatica<\/a><\/strong><\/span><\/h4>\n\n\n\n<p><strong>Key features include:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data stewardship console which mimics data management workflow<\/li>\n\n\n\n<li>Exception handling interface for business users<\/li>\n\n\n\n<li>Enterprise data governance<\/li>\n\n\n\n<li>Map data quality rules once and deploy on any platform<\/li>\n\n\n\n<li>Data standardization, enrichment, de-duplication and consolidation<\/li>\n\n\n\n<li>Metadata management<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"5-oracle-enterprise-data-qualitykey-features-include\"><span id=\"5-oracle-enterprise-data-quality\"><strong>5.&nbsp;<a href=\"http:\/\/www.oracle.com\/us\/products\/middleware\/data-integration\/enterprise-data-quality\/oracle-enterprise-data-quality-ds-430148.pdf\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Oracle Enterprise Data Quality<\/a><\/strong><\/span><\/h4>\n\n\n\n<p><strong>Key features include:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data profiling, auditing and dashboards<\/li>\n\n\n\n<li>Parsing and standardization including constructed fields, misfiled data, poorly structured data and notes fields<\/li>\n\n\n\n<li>Automated match and merge<\/li>\n\n\n\n<li>Case management by human operators<\/li>\n\n\n\n<li>Address 
verification<\/li>\n\n\n\n<li>Product data verification<\/li>\n\n\n\n<li>Integration with Oracle Master Data Management<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"6-sas-datafluxkey-features-include\"><span id=\"6-sas-dataflux\"><strong>6.&nbsp;<a href=\"https:\/\/www.sas.com\/en_us\/solutions\/data-management.html\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">SAS DataFlux<\/a><\/strong><\/span><\/h4>\n\n\n\n<p><strong>Key features include:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Extracts, cleanses, transforms, conforms, aggregates, loads and manages data<\/li>\n\n\n\n<li>Supports batch-oriented and real-time Master Data Management<\/li>\n\n\n\n<li>Creates real-time, reusable data integration services<\/li>\n\n\n\n<li>User-friendly semantic reference data layer<\/li>\n\n\n\n<li>Visibility into where data originated and how it was transformed<\/li>\n\n\n\n<li>Optional enrichment components<\/li>\n<\/ul>\n\n\n\n<h2 id=\"data-profiling-in-action\" class=\"wp-block-heading\"><strong>Data profiling in action<\/strong><\/h2>\n\n\n\n<p>With the enormous amount of data available today, companies sometimes get overwhelmed by all the information they\u2019ve collected. As a result, they fail to take full advantage of their data, and its value and usefulness diminish. <\/p>\n\n\n\n<p>Data profiling organizes and manages big data to unlock its full potential and deliver powerful insights. <\/p>\n\n\n\n<h3 id=\"dominos-data-avalanche\" class=\"wp-block-heading\"><strong>Domino\u2019s data avalanche<\/strong><\/h3>\n\n\n\n<p>With almost 14,000 locations, Domino\u2019s was already the largest pizza company in the world by 2015. But when the company launched its&nbsp;<a href=\"https:\/\/www.forbes.com\/sites\/bernardmarr\/2016\/04\/06\/big-data-driven-decision-making-at-dominos-pizza\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">AnyWare ordering system<\/a>, it suddenly faced an avalanche of data. 
Users could now place orders through virtually any type of device or app, including smartwatches, TVs, car entertainment systems, and social media platforms.<\/p>\n\n\n\n<p>That meant Domino\u2019s had data coming at it from all sides. By putting reliable data profiling to work,&nbsp;Domino\u2019s now collects and analyzes data&nbsp;from all of the company\u2019s point-of-sale systems in order to streamline analysis and improve data quality. <\/p>\n\n\n\n<p>As a result, Domino\u2019s has gained deeper insights into its customer base, enhanced its fraud detection processes, boosted operational efficiency, and increased sales.<\/p>\n\n\n\n<h3 id=\"data-quality-for-customer-loyalty\" class=\"wp-block-heading\"><strong>Data quality for customer loyalty<\/strong><\/h3>\n\n\n\n<p>Office Depot combines an online presence with continued offline strategies. Data integration is crucial, combining information from three channels: the offline catalog, the online website, and customer call centers.<\/p>\n\n\n\n<p>Among other things,&nbsp;Office Depot uses data profiling&nbsp;to perform checks and quality control on data before it is entered into the company\u2019s data lake. Integrating online and offline data results in a complete 360-degree view of customers. It also provides high-quality data to back-office functions throughout the company.<\/p>\n\n\n\n<h3 id=\"higher-customer-lifetime-value-with-healthy-data\" class=\"wp-block-heading\"><strong>Higher customer lifetime value with healthy data<\/strong><\/h3>\n\n\n\n<p>Globe Telecom provides connectivity services to more than 94.2 million mobile subscribers and 2 million home broadband customers in the Philippines. 
Opportunities to expand market share are limited, so it was vital that Globe get a better understanding of its existing customer base so it could&nbsp;grow the lifetime value&nbsp;of each relationship.<\/p>\n\n\n\n<p>To deliver the customer insights the business required, Globe needed data that was healthy and suitable for applications such as data analytics. However, this proved to be a challenge in areas like data scoring, which at that point was manually addressed by using spreadsheets and offline databases to apply validation and data quality rules to existing data.<\/p>\n\n\n\n<p>Today, Globe operates a center of excellence for its data that encompasses data quality, data engineering, and&nbsp;data governance. With healthy data, Globe improved the availability of data quality scores from once a month to every day, increased trusted email addresses by 400%, and achieved higher ROI per marketing campaign. <\/p>\n\n\n\n<p>Metrics include a 30% cost reduction per lead, a 13% improvement in conversion rates, and an 80% increase in click-through rates.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-recommended-articles\"><span id=\"recommended-articles\"><strong>Recommended Articles<\/strong><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/businessyield.com\/tech\/technology\/it-architect\/\">IT ARCHITECT: What Is It &amp; How to Become One<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/businessyield.com\/tech\/technology\/microsoft-hyper-v\/\">MICROSOFT HYPER-V: What Is It &amp; How Do You Use It?<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/businessyield.com\/tech\/how-to\/production-scheduler\/\">Production Scheduler: Job Description, Salary &amp; Becoming One<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/businessyield.com\/tech\/technology\/best-voip-service-for-homes-in-2023\/\">Best VoIP Service For Homes In 2023<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/businessyield.com\/tech\/ecommerce\/best-real-estate-apps-in-2023\/\">Best Real Estate 
Apps In 2023<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/businessyield.com\/tech\/technology\/what-is-a-wlan-what-is-it-why-do-you-need-it\/\">What Is A WLAN: What Is It &amp; Why Do You Need It?<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-references\"><span id=\"references\"><strong>References<\/strong><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.talend.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Talend<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/panoply.io\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Panoply<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.ibm.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">IBM<\/a><\/li>\n<\/ul>\n","protected":false,"gt_translate_keys":[{"key":"rendered","format":"html"}]},"excerpt":{"rendered":"Data profiling, or data archeology, is the process of reviewing and cleansing data to better understand its structure&hellip;\n","protected":false,"gt_translate_keys":[{"key":"rendered","format":"html"}]},"author":290,"featured_media":6118,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[35],"tags":[218,219,159],"class_list":{"0":"post-6156","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-technology","8":"tag-data-profiling","9":"tag-data-quality","10":"tag-management"},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.8 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Data Profiling: What It Is, Tools &amp; Best Practices - Business Yield Technology<\/title>\n<meta name=\"description\" content=\"Data profiling evaluates data based on factors to show if the data lacks consistency or accuracy or has null values.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, 
max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Profiling: What It Is, Tools &amp; Best Practices - Business Yield Technology\" \/>\n<meta property=\"og:description\" content=\"Data profiling evaluates data based on factors to show if the data lacks consistency or accuracy or has null values.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/\" \/>\n<meta property=\"og:site_name\" content=\"Business Yield Technology\" \/>\n<meta property=\"article:author\" content=\"https:\/\/www.facebook.com\/Jay.Arnis\" \/>\n<meta property=\"article:published_time\" content=\"2023-09-04T12:00:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/businessyield.com\/tech\/wp-content\/uploads\/sites\/2\/2023\/09\/Data-Profiling.jpg?fit=1200%2C630&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Jimmy Anisulowo\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/forlahjay\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Jimmy Anisulowo\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/\",\"url\":\"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/\",\"name\":\"Data Profiling: What It Is, Tools &amp; Best Practices - Business Yield Technology\",\"isPartOf\":{\"@id\":\"https:\/\/businessyield.com\/tech\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/businessyield.com\/tech\/wp-content\/uploads\/sites\/2\/2023\/09\/Data-Profiling.jpg?fit=1200%2C630&ssl=1\",\"datePublished\":\"2023-09-04T12:00:00+00:00\",\"author\":{\"@id\":\"https:\/\/businessyield.com\/tech\/#\/schema\/person\/0f5b3b62b69726a967e6d217a4d242ff\"},\"description\":\"Data profiling evaluates data based on factors to show if the data lacks consistency or accuracy or has null 
values.\",\"breadcrumb\":{\"@id\":\"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/#primaryimage\",\"url\":\"https:\/\/i0.wp.com\/businessyield.com\/tech\/wp-content\/uploads\/sites\/2\/2023\/09\/Data-Profiling.jpg?fit=1200%2C630&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/businessyield.com\/tech\/wp-content\/uploads\/sites\/2\/2023\/09\/Data-Profiling.jpg?fit=1200%2C630&ssl=1\",\"width\":1200,\"height\":630,\"caption\":\"Data Profiling\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/businessyield.com\/tech\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Profiling: What It Is, Tools &amp; Best Practices\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/businessyield.com\/tech\/#website\",\"url\":\"https:\/\/businessyield.com\/tech\/\",\"name\":\"Business Yield Technology\",\"description\":\"Best Tech Reviews, Apps, Phones, &amp; Gaming\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/businessyield.com\/tech\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/businessyield.com\/tech\/#\/schema\/person\/0f5b3b62b69726a967e6d217a4d242ff\",\"name\":\"Jimmy 
Anisulowo\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/businessyield.com\/tech\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/b20d2d093f1362590dc5b5f8b8cfb36e53decf98e57d0121be53eb533dc1f2a7?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/b20d2d093f1362590dc5b5f8b8cfb36e53decf98e57d0121be53eb533dc1f2a7?s=96&d=mm&r=g\",\"caption\":\"Jimmy Anisulowo\"},\"description\":\"Jimmy generally lives his life by one dogma: steady improvement. This has taken him on a relentless pursuit of knowledge in diverse fields such as business, tech, insurance, health and many others. With a background in content creation and digital marketing plus over ten years of writing and research experience, he implements an expert's view to help his audiences gain valuable insight. He is also an avid reader, gamer, drummer, full-blown metalhead, and all-round fun gi.\",\"sameAs\":[\"https:\/\/www.facebook.com\/Jay.Arnis\",\"https:\/\/www.instagram.com\/forlahjay\/\",\"https:\/\/x.com\/https:\/\/twitter.com\/forlahjay\"],\"url\":\"https:\/\/businessyield.com\/tech\/author\/jimmy\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Data Profiling: What It Is, Tools &amp; Best Practices - Business Yield Technology","description":"Data profiling evaluates data based on factors to show if the data lacks consistency or accuracy or has null values.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/","og_locale":"en_US","og_type":"article","og_title":"Data Profiling: What It Is, Tools &amp; Best Practices - Business Yield Technology","og_description":"Data profiling evaluates data based on factors to show if the data lacks consistency or accuracy or has null values.","og_url":"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/","og_site_name":"Business Yield Technology","article_author":"https:\/\/www.facebook.com\/Jay.Arnis","article_published_time":"2023-09-04T12:00:00+00:00","og_image":[{"width":1200,"height":630,"url":"https:\/\/i0.wp.com\/businessyield.com\/tech\/wp-content\/uploads\/sites\/2\/2023\/09\/Data-Profiling.jpg?fit=1200%2C630&ssl=1","type":"image\/jpeg"}],"author":"Jimmy Anisulowo","twitter_card":"summary_large_image","twitter_creator":"@https:\/\/twitter.com\/forlahjay","twitter_misc":{"Written by":"Jimmy Anisulowo","Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/","url":"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/","name":"Data Profiling: What It Is, Tools &amp; Best Practices - Business Yield Technology","isPartOf":{"@id":"https:\/\/businessyield.com\/tech\/#website"},"primaryImageOfPage":{"@id":"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/#primaryimage"},"image":{"@id":"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/businessyield.com\/tech\/wp-content\/uploads\/sites\/2\/2023\/09\/Data-Profiling.jpg?fit=1200%2C630&ssl=1","datePublished":"2023-09-04T12:00:00+00:00","author":{"@id":"https:\/\/businessyield.com\/tech\/#\/schema\/person\/0f5b3b62b69726a967e6d217a4d242ff"},"description":"Data profiling evaluates data based on factors to show if the data lacks consistency or accuracy or has null values.","breadcrumb":{"@id":"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/#primaryimage","url":"https:\/\/i0.wp.com\/businessyield.com\/tech\/wp-content\/uploads\/sites\/2\/2023\/09\/Data-Profiling.jpg?fit=1200%2C630&ssl=1","contentUrl":"https:\/\/i0.wp.com\/businessyield.com\/tech\/wp-content\/uploads\/sites\/2\/2023\/09\/Data-Profiling.jpg?fit=1200%2C630&ssl=1","width":1200,"height":630,"caption":"Data 
Profiling"},{"@type":"BreadcrumbList","@id":"https:\/\/businessyield.com\/tech\/technology\/data-profiling-what-it-is-tools-best-practices\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/businessyield.com\/tech\/"},{"@type":"ListItem","position":2,"name":"Data Profiling: What It Is, Tools &amp; Best Practices"}]},{"@type":"WebSite","@id":"https:\/\/businessyield.com\/tech\/#website","url":"https:\/\/businessyield.com\/tech\/","name":"Business Yield Technology","description":"Best Tech Reviews, Apps, Phones, &amp; Gaming","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/businessyield.com\/tech\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/businessyield.com\/tech\/#\/schema\/person\/0f5b3b62b69726a967e6d217a4d242ff","name":"Jimmy Anisulowo","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/businessyield.com\/tech\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/b20d2d093f1362590dc5b5f8b8cfb36e53decf98e57d0121be53eb533dc1f2a7?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b20d2d093f1362590dc5b5f8b8cfb36e53decf98e57d0121be53eb533dc1f2a7?s=96&d=mm&r=g","caption":"Jimmy Anisulowo"},"description":"Jimmy generally lives his life by one dogma: steady improvement. This has taken him on a relentless pursuit of knowledge in diverse fields such as business, tech, insurance, health and many others. With a background in content creation and digital marketing plus over ten years of writing and research experience, he implements an expert's view to help his audiences gain valuable insight. 
He is also an avid reader, gamer, drummer, full-blown metalhead, and all-round fun gi.","sameAs":["https:\/\/www.facebook.com\/Jay.Arnis","https:\/\/www.instagram.com\/forlahjay\/","https:\/\/x.com\/https:\/\/twitter.com\/forlahjay"],"url":"https:\/\/businessyield.com\/tech\/author\/jimmy\/"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/businessyield.com\/tech\/wp-content\/uploads\/sites\/2\/2023\/09\/Data-Profiling.jpg?fit=1200%2C630&ssl=1","jetpack_sharing_enabled":true,"gt_translate_keys":[{"key":"link","format":"url"}],"_links":{"self":[{"href":"https:\/\/businessyield.com\/tech\/wp-json\/wp\/v2\/posts\/6156","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/businessyield.com\/tech\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/businessyield.com\/tech\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/businessyield.com\/tech\/wp-json\/wp\/v2\/users\/290"}],"replies":[{"embeddable":true,"href":"https:\/\/businessyield.com\/tech\/wp-json\/wp\/v2\/comments?post=6156"}],"version-history":[{"count":2,"href":"https:\/\/businessyield.com\/tech\/wp-json\/wp\/v2\/posts\/6156\/revisions"}],"predecessor-version":[{"id":6158,"href":"https:\/\/businessyield.com\/tech\/wp-json\/wp\/v2\/posts\/6156\/revisions\/6158"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/businessyield.com\/tech\/wp-json\/wp\/v2\/media\/6118"}],"wp:attachment":[{"href":"https:\/\/businessyield.com\/tech\/wp-json\/wp\/v2\/media?parent=6156"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/businessyield.com\/tech\/wp-json\/wp\/v2\/categories?post=6156"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/businessyield.com\/tech\/wp-json\/wp\/v2\/tags?post=6156"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}