“If communication is more art than science, then it’s more sculpture than painting. While you’re adding to build your picture in painting, you’re chipping away at sculpting. And when you’re deciding on the insights to use, you’re chipping away everything you have to reveal the core key insights that will best achieve your purpose,” according to Craig Smith, McKinsey & Company’s client communication expert.
The same principle applies to data visualization. Chipping away matters because it keeps you from overdressing data with complicated graphs, special effects, and excess colors. Data presentations with too many elements confuse and overwhelm the audience.
Keep in mind that data must convey information. Let every element of a visualization communicate rather than decorate. The simpler it is, the more accessible and understandable it is: “less is more,” as long as the visuals still convey the intended message.
Drawing the parallel between exploratory and explanatory data visualization and the practice of sculpting could help improve how data visualization is done. How can chipping away truly add clarity to data visualization?
Exploratory Visualization: Adding Lumps of Clay
Exploratory visualization is the phase where you try to understand the data yourself before deciding which interesting insights it might hold in its depths. You can hunt down and polish these insights in a later stage, before presenting them to your audience.
In this stage, you might end up creating as many as a hundred charts. Some of them help you get a better sense of the statistical description of the data: means, medians, maximum and minimum values, and so on.
You can also spot interesting outliers during exploration and experiment to test relationships between different values. Out of the 100 hypotheses that you visually analyze to find your way through the data in your hands, you may end up settling on two of them to work on and present to your audience.
In the parallel world of sculpting, artists do a similar thing. They start with an armature, the counterpart of raw data in design, and then keep adding lumps of clay to it, much as you pile up exploratory visualizations.
Artists know for sure that a lot of this clay will not end up in the final sculpture. But they are aware that this accumulation of material is essential: it starts giving them a sense of the ideal final form, and adding enough material ensures they have plenty to work with when they begin shaping their work.
In the exploratory stage, approaching data visualization as a form of sculpting may remind us to resist two common and fatal urges:
The urge to rush into the explanatory stage – Heading to the chipping-away stage too early will lead to flawed results.
The urge to show the audience everything we did in the exploratory stage, because we hate to waste the effort we put into it – When you feel that urge, remember that you don’t want to show your audience that big lump of clay; you want to show a beautified result.
Explanatory Visualization: Chipping Away the Unnecessary
Explanatory visualization is where you settle on the insights worth reporting. You start polishing the visualizations to do what they are supposed to do: explain, conveying the meaning at a glance.
The main goal of this stage is to ensure that there are no distractions in your visualization, no unnecessary lumps of clay that hide the intended meaning or the envisioned shape.
In the explanatory stage, sculptors use various tools, but what they aim for is the same. They first rough out the basic form by taking away large amounts of material, making sure they are on track. Then they move to finer forming, using more precise tools to carve in the shape’s features and others to add texture. The main question driving this stage for sculptors is: what uncovers the envisioned shape underneath?
In data visualization, you can try taking each element out of your visualization, like titles, legends, labels, colors, and so on. Then ask yourself the same question each time: does the visualization still convey its meaning?
If yes, keep that element out. If not, try to figure out what is missing and think of less distracting alternatives, if any. For example, do you have multiple categories that you need to name? Try using labels attached to data points instead of a separate legend.
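To make that concrete, here is a minimal matplotlib sketch (the data and category names are invented for illustration) that drops the legend and instead labels each line directly at its last data point:

```python
import matplotlib.pyplot as plt

# Hypothetical yearly values for three categories (illustration only)
years = [2019, 2020, 2021, 2022]
series = {
    "North": [12, 15, 17, 21],
    "South": [10, 11, 14, 13],
    "West": [8, 9, 9, 12],
}

fig, ax = plt.subplots()
for name, values in series.items():
    (line,) = ax.plot(years, values)
    # Attach the category name to the line's last point instead of a legend,
    # matching the label color to the line color
    ax.annotate(name, xy=(years[-1], values[-1]),
                xytext=(5, 0), textcoords="offset points",
                va="center", color=line.get_color())

plt.show()
```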
There are a lot of things you can take away to make your visualization less distracting and more oriented towards your goal. But to make the chipping-away stage simpler, there are five main things to consider, according to Cole Nussbaumer Knaflic in her well-known book, Storytelling with Data (applied in the code sketch after this list):
De-emphasize the chart title so it does not draw more attention than it deserves
Remove chart border and gridlines
Send the x- and y-axis lines and labels to the background (plus a tip from me: also consider taking them out completely)
Remove the variance in colors between data points
Label the data points directly
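As an illustration of those five steps, here is one way they might look in matplotlib. The data is invented, and the specific styling values (gray tones, font size) are my own reading of the advice rather than settings prescribed in the book:

```python
import matplotlib.pyplot as plt

# Invented example data
categories = ["A", "B", "C", "D"]
values = [42, 35, 28, 17]

fig, ax = plt.subplots()
# Step 4: no color variance, one muted color for all data points
bars = ax.bar(categories, values, color="#4a708b")

# Step 1: de-emphasize the chart title
ax.set_title("Sales by category", loc="left", fontsize=10, color="gray")

# Step 2: remove the chart border and gridlines
for spine in ("top", "right", "left"):
    ax.spines[spine].set_visible(False)
ax.grid(False)

# Step 3: send the axes to the background (or take them out completely)
ax.spines["bottom"].set_color("lightgray")
ax.tick_params(colors="gray")
ax.yaxis.set_visible(False)  # redundant once the bars carry their own labels

# Step 5: label the data points directly
ax.bar_label(bars, padding=3, color="gray")

plt.show()
```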
In the explanatory stage, approaching data visualization as a form of sculpting may remind us how vital it is to keep chipping away the unnecessary parts to uncover what’s beneath: what you intend to convey is not fully visible until you shape it up.
Overall, approaching data visualization as a form of sculpting may remind us of the true purpose of the practice and help crystallize a design in its best possible form.
Deepen your understanding of processing and designing data with our insightful articles on data visualization.
In 1983, American professor of statistics and computer science Edward Tufte introduced the concept of the data-ink ratio in his famous book, “The Visual Display of Quantitative Information.” The data-ink ratio is a pioneering theory in data visualization that has been highly influential in recommended practices and teaching, with its ideal of excellence reflecting a minimalist style of design. The art movement of Minimalism is noticeable in Tufte’s theory, even though he never mentioned it explicitly. However, academic research shows that the data-ink ratio generated mixed responses, pointing to the need for more complex frameworks in data visualization.
What is Data-Ink Ratio?
Tufte’s data-ink ratio is built on his proposition that “data graphics should draw the viewer’s attention to the sense and substance of the data, not to something else.” The conceptual formula is:
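Data-ink ratio = data-ink / total ink used to print the graphic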
“Data-ink” means what is absolutely necessary to show the data. The word “ink” is used because the theory was formulated when print graphics dominated; the digital equivalent of “ink” today is “pixels.” The data-ink ratio aims to find and maximize the share of informative elements out of the total elements used in a chart, making a ratio of 1 the highest goal. A visualization with such a ratio contains only elements that show the data, with no decorations or redundancies.
Tufte asserted this goal by articulating two erasing principles: “erase non-data-ink, within reason” and “erase redundant data-ink, within reason.” For these two types of removable ink, he coined the term “chartjunk.” The excessive elements to get rid of include decorations, background images, unnecessary colors, gridlines, axes, and tick marks. The principles, he said, should be applied through editing and redesign, which structures the data visualization process as a cycle that ends when the most minimal output is reached.
Minimalism became one of the most important and influential art movements of the 1960s, during Tufte’s youth. The Cambridge Dictionary defines minimalism as “a style in art, design, and theater that uses the smallest range of materials and colors possible, and only very simple shapes or forms.” It is a kind of reductive abstract art characterized by “plain geometric configurations” that lack “any decorative… flourishes.”
That can be juxtaposed with what Tufte instructed under the data-ink ratio theory: to achieve a data visualization that is simple and, therefore, minimal. Tufte’s theory suggests that such simplicity delivers a message that is clear, precise, and efficient and, as a result, reduces the time the user needs to perceive the visualization.
One of the many posts circulating the internet that exemplify applying Tufte’s reductive principles to a bar chart. | Image Source: Darkhorse Analytics
Tufte’s priorities match the needs of the fast-paced business world, where saving time means saving cost and therefore maximizing profit. In the nineties, less than a decade after Tufte’s theory, the accumulation of information started to skyrocket with the advancement of the World Wide Web. Businesspeople seemed to have even less time and more information to consider, and a minimalist approach looked like a solution.
Minimalism did not only impact data visualization; its effect reached almost every corner of the human-computer interaction (HCI) field. For example, as the internet became more widespread, search engines developed and became so powerful that they began to impose their own rules of the game. Because of how search engines work, minimalistically structured web pages were easier for the engines to read and rank, and therefore more visible and reachable for audiences.
For similar benefits, like reach for webpages and efficiency for data visualizations, minimalist design became pervasive across computer interfaces over the years. Today, minimalist design is usually described as “modern design” and recommended to every designer building a user-interaction system, from mobile apps to webpages to data visualizations.
On the left: Donald Judd’s minimalist installation, Untitled, 1969 (photo from: Guggenheim Museum, New York) | On the right: Tufte’s minimalist variation of a double-bar chart with error lines. | Collage made by Islam Salahuddin
Reviews of Data-Ink Ratio’s Minimalism
Despite presenting his hypotheses as proven facts, Tufte never empirically tested the achievements he promised minimalist design could deliver. It was not until the beginning of the nineties that academic research started to put the claims under the microscope.
The initial research findings struggled to support the hypotheses. However, a major multi-experiment research paper in 1994 found that some non-data-ink and redundant data-ink, especially backgrounds and tick marks on the y-axis, may decrease accuracy and increase response time (both undesirable effects). Meanwhile, other ink that Tufte considered chartjunk and called to be removed whenever possible, like axis lines, proved to increase performance in some cases. The negative effect was clear for some chartjunk types but less certain for others, like three-dimensional charts.
Such experiments were still built on Tufte’s own proposition that graphical excellence means clarity, precision, and efficiency, but they found that the relationship between data-ink ratio and excellence in that sense can hardly be linear, as Tufte suggested. The paper states that “effects of ink are highly conditional on the features of the graph and task” and therefore “simple rules like Tufte’s will not suffice.” Instead of indicating that all non-data-ink and redundant data-ink should be erased, the authors call on data visualization designers to determine whether any given ink will facilitate or interfere with reading a graph, depending on its context.
Later research even questioned Tufte’s components of graphical excellence, especially the presumed all-cases importance of the response-time factor. An empirical paper in 2007 found that users may prefer non-minimalist visualizations over Tufte’s minimalist ones, partly because they may find the latter boring. This is a criticism that both Minimalist art and statistics face, and a perception that Tufte tried to avert with his rule. Boredom should not be treated as a minor problem, because it means less ability to induce attention, and a visualization’s ability to generate attention is the gateway to the viewer’s perception in the first place.
Attention is one of the criteria that Tufte’s rule overlooks. Other significant factors are memorability and engagement. More advanced experiments in 2013 and 2015 reasserted that chartjunk is not always harmful; in some cases, it may even increase the memorability and engagement of a visualization. Attributes like color and human-recognizable shapes, icons, and images can enhance memorability due to their ability to activate more parts of a viewer’s brain, leveraging its natural tendency towards what is familiar rather than what is merely minimal.
Despite their popularity, chartjunk and similar terms also appear to be highly open to interpretation among practitioners. Interpretation can be affected by an individual’s circumstances, including culture, personal style, preferences, and views, as well as constraints of skills, tools, and user priorities, according to a discourse analysis published in 2022.
On the left: Frank Stella’s minimalist painting, title not known, 1967 (photo from Tate Modern) | On the right: Tufte’s minimalist variation of a multiple vertical box and whisker plot. | Collage made by Islam Salahuddin
How to Make Sense of the Previous Discussions
The growing body of research shows that data visualization is a task that can hardly be guided by a one-factor rule like the data-ink ratio. It shows that even the seemingly simple task of choosing which elements to include in or exclude from a visualization remains largely uncharted territory and needs further examination.
However, one underpinning that all these theoretical works share is a consideration of the context in which a visualization is designed. To be fair, even Tufte did not ignore this consideration, emphasizing that his principles have to be applied “within reason.” Underscoring that reasonability factor, he deliberately mentions in the Data-Ink Maximization chapter of his book that maximizing data-ink “is but a single dimension of a complex and multivariate design task.” He recognized that factors other than excellence can come into play, including “beauty,” even if he did not prioritize them.
Therefore, the critiques arising against Tufte’s rule of the data-ink ratio can be synthesized by quoting Tufte himself: determining the extent to which the data-ink ratio should be maximized “rests on statistical and aesthetic criteria.” This allows data visualization designers to find the sweet spot where a visualization delivers what it intends to and, at the same time, does not alienate its audience for the sake of being minimal.
All in all, minimalism can be considered one of the means to design a great data visualization, but not a goal in itself. After all, the goal will remain to deliver the intended message to the audience so they can perceive it best.
Edward Tufte’s principles of the data-ink ratio have prevailed in data visualization since they were introduced in the 1980s. His theory has imposed a tendency towards a minimalist style, defining excellence as clarity, precision, and efficiency, and as reducing the time users need to perceive information.
Meanwhile, academic research that has put the pioneering American statistician’s teachings to the test has not shown a linear relationship between data-ink ratio and a visualization’s excellence. Further research shed light on other important criteria that Tufte overlooked, like a visualization’s ability to induce attention, memorability, and engagement. Overall, the academic literature strongly suggests that no simple rule like the data-ink ratio can suffice in data design.
Debates among practitioners have been ongoing about the oft-repeated notion of “less is more,” which leans on Tufte’s teachings. Some believe that simplicity and quick perception should be the goals of all visualizations at all times. Others support embracing complexity and slow viewing time in some circumstances.
As a response to these debates, two interesting frameworks have emerged to suggest more criteria that should be considered. The first is “Levers of Chart-Making” by Andy Cotgreave, a senior data evangelist at Tableau, and the second is “Cognitive Load as a Guide” by Eva Sibinga and Erin Waldron, data science and visualization specialists.
Cotgreave suggested this still-developing framework in the November 2022 edition of his newsletter, The Sweet Spot. He put forward five lever-like scales that “chart producers can use to enlighten, not bamboozle.” They are as follows:
Speed to primary insight – How fast or slow insight is intended to be extracted from a graph. According to him, “it is ok to make charts that take time to understand.”
Granularity – How sparse or granular is the data that a chart intends to show?
Explore or explain – Whether a visualization is intended to let users explore the data themselves (like self-service dashboards) or to be accompanied by an explanatory presentation
Dry or emotional – How seriously the data is presented versus how informal and relatable it is to non-data people. According to Cotgreave, an example of the serious approach is a normal column chart; of the emotional, a necklace whose bead sizes represent the same underlying data.
Ambiguity vs. accuracy – For Cotgreave, there can be intentional ambiguity in chart-making instead of clear accuracy.
Cognitive load is a more detailed and rigorous framework that takes its inspiration from the psychology of instructional design. Suggested by Sibinga and Waldron, the framework was published in Nightingale, the Journal of the Data Visualization Society, in September 2021.
Cognitive load proposes 12 spectra, offering “an alternative to one-size-fits-all rules” and aiming to “encourage a more nuanced strategy” for data visualization. Divided into three categories, the spectra are supposed to “gauge the complexity of our data on one side, identify the needs of our audience on the other, and then calibrate our visualization to successfully bridge the gap between the two.”
Intrinsic load – This is the first group of spectra that is concerned with the data itself. It considers the inherent level of complexity in the data that a designer is tasked to explain with a visualization. The included spectra are:
Measurement (quantitative vs. qualitative) – According to the authors, quantitative data carries less cognitive load (is easier to perceive) than qualitative data. That is because the former usually has obvious measuring units, like dollars or miles, while the latter usually needs a conceptual rating scale, like a satisfaction rating from 1 to 5.
Knowability (certain vs. uncertain) – Data collected from the whole population is easier to perceive than data estimated from a sample or predicted for the future. The former carries a level of certainty that is easier to perceive than the uncertainty that comes with the latter and its inevitable statistical margins of error.
Specificity (precise vs. ambiguous) – Undebated data categories, like blood type or zip codes, tend to be easier to perceive than socially determined concepts, like gender, race, and social class.
Relatability (concrete vs. abstract) – How relatable is the data to what humans see in everyday life? Concrete data would be small numbers like the cost of lunch and one’s age, while abstract data would be conceptual ones like GDP and the age of the earth.
Germane load – The second group of spectra is concerned with the audience and how ready they are to process the new information shown by a visualization. The included spectra are:
Connection (intentional vs. coincidental) – How will the audience first encounter the visualization? Intentional viewers are likely better prepared to perceive it than viewers who stumble upon it by accident.
Pace (slow vs. fast) – Slow viewers have more time at hand and therefore more ability to perceive a visualization (translating into a lighter cognitive load).
Knowledge (expert vs. novice) – Expert viewers are already familiar with the subject and therefore bear a lighter cognitive load when viewing a visualization.
Confidence (confident vs. anxious) – This spectrum addresses the intersection of the audience and the data reporting format. An audience familiar with the reporting format, such as an interactive dashboard or a data-based report, bears a lighter cognitive load than one encountering such a channel for the first time.
Extraneous load – The final group addresses how the new information is presented. The authors believe these are the criteria over which a designer has the most control and which should therefore be considered last. To place a visualization on the following spectra, answer the question: “Given the existing intrinsic and germane loads, how much more cognitive load are we comfortable adding to the mix?”
Chart type (common vs. rare) – Common chart types, like bar charts, need a lighter cognitive load than uncommon ones, like violin charts or rose diagrams, and the more innovative ones.
Interpretation (accurate vs. approximate) – Does the chart aim to deliver precise values or paint a broad picture? According to the authors, charts delivering specific values tend to demand a lighter cognitive load than ones conveying an overall impression.
Composition (concise vs. detailed) – This spectrum assumes a high data-ink ratio and no chartjunk (Tufte’s concepts) are already in place, and then asks: how dense is the information on the page? Less dense visualizations require a lighter cognitive load.
Delivery (explanatory vs. exploratory) – Does the data report explain itself, or is it built to be explored? Exploration naturally takes more cognitive load than a self-explaining visualization.
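The framework is qualitative, but a toy sketch can show how the three groups of load interact. The scoring below is my own simplification, not the authors’: each spectrum gets a value from 0 (light load) to 1 (heavy load), and whatever the data and the audience do not consume is the budget left for presentation choices:

```python
# Toy illustration of the cognitive-load framework (my own simplification,
# not from Sibinga and Waldron): score each spectrum from 0 (light) to 1 (heavy).

intrinsic = {          # the data itself
    "measurement": 0.2,  # mostly quantitative
    "knowability": 0.6,  # sample-based estimates
    "specificity": 0.3,
    "relatability": 0.5,
}

germane = {            # the audience
    "connection": 0.1,   # intentional viewers
    "pace": 0.7,         # little time in hand
    "knowledge": 0.4,
    "confidence": 0.3,
}

def affordable_extraneous_load(intrinsic, germane, capacity=6.0):
    """Whatever the data and audience don't consume is left for presentation."""
    used = sum(intrinsic.values()) + sum(germane.values())
    return max(capacity - used, 0.0)

budget = affordable_extraneous_load(intrinsic, germane)
print(f"Remaining budget for extraneous load: {budget:.1f}")
# A small budget argues for common chart types, accurate interpretation,
# concise composition, and explanatory (self-explaining) delivery.
```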
How to make sense of all the previous discussions
Levers of Chart-Making and Cognitive Load as a Guide are two recently suggested frameworks that offer a more nuanced approach to the task of data visualization. The two have similarities, like their consideration of complexity, granularity, and mode of delivery. They differ from Tufte’s approach mainly in accepting that some designs, in some circumstances, should be perceived slowly. Cognitive Load, though, still deliberately presumes that data-ink ratio principles have been applied beforehand.
Therefore, no framework is likely to totally replace the others. At best, they tend to complement each other to cover the vast territory of the data visualization domain.
Data-ink ratio principles remain a good starting point, as they fit most business contexts. They can also help designers keep the point of their design in mind and avoid getting distracted amid all the software tools available today. However, considering the emerging frameworks can make the practice more nuanced in tackling different needs, messages, and audiences.
The final determinant of how to incorporate the three frameworks (and any other emerging ones) in practice will remain the context of the visualization. A good understanding of the audience, the message, and the medium is key before using the different frameworks to decide how information should be delivered.
Big data is a major asset for businesses that can access its insights. Making this happen, though, is a complicated job that needs the right tools. Enter data enrichment.
Understanding how data enrichment works and its impact on today’s industries is a great way to learn what it can do for your organization. How it benefits the use of big data will become clearer, too.
What Is Data Enrichment?
Data enrichment is the process of identifying and adding information from different datasets, open or closed, to your primary data. Sources can be anything from a third-party database to online magazines or a social network’s records.
People and organizations use data enrichment to gather legitimate intel on specific things, like a customer, product, or list of competitors. And they can start with just their names or email addresses.
As a result, the original data becomes richer in information and more useful. You can find education trends, profitable news, evidence of fraud, or just a deeper understanding of users. This helps improve your conversion rate, customer relations, cybersecurity, and more.
The most popular way of making all this a reality is specialized software. These tools’ algorithms vary in strengths and weaknesses, as SEON’s review of data enrichment tools shows. They can target human resources, underwriting, fraud, criminal investigations, and more, but the goal is the same: to support the way we work and give us better insights.
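Mechanically, the simplest form of enrichment is a join between your primary records and an external dataset on a shared key. Here is a minimal pandas sketch; the datasets and column names are invented, and real enrichment tools typically query third-party APIs rather than a local table:

```python
import pandas as pd

# Primary data: all we hold is an email address per customer (invented examples)
customers = pd.DataFrame({
    "email": ["ada@example.com", "tim@example.com"],
    "signup_date": ["2023-01-10", "2023-02-03"],
})

# External source keyed on the same email (e.g., a third-party database)
external = pd.DataFrame({
    "email": ["ada@example.com", "tim@example.com"],
    "social_profiles": [3, 0],
    "free_email_domain": [False, True],
    "seen_in_data_leak": [True, False],
})

# Enrichment: a left join keeps every original record and adds the new columns
enriched = customers.merge(external, on="email", how="left")
print(enriched)
```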
Data Enrichment and Big Data: What Statistics Say
Data enrichment is a good answer to the problem of big data, which often means masses of disorganized and sometimes inaccurate information that needs cleaning, maintenance, and coordination.
Chart: Creating a data-driven culture within organizations
Despite the benefits of smart data management and major investments already in place, only 24% of firms have become data-driven, down from 37.8%. Also, only 29.2% of transformed businesses are reaching set outcomes.
What this shows is that, yes, big data is difficult to deal with but not impossible. It takes good planning and dedication to get it right.
There are several promising big data statistics on FinancesOnline. For starters, thanks to big data, businesses have seen their profits increase by 8-10%, while some brands using IoT saved $1 trillion by 2020.
Also, the four biggest benefits of data analytics are:
Faster innovation
Greater efficiency
More effective research and development
Better products and services
These achievements are taken further with data enrichment, which adds value to a company’s datasets, not just more information to help with decision-making.
How Does Data Enrichment Help Different Industries?
The positive impact of constructively managing data is clear in existing fields that thrive because of data enrichment and other techniques. Here are some examples.
Fraud Prevention
Data enrichment helps businesses avoid falling victim to fraudsters. It does this by gathering plenty of information and presenting it to fraud analysts, helping them identify genuine people and transactions.
For example, you can build a clear picture of a potential customer or partner based on information linked to their email address and phone number. Do they have any social media profiles? Are they registered on a paid or a free email domain? Have they been involved in data leaks in previous years, and how recent are those leaks?
It’s then easier to make informed decisions because we know much more about how legitimate a user looks.
Banking services, from J.P. Morgan to PayPal, benefit from such intensive data analytics, as do brands in the fields of ecommerce, fintech, payments, online gaming, and more.
But so do online communities, where people create profiles and interact with others. For example, fake accounts are always a problem on LinkedIn, mainly countered through careful tracking of user activity. Data enrichment can help weed out suspicious users in such communities, keeping everyone else safe.
Marketing
Data enrichment in marketing tracks people’s activities and preferences through cookies, subscription forms, and other sources. V12’s report on data-driven marketing, for instance, cites Adobe’s survey findings on which data marketers value most:
48% prefer CRM data
40% prefer real-time data from analytics
38% prefer analytics data from integrated channels
Companies collect this data and enrich it to create a more personalized experience for customers in terms of interactions, discounts, ads, etc. Additionally, brands can produce services and products tailored to people’s tastes.
HR
The more information your human resources department has, the better it’s able to recruit and deal with staff members. Data enrichment is a great way to build strong teams and keep them happy.
Starting from the hiring stage, data enrichment can use applicants’ primary data, available on their CVs, and grab additional details from other sources. Apart from filling in any blanks, you can flag suspicious applicants for further investigation or outright rejection.
As for team management, data enrichment can give you an idea of people’s performance, strengths, weaknesses, hobbies, and more. You can then help them improve or organize an event everyone will enjoy.
Summing Up
As we saw in these examples, data enrichment already contributes to the corporate world in different ways, both subtle and grand.
With the right knowledge and tools, we can tap into this wealth of information even further, allowing it to make a real difference in how we work and what we know, rather than simply amassing amorphous and vast amounts of data.
Learn more about data enrichment by exploring our articles on data analytics.
**********
About the Author
Gergo Varga has been fighting online fraud since 2009 at various companies – even co-founding his own anti-fraud startup. He’s the author of the Fraud Prevention Guide for Dummies – SEON Special edition. He currently works as the Senior Content Manager / Evangelist at SEON, using his industry knowledge to keep marketing sharp and communicating between the different departments to understand what’s happening on the frontlines of fraud detection. He lives in Budapest, Hungary, and is an avid reader of philosophy and history.
Image source: DKosig from Getty Images Signature via Canva
In today’s data-driven world, organizations are constantly grappling with an abundance of data coming from various sources and in different formats. Data integration has emerged as a critical process that enables businesses to connect these disparate data sources, consolidating otherwise isolated data silos into a comprehensive, unified view of their information. This single source of truth empowers organizations to make more informed decisions and derive valuable insights for better business intelligence.
These disparate data sources can vary in type, structure, and format. Successful data integration finds a way to connect these sources, either by building relationships between them where they reside or by periodically extracting, transforming, and loading data (a process known as ETL) from these sources into one big database dubbed a data warehouse.
For example, when sales data is combined with customer data, the organization can gain a deeper understanding of customer behavior and preferences, which would allow personalized marketing efforts and improved customer satisfaction.
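As a minimal illustration of the ETL pattern behind that example, here is a Python sketch; the file names, columns, and the SQLite warehouse are invented stand-ins for real systems:

```python
import sqlite3
import pandas as pd

# Extract: read from two disparate sources (CSV files stand in for real systems)
sales = pd.read_csv("sales.csv")          # e.g., order_id, customer_id, amount
customers = pd.read_csv("customers.csv")  # e.g., customer_id, segment, region

# Transform: map both sources onto a common key and combine them
combined = sales.merge(customers, on="customer_id", how="left")
combined["amount"] = combined["amount"].fillna(0)

# Load: write the unified view into the warehouse (SQLite as a stand-in)
with sqlite3.connect("warehouse.db") as conn:
    combined.to_sql("sales_by_customer", conn, if_exists="replace", index=False)
```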
Data integration can be challenging, as there is no single technical way of implementing it. Rather, the process depends on the needs and resources of each organization. Organizations without in-house technical capabilities may need to seek a third-party service provider.
Despite the variance across organizations, one thing remains consistent: every data integration process should be approached systematically, taking into consideration the following key strategic steps:
Defining integration goals: Organizations need to clearly outline the objectives and outcomes they want to achieve through data integration.
Assessment of data sources: This includes identifying all the data sources within the organization and understanding the structure, format, and quality of the data coming from each source.
Data mapping and transformation: This entails defining how data from the different sources will be mapped to a common format. It may involve cleaning and preparing the siloed source data in the first place.
Defining techniques and tools: Based on the previous steps, a technical decision should be made on how to do the integration and to what degree manual work and automation will be used.
Building integration processes: This answers the question, “How will future data be integrated as well?” It involves defining workflows and processes that should be scalable, reliable, and capable of handling future data growth.
Testing and monitoring: As data integration is a continuous process, organizations should always test and monitor the integrated data thoroughly to ensure accuracy, consistency, and reliability. Integration results should be validated against predefined criteria, with adjustments made when discrepancies are found or when data sources and business needs change.
In conclusion, data integration plays a crucial role in enabling organizations to harness the full potential of their data. By connecting disparate data sources and creating a single source of truth, organizations can unlock valuable insights, improve decision-making, and enhance operational efficiency. Following a systematic approach and leveraging appropriate integration tools lets organizations achieve successful data integration and gain a competitive edge in today’s data-driven landscape.
Get more insights on data integration and management practices by exploring our articles on data analytics.