Scatter Plots
A scatter plot is a 2-dimensional visualization technique that uses dots to represent values for two different variables plotted along the x- and y-axes
At its core, a scatter plot consists of a horizontal axis (x-axis) and a vertical axis (y-axis), each representing a different variable. Each data point in the dataset is represented by a dot or marker on the graph, with its position determined by the values of the two variables being examined. This simple yet effective visualization technique allows for the quick identification of trends, clusters, and anomalies within the data.
Dimensions
A scatter plot displays the relationship between 2 numeric variables. A correlation coefficient calculation measure the strength of the relationship between the variables. For each data point, plot the value of its first variable on the X axis and the second variable on the Y axis. It is common to provide even more information using colors or shapes (to show groups, or a third variable).

Strengths
One of the primary strengths of scatter plots lies in their ability to reveal the type and strength of relationships between variables. A positive correlation is indicated when the data points form a pattern that rises from the lower left to the upper right of the graph, suggesting that as one variable increases, the other tends to increase as well. Conversely, a negative correlation is shown by a pattern that descends from the upper left to the lower right, indicating that as one variable increases, the other tends to decrease. The tightness of the pattern around a straight line or curve indicates the strength of the correlation, with a tighter grouping suggesting a stronger relationship.
Scatter plots are not limited to showing linear relationships. They can also reveal non-linear patterns, such as exponential or logarithmic relationships, which might not be immediately apparent in other forms of data representation. This makes scatter plots particularly valuable in fields where complex relationships between variables are common, such as in scientific research, economics, and social sciences.
In addition to revealing correlations, scatter plots are excellent for identifying outliers in a dataset. Outliers appear as points that are significantly distant from the main cluster of data points. These outliers can be important in many analyses, as they may represent errors in data collection, exceptional cases that warrant further investigation, or indicators of underlying phenomena that are not captured by the main trend in the data.
When presenting the results, you could encircle an interesting group of points or region in the plot. Outliers distort the relationship between the variables. Eliminate them, but only if their absence does not affect the analysis of relationship between the two variables. Encircling outliers also helps draw attention to those interesting exceptions / cases. Scatter plots help identify outliers i.e. values that are abnormally distant from most of the data. Eliminating outliers helps improve the visual and inference.

Interpreting scatter plots
Check for negative relationships between the two variables in the plot. If you see low values for the first variable and high values of the second variable, there is a negative correlation. In this case, a line drawn through the data points will slope downwards i.e. a negative slope. Also examine the plot for positive relationship between the variables. If low values for the first variable correspond to low values in the second, and the high values for the first correspond with high values for the second, then the variables have a positive correlation. In this case, a line drawn through the data points will slope upwards. Also inspect the plot for no relationships between the variables. No apparent relationship between the variables if the data points are randomly distributed. In this case, the data points have either no correlation, or small, statistically insignificant correlation.
Versatility
The versatility of scatter plots extends to their ability to incorporate additional dimensions of data beyond the two primary variables. This can be achieved through the use of different colors, shapes, or sizes for the data points, each representing an additional variable or category within the dataset. For example, in a scatter plot showing the relationship between a person’s height and weight, different colors could be used to represent gender, adding another layer of information to the visualization.
When to Use Scatter Plots
Scatter plots are used when you want to show the relationship between two variables. A scatter chart works best when comparing large numbers of data points without regard to time. Often, scatter plots will include a trend line to help make the relationship more clear. Additionally, the size, shape or color of the dot could represents a third (or even fourth variable).
Written by
Mithun Sridharan
Founder, LinkPress™
Mithun is a strategist, advisor, educator, and speaker focused on helping leaders make better decisions in environments shaped by change, complexity, and emerging technology. His work brings together leadership, management consulting, digital transformation, and artificial intelligence in a way that is practical, grounded, and commercially relevant.
Related Posts
Data Storytelling
Data storytelling requires a structured approach for organizing and communicating the insights from data using three core components - data analysis, visualization, and narrative
Mithun Sridharan History Of Data Visualization
We live in an exciting yet challenging time for data visualization. As we enter the information age, it's both exciting and terrifying to imagine what the future holds in store for us
Mithun Sridharan Data Literacy Empowers Academic Writing
Elevating student research performance through evidence
Mithun Sridharan