The Talent500 Blog

SQL Query Optimization: Techniques to Save Time

Abstract red light hexagon line in grey modern luxury futuristic background vector illustration.

Introduction

SQL is one of the most preferred languages used for handling databases. With its robust features, it helps organizations make data-driven decisions.

Although SQL provides quite powerful queries, it is important for developers to learn query optimization, as this can efficiently improve the speed and performance of database operations. In this blog, you will learn some query optimization techniques that can save you a lot of time during data analysis processes.

1. SQL Indexing

Indexing provides a quick way to retrieve a specific data point from a table using a search key. Identifying the right column for indexing can significantly reduce the query loading time. 

SELECT *
FROM orders
WHERE customer_id = 62;

Data is updated over time, so regularly organizing indexes helps to maintain query response time.

2. Optimizing Database Design

Databases contain a huge amount of tabular data with high workloads, so it is important to keep database design optimized, as it improves database performance and makes it more scalable. Following are some ways to achieve that:

  1. Normalized tables can help to reduce unwanted memory usage.
  2. Properly indexed tables can speed up the query loading time in SQL shell.
  3. Database configuration by memory allocation, cache limits, and other database parameters makes an efficient and streamlined data management system. This works well for handling heavy query workloads as well.

3. Optimizing SQL Queries

One of the best ways to improve the SQL data analysis process is optimizing the queries themselves. Complex queries with no proper structuring and execution order can lower the query performance significantly. Just by simply being aware of what to avoid and what to use, you can easily optimize your SQL queries:

Things to Avoid:

Wildcard characters such as ^, %,-, and  _  are used with the LIKE operator to find specific data within a column. As they search the entire database, the process can be quite time-consuming, particularly in larger databases. Wildcards should only be used when necessary:

SELECT * FROM players WHERE info LIKE ‘%alex%’;

— much faster
SELECT * FROM players WHERE info LIKE ‘alex’;

Subqueries are basically queries nested within another query. They are very powerful, however their use should be minimized wherever possible as JOINs tend to execute much faster than subqueries. Subqueries also increase memory load during query processing as the nested queries are executed after every instance.

Another great alternative is CTEs or Common Table Expressions, which are more readable in comparison to nested subqueries. 

— using subqueries
SELECT *
FROM players
WHERE player_id IN (SELECT player_id FROM matches WHERE age > 30);
— using joins
SELECT *
FROM players p
JOIN matches m
ON p.player_id = m.player_id
WHERE m.age > 30;

The second query will execute faster.

SELECT * returns every column of a table when a query is executed, causing unnecessary data transfers which increases the processing time as well. Instead of using SELECT *, it is more efficient to fetch only the required columns for faster analysis.

— avoid this
SELECT *
FROM players;

— this works faster
SELECT name, age, country
FROM players;

SELECT DISTINCT looks only for unique values in a column, and ignores the duplicate data. This can take a lot of memory. Unless it is required, consider minimizing the use of DISTINCT in queries:

— avoid this
SELECT DISTINCT player_id
FROM players;

— use this
SELECT player_id
FROM players;

As discussed earlier, unwanted nesting queries can make SQL queries complex and harder to read:

SELECT *
FROM (SELECT *
      FROM players) AS new_players
WHERE player_id IN (SELECT player_id
                    FROM matches
                    WHERE score > 100); 

Instead, this can be simplified as:

SELECT *
FROM players
WHERE player_id IN (SELECT player_id
                    FROM matches
                    WHERE score > 100); 

Tips to consider:

When a query is executed, it returns every row that meets the given condition. However, this is not always necessary, and you can utilize the LIMIT and TOP clauses in a query to fetch a limited set of data, making it memory efficient and faster: 

— Using MySQL
SELECT * FROM players ORDER BY age DESC LIMIT 3;
— Using SQL Server
SELECT TOP 3 * FROM players ORDER BY age DESC;

This query will extract the top three rows from the players table, arranged in descending order of age, referring to the three eldest members of the team.

Both CHAR and VARCHAR are designed to store character strings. CHAR has a fixed storage criteria, while VARCHAR uses less memory by excluding blank spaces.

VARCHAR(100): Consumes 1 byte of memory
CHAR(100): Consumes 100 bytes of memory 

— avoid this
CREATE TABLE students (name CHAR(100));
— use this
CREATE TABLE students (name VARCHAR(100));

EXISTS works faster than IN when used with subqueries:

— avoid this
SELECT *
FROM players
WHERE player_id IN (SELECT player_id
                    FROM matches
                    WHERE age > 27);
— use this
SELECT *
FROM players p
WHERE EXISTS (SELECT 1
              FROM matches m
              WHERE m.player_id = p.player_id
              AND m.age > 27); 

UNION takes only unique data from both tables, eliminating duplicates. In contrast, UNION ALL works faster as it fetches all the data available:

— using UNION
SELECT player_id, name
FROM players WHERE type = ‘regular’
UNION
SELECT player_id, name
FROM players WHERE type = ‘periodic’;

— using UNION ALL: works faster
SELECT player_id, name
FROM players WHERE type = ‘regular’
UNION ALL
SELECT player_id, name
FROM players WHERE type = ‘periodic’;

WHERE and HAVING clauses are used to filter rows with GROUP BY statements. WHERE filters rows before grouping happens, whereas HAVING filters after grouping, potentially increasing the amount of data to be filtered. Following is the use case of both:

— WHERE clause
SELECT team, COUNT(*) as total_qualified
FROM players
WHERE age > 25
GROUP BY team;

— HAVING clause
SELECT team, AVG(age) as average_age
FROM players
GROUP BY team
HAVING AVG(age) > 30;

SQL Joins are more powerful than subqueries. However, there are some things to keep in mind to make the most of them without any query overload:

Window functions such as ROW_NUMBER() and RANK() can simplify complex queries, and provide a faster way to handle ranking operations.

— using ROW_NUMBER()
SELECT name, score,
      ROW_NUMBER() OVER (ORDER BY score DESC) AS row
FROM players;

This query will create a unique sequential number for each player based on their scores in descending order.

— using RANK()
SELECT name, score,
      RANK() OVER (ORDER BY score DESC) AS rank
FROM players;

This query will assign a rank to players based on their scores, in descending order. RANK() assigns the same rank to players with the equal scores.

Stored procedures is a set of SQL queries under a unique name that can be reused when needed. When the name is called, the whole query gets executed reducing repetitions in code.

CREATE PROCEDURE GetPlayers (@asia VARCHAR(50))
AS
BEGIN
    SELECT * FROM players WHERE team = @asia;
END

Whenever “GetPlayers” is called, it retrieves all players belonging to the specified team (“asia”) from the players table .

Temporary tables function just like the regular tables of a database, except that they store data temporarily. These tables are deleted automatically once a session is closed. For huge databases, these can come in handy to save memory usage within a session.

The GROUP BY clause returns a summary of columns where aggregate functions like SUM, AVG, and COUNT are applied to each group.

SELECT team, AVG(score) AS avg_score
FROM players
GROUP BY team;

Lastly, using relevant data types for each column ensures data consistency and eases the analysis process. Using VARCHAR for text values and INT for numeric values makes aggregate operations much faster, reduces any chance of errors, and takes less space.

  CREATE TABLE players (
      name VARCHAR(100),
      player_id INT,
      age INT,
);

4. Analyzing Query Performance

You have learnt how to optimize your databases and sql queries, but it’s also essential to analyze queries regularly as it defines a database’s efficiency. Using database monitoring and profiling tools can significantly improve query speed by identifying and fixing gaps and using the available resources effectively.

5. Handling Huge Databases

Databases dealing with tons of data often face challenges with over execution and bloating of SQL queries. This can be improved using some effective strategies such as: 

Partitioning involves splitting a large table into smaller fragments, under some conditions. Smaller tables are feasible and comparatively easier to manage as they confine searches within their specific data when a query is executed.  

CREATE TABLE members (
    member_id INT,
    member_name VARCHAR(50),
    registered_on DATE,
)
PARTITION BY RANGE (YEAR(registered_on)) (
    PARTITION reg_2021 VALUES LESS THAN (2022),
    PARTITION reg_2022 VALUES LESS THAN (2023),
);

This will make partitions of the “members” table for each registration year. Analysts can easily carry time-based queries when retrieving data, making their work easier.

Pagination includes the use of LIMIT and OFFSET clauses to fetch a limited or specific part of data. This method reduces the amount of data needed to be analyzed, subsequently reducing the processing time.

SELECT * FROM members LIMIT 10 OFFSET 10;

This query will retrieve 10 records from the “members” table (11th to 20th position).

Conclusion

Query optimization is an important step to enhance overall speed and performance of your databases, improving their execution time. Following are some simple tips to consider while working with databases:

0