The Talent500 Blog
SQL

SQL Query Optimization: Techniques to Save Time

Introduction

SQL is one of the most preferred languages used for handling databases. With its robust features, it helps organizations make data-driven decisions.

Although SQL provides quite powerful queries, it is important for developers to learn query optimization, as this can efficiently improve the speed and performance of database operations. In this blog, you will learn some query optimization techniques that can save you a lot of time during data analysis processes.

1. SQL Indexing

Indexing provides a quick way to retrieve a specific data point from a table using a search key. Identifying the right column for indexing can significantly reduce the query loading time. 

  • Using WHERE and JOINs is often a good practice as it minimizes the amount of data that needs to be fetched. 
SELECT *
FROM orders
WHERE customer_id = 62;
  • In case of complex SQL queries, multiple columns can be combined into a single (composite) index. Here, order of columns has a great impact on query efficiency.
  • Data comes unprepared from the original source with missing values. Finding and fixing missing indexes also helps in improving performance.

Data is updated over time, so regularly organizing indexes helps to maintain query response time.

2. Optimizing Database Design

Databases contain a huge amount of tabular data with high workloads, so it is important to keep database design optimized, as it improves database performance and makes it more scalable. Following are some ways to achieve that:

  1. Normalized tables can help to reduce unwanted memory usage.
  2. Properly indexed tables can speed up the query loading time in SQL shell.
  3. Database configuration by memory allocation, cache limits, and other database parameters makes an efficient and streamlined data management system. This works well for handling heavy query workloads as well.

3. Optimizing SQL Queries

One of the best ways to improve the SQL data analysis process is optimizing the queries themselves. Complex queries with no proper structuring and execution order can lower the query performance significantly. Just by simply being aware of what to avoid and what to use, you can easily optimize your SQL queries:

Things to Avoid:

  • Avoid using wildcard characters

Wildcard characters such as ^, %,-, and  _  are used with the LIKE operator to find specific data within a column. As they search the entire database, the process can be quite time-consuming, particularly in larger databases. Wildcards should only be used when necessary:

SELECT * FROM players WHERE info LIKE ‘%alex%’;

— much faster
SELECT * FROM players WHERE info LIKE ‘alex’;
  • Avoid using subqueries

Subqueries are basically queries nested within another query. They are very powerful, however their use should be minimized wherever possible as JOINs tend to execute much faster than subqueries. Subqueries also increase memory load during query processing as the nested queries are executed after every instance.

Another great alternative is CTEs or Common Table Expressions, which are more readable in comparison to nested subqueries. 

— using subqueries
SELECT *
FROM players
WHERE player_id IN (SELECT player_id FROM matches WHERE age > 30);
— using joins
SELECT *
FROM players p
JOIN matches m
ON p.player_id = m.player_id
WHERE m.age > 30;

The second query will execute faster.

  • Avoid using SELECT *

SELECT * returns every column of a table when a query is executed, causing unnecessary data transfers which increases the processing time as well. Instead of using SELECT *, it is more efficient to fetch only the required columns for faster analysis.

— avoid this
SELECT *
FROM players;

— this works faster
SELECT name, age, country
FROM players;
  • Avoid using SELECT DISTINCT

SELECT DISTINCT looks only for unique values in a column, and ignores the duplicate data. This can take a lot of memory. Unless it is required, consider minimizing the use of DISTINCT in queries:

— avoid this
SELECT DISTINCT player_id
FROM players;

— use this
SELECT player_id
FROM players;
  • Avoid using nested queries

As discussed earlier, unwanted nesting queries can make SQL queries complex and harder to read:

SELECT *
FROM (SELECT *
      FROM players) AS new_players
WHERE player_id IN (SELECT player_id
                    FROM matches
                    WHERE score > 100); 

Instead, this can be simplified as:

SELECT *
FROM players
WHERE player_id IN (SELECT player_id
                    FROM matches
                    WHERE score > 100); 

Tips to consider:

  • Use LIMIT and TOP clauses

When a query is executed, it returns every row that meets the given condition. However, this is not always necessary, and you can utilize the LIMIT and TOP clauses in a query to fetch a limited set of data, making it memory efficient and faster: 

— Using MySQL
SELECT * FROM players ORDER BY age DESC LIMIT 3;
— Using SQL Server
SELECT TOP 3 * FROM players ORDER BY age DESC;

This query will extract the top three rows from the players table, arranged in descending order of age, referring to the three eldest members of the team.

  • Use VARCHAR in place of CHAR

Both CHAR and VARCHAR are designed to store character strings. CHAR has a fixed storage criteria, while VARCHAR uses less memory by excluding blank spaces.

VARCHAR(100): Consumes 1 byte of memory
CHAR(100): Consumes 100 bytes of memory 

— avoid this
CREATE TABLE students (name CHAR(100));
— use this
CREATE TABLE students (name VARCHAR(100));
  • Use EXISTS in place of IN

EXISTS works faster than IN when used with subqueries:

— avoid this
SELECT *
FROM players
WHERE player_id IN (SELECT player_id
                    FROM matches
                    WHERE age > 27);
— use this
SELECT *
FROM players p
WHERE EXISTS (SELECT 1
              FROM matches m
              WHERE m.player_id = p.player_id
              AND m.age > 27); 
  • Use UNION ALL in place of UNION

UNION takes only unique data from both tables, eliminating duplicates. In contrast, UNION ALL works faster as it fetches all the data available:

— using UNION
SELECT player_id, name
FROM players WHERE type = ‘regular’
UNION
SELECT player_id, name
FROM players WHERE type = ‘periodic’;

— using UNION ALL: works faster
SELECT player_id, name
FROM players WHERE type = ‘regular’
UNION ALL
SELECT player_id, name
FROM players WHERE type = ‘periodic’;
  • Use WHERE instead of HAVING

WHERE and HAVING clauses are used to filter rows with GROUP BY statements. WHERE filters rows before grouping happens, whereas HAVING filters after grouping, potentially increasing the amount of data to be filtered. Following is the use case of both:

— WHERE clause
SELECT team, COUNT(*) as total_qualified
FROM players
WHERE age > 25
GROUP BY team;

— HAVING clause
SELECT team, AVG(age) as average_age
FROM players
GROUP BY team
HAVING AVG(age) > 30;
  • Use JOINs efficiently

SQL Joins are more powerful than subqueries. However, there are some things to keep in mind to make the most of them without any query overload:

  • Reduce table sizes wherever possible.
  • Avoid use of too many JOINS in your queries.
  • Avoid using cross joins.
  • Prefer INNER JOINS over OUTER JOINS for faster processing.
  • Use Window Functions

Window functions such as ROW_NUMBER() and RANK() can simplify complex queries, and provide a faster way to handle ranking operations.

— using ROW_NUMBER()
SELECT name, score,
      ROW_NUMBER() OVER (ORDER BY score DESC) AS row
FROM players;

This query will create a unique sequential number for each player based on their scores in descending order.

— using RANK()
SELECT name, score,
      RANK() OVER (ORDER BY score DESC) AS rank
FROM players;

This query will assign a rank to players based on their scores, in descending order. RANK() assigns the same rank to players with the equal scores.

  • Use Stored Procedures

Stored procedures is a set of SQL queries under a unique name that can be reused when needed. When the name is called, the whole query gets executed reducing repetitions in code.

CREATE PROCEDURE GetPlayers (@asia VARCHAR(50))
AS
BEGIN
    SELECT * FROM players WHERE team = @asia;
END

Whenever “GetPlayers” is called, it retrieves all players belonging to the specified team (“asia”) from the players table .

  • Use temporary tables for huge data

Temporary tables function just like the regular tables of a database, except that they store data temporarily. These tables are deleted automatically once a session is closed. For huge databases, these can come in handy to save memory usage within a session.

  • Use GROUP BY efficiently

The GROUP BY clause returns a summary of columns where aggregate functions like SUM, AVG, and COUNT are applied to each group.

SELECT team, AVG(score) AS avg_score
FROM players
GROUP BY team;
  • Choose appropriate data types

Lastly, using relevant data types for each column ensures data consistency and eases the analysis process. Using VARCHAR for text values and INT for numeric values makes aggregate operations much faster, reduces any chance of errors, and takes less space.

  CREATE TABLE players (
      name VARCHAR(100),
      player_id INT,
      age INT,
);

4. Analyzing Query Performance

You have learnt how to optimize your databases and sql queries, but it’s also essential to analyze queries regularly as it defines a database’s efficiency. Using database monitoring and profiling tools can significantly improve query speed by identifying and fixing gaps and using the available resources effectively.

  • Rewrite and monitor queries by their order of execution to improve their speed.
  • Database monitoring tools such as SQL Server Profiler, Oracle Enterprise Manager, and MySQL Enterprise Monitor helps to identify slow queries.
  • Query optimizer tools like Oracle SQL developer and MS SQL Server Query optimizer are useful to identify performance issues in query execution. 

5. Handling Huge Databases

Databases dealing with tons of data often face challenges with over execution and bloating of SQL queries. This can be improved using some effective strategies such as: 

  • Table Partitioning

Partitioning involves splitting a large table into smaller fragments, under some conditions. Smaller tables are feasible and comparatively easier to manage as they confine searches within their specific data when a query is executed.  

CREATE TABLE members (
    member_id INT,
    member_name VARCHAR(50),
    registered_on DATE,
)
PARTITION BY RANGE (YEAR(registered_on)) (
    PARTITION reg_2021 VALUES LESS THAN (2022),
    PARTITION reg_2022 VALUES LESS THAN (2023),
);

This will make partitions of the “members” table for each registration year. Analysts can easily carry time-based queries when retrieving data, making their work easier.

  • Database Pagination

Pagination includes the use of LIMIT and OFFSET clauses to fetch a limited or specific part of data. This method reduces the amount of data needed to be analyzed, subsequently reducing the processing time.

SELECT * FROM members LIMIT 10 OFFSET 10;

This query will retrieve 10 records from the “members” table (11th to 20th position).

Conclusion

Query optimization is an important step to enhance overall speed and performance of your databases, improving their execution time. Following are some simple tips to consider while working with databases:

  • Periodically check and optimize SQL queries and database configurations to ensure sufficient space for CPU processes to run effortlessly.
  • Schedule routine data backups to prevent loss of data in the event of query failures.
  • Keep query queues clear to avoid issues like index bloating.
  • Regularly test, rerun, and compare different queries to find the most efficient one for fetching the desired results.
0
Shreya Purohit

Shreya Purohit

As a data wizard and technical writer, I demystify complex concepts of data science and data analytics into bite-sized nuggets that are easy for anyone to understand.

Add comment