Introduction
SQL is one of the most preferred languages used for handling databases. With its robust features, it helps organizations make data-driven decisions.
Although SQL provides quite powerful queries, it is important for developers to learn query optimization, as this can efficiently improve the speed and performance of database operations. In this blog, you will learn some query optimization techniques that can save you a lot of time during data analysis processes.
1. SQL Indexing
Indexing provides a quick way to retrieve a specific data point from a table using a search key. Identifying the right column for indexing can significantly reduce the query loading time.
- Using WHERE and JOINs is often a good practice as it minimizes the amount of data that needs to be fetched.
SELECT * FROM orders WHERE customer_id = 62; |
- In case of complex SQL queries, multiple columns can be combined into a single (composite) index. Here, order of columns has a great impact on query efficiency.
- Data comes unprepared from the original source with missing values. Finding and fixing missing indexes also helps in improving performance.
Data is updated over time, so regularly organizing indexes helps to maintain query response time.
2. Optimizing Database Design
Databases contain a huge amount of tabular data with high workloads, so it is important to keep database design optimized, as it improves database performance and makes it more scalable. Following are some ways to achieve that:
- Normalized tables can help to reduce unwanted memory usage.
- Properly indexed tables can speed up the query loading time in SQL shell.
- Database configuration by memory allocation, cache limits, and other database parameters makes an efficient and streamlined data management system. This works well for handling heavy query workloads as well.
3. Optimizing SQL Queries
One of the best ways to improve the SQL data analysis process is optimizing the queries themselves. Complex queries with no proper structuring and execution order can lower the query performance significantly. Just by simply being aware of what to avoid and what to use, you can easily optimize your SQL queries:
Things to Avoid:
-
Avoid using wildcard characters
Wildcard characters such as ^, %,-, and _ are used with the LIKE operator to find specific data within a column. As they search the entire database, the process can be quite time-consuming, particularly in larger databases. Wildcards should only be used when necessary:
SELECT * FROM players WHERE info LIKE ‘%alex%’; — much faster SELECT * FROM players WHERE info LIKE ‘alex’; |
-
Avoid using subqueries
Subqueries are basically queries nested within another query. They are very powerful, however their use should be minimized wherever possible as JOINs tend to execute much faster than subqueries. Subqueries also increase memory load during query processing as the nested queries are executed after every instance.
Another great alternative is CTEs or Common Table Expressions, which are more readable in comparison to nested subqueries.
— using subqueries SELECT * FROM players WHERE player_id IN (SELECT player_id FROM matches WHERE age > 30); — using joins SELECT * FROM players p JOIN matches m ON p.player_id = m.player_id WHERE m.age > 30; |
The second query will execute faster.
-
Avoid using SELECT *
SELECT * returns every column of a table when a query is executed, causing unnecessary data transfers which increases the processing time as well. Instead of using SELECT *, it is more efficient to fetch only the required columns for faster analysis.
— avoid this SELECT * FROM players; — this works faster SELECT name, age, country FROM players; |
-
Avoid using SELECT DISTINCT
SELECT DISTINCT looks only for unique values in a column, and ignores the duplicate data. This can take a lot of memory. Unless it is required, consider minimizing the use of DISTINCT in queries:
— avoid this SELECT DISTINCT player_id FROM players; — use this SELECT player_id FROM players; |
-
Avoid using nested queries
As discussed earlier, unwanted nesting queries can make SQL queries complex and harder to read:
SELECT * FROM (SELECT * FROM players) AS new_players WHERE player_id IN (SELECT player_id FROM matches WHERE score > 100); |
Instead, this can be simplified as:
SELECT * FROM players WHERE player_id IN (SELECT player_id FROM matches WHERE score > 100); |
Tips to consider:
-
Use LIMIT and TOP clauses
When a query is executed, it returns every row that meets the given condition. However, this is not always necessary, and you can utilize the LIMIT and TOP clauses in a query to fetch a limited set of data, making it memory efficient and faster:
— Using MySQL SELECT * FROM players ORDER BY age DESC LIMIT 3; — Using SQL Server SELECT TOP 3 * FROM players ORDER BY age DESC; |
This query will extract the top three rows from the players table, arranged in descending order of age, referring to the three eldest members of the team.
-
Use VARCHAR in place of CHAR
Both CHAR and VARCHAR are designed to store character strings. CHAR has a fixed storage criteria, while VARCHAR uses less memory by excluding blank spaces.
VARCHAR(100): Consumes 1 byte of memory
CHAR(100): Consumes 100 bytes of memory
— avoid this CREATE TABLE students (name CHAR(100)); — use this CREATE TABLE students (name VARCHAR(100)); |
-
Use EXISTS in place of IN
EXISTS works faster than IN when used with subqueries:
— avoid this SELECT * FROM players WHERE player_id IN (SELECT player_id FROM matches WHERE age > 27); — use this SELECT * FROM players p WHERE EXISTS (SELECT 1 FROM matches m WHERE m.player_id = p.player_id AND m.age > 27); |
-
Use UNION ALL in place of UNION
UNION takes only unique data from both tables, eliminating duplicates. In contrast, UNION ALL works faster as it fetches all the data available:
— using UNION SELECT player_id, name FROM players WHERE type = ‘regular’ UNION SELECT player_id, name FROM players WHERE type = ‘periodic’; — using UNION ALL: works faster SELECT player_id, name FROM players WHERE type = ‘regular’ UNION ALL SELECT player_id, name FROM players WHERE type = ‘periodic’; |
- Use WHERE instead of HAVING
WHERE and HAVING clauses are used to filter rows with GROUP BY statements. WHERE filters rows before grouping happens, whereas HAVING filters after grouping, potentially increasing the amount of data to be filtered. Following is the use case of both:
— WHERE clause SELECT team, COUNT(*) as total_qualified FROM players WHERE age > 25 GROUP BY team; — HAVING clause SELECT team, AVG(age) as average_age FROM players GROUP BY team HAVING AVG(age) > 30; |
-
Use JOINs efficiently
SQL Joins are more powerful than subqueries. However, there are some things to keep in mind to make the most of them without any query overload:
- Reduce table sizes wherever possible.
- Avoid use of too many JOINS in your queries.
- Avoid using cross joins.
- Prefer INNER JOINS over OUTER JOINS for faster processing.
-
Use Window Functions
Window functions such as ROW_NUMBER() and RANK() can simplify complex queries, and provide a faster way to handle ranking operations.
— using ROW_NUMBER() SELECT name, score, ROW_NUMBER() OVER (ORDER BY score DESC) AS row FROM players; |
This query will create a unique sequential number for each player based on their scores in descending order.
— using RANK() SELECT name, score, RANK() OVER (ORDER BY score DESC) AS rank FROM players; |
This query will assign a rank to players based on their scores, in descending order. RANK() assigns the same rank to players with the equal scores.
-
Use Stored Procedures
Stored procedures is a set of SQL queries under a unique name that can be reused when needed. When the name is called, the whole query gets executed reducing repetitions in code.
CREATE PROCEDURE GetPlayers (@asia VARCHAR(50)) AS BEGIN SELECT * FROM players WHERE team = @asia; END |
Whenever “GetPlayers” is called, it retrieves all players belonging to the specified team (“asia”) from the players table .
-
Use temporary tables for huge data
Temporary tables function just like the regular tables of a database, except that they store data temporarily. These tables are deleted automatically once a session is closed. For huge databases, these can come in handy to save memory usage within a session.
-
Use GROUP BY efficiently
The GROUP BY clause returns a summary of columns where aggregate functions like SUM, AVG, and COUNT are applied to each group.
SELECT team, AVG(score) AS avg_score FROM players GROUP BY team; |
-
Choose appropriate data types
Lastly, using relevant data types for each column ensures data consistency and eases the analysis process. Using VARCHAR for text values and INT for numeric values makes aggregate operations much faster, reduces any chance of errors, and takes less space.
CREATE TABLE players ( name VARCHAR(100), player_id INT, age INT, ); |
4. Analyzing Query Performance
You have learnt how to optimize your databases and sql queries, but it’s also essential to analyze queries regularly as it defines a database’s efficiency. Using database monitoring and profiling tools can significantly improve query speed by identifying and fixing gaps and using the available resources effectively.
- Rewrite and monitor queries by their order of execution to improve their speed.
- Database monitoring tools such as SQL Server Profiler, Oracle Enterprise Manager, and MySQL Enterprise Monitor helps to identify slow queries.
- Query optimizer tools like Oracle SQL developer and MS SQL Server Query optimizer are useful to identify performance issues in query execution.
5. Handling Huge Databases
Databases dealing with tons of data often face challenges with over execution and bloating of SQL queries. This can be improved using some effective strategies such as:
-
Table Partitioning
Partitioning involves splitting a large table into smaller fragments, under some conditions. Smaller tables are feasible and comparatively easier to manage as they confine searches within their specific data when a query is executed.
CREATE TABLE members ( member_id INT, member_name VARCHAR(50), registered_on DATE, ) PARTITION BY RANGE (YEAR(registered_on)) ( PARTITION reg_2021 VALUES LESS THAN (2022), PARTITION reg_2022 VALUES LESS THAN (2023), ); |
This will make partitions of the “members” table for each registration year. Analysts can easily carry time-based queries when retrieving data, making their work easier.
-
Database Pagination
Pagination includes the use of LIMIT and OFFSET clauses to fetch a limited or specific part of data. This method reduces the amount of data needed to be analyzed, subsequently reducing the processing time.
SELECT * FROM members LIMIT 10 OFFSET 10; |
This query will retrieve 10 records from the “members” table (11th to 20th position).
Conclusion
Query optimization is an important step to enhance overall speed and performance of your databases, improving their execution time. Following are some simple tips to consider while working with databases:
- Periodically check and optimize SQL queries and database configurations to ensure sufficient space for CPU processes to run effortlessly.
- Schedule routine data backups to prevent loss of data in the event of query failures.
- Keep query queues clear to avoid issues like index bloating.
- Regularly test, rerun, and compare different queries to find the most efficient one for fetching the desired results.
Add comment