![]() ![]() The most important maintenance tasks you need to perform in Redshift are: Use sort keys in GROUP BY and SORT BY clauses so that the query planner can use more efficient aggregation.Whenever you can, use the WHERE statement to restrict the amount of data to be read.Do not use cross-joins unless absolutely necessary.Avoid using SELECT *. and include only the columns you need.Therefore, if a query does not match, you need to review the design of the participating tables. The tables need to be designed considering the queries that will be made.Use queries optimised for MPP environmentsĪs Redshift is a distributed MPP environment, query performance needs to be maximised by following some basic recommendations. It accomplishes this by scanning a limited number of items. It uses automatic compression, allowing Redshift to select the optimal type for each column.The data will be distributed randomly in this way. When you are not clear on how you are going to query a very large table or it simply has no relation to the rest, choose the EVEN type.In that way the JOIN statement with much bigger fact tables will execute much faster. It is advisable to copy those tables commonly used in joins of dictionary type to all the nodes. For dimension tables with a few million entries, choose the ALL type.This will enable you to perform JOIN type queries on that column very efficiently. This will distribute the data to the various nodes grouped by the chosen key values. For fact tables choose the DISTKEY type.Select the optimum data distribution type:.When designing the database, bear in mind that some key table design decisions have a considerable influence on overall query performance. ![]()
0 Comments
Leave a Reply. |