Yolinux.com

cluster manpage

Search topic Section


CLUSTER()			 SQL Commands			     CLUSTER()



NAME
       CLUSTER - cluster a table according to an index


SYNOPSIS
       CLUSTER tablename [ USING indexname ]
       CLUSTER


DESCRIPTION
       CLUSTER	instructs  PostgreSQL to cluster the table specified by table-
       name based on the index specified by indexname. The index must  already
       have been defined on tablename.

       When  a	table  is  clustered,  it is physically reordered based on the
       index information. Clustering is a one-time operation: when  the  table
       is  subsequently  updated,  the	changes are not clustered. That is, no
       attempt is made to store new or updated rows according to  their  index
       order.  (If  one  wishes, one can periodically recluster by issuing the
       command again. Also, setting the table's FILLFACTOR  storage  parameter
       to  less  than  100%  can  aid  in  preserving  cluster ordering during
       updates, since updated rows are preferentially kept on the same	page.)

       When  a	table  is  clustered,  PostgreSQL remembers which index it was
       clustered by. The form CLUSTER tablename reclusters the table using the
       same index as before.

       CLUSTER	without  any parameter reclusters all the previously-clustered
       tables in the current database that the calling user owns, or all  such
       tables  if  called  by a superuser. This form of CLUSTER cannot be exe-
       cuted inside a transaction block.

       When a table is being clustered, an ACCESS EXCLUSIVE lock  is  acquired
       on  it.	This  prevents	any  other database operations (both reads and
       writes) from operating on the table until the CLUSTER is finished.

PARAMETERS
       tablename
	      The name (possibly schema-qualified) of a table.

       indexname
	      The name of an index.

NOTES
       In cases where you are accessing single rows randomly within  a	table,
       the  actual  order of the data in the table is unimportant. However, if
       you tend to access some data more than others, and there  is  an  index
       that groups them together, you will benefit from using CLUSTER.	If you
       are requesting a range of indexed values from  a  table,  or  a	single
       indexed	value  that  has  multiple  rows that match, CLUSTER will help
       because once the index identifies the table page for the first row that
       matches, all other rows that match are probably already on the same ta-
       ble page, and so you save disk accesses and speed up the query.

       During the cluster operation, a temporary copy of the table is  created
       that  contains  the  table data in the index order. Temporary copies of
       each index on the table are created as well. Therefore, you  need  free
       space on disk at least equal to the sum of the table size and the index
       sizes.

       Because CLUSTER remembers the clustering information, one  can  cluster
       the  tables  one  wants	clustered manually the first time, and setup a
       timed event similar to VACUUM  so  that	the  tables  are  periodically
       reclustered.

       Because the planner records statistics about the ordering of tables, it
       is advisable to run ANALYZE [analyze(7)] on the newly clustered	table.
       Otherwise, the planner might make poor choices of query plans.

       There  is another way to cluster data. The CLUSTER command reorders the
       original table by scanning it using the index you specify. This can  be
       slow  on  large	tables	because the rows are fetched from the table in
       index order, and if the table is disordered, the entries are on	random
       pages,  so there is one disk page retrieved for every row moved. (Post-
       greSQL has a cache, but the majority of a big table will not fit in the
       cache.)	The other way to cluster a table is to use:

       CREATE TABLE newtable AS
	   SELECT * FROM table ORDER BY columnlist;

       which  uses  the  PostgreSQL sorting code to produce the desired order;
       this is usually much faster than an index  scan	for  disordered  data.
       Then  you  drop	the  old  table,  use ALTER TABLE ... RENAME to rename
       newtable to the old name, and recreate the table's  indexes.   The  big
       disadvantage  of  this approach is that it does not preserve OIDs, con-
       straints, foreign key  relationships,  granted  privileges,  and  other
       ancillary  properties  of  the table -- all such items must be manually
       recreated. Another disadvantage is that this way requires a sort tempo-
       rary  file  about the same size as the table itself, so peak disk usage
       is about three times the table size instead of twice the table size.

EXAMPLES
       Cluster the table employees on the basis of its index employees_ind:

       CLUSTER employees USING employees_ind;


       Cluster the employees table using the same index that was used before:

       CLUSTER employees;


       Cluster all tables in the database that have previously been clustered:

       CLUSTER;


COMPATIBILITY
       There is no CLUSTER statement in the SQL standard.

       The syntax

       CLUSTER indexname ON tablename

       is also supported for compatibility with pre-8.3 PostgreSQL versions.

SEE ALSO
       clusterdb [clusterdb(1)]



SQL - Language Statements	  2010-03-12			     CLUSTER()
Hypertext Manpage Browser (cluster)
YoLinux.com Home Page
YoLinux Tutorial Index
Privacy Policy | Advertise with us | Feedback Form |
Unauthorized copying or redistribution prohibited.
    Bookmark and Share