These five key rules have been distilled from several years working on complex commercial databases and help in writing queries that perform quickly, are easy to maintain, and need minimal debugging to provide correct results every time.
I concentrate on mainly on SELECT queries here, but the principles are applicable to the other SQL forms.
1. Write a comment with the primary or alternate keys when adding a table to a join
Be conscious of the most appropriate primary or alternate key when adding a table to a join. Write a comment such as /* PK Pipeline_No, Contract_No, Effective_Date_From, Amend_no */ when adding a table to a join. This lets people know that you have thought about join efficiency and gives confirmation that you have used the appropriate fields.
If you have a large join and there’s no suitable unique index, you might want to check if there should be.
Example – Join Dispensing Sheet to Doses to List Patients for a Hospital / Facility / Day
SELECT dose.patient_code, patient.given_name, dose.drug_type /* PK: hospital_code, facility_code, disp_date */ FROM DispensingSheets disp_sheet /* IX: hospital_code, facility_code, disp_date */ JOIN Doses doses ON disp_sheet.hospital_code = doses.hospital_code and disp_sheet.facility_code = dose.facility_code and disp_sheet.disp_date = dose.disp_date /* PK: hospital_code, patient_code, eff_date_from */ LEFT JOIN Patients patient ON disp_sheet.hospital_code = patients.hospital_code AND disp_sheet.patient_code = patients.patient_code AND disp_sheet.disp_date BETWEEN patient.eff_date_from AND patient.eff_date_to WHERE disp_sheet.hospital_code = '0001' and facility_code = 'pharmacy' and disp_date = to_date( '2006-09-03', 'YYYY-MM-DD' );
2. Always SELECT only the columns you need. Don’t ever use ‘*’.
Always SELECT only the columns you need in a query. If you are lazy and just use ‘*’ in a SELECT clause, it tells other programmers nothing about what you intend the query to return.
For example, look at the example above and consider the extra cost of building the data to return using ‘*’, compared with the few columns we actually need.
3. Use GROUP BY instead of DISTINCT
Most beginning SQL programmers use DISTINCT when they really mean GROUP BY. GROUP BY can be a pest if you are not really sure what you are selecting. However, if you are inserting to a table with a primary key (which should always be true), then the GROUP BY clause can be used to guarantee that you do not have duplicate values that will break the INSERT.
Example – Insert patients from another system
INSERT INTO PH_PATIENT_TABLE ( hospital_code, patient_code, eff_date_from, eff_date_to, given_name ) SELECT imp.hospital_code, imp.patient_code, imp.eff_date_from, MIN( imp.eff_date_to ), -- Use MIN/MAX to get one value to match the MIN( imp.given_name ) -- the GROUP BY. FROM PH_PATIENT_IMPORT imp /* Filter existing PK: hospital_code, patient_code, eff_date_from */ LEFT JOIN PH_PATIENT_TABLE patient ON imp.hospital_code = patient.hospital_code AND imp.patient_code = patient.patient_code AND imp.eff_dt_from = patient.eff_dt_from /* Don't try and insert existing records */ WHERE patient.patient_code IS NULL GROUP BY imp.hospital_code, imp.patient_code, imp.eff_date_from
If you need to insert other columns outside the GROUP BY columns that occur in the SELECT, then you can use MIN(), MAX() which pick the minimum or maximum values for columns outside the GROUP BY key.
In most cases, there won’t be multiple values for eff_date_to, or given_name, but we could use a CASE expression, or HAVING statement to check if there were multiple values in these fields.
4. Never SELECT and INSERT to the same table
If you have a large SELECT/INSERT then use a staging table to break the query into two stages. Many databases allow you to look at queries that block. Use these tools to confirm and justify the extra table.
The main exception to this rule is shown above, where we do a LEFT JOIN to avoid trying insert values in the table that are already present.
5. Look at the query plan as you write the query
If you follow the steps above then your queries should be very efficient. Trust, but use the query optimizer to verify this. As a side-benefit, the query optimizer will pick up any typos prior to starting to test the query.
If you see an expensive operation, try and work out why. Fix it.
