昨天浏览网页发现了一篇书写sql格式的git readMe的文章。纯英文,现转义成中文说明,记录分享。
文章篇幅有些长,需要静心浏览。
例子:
下面是一个非常规的查询语句,以示本篇sql导读在训练过程中的格式。
with hubspot_interest as ( select email, timestamp_millis(property_beacon_interest) as expressed_interest_at from hubspot.contact where property_beacon_interest is not null ), support_interest as ( select email, created_at as expressed_interest_at from helpscout.conversation inner join helpscout.conversation_tag on conversation.id = conversation_tag.conversation_id where tag = 'beacon-interest' ), combined_interest as ( select * from hubspot_interest union all select * from support_interest ), final as ( select email, min(expressed_interest_at) as expressed_interest_at from combined_interest group by email ) select * from final
格式参考说明
使用小写SQL
相比于大写的可读性好,并且不用在输入的时候按着 shift键
-- Good select * from users -- Bad SELECT * FROM users -- Bad Select * From users
一行与多行的查询语句
当查询以下情况的时候可以将你的SQL写在一行中:
- 所有的列(*) 或者 1 个,2个列。
- 在你的查询中没有额外的复杂表达式语句
-- Good select * from users -- Good select id from users -- Good select id, email from users -- Good select count(*) from users
这样的原因很简单,就是因为能够很容易的看到一行中的所有内容。但是如果你要添加新的列或者更多的复杂表达式,那么多行就会更易读:
-- Good select id, email, created_at from users -- Good select * from users where email = 'example@domain.com'
查询1 或者2列的时候,你可以将列放在一行,3(含)个以上的列,就要将每一个列的名字自成一行,包括第一个项:
-- Good select id, email from users where email like '%@gmail.com' -- Good select user_id, count(*) as total_charges from charges group by user_id -- Good select id, email, created_at from users -- Bad select id, email, created_at from users -- Bad select id, email from users
一切左对齐
-- Good select id, email from users where email like '%@gmail.com' -- Bad select id, email from users where email like '%@gmail.com'
使用单引号
一些SQL方言支持服务,像:BigQuery,支持使用双银行,但是更多的方言支持平台会将双引号当做列来进行对待,因此,单引号是最好的选择:
-- Good select * from users where email = 'example@domain.com' -- Bad select * from users where email = "example@domain.com"
使用 != 替代 <>
简单来讲,因为 != 读起来跟 不等于 很像,更贴近于我们发音。
-- Good select count(*) as paying_users_count from users where plan_name != 'free'
逗号应该在行的末尾
-- Good select id, email from users -- Bad select id , email from users
缩进where语句
当只有一个where条件时,可以跟where在同一行。
select email from users where id = 1234
当有多个where条件时,缩进每一个条件,比where深一个等级即可。将逻辑关系运算符放置在上一条件所在行的末端:
select id, email from users where created_at >= '2019-03-01' and vertical = 'work'
避免括号内部的空格
-- Good select * from users where id in (1, 2) -- Bad select * from users where id in ( 1, 2 )
将in 条件里面的长列表值,改为多个缩进行
-- Good select * from users where email in ( 'user-1@example.com', 'user-2@example.com', 'user-3@example.com', 'user-4@example.com' )
表名应该是名词的蛇形复数格式
-- Good select * from users select * from visit_logs -- Bad select * from user select * from visitLog
列名应该是蛇形格式
-- Good select id, email, timestamp_trunc(created_at, month) as signup_month from users -- Bad select id, email, timestamp_trunc(created_at, month) as SignupMonth from users
列名格式参考
- 布尔列应该添加 is_, has_, 或者 does_ 前缀。比如:is_customer, has_unsubscribed 等等。
- 只是日期格式的列,应该添加 _date 后缀,比如:report_date.
- 日期+时间的列,应该添加 _at 后缀,比如:created_at, posted_at 等等。
列顺序参考
将主键放在第一位,后面跟外键,然后是其他的列。如果表中有一些系统类的列(created_at,updated_at, is_deleted 等等),将他们放在最后面:
-- Good select id, name, created_at from users -- Bad select created_at, name, id, from users
给inner join 加上 inner 字符
能够很好的理解 join 的类型:
-- Good select email, sum(amount) as total_revenue from users inner join charges on users.id = charges.user_id -- Bad select email, sum(amount) as total_revenue from users join charges on users.id = charges.user_id
join 格式语句,将第一个被引用的表紧跟on 后面
这样做,是为了能够很容易的决定你的join是否会引起结果的散开输出。
-- Good select ... from users left join charges on users.id = charges.user_id -- primary_key = foreign_key --> one-to-many --> fanout select ... from charges left join users on charges.user_id = users.id -- foreign_key = primary_key --> many-to-one --> no fanout -- Bad select ... from users left join charges on charges.user_id = users.id
单个 join 条件 应该跟 join在同一行
-- Good select email, sum(amount) as total_revenue from users inner join charges on users.id = charges.user_id group by email -- Bad select email, sum(amount) as total_revenue from users inner join charges on users.id = charges.user_id group by email
当你有多个 join 条件时,将每一个条件单列一行并缩进:
-- Good select email, sum(amount) as total_revenue from users inner join charges on users.id = charges.user_id and refunded = false group by email
避免给表起别名
-- Good select email, sum(amount) as total_revenue from users inner join charges on users.id = charges.user_id -- Bad select email, sum(amount) as total_revenue from users u inner join charges c on u.id = c.user_id
除非,当你需要join一个表多次的时候,需要区分它们。
除非必要的时候,否则不要轻易的在查询时候带上表名
-- Good select id, name from companies -- Bad select companies.id, companies.name from companies
将聚合函数以及系统方法重命名
-- Good select count(*) as total_users from users -- Bad select count(*) from users -- Good select timestamp_millis(property_beacon_interest) as expressed_interest_at from hubspot.contact where property_beacon_interest is not null -- Bad select timestamp_millis(property_beacon_interest) from hubspot.contact where property_beacon_interest is not null
要显性的判断布尔条件
-- Good select * from customers where is_cancelled = true select * from customers where is_cancelled = false -- Bad select * from customers where is_cancelled select * from customers where not is_cancelled
使用 as 关键字 给列 起别名
-- Good select id, email, timestamp_trunc(created_at, month) as signup_month from users -- Bad select id, email, timestamp_trunc(created_at, month) signup_month from users
group by 后跟列名,而不是数字
-- Good select user_id, count(*) as total_charges from charges group by user_id -- Bad select user_id, count(*) as total_charges from charges group by 1
利用列别名的优势
-- Good select timestamp_trunc(com_created_at, year) as signup_year, count(*) as total_companies from companies group by signup_year -- Bad select timestamp_trunc(com_created_at, year) as signup_year, count(*) as total_companies from companies group by timestamp_trunc(com_created_at, year)
需要分组的列应该放在前面
-- Good select timestamp_trunc(com_created_at, year) as signup_year, count(*) as total_companies from companies group by signup_year -- Bad select count(*) as total_companies, timestamp_trunc(com_created_at, year) as signup_year from mysql_helpscout.helpscout_companies group by signup_year
对齐 case/when表达式
每一个when应该保持在一行(case单独一行),而且应该比case行多缩进一个级别。 then 可以与 when 在同一行,或者自成一行,且格式保持一致。
-- Good select case when event_name = 'viewed_homepage' then 'Homepage' when event_name = 'viewed_editor' then 'Editor' end as page_name from events -- Good too select case when event_name = 'viewed_homepage' then 'Homepage' when event_name = 'viewed_editor' then 'Editor' end as page_name from events -- Bad select case when event_name = 'viewed_homepage' then 'Homepage' when event_name = 'viewed_editor' then 'Editor' end as page_name from events
使用CTE(公用表表达式,common table expression),而不是子查询
避免子查询,CTE可以使你的查询语句更易读并且更条理清晰。
当使用CTE时,将query 另起一行。
使用CTE使,经常会有一个 CTE 被命名为 final 并且 在最后会 使用 select * from final。 这样你可以快速的检查当前查询中的其他CTE的输出,并调试结果。
CTE的括号关闭符应该与 with 和 CTE 名字 使用同样的缩进级别。
-- Good with ordered_details as ( select user_id, name, row_number() over (partition by user_id order by date_updated desc) as details_rank from billingdaddy.billing_stored_details ), final as ( select user_id, name from ordered_details where details_rank = 1 ) select * from final -- Bad select user_id, name from ( select user_id, name, row_number() over (partition by user_id order by date_updated desc) as details_rank from billingdaddy.billing_stored_details ) ranked where details_rank = 1
使用见名视意的 CTE名字
-- Good with ordered_details as ( -- Bad with d1 as (
显示功能
你可以将所有的内容放置在一行,或者拆分缩进显示在多行。
-- Good select user_id, name, row_number() over (partition by user_id order by date_updated desc) as details_rank from billingdaddy.billing_stored_details -- Good select user_id, name, row_number() over ( partition by user_id order by date_updated desc ) as details_rank from billingdaddy.billing_stored_details
该篇文章内容主要是以 https://github.com/mattm/sql-style-guide 中的内容为要。
为了深入理解,并学习记录,特此翻译记录。
另外也要谢谢,阮一峰老师的博客,才有机会看到这篇文章。