昨天浏览网页发现了一篇书写sql格式的git readMe的文章。纯英文,现转义成中文说明,记录分享。

文章篇幅有些长,需要静心浏览。

例子:

下面是一个非常规的查询语句,以示本篇sql导读在训练过程中的格式。

with hubspot_interest as (

    select
        email,
        timestamp_millis(property_beacon_interest) as expressed_interest_at
    from hubspot.contact
    where property_beacon_interest is not null

), 

support_interest as (

    select 
        email,
        created_at as expressed_interest_at
    from helpscout.conversation
    inner join helpscout.conversation_tag on conversation.id = conversation_tag.conversation_id
    where tag = 'beacon-interest'

), 

combined_interest as (

    select * from hubspot_interest
    union all
    select * from support_interest

),

final as (

    select 
        email,
        min(expressed_interest_at) as expressed_interest_at
    from combined_interest
    group by email

)

select * from final

格式参考说明


使用小写SQL

相比于大写的可读性好,并且不用在输入的时候按着 shift键

-- Good
select * from users

-- Bad
SELECT * FROM users

-- Bad
Select * From users


一行与多行的查询语句

当查询以下情况的时候可以将你的SQL写在一行中:

  • 所有的列(*) 或者 1 个,2个列。
  • 在你的查询中没有额外的复杂表达式语句
-- Good
select * from users

-- Good
select id from users

-- Good
select id, email from users

-- Good
select count(*) from users

这样的原因很简单,就是因为能够很容易的看到一行中的所有内容。但是如果你要添加新的列或者更多的复杂表达式,那么多行就会更易读:

-- Good
select
    id,
    email,
    created_at
from users

-- Good
select *
from users
where email = 'example@domain.com'

查询1 或者2列的时候,你可以将列放在一行,3(含)个以上的列,就要将每一个列的名字自成一行,包括第一个项:

-- Good
select id, email
from users
where email like '%@gmail.com'

-- Good
select user_id, count(*) as total_charges
from charges
group by user_id

-- Good
select
    id,
    email,
    created_at
from users

-- Bad
select id, email, created_at
from users

-- Bad
select id,
    email
from users

一切左对齐

-- Good
select id, email
from users
where email like '%@gmail.com'

-- Bad
select id, email
  from users
 where email like '%@gmail.com'

使用单引号

一些SQL方言支持服务,像:BigQuery,支持使用双银行,但是更多的方言支持平台会将双引号当做列来进行对待,因此,单引号是最好的选择:

-- Good
select *
from users
where email = 'example@domain.com'

-- Bad
select *
from users
where email = "example@domain.com"

使用 != 替代 <>

简单来讲,因为 != 读起来跟 不等于 很像,更贴近于我们发音。

-- Good
select count(*) as paying_users_count
from users
where plan_name != 'free'

逗号应该在行的末尾

-- Good
select
    id,
    email
from users

-- Bad
select
    id
    , email
from users

缩进where语句

当只有一个where条件时,可以跟where在同一行。

select email
from users
where id = 1234

当有多个where条件时,缩进每一个条件,比where深一个等级即可。将逻辑关系运算符放置在上一条件所在行的末端:

select id, email
from users
where 
    created_at >= '2019-03-01' and 
    vertical = 'work'

避免括号内部的空格

-- Good
select *
from users
where id in (1, 2)

-- Bad
select *
from users
where id in ( 1, 2 )

将in 条件里面的长列表值,改为多个缩进行

-- Good
select *
from users
where email in (
    'user-1@example.com',
    'user-2@example.com',
    'user-3@example.com',
    'user-4@example.com'
)

表名应该是名词的蛇形复数格式

-- Good
select * from users
select * from visit_logs

-- Bad
select * from user
select * from visitLog

列名应该是蛇形格式

-- Good
select
    id,
    email,
    timestamp_trunc(created_at, month) as signup_month
from users

-- Bad
select
    id,
    email,
    timestamp_trunc(created_at, month) as SignupMonth
from users

列名格式参考

  • 布尔列应该添加 is_, has_, 或者 does_ 前缀。比如:is_customer, has_unsubscribed 等等。
  • 只是日期格式的列,应该添加 _date 后缀,比如:report_date.
  • 日期+时间的列,应该添加 _at 后缀,比如:created_at, posted_at 等等。

列顺序参考

将主键放在第一位,后面跟外键,然后是其他的列。如果表中有一些系统类的列(created_at,updated_at, is_deleted 等等),将他们放在最后面:

-- Good
select
    id,
    name,
    created_at
from users

-- Bad
select
    created_at,
    name,
    id,
from users

给inner join 加上 inner 字符

能够很好的理解 join 的类型:

-- Good
select
    email,
    sum(amount) as total_revenue
from users
inner join charges on users.id = charges.user_id

-- Bad
select
    email,
    sum(amount) as total_revenue
from users
join charges on users.id = charges.user_id

join 格式语句,将第一个被引用的表紧跟on 后面

这样做,是为了能够很容易的决定你的join是否会引起结果的散开输出。

-- Good
select
    ...
from users
left join charges on users.id = charges.user_id
-- primary_key = foreign_key --> one-to-many --> fanout
  
select
    ...
from charges
left join users on charges.user_id = users.id
-- foreign_key = primary_key --> many-to-one --> no fanout

-- Bad
select
    ...
from users
left join charges on charges.user_id = users.id

单个 join 条件 应该跟 join在同一行

-- Good
select
    email,
    sum(amount) as total_revenue
from users
inner join charges on users.id = charges.user_id
group by email

-- Bad
select
    email,
    sum(amount) as total_revenue
from users
inner join charges
on users.id = charges.user_id
group by email

当你有多个 join 条件时,将每一个条件单列一行并缩进:

-- Good
select
    email,
    sum(amount) as total_revenue
from users
inner join charges on 
    users.id = charges.user_id and
    refunded = false
group by email

避免给表起别名

-- Good
select
    email,
    sum(amount) as total_revenue
from users
inner join charges on users.id = charges.user_id

-- Bad
select
    email,
    sum(amount) as total_revenue
from users u
inner join charges c on u.id = c.user_id

除非,当你需要join一个表多次的时候,需要区分它们。

除非必要的时候,否则不要轻易的在查询时候带上表名

-- Good
select
    id,
    name
from companies

-- Bad
select
    companies.id,
    companies.name
from companies

将聚合函数以及系统方法重命名

-- Good
select count(*) as total_users
from users

-- Bad
select count(*)
from users

-- Good
select timestamp_millis(property_beacon_interest) as expressed_interest_at
from hubspot.contact
where property_beacon_interest is not null

-- Bad
select timestamp_millis(property_beacon_interest)
from hubspot.contact
where property_beacon_interest is not null

要显性的判断布尔条件

-- Good
select * from customers where is_cancelled = true
select * from customers where is_cancelled = false

-- Bad
select * from customers where is_cancelled
select * from customers where not is_cancelled

使用 as 关键字 给列 起别名

-- Good
select
    id,
    email,
    timestamp_trunc(created_at, month) as signup_month
from users

-- Bad
select
    id,
    email,
    timestamp_trunc(created_at, month) signup_month
from users

group by 后跟列名,而不是数字

-- Good
select user_id, count(*) as total_charges
from charges
group by user_id

-- Bad
select
    user_id,
    count(*) as total_charges
from charges
group by 1

利用列别名的优势

-- Good
select
  timestamp_trunc(com_created_at, year) as signup_year,
  count(*) as total_companies
from companies
group by signup_year

-- Bad
select
  timestamp_trunc(com_created_at, year) as signup_year,
  count(*) as total_companies
from companies
group by timestamp_trunc(com_created_at, year)

需要分组的列应该放在前面

-- Good
select
  timestamp_trunc(com_created_at, year) as signup_year,
  count(*) as total_companies
from companies
group by signup_year

-- Bad
select
  count(*) as total_companies,
  timestamp_trunc(com_created_at, year) as signup_year
from mysql_helpscout.helpscout_companies
group by signup_year

对齐  case/when表达式

每一个when应该保持在一行(case单独一行),而且应该比case行多缩进一个级别。 then 可以与 when 在同一行,或者自成一行,且格式保持一致。

-- Good
select
    case
        when event_name = 'viewed_homepage' then 'Homepage'
        when event_name = 'viewed_editor' then 'Editor'
    end as page_name
from events

-- Good too
select
    case
        when event_name = 'viewed_homepage'
            then 'Homepage'
        when event_name = 'viewed_editor'
            then 'Editor'
    end as page_name
from events

-- Bad 
select
    case when event_name = 'viewed_homepage' then 'Homepage'
        when event_name = 'viewed_editor' then 'Editor'
    end as page_name
from events

使用CTE(公用表表达式,common table expression),而不是子查询

避免子查询,CTE可以使你的查询语句更易读并且更条理清晰。

当使用CTE时,将query 另起一行。

使用CTE使,经常会有一个 CTE  被命名为 final 并且 在最后会 使用 select * from final。 这样你可以快速的检查当前查询中的其他CTE的输出,并调试结果。

CTE的括号关闭符应该与 with 和 CTE 名字 使用同样的缩进级别。

-- Good
with ordered_details as (

    select
        user_id,
        name,
        row_number() over (partition by user_id order by date_updated desc) as details_rank
    from billingdaddy.billing_stored_details

),

final as (

    select user_id, name
    from ordered_details
    where details_rank = 1

)

select * from final

-- Bad
select user_id, name
from (
    select
        user_id,
        name,
        row_number() over (partition by user_id order by date_updated desc) as details_rank
    from billingdaddy.billing_stored_details
) ranked
where details_rank = 1

使用见名视意的 CTE名字

-- Good
with ordered_details as (

-- Bad
with d1 as (

显示功能

你可以将所有的内容放置在一行,或者拆分缩进显示在多行。

-- Good
select
    user_id,
    name,
    row_number() over (partition by user_id order by date_updated desc) as details_rank
from billingdaddy.billing_stored_details

-- Good
select
    user_id,
    name,
    row_number() over (
        partition by user_id
        order by date_updated desc
    ) as details_rank
from billingdaddy.billing_stored_details

 

该篇文章内容主要是以 https://github.com/mattm/sql-style-guide 中的内容为要。

为了深入理解,并学习记录,特此翻译记录。

另外也要谢谢,阮一峰老师的博客,才有机会看到这篇文章。

 

发表评论

电子邮件地址不会被公开。 必填项已用*标注