现象
洪敏在用 Metabase 写报表过程中,使用 Metabase 的日期变量,发现的诡异问题。两条语句仅有部分差异,返回的结果却差了一天。

期望结果。

与期望结果差了一天。
在 CDH 中查看 impala 服务器接收到的请求。
得到期望结果的SQL:
dt 是字符串,内容是日期,参数是TIMESTAMP,SQL引擎会自动转为两个 TIMESTAMP比较。
SELECT `a`.`dt`, (`a`.`x` - `b`.`x`) as `净增人数` FROM (SELECT `f_ods_wx_subscribe_di`.`dt`, COUNT(DISTINCT `f_ods_wx_subscribe_di`.`userinfo_openid`) as `x` FROM `product`.`f_ods_wx_subscribe_di` WHERE ((`f_ods_wx_subscribe_di`.`dt` < CAST( '2020-08-05 00:00:00.0' AS TIMESTAMP)) AND ((`f_ods_wx_subscribe_di`.`mid` = 'jt20191017175015910985') AND (`f_ods_wx_subscribe_di`.`dt` >= CAST( '2020-08-01 00:00:00.0' AS TIMESTAMP)))) GROUP BY `f_ods_wx_subscribe_di`.`dt`) AS a LEFT JOIN (SELECT `f_ods_wx_unsubscribe_di`.`dt`, COUNT(DISTINCT `f_ods_wx_unsubscribe_di`.`userinfo_openid`) as `x` FROM `product`.`f_ods_wx_unsubscribe_di` WHERE ((`f_ods_wx_unsubscribe_di`.`dt` < CAST( '2020-08-05 00:00:00.0' AS TIMESTAMP)) AND ((`f_ods_wx_unsubscribe_di`.`mid` = 'jt20191017175015910985') AND (`f_ods_wx_unsubscribe_di`.`dt` >= CAST( '2020-08-01 00:00:00.0' AS TIMESTAMP)))) GROUP BY `f_ods_wx_unsubscribe_di`.`dt`) AS b ON (`a`.`dt` = `b`.`dt`)
得到错误结果的SQL:
dt 本身是字符串,参数也是字符串。变成两个字符串比较,结果确实差一天。只是在select里使用了一个函数,就造成SQL巨大差异,很神奇。
select a.dt ,(a.x - nvl(b.x,0)) as `净增人数` from ( SELECT dt, count(distinct userinfo_openid) as x FROM f_ods_wx_subscribe_di WHERE mid = 'jt20191017175015910985' AND dt >= '2020-08-01 00:00:00.0' and dt < '2020-08-05 00:00:00.0' group by dt ) a left join ( SELECT dt, count(distinct userinfo_openid) as x FROM f_ods_wx_unsubscribe_di WHERE mid = 'jt20191017175015910985' AND dt >= '2020-08-01 00:00:00.0' and dt < '2020-08-05 00:00:00.0' group by dt ) b on a.dt = b.dt
metabase,metabase impala driver,impala JDBC driver都会有问题。