描述
题目:现在运营想要查看用户在某天刷题后第二天还会再来刷题的平均概率。请你取出相应数据。
示例:question_practice_detail
id | device_id | quest_id | result | date |
1 | 2138 | 111 | wrong | 2021-05-03 |
2 | 3214 | 112 | wrong | 2021-05-09 |
3 | 3214 | 113 | wrong | 2021-06-15 |
4 | 6543 | 111 | right | 2021-08-13 |
5 | 2315 | 115 | right | 2021-08-13 |
6 | 2315 | 116 | right | 2021-08-14 |
7 | 2315 | 117 | wrong | 2021-08-15 |
…… |
根据示例,你的查询应返回以下结果:
avg_ret |
0.3000 |
示例1
输入:
drop table if exists `user_profile`; drop table if exists `question_practice_detail`; drop table if exists `question_detail`; CREATE TABLE `user_profile` ( `id` int NOT NULL, `device_id` int NOT NULL, `gender` varchar(14) NOT NULL, `age` int , `university` varchar(32) NOT NULL, `gpa` float, `active_days_within_30` int , `question_cnt` int , `answer_cnt` int ); CREATE TABLE `question_practice_detail` ( `id` int NOT NULL, `device_id` int NOT NULL, `question_id`int NOT NULL, `result` varchar(32) NOT NULL, `date` date NOT NULL ); CREATE TABLE `question_detail` ( `id` int NOT NULL, `question_id`int NOT NULL, `difficult_level` varchar(32) NOT NULL ); INSERT INTO user_profile VALUES(1,2138,'male',21,'北京大学',3.4,7,2,12); INSERT INTO user_profile VALUES(2,3214,'male',null,'复旦大学',4.0,15,5,25); INSERT INTO user_profile VALUES(3,6543,'female',20,'北京大学',3.2,12,3,30); INSERT INTO user_profile VALUES(4,2315,'female',23,'浙江大学',3.6,5,1,2); INSERT INTO user_profile VALUES(5,5432,'male',25,'山东大学',3.8,20,15,70); INSERT INTO user_profile VALUES(6,2131,'male',28,'山东大学',3.3,15,7,13); INSERT INTO user_profile VALUES(7,4321,'male',28,'复旦大学',3.6,9,6,52); INSERT INTO question_practice_detail VALUES(1,2138,111,'wrong','2021-05-03'); INSERT INTO question_practice_detail VALUES(2,3214,112,'wrong','2021-05-09'); INSERT INTO question_practice_detail VALUES(3,3214,113,'wrong','2021-06-15'); INSERT INTO question_practice_detail VALUES(4,6543,111,'right','2021-08-13'); INSERT INTO question_practice_detail VALUES(5,2315,115,'right','2021-08-13'); INSERT INTO question_practice_detail VALUES(6,2315,116,'right','2021-08-14'); INSERT INTO question_practice_detail VALUES(7,2315,117,'wrong','2021-08-15'); INSERT INTO question_practice_detail VALUES(8,3214,112,'wrong','2021-05-09'); INSERT INTO question_practice_detail VALUES(9,3214,113,'wrong','2021-08-15'); INSERT INTO question_practice_detail VALUES(10,6543,111,'right','2021-08-13'); INSERT INTO question_practice_detail VALUES(11,2315,115,'right','2021-08-13'); INSERT INTO question_practice_detail VALUES(12,2315,116,'right','2021-08-14'); INSERT INTO question_practice_detail VALUES(13,2315,117,'wrong','2021-08-15'); INSERT INTO question_practice_detail VALUES(14,3214,112,'wrong','2021-08-16'); INSERT INTO question_practice_detail VALUES(15,3214,113,'wrong','2021-08-18'); INSERT INTO question_practice_detail VALUES(16,6543,111,'right','2021-08-13'); INSERT INTO question_detail VALUES(1,111,'hard'); INSERT INTO question_detail VALUES(2,112,'medium'); INSERT INTO question_detail VALUES(3,113,'easy'); INSERT INTO question_detail VALUES(4,115,'easy'); INSERT INTO question_detail VALUES(5,116,'medium'); INSERT INTO question_detail VALUES(6,117,'easy');
解题参考:
题意明确:
用户在某天刷题后第二天再来刷题的平均概率
问题分解:
- 限定条件:第二天再来。
- 解法1:表里的数据可以看作是全部第一天来刷题了的,那么我们需要构造出第二天来了的字段,因此可以考虑用left join把第二天来了的拼起来,限定第二天来了的可以用
date_add(date1, interval 1 day)=date2
筛选,并用device_id限定是同一个用户。 - 解法2:用lead函数将同一用户连续两天的记录拼接起来。先按用户分组
partition by device_id
,再按日期升序排序order by date
,再两两拼接(最后一个默认和null拼接),即lead(date) over (partition by device_id order by date)
- 解法1:表里的数据可以看作是全部第一天来刷题了的,那么我们需要构造出第二天来了的字段,因此可以考虑用left join把第二天来了的拼起来,限定第二天来了的可以用
- 平均概率:
- 解法1:可以count(date1)得到左表全部的date记录数作为分母,count(date2)得到右表关联上了的date记录数作为分子,相除即可得到平均概率
- 解法2:检查date2和date1的日期差是不是为1,是则为1(次日留存了),否则为0(次日未留存),取avg即可得平均概率。
- 附:lead用法,date_add用法,datediff用法,date函数
细节问题:
- 表头重命名:as
- 去重:需要按照devece_id,date去重,因为一个人一天可能来多次
- 子查询必须全部有重命名
完整代码:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
select count (date2) / count (date1) as avg_ret from ( select distinct qpd.device_id, qpd. date as date1, uniq_id_date. date as date2 from question_practice_detail as qpd left join ( select distinct device_id, date from question_practice_detail ) as uniq_id_date on qpd.device_id=uniq_id_date.device_id and date_add(qpd. date , interval 1 day )=uniq_id_date. date ) as id_last_next_date |
解法2:
1
2
3
4
5
6
7
8
9
10
11
|
select avg (if(datediff(date2, date1)=1, 1, 0)) as avg_ret from ( select distinct device_id, date as date1, lead( date ) over (partition by device_id order by date ) as date2 from ( select distinct device_id, date from question_practice_detail ) as uniq_id_date ) as id_last_next_date |
注意:本文归作者所有,未经作者允许,不得转载