MongoDB聚合操作


提示:以下是本篇文章正文内容,MongoDB 系列学习将会持续更新

一、单一聚合

单一作用聚合:提供了对常见聚合过程的简单访问,操作都从单个集合聚合文档。

函数 描述
db.collection.estimatedDocumentCount() 忽略查询条件,返回集合或视图中所有文档的计数
db.collection.count() 返回与find()集合或视图的查询匹配的文档计数,等同于 db.collection.find(query).count()
db.collection.distinct() 在单个集合或视图中查找指定字段的不同值,并在数组中返回结果
// 检索book集合中所有文档的计数, 效率高
db.book.estimatedDocumentCount()
// 结果同上,忽略查询条件
db.book.estimatedDocumentCount({type:"travel"})
// 统计所有文档数量
db.book.count()
// 统计符合条件的文档数量
db.book.count({favCount:{$gt:50}})
// 返回不同type的数组
db.book.distinct("type")
// 返回收藏数大于90的文档不同type的数组
db.book.distinct("type",{favCount:{$gt:90}})

注意:在分片群集上,如果存在孤立文档或正在进行块迁移,则 db.collection.count() 没有查询谓词可能导致计数不准确。要避免这些情况,请在分片群集上使用 db.collection.aggregate() 方法。

回到目录…

二、聚合管道

官方文档Aggregation Pipeline Stages — MongoDB Manual

pipeline = [$stage1, $stage2, ...$stageN];
db.collection.aggregate(pipeline, {options})

pipelines 一组数据聚合阶段。除$out$Merge$geonear阶段之外,每个阶段都可以在管道中出现多次。
options 可选,聚合操作的其他参数。包含:查询计划、是否使用临时文件、 游标、最大操作时间、读写策略、强制索引等等

在这里插入图片描述

数据准备:

var tags = ["nosql","mongodb","document","developer","popular"];
var types = ["technology","sociality","travel","novel","literature"];
var books=[];
for(var i=0;i<50;i++){
	var typeIdx = Math.floor(Math.random()*types.length);
	var tagIdx = Math.floor(Math.random()*tags.length);
	var tagIdx2 = Math.floor(Math.random()*tags.length);
	var favCount = Math.floor(Math.random()*100);
	var username = "xx00"+Math.floor(Math.random()*10);
	var age = 20 + Math.floor(Math.random()*15);
	var book = {
		_id: i+1,
		title: "book-"+i,
		type: types[typeIdx],
		tag: [tags[tagIdx],tags[tagIdx2]],
		favCount: favCount,
		author: {name:username,age:age}
	};
	books.push(book)
}
db.book2.insertMany(books);

2.1 $match / $project / $count

$match 条件筛选

$match 用于对文档进行筛选,之后可以在得到的文档子集上做聚合。(一般放在最前面,也可以根据需求放到后面做筛选)。

db.book2.aggregate([{$match:{type:"novel"}}])

$project 字段投影

//投影操作, 将原始字段投影成指定名称, 如将集合中的 title 投影成 name
db.book2.aggregate([{$project:{name:"$title"}}])
//$project 可以灵活控制输出文档的格式,也可以剔除不需要的字段
db.book2.aggregate([{$project:{_id:0,name:"$title",type:1}}])

$count 计数

> db.book2.aggregate([{$count:"all_count"}])
{ "all_count" : 50 }

// $match筛选出type匹配的文档;$count返回聚合管道中过滤后的文档计数,并将值分配给novel_count
> db.book2.aggregate([{$match:{type:"novel"}},{$count:"novel_count"}])
{ "novel_count" : 14 }

回到目录…

2.2 $limit / $skip / $sort

$limit / $skip

db.book2.find().limit(5)
db.book2.aggregate([{$limit:5}])

db.book2.find().skip(5)
db.book2.aggregate([{$skip:5}])

db.book2.find().skip(5).limit(5)
db.book2.aggregate([{$skip:5},{$limit:5}])

$sort

db.book2.find().sort({favCount:1})
db.book2.aggregate([{$sort:{favCount:1}}])

回到目录…

2.3 $group 分组查询

按指定的表达式对文档进行分组,并将每个不同分组的文档输出到下一个阶段。

{$group: {_id: <expression>, <field1>: {<accumulator1>: <expression1>}, ...} }

_id 表示分组条件 (必填),指定为 null 时以整个文档为一组;剩余的计算字段是可选的,并使用运算符进行计算。
$group 阶段的内存限制为 100M。默认情况下,如果stage超过此限制会产生错误。但是,要允许处理大型数据集,请将 allowDiskUse 选项设置为 true 以启用 $group 操作以写入临时文件。

accumulator 操作符:

名称 描述 类比 SQL
$avg 计算均值。 avg
$first 返回每组第一个文档,如果有排序,按照排序,如果没有按照默认的存储的顺序的第一个文档。 limit 0,1
$last 返回每组最后一个文档,如果有排序,按照排序,如果没有按照默认的存储的顺序的最后个文档。 -
$max 根据分组,获取集合中所有文档对应值得最大值。 max
$min 根据分组,获取集合中所有文档对应值得最小值。 min
$push 将指定的表达式的值添加到一个数组中。 -
$addToSet 将表达式的值添加到一个集合中 (无重复值,无序)。 -
$sum 计算总和 sum
$stdDevPop 返回输入值的总体标准偏差(population standard deviation) -
$stdDevSamp 返回输入值的样本标准偏差(the sample standard deviation) -

示例一:统计文档中的书籍总量、收藏总数、平均收藏量

db.book2.aggregate([
	{$group:{_id:null,bookCount:{$sum:1},favCount:{$sum:"$favCount"},favAvg:{$avg:"$favCount"}}}
])
{ "_id" : null, "bookCount" : 50, "favCount" : 2392, "favAvg" : 47.84 }

示例二:统计每个作者的book收藏总数

db.book2.aggregate([
	{$group:{_id:"$author.name",favCount:{$sum:"$favCount"}}}
])
{ "_id" : "xx007", "favCount" : 259 }
{ "_id" : "xx009", "favCount" : 160 }
{ "_id" : "xx005", "favCount" : 229 }
{ "_id" : "xx008", "favCount" : 302 }
{ "_id" : "xx001", "favCount" : 300 }
{ "_id" : "xx002", "favCount" : 311 }
{ "_id" : "xx006", "favCount" : 174 }
{ "_id" : "xx003", "favCount" : 160 }
{ "_id" : "xx004", "favCount" : 373 }
{ "_id" : "xx000", "favCount" : 124 }

示例三:统计每个作者的每本book的收藏数

db.book2.aggregate([
	{$group:{_id:{author:"$author.name",title:"$title"},favCount:{$sum:"$favCount"}}}
])
{ "_id" : { "author" : "xx006", "title" : "book-42" }, "favCount" : 77 }
{ "_id" : { "author" : "xx002", "title" : "book-33" }, "favCount" : 70 }
{ "_id" : { "author" : "xx002", "title" : "book-3" }, "favCount" : 44 }
{ "_id" : { "author" : "xx007", "title" : "book-34" }, "favCount" : 50 }
{ "_id" : { "author" : "xx004", "title" : "book-46" }, "favCount" : 84 }
{ "_id" : { "author" : "xx008", "title" : "book-4" }, "favCount" : 64 }
{ "_id" : { "author" : "xx009", "title" : "book-8" }, "favCount" : 45 }
Type "it" for more

示例四:每个作者的book的type合集

db.book2.aggregate([
	{$group:{_id:"$author.name",types:{$addToSet:"$type"}}}
])
{ "_id" : "xx007", "types" : [ "sociality", "travel", "literature", "novel", "technology" ] }
{ "_id" : "xx009", "types" : [ "literature", "novel" ] }
{ "_id" : "xx005", "types" : [ "travel", "novel", "literature", "sociality" ] }
{ "_id" : "xx008", "types" : [ "literature", "sociality", "technology" ] }
{ "_id" : "xx001", "types" : [ "novel", "sociality", "technology" ] }
{ "_id" : "xx002", "types" : [ "literature", "novel", "sociality", "technology" ] }
{ "_id" : "xx006", "types" : [ "novel" ] }
{ "_id" : "xx003", "types" : [ "literature", "novel", "technology" ] }
{ "_id" : "xx004", "types" : [ "sociality", "travel", "literature", "novel" ] }
{ "_id" : "xx000", "types" : [ "literature", "sociality" ] }

示例五:统计每个分类的book文档数量

db.book2.aggregate([
	{$group:{_id:"$type",bookCount:{$sum:1}}},
	{$sort:{bookCount:1}}
])
{ "_id" : "travel", "bookCount" : 4 }
{ "_id" : "technology", "bookCount" : 9 }
{ "_id" : "literature", "bookCount" : 11 }
{ "_id" : "sociality", "bookCount" : 12 }
{ "_id" : "novel", "bookCount" : 14 }

回到目录…

2.4 $unwind 展开数组

可以将数组拆分为单独的文档。

> db.book2.aggregate([{$unwind:"$tag"}])
{ "_id" : 1, "title" : "book-0", "type" : "literature", "tag" : "developer", "favCount" : 76, "author" : { "name" : "xx003", "age" : 30 } }
{ "_id" : 1, "title" : "book-0", "type" : "literature", "tag" : "document", "favCount" : 76, "author" : { "name" : "xx003", "age" : 30 } }
{ "_id" : 2, "title" : "book-1", "type" : "sociality", "tag" : "document", "favCount" : 41, "author" : { "name" : "xx004", "age" : 27 } }
{ "_id" : 2, "title" : "book-1", "type" : "sociality", "tag" : "document", "favCount" : 41, "author" : { "name" : "xx004", "age" : 27 } }
{ "_id" : 3, "title" : "book-2", "type" : "novel", "tag" : "nosql", "favCount" : 17, "author" : { "name" : "xx009", "age" : 30 } }
{ "_id" : 3, "title" : "book-2", "type" : "novel", "tag" : "mongodb", "favCount" : 17, "author" : { "name" : "xx009", "age" : 30 } }
{ "_id" : 4, "title" : "book-3", "type" : "literature", "tag" : "popular", "favCount" : 44, "author" : { "name" : "xx002", "age" : 29 } }
{ "_id" : 4, "title" : "book-3", "type" : "literature", "tag" : "popular", "favCount" : 44, "author" : { "name" : "xx002", "age" : 29 } }
Type "it" for more

示例一:姓名为xx006的作者的book的tag数组拆分为多个文档

db.book2.aggregate([
	{$match:{"author.name":"xx006"}},
	{$unwind:"$tag"}
])
{ "_id" : 16, "title" : "book-15", "type" : "novel", "tag" : "popular", "favCount" : 97, "author" : { "name" : "xx006", "age" : 30 } }
{ "_id" : 16, "title" : "book-15", "type" : "novel", "tag" : "popular", "favCount" : 97, "author" : { "name" : "xx006", "age" : 30 } }
{ "_id" : 43, "title" : "book-42", "type" : "novel", "tag" : "developer", "favCount" : 77, "author" : { "name" : "xx006", "age" : 28 } }
{ "_id" : 43, "title" : "book-42", "type" : "novel", "tag" : "popular", "favCount" : 77, "author" : { "name" : "xx006", "age" : 28 } }

示例二:每个作者的book的tag合集

db.book2.aggregate([
	{$unwind:"$tag"},
	{$group:{_id:"$author.name",tags:{$addToSet:"$tag"}}}
])
{ "_id" : "xx003", "tags" : [ "document", "mongodb", "developer", "nosql" ] }
{ "_id" : "xx004", "tags" : [ "nosql", "developer", "popular", "mongodb", "document" ] }
{ "_id" : "xx007", "tags" : [ "document", "mongodb", "popular", "nosql", "developer" ] }
{ "_id" : "xx008", "tags" : [ "document", "mongodb", "popular", "developer", "nosql" ] }
{ "_id" : "xx006", "tags" : [ "popular", "developer" ] }
{ "_id" : "xx009", "tags" : [ "nosql", "developer", "popular", "mongodb", "document" ] }
{ "_id" : "xx005", "tags" : [ "nosql", "popular", "mongodb", "document" ] }
{ "_id" : "xx001", "tags" : [ "document", "mongodb", "popular", "nosql", "developer" ] }
{ "_id" : "xx002", "tags" : [ "document", "popular", "developer", "nosql" ] }
{ "_id" : "xx000", "tags" : [ "document", "developer", "nosql" ] }

示例三:统计tag标签的热度排行 (根据favCount总量排行即为热度)

db.book2.aggregate([
	{$unwind:"$tag"},
	{$group:{_id:"$tag",favCount:{$sum:"$favCount"}}},
	{$sort:{favCount:-1}}
])
{ "_id" : "popular", "favCount" : 1421 }
{ "_id" : "document", "favCount" : 977 }
{ "_id" : "developer", "favCount" : 893 }
{ "_id" : "nosql", "favCount" : 850 }
{ "_id" : "mongodb", "favCount" : 643 }

回到目录…

2.5 $lookup 左外连接

Mongodb3.2 新增,主要用来实现多表关联查询。每个输入待处理的文档,经过 $lookup 阶段的处理,输出的新文档中会包含一个新生成的数组。数组列存放的数据是来自被 Join 集合的适配文档 (如果没有即为[ ])。

db.collection.aggregate([{
	$lookup: {
		from: "<等待被Join的集合>",
		localField: "<源集合中的match值>",
		foreignField: "<待Join的集合的match值>",
		as: "<输出文档的新增值命名>"
	}
})

数据准备

db.customer.insert({_id:1,name:"customer1",phone:"13112345678",address:"test1"})
db.customer.insert({_id:2,name:"customer2",phone:"13112345679",address:"test2"})
db.order.insert({_id:1,orderCode:"order001",customerId:1,price:200})
db.order.insert({_id:2,orderCode:"order002",customerId:2,price:400})
db.orderItem.insert({_id:1,productName:"apples",qutity:2,orderId:1})
db.orderItem.insert({_id:2,productName:"oranges",qutity:2,orderId:1})
db.orderItem.insert({_id:3,productName:"mangoes",qutity:2,orderId:1})
db.orderItem.insert({_id:4,productName:"apples",qutity:2,orderId:2})
db.orderItem.insert({_id:5,productName:"oranges",qutity:2,orderId:2})
db.orderItem.insert({_id:6,productName:"mangoes",qutity:2,orderId:2})

示例一:联表查询用户以及对应的订单信息

db.customer.aggregate([
	{$lookup: {
		from: "order",
		localField: "_id",
		foreignField: "customerId",
		as: "customerOrder"
		}
	}
]).pretty()
// 结果如下:
{
	"_id" : 1,
	"name" : "customer1",
	"phone" : "13112345678",
	"address" : "test1",
	"customerOrder" : [
		{
	        "_id" : 1,
            "orderCode" : "order001",
            "customerId" : 1,
            "price" : 200
        }
    ]
}
{
	"_id" : 2,
	"name" : "customer2",
	"phone" : "13112345679",
	"address" : "test2",
	"customerOrder" : [
		{
			"_id" : 2,
			"orderCode" : "order002",
			"customerId" : 2,
			"price" : 400
		}
	]
}

示例二:联表查询订单信息以及对应的商品项

db.order.aggregate([
	{$lookup: {
		from: "orderItem",
		localField: "_id",
		foreignField: "orderId",
		as: "orderItem"
		}
	}
]).pretty()
// 结果如下:
{
	"_id" : 1,
	"orderCode" : "order001",
	"customerId" : 1,
	"price" : 200,
	"orderItem" : [
		{
			"_id" : 1,
			"productName" : "apples",
			"qutity" : 2,
			"orderId" : 1
		},
		{
			"_id" : 2,
			"productName" : "oranges",
			"qutity" : 2,
			"orderId" : 1
		},
		{
			"_id" : 3,
			"productName" : "mangoes",
			"qutity" : 2,
			"orderId" : 1
		}
	]
}
{
	"_id" : 2,
	"orderCode" : "order002",
	"customerId" : 2,
	"price" : 400,
	"orderItem" : [
		{
			"_id" : 4,
			"productName" : "apples",
			"qutity" : 2,
			"orderId" : 2
		},
		{
			"_id" : 5,
			"productName" : "oranges",
			"qutity" : 2,
			"orderId" : 2
		},
		{
			"_id" : 6,
			"productName" : "mangoes",
			"qutity" : 2,
			"orderId" : 2
		}
	]
}

回到目录…

2.6 $bucket 存储桶

将传入文档分类为组 (称为存储桶),指定表达式和存储桶边界并输出文档到每个存储桶,结果至少包含一个输入文档。

$bucket 阶段的 RAM 限制为 100 MB。如果阶段超过此限制,会返回一个错误。

{
  $bucket: {
      groupBy: <expression>, //指定字段,用来对文档进行分组
      boundaries: [ <lowerbound1>, <lowerbound2>, ... ], //_id值,指定每个存储桶的边界
      default: <literal>, //可选的,指定边界外的_id值
      output: { //可选的, 除_id字段外,指定输出文档中要包含的字段
         <output1>: { <$accumulator expression> },
         ...
         <outputN>: { <$accumulator expression> }
      }
   }
}

示例:统计book收藏数在[0,10),[10,60),[60,80),[80,100),[100,+∞)区间内的文档数

db.book2.aggregate([{
	$bucket:{
		groupBy:"$favCount",
		boundaries:[0,10,60,80,100],
		default:"other",
		output:{"bookCount":{$sum:1}}
	}
}])
{ "_id" : 0, "bookCount" : 5 }
{ "_id" : 10, "bookCount" : 28 }
{ "_id" : 60, "bookCount" : 10 }
{ "_id" : 80, "bookCount" : 7 }

回到目录…

三、MapReduce

MapReduce 操作将大量的数据处理工作拆分成多个线程并行处理,然后将结果合并在一起。MongoDB 提供的 Map-Reduce 非常灵活,对于大规模数据分析也相当实用。

MapReduce 具有两个阶段:

  1. 将具有相同 Key 的文档数据整合在一起的 map 阶段
  2. 组合 map 操作的结果进行统计输出的 reduce 阶段
db.collection.mapReduce(
	function() {emit(key,value);}, //map函数,将数据拆分成键值对,交给reduce函数
	function(key,values) {return reduceFunction}, //reduce函数,根据键将值做统计运算
	{
		out: <collection>, //可选,将结果汇入指定表
		query: <document>, //可选,筛选数据的条件,筛选的数据送入map
		sort: <document>, //排序完后,送入map
		limit: <number>, //限制送入map的文档数
		finalize: <function>, //可选,修改reduce的结果后进行输出
		scope: <document>, //可选,指定map、reduce、finalize的全局变量
		jsMode: <boolean>, //可选,默认false。在mapreduce过程中是否将数据转换成bson格式
		verbose: <boolean>, //可选,是否在结果中显示时间,默认false
		bypassDocumentValidation: <boolean> //可选,是否略过数据校验
	}
)

在这里插入图片描述

示例:统计type为travel的不同作者的book文档收藏数

db.book2.mapReduce(
	function(){emit(this.author.name,this.favCount)},
	function(key,values){return Array.sum(values)},
	{
		query:{type:"travel"},
		out: "books_favCount"
	}
)
db.books_favCount.find()
{ "_id" : "xx007", "value" : 57 }
{ "_id" : "xx005", "value" : 95 }
{ "_id" : "xx004", "value" : 58 }

我们使用聚合管道也能实现, MongoDB5.0开始,map-reduce已被弃用。

db.book2.aggregate([
	{$match:{type:"travel"}},
	{$group:{_id:"$author.name",favCount:{$sum:"$favCount"}}}
])
{ "_id" : "xx007", "favCount" : 57 }
{ "_id" : "xx005", "favCount" : 95 }
{ "_id" : "xx004", "favCount" : 58 }

回到目录…


总结:
提示:这里对文章进行总结:
本文是对MongoDB的学习,主要学习了MongoDB的聚合操作,重点掌握了 $group分组查询, $unwind展开数组, $lookup左外连接等聚合管道,并且实现了一些应用场景。之后的学习内容将持续更新!!!

本图文内容来源于网友网络收集整理提供,作为学习参考使用,版权属于原作者。
THE END
分享
二维码
< <上一篇
下一篇>>