Building complex pipelines: how to build a `$bucket` stage with MongoPlus
$bucket
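Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries, and outputs one document per bucket. Each output document contains an `_id` field whose value is the inclusive lower bound of that bucket's range. The `output` option specifies the fields included in each output document.

`$bucket` only produces output documents for buckets that contain at least one input document.

The stage has the following form: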
```
{
  $bucket: {
    groupBy: <expression>,
    boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
    default: <literal>,
    output: {
      <output1>: { <$accumulator expression> },
      ...
      <outputN>: { <$accumulator expression> }
    }
  }
}
```
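The boundary rule can be made concrete with a small plain-Java sketch. This is neither MongoPlus nor driver code. `BucketRule` and `bucketId` are hypothetical names used only for illustration. The sketch assumes integer boundaries sorted in ascending order and uses `null` to represent a missing `groupBy` value. A value that matches no bucket goes to the default bucket. When no default is given, the sketch throws, which mirrors MongoDB raising an error in that case.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the $bucket boundary rule (not part of MongoPlus).
final class BucketRule {
    private BucketRule() {}

    /**
     * Returns the inclusive lower bound of the bucket containing value, or
     * defaultId if value is null or lies outside [first, last) of boundaries.
     * Assumes boundaries holds at least two values in ascending order.
     */
    static Object bucketId(Integer value, List<Integer> boundaries, Object defaultId) {
        if (boundaries.size() < 2) {
            throw new IllegalArgumentException("$bucket needs at least two boundaries");
        }
        if (value != null) {
            for (int i = 0; i < boundaries.size() - 1; i++) {
                // Each bucket is [lower, upper): lower inclusive, upper exclusive.
                if (value >= boundaries.get(i) && value < boundaries.get(i + 1)) {
                    return boundaries.get(i);
                }
            }
        }
        if (defaultId == null) {
            // Mirrors MongoDB: without a default, an unmatched document is an error.
            throw new IllegalStateException("value " + value + " fits no bucket and no default was given");
        }
        return defaultId;
    }

    public static void main(String[] args) {
        List<Integer> bounds = Arrays.asList(1840, 1850, 1860, 1870, 1880);
        System.out.println(bucketId(1853, bounds, "Other")); // 1850
        System.out.println(bucketId(1880, bounds, "Other")); // Other (1880 is an exclusive upper bound)
        System.out.println(bucketId(null, bounds, "Other")); // Other (missing value)
    }
}
```

Note that the last boundary never becomes a bucket `_id`. It only closes the range of the final bucket.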
MongoPlus's `Aggregate` interface provides several overloads for building a `$bucket` stage:
```java
bucket(final SFunction<T,?> groupBy, final List<Boundary> boundaries);
bucket(final Object groupBy, final List<Boundary> boundaries);
bucket(final SFunction<T,?> groupBy, final List<Boundary> boundaries, BucketOptions options);
bucket(final Object groupBy, final List<Boundary> boundaries, BucketOptions options);
bucket(final Bson bson);
```
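Parameters:
- `groupBy`: the expression to group by. It can be a field path string such as `"$year_born"` or a lambda getter (`SFunction`) such as `XXX::getYearBorn`.
- `boundaries`: the bucket boundaries. These must be at least two values of the same type, sorted in ascending order. Each adjacent pair defines a bucket that includes the lower bound and excludes the upper bound.
- `options`: a `BucketOptions` object that sets the optional default bucket (`defaultBucket(...)`) and the `output` fields (`output(...)`).
- `bson`: the stage supplied as a `Bson` object instead of through the typed arguments.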
Bucket by year and filter by bucket results
Create a sample collection named `artists` with the following documents:
```javascript
db.artists.insertMany([
  { "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },
  { "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },
  { "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },
  { "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },
  { "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },
  { "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },
  { "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },
  { "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }
])
```
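The following operation groups the documents into buckets by the `year_born` field. It then filters the buckets based on how many documents each one contains: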
MongoDB statement:

```javascript
db.artists.aggregate( [
  // First Stage
  {
    $bucket: {
      groupBy: "$year_born", // Field to group by
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets
      default: "Other", // Bucket ID for documents which do not fall into a bucket
      output: { // Output for each bucket
        "count": { $sum: 1 },
        "artists" :
          {
            $push: {
              "name": { $concat: [ "$first_name", " ", "$last_name"] },
              "year_born": "$year_born"
            }
          }
      }
    }
  },
  // Second Stage
  {
    $match: { count: {$gt: 3} }
  }
] )
```
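Built with MongoPlus:

```java
AggregateWrapper wrapper = new AggregateWrapper();
// First stage: $bucket
wrapper.bucket(
        "$year_born", // Field to group by; can be replaced with a lambda such as XXX::getYearBorn
        Arrays.asList(1840, 1850, 1860, 1870, 1880), // Bucket boundaries
        new BucketOptions()
                .defaultBucket("Other") // Bucket ID for documents that do not fall into a bucket
                .output(
                        Accumulators.sum(), // the "count" field, as in "count": { $sum: 1 }
                        Accumulators.push("artists",
                                new MongoPlusDocument() {{
                                    put("name", AggregateOperator.concat("$first_name", " ", "$last_name"));
                                    // Lambda alternative: putOption(XXX::getYearBorn, XXX::getYearBorn);
                                    put("year_born", "$year_born");
                                }}
                        )
                )
);
// Second stage: keep only buckets with more than 3 documents
wrapper.match(qw -> qw.gt("count", 3));
```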
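First stage:

The `$bucket` stage groups the documents into buckets by the `year_born` field. The buckets have the following boundaries:
- [1840, 1850), which includes the lower bound 1840 and excludes the upper bound 1850.
- [1850, 1860), which includes the lower bound 1850 and excludes the upper bound 1860.
- [1860, 1870), which includes the lower bound 1860 and excludes the upper bound 1870.
- [1870, 1880), which includes the lower bound 1870 and excludes the upper bound 1880.
- Documents that lack a `year_born` field, or whose `year_born` value falls outside the ranges above, go to the default bucket. That bucket has the `_id` value `"Other"`.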
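The stage includes the `output` document to determine the fields to return:

Field | Description |
---|---|
`_id` | The inclusive lower bound of the bucket. |
`count` | The number of documents in the bucket. |
`artists` | An array of documents containing information about each artist in the bucket. |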
This stage passes the following documents to the next stage:

```javascript
{ "_id" : 1840, "count" : 1, "artists" : [ { "name" : "Odilon Redon", "year_born" : 1840 } ] }
{ "_id" : 1850, "count" : 2, "artists" : [ { "name" : "Vincent Van Gogh", "year_born" : 1853 },
                                           { "name" : "Edvard Diriks", "year_born" : 1855 } ] }
{ "_id" : 1860, "count" : 4, "artists" : [ { "name" : "Emil Bernard", "year_born" : 1868 },
                                           { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
                                           { "name" : "Alfred Maurer", "year_born" : 1868 },
                                           { "name" : "Edvard Munch", "year_born" : 1863 } ] }
{ "_id" : 1870, "count" : 1, "artists" : [ { "name" : "Anna Ostroumova", "year_born" : 1871 } ] }
```
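You can cross-check these counts with a plain-Java sketch. It replays the first stage's grouping and its `count` accumulator over the eight `year_born` values inserted into `artists`. `ArtistBuckets` and `countByBucket` are hypothetical names used for illustration, not MongoPlus API. The sketch lists buckets in boundary order with the default bucket last. Like `$bucket`, it emits no entry for a bucket that receives no documents.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java replay of the first stage's $bucket with "count": { $sum: 1 }.
final class ArtistBuckets {
    private ArtistBuckets() {}

    static Map<Object, Integer> countByBucket(List<Integer> values, List<Integer> boundaries, Object defaultId) {
        Map<Object, Integer> raw = new HashMap<>();
        for (Integer v : values) {
            Object id = defaultId; // missing or out-of-range values go to the default bucket
            if (v != null) {
                for (int i = 0; i < boundaries.size() - 1; i++) {
                    if (v >= boundaries.get(i) && v < boundaries.get(i + 1)) {
                        id = boundaries.get(i); // inclusive lower bound becomes the bucket _id
                        break;
                    }
                }
            }
            raw.merge(id, 1, Integer::sum); // add 1 per document, like { $sum: 1 }
        }
        // Emit non-empty buckets in boundary order, default bucket last.
        Map<Object, Integer> ordered = new LinkedHashMap<>();
        for (int i = 0; i < boundaries.size() - 1; i++) {
            Integer lower = boundaries.get(i);
            if (raw.containsKey(lower)) {
                ordered.put(lower, raw.get(lower));
            }
        }
        if (raw.containsKey(defaultId)) {
            ordered.put(defaultId, raw.get(defaultId));
        }
        return ordered;
    }

    public static void main(String[] args) {
        // year_born of artists _id 1..8, in insertion order
        List<Integer> yearBorn = Arrays.asList(1868, 1861, 1871, 1853, 1868, 1863, 1840, 1855);
        List<Integer> bounds = Arrays.asList(1840, 1850, 1860, 1870, 1880);
        System.out.println(countByBucket(yearBorn, bounds, "Other")); // {1840=1, 1850=2, 1860=4, 1870=1}
    }
}
```

No `"Other"` entry appears because every sample artist was born between 1840 and 1879.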
Second stage:

The `$match` stage filters the output of the previous stage and returns only the buckets that contain more than 3 documents.

The operation returns the following document:

```javascript
{ "_id" : 1860, "count" : 4, "artists" :
  [
    { "name" : "Emil Bernard", "year_born" : 1868 },
    { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
    { "name" : "Alfred Maurer", "year_born" : 1868 },
    { "name" : "Edvard Munch", "year_born" : 1863 }
  ]
}
```
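The second stage's `$match` keeps a bucket only when its `count` is strictly greater than 3, because `$gt` is strict, unlike `$gte`. The plain-Java sketch below applies that filter to the first stage's counts. `MatchStage` and `countGreaterThan` are hypothetical names. The input map is copied by hand from the first-stage output shown earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java equivalent of { $match: { count: { $gt: 3 } } } over bucket counts.
final class MatchStage {
    private MatchStage() {}

    static Map<Object, Integer> countGreaterThan(Map<Object, Integer> bucketCounts, int threshold) {
        Map<Object, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<Object, Integer> e : bucketCounts.entrySet()) {
            if (e.getValue() > threshold) { // strict: a count of exactly 3 is dropped
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // bucket _id -> count, as produced by the first stage
        Map<Object, Integer> counts = new LinkedHashMap<>();
        counts.put(1840, 1);
        counts.put(1850, 2);
        counts.put(1860, 4);
        counts.put(1870, 1);
        System.out.println(countGreaterThan(counts, 3)); // {1860=4}
    }
}
```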
Use $bucket with $facet to bucket by multiple fields

You can use the `$facet` stage to perform multiple `$bucket` aggregations within a single stage.
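Conceptually, `$facet` runs each sub-pipeline independently over the same set of input documents. It returns a single document with one field per sub-pipeline. The plain-Java sketch below shows only that shape. `FacetSketch.facet` is a hypothetical helper, and each sub-pipeline is modeled as a `Function` over a list of values. The example input is the `year` values of the sample `artwork` collection created below, with `null` standing for the missing year.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch of $facet semantics: every sub-pipeline sees the same input.
final class FacetSketch {
    private FacetSketch() {}

    static <T> Map<String, Object> facet(List<T> input, Map<String, Function<List<T>, Object>> subPipelines) {
        List<T> shared = Collections.unmodifiableList(input); // sub-pipelines cannot alter each other's input
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Function<List<T>, Object>> e : subPipelines.entrySet()) {
            // Each output field holds the result of one sub-pipeline over the full input.
            result.put(e.getKey(), e.getValue().apply(shared));
        }
        return result;
    }

    public static void main(String[] args) {
        // year of artwork _id 1..8; _id 4 has no year
        List<Integer> years = Arrays.asList(1926, 1902, 1925, null, 1931, 1913, 1893, 1918);
        Map<String, Function<List<Integer>, Object>> pipelines = new LinkedHashMap<>();
        pipelines.put("total", xs -> xs.size());
        pipelines.put("missingYear", xs -> xs.stream().filter(Objects::isNull).count());
        System.out.println(facet(years, pipelines)); // {total=8, missingYear=1}
    }
}
```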
Create a sample collection named `artwork` with the following documents:
```javascript
db.artwork.insertMany([
  { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
      "price" : NumberDecimal("199.99") },
  { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
      "price" : NumberDecimal("280.00") },
  { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
      "price" : NumberDecimal("76.04") },
  { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
      "price" : NumberDecimal("167.30") },
  { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
      "price" : NumberDecimal("483.00") },
  { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
      "price" : NumberDecimal("385.00") },
  { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
      /* No price*/ },
  { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
      "price" : NumberDecimal("118.42") }
])
```
The following operation uses two `$bucket` stages within a `$facet` stage to create two groupings, one by `price` and the other by `year`:

MongoDB statement:
```javascript
db.artwork.aggregate( [
  {
    $facet: { // Top-level $facet stage
      "price": [ // Output field 1
        {
          $bucket: {
            groupBy: "$price", // Field to group by
            boundaries: [ 0, 200, 400 ], // Boundaries for the buckets
            default: "Other", // Bucket ID for documents which do not fall into a bucket
            output: { // Output for each bucket
              "count": { $sum: 1 },
              "artwork" : { $push: { "title": "$title", "price": "$price" } },
              "averagePrice": { $avg: "$price" }
            }
          }
        }
      ],
      "year": [ // Output field 2
        {
          $bucket: {
            groupBy: "$year", // Field to group by
            boundaries: [ 1890, 1910, 1920, 1940 ], // Boundaries for the buckets
            default: "Unknown", // Bucket ID for documents which do not fall into a bucket
            output: { // Output for each bucket
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )
```
Built with MongoPlus:

```java
AggregateWrapper wrapper = new AggregateWrapper();

// Output field 1: bucket by price
AggregateWrapper priceWrapper = new AggregateWrapper();
priceWrapper.bucket(
        "$price", // Field to group by
        Arrays.asList(0, 200, 400), // Bucket boundaries
        new BucketOptions()
                .defaultBucket("Other") // Bucket ID for documents that do not fall into a bucket
                .output(
                        Accumulators.sum(),
                        Accumulators.push(
                                "artwork",
                                new MongoPlusDocument() {{
                                    put("title", "$title");
                                    put("price", "$price");
                                }}
                        ),
                        Accumulators.avg("averagePrice", "$price")
                )
);

// Output field 2: bucket by year
AggregateWrapper yearWrapper = new AggregateWrapper();
yearWrapper.bucket(
        "$year", // Field to group by
        Arrays.asList(1890, 1910, 1920, 1940), // Bucket boundaries
        new BucketOptions()
                .defaultBucket("Unknown") // Bucket ID for documents that do not fall into a bucket
                .output(
                        Accumulators.sum(),
                        Accumulators.push(
                                "artwork",
                                new MongoPlusDocument() {{
                                    put("title", "$title");
                                    put("year", "$year"); // push the year, matching the shell pipeline
                                }}
                        )
                )
);

// Combine both sub-pipelines in a single $facet stage
wrapper.facet(
        new Facet("price", priceWrapper),
        new Facet("year", yearWrapper)
);
```
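First facet:

The first facet groups the input documents by `price`. The buckets have the following boundaries:
- [0, 200), which includes the lower bound 0 and excludes the upper bound 200.
- [200, 400), which includes the lower bound 200 and excludes the upper bound 400.
- "Other", the `default` bucket. It contains documents without a price or with a price outside the ranges above.

The `$bucket` stage includes the `output` document to determine the fields to return: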
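Field | Description |
---|---|
`_id` | The inclusive lower bound of the bucket. |
`count` | The number of documents in the bucket. |
`artwork` | An array of documents containing information about each artwork in the bucket. |
`averagePrice` | Uses the `$avg` operator to display the average price of all artwork in the bucket. |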
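You can reproduce the `averagePrice` figures in the output below by hand. Doing so shows one subtlety. `count` (`$sum: 1`) counts every document in a bucket, but `$avg` ignores documents whose `price` is missing. The plain-Java sketch below is hypothetical: `PriceFacet` and its methods are not MongoPlus API. It uses `BigDecimal` in place of `NumberDecimal` and `null` for a missing price. It also simplifies one case. A bucket whose prices are all missing is simply omitted, whereas MongoDB would report `null` for its average.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java replay of the "price" facet's count and averagePrice.
final class PriceFacet {
    private PriceFacet() {}

    static Object bucketOf(BigDecimal price, List<Integer> bounds, Object defaultId) {
        if (price != null) {
            for (int i = 0; i < bounds.size() - 1; i++) {
                BigDecimal lower = BigDecimal.valueOf(bounds.get(i));
                BigDecimal upper = BigDecimal.valueOf(bounds.get(i + 1));
                if (price.compareTo(lower) >= 0 && price.compareTo(upper) < 0) {
                    return bounds.get(i); // [lower, upper)
                }
            }
        }
        return defaultId; // missing price or outside all ranges
    }

    static Map<Object, Integer> countByBucket(List<BigDecimal> prices, List<Integer> bounds, Object defaultId) {
        Map<Object, Integer> counts = new LinkedHashMap<>();
        for (BigDecimal p : prices) {
            counts.merge(bucketOf(p, bounds, defaultId), 1, Integer::sum); // { $sum: 1 } counts missing prices too
        }
        return counts;
    }

    static Map<Object, BigDecimal> averageByBucket(List<BigDecimal> prices, List<Integer> bounds, Object defaultId) {
        Map<Object, BigDecimal> sums = new LinkedHashMap<>();
        Map<Object, Integer> n = new LinkedHashMap<>();
        for (BigDecimal p : prices) {
            if (p == null) {
                continue; // $avg ignores documents without a numeric price
            }
            Object id = bucketOf(p, bounds, defaultId);
            sums.merge(id, p, BigDecimal::add);
            n.merge(id, 1, Integer::sum);
        }
        Map<Object, BigDecimal> averages = new LinkedHashMap<>();
        for (Map.Entry<Object, BigDecimal> e : sums.entrySet()) {
            BigDecimal divisor = BigDecimal.valueOf(n.get(e.getKey()));
            averages.put(e.getKey(), e.getValue().divide(divisor, MathContext.DECIMAL128));
        }
        return averages;
    }

    public static void main(String[] args) {
        // price of artwork _id 1..8; "The Scream" (_id 7) has no price
        List<BigDecimal> prices = Arrays.asList(
                new BigDecimal("199.99"), new BigDecimal("280.00"), new BigDecimal("76.04"),
                new BigDecimal("167.30"), new BigDecimal("483.00"), new BigDecimal("385.00"),
                null, new BigDecimal("118.42"));
        List<Integer> bounds = Arrays.asList(0, 200, 400);
        System.out.println(countByBucket(prices, bounds, "Other"));   // {0=4, 200=2, Other=2}
        System.out.println(averageByBucket(prices, bounds, "Other")); // {0=140.4375, 200=332.50, Other=483.00}
    }
}
```

The `"Other"` bucket therefore reports `count` 2 but averages only the one price that exists.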
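Second facet:

The second facet groups the input documents by `year`. The buckets have the following boundaries:
- [1890, 1910), which includes the lower bound 1890 and excludes the upper bound 1910.
- [1910, 1920), which includes the lower bound 1910 and excludes the upper bound 1920.
- [1920, 1940), which includes the lower bound 1920 and excludes the upper bound 1940.
- "Unknown", the `default` bucket. It contains documents without a year or with a year outside the ranges above.

The `$bucket` stage includes the `output` document to determine the fields to return: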
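Field | Description |
---|---|
`_id` | The inclusive lower bound of the bucket. |
`count` | The number of documents in the bucket. |
`artwork` | An array of documents containing information about each artwork in the bucket. |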
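The second facet's `$push` accumulator collects one entry per document. A document with no `year` at all lands in the `"Unknown"` default bucket. The plain-Java sketch below replays that facet on the titles and years of the sample collection. `YearFacet` and `pushTitles` are hypothetical names, and `null` stands for a missing `year`. Buckets are listed in boundary order with the default bucket last. Entries within a bucket follow the order in which this sketch visits the documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java replay of the "year" facet: $bucket with a $push of titles.
final class YearFacet {
    private YearFacet() {}

    static Map<Object, List<String>> pushTitles(List<String> titles, List<Integer> years,
                                                List<Integer> bounds, Object defaultId) {
        Map<Object, List<String>> raw = new HashMap<>();
        for (int d = 0; d < titles.size(); d++) {
            Integer year = years.get(d);
            Object id = defaultId; // missing or out-of-range year goes to the default bucket
            if (year != null) {
                for (int i = 0; i < bounds.size() - 1; i++) {
                    if (year >= bounds.get(i) && year < bounds.get(i + 1)) {
                        id = bounds.get(i);
                        break;
                    }
                }
            }
            // $push-style append, in the order this sketch visits documents
            raw.computeIfAbsent(id, k -> new ArrayList<>()).add(titles.get(d));
        }
        Map<Object, List<String>> ordered = new LinkedHashMap<>();
        for (int i = 0; i < bounds.size() - 1; i++) {
            if (raw.containsKey(bounds.get(i))) {
                ordered.put(bounds.get(i), raw.get(bounds.get(i)));
            }
        }
        if (raw.containsKey(defaultId)) {
            ordered.put(defaultId, raw.get(defaultId));
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("The Pillars of Society", "Melancholy III", "Dancer",
                "The Great Wave off Kanagawa", "The Persistence of Memory", "Composition VII",
                "The Scream", "Blue Flower");
        List<Integer> years = Arrays.asList(1926, 1902, 1925, null, 1931, 1913, 1893, 1918);
        Map<Object, List<String>> buckets =
                pushTitles(titles, years, Arrays.asList(1890, 1910, 1920, 1940), "Unknown");
        for (Map.Entry<Object, List<String>> e : buckets.entrySet()) {
            System.out.println(e.getKey() + " count=" + e.getValue().size() + " " + e.getValue());
        }
        // 1890 count=2 [Melancholy III, The Scream]
        // 1910 count=2 [Composition VII, Blue Flower]
        // 1920 count=3 [The Pillars of Society, Dancer, The Persistence of Memory]
        // Unknown count=1 [The Great Wave off Kanagawa]
    }
}
```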
Output

The operation returns the following document:
```javascript
{
  "price" : [ // Output of first facet
    {
      "_id" : 0,
      "count" : 4,
      "artwork" : [
        { "title" : "The Pillars of Society", "price" : NumberDecimal("199.99") },
        { "title" : "Dancer", "price" : NumberDecimal("76.04") },
        { "title" : "The Great Wave off Kanagawa", "price" : NumberDecimal("167.30") },
        { "title" : "Blue Flower", "price" : NumberDecimal("118.42") }
      ],
      "averagePrice" : NumberDecimal("140.4375")
    },
    {
      "_id" : 200,
      "count" : 2,
      "artwork" : [
        { "title" : "Melancholy III", "price" : NumberDecimal("280.00") },
        { "title" : "Composition VII", "price" : NumberDecimal("385.00") }
      ],
      "averagePrice" : NumberDecimal("332.50")
    },
    {
      // Includes documents without prices and prices greater than 400
      "_id" : "Other",
      "count" : 2,
      "artwork" : [
        { "title" : "The Persistence of Memory", "price" : NumberDecimal("483.00") },
        { "title" : "The Scream" }
      ],
      "averagePrice" : NumberDecimal("483.00")
    }
  ],
  "year" : [ // Output of second facet
    {
      "_id" : 1890,
      "count" : 2,
      "artwork" : [
        { "title" : "Melancholy III", "year" : 1902 },
        { "title" : "The Scream", "year" : 1893 }
      ]
    },
    {
      "_id" : 1910,
      "count" : 2,
      "artwork" : [
        { "title" : "Composition VII", "year" : 1913 },
        { "title" : "Blue Flower", "year" : 1918 }
      ]
    },
    {
      "_id" : 1920,
      "count" : 3,
      "artwork" : [
        { "title" : "The Pillars of Society", "year" : 1926 },
        { "title" : "Dancer", "year" : 1925 },
        { "title" : "The Persistence of Memory", "year" : 1931 }
      ]
    },
    {
      // Includes documents without a year
      "_id" : "Unknown",
      "count" : 1,
      "artwork" : [
        { "title" : "The Great Wave off Kanagawa" }
      ]
    }
  ]
}
```