Building the $bucket stage with MongoPlus

Building aggregation pipelines is complex. How do I build a $bucket stage with MongoPlus?

1 Answer

Definition

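$bucket
Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries, and outputs one document per bucket. Each output document contains an _id field whose value is the inclusive lower bound of the bucket. The output option specifies the fields included in each output document.

$bucket only produces output documents for buckets that contain at least one input document.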

Syntax

{
  $bucket: {
      groupBy: <expression>,
      boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
      default: <literal>,
      output: {
         <output1>: { <$accumulator expression> },
         ...
         <outputN>: { <$accumulator expression> }
      }
   }
}
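
The server expects boundaries to contain at least two values in ascending order. A minimal sketch of that check in plain Java follows. The class BoundaryCheck and its isValid method are hypothetical helpers written for illustration; they are not part of MongoPlus or the MongoDB driver. Treating duplicate values as invalid is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of MongoPlus.
class BoundaryCheck {

    // Returns true when the list has at least two values and each value
    // is strictly greater than the one before it.
    static <T extends Comparable<T>> boolean isValid(List<T> boundaries) {
        if (boundaries == null || boundaries.size() < 2) {
            return false;
        }
        for (int i = 1; i < boundaries.size(); i++) {
            if (boundaries.get(i - 1).compareTo(boundaries.get(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid(Arrays.asList(1840, 1850, 1860, 1870, 1880)));
        System.out.println(isValid(Arrays.asList(1850, 1840)));
    }
}
```

Running a check like this before building the stage can surface a bad boundary list on the client instead of as a server-side pipeline error.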

Interface methods

The MongoPlus Aggregate interface provides several overloads for the $bucket stage:

bucket(final SFunction<T,?> groupBy,final List<Boundary> boundaries);
bucket(final Object groupBy,final List<Boundary> boundaries);
bucket(final SFunction<T,?> groupBy, final List<Boundary> boundaries, BucketOptions options);
bucket(final Object groupBy, final List<Boundary> boundaries, BucketOptions options);
bucket(final Bson bson);

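Parameters:

  • groupBy: the field to group by, given either as a field path string such as "$year_born" or as a lambda method reference
  • boundaries: the bucket boundaries, passed as a List
  • options: optional; carries the default and output settings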

Examples

  1. Bucket by year and filter by bucket results
    Create a sample collection named artists with the following documents:

    db.artists.insertMany([
    	{ "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },
    	{ "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },
    	{ "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },
    	{ "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },
    	{ "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },
    	{ "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },
    	{ "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },
    	{ "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }
    ])
    

    The following operation groups the documents into buckets by the year_born field and filters on the number of documents in each bucket:
    MongoDB statement:

    db.artists.aggregate( [
     // First Stage
     {
       $bucket: { 
         groupBy: "$year_born",                        // Field to group by
         boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets
         default: "Other",                             // Bucket ID for documents which do not fall into a bucket
         output: {                                     // Output for each bucket
           "count": { $sum: 1 },
           "artists" : 
             { 
               $push: { 
                 "name": { $concat: [ "$first_name", " ", "$last_name"] }, 
                 "year_born": "$year_born"
               } 
             }
         }
       }
     },
     // Second Stage
     {
       $match: { count: {$gt: 3} }
     }
    ] )
    

    Building it with MongoPlus:

    AggregateWrapper wrapper = new AggregateWrapper();
    wrapper.bucket(
            "$year_born",   // field to group by; a lambda such as XXX::getYearBorn can be used instead
            Arrays.asList(1840, 1850, 1860, 1870, 1880), // bucket boundaries
            new BucketOptions()
                    .defaultBucket("Other") // bucket ID for documents that do not fall into any bucket
                    .output(
                            Accumulators.sum(), // intended to match "count": { $sum: 1 } in the statement above
                            Accumulators.push("artists",
                                    new MongoPlusDocument(){{
                                        put("name",AggregateOperator.concat("$first_name", " ", "$last_name"));
                                        //putOption(XXX::getYearBorn,XXX::getYearBorn);
                                        put("year_born","$year_born");
                                    }}
                            )
                    )
    );
    // Second stage: keep only buckets with more than 3 documents
    wrapper.match(qw -> qw.gt("count",3));
    

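    First stage:
    The $bucket stage groups the documents into buckets by the year_born field. The buckets have the following boundaries:

    - [1840, 1850), inclusive of the lower bound 1840 and exclusive of the upper bound 1850.

    - [1850, 1860), inclusive of the lower bound 1850 and exclusive of the upper bound 1860.

    - [1860, 1870), inclusive of the lower bound 1860 and exclusive of the upper bound 1870.

    - [1870, 1880), inclusive of the lower bound 1870 and exclusive of the upper bound 1880.

    - If a document has no `year_born` field, or its `year_born` value falls outside the ranges above, it is placed in the default bucket, whose `_id` is `"Other"`.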
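
The half-open ranges above can be sketched in plain Java to make the assignment rule concrete. This is only an illustration of the semantics, not how MongoDB or MongoPlus implements the stage; YearBucketSketch, bucketId and countByBucket are hypothetical names, and a null value stands in for a missing year_born field.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of $bucket assignment; not MongoPlus code.
class YearBucketSketch {

    // Returns the inclusive lower bound of the range [lower, upper) that
    // contains value, or defaultId when value is null or out of range.
    static Object bucketId(Integer value, List<Integer> boundaries, Object defaultId) {
        if (value == null) {
            return defaultId;
        }
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            if (value >= boundaries.get(i) && value < boundaries.get(i + 1)) {
                return boundaries.get(i);
            }
        }
        return defaultId;
    }

    // Counts values per bucket, like "count": { $sum: 1 } in the output option.
    static Map<Object, Integer> countByBucket(List<Integer> values,
                                              List<Integer> boundaries,
                                              Object defaultId) {
        Map<Object, Integer> counts = new LinkedHashMap<>();
        for (Integer v : values) {
            counts.merge(bucketId(v, boundaries, defaultId), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Integer> boundaries = Arrays.asList(1840, 1850, 1860, 1870, 1880);
        // year_born values of the eight sample artists, in _id order
        List<Integer> yearsBorn = Arrays.asList(1868, 1861, 1871, 1853, 1868, 1863, 1840, 1855);
        // Map order follows first insertion, not _id order as in MongoDB output.
        System.out.println(countByBucket(yearsBorn, boundaries, "Other"));
    }
}
```

For the sample data this yields a count of 4 for the 1860 bucket, the only bucket that passes the count filter in the second stage.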
    

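    The stage includes an output document that determines the fields to return:

    Field    Description
    _id      The inclusive lower boundary of the bucket.
    count    The number of documents in the bucket.
    artists  An array of documents, one for each artist in the bucket.

    This stage passes the following documents to the next stage: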

    { "_id" : 1840, "count" : 1, "artists" : [ { "name" : "Odilon Redon", "year_born" : 1840 } ] }
    { "_id" : 1850, "count" : 2, "artists" : [ { "name" : "Vincent Van Gogh", "year_born" : 1853 },
                                           { "name" : "Edvard Diriks", "year_born" : 1855 } ] }
    { "_id" : 1860, "count" : 4, "artists" : [ { "name" : "Emil Bernard", "year_born" : 1868 },
                                           { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
                                           { "name" : "Alfred Maurer", "year_born" : 1868 },
                                           { "name" : "Edvard Munch", "year_born" : 1863 } ] }
    { "_id" : 1870, "count" : 1, "artists" : [ { "name" : "Anna Ostroumova", "year_born" : 1871 } ] }
    

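    Second stage:
    The $match stage filters the output of the previous stage and returns only buckets that contain more than 3 documents.
    The operation returns the following document: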

    { "_id" : 1860, "count" : 4, "artists" :
    	[
    		{ "name" : "Emil Bernard", "year_born" : 1868 },
    		{ "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
    		{ "name" : "Alfred Maurer", "year_born" : 1868 },
    		{ "name" : "Edvard Munch", "year_born" : 1863 }
    	]
    }
    
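  2. Use $bucket with $facet to bucket by multiple fields
    You can use the $facet stage to run multiple $bucket aggregations in a single stage.

    Create a sample collection named artwork with the following documents: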

    db.artwork.insertMany([
    	{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
      "price" : NumberDecimal("199.99") },
    	{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
      "price" : NumberDecimal("280.00") },
    	{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
      "price" : NumberDecimal("76.04") },
    	{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
      "price" : NumberDecimal("167.30") },
    	{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
      "price" : NumberDecimal("483.00") },
    	{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
      "price" : NumberDecimal("385.00") },
    	{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
      /* No price*/ },
    	{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
      "price" : NumberDecimal("118.42") }
    ])
    

    The following operation uses two $bucket stages inside a $facet stage to create two groupings, one by price and one by year:
    MongoDB statement:

    db.artwork.aggregate( [
     {
       $facet: {                               // Top-level $facet stage
         "price": [                            // Output field 1
           {
             $bucket: { 
                 groupBy: "$price",            // Field to group by
                 boundaries: [ 0, 200, 400 ],  // Boundaries for the buckets
                 default: "Other",             // Bucket ID for documents which do not fall into a bucket
                 output: {                     // Output for each bucket 
                   "count": { $sum: 1 },
                   "artwork" : { $push: { "title": "$title", "price": "$price" } },
                   "averagePrice": { $avg: "$price" }
                 }
             }
           }
         ],
         "year": [                                      // Output field 2
           { 
             $bucket: {
               groupBy: "$year",                        // Field to group by
               boundaries: [ 1890, 1910, 1920, 1940 ],  // Boundaries for the buckets
               default: "Unknown",                      // Bucket ID for documents which do not fall into a bucket
               output: {                                // Output for each bucket
                 "count": { $sum: 1 },
                 "artwork": { $push: { "title": "$title", "year": "$year" } }
               }
             }
           }
         ]
       }
     }
    ] )
    
    

    Building it with MongoPlus:

    AggregateWrapper wrapper = new AggregateWrapper();
    // Output field 1
    AggregateWrapper priceWrapper = new AggregateWrapper();
    priceWrapper.bucket(
            "$price",   // field to group by
            Arrays.asList(0, 200, 400), // bucket boundaries
            new BucketOptions()
                    .defaultBucket("Other")     // bucket ID for documents that do not fall into any bucket
                    .output(
                            Accumulators.sum(), // intended to match "count": { $sum: 1 } in the statement above
                            Accumulators.push(
                                    "artwork",
                                    new MongoPlusDocument() {{
                                        put("title", "$title");
                                        put("price", "$price");
                                    }}
                            ),
                            Accumulators.avg("averagePrice", "$price")
                    )
    );
    // Output field 2
    AggregateWrapper yearWrapper = new AggregateWrapper();
    yearWrapper.bucket(
            "$year",    // field to group by
            Arrays.asList(1890, 1910, 1920, 1940), // bucket boundaries
            new BucketOptions()
                    .defaultBucket("Unknown")   // bucket ID for documents that do not fall into any bucket
                    .output(
                            Accumulators.sum(),
                            Accumulators.push(
                                    "artwork",
                                    new MongoPlusDocument() {{
                                        put("title", "$title");
                                        put("year", "$year"); // the year facet pushes year, not price
                                    }}
                            )
                    )
    );
    wrapper.facet(
            new Facet("price",priceWrapper),
            new Facet("year",yearWrapper)
    );
    

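    First facet
    The first facet groups the input documents by price. The buckets have the following boundaries:

    - [0, 200), inclusive of the lower boundary 0 and exclusive of the upper boundary 200.

    - [200, 400), inclusive of the lower boundary 200 and exclusive of the upper boundary 400.

    - "Other", the default bucket, which holds documents that have no price or whose price falls outside the ranges above.


    The $bucket stage includes an output document that determines the fields to return:

    Field         Description
    _id           The inclusive lower boundary of the bucket.
    count         The number of documents in the bucket.
    artwork       An array of documents, one for each artwork in the bucket.
    averagePrice  Uses the $avg operator to show the average price of all artwork in the bucket.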
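
To see where the averagePrice values come from, the per-bucket average can be recomputed by hand. The sketch below uses plain Java with BigDecimal as a stand-in for NumberDecimal; PriceBucketSketch and its methods are hypothetical names for illustration, not MongoPlus API. A null price stands in for a document without a price field, and such values are skipped when averaging, as $avg does for missing or non-numeric values.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the averagePrice accumulator; not MongoPlus code.
class PriceBucketSketch {

    // Average of the non-null prices in [lower, upper); null if none match.
    static BigDecimal averageInRange(List<BigDecimal> prices, BigDecimal lower, BigDecimal upper) {
        BigDecimal sum = BigDecimal.ZERO;
        int n = 0;
        for (BigDecimal p : prices) {
            if (p != null && p.compareTo(lower) >= 0 && p.compareTo(upper) < 0) {
                sum = sum.add(p);
                n++;
            }
        }
        return n == 0 ? null : sum.divide(BigDecimal.valueOf(n), MathContext.DECIMAL128);
    }

    // Average for the default bucket: non-null prices below the lowest
    // boundary or at or above the highest one; null if none match.
    static BigDecimal averageOfDefault(List<BigDecimal> prices, BigDecimal lowest, BigDecimal highest) {
        BigDecimal sum = BigDecimal.ZERO;
        int n = 0;
        for (BigDecimal p : prices) {
            if (p != null && (p.compareTo(lowest) < 0 || p.compareTo(highest) >= 0)) {
                sum = sum.add(p);
                n++;
            }
        }
        return n == 0 ? null : sum.divide(BigDecimal.valueOf(n), MathContext.DECIMAL128);
    }

    public static void main(String[] args) {
        // Prices of the eight sample artworks in _id order; null means "no price".
        List<BigDecimal> prices = Arrays.asList(
                new BigDecimal("199.99"), new BigDecimal("280.00"), new BigDecimal("76.04"),
                new BigDecimal("167.30"), new BigDecimal("483.00"), new BigDecimal("385.00"),
                null, new BigDecimal("118.42"));
        System.out.println(averageInRange(prices, new BigDecimal("0"), new BigDecimal("200")));
        System.out.println(averageInRange(prices, new BigDecimal("200"), new BigDecimal("400")));
        System.out.println(averageOfDefault(prices, new BigDecimal("0"), new BigDecimal("400")));
    }
}
```

The default bucket holds two documents, but only one of them has a price, so its average is based on that single value.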

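    Second facet
    The second facet groups the input documents by year. The buckets have the following boundaries:

    - [1890, 1910), inclusive of the lower bound 1890 and exclusive of the upper bound 1910.

    - [1910, 1920), inclusive of the lower bound 1910 and exclusive of the upper bound 1920.

    - [1920, 1940), inclusive of the lower bound 1920 and exclusive of the upper bound 1940.

    - "Unknown", the default bucket, which holds documents that have no year or whose year falls outside the ranges above.


    The $bucket stage includes an output document that determines the fields to return:

    Field    Description
    count    The number of documents in the bucket.
    artwork  An array of documents, one for each artwork in the bucket.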
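
The key property of $facet is that every sub-pipeline sees the same input documents and writes its result under its own output field. A rough sketch of that idea in plain Java follows. FacetSketch, Artwork and the helper methods are hypothetical names for illustration, prices are simplified to Double instead of NumberDecimal, and only the count output is modeled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of $facet running two $bucket groupings; not MongoPlus code.
class FacetSketch {

    static final class Artwork {
        final String title;
        final Integer year;   // null when the document has no year
        final Double price;   // null when the document has no price

        Artwork(String title, Integer year, Double price) {
            this.title = title;
            this.year = year;
            this.price = price;
        }
    }

    // Inclusive lower bound of the matching range, or defaultId otherwise.
    static Object bucketId(Number value, List<Integer> boundaries, Object defaultId) {
        if (value == null) {
            return defaultId;
        }
        double v = value.doubleValue();
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            if (v >= boundaries.get(i) && v < boundaries.get(i + 1)) {
                return boundaries.get(i);
            }
        }
        return defaultId;
    }

    static Map<Object, Integer> bucketCounts(List<Artwork> docs, Function<Artwork, Number> groupBy,
                                             List<Integer> boundaries, Object defaultId) {
        Map<Object, Integer> counts = new LinkedHashMap<>();
        for (Artwork doc : docs) {
            counts.merge(bucketId(groupBy.apply(doc), boundaries, defaultId), 1, Integer::sum);
        }
        return counts;
    }

    // Each facet processes the full input list independently.
    static Map<String, Map<Object, Integer>> facet(List<Artwork> docs) {
        Map<String, Map<Object, Integer>> result = new LinkedHashMap<>();
        result.put("price", bucketCounts(docs, a -> a.price, Arrays.asList(0, 200, 400), "Other"));
        result.put("year", bucketCounts(docs, a -> a.year, Arrays.asList(1890, 1910, 1920, 1940), "Unknown"));
        return result;
    }

    static List<Artwork> sampleArtwork() {
        return Arrays.asList(
                new Artwork("The Pillars of Society", 1926, 199.99),
                new Artwork("Melancholy III", 1902, 280.00),
                new Artwork("Dancer", 1925, 76.04),
                new Artwork("The Great Wave off Kanagawa", null, 167.30),
                new Artwork("The Persistence of Memory", 1931, 483.00),
                new Artwork("Composition VII", 1913, 385.00),
                new Artwork("The Scream", 1893, null),
                new Artwork("Blue Flower", 1918, 118.42));
    }

    public static void main(String[] args) {
        System.out.println(facet(sampleArtwork()));
    }
}
```

Note that a document can land in a regular bucket in one facet and in the default bucket in the other: "The Scream" has a year but no price, and "The Great Wave off Kanagawa" has a price but no year.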

    Output
    The operation returns the following document:

    {
       "price" : [ // Output of first facet
         {
           "_id" : 0,
           "count" : 4,
           "artwork" : [
             { "title" : "The Pillars of Society", "price" : NumberDecimal("199.99") },
             { "title" : "Dancer", "price" : NumberDecimal("76.04") },
             { "title" : "The Great Wave off Kanagawa", "price" : NumberDecimal("167.30") },
             { "title" : "Blue Flower", "price" : NumberDecimal("118.42") }
           ],
           "averagePrice" : NumberDecimal("140.4375")
         },
         {
           "_id" : 200,
           "count" : 2,
           "artwork" : [
             { "title" : "Melancholy III", "price" : NumberDecimal("280.00") },
             { "title" : "Composition VII", "price" : NumberDecimal("385.00") }
           ],
           "averagePrice" : NumberDecimal("332.50")
         },
         {
           // Includes documents without prices and prices greater than 400
           "_id" : "Other",
           "count" : 2,
           "artwork" : [
             { "title" : "The Persistence of Memory", "price" : NumberDecimal("483.00") },
             { "title" : "The Scream" }
           ],
           "averagePrice" : NumberDecimal("483.00")
         }
       ],
       "year" : [ // Output of second facet
         {
           "_id" : 1890,
           "count" : 2,
           "artwork" : [
             { "title" : "Melancholy III", "year" : 1902 },
             { "title" : "The Scream", "year" : 1893 }
           ]
         },
         {
           "_id" : 1910,
           "count" : 2,
           "artwork" : [
             { "title" : "Composition VII", "year" : 1913 },
             { "title" : "Blue Flower", "year" : 1918 }
           ]
         },
         {
           "_id" : 1920,
           "count" : 3,
           "artwork" : [
             { "title" : "The Pillars of Society", "year" : 1926 },
             { "title" : "Dancer", "year" : 1925 },
             { "title" : "The Persistence of Memory", "year" : 1931 }
           ]
         },
         {
           // Includes documents without a year
           "_id" : "Unknown",
           "count" : 1,
           "artwork" : [
             { "title" : "The Great Wave off Kanagawa" }
           ]
         }
       ]
     }