Today I Learned

Use `pluck` to fetch paginated results from the S3 client

Some AWS client calls return only a limited amount of data per response (typically 1,000 items).

An example response may look as follows:

aws_client.list_objects_v2(bucket: bucket)

=> #<struct Aws::S3::Types::ListObjectsV2Output
 is_truncated=true,
 contents=
 [#<struct Aws::S3::Types::Object
    key="reports/report_2.csv",
    last_modified=2019-03-13 14:25:04 UTC,
    etag="\"5a7c05eb47dcd13a27a26d34eb13b0ec\"",
    size=466,
    storage_class="STANDARD",
    owner=nil>,
    ...
 ]
 name="awesome-bucket",
 prefix="",
 delimiter=nil,
 max_keys=1000,
 common_prefixes=[],
 encoding_type=nil,
 key_count=1000,
 continuation_token=nil,
 next_continuation_token="1wEBwtqJOGmZF5DXgu5UhTMv386wdtND0EQzkkOUEGPPeF8tC58BEbfBvfsVHKGnxNgHxvFARrcWdCPJXXgiMzUtpedrxZP2G9wu/0but8ALLHDGdZVD4OHb41DWQKocGGAOwr0wfOeN4hUoCzimKeA==",
 start_after=nil>

Because the `list_objects_v2` method accepts `continuation_token` as an argument, one way to fetch all the records is to loop over the responses, passing each `next_continuation_token` to the next call until that field comes back empty.
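The manual loop could be sketched roughly as below. To keep the sketch runnable without AWS credentials, it uses `FakeS3Client` and `ListPage`, which are hypothetical stand-ins that only mimic the `contents` and `next_continuation_token` fields from the response above. `list_all_objects` is likewise an illustrative helper name, not part of the SDK.

```ruby
# Hypothetical stand-in for one page of a list_objects_v2 response.
ListPage = Struct.new(:contents, :is_truncated, :next_continuation_token, keyword_init: true)

# Hypothetical stand-in for Aws::S3::Client; serves keys in small pages.
class FakeS3Client
  def initialize(keys, page_size: 2)
    @pages = keys.each_slice(page_size).to_a
  end

  # Mimics the keyword arguments the loop below passes to the client.
  # The token here is just the index of the next page, encoded as a string.
  def list_objects_v2(bucket:, continuation_token: nil)
    index = continuation_token.to_i
    last_page = index >= @pages.size - 1
    ListPage.new(
      contents: @pages.fetch(index, []),
      is_truncated: !last_page,
      next_continuation_token: last_page ? nil : (index + 1).to_s
    )
  end
end

# Manual pagination: request pages until no continuation token is returned.
def list_all_objects(client, bucket)
  objects = []
  token = nil
  loop do
    params = { bucket: bucket }
    params[:continuation_token] = token if token
    page = client.list_objects_v2(**params)
    objects.concat(page.contents)
    token = page.next_continuation_token
    break if token.nil?
  end
  objects
end

client = FakeS3Client.new(%w[reports/report_1.csv reports/report_2.csv reports/report_3.csv])
all_keys = list_all_objects(client, "awesome-bucket")
```

With the real SDK, the same helper should work when given an `Aws::S3::Client` instance, assuming the response exposes `contents` and `next_continuation_token` as in the output above.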

Instead, you can use the built-in enumerator on the response object, which yields every page of results (subsequent pages are fetched automatically by the SDK):

aws_client.list_objects_v2(bucket: bucket).map { |page| page[:contents] }

=> [[#<struct Aws::S3::Types::Object
   key="reports/report_2.csv",
   last_modified=2019-03-13 14:25:04 UTC,
   etag="\"5a7c05eb47dcd13a27a26d34eb13b0ec\"",
   size=466,
   storage_class="STANDARD",
   owner=nil>,
  #<struct Aws::S3::Types::Object
   key="reports/report_1.csv",
   last_modified=2019-03-13 13:43:30 UTC,
   etag="\"dc7215c066f62c7ddedef78e123dbc7c\"",
   size=191722,
   storage_class="STANDARD",
   owner=nil>,
   ... ]

However, there is an even simpler way to achieve the same result. You can use the `pluck` method (added to Enumerable by ActiveSupport, so it is available in Rails apps) as follows:

aws_client.list_objects_v2(bucket: bucket).pluck(:contents)
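
Note that both one-liners return one inner array per page, as the `[[` in the output above shows. If a single flat list is more convenient, `flat_map` is one option. The sketch below illustrates the difference in plain Ruby. `FakePagedResponse` and `ResultPage` are hypothetical stand-ins for the SDK's enumerable response, used only so the example runs without AWS access.

```ruby
# Hypothetical stand-in for one page of results.
ResultPage = Struct.new(:contents)

# Hypothetical stand-in for the SDK's pageable response:
# an Enumerable that yields one page at a time.
class FakePagedResponse
  include Enumerable

  def initialize(pages)
    @pages = pages
  end

  def each(&block)
    @pages.each(&block)
  end
end

response = FakePagedResponse.new([
  ResultPage.new(%w[reports/report_1.csv reports/report_2.csv]),
  ResultPage.new(%w[reports/report_3.csv])
])

# Same shape that map (and ActiveSupport's pluck) produce: one array per page.
per_page = response.map { |page| page[:contents] }

# flat_map merges all pages into a single flat list.
flat = response.flat_map { |page| page[:contents] }
```

In a Rails app, calling `.flatten` on the `pluck(:contents)` result should give the same flat list.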