Today I Learned

3 posts by łukaszlapczyk

Postgres timezone-aware fields don't store zone info

Let’s assume your database has a timestamps table with two columns: time_without_tz and time_with_tz.
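A minimal setup might look like this (the column types are an assumption based on the names and the topic of this post):

CREATE TABLE timestamps (
  time_without_tz timestamp without time zone,
  time_with_tz    timestamp with time zone
);

You can now add some data: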

INSERT INTO timestamps (time_without_tz, time_with_tz)
VALUES ('2019-10-19 10:30:00+02', '2019-10-19 10:30:00+02');

You can now query the data:
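
SELECT * FROM timestamps;

This gives the following output: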

time_without_tz     | time_with_tz
--------------------+-----------------------
2019-10-19 10:30:00 | 2019-10-19 10:30:00+02

As you can see, the time_with_tz column returned the timestamp with a +02 offset.

What’s really important here: Postgres does not store any time zone information at all. If you pass a time zone offset when writing the data, it is only used by Postgres to convert the value to UTC. When querying, all timezone-aware fields are converted from the stored UTC value back to a local value, based on the time zone configured for the session (or the database server’s default). There is no way to retrieve the time zone that was used to save the record - that information is gone.
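You can see this conversion by switching the session time zone and querying again (a quick sketch, reusing the table above):

SET timezone = 'UTC';

SELECT time_with_tz FROM timestamps;

time_with_tz
-----------------------
2019-10-19 08:30:00+00

The stored instant is the same; only the offset used for display has changed.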

It may be an obvious fact for most of you, but I still think it’s worth noting.

Sources: 1, 2

Long polling in BroadwaySQS

By default, BroadwaySQS doesn’t wait for a message to arrive if no messages are available in the queue. On top of that, the producer waits 5 seconds (the default receive_interval) before making another request for more messages. This can result in a gap of a few seconds between enqueuing a message and processing it. In many cases that’s good enough, but sometimes you need a message to be processed immediately after it is enqueued. In such cases you should consider using long polling.

To achieve this in BroadwaySQS, set wait_time_seconds to any value from 1 to 20 (the example below uses 20) and receive_interval to 0 in the producer’s options.

Example:

def start_link(_opts) do
  producer = Application.get_env(:scanner, :broadway_producer, BroadwaySQS.Producer)
  queue_url = Application.get_env(:ex_aws_sqs, :queue_url)

  Broadway.start_link(__MODULE__,
    name: __MODULE__,
    producers: [
      default: [
        module: {
          producer,
          queue_url: queue_url,
          # SQS long polling: wait up to 20 seconds for a message to arrive
          wait_time_seconds: 20,
          # don't pause between receive requests
          receive_interval: 0
        },
        stages: 1
      ]
    ],
    processors: [
      default: [stages: 50]
    ]
  )
end

Use `pluck` to fetch paginated results from S3 client

Some AWS client calls return paginated responses, limited to a certain number of items (typically 1,000 per response).

An example response may look as follows:

aws_client.list_objects_v2(bucket: bucket)

=> #<struct Aws::S3::Types::ListObjectsV2Output
 is_truncated=true,
 contents=
 [#<struct Aws::S3::Types::Object
    key="reports/report_2.csv",
    last_modified=2019-03-13 14:25:04 UTC,
    etag="\"5a7c05eb47dcd13a27a26d34eb13b0ec\"",
    size=466,
    storage_class="STANDARD",
    owner=nil>,
    ...
 ],
 name="awesome-bucket",
 prefix="",
 delimiter=nil,
 max_keys=1000,
 common_prefixes=[],
 encoding_type=nil,
 key_count=1000,
 continuation_token=nil,
 next_continuation_token="1wEBwtqJOGmZF5DXgu5UhTMv386wdtND0EQzkkOUEGPPeF8tC58BEbfBvfsVHKGnxNgHxvFARrcWdCPJXXgiMzUtpedrxZP2G9wu/0but8ALLHDGdZVD4OHb41DWQKocGGAOwr0wfOeN4hUoCzimKeA==",
 start_after=nil>

Because the list_objects_v2 method takes continuation_token as an argument, one way to fetch all the records is to loop through the responses, passing each response’s next_continuation_token to the next call until it comes back empty - as sketched below.
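A manual version might look roughly like this (a sketch, assuming aws_client and bucket are defined as above):

# Collect objects from every page by following next_continuation_token
objects = []
params = { bucket: bucket }

loop do
  response = aws_client.list_objects_v2(params)
  objects.concat(response.contents)

  # stop when there are no more pages to fetch
  break unless response.next_continuation_token

  params[:continuation_token] = response.next_continuation_token
end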

Instead, you can use the built-in enumerator on the response object, which returns results from all the pages (subsequent pages are fetched automatically by the SDK):

aws_client.list_objects_v2(bucket: bucket).map { |page| page[:contents] }

=> [[#<struct Aws::S3::Types::Object
   key="reports/report_2.csv",
   last_modified=2019-03-13 14:25:04 UTC,
   etag="\"5a7c05eb47dcd13a27a26d34eb13b0ec\"",
   size=466,
   storage_class="STANDARD",
   owner=nil>,
  #<struct Aws::S3::Types::Object
   key="reports/report_1.csv",
   last_modified=2019-03-13 13:43:30 UTC,
   etag="\"dc7215c066f62c7ddedef78e123dbc7c\"",
   size=191722,
   storage_class="STANDARD",
   owner=nil>,
   ... ]

However, there is an even simpler way to achieve the same result. You can use the pluck method (an ActiveSupport extension to Enumerable) as follows:

aws_client.list_objects_v2(bucket: bucket).pluck(:contents)