Today I Learned

7 posts about #sql

pre-filtering a table to speed up a query in MS SQL

In one of my tasks I had to edit an enormous query responsible for displaying a complicated table on one of our views. As part of this task I had to join a table and take only its first matching row for a further calculation. The solution looks something like this:

-- lots of SQL here...

LEFT JOIN (
  SELECT
    id,
    post_id,
    ROW_NUMBER() OVER (
      PARTITION BY post_id
      ORDER BY id DESC
    ) row_num
  FROM
    comments
  WHERE
    deleted_at IS NULL
) c ON c.post_id = lctx.post_id AND c.row_num = 1
LEFT JOIN ...

-- lots of SQL here...

Unfortunately, this happened to be quite slow because of the enormous amount of data we were working with.

So by changing the offending query to pre-filter the comments into a table variable first, I was able to get the query down from 15 seconds to 3 on the biggest account, which was acceptable:

DECLARE @comments_sub TABLE (id INT, post_id INT)

INSERT INTO @comments_sub (id, post_id)
  SELECT c.id, c.post_id
  FROM (
    SELECT
      c.id,
      c.post_id,
      ROW_NUMBER() OVER (
        PARTITION BY c.post_id
        ORDER BY c.id DESC
      ) row_num
    FROM comments c
    INNER JOIN lc_comments lcc ON c.post_id = lcc.post_id
    WHERE c.deleted_at IS NULL
  ) c
  WHERE c.row_num = 1

and then in the main query, instead of the first snippet, we can just use:

-- lots of SQL here...

LEFT JOIN @comments_sub c ON c.post_id = lctx.post_id
LEFT JOIN ...

-- lots of SQL here...
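A side note on the construct itself: I used a table variable (@comments_sub) here, and in SQL Server table variables don't get column statistics (at least before the deferred compilation added in SQL Server 2019), so for very large intermediate result sets a temporary table (#comments_sub) may let the optimizer do a better job - worth measuring on your own data.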

Postgres timezone-aware fields don't store zone info

Let's assume you have two columns in your database: time_without_tz and time_with_tz.
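A minimal table definition for this example might look like the following (the table name timestamps is taken from the INSERT below; timestamp and timestamptz are the Postgres shorthands for the two column types):

CREATE TABLE timestamps (
  time_without_tz timestamp,   -- timestamp without time zone
  time_with_tz    timestamptz  -- timestamp with time zone
);

You can now add some data: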

INSERT INTO timestamps (time_without_tz, time_with_tz)
VALUES ('2019-10-19 10:30:00+02', '2019-10-19 10:30:00+02');

You can now query the data and get the following output:

time_without_tz     | time_with_tz
--------------------+-----------------------
2019-10-19 10:30:00 | 2019-10-19 10:30:00+02

As you can see, the time_with_tz column returned the timestamp with a +02 offset.

What's really important here: Postgres does not store any information about the time zone at all. If you pass time zone information when you write the data, it is only used by Postgres to convert the value to UTC. When querying, all timezone-aware fields are converted from the stored UTC value back to a local value, based on the server's (or session's) timezone setting. There is no way to retrieve the time zone that was used to save the record - that information is gone.
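To see the conversion in action, you can change the session's timezone and query the same row again (a sketch; the exact output depends on your server's TimeZone setting):

SET timezone = 'UTC';

SELECT time_with_tz FROM timestamps;

time_with_tz
-----------------------
2019-10-19 08:30:00+00

It is the same UTC instant as before - only the rendering changed.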

It may be an obvious fact for most of you, but still, I think it's worth noting.


PostgreSQL "DROP DATABASE..." RTFM

I had over 200 databases to move from one server to a new one. When a test "import" succeeded, I was ready for the final-final production move.

But I had to clean up the server first and remove all the existing databases from the testing stage, so I even created a SQL function:

DO $$ DECLARE
    db RECORD;
BEGIN
    FOR db IN (SELECT datname FROM pg_database WHERE datdba != 10 AND datdba != 16385) LOOP
        EXECUTE 'DROP DATABASE IF EXISTS ' || quote_ident(db.datname);
    END LOOP;
END $$;

but it did not work, as PostgreSQL does not allow dropping a database from inside a function - DROP DATABASE cannot run inside a transaction block, and a DO block always runs in one 😂

Finally, I used a shell script to trigger the DROP DATABASE statements.
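For completeness: the same cleanup can also be done straight from psql with the \gexec meta-command (available since psql 9.6). It executes every value of the query result as a separate top-level statement, so the transaction-block restriction does not apply. A minimal sketch, reusing the datdba filter from above:

SELECT format('DROP DATABASE IF EXISTS %I', datname)
FROM pg_database
WHERE datdba != 10 AND datdba != 16385
\gexec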

integer limit is adjustable in an ActiveRecord migration
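Note that :limit maps to a different column type depending on the database adapter - the comments below show what the MySQL adapter generates: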

create_table 'example' do |t|
  t.integer :int                 # int (4 bytes, max 2,147,483,647)
  t.integer :int1, :limit => 1   # tinyint (1 byte, -128 to 127)
  t.integer :int2, :limit => 2   # smallint (2 bytes, max 32,767)
  t.integer :int3, :limit => 3   # mediumint (3 bytes, max 8,388,607)
  t.integer :int4, :limit => 4   # int (4 bytes)
  t.integer :int5, :limit => 5   # bigint (8 bytes, max 9,223,372,036,854,775,807)
  t.integer :int8, :limit => 8   # bigint (8 bytes)
  t.integer :int11, :limit => 11 # int (4 bytes)
end
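For comparison, the PostgreSQL adapter has no 3-byte integer type: as far as I know it maps limit 1-2 to smallint, 3-4 (and no limit) to integer, and 5-8 to bigint, raising an error for anything else.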