How to handle deletion of a large number of rows in a SQL table


Problem
In order to stay in a stable state, SQL Server has to record every row it deletes, with all its values, and it does so by writing the deleted rows into the transaction log file. This enables it to return to a defined state if the deletion is interrupted. Just imagine a power outage scenario: when SQL Server starts again, it can decide to roll back the whole transaction or to roll it forward, as it still has the data to do so in the log file. That is why deleting rows is an action in which SQL Server always has to write data. So if someone tries to delete millions of rows in one transaction, SQL Server may exceed the hardware resource limits, especially available disk space and memory.
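
If you want to watch this effect while a large delete runs, SQL Server provides a standard DBCC command that shows how full each transaction log is. It is not part of the script that follows, just a way to see the log grow:

-- show the size and the used percentage of every database's transaction log
dbcc sqlperf(logspace)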

Solution
To avoid a system crash, the deletion should be divided into transactions that the system can handle. To do so, one can work with chunks or portions. The top (<number>) clause is the right way to limit a data manipulation statement. The idea is to run the deletion of a number of rows in a loop until all rows are deleted that meet the given criteria, usually set by a where clause. During the deletion process it is recommended to keep an eye on the ratio of deleted rows to the time it took to delete them, namely rows per second. Different chunk sizes will yield different rows per second, so we should keep track of all the deletions done in the while loop. To do so we need a log file containing each step and the values to compute that rows-per-second ratio. Additionally, we should build in a way to change the chunk size while the loop is running.

This is done by the following script:

-- set up the log file if it does not exist
if object_id('delete_record3_log') is null
create table delete_record3_log (start datetime2(3) not null, stop datetime2(3), chunk int, duration bigint, NofRows int, ratio float)

-- write the chunk size to start with into the control line of the log file.
-- The control line is identified by a start value of '19000101'; all other fields stay NULL except the chunk size.

-- set up the control line if it does not yet exist and initialize the chunk size
if (select count(1) from delete_record3_log) = 0
insert into delete_record3_log (start, chunk) select '19000101', 1000

-- declaration of the variables needed to get the ratio
declare @chunk int
declare @rows int = 1 -- @rows is set to 1 so that the while loop starts
declare @duration bigint
declare @ratio float
declare @start datetime2(3)
declare @stop datetime2(3)

while @rows > 0 begin -- loop until all rows matched by the where clause are deleted
    select @chunk = chunk from delete_record3_log where start = '19000101' -- get the chunk size from the control line
    set @start = sysdatetime()
    delete top (@chunk) from DataRecord3
    where timestamp < getdate()-60 -- rows older than 60 days
    set @rows = @@ROWCOUNT
    set @stop = sysdatetime()
    set @duration = datediff(ms, @start, @stop)
    if @duration > 0
        set @ratio = @rows*1000.0/@duration -- rows per second
    else
        set @ratio = null -- avoid carrying over the ratio of the previous step

    insert into delete_record3_log
    select @start, @stop, @chunk, @duration, @rows, @ratio -- write the data of this step into the log file
end

To get the number of rows still to be deleted and the number of rows that will remain in the table, you can work with count(1) or count(*):
select count(1) from DataRecord3 where TimeStamp < getdate()-60 -- rows still to be deleted
select count(1) from DataRecord3 where TimeStamp >= getdate()-60 -- rows that will remain
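
If you want a rough idea of how long the remaining work will take, you can combine the two: the sketch below divides the remaining row count by the latest ratio recorded in the log file. The table and column names match the script above; treat the result as an estimate only, since the ratio fluctuates from step to step:

-- rough estimate of the remaining runtime, assuming the latest ratio holds
declare @remaining int, @last_ratio float
select @remaining = count(1) from DataRecord3 where TimeStamp < getdate()-60
select top (1) @last_ratio = ratio from delete_record3_log
where start <> '19000101' and ratio is not null
order by start desc
select @remaining / nullif(@last_ratio, 0) as estimated_seconds_left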

To look into the log file and to change the chunk size, open an additional query window and observe how the deletion proceeds:
select * from delete_record3_log order by start desc -- list the outcome of all steps
select * from delete_record3_log where start = '19000101' -- shows the chunk size currently in use
update delete_record3_log set chunk = <new-value> -- change the chunk size on the fly
where start = '19000101'

-- get a list that shows which chunk size provided the best ratio (the control line is excluded)
select chunk, avg(ratio) as avg_ratio, count(1) as NofSteps from delete_record3_log where start <> '19000101' group by chunk order by 2 desc
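
Once the log file contains a few steps per chunk size, you could even feed the winner back into the control line. A minimal sketch, assuming the table layout from the script above:

-- set the control line to the best-performing chunk size so far
update delete_record3_log
set chunk = (select top (1) chunk from delete_record3_log
             where start <> '19000101' and ratio is not null
             group by chunk
             order by avg(ratio) desc)
where start = '19000101'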

Using that technique you will be able to delete as many rows as you want without exceeding the resource limits. You can keep the log file we created in our session or delete it to start again from scratch; be aware that the chunk size is then reset to the initial value in the script. Feel free to start with another value than 1000, as we have seen that the system is able to handle much larger chunk sizes.
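
To start from scratch, dropping the log table is enough; the script above recreates it and re-initializes the control line on the next run. In the same style as the setup check:

-- start over: remove the log file; the script recreates it with the initial chunk size
if object_id('delete_record3_log') is not null
drop table delete_record3_log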