I performed one more test of this subject and found the results quite interesting. A RAID5 7+1 group on a high-end disk array was used as a dedicated device for the redo log files (RHEL 5.5, redo filesystem formatted as ext2, Oracle 11.2.0.3).
The load generator script is simple and commits after each (!) insert.
create table &1 (id number, v varchar2(2000)) tablespace users;
BEGIN
   FOR i IN 1 .. 100000000 LOOP
      -- keep the table small: truncate it every 10,000 rows
      IF MOD (i, 10000) = 0 THEN
         EXECUTE IMMEDIATE 'truncate table &1 drop storage';
      END IF;
      INSERT INTO &1
      VALUES (
         i,
'111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111');
      COMMIT;   -- commit after every single insert
   END LOOP;
END;
/
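The script takes the target table name as the SQL*Plus positional parameter &1, so each concurrent load process is simply another SQL*Plus session running the same script against its own table. The file and table names below are only an illustration of how it can be driven, not the exact commands used:

-- assumption: the statements above are saved as redo_load.sql
-- session 1:
SQL> @redo_load.sql redo_t1
-- session 2:
SQL> @redo_load.sql redo_t2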
I got roughly 43 MB/s of redo with 2 concurrent processes running. await is an amazing ~0.55 ms! All writes are served by the cache of the storage array, and all redo-related wait events are negligible.
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
dm-12 0.00 0.00 0.00 1675.80 0.00 52.79 64.52 0.90 0.54 0.53 89.19
dm-12 0.00 0.00 0.00 1835.00 0.00 54.10 60.38 0.94 0.51 0.51 92.69
dm-12 0.00 0.00 0.00 1619.60 0.00 52.02 65.78 0.93 0.57 0.57 92.02
dm-12 0.00 0.00 0.00 1645.40 0.00 50.96 63.43 0.94 0.57 0.56 92.72
dm-12 0.00 0.00 0.00 1408.79 0.00 49.82 72.42 0.93 0.66 0.64 90.36
...
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
Redo size: 43,526,576.9 2,291.7
Logical reads: 117,138.2 6.2
Block changes: 124,364.1 6.6
Physical reads: 2.1 0.0
Physical writes: 9,053.5 0.5
Executes: 19,003.3 1.0
Transactions: 18,993.4
...
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg
wait % DB
Event Waits Time(s) (ms) time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
DB CPU 188 99.4
log buffer space 22 1 34 .4 Configurat
Data file init write 516 0 0 .1 User I/O
log file switch completion 20 0 4 .0 Configurat
log file sync 2 0 39 .0 Commit
Seven parallel processes produce 86 MB of redo per second. avgrq-sz rose by a factor of 10, and hence await rose to ~5 ms as well. I have no idea why :(
Redo-related waits are now ~35% of the total waits.
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
dm-12 0.00 0.00 0.00 325.90 0.00 103.96 653.29 1.77 5.44 2.81 91.57
dm-12 0.00 0.00 0.00 365.70 0.00 108.86 609.61 1.82 4.97 2.48 90.77
dm-12 0.00 0.00 0.00 343.80 0.00 107.91 642.84 1.77 5.14 2.67 91.80
dm-12 0.00 0.00 0.00 370.03 0.00 119.00 658.65 1.76 4.76 2.44 90.39
dm-12 0.00 0.00 0.00 362.60 0.00 113.42 640.62 1.76 4.86 2.50 90.48
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
Redo size: 86,854,088.7 2,312.4
Logical reads: 243,571.1 6.5
Block changes: 251,819.1 6.7
Physical reads: 40.0 0.0
Physical writes: 20,626.0 0.6
Executes: 37,959.4 1.0
Transactions: 37,560.4
....
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg
wait % DB
Event Waits Time(s) (ms) time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
DB CPU 2,053 58.6
log buffer space 36,487 858 24 24.5 Configurat
local write wait 4,956 343 69 9.8 User I/O
enq: RO - fast object reuse 3,140 138 44 3.9 Applicatio
enq: HW - contention 16,982 85 5 2.4 Configurat
Conclusion: we don't need SSD devices for redo on our storage array, but entry-level arrays and standalone servers should still be tested.
Hi Alex,
Kudos for testing your redo device.
While you reached the right conclusion (no need for SSD), the data you show can't be used to reach this conclusion...
First of all, you ran inserts in a tight PL/SQL loop, once with 2 concurrent processes and once with 7 concurrent processes. Are you sure this represents your production workload?
You didn't post traces, but I suspect that most of the CPU time was spent on PL/SQL looping. Since the system was CPU starved the entire time, you did not really test the disks at all. Note that "log file sync" is negligible in both of your tests, so writing the redo log was never an issue.
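(For reference, the classic way to get that CPU-versus-wait split is an extended SQL trace on one of the load sessions; the level-8 event below is the standard incantation, and the tkprof step is implied:)

-- enable extended SQL trace (level 8 includes wait events) in a load session
ALTER SESSION SET events '10046 trace name context forever, level 8';
-- ... run the load ...
ALTER SESSION SET events '10046 trace name context off';
-- then summarize the trace file with tkprof to see CPU vs. waits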
If this is really your workload, then even your high-end array is overkill for the redo logs; but in reality you just never stressed the disk system, so it's hard to tell.
The second thing that looks suspicious is that in the second test about 25% of the DB time is spent waiting for log buffer space. I'd allocate a larger log buffer to get rid of this event and maybe finally see "log file sync" as a significant wait event.
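(For illustration only: log_buffer is a static parameter, so enlarging it means an spfile change plus an instance restart, and the 128 MB value here is just an example, not a recommendation:)

-- example only: enlarge the log buffer (static parameter, requires restart)
ALTER SYSTEM SET log_buffer = 134217728 SCOPE = SPFILE;
-- restart the instance, re-run the test, and re-check the AWR wait profile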
The third thing that jumps to mind is "local write wait". This event is extremely rare in production, and it shows that a significant amount of the wait time in test 2 can be attributed to flushing blocks to disk after the truncates. You truncated a large table, with most of its blocks in memory, from several sessions concurrently. This is definitely not a real-world load.
From your iostat results, it looks like the data and redo are on the same disks? If so, it is difficult to tell what you are really seeing in iostat - is it the writes of LGWR or DBWR?
Thanks again for posting detailed results. You definitely got some interesting waits there, and they are worth more investigation.
Thank you for your feedback!
Yes, in real production the workload would differ because of reads by ARCH and other application logic. And yes, it is possible to optimize the second test.
But the main point is to show that it is possible to achieve high redo bandwidth and excellent redo service times even on RAID5…
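(As a cross-check from inside the database, the latency distribution of LGWR's writes can be pulled from v$event_histogram; this is just a generic query, not output from the tests above:)

-- distribution of LGWR write latencies since instance startup
SELECT wait_time_milli, wait_count
  FROM v$event_histogram
 WHERE event = 'log file parallel write'
 ORDER BY wait_time_milli;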
> Since the system was CPU starved the entire time
CPU usage was ~30-40% for the second test, with 16 physical cores in total.
> it looks like the data and redo are on the same disks?
No. The redo logs are on a separate RAID5 LUN.
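For reference, the file-to-device mapping is easy to confirm from the dictionary (compare the paths with the mount point behind dm-12):

-- redo log members vs. datafiles: check which filesystems they live on
SELECT member FROM v$logfile;
SELECT name   FROM v$datafile;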
Do you have any idea why Oracle uses larger writes in the second test?
Oracle batches multiple transactions into larger writes. http://www.solarisinternals.com/si/reading/oracle_fsperf.pdf describes in detail how Solaris deals with this (or did 10 years ago, anyway). So all those parallel writes get bundled together.
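(A rough way to see this batching from the database side is to compare the 'redo writes' and 'redo blocks written' statistics; the 512-byte redo block size assumed below is the usual value on Linux:)

-- approximate average LGWR write size since instance startup
SELECT w.value AS redo_writes,
       b.value AS redo_blocks_written,
       ROUND(b.value * 512 / w.value) AS avg_write_bytes   -- 512 = assumed redo block size
  FROM v$sysstat w, v$sysstat b
 WHERE w.name = 'redo writes'
   AND b.name = 'redo blocks written';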