2. OpenERP can handle large
volumes of transactions and
large volumes of data
out of the box!
3. For example, each OpenERP
Online Server hosts 1000+
databases without
breaking a sweat!
4. Many on-site customers have
single server deployments
with millions of rows:
partners, emails, attachments,
journal items, stock moves,
workflow items, …
11. t @odony
Hardware Sizing
● Typical modern server machine
● 4/8/12 2+GHz cores
● 8-64 GB RAM
● Fast SATA/SAS/SSD disks
● Up to 100-200 active users (multi-process)
● Up to dozens of HTTP requests per second
● Up to 1000 “light” users (average SaaS user)
● For official OpenERP 7.0 deployments with no
customizations, and typical usage!
12. For anything else, always
perform proper load testing
before going live in production!
Then size accordingly...
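A load test boils down to replaying realistic requests and looking at the distribution of response times. The sketch below is only a minimal timing harness under stated assumptions: `measure`, `percentile` and `summarize` are hypothetical helper names, and the `call` argument stands in for whatever request your scenario sends to the server (it is not an OpenERP API).

```python
import math
import time


def measure(call, repetitions):
    """Run `call` repeatedly and return the duration of each run in seconds."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def percentile(durations, pct):
    """Nearest-rank percentile of a non-empty list of durations."""
    ordered = sorted(durations)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(durations):
    """Summarize a run: count, median, 95th percentile and worst case."""
    return {
        "count": len(durations),
        "median": percentile(durations, 50),
        "p95": percentile(durations, 95),
        "max": max(durations),
    }
```

A real test would also run several clients in parallel and use production-like data; this only shows how to turn raw timings into numbers you can size against.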
13.
PostgreSQL Deployment tips
● Avoid deploying PostgreSQL on a VM
● If you must do it, fine-tune the VM for I/O!
● And always check out the basic PostgreSQL
performance tuning, it's conservative by
default
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
15.
Monitor application response times
● You can't manage/improve what you can't
measure
● Set up automated monitoring of performance
and response time... even if you have no
performance problem!
● Suggested tool: munin
● Run OpenERP with --log-level=debug_rpc in prod!
2013-07-03 00:12:29,846 9663 DEBUG test openerp.netsvc.rpc.request:
object.execute_kw time:0.031s mem: 763716k -> 763716k (diff: 0k)('test', 1,
'*', 'sale.order', 'read', (...), {...})
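Each debug_rpc line carries the timestamp, process id, database, RPC method, elapsed time and memory before/after. As a rough sketch, assuming the log format is exactly the one shown in the sample line above, a regular expression can turn a line into a dictionary; `RPC_LINE` and `parse_rpc_line` are illustrative names, not part of OpenERP.

```python
import re
from datetime import datetime

# The sample line from the slide, joined back onto a single line.
SAMPLE = (
    "2013-07-03 00:12:29,846 9663 DEBUG test openerp.netsvc.rpc.request: "
    "object.execute_kw time:0.031s mem: 763716k -> 763716k (diff: 0k)"
    "('test', 1, '*', 'sale.order', 'read', (...), {...})"
)

RPC_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r"(?P<pid>\d+) DEBUG (?P<db>\S+) openerp\.netsvc\.rpc\.request: "
    r"(?P<call>\S+) time:(?P<seconds>[\d.]+)s "
    r"mem: (?P<mem_before>\d+)k -> (?P<mem_after>\d+)k"
)


def parse_rpc_line(line):
    """Return a dict describing one debug_rpc line, or None if it does not match."""
    match = RPC_LINE.match(line)
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match.group("timestamp"), "%Y-%m-%d %H:%M:%S"),
        "pid": int(match.group("pid")),
        "db": match.group("db"),
        "call": match.group("call"),
        "seconds": float(match.group("seconds")),
        "mem_diff_kb": int(match.group("mem_after")) - int(match.group("mem_before")),
    }
```

Calling `parse_rpc_line(SAMPLE)` yields the request time in seconds and the memory growth in kB, which is the raw material for the munin plugins.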
16.
#!/bin/sh
#%# family=manual
#%# capabilities=autoconf suggest
case $1 in
autoconf)
exit 0
;;
suggest)
exit 0
;;
config)
echo graph_category openerp
echo graph_title openerp rpc request count
echo graph_vlabel num requests/minute in last 5 minutes
echo requests.label num requests
exit 0
;;
esac
# watch out for the time zone of the logs => using date -u for UTC timestamps
since=$(date -u -d '5 min ago' +'%F %H:%M:%S')
result=$(tail -60000 /var/log/openerp.log | grep "object.execute_kw time" |
  awk -v since="$since" 'BEGIN{count=0} ($1 " " $2) >= since { count+=1 } END{print count/5}')
echo "requests.value ${result}"
exit 0
Munin plugin: OpenERP requests/minute
17.
#!/bin/sh
#%# family=manual
#%# capabilities=autoconf suggest
case $1 in
config)
echo graph_category openerp
echo graph_title openerp rpc requests min/average response time
echo graph_vlabel seconds
echo graph_args --units-exponent -3
echo min.label min
echo min.warning 1
echo min.critical 5
echo avg.label average
echo avg.warning 1
echo avg.critical 5
exit 0
;;
esac
# watch out for the time zone of the logs => using date -u for UTC timestamps
since=$(date -u -d '5 min ago' +'%F %H:%M:%S')
# field 8 of each log line looks like "time:0.031s"; print U (unknown) when there is no data
result=$(tail -60000 /var/log/openerp.log | grep "object.execute_kw time" |
  awk -v since="$since" 'BEGIN{sum=0;count=0} ($1 " " $2) >= since {split($8,t,":");time=0+t[2];
  if (min=="") { min=time}; sum += time; count+=1; min=(time>min)?min:time }
  END{if (count) print min, sum/count; else print "U U"}')
echo -n "min.value "
echo ${result} | cut -d" " -f1
echo -n "avg.value "
echo ${result} | cut -d" " -f2
exit 0
Munin plugin: OpenERP min/avg response time
18.
#!/bin/sh
#%# family=manual
#%# capabilities=autoconf suggest
case $1 in
config)
echo graph_category openerp
echo graph_title openerp rpc requests max response time
echo graph_vlabel seconds
echo graph_args --units-exponent -3
echo max.label max
echo max.warning 1
echo max.critical 5
exit 0
;;
esac
# watch out for the time zone of the logs => using date -u for UTC timestamps
since=$(date -u -d '5 min ago' +'%F %H:%M:%S')
result=$(tail -60000 /var/log/openerp.log | grep "object.execute_kw time" |
  awk -v since="$since" 'BEGIN{max=0} ($1 " " $2) >= since {split($8,t,":");time=0+t[2];
  max=(time<max)?max:time } END{print max}')
echo "max.value ${result}"
exit 0
Munin plugin: OpenERP max response time
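The three plugins scan the same log tail three times. As a sketch only, assuming the debug_rpc line layout shown earlier (the 8th whitespace-separated field is `time:0.031s`), the same numbers can be computed in one pass in Python; `rpc_stats` is an illustrative name and is not part of munin or OpenERP.

```python
from datetime import datetime, timedelta


def rpc_stats(lines, now, window_minutes=5):
    """Requests/minute and min/avg/max response time (seconds) for
    object.execute_kw log lines newer than `now - window_minutes`."""
    since = now - timedelta(minutes=window_minutes)
    times = []
    for line in lines:
        if "object.execute_kw time" not in line:
            continue
        fields = line.split()
        # fields[0] and fields[1] hold the date and time; drop the milliseconds
        stamp = datetime.strptime(
            fields[0] + " " + fields[1].split(",")[0], "%Y-%m-%d %H:%M:%S")
        if stamp < since:
            continue
        # fields[7] looks like "time:0.031s" (awk's $8)
        times.append(float(fields[7].split(":")[1].rstrip("s")))
    if not times:
        return {"per_minute": 0.0, "min": None, "avg": None, "max": None}
    return {
        "per_minute": len(times) / float(window_minutes),
        "min": min(times),
        "avg": sum(times) / len(times),
        "max": max(times),
    }
```

Pass `datetime.utcnow()` as `now` if the log timestamps are in UTC, which is the same time zone caveat as in the shell versions.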
19.
Monitor PostgreSQL
● postgresql.conf
● log_min_duration_statement = 50
● Set to 50 or 100 (milliseconds) in production
● Set to 0 to log all queries and execution times for a while
● Instagram gist to capture sample + analyze
● Analyze with pgBadger or pgFouine
● lc_messages = 'C' (untranslated log messages, as the log analyzers expect)
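Before reaching for pgBadger, a few lines of Python can already rank the slowest statements. This is a rough sketch, assuming the untranslated message format `duration: <n> ms  statement: <sql>` that PostgreSQL writes when log_min_duration_statement is set; whatever precedes it depends on your log_line_prefix. `normalize` and `slowest_queries` are hypothetical helper names, and the literal-stripping is a crude heuristic, not what pgBadger does.

```python
import re
from collections import defaultdict

DURATION = re.compile(
    r"duration: (?P<ms>\d+(?:\.\d+)?) ms\s+statement: (?P<sql>.*)")


def normalize(sql):
    """Replace literals with '?' so similar queries group together."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)  # quoted strings
    sql = re.sub(r"\b\d+\b", "?", sql)         # standalone integers
    return re.sub(r"\s+", " ", sql).strip()


def slowest_queries(log_lines, limit=10):
    """Return (normalized SQL, count, total ms), slowest total first."""
    totals = defaultdict(lambda: [0, 0.0])  # normalized SQL -> [count, total ms]
    for line in log_lines:
        match = DURATION.search(line)
        if match:
            entry = totals[normalize(match.group("sql"))]
            entry[0] += 1
            entry[1] += float(match.group("ms"))
    ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    return [(sql, count, total_ms) for sql, (count, total_ms) in ranked[:limit]]
```

Feed it an open log file, for example `slowest_queries(open(path))`, where `path` is wherever your PostgreSQL log lives.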
20.
PostgreSQL Analysis
● Important PG statistic tables
● pg_stat_activity: near real-time view of transactions
● pg_locks: real-time view of existing locks
● pg_stat_user_tables: generic usage stats for all tables
● pg_statio_user_tables: generic I/O stats for all tables
22.
PostgreSQL Analysis: biggest tables
# SELECT nspname || '.' || relname AS "table",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 10;
┌──────────────────────────────────────────┬────────────┐
│ table │ total_size │
├──────────────────────────────────────────┼────────────┤
│ public.stock_move │ 525 MB │
│ public.wkf_workitem │ 111 MB │
│ public.procurement_order │ 80 MB │
│ public.stock_location │ 63 MB │
│ public.ir_translation │ 42 MB │
│ public.wkf_instance │ 37 MB │
│ public.ir_model_data │ 36 MB │
│ public.ir_property │ 26 MB │
│ public.ir_attachment │ 14 MB │
│ public.mrp_bom │ 13 MB │
└──────────────────────────────────────────┴────────────┘
23.
PostgreSQL Analysis: biggest tables
● Consider using the file storage for the
ir.attachment table
● Avoid storing files in the database
● Greatly reduces the time needed for DB backups
and restores
● Very easy to rsync backups of DB dumps +
filestore
● For 7.0 this setting is explained in this FAQ
26.
Useful VIEW to watch locked queries
-- For PostgreSQL 9.1
CREATE VIEW monitor_blocked_queries AS
SELECT
pg_class.relname,
waiter.pid as blocked_pid,
substr(wait_act.current_query,1,30) as blocked_statement,
age(now(),wait_act.query_start) as blocked_duration,
holder.pid as blocking_pid,
substr(hold_act.current_query,1,30) as blocking_statement,
age(now(),hold_act.query_start) as blocking_duration,
waiter.transactionid as xid,
waiter.mode as wmode,
waiter.virtualtransaction as wvxid,
holder.mode as hmode,
holder.virtualtransaction as hvxid
FROM pg_locks holder join pg_locks waiter on (
holder.locktype = waiter.locktype and (
holder.database, holder.relation,
holder.page, holder.tuple,
holder.virtualxid,
holder.transactionid, holder.classid,
holder.objid, holder.objsubid
) IS NOT DISTINCT from (
waiter.database, waiter.relation,
waiter.page, waiter.tuple,
waiter.virtualxid,
waiter.transactionid, waiter.classid,
waiter.objid, waiter.objsubid
))
JOIN pg_stat_activity hold_act ON (holder.pid=hold_act.procpid)
JOIN pg_stat_activity wait_act ON (waiter.pid=wait_act.procpid)
LEFT JOIN pg_class ON (holder.relation = pg_class.oid)
WHERE
wait_act.datname = 'eurogerm' AND
holder.granted AND NOT waiter.granted
ORDER BY blocked_duration DESC;
28.
Useful tool for watching activity: pg_activity
top-like command-line utility to watch queries: running,
blocking, waiting
→ pip install pg_activity
Thanks to @cmorisse for this pointer! :-)
29.
Useful VIEW to watch Locks per transaction
-- For PostgreSQL 9.1
CREATE VIEW monitor_locks AS
SELECT pg_stat_activity.procpid, pg_class.relname, pg_locks.locktype,
pg_locks.transactionid, pg_locks.virtualxid,
pg_locks.virtualtransaction, pg_locks.mode, pg_locks.granted,
pg_stat_activity.usename,
substr(pg_stat_activity.current_query,1,30) AS query,
pg_stat_activity.query_start, age(now(),pg_stat_activity.query_start)
AS duration
FROM pg_stat_activity, pg_locks
LEFT JOIN pg_class ON pg_locks.relation = pg_class.oid
WHERE pg_locks.pid = pg_stat_activity.procpid AND
pg_stat_activity.procpid != pg_backend_pid()
ORDER BY pg_stat_activity.procpid, pg_locks.granted, pg_class.relname;
31.
Useful VIEW to watch Locks per transaction
| mode | granted | query | query_start | duration |
+------------------+---------+--------------------------------+-------------------------------+------------------+
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| RowExclusiveLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| RowShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| RowExclusiveLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| RowShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
| AccessShareLock | t | <IDLE> in transaction | 2013-06-18 12:53:01.601039+02 | 00:00:00.278826 |
32.
Normal Values?
● Most RPC requests should be under 200ms
● Most SQL queries should be under 100ms
● One transaction = 100-300 heavyweight locks
Find your own normal values via monitoring!
35.
Common Problems: Stored Functions
● Stored functional fields are triggers
● Store triggers can be:
● store = { 'trigger_model': (mapping_function,
['trigger_field1', 'trigger_field2'],
priority) }
● store=True, which means:
{ self._name: (lambda s, cr, u, ids, c: ids, None, 10) }
● Can be very expensive with wrong parameters or
slow functions
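To see why the trigger field list matters, here is a plain-Python simulation of how a store specification reacts to a write. It is a simplified sketch, not the ORM code: `triggered_ids`, `MOVE_TO_PICKING` and `pickings_of_moves` are hypothetical, and the mapping function takes only the ids, whereas the real one also receives self, cr, uid and context.

```python
def triggered_ids(store, written_model, written_fields, written_ids):
    """Return the ids to recompute after a write, or [] when the
    store specification does not react to this write."""
    spec = store.get(written_model)
    if spec is None:
        return []
    mapping_function, trigger_fields, _priority = spec
    # trigger_fields=None means "any field of the trigger model"
    if trigger_fields is not None and not set(trigger_fields) & set(written_fields):
        return []
    return mapping_function(written_ids)


# Hypothetical data: each stock.move id mapped to its picking id.
MOVE_TO_PICKING = {1: 10, 2: 10, 3: 11}


def pickings_of_moves(move_ids):
    """Mapping function: which pickings depend on these moves?"""
    return sorted({MOVE_TO_PICKING[move_id] for move_id in move_ids})


# Fires on every write to stock.move, whatever the field.
broad_store = {"stock.move": (pickings_of_moves, None, 10)}
# Fires only when tracking_id changes.
narrow_store = {"stock.move": (pickings_of_moves, ["tracking_id"], 10)}
```

With `broad_store`, a write on any stock.move field recomputes the pickings; with `narrow_store`, only writes that touch tracking_id do.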
36.
Common Problems: Slow Queries
● All SQL queries 500ms+ should be analyzed
● Use EXPLAIN ANALYZE to examine/measure your custom
SQL queries and VIEWs
● Try to remove parts of the query until it's fast, then fix it
● Check cardinality of big JOINs
● Default domain evaluation strategy
● search([('picking_id.move_ids.partner_id', '!=', False)])
● Implemented by combining “id IN (...)” parts
● Have a look at _auto_join in OpenERP 7.0
'move_ids': fields.one2many('stock.move', 'picking_id',
string='Moves', _auto_join=True)
37.
Common Problems: Slow Queries
● No premature optimization: don't write SQL,
use the ORM always during initial
development
● If you detect a hot spot with load-tests,
consider rewriting the inefficient parts in SQL
● But:
● Make sure you're not bypassing security mechanisms
● Don't create SQL injection vectors → use query parameters,
don't concatenate user input in your SQL strings.
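The difference between query parameters and string concatenation is easy to demonstrate. The sketch below uses Python's built-in sqlite3 module purely so it runs anywhere; that is an assumption for illustration, since OpenERP talks to PostgreSQL through psycopg2, where the placeholder is `%s` instead of `?`. The table contents and both function names are made up.

```python
import sqlite3


def find_partner_ids(cr, name):
    # The driver binds `name` as data; it is never spliced into the SQL text.
    cr.execute("SELECT id FROM res_partner WHERE name = ? ORDER BY id", (name,))
    return [row[0] for row in cr.fetchall()]


def find_partner_ids_unsafe(cr, name):
    # Anti-pattern, shown only for contrast: user input becomes part of the SQL.
    cr.execute("SELECT id FROM res_partner WHERE name = '%s' ORDER BY id" % name)
    return [row[0] for row in cr.fetchall()]


# In-memory stand-in for a partner table, with made-up rows.
connection = sqlite3.connect(":memory:")
cr = connection.cursor()
cr.execute("CREATE TABLE res_partner (id INTEGER PRIMARY KEY, name TEXT)")
cr.executemany("INSERT INTO res_partner (name) VALUES (?)", [("Alice",), ("Bob",)])

# Input crafted to turn the WHERE clause into a condition that is always true.
malicious = "x' OR '1'='1"
```

With the parameterized version the crafted input is just a name that matches nothing; with the concatenated version it rewrites the query and returns every partner.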
38.
Common Problems: Lock Contention
● PostgreSQL guarantees transactional data integrity by
taking heavy-weight locks → monitor_locks
● Updating a record blocks all FK locks on it until the
transaction is completed!
● This will change with PostgreSQL 9.3 :-)
● This is independent from the transaction isolation level
(Repeatable Read/Serializable/...)
→ Don't have long-running transactions!
→ Avoid updating “master data” resources in them!
(user, company, stock location, product, …)
39.
Common Problems: Custom Locking
● Any kind of manual locking/queuing mechanism
is dangerous, especially in Python
● Python locks can cause deadlocks that cannot
be detected and broken by the system!
● Avoid it, and if you must, use the database as
the lock
● That's what scheduled jobs (ir.cron) do:
● SELECT FOR UPDATE on the cron job row
→ Automatic cleanup/release
→ Scales well and works in multi-process!
41.
Avoid Anti-Patterns: Master the framework!
● Make sure you really understand the browse()
mechanisms!
● Make sure you properly use the batch API
● Don't write SQL unless you have to, e.g. for:
● Analysis views
● Hot spot functions, name_search(), computed_fields(), ...
43.
Anti-Patterns: what's wrong now?
browse() use is OK now, but the related field is dangerous
and costly, going through a possibly very large o2m just to find
a single product ID
44.
Anti-Patterns: what's wrong here?
The trigger on stock.move is for all fields which means it will
trigger for each change, while we only care about tracking_id
here