Improving Performance

From reSIProcate
Revision as of 22:49, 31 January 2011 by Fjoanis


This page was created following a discussion (started by Dan Weber) regarding the feasibility of scaling repro to millions of users. Although the original discussion covered more than performance alone (it also touched on reliability and security, among other things), this page currently focuses on how to improve the performance of the stack and DUM.

The purpose of this page is to consolidate the ideas and experiences put forth in that discussion and to build on them.

Areas of Interest

Transport Event Selection

  • Move away from select() to each platform's best I/O selection API (epoll, ...). This should help considerably for TCP/TLS and would also allow handling more than 1024 sockets, the limit commonly imposed by select()'s FD_SETSIZE
  • Leverage libevent so that the platform I/O selection API (readiness notification, i.e. notification before the I/O is performed) is abstracted; it should pick the best API for each platform. Open question: can libevent also provide kernel-level asynchronous I/O?
  • Leverage asio to move to an asynchronous I/O model with completion notification (notification after the I/O has been performed)
  • Consider using kernel-level asynchronous I/O APIs (rather than the current synchronous ones) to be able to handle multiple events at the same time (asio might provide this)
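
As a rough illustration of the first bullet, the following is a minimal, Linux-only sketch of readiness notification with epoll. The class EpollPoller and its method names are hypothetical and are not part of reSIProcate; only the epoll_create1, epoll_ctl and epoll_wait system calls are real. Unlike select(), the set of watched descriptors lives in the kernel and is not bounded by FD_SETSIZE.

```cpp
// Linux-only sketch: readiness notification with epoll instead of select().
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>
#include <vector>

// Hypothetical helper (not reSIProcate code): owns one epoll instance.
class EpollPoller {
public:
    EpollPoller() : mEpfd(epoll_create1(0)) { assert(mEpfd >= 0); }
    ~EpollPoller() { close(mEpfd); }
    EpollPoller(const EpollPoller&) = delete;
    EpollPoller& operator=(const EpollPoller&) = delete;

    // Register fd for read-readiness; returns false on failure.
    bool addReadable(int fd) {
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = fd;
        return epoll_ctl(mEpfd, EPOLL_CTL_ADD, fd, &ev) == 0;
    }

    // Wait up to timeoutMs milliseconds and return the readable descriptors.
    // On error or timeout the returned vector is empty.
    std::vector<int> wait(int timeoutMs) {
        epoll_event events[64];
        const int n = epoll_wait(mEpfd, events, 64, timeoutMs);
        std::vector<int> ready;
        for (int i = 0; i < n; ++i) {
            ready.push_back(events[i].data.fd);
        }
        return ready;
    }

private:
    int mEpfd;  // epoll instance descriptor
};
```

A transport loop built this way would register each socket once with addReadable() and then call wait() repeatedly, reading only from the descriptors that are returned. The cost of each wait() then depends on the number of ready sockets rather than on the total number being watched.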

Object Allocation and Destruction

  • Use a different memory allocator than the default operator new (tcmalloc, ..., or perhaps a memory pool)
  • Object size could be reduced (for example, SipMessage)
  • Current per-object memory consumption is large and causes fragmentation, which hurts performance


  • The current implementation of object streaming is costly and could be improved (boost::spirit::karma could help, and some work has already been done with resip faststream)
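
To make the memory pool idea concrete, here is a minimal sketch of a fixed-size object pool. ObjectPool is a hypothetical name and nothing here is taken from reSIProcate; it only illustrates the technique of reusing preallocated slots so that frequent creation and destruction does not go through the general-purpose allocator. The sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity pool (illustrative only, not thread-safe).
// All slots are allocated once up front, so create()/destroy() never call
// the global allocator and cannot fragment the heap.
template <typename T>
class ObjectPool {
public:
    explicit ObjectPool(std::size_t capacity) : mStorage(capacity) {
        mFree.reserve(capacity);
        for (std::size_t i = 0; i < capacity; ++i) {
            mFree.push_back(&mStorage[i]);
        }
    }
    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    // Construct a T in a free slot; returns nullptr when the pool is exhausted.
    template <typename... Args>
    T* create(Args&&... args) {
        if (mFree.empty()) return nullptr;
        Slot* slot = mFree.back();
        T* obj = new (slot->bytes) T(std::forward<Args>(args)...);
        mFree.pop_back();  // only claim the slot once construction succeeded
        return obj;
    }

    // Destroy an object previously returned by create() and recycle its slot.
    void destroy(T* obj) {
        assert(obj != nullptr);
        obj->~T();
        mFree.push_back(reinterpret_cast<Slot*>(obj));
    }

    std::size_t available() const { return mFree.size(); }

private:
    struct Slot {
        alignas(T) unsigned char bytes[sizeof(T)];  // raw storage for one T
    };
    std::vector<Slot> mStorage;  // backing memory, never resized
    std::vector<Slot*> mFree;    // stack of currently unused slots
};
```

Whether a pool like this actually beats a drop-in allocator such as tcmalloc would have to be measured; the pool mainly helps when one object type of known size is created and destroyed at a high rate.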

Multicore/Multi CPU and Threading

  • The current model (stack and DUM threading) makes little use of multiple cores
  • Adding more threads does not always improve performance (especially when inter-thread communication is based on locking)
  • Trying to put the transports into their own thread did not seem to pay off
  • Investigate lock-free FIFOs between the stack and DUM to see how performance improves on multicore
  • Threading the transaction layer did improve performance (Dan has an implementation of hash-based concurrency in svn)
  • DUM's performance could possibly be improved using Software Transactional Memory
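
The lock-free FIFO bullet above could be explored with something like the following sketch of a bounded single-producer/single-consumer queue. SpscFifo is a hypothetical class, not the FIFO used by reSIProcate, and it assumes exactly one producer thread (for example the stack) and one consumer thread (for example DUM). It also assumes T is default-constructible and movable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical bounded single-producer/single-consumer FIFO.
// Correct only if exactly one thread calls push() and one thread calls pop().
template <typename T>
class SpscFifo {
public:
    // One extra slot distinguishes "full" from "empty".
    explicit SpscFifo(std::size_t capacity)
        : mBuf(capacity + 1), mHead(0), mTail(0) {
        assert(capacity > 0);
    }

    // Producer side: returns false when the queue is full.
    bool push(T value) {
        const std::size_t tail = mTail.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % mBuf.size();
        if (next == mHead.load(std::memory_order_acquire)) return false;
        mBuf[tail] = std::move(value);
        // Publish the element: the consumer's acquire load of mTail sees it.
        mTail.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side: returns false when the queue is empty.
    bool pop(T& out) {
        const std::size_t head = mHead.load(std::memory_order_relaxed);
        if (head == mTail.load(std::memory_order_acquire)) return false;
        out = std::move(mBuf[head]);
        // Release the slot back to the producer.
        mHead.store((head + 1) % mBuf.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> mBuf;             // ring buffer storage
    std::atomic<std::size_t> mHead;  // next index to read (consumer-owned)
    std::atomic<std::size_t> mTail;  // next index to write (producer-owned)
};
```

Because neither side ever takes a mutex, a slow consumer cannot block the producer; the producer simply sees push() return false and has to decide whether to retry, drop, or apply back-pressure. A queue with several producers or consumers would need a different design.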