Using Teigha Low-Level and High-Level Multithreading APIs Together

Andrew Markovich

April 12, 2018

Introduction

Complex multithreaded solutions can combine the Teigha low-level and high-level multithreading APIs. This flexibility allows you to build powerful multithreaded solutions in a cross-platform way and can significantly simplify porting third-party multithreaded solutions to Teigha-based applications.

The high-level API (Thread Pool Services provided by Teigha Kernel) is built on top of the Teigha Kernel low-level API. Both APIs were described in the earlier Teigha Multithreading Low-Level API and Teigha Multithreading High-Level API articles.

ThreadPool.tx module and ThreadsCounter singleton

After loading the Thread Pool Services module, you don’t need to pass the OdRxThreadPoolService pointer every time you need to call something from it:

// Load thread pool module
OdRxThreadPoolServicePtr pThreadPool = ::odrxDynamicLinker()->loadApp(OdThreadPoolModuleName);
if (pThreadPool.isNull())
  throw OdError(eNullPtr); // ThreadPool.tx not found.

// . . .

OdRxThreadPoolService *pThreadPool = odThreadsCounter().getThreadPoolService();

The getThreadPoolService() method of the ThreadsCounter singleton can be called to access the loaded module interface at any point in the code. The pointer stays valid for as long as the Thread Pool Services module remains loaded.

Alternative methods for registering external threads

As described in the Teigha Multithreading Low-Level API article series, you can always call the ThreadsCounter::increase method to register external threads in Teigha and the ThreadsCounter::decrease method to unregister externally started threads:

odThreadsCounter().increase(1, (unsigned int*)&threadId, threadAttributes);
// . . .
odThreadsCounter().decrease(1, (unsigned int*)&threadId);

Alternatively you can use the OdRxThreadPoolService::registerExternalThreads and OdRxThreadPoolService::unregisterExternalThreads methods of the Thread Pool Services module:

m_pThreadPool->registerExternalThreads(1, (unsigned int*)&threadId, threadAttributes);
// . . .
m_pThreadPool->unregisterExternalThreads(1, (unsigned int*)&threadId);

Or you can call the same methods through the Thread Pool Services pointer stored inside the ThreadsCounter singleton:

if (odThreadsCounter().getThreadPoolService())
  odThreadsCounter().getThreadPoolService()->registerExternalThreads(1, (unsigned int*)&threadId, threadAttributes);
// . . .
if (odThreadsCounter().getThreadPoolService())
  odThreadsCounter().getThreadPoolService()->unregisterExternalThreads(1, (unsigned int*)&threadId);

Actually, all of these methods do the same job. The OdRxThreadPoolService::registerExternalThreads and OdRxThreadPoolService::unregisterExternalThreads methods internally call the ThreadsCounter::increase and ThreadsCounter::decrease methods of the Threads Counter singleton.
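Because every registration has to be paired with an unregistration, one option is to wrap the two calls in a small RAII helper. The following is only a sketch under stated assumptions: ExternalThreadRegistration and MockThreadService are hypothetical names that are not part of Teigha, the template parameter stands in for the service class so that the snippet compiles without Teigha headers, and the parameter types are inferred from the call sites shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical RAII helper (not part of Teigha): registers one external
// thread on construction and unregisters it on destruction.
// Service is any type that exposes the two calls used in this article.
template <class Service>
class ExternalThreadRegistration
{
  Service &m_service;
  unsigned int m_threadId;
public:
  ExternalThreadRegistration(Service &service, unsigned int threadId, unsigned int threadAttributes)
    : m_service(service), m_threadId(threadId)
  {
    m_service.registerExternalThreads(1, &m_threadId, threadAttributes);
  }
  ~ExternalThreadRegistration()
  {
    m_service.unregisterExternalThreads(1, &m_threadId);
  }
  // Copying would unregister the same thread twice, so it is disabled.
  ExternalThreadRegistration(const ExternalThreadRegistration&) = delete;
  ExternalThreadRegistration &operator=(const ExternalThreadRegistration&) = delete;
};

// Stand-in for the real service, used only to exercise the helper.
// The signatures are assumptions based on the calls shown in the article.
struct MockThreadService
{
  std::vector<unsigned int> registered;

  void registerExternalThreads(unsigned int n, const unsigned int *ids, unsigned int /*attributes*/)
  {
    assert(ids != nullptr);
    registered.insert(registered.end(), ids, ids + n);
  }
  void unregisterExternalThreads(unsigned int n, const unsigned int *ids)
  {
    assert(ids != nullptr);
    for (unsigned int i = 0; i < n; ++i)
      for (std::size_t j = 0; j < registered.size(); ++j)
        if (registered[j] == ids[i])
        {
          registered.erase(registered.begin() + static_cast<std::ptrdiff_t>(j));
          break;
        }
  }
};
```

With a helper like this, the thread is unregistered even if the code between the two calls returns early or throws.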

Advantages of using the Thread Pool Services module

In operating systems, threads are atomic objects. Creating a new thread takes additional time because the operating system must allocate system resources and then initialize and prepare the thread to run. Teigha Thread Pool Services always keeps some preallocated threads in a paused state for reuse, so starting a new task in a Teigha thread is much faster: no additional resource allocation or thread preparation is needed, and the thread only has to be resumed to execute the task.
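The idea of a paused, reusable thread can be illustrated with standard C++ alone. The sketch below is not how Thread Pool Services is implemented (the article does not describe its internals). ReusableWorker is a hypothetical class that keeps one thread asleep on a condition variable and wakes it for each new job, so no thread is created per task.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical illustration: one preallocated thread that sleeps between jobs.
class ReusableWorker
{
  std::mutex m_mutex;
  std::condition_variable m_wake;
  std::function<void()> m_job;  // job handed over by run()
  bool m_busy = false;          // true while a job is pending or running
  bool m_quit = false;          // set by the destructor
  std::thread m_thread;         // declared last: starts after the members above exist

  void loop()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    for (;;)
    {
      // Paused state: the thread sleeps here until there is work or shutdown.
      m_wake.wait(lock, [this] { return m_quit || m_busy; });
      if (!m_busy)
        return; // shutdown requested and nothing pending
      std::function<void()> job = std::move(m_job);
      lock.unlock();
      job();    // run the job without holding the lock
      lock.lock();
      m_busy = false;
      m_wake.notify_all();
    }
  }
public:
  ReusableWorker() : m_thread([this] { loop(); }) {}
  ~ReusableWorker()
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_quit = true;
    }
    m_wake.notify_all();
    m_thread.join();
  }
  // Hands a job to the sleeping thread; blocks while a previous job is active.
  void run(std::function<void()> job)
  {
    assert(job != nullptr);
    std::unique_lock<std::mutex> lock(m_mutex);
    m_wake.wait(lock, [this] { return !m_busy; });
    m_job = std::move(job);
    m_busy = true;
    m_wake.notify_all();
  }
  // Blocks until the current job, if any, has finished.
  void wait()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_wake.wait(lock, [this] { return !m_busy; });
  }
};
```

Two consecutive run() calls on the same ReusableWorker execute on the same operating system thread, which is the reuse effect described above.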

OdApcThread objects

The previous article series, Teigha Multithreading Low-Level API, shows how client applications can enable communication between the Teigha API and external threads that are based on other APIs (often third-party APIs). We demonstrated this by running Windows API threads, but that is far from a cross-platform solution. The Thread Pool Services module already provides a cross-platform wrapper for operating system threads: the OdApcThread class. Client applications can use objects of this class with any other operating system thread implementations. New threads are taken from the pool (see the previous section), so they don’t require lengthy preparation before running thread jobs. This means there is no reason to cache created OdApcThread objects on the client side; doing so gives no performance benefit, because OdApcThread objects are already cached on the Thread Pool Services side.

Simple example of using Teigha threads

To demonstrate OdApcThread objects in a working example, we reuse the example from Teigha Multithreading Low-Level API. In the Teigha Multithreading High-Level API article, we used OdApcQueue objects to run separate tasks in a set of threads. Here, for demonstration, we create a simple queue class that stores the set of running threads and provides a single wait method for all of them, because we only need to handle the results of the multithreaded tasks after all data processing is complete:

// Simple multithread queue
class SimpleMultiThreadsQueue
{
  OdRxThreadPoolService *m_pThreadPool;
  OdArray<OdApcThreadPtr> m_runningThreads;
  public:
    SimpleMultiThreadsQueue(OdRxThreadPoolService *pThreadPool) : m_pThreadPool(pThreadPool) {}
    ~SimpleMultiThreadsQueue() { wait(); }

    void runNewThread(OdApcEntryPointVoidParam runFcn, OdApcParamType fcnArg, OdUInt32 threadAttributes)
    {
      m_runningThreads.push_back(m_pThreadPool->newThread());
      unsigned int threadId = m_runningThreads.last()->getId();
      m_pThreadPool->registerExternalThreads(1, (unsigned int*)&threadId, threadAttributes);
      m_runningThreads.last()->asyncProcCall(runFcn, fcnArg);
    }
    void wait()
    {
      OdArray<unsigned int, OdMemoryAllocator<unsigned int> > threadIds;
      while (!m_runningThreads.isEmpty())
      {
        m_runningThreads.last()->wait();
        threadIds.push_back(m_runningThreads.last()->getId());
        m_runningThreads.removeLast();
      }
      if (!threadIds.isEmpty())
        m_pThreadPool->unregisterExternalThreads(threadIds.size(), threadIds.getPtr());
    }
};

Our SimpleMultiThreadsQueue class not only stores the set of running threads and provides the wait() method, it additionally calls the OdRxThreadPoolService::registerExternalThreads and OdRxThreadPoolService::unregisterExternalThreads methods to register and unregister our externally started threads. Actually this is what OdApcQueue objects do internally. Now we can create our SimpleMultiThreadsQueue object inside the main example function:

// Load thread pool module
OdRxThreadPoolServicePtr pThreadPool = ::odrxDynamicLinker()->loadApp(OdThreadPoolModuleName);
if (pThreadPool.isNull())
  throw OdError(eNullPtr); // ThreadPool.tx not found.

// Create simple multithread queue
SimpleMultiThreadsQueue mThreadQueue(pThreadPool);
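For readers without the Teigha headers at hand, the "start several threads, then wait for all of them" pattern of SimpleMultiThreadsQueue can be sketched with std::thread. JoinAllQueue is a hypothetical stand-in: it creates a new thread per job instead of taking one from a pool, and it performs no Teigha registration, so it mirrors only the structure of the class.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical standard-library analog of the queue's structure.
class JoinAllQueue
{
  std::vector<std::thread> m_running;
public:
  JoinAllQueue() = default;
  JoinAllQueue(const JoinAllQueue&) = delete;
  JoinAllQueue &operator=(const JoinAllQueue&) = delete;
  // As in the original class, destruction waits for unfinished threads.
  ~JoinAllQueue() { wait(); }

  // Starts the job in a new thread and remembers the thread.
  void runNewThread(std::function<void()> job)
  {
    assert(job != nullptr);
    m_running.emplace_back(std::move(job));
  }
  // Single wait point: returns after every started thread has finished.
  void wait()
  {
    while (!m_running.empty())
    {
      m_running.back().join();
      m_running.pop_back();
    }
  }
  std::size_t running() const { return m_running.size(); }
};
```

Calling wait() before the queue goes out of scope is optional here, but doing it explicitly makes clear where the results become safe to read.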

We also require a small redesign in the RenderDbToImageCaller and ProcessImageCaller classes to run the threads using the SimpleMultiThreadsQueue object instead of the SimpleWinThreadsPool object from the previous article:

// Thread running method implementation
class RenderDbToImageCaller : public OdRxObject
{
  OdString m_inputFile;
  OdGiRasterImagePtr *m_pOutputImage;
  RenderDbToImageContext *m_pThreadCtx;
  public:
    RenderDbToImageCaller *setup(OdString inputFile, OdGiRasterImagePtr *pOutputImage, RenderDbToImageContext *pThreadCtx)
    { m_inputFile = inputFile; m_pOutputImage = pOutputImage; m_pThreadCtx = pThreadCtx;
      return this; }
    static void entryPoint(OdApcParamType pArg)
    {
      ::odThreadsCounter().startThread();
      RenderDbToImageCaller *pCaller = (RenderDbToImageCaller*)pArg;
      OdDbDatabasePtr pDb = pCaller->m_pThreadCtx->m_pServices->readFile(pCaller->m_inputFile);
      if (!pDb.isNull())
        *(pCaller->m_pOutputImage) = ::renderDbToImage(pDb, pCaller->m_pThreadCtx->m_pRenderDevice, 
          pCaller->m_pThreadCtx->m_picWidth, pCaller->m_pThreadCtx->m_picHeight);
      ::odThreadsCounter().stopThread();
    }
    RenderDbToImageCaller *run(SimpleMultiThreadsQueue &threadQueue)
    {
      threadQueue.runNewThread(entryPoint, (OdApcParamType)this, 
        ThreadsCounter::kMtLoadingAttributes | ThreadsCounter::kMtRegenAttributes);
      return this;
    }
};

And the same changes inside the ProcessImageCaller class:

// Thread running method implementation
class ProcessImageCaller : public OdRxObject
{
  OdSmartPtr<ProcessedRasterImage> m_pProcImage;
  OdUInt32 m_scanLineFrom, m_nScanLines;
  public:
    ProcessImageCaller *setup(ProcessedRasterImage *pProcImage, OdUInt32 scanLineFrom, OdUInt32 nScanLines)
    { m_pProcImage = pProcImage; m_scanLineFrom = scanLineFrom; m_nScanLines = nScanLines;
      return this; }
    static void entryPoint(OdApcParamType pArg)
    {
      ::odThreadsCounter().startThread();
      ProcessImageCaller *pCaller = (ProcessImageCaller*)pArg;
      pCaller->m_pProcImage->process(pCaller->m_scanLineFrom, pCaller->m_nScanLines);
      ::odThreadsCounter().stopThread();
    }
    ProcessImageCaller *run(SimpleMultiThreadsQueue &threadQueue)
    {
      threadQueue.runNewThread(entryPoint, (OdApcParamType)this, ThreadsCounter::kNoAttributes);
      return this;
    }
};

As before, we call the ThreadsCounter::startThread and ThreadsCounter::stopThread methods within the thread execution function to allocate and deallocate internal Teigha per-thread resources if they are required. A small difference in the entryPoint method prototype is described later in this article.

Now we can modify our main example function to invoke the SimpleMultiThreadsQueue object instead of the SimpleWinThreadsPool object from the previous article:

// Init performance timer
OdPerfTimerWrapper perfTimer;

// Create "render database to image" context shareable between threads
RenderDbToImageContext renderDbContext;
renderDbContext.setup(OdWinOpenGLModuleName, 1024, 1024, &svcs);

// Locked per-thread data structures
OdRxObjectPtrArray lockedObjects;

// Start timing for "render database to image" process
perfTimer.getTimer()->start();

// Run loading and rendering process
for (OdUInt32 nInput = 0; nInput < generatedRasters.size(); nInput++)
{
  OdString inputFileName(argv[2 + nInput]);
  lockedObjects.push_back(
    OdRxObjectImpl<RenderDbToImageCaller>::createObject()->
      setup(inputFileName, &generatedRasters[nInput], &renderDbContext)->
        run(mThreadQueue));
}

// Wait for all threads to complete
mThreadQueue.wait();
lockedObjects.clear();

// Output timing for "render database to image" process
perfTimer.getTimer()->stop();
odPrintConsoleString(L"%u files loaded and rendered in %f seconds\n", generatedRasters.size(), perfTimer.getTimer()->countedSec());

And the same changes to the multithreaded image processing part:

// Create final raster image generator
OdSmartPtr<GeneratedRasterImage> pGenImage = OdRxObjectImpl<GeneratedRasterImage>::createObject();
pGenImage->configureImage(generatedRasters[0], generatedRasters);
    
// Create container for processed final raster image 
OdSmartPtr<ProcessedRasterImage> pProcImage = OdRxObjectImpl<ProcessedRasterImage>::createObject();
pProcImage->configureImage(pGenImage);

// Start timer to measure raster image processing
perfTimer.getTimer()->start();

// Run threads for raster image processing
const OdUInt32 nScanlinesPerThread = pProcImage->pixelHeight() / 4;
for (OdUInt32 nThread = 0; nThread < 4; nThread++)
{
  OdUInt32 nScanlinesPerThisThread = nScanlinesPerThread;
  if (nThread == 3) // Height may not be divisible by 4, so the last thread takes all remaining scanlines.
    nScanlinesPerThisThread = pProcImage->pixelHeight() - nScanlinesPerThread * 3;
  lockedObjects.push_back(
    OdRxObjectImpl<ProcessImageCaller>::createObject()->
      setup(pProcImage, nScanlinesPerThread * nThread, nScanlinesPerThisThread)->
        run(mThreadQueue));
}

// Wait for all threads to complete
mThreadQueue.wait();
lockedObjects.clear();

// Output timing for raster image processing
perfTimer.getTimer()->stop();
odPrintConsoleString(L"Final raster image processed in %f seconds\n", perfTimer.getTimer()->countedSec());

You can get more information about these code examples in the previous articles Teigha Multithreading High-Level API and Teigha Multithreading Low-Level API.
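The scanline partitioning in the loop above can be checked in isolation. The sketch below generalizes the hard-coded count of four threads. ScanlineRange and splitScanlines are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper types for illustration only.
struct ScanlineRange
{
  std::uint32_t from;   // first scanline processed by the thread
  std::uint32_t count;  // number of scanlines processed by the thread
};

// Splits 'height' scanlines across 'nThreads' threads. Every thread gets
// height / nThreads scanlines; the last one also takes the remainder.
std::vector<ScanlineRange> splitScanlines(std::uint32_t height, std::uint32_t nThreads)
{
  assert(nThreads > 0);
  std::vector<ScanlineRange> ranges;
  const std::uint32_t perThread = height / nThreads;
  for (std::uint32_t n = 0; n < nThreads; ++n)
  {
    std::uint32_t count = perThread;
    if (n == nThreads - 1)
      count = height - perThread * (nThreads - 1);
    ranges.push_back({perThread * n, count});
  }
  return ranges;
}
```

For a 1023-pixel-high image and four threads, the first three ranges hold 255 scanlines each and the last one holds 258, so the ranges cover the whole image without overlap.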

Difference between the two OdApcThread::asyncProcCall method prototypes

void asyncProcCall(OdApcEntryPointVoidParam ep, OdApcParamType parameter)
virtual void asyncProcCall( OdApcEntryPointVoidParam ep, OdApcParamType parameter ) = 0;

The thread entry point (which can be passed as an ep argument) should look like this:

void entryPoint(OdApcParamType pArg);

This prototype was used in our example RenderDbToImageCaller and ProcessImageCaller classes as static class members.

OdApcParamType is defined in the Teigha API as:

typedef ptrdiff_t OdApcParamType;

This is the classic shape of a thread execution function, which typically takes a void* argument. Here the ptrdiff_t type plays the same role as void*: you can pass a 32-bit value into the thread in the 32-bit library configuration or a 64-bit value in the 64-bit library configuration.
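The round trip of a pointer through an integer parameter can be sketched without Teigha. In the snippet below, ParamType is a local alias that mirrors the typedef quoted above, while Job, toParam, and this entryPoint are hypothetical. The static_assert states the assumption that ptrdiff_t is wide enough to hold a pointer, which holds on common platforms but is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: ptrdiff_t can hold a pointer value on the target platform.
static_assert(sizeof(std::ptrdiff_t) >= sizeof(void*),
              "ptrdiff_t is too small to carry a pointer on this platform");

typedef std::ptrdiff_t ParamType; // local alias mirroring OdApcParamType

// Hypothetical per-thread data.
struct Job
{
  int input;
  int result;
};

// Packs a pointer into the integer parameter, as the casts in run() do.
ParamType toParam(Job *job)
{
  return reinterpret_cast<ParamType>(job);
}

// Entry point with the integer-parameter shape: unpacks the pointer again.
void entryPoint(ParamType arg)
{
  Job *job = reinterpret_cast<Job*>(arg);
  assert(job != nullptr);
  job->result = job->input * 2;
}
```

Note that nothing in this scheme keeps the Job object alive: the caller has to guarantee that it outlives the thread job.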

void asyncProcCall(OdApcEntryPointRxObjParam ep, OdRxObject* parameter)
virtual void asyncProcCall( OdApcEntryPointRxObjParam ep, OdRxObject* parameter ) = 0;

The thread entry point (which can be passed as an ep argument) should look like this:

void entryPoint(OdRxObject* pArg);

This prototype has one significant difference from the asyncProcCall method with the OdApcParamType argument. The OdRxObject class has a reference counting mechanism, so the OdRxObject-based object passed as the argument stays alive until the thread job completes.

This feature is exactly what we need to simplify our demonstration example one more time, because the RenderDbToImageCaller and ProcessImageCaller classes are already OdRxObject-based.
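The effect of a reference-counted argument can be illustrated with std::shared_ptr, which is used here only as an analogy for OdRxObject reference counting. Caller and runHoldingReference are hypothetical names. The job holds its own reference, so the object survives even after the code that created it has released its reference.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical per-thread object that reports its own destruction.
struct Caller
{
  std::atomic<bool> *m_destroyed;
  int m_value;
  Caller(std::atomic<bool> *destroyed, int value) : m_destroyed(destroyed), m_value(value) {}
  ~Caller() { m_destroyed->store(true); }
};

// Starts a job that captures its own reference to the argument.
std::thread runHoldingReference(std::shared_ptr<Caller> caller, std::atomic<int> *out)
{
  return std::thread([caller, out]
  {
    // The captured reference guarantees the object is still alive here.
    assert(!caller->m_destroyed->load());
    out->store(caller->m_value);
  });
}
```

The caller of runHoldingReference can drop its own shared_ptr right after starting the job, which is the same simplification that lets the example drop its array of locked objects.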

Adjusting the example

In our example, we solved the problem of per-thread allocated objects with an array of OdRxObjects (OdRxObjectPtrArray lockedObjects;). The array locks all per-thread data until all multithreaded operations are completed, and clearing it afterwards unlocks and releases that data. The OdApcThread::asyncProcCall method with the OdRxObject* argument solves the same task without the array.

First modify the SimpleMultiThreadsQueue::runNewThread method to use the described thread execution function prototype:

void runNewThread(OdApcEntryPointRxObjParam runFcn, OdRxObject *fcnArg, OdUInt32 threadAttributes)

Next modify the entryPoint method in the RenderDbToImageCaller and ProcessImageCaller classes:

static void entryPoint(OdRxObject* pArg)

Remove the explicit thread function argument cast to the OdApcParamType type in the RenderDbToImageCaller::run method:

threadQueue.runNewThread(entryPoint, this, 
    ThreadsCounter::kMtLoadingAttributes | ThreadsCounter::kMtRegenAttributes);

Do the same in the ProcessImageCaller::run method:

threadQueue.runNewThread(entryPoint, this, ThreadsCounter::kNoAttributes);

Now the creation and clearing of the locked objects array can be removed from the main example function, and the code that runs the threads becomes simpler:

OdRxObjectImpl<RenderDbToImageCaller>::createObject()->
  setup(inputFileName, &generatedRasters[nInput], &renderDbContext)->
    run(mThreadQueue);

The same applies to the multithreaded image processing loop:

OdRxObjectImpl<ProcessImageCaller>::createObject()->
  setup(pProcImage, nScanlinesPerThread * nThread, nScanlinesPerThisThread)->
    run(mThreadQueue);

Conclusion

This is the last article about the Teigha Multithreading APIs. Together, these article series provide enough information to start basic multithreaded programming with the Teigha libraries, to choose the organization of a multithreaded solution that best fits the project, and to balance good performance with clean source code design. Of course, these articles can’t cover every capability of the API. For example, we didn’t describe how to start the execution of tasks inside the main application thread from nested threads (using the low-level and high-level APIs), which can be useful in some highly specialized solutions. The Thread Pool Services module API also provides many helper classes that simplify multithreaded source code, such as the set of for_each templates. Using this extended functionality and these classes requires advanced programming experience and can be investigated independently.