Your VCL App: 4x to 11x Faster Math Performance with Elements

If you're a Delphi developer, you might not be used to thinking of Object Pascal as a language that produces very fast code. Not in the way people talk about C++ or Rust, anyway; those languages are famous for performance.

We think that should change. And we've done something about it.

Depending on your platform and what your code does, recompiling your Object Pascal with Oxygene instead of dcc32/dcc64 could now give you 4x to 11x faster math performance.

In our benchmarks, we're on par with Visual C++, and at the AVX2 level, faster than Visual C++ – yes, in Pascal. Let me explain what that means and how it works.

How We Got Here

Back in November we upgraded Elements' LLVM backend and, while we were there, got curious about our math performance. Island, our LLVM native code compiler backend, uses the IslandRTL for many functions, including math. We don't reuse the system C RTL, because its implementation varies from platform to platform; we provide our own. Our math functions were correctly implemented, of course, but we realised modern instruction sets offered an opportunity we hadn't yet taken.

It also became apparent that we weren't making full use of LLVM. The standard optimisation passes were in place, but there was room to go further.

So we set about investigating. What do other toolchains do? What does Visual C++ do? What does Rust do? What do other LLVM-based toolchains do, and which of their best practices can we adopt?

What Is Island/Delphi?

Elements is our compiler with six language frontends: Oxygene (Object Pascal), C#, Swift, Go, Java, and VB. One compiler, multiple input languages, all of which can interoperate seamlessly; our own IDE is written using multiple languages in the same codebase, inheriting from and calling methods written in different languages with no bridging layer. It's a bit like Delphi & C++Builder, but six languages not two, and fully multidirectional across any language to any other.

Elements can target multiple platforms: .NET, JVM, WebAssembly, and native CPUs. The native CPU target is what we call 'Island', and it uses LLVM, the same compiler infrastructure behind Clang, Rust, and Swift.

Island/Delphi is a mode where you can import a Delphi project and compile it with Oxygene, linking to Delphi packages. This includes using the VCL; the Water IDE imports the VCL and other standard Delphi packages, though third-party components need to be imported manually (a straightforward File > Import step, which you can also do in the Fire IDE on macOS, or within Visual Studio).

Screenshot: a Delphi VCL app compiled and debugged using Oxygene. You can see references to the Delphi RTL and VCL (i.e. Delphi packages), plus an Island/Delphi support library, plus just a little bit of extra Oxygene syntax (the require keyword). This is the Windows IDE, called Water. This is all absolutely amazing and deserves its own blog post.

Island/Delphi isn't at 100% Delphi compatibility yet: think of it as a v0.9. But we think it's amazing: for many projects you can import and rebuild. Your Object Pascal still works, your VCL forms still work; the difference is what happens at compile time, because you get the benefits of a modern LLVM-based toolchain, including a wide variety of modern language features that Object Pascal developers have asked for.

💡
Remember, this targets native CPUs. People often think Elements or Oxygene is .NET only. It's not: this uses LLVM to build native code for 32-bit and 64-bit Intel, and for ARM64.
💡
We think this is a wonderful way to support Delphi users, because you use Embarcadero tech like the VCL, but get modern language features like async/await or tuples or nullable types in Pascal. We're all about modernity and powerful tooling here at RemObjects.

If you have trouble onboarding Delphi devs, or maintaining Delphi apps, you can also mix in other languages like C#. Hire C# devs and have them learn and work on your Delphi app – in C#. (They'll pick up Pascal on the way.) It's a great way to give more opportunity to your dev team.

What We Improved Lately

Two things that work together:

1. A fast, vectorised math library

We looked at what other toolchains did to get faster math than the default system libraries. Rust developers who want vectorised math can use SLEEF, so we integrated it.

SLEEF is an open-source math library designed for SIMD operations, providing vectorised implementations of sin, cos, exp, log, and the rest.

What does vectorised mean? Rather than processing one value per CPU instruction, a vectorised operation processes multiple values at once. Modern CPUs have wide SIMD registers: 128-bit, or 256-bit with AVX2. A 256-bit register holds four 64-bit doubles, so a single instruction can process four values at once. This is how you get significant speedups on numerical code.

💡
Why multiply four values one at a time, when you can multiply all four in one go? That's a vectorised optimisation. Same results, just using modern CPU capabilities to do it faster.
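To make the arithmetic concrete, here is a minimal sketch that computes how many 64-bit doubles fit in a SIMD register of a given width, and how many vector instructions it would take to walk an array. It is written in Python purely for illustration: it is not Oxygene code, it does not use SIMD itself, and the function names are hypothetical.

```python
import math

DOUBLE_BITS = 64  # size of one double-precision value


def lanes(register_bits: int, element_bits: int = DOUBLE_BITS) -> int:
    """Number of elements that fit side by side in one SIMD register."""
    return register_bits // element_bits


def vector_ops_needed(element_count: int, register_bits: int) -> int:
    """Instructions needed if each one handles a full register of doubles."""
    return math.ceil(element_count / lanes(register_bits))


if __name__ == "__main__":
    n = 10_000_000
    # 64 bits stands in for scalar code: one double per instruction.
    for bits in (64, 128, 256):
        print(f"{bits}-bit: {lanes(bits)} doubles per instruction, "
              f"{vector_ops_needed(n, bits):,} instructions for {n:,} values")
```

This is an idealised instruction count, not a prediction of run time: real speedups also depend on memory bandwidth, on the cost of each operation, and on whether the compiler can vectorise the loop at all.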

2. CPU target selection

You can now tell the compiler which CPU level to target:

  • Win32: SSE2 (always)
  • Win64: x86-64-v1 through v4 (the default is v2; we recommend v3 if your users have hardware from roughly 2013 or later)
  • ARM64: Native Windows ARM compilation, which Delphi doesn't support yet

When you target a higher instruction set level, the math library uses more capable implementations. AVX2 for v3, for example, processes 256 bits of data per instruction instead of 128.
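As a rough guide to what the v1 to v4 levels mean, the sketch below maps each level to a representative subset of the instruction set features it requires, following the publicly documented x86-64 microarchitecture levels, and picks the highest level a given CPU satisfies. It is in Python for illustration only; the feature lists are deliberately not exhaustive, and `highest_level` is a hypothetical helper, not part of Elements.

```python
# Representative (not exhaustive) features that each x86-64 level adds.
# Levels are cumulative: v3 requires everything in v1 and v2 as well.
LEVELS = [
    ("x86-64-v1", {"sse", "sse2"}),
    ("x86-64-v2", {"sse3", "ssse3", "sse4.1", "sse4.2", "popcnt"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "fma"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]

# Widest vector register each level guarantees, in bits.
VECTOR_BITS = {
    "x86-64-v1": 128,
    "x86-64-v2": 128,
    "x86-64-v3": 256,
    "x86-64-v4": 512,
}


def highest_level(cpu_features):
    """Return the highest level whose features, and all lower levels', are present."""
    have = {f.lower() for f in cpu_features}
    best = None
    for name, required in LEVELS:
        if not required <= have:
            break  # a missing feature rules out this level and every higher one
        best = name
    return best


if __name__ == "__main__":
    # Hypothetical feature list for a CPU with AVX2 but no AVX-512.
    cpu = {"sse", "sse2", "sse3", "ssse3", "sse4.1", "sse4.2", "popcnt",
           "avx", "avx2", "bmi1", "bmi2", "fma"}
    level = highest_level(cpu)
    print(level, VECTOR_BITS[level], "bit vectors")
```

The practical trade-off is that a binary built for a higher level will not run on CPUs that lack those instructions, which is why the choice depends on the hardware your users have.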

You can read more details in our general Elements blog post, since this benefits all six Elements languages.

The Performance Difference

We benchmarked 21 common math functions (sin, cos, exp, pow, log, sqrt, and so on), running each 10 million times on 64-bit doubles. Here's what we found compared to Delphi 13.

Hardware: Intel Tiger Lake i7 (2.80 GHz), Windows 11 Pro 25H2. ARM tests on Apple M2 running Windows 11 Pro 25H2 ARM via Parallels.

Methodology: Each function runs in a tight, vectorisable loop (vectorisable if the compiler takes advantage of it, which ours does). Each loop processes an array of 10 million elements holding random double values. Only the loop running the math operations is timed; initialising the array and using the results (so that the loop and its results aren't optimised away) happen outside the timed region. Read more about the methodology here.
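The shape of such a benchmark can be sketched as follows. This is not the harness used for the numbers in this post: it is a minimal Python illustration of the structure only (setup outside the timer, a timed loop, results consumed afterwards), Python will not vectorise the loop, and `time_math_loop` is a hypothetical name. The demo uses 1 million elements rather than 10 million to keep the run short.

```python
import math
import random
import time


def time_math_loop(func, values):
    """Time only the loop applying func to every value; return (seconds, results)."""
    results = [0.0] * len(values)        # allocated before the timer starts
    start = time.perf_counter()
    for i, x in enumerate(values):       # the only timed region
        results[i] = func(x)
    elapsed = time.perf_counter() - start
    return elapsed, results


if __name__ == "__main__":
    random.seed(1)
    # Setup: random doubles, generated outside the timed region.
    values = [random.uniform(0.0, 100.0) for _ in range(1_000_000)]

    elapsed, results = time_math_loop(math.sin, values)

    # Use the results after timing, so the computation cannot be discarded.
    checksum = sum(results)
    print(f"sin: {elapsed:.3f} s (checksum {checksum:.6f})")
```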

The reference compiler (Delphi, in the results below) is normalised to 1.0. All reported results are expressed as multipliers relative to this baseline: for example, '4x faster' means four times the performance of the reference compiler.
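In other words, each multiplier is the baseline's time divided by the candidate's time. A minimal sketch of that normalisation, in Python for illustration, with made-up timings and hypothetical compiler labels rather than measured data:

```python
def normalise(timings, baseline):
    """Convert elapsed seconds per compiler into speed multipliers vs. the baseline.

    A value of 4.0 means 'four times the performance of the baseline',
    i.e. the same work finished in a quarter of the time.
    """
    base_seconds = timings[baseline]
    return {name: base_seconds / seconds for name, seconds in timings.items()}


if __name__ == "__main__":
    # Made-up timings in seconds, purely to show the arithmetic.
    example = {"reference": 8.0, "candidate A": 2.0, "candidate B": 4.0}
    for name, multiplier in normalise(example, "reference").items():
        print(f"{name}: {multiplier:.1f}x")
```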

Win32

Oxygene Win32 (SSE2 default): 4.2x faster than Delphi Win32

Win64

  • Oxygene Win64 v2 (default): 2.3x faster than Delphi Win64
  • Oxygene Win64 v3 (AVX2): 5.5x faster than Delphi Win64
  • Visual C++ Win64 AVX2: 2.7x faster than Delphi Win64

At our default settings (2.3x) we're roughly on par with Visual C++ using AVX2 (2.7x). Targeting x86-64-v3 with AVX2, Oxygene (5.5x) is about twice as fast as VC++ also targeting AVX2.

Windows ARM

Delphi doesn't compile for Windows ARM yet. If you have a Delphi Win32 or Win64 app running on Windows ARM, it's using Microsoft's Prism emulator to translate x86 instructions on the fly.

Oxygene ARM64 native: 11x faster than Delphi Win32 under emulation

This reflects not only the difference a highly optimising compiler makes, but also the difference between running translated code and running native ARM code.

Beyond Performance

Performance is what this post is about, but it's not the only reason to consider Oxygene.

Oxygene brings modern language features to Object Pascal: tuples, async/await, nullable types, soft interfaces, and more. You can use C# alongside Pascal in the same project. You can target platforms Delphi doesn't support: Windows ARM64, WebAssembly, Linux ARM, tvOS or watchOS...

If you've been looking for a way to modernise your Pascal development, a toolchain you can invest in that keeps up with current hardware and language design, Elements is worth evaluating.

The Practical Side

These are best-case numbers, i.e. tight loops of floating-point operations. Your real application probably doesn't consist entirely of sin() calls. But if you have code that does data processing, scientific calculations, signal processing, or anything else numerically intensive, you'll see real improvements.

The difference between "this calculation takes 10 seconds" and "this calculation takes 2 seconds" is highly noticeable.
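How much of that you see depends on how much of your run time is spent in math. A standard way to estimate it is Amdahl's law: if a fraction p of the run time is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below applies it in Python for illustration; the fractions are hypothetical, and the 5x factor simply matches the 10 seconds to 2 seconds example above.

```python
def overall_speedup(math_fraction: float, math_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only part of the run time gets faster.

    math_fraction -- share of the original run time spent in the accelerated code (0..1)
    math_speedup  -- how many times faster that part becomes
    """
    if not 0.0 <= math_fraction <= 1.0:
        raise ValueError("math_fraction must be between 0 and 1")
    if math_speedup <= 0.0:
        raise ValueError("math_speedup must be positive")
    return 1.0 / ((1.0 - math_fraction) + math_fraction / math_speedup)


if __name__ == "__main__":
    # Hypothetical workloads: 10%, 50%, 90% and 100% of run time in math code.
    for fraction in (0.1, 0.5, 0.9, 1.0):
        print(f"{fraction:.0%} math at 5x: "
              f"{overall_speedup(fraction, 5.0):.2f}x overall")
```

Only a program that spends all of its time in the accelerated code sees the full factor; the more numerically intensive your workload, the closer you get.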

In addition, instruction set support means this applies to all code, not just math code. Your string processing, or anything else, will be built with the same optimising compiler that uses fast instructions and heavy optimisation – just like C++, Rust, or Swift.

Try It Yourself

If you're curious:

  1. Download Elements (there's a 30-day free trial)
  2. Import your Delphi project: File > Import (Water on Windows, Fire on macOS, or Visual Studio). The very first time, it will also import the VCL and other standard Delphi packages for you. At 4 seconds each for over a hundred packages, possibly for multiple bitnesses, this takes a few minutes. After this one-off step they'll be available right away for every other Delphi project you import in future.
  3. Build with Oxygene targeting Island/Delphi (your imported project will be configured for this already)
  4. Compare

Your VCL code works, your Object Pascal works; the difference is what happens at compile time.

💡
Got feedback? We'd love to hear from you when you test it out.

This – modern CPU support, fast code – is the kind of work we do in Elements. If you try it, I think you'll see for yourself what these numbers show.

In our view, this work strongly supports Delphi by providing what it does not have built in, whether that's platforms, compiler behaviour like optimisations or fast math, or language features. A compiler is just one component of your development environment, and since its beginning thirty years ago, Delphi has been all about replacing built-in components with third-party ones: doing so is intrinsic to Delphi development and a big part of the Delphi ecosystem. Our suggestion that you use Elements to compile means we directly recommend and support using the VCL and other Embarcadero technologies.

We are grateful to Embarcadero for their support as a tech partner ❤️, and hope our value for the Delphi ecosystem really shines 🌟.