std.experimental.logger is very slow when compared to some implementation in C++ (spdlog)
From my initial observations, the max CPU usage per thread doesn't exceed 30% on average.
07-02-2018 00:09:00 vaalaham ~/code/cpp/spdlog/bench
$ make >/dev/null 2>&1 && time ./spdlog-bench-mt 8
real 0m0.562s
user 0m1.480s
sys 0m1.620s
07-02-2018 00:09:05 vaalaham ~/code/cpp/spdlog/bench
$
07-02-2018 00:09:26 vaalaham ~/code/d/std-log-benchmark
$ time ./std-log-benchmark 8 1000000
real 0m5.617s
user 0m4.048s
sys 0m3.674s
07-02-2018 00:09:34 vaalaham ~/code/d/std-log-benchmark
$
import core.atomic;
import core.thread;
import std.parallelism;
import std.experimental.logger;
import std.stdio;
import std.conv : to;
shared int msgCounter = 0;
shared int maxCount = 1_000_000;
__gshared FileLogger _logger;
void main(string[] args)
{
if (args.length != 3)
{
writefln("Usage: %s <thread-count> <loop-count>", args[0]);
return;
}
_logger = new FileLogger("/dev/null", LogLevel.trace);
const threadCount = to!int(args[1]);
maxCount = to!int(args[2]) - threadCount;
auto pool = new TaskPool(threadCount);
foreach (tid; 0 .. threadCount)
pool.put(task!logMe(tid + 1));
pool.finish;
}
void logMe(const int tid)
{
while (true)
{
_logger.tracef("From T-%s log message #%s: This is some text for your pleasure", tid, atomicOp!"+="(msgCounter, 1));
if (atomicLoad(msgCounter) > maxCount)
break;
}
}
Comment #1 by aruncxy — 2018-02-07T08:23:07Z
Platform: Ubuntu 17.10 - 64 bit.
Compiler:
LDC - the LLVM D compiler (1.7.0):
based on DMD v2.077.1 and LLVM 5.0.1
built with LDC - the LLVM D compiler (1.7.0)
Default target: x86_64-unknown-linux-gnu
Host CPU: skylake
http://dlang.org - http://wiki.dlang.org/LDC
Registered Targets:
aarch64 - AArch64 (little endian)
aarch64_be - AArch64 (big endian)
arm - ARM
arm64 - ARM64 (little endian)
armeb - ARM (big endian)
nvptx - NVIDIA PTX 32-bit
nvptx64 - NVIDIA PTX 64-bit
ppc32 - PowerPC 32
ppc64 - PowerPC 64
ppc64le - PowerPC 64 LE
thumb - Thumb
thumbeb - Thumb (big endian)
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
Comment #2 by dfj1esp02 — 2018-02-08T09:51:44Z
AFAIK, default logger implementations are simplistic and are not meant to be very fast. Fast logging requires a bit of design and is supposed to be 3rd party library, while std.experimental.logger provides a way to integrate it.
Comment #3 by aruncxy — 2018-02-08T16:54:17Z
(In reply to anonymous4 from comment #2)
> AFAIK, default logger implementations are simplistic and are not meant to be
> very fast. Fast logging requires a bit of design and is supposed to be 3rd
> party library, while std.experimental.logger provides a way to integrate it.
Thanks, but that's weird. If that's the case, why would anyone want to use std.experimental.logger after all? Everyone will default to using their own logger implementations not derived out of std.experimental.logger, knowing that it is *slow*. I'm pretty sure that's not the goal of Phobos. May be something was overlooked in the design or implementation that causes this slowness?
Comment #4 by aruncxy — 2018-09-05T04:11:10Z
On the same box with ldc 1.12.0-beta1, I see that the performance has improved by around 20%.
04-09-2018 21:06:15 vaalaham ~/code/d/std-log-benchmark
$ time ./std-log-benchmark 8 1000000
real 0m3.958s
user 0m3.457s
sys 0m2.840s
04-09-2018 21:08:47 vaalaham ~/code/d/std-log-benchmark
$
Comment #5 by rburners — 2018-09-05T07:33:23Z
The performance problem is not with std.experimental.logger, it is with the benchmark.
The amount thread contention is killing all performance.
The benchmark becomes single threaded around three times for each line printed.
2x atomic ops on msgCounter, 1x writing the line to file.
Comment #6 by aruncxy — 2018-09-06T19:06:20Z
> The benchmark becomes single threaded around three times for each line printed.
How is it a benchmark issue? I agree that three API for a single log line is expensive. Isn't that a problem with std.experimental.logger?
I also see that a flush after every line is causing the slowdown as well. Probably do a flush only when the message level is fatal/error/warn and also during termination?
Comment #7 by rburners — 2018-09-06T20:14:05Z
I didn't say anything about api calls. That is not the problem.
The problem with the benchmark is that the threads share memory. That means each write will, given you tested on a multicore cpu, invalidates some of the CPUs caches. That means the program has to go to RAM, and that is slow, really slow.
I bet you a drink at next years DConf, that if you check perf you will find that the CPU is waiting for data from RAM most of the execution time.
You need to flush after each log call. If there is a log buffer and the program crashes, you might not see the log line that indicates the problem.
Comment #8 by aruncxy — 2018-09-06T22:09:24Z
spdlog (C++) does exactly the same thing and it is much faster as I shared the stats earlier. It is apples to apples, IMO. So the contention is same in both C++ and D versions.
In fact, D does much better with atomic operations.
Disabling the log statements with 8 threads and 100000000 iterations, C++ version takes
real 0m3.167s
user 0m25.215s
sys 0m0.004s
whereas D version takes
real 0m2.527s
user 0m20.124s
sys 0m0.000s
You can compare the asm at https://godbolt.org/z/JjfTyw
Comment #9 by aruncxy — 2018-09-06T22:29:20Z
> You need to flush after each log call. If there is a log buffer and the program crashes, you might not see the log line that indicates the problem
That's right, but may be worth doing on a configuration basis (async mode)? Also I'm not sure how formattedWrite is implemented. As I see it writes piece by piece into the LockingWriter.
The log string could probably be constructed first, followed by a single write into the sink. This will avoid multiple chunked writes.
Comment #10 by aruncxy — 2018-09-07T02:07:10Z
s/_logger.tracef/printf/
06-09-2018 18:25:51 vaalaham ~/code/d/std-log-benchmark
$ time ./std-log-benchmark 8 1000000 > /dev/null
real 0m0.495s
user 0m1.254s
sys 0m2.100s
06-09-2018 19:02:22 vaalaham ~/code/d/std-log-benchmark
$
So the slowdown is definitely related to phobos.
Comment #11 by aruncxy — 2018-09-07T03:09:14Z
s/_logger.tracef/writeln
06-09-2018 20:05:27 vaalaham ~/code/d/std-log-benchmark
$ time ./std-log-benchmark 8 1000000 | tail
real 0m0.975s
user 0m2.067s
sys 0m2.418s
06-09-2018 20:05:29 vaalaham ~/code/d/std-log-benchmark
I can infer that std.format is probably a little slower than fmtlib (C++) https://github.com/fmtlib/fmt But I need to test more to confirm this claim.
But for sure std.experimental.logger is slow.
Comment #12 by rburners — 2018-09-07T09:52:01Z
Building the string in memory will put quite some pressure on the GC.
Using a stack based array is limiting and not @safe.
formattedWrite can do more than printf.
I stayed away from allowing configurations, because it is really hard to know what should be a config option and what not.
So the idea is to inherit and change in code what need to be different.
Comment #13 by aruncxy — 2018-12-18T23:19:05Z
Closing as of now as per the last comment - we can reopen in future if required.