Hi everyone,
I have a pretty fundamental question about parallel runs in EGSnrc. I'm finding that the individual jobs of a parallel run do not necessarily agree with each other within their stated uncertainties, even in a very simple simulation.
More fundamentally, I'm seeing “large” discrepancies between two essentially identical simulations, each run in a parallel environment and then combined: the two reported cavity doses do not agree within their stated uncertainties.
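One way to make "agree within the stated uncertainties" precise for two independent results is to divide their difference by the combined uncertainty (the two 1-sigma uncertainties added in quadrature) and look at the resulting z-score. The Python sketch below assumes both results are statistically independent with Gaussian 1-sigma uncertainties; the helper `compare_results` and the numbers are hypothetical, not EGSnrc output.

```python
import math

def compare_results(d1, u1, d2, u2):
    """Return (z, p) for the difference between two independent results.

    u1 and u2 are absolute 1-sigma uncertainties. Assumes Gaussian statistics
    and no correlation between the two results (a hypothetical helper).
    """
    diff = d1 - d2
    combined = math.sqrt(u1 ** 2 + u2 ** 2)  # independent uncertainties add in quadrature
    z = diff / combined
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided Gaussian tail probability
    return z, p

# Made-up doses (arbitrary units), each with a 0.1% relative uncertainty
z, p = compare_results(1.0000, 0.0010, 1.0025, 0.0010)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up numbers |z| is about 1.77, which two perfectly consistent simulations would produce by chance in roughly 8% of comparisons, so a single discrepancy of that size is not, on its own, strong evidence of a problem.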
As a very simple example, I made a Virtual Water cylinder (1 cm radius, 6 cm long) as my dose-scoring object and embedded that rod in a 10 cm cube of water. The source was a collimated point source with a 6 MV photon spectrum, positioned above the rod but still inside the water cube. The dose to the entire rod was scored using 5e10 histories in total, with no variance reduction (VR).
The simulation was run on a cluster with 72 CPUs, so the job was split into ~5e10/72 ≈ 7e8 histories per CPU. In the plot I included, it's easy to see that the results of the individual jobs (which should ostensibly be identical, right?) do not always agree with each other within their stated uncertainties. Considering that each job ran almost 7e8 histories, I don't think this is just an undersampling issue.
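For reference, even perfectly unbiased, independent jobs are not expected to all land within 1 sigma of the true value: a Gaussian 1-sigma interval misses it about 32% of the time, so roughly 23 of 72 jobs would be expected to fall outside it. The sketch below simulates this, assuming each job uses an independent random-number sequence and reports an accurate Gaussian uncertainty; `fraction_outside` is a hypothetical helper, not part of EGSnrc.

```python
import random

def fraction_outside(n_jobs, true_dose, sigma, k=1.0, seed=12345):
    """Simulate n_jobs independent, unbiased estimates of true_dose, each with
    Gaussian noise of standard deviation sigma, and return the fraction whose
    k-sigma interval does NOT contain the true value."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    outside = 0
    for _ in range(n_jobs):
        estimate = rng.gauss(true_dose, sigma)
        if abs(estimate - true_dose) > k * sigma:
            outside += 1
    return outside / n_jobs

# 72 jobs: expect roughly a third outside their own 1-sigma interval
print(fraction_outside(72, 1.0, 0.001))
# Many jobs: the fraction converges to about 0.317 (k=1) and 0.046 (k=2)
print(fraction_outside(200_000, 1.0, 0.001))
print(fraction_outside(200_000, 1.0, 0.001, k=2.0))
```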
Given that the final combined result is the average dose/SP to the cavity, is this to be expected? Why wouldn't the individual results cluster more tightly around a single answer?
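A generic way to pool independent job results and check whether their scatter matches the stated uncertainties: weight each job by its number of histories, propagate the uncertainties in quadrature, and compute a reduced chi-squared, which should be close to 1 if the jobs are mutually consistent. This is a statistical sketch with a hypothetical `combine_jobs` helper and made-up numbers; it is not necessarily how EGSnrc combines parallel results internally.

```python
import math

def combine_jobs(results):
    """Combine independent parallel-job results (hypothetical helper).

    results: list of (n_histories, dose_per_particle, abs_uncertainty).
    Returns (combined_dose, combined_uncertainty, reduced_chi2).
    """
    if len(results) < 2:
        raise ValueError("need at least two jobs for a consistency check")
    n_total = sum(n for n, _, _ in results)
    # History-weighted mean of the per-particle doses
    dose = sum(n * d for n, d, _ in results) / n_total
    # Propagate independent 1-sigma uncertainties through the weighted mean
    unc = math.sqrt(sum((n * u) ** 2 for n, _, u in results)) / n_total
    # Chi-squared of each job about the combined mean, per degree of freedom
    chi2 = sum(((d - dose) / u) ** 2 for _, d, u in results)
    return dose, unc, chi2 / (len(results) - 1)

jobs = [(1_000_000, 1.000, 0.001),
        (1_000_000, 1.002, 0.001),
        (1_000_000, 0.998, 0.001)]
print(combine_jobs(jobs))
```

Here the reduced chi-squared is 4, meaning these three made-up jobs scatter about twice as much as their stated uncertainties imply. A value well above 1 across many real jobs would point to underestimated per-job uncertainties or a non-statistical difference between jobs; a value near 1 means the observed spread is just normal statistics.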
When I instead brute-force the same simulations on Windows without parallel computing, the results do appear to agree within their stated uncertainties, but I need to run more simulations to show definitively that this is always the case.
Since I'm running IMRT-modulated phase spaces into small geometries with egs_chamber on a cluster, these small fluctuations in the cavity dose have me pretty alarmed: the corrections involved are often smaller than 1%. Any tips to help me better understand the nature of uncertainties in parallel runs would be awesome, especially why this happens even in a scenario as simple as the one I described. Thanks!