FLUID-6277: Tests on instrumented code increased memory usage

Metadata

Source
FLUID-6277
Type
Bug
Priority
Major
Status
Closed
Resolution
Won't Do
Assignee
Giovanni Tirloni
Reporter
Giovanni Tirloni
Created
2018-04-28T13:34:59.801-0400
Updated
2024-07-22T10:35:20.105-0400
Versions
N/A
Fixed Versions
N/A
Component
  1. Testing Infrastructure

Description

CI builds are failing intermittently a the "node tests" step due to out of memory errors:

https://buildkite.com/fluid-project/fluid-infusion/builds/207#1fea0683-0a51-4c27-907e-30534e2b4430
https://buildkite.com/fluid-project/fluid-infusion/builds/208#9fbce57d-1493-460c-a449-868b8e625ae4

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

Initial investigation shows that memory usage of the node process currently goes up to 1.4GB in CI:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
vagrant  13778 89.3 72.2 2374052 1473868 pts/1 Dl+  12:24   1:15 node node_modules/nyc/bin/nyc.js node tests/node-tests/basic-node-tests.js

Testing on a non-CI machine with a fresh repo clone, the "node tests" shows the memory usage is 600MB. After the "browser tests" are executed (and the code is instrumented), the memory usage increases to 900-1100MB in tests.

CI re-uses the git repository from previous builds and cleans it using `git -fd`, leaving behind files that in .gitignore (like `reports` and `coverage`). Files in this directory will accumulate (since some of them are named after the firefox/chrome versions available at the time) and after a while, CI builds will start failing again. After a fresh clone in CI, the node process used 435MB.

https://buildkite.com/fluid-project/fluid-infusion/builds/211#a9ecd1c5-2a06-407c-bca1-fce9374a17ea

So, there are a few problems:

  • Instrumented code is making node tests run dangerously close to the VM memory limit
  • Accumulation of files is increasing memory usage of Instambul
  • CI is re-using repository from previous runs (to improve build times) and leading to these issues
  • CI steps (within a single build) could clean the repository but this breaks vagrant. A solution where CI excludes certain directories from `git clean` would work but could be fragile (since now the logic must be duplicated in build scripts and CI definition).

Comments

  • Giovanni Tirloni commented 2018-04-30T09:25:39.946-0400

    @@Tony Atkins [RtF] is it necessary to run the browser tests before the node tests since, it seems, the code is instrumented in the former? IRC dicussions

  • Giovanni Tirloni commented 2018-05-04T09:25:19.800-0400

    While trying to implement per-pipeline configuration regarding how to clean files, I faced issue https://github.com/buildkite/agent/issues/756. For now the issue described in this JIRA remains and the buildkite agent is configured to use `git clean -fqd` which will leave behind the files in the "coverage" folder (which today was at 1.7GB and started triggering out of memory errors).

  • Tony Atkins [RtF] commented 2018-05-07T05:32:33.979-0400

    I responded to @@Giovanni Tirloni on IRC, but just to confirm here for the record, the browser and node instrumentation are completely separate.  We use nyc for the node tests, it hooks Babel into "require" and instruments anything that matches the settings in .nycrc automatically.  The order of the tests should not matter at all.

  • Giovanni Tirloni commented 2018-05-22T11:44:31.265-0400

    @@Tony Atkins [RtF] Thanks! I'm proposing in PR#900 we delete the coverage and reports directories after each CI build.

    This shouldn't impact the work that is still to be completed for FLUID-6265 in making the reports available somewhere.

  • Tony Atkins [RtF] commented 2018-05-23T07:46:50.826-0400

    @@Giovanni Tirloni, I commented on the pull, but I see that you've posted a lot of the details here. If you were simply running `npm test`, then the coverage and reports directories should be cleared out by the `pretest` step. I guess the CI config runs individual commands instead? Anyway, if you just issue `npm run pretest` at the start of each run, that should have the same effect.

  • Giovanni Tirloni commented 2018-06-28T10:58:39.623-0400

    Mothballing this as the PR is almost 2 months old and it may become unnecessary with the adoption of containers instead of Vagrant. If there's enough interest it can be reopened. PR is being closed too.