go tool pprof --alloc_objects uidassigner heap.prof

(pprof) top10
196427053 of 207887723 total (94.49%)
Dropped 41 nodes (cum <= 1039438)
Showing top 10 nodes out of 31 (cum >= 8566234)
      flat  flat%   sum%        cum   cum%
  55529704 26.71% 26.71%   55529704 26.71%  github.com/dgraph-io/dgraph/rdf.Parse
  28255068 13.59% 40.30%   30647245 14.74%  github.com/dgraph-io/dgraph/posting.(*List).getPostingList
  20406729  9.82% 50.12%   20406729  9.82%  github.com/zond/gotomic.newRealEntryWithHashCode
  17777182  8.55% 58.67%   17777182  8.55%  strings.makeCutsetFunc
  17582839  8.46% 67.13%   17706815  8.52%  github.com/dgraph-io/dgraph/loader.(*state).readLines
  15139047  7.28% 74.41%   88445933 42.55%  github.com/dgraph-io/dgraph/loader.(*state).parseStream
  12927366  6.22% 80.63%   12927366  6.22%  github.com/zond/gotomic.(*element).search
  10789028  5.19% 85.82%   66411362 31.95%  github.com/dgraph-io/dgraph/posting.GetOrCreate
   9453856  4.55% 90.37%    9453856  4.55%  github.com/zond/gotomic.(*hashHit).search
   8566234  4.12% 94.49%    8566234  4.12%  github.com/dgraph-io/dgraph/uid.stringKey


(pprof) list rdf.Parse
Total: 207887723
ROUTINE ======================== github.com/dgraph-io/dgraph/rdf.Parse in /home/mrjn/go/src/github.com/dgraph-io/dgraph/rdf/parse.go
  55529704   55529704 (flat, cum) 26.71% of Total
         .          .    118:	}
         .          .    119:	return val[1 : len(val)-1]
         .          .    120:}
         .          .    121:
         .          .    122:func Parse(line string) (rnq NQuad, rerr error) {
  54857942   54857942    123:	l := lex.NewLexer(line)
         .          .    124:	go run(l)
         .          .    125:	var oval string
         .          .    126:	var vend bool


This showed that lex.NewLexer(..) was pretty expensive in terms of memory allocation.
So, let's use sync.Pool here.
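
For readers unfamiliar with the idea, here is a minimal sketch of reusing a lexer through sync.Pool. It is not the actual dgraph change: the Lexer type, lexerPool, reset and countFields are hypothetical names made up for illustration, and the real lex.Lexer has different fields.

```go
package main

import (
	"fmt"
	"sync"
)

// Lexer is a hypothetical stand-in for a lexer; it is NOT dgraph's lex.Lexer.
type Lexer struct {
	input string
	items []string
}

// lexerPool hands out reusable Lexer values so that each call does not have
// to allocate a fresh one.
var lexerPool = sync.Pool{
	New: func() interface{} { return new(Lexer) },
}

// reset prepares a pooled Lexer for a new line. It keeps the backing array of
// items so that its capacity can be reused.
func (l *Lexer) reset(line string) {
	l.input = line
	l.items = l.items[:0]
}

// countFields is a hypothetical parse function. It borrows a Lexer from the
// pool, splits the line on spaces and tabs, and returns the Lexer to the pool
// when it is done.
func countFields(line string) int {
	l := lexerPool.Get().(*Lexer)
	defer lexerPool.Put(l)
	l.reset(line)

	start := -1 // index where the current token began, or -1 if between tokens
	for i := 0; i < len(l.input); i++ {
		c := l.input[i]
		if c == ' ' || c == '\t' {
			if start >= 0 {
				l.items = append(l.items, l.input[start:i])
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		l.items = append(l.items, l.input[start:])
	}
	return len(l.items)
}

func main() {
	fmt.Println(countFields("<alice> <follows> <bob> .")) // prints 4
}
```

The important detail is the reset before use: a pooled object still carries the state of whoever used it last, so it has to be cleared every time it is taken out of the pool.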

After using sync.Pool, this is the output:

422808936 of 560381333 total (75.45%)
Dropped 63 nodes (cum <= 2801906)
Showing top 10 nodes out of 62 (cum >= 18180150)
      flat  flat%   sum%        cum   cum%
 103445194 18.46% 18.46%  103445194 18.46%  github.com/Sirupsen/logrus.(*Entry).WithFields
  65448918 11.68% 30.14%  163184489 29.12%  github.com/Sirupsen/logrus.(*Entry).WithField
  48366300  8.63% 38.77%  203838187 36.37%  github.com/dgraph-io/dgraph/posting.(*List).get
  39789719  7.10% 45.87%   49276181  8.79%  github.com/dgraph-io/dgraph/posting.(*List).getPostingList
  36642638  6.54% 52.41%   36642638  6.54%  github.com/dgraph-io/dgraph/lex.NewLexer
  35190301  6.28% 58.69%   35190301  6.28%  github.com/google/flatbuffers/go.(*Builder).growByteBuffer
  31392455  5.60% 64.29%   31392455  5.60%  github.com/zond/gotomic.newRealEntryWithHashCode
  25895676  4.62% 68.91%   25895676  4.62%  github.com/zond/gotomic.(*element).search
  18546971  3.31% 72.22%   72863016 13.00%  github.com/dgraph-io/dgraph/loader.(*state).parseStream
  18090764  3.23% 75.45%   18180150  3.24%  github.com/dgraph-io/dgraph/loader.(*state).readLines

After a few more discussions, I realized that the lexer didn't need to be allocated on the heap.
So, I switched it to be allocated on the stack. These are the results.

$ go tool pprof uidassigner heap.prof
Entering interactive mode (type "help" for commands)
(pprof) top10
1308.70MB of 1696.59MB total (77.14%)
Dropped 73 nodes (cum <= 8.48MB)
Showing top 10 nodes out of 52 (cum >= 161.50MB)
      flat  flat%   sum%        cum   cum%
  304.56MB 17.95% 17.95%   304.56MB 17.95%  github.com/dgraph-io/dgraph/posting.NewList
  209.55MB 12.35% 30.30%   209.55MB 12.35%  github.com/Sirupsen/logrus.(*Entry).WithFields
  207.55MB 12.23% 42.54%   417.10MB 24.58%  github.com/Sirupsen/logrus.(*Entry).WithField
     108MB  6.37% 48.90%      108MB  6.37%  github.com/dgraph-io/dgraph/uid.(*lockManager).newOrExisting
      88MB  5.19% 54.09%       88MB  5.19%  github.com/zond/gotomic.newMockEntry
   85.51MB  5.04% 59.13%    85.51MB  5.04%  github.com/google/flatbuffers/go.(*Builder).growByteBuffer
   78.01MB  4.60% 63.73%    78.01MB  4.60%  github.com/dgraph-io/dgraph/posting.Key
   78.01MB  4.60% 68.32%    78.51MB  4.63%  github.com/dgraph-io/dgraph/uid.stringKey
      76MB  4.48% 72.80%       76MB  4.48%  github.com/zond/gotomic.newRealEntryWithHashCode
   73.50MB  4.33% 77.14%   161.50MB  9.52%  github.com/zond/gotomic.(*Hash).getBucketByIndex

Now, rdf.Parse no longer shows up in the memory profile. Win!
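
As a rough illustration of the heap-to-stack switch described above, the sketch below declares the lexer as a local value instead of getting a pointer from a constructor. The Lexer type, next and countSpaces are hypothetical names, not the actual dgraph code. Whether a value really stays on the stack is up to the compiler's escape analysis; building with `go build -gcflags=-m` prints its decisions.

```go
package main

import "fmt"

// Lexer is a hypothetical stand-in for a lexer; it is NOT dgraph's lex.Lexer.
type Lexer struct {
	input string
	pos   int
}

// next returns the next byte of the input and true, or 0 and false once the
// input is exhausted.
func (l *Lexer) next() (byte, bool) {
	if l.pos >= len(l.input) {
		return 0, false
	}
	c := l.input[l.pos]
	l.pos++
	return c, true
}

// countSpaces declares the Lexer as a local value. Its address is only used
// for method calls inside this function and is never stored or returned, so
// the compiler is free to keep it on the stack.
func countSpaces(line string) int {
	l := Lexer{input: line}
	n := 0
	for {
		c, ok := l.next()
		if !ok {
			return n
		}
		if c == ' ' {
			n++
		}
	}
}

func main() {
	fmt.Println(countSpaces("<alice> <follows> <bob> .")) // prints 3
}
```

A value is generally forced onto the heap once a pointer to it outlives the function, for example by being returned, stored in a longer-lived structure, or handed to a goroutine.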