Ruby : String Unary Minus Operator

Welcome back to my channel guys!

Today we're going to look at a small but powerful Ruby feature that most developers, even experienced ones, don't know about:

The String Unary Minus Operator (-string)

This feature helps you:
  • reduce memory usage
  • reduce GC pressure
  • make your Ruby and Rails apps more predictable in production

Problem Definition

But first, let's create a scenario in which we define a bunch of string literals, the way we do in a Rails app.

class PaymentsController < ApplicationController
  def create
    PaymentProcessor.new.call
    render json: { success: true }
  rescue PaymentError => e
    Rails.logger.warn(e.message) 
    render json: { error: e.message }, status: 422
  end
end

class PaymentProcessor
  def call
    raise PaymentError, "invalid_card"
  end
end

What’s happening here?

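In this piece of code, we check the validity of a credit/debit card and respond with an error. In the process, the identical string “invalid_card” is handled in 3 places: the exception message, the log line, and the JSON payload. Treating each of those as its own allocation (a simplification; the exact count depends on the logger and the serializer), every request to that controller action creates up to 3 different string objects in memory.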

"invalid_card"   # 1st - exception message defines it
"invalid_card"   # 2nd - logger defines it
"invalid_card"   # 3rd - JSON payload defines it

In a high-traffic situation, let's say 1000 requests/sec, that is 3 identical string objects per request. So the app creates approximately 3000 new string objects/sec, or roughly 260 million (259.2 million, to be exact) identical string objects/day.
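The arithmetic behind those figures, spelled out as a small sketch. The variable names are only for illustration, and the "3 strings per request" input is the simplified count from the scenario above.

```ruby
requests_per_sec    = 1_000
strings_per_request = 3            # simplified count from the scenario
seconds_per_day     = 24 * 60 * 60

objects_per_sec = requests_per_sec * strings_per_request
objects_per_day = objects_per_sec * seconds_per_day

puts objects_per_sec   # 3000
puts objects_per_day   # 259200000
```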

So.. Why Should You Care?

This simple-looking code can have a big impact in a very busy production environment. Production environments are expected to be responsive, with low latency and high throughput.

The Ruby runtime already has a reputation for high memory consumption, so we should work on reducing it rather than blowing it up.

The code we wrote directly impacts memory and performance due to unnecessary allocations.

Memory Cost Of Strings

Let's see how much raw memory the string “invalid_card” requires in Ruby.

irb(main):001> "invalid_card".bytesize
=> 12

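So the character data of 1 instance occupies 12 bytes, and 1000 requests/sec will create strings whose data occupies 12 x 3000 = 36,000 bytes, or about 36 KB/sec. That is only the payload: on 64-bit CRuby every String object also occupies a heap slot (typically at least 40 bytes), so the real footprint is larger. For a single string, that's a lot. We should also keep in mind that real applications have hundreds of these.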
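bytesize only counts the characters. A rough way to see the footprint of the whole object is ObjectSpace.memsize_of from the objspace standard library. This is a sketch that assumes CRuby; the number it reports is a hint that varies by Ruby version and platform, so no output is shown here.

```ruby
require "objspace"

message = String.new("invalid_card")

payload_bytes = message.bytesize                  # character data only
object_bytes  = ObjectSpace.memsize_of(message)   # whole object, as reported by CRuby

puts "payload: #{payload_bytes} bytes"
puts "object:  #{object_bytes} bytes"
```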

Garbage Collection Impact

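Creating an excessive number of objects on the heap increases GC pressure, and the garbage collector's mark and sweep phases are expensive.

Ruby's GC is a stop-the-world collector. That means Ruby code is paused while a mark or sweep step runs, so that the object graph stays consistent. CRuby's generational and incremental collection shorten those pauses but do not eliminate them, and under load they cut into throughput.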

High allocation rates = higher GC pressure = lower throughput
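To make "allocation rate" concrete, here is a minimal sketch that counts how many objects a piece of code allocates, using CRuby's GC.stat(:total_allocated_objects) counter. The allocated_objects helper is a hypothetical name, not something from Ruby or Rails.

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# String.new builds a fresh String object on every iteration.
fresh = allocated_objects do
  1_000.times { String.new("invalid_card") }
end

puts "objects allocated: #{fresh}"
```

Wrapping a suspicious code path in a helper like this is a cheap first check before reaching for a full memory profiler.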

The Solution

Ruby gives us explicit string deduplication in two forms:
  • Unary minus operator: -string
  • Instance method: String#dedup (Ruby 3.0 and later)

They do three things:
  • return a frozen string object
  • deduplicate strings with identical content
  • reuse a single canonical instance

Under the hood, Ruby maintains an fstring table in the VM that maps string content to an object on the heap. The first time a given content is seen, Ruby adds an entry to the table; from then on, it simply returns the same object from the table.
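The lookup-or-insert idea can be sketched in plain Ruby. This is a toy model only: ToyInternTable is a hypothetical class, and CRuby's real fstring table is implemented in C and does not keep its entries alive the way this Hash does.

```ruby
# Toy model of an interning table (hypothetical, for illustration only).
class ToyInternTable
  def initialize
    @entries = {}
  end

  # Returns the canonical frozen string for the content of +str+.
  # First sighting: store a frozen copy. Later sightings: reuse it.
  def intern(str)
    @entries[str] ||= str.dup.freeze
  end
end

table  = ToyInternTable.new
first  = table.intern(String.new("invalid_card"))
second = table.intern(String.new("invalid_card"))

puts first.equal?(second)   # true: the same object both times
puts first.frozen?          # true
```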

Usage Of -string

You can use either the shorthand -string, also called the unary minus operator, or the .dedup instance method of the String class.

irb(main):001> name = "shiva" # original string stays mutable
=> "shiva"
irb(main):002> name.object_id
=> 19680

irb(main):003> new_name = -name
=> "shiva"
irb(main):004> new_name.object_id
=> 46960 # fstring table is updated only at the first occurrence of dedup
irb(main):005> new_new_name = -name
=> "shiva"
irb(main):006> new_new_name.object_id
=> 46960

irb(main):007> new_name.frozen?
=> true
irb(main):008> name.frozen?
=> false

Usage of .dedup

Using .dedup is especially useful when writing functional-style code where you need to chain it in a long pipeline.

urls.map(&:dedup)

This form is more composable than -string.
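A slightly longer pipeline, as a sketch. The log_lines data is made up for illustration, and String#dedup needs Ruby 3.0 or later.

```ruby
# Hypothetical input: request lines read at runtime.
log_lines = ["GET /payments", "POST /payments", "GET /refunds"]

# split builds a new String for every verb; dedup maps equal content
# to one shared frozen object.
verbs = log_lines.map { |line| line.split(" ").first.dedup }

puts verbs.inspect                # ["GET", "POST", "GET"]
puts verbs[0].equal?(verbs[2])    # true: both "GET" entries are one object
```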

Not All Strings Are Deduped

Before deduping, Ruby checks whether the object is a plain String or an instance of a String subclass.

# MyString is a String subclass defined earlier in the session, e.g. class MyString < String; end
irb(main):005> a = MyString.new("abc").dedup
=> "abc"
irb(main):006> a.object_id
=> 11820
irb(main):007> b = MyString.new("abc").dedup
=> "abc"
irb(main):008> b.object_id
=> 22060

From this we should understand that:
  • adding instance variables to a string object makes it ineligible for deduplication
  • being an instance of a String subclass also makes it ineligible

See the implementation in MRI

static VALUE
str_uminus(VALUE str)
{
    if (!BARE_STRING_P(str) && !rb_obj_frozen_p(str)) {
        str = rb_str_dup(str);
    }
    return rb_fstring(str);
}

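Only bare strings (plain String instances with no instance variables) come back as the shared fstring table entry. Everything else is duplicated first (unless it is already frozen), so the caller's object is left untouched, and the result is frozen but not shared.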
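The contrast between a plain String and a subclass can be checked with a small script. MyString here is a stand-in subclass defined only for this sketch, and the behavior shown is CRuby's.

```ruby
class MyString < String; end   # stand-in subclass for this sketch

plain_a = -String.new("abc")
plain_b = -String.new("abc")
sub_a   = -MyString.new("abc")
sub_b   = -MyString.new("abc")

puts plain_a.equal?(plain_b)   # true: plain strings share one object
puts sub_a.equal?(sub_b)       # false: subclass instances are not shared
puts sub_a.frozen?             # true: the result is still frozen
```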

.freeze vs .dedup / -string

Both .freeze and .dedup/-string return a frozen object, but .freeze does not guarantee deduplication.

You will often see deduplicated frozen strings when using .freeze or the frozen_string_literal: true magic comment, but that only happens at compile time, and only for string literals. Runtime strings, such as those built with String.new, are frozen by .freeze but not deduplicated.

With -string or .dedup, however, a plain string is deduplicated at runtime too, so the redundant copies can be garbage-collected and heap memory is freed.

Feature | .freeze | .dedup/-string
Freezes | ✅ | ✅
Deduplicates | ❌ (literals only) | ✅
Canonical Instance | ❌ | ✅

Testing Comparison in IRB

In IRB, .freeze and .dedup/-string behave similarly on literals, which might lead us to believe they are implemented the same way.

> a = "shiva".freeze
> b = "shiva".freeze
> a.object_id == b.object_id
=> true

> c = "bhusal".dedup
> d = "bhusal".dedup
> c.object_id == d.object_id
=> true

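Literals typed into a REPL like IRB are not suitable for testing the behavior of dedup and freeze because:
  • Ruby often reuses the same literal object (calls such as .freeze on a literal are resolved when the code is compiled)
  • so calling .freeze on the same literal twice returns the same frozen object

For real tests, use runtime strings instead of (compile-time) literals: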

> a = String.new("shiva").freeze
> b = String.new("shiva").freeze
> a.object_id == b.object_id
=> false


> c = -String.new("bhusal")
> d = -String.new("bhusal")
> c.object_id == d.object_id
=> true

Deduped Strings Vs Symbols

In Ruby, symbols are registered in the VM's symbol table, whereas deduped strings are ordinary heap objects that are referenced from the fstring table, a hash-like cache maintained by the VM.

Symbols that appear in source code live for the lifetime of the VM and are never garbage-collected; only dynamically created symbols (for example via .to_sym) can be collected. Deduped strings are garbage-collected once nothing references them any more.

Excessive use of symbols can contribute to memory bloat in Rails apps, because memory fills up with symbols that are seldom referenced or used. Using deduped strings, we can prevent that bloat to some extent.

Performance-wise, symbols and deduped strings are comparable for lookup and equality checks, because two references to the same canonical string can be matched by identity without comparing bytes. How much of the hashing cost is precomputed or cached depends on the Ruby version, so measure before assuming the two are identical.

Aspect | Symbol | Deduped String
GC | ❌ (statically created) | ✅
Storage | Symbol table | Heap + fstring table
Mutability | Immutable | Immutable
Risk | Symbol leaks | GC-safe

Where does Ruby store these deduped strings

Ruby keeps track of deduplicated frozen strings (such as -"string") in a special internal table within the Ruby VM, commonly called the fstring table (an internal cache). The strings themselves are regular objects on Ruby's heap, not in a separate heap space; the table just lets the VM find the existing object for a given content. Each one is defined once and reused across the program instead of being allocated again every time you reference it.

Best Use Cases

It is recommended to use .dedup/-string for:
  • filenames
  • config keys
  • protocol strings
  • error messages
  • SQL fragments
  • headers

We should avoid it for:
  • user inputs
  • mutable strings
  • infrequent string values
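Applied to the payment scenario from the start, an error message is a natural candidate. This is a minimal sketch: the INVALID_CARD constant name is my own choice, and PaymentError is defined here only so the snippet runs on its own.

```ruby
# Stand-in for the app's error class (assumed definition).
class PaymentError < StandardError; end

class PaymentProcessor
  # Interned once, when the class body is evaluated.
  INVALID_CARD = -"invalid_card"

  def call
    # The raise site reuses the constant instead of building a new string.
    raise PaymentError, INVALID_CARD
  end
end

begin
  PaymentProcessor.new.call
rescue PaymentError => e
  puts e.message   # prints "invalid_card"
end
```

Keeping the message in a constant also gives the controller and any tests one place to refer to it.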

Benefits Of -string/.dedup

  • it gives you immutable strings
  • one instance per Ruby process for each distinct content
  • it lowers the memory footprint, which can eventually reduce server cost
  • it can make comparisons faster, since both sides are the same object (same object_id)
  • unlike statically created symbols, deduped strings are garbage-collected once nothing references them, so they can be a better alternative to symbols
  • it reduces GC time

Real Production Impact

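In a real production Rails app under load, rough figures look like this (they vary widely between apps, so measure your own):
  • GC time is around 5-15% of total CPU usage
  • bad string allocation patterns can push GC time to 30% or more
  • after dedup/freeze optimization, GC time can drop by about half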

Wrap Up

So let's wrap this up.

This string unary minus operator is not just a “cool Ruby trick”. It's a memory and garbage collector tool.

If your app is small, short-lived or low traffic – you probably won’t notice a difference.

But, if you’re running a Rails app that stays up for days, handles thousands of requests per second, and keeps allocating the same strings over and over again…

then those allocations add up.

Every repeated string:
  • increases heap size
  • increases GC pressure
  • increases pause time under load

Using -string or .dedup lets you make those strings immutable, reusable, and allocated just once.

That’s it for this one – see you in the next.