
Code conversion guidelines

John Sumsion edited this page Nov 11, 2015 · 16 revisions

Administrative

  • It's recommended to start with a copy of the Java code and convert it line by line.
  • The Cassandra (C*) code we are currently converting from is http://git-wip-us.apache.org/repos/asf/cassandra.git at commit bf599fb5b062cbcc652da78b7d699e7a01b949ad (from trunk, five months before 2.2.0-beta1, around the time 2.0.12 and 2.1.3 were released).
  • It is mandatory to keep the Apache License header, and add a "Modified by Cloudius Systems" (or your own name if you prefer). This is required by the Apache license.
  • As usual, add a "Copyright 2015 Cloudius Systems" line. Happy new year!

General

  • Copy code comments where they make sense
  • Keep unconverted Java code in #if 0 blocks
  • Don't forget to add an include guard to headers
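A minimal sketch of a converted header, assuming a hypothetical Java class SomeClass whose merge() method has not been converted yet. It shows a classic #ifndef include guard (#pragma once is a common alternative) and the original Java kept inside #if 0, so it is excluded from compilation but stays visible for whoever converts it later.

```cpp
// some_class.hh (hypothetical file name, for illustration only)
// Include guard: the body is skipped if the header is included a second time.
#ifndef SOME_CLASS_HH_
#define SOME_CLASS_HH_

#include <cassert>
#include <cstdint>

class some_class {
private:
    int64_t value = 0;          // Java: private long value;
public:
    int64_t get_value() const { return value; }

#if 0
    // Not converted yet; the original Java is kept verbatim for reference.
    public void merge(SomeClass other)
    {
        value = Math.max(value, other.value);
    }
#endif
};

#endif // SOME_CLASS_HH_
```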

Lexical

  • Filenames: SomeFile.java -> some_file.hh, possibly some_file.cc
  • Packages/namespaces: strip org.apache.cassandra; the rest of the package name becomes the namespace.
  • Keep the directory structure (but strip org/apache/cassandra/).
  • Class and method names: SomeClass -> some_class, someMethod() -> some_method().
  • If the Java code had a class SomeClass and elsewhere a variable or method someClass, the above rule will yield the same name for both. While this is legal, it is confusing and can lead to hard-to-spot bugs, so please rename the variable or method.
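Applied to a hypothetical class org.apache.cassandra.db.SomeTable, the naming rules might give something like the following. std::string stands in for sstring so the sketch compiles on its own; the class, field, and method names are illustrative and not taken from the Cassandra sources.

```cpp
// Java (hypothetical): org/apache/cassandra/db/SomeTable.java
//   package org.apache.cassandra.db;
//   public class SomeTable {
//       private String tableName;
//       private SomeTable someTable;      // clashes with the class after conversion
//       public String getTableName() { return tableName; }
//   }
// C++: db/some_table.hh
#include <cassert>
#include <string>
#include <utility>

namespace db {   // "org.apache.cassandra" is stripped; "db" becomes the namespace

class some_table {
    std::string table_name;
    // Java "someTable" would also become "some_table", the same name as the
    // class, so the field is renamed.
    some_table* parent_table = nullptr;
public:
    explicit some_table(std::string name) : table_name(std::move(name)) {}
    // getTableName() -> get_table_name()
    const std::string& get_table_name() const { return table_name; }
    void set_parent(some_table* p) { parent_table = p; }
    const some_table* get_parent() const { return parent_table; }
};

} // namespace db
```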

Semantics

References

Every Java object reference is a pointer. The object can be shared among multiple references, and is garbage collected when no references exist. The closest parallel in C++ is shared_ptr<>, but we don't want the code to be littered with them.

  • When possible, use values instead of references: List<Foo> -> std::vector<foo>, not std::vector<shared_ptr<foo>>.
  • If a method only looks at an argument, the signature should specify a const reference.
  • If a method takes ownership of an argument, pass it by value (and the caller can use std::move()).
  • If both caller and callee continue using the object, use a shared_ptr<> (or a raw pointer if the lifetime is otherwise taken care of).
  • In the singleton pattern, use unique_ptr<>. For example, a singleton class A in Java might have a static (per-class) field instance: public static final A instance = new A();. Its C++ equivalent is a static member declared in the class as static std::unique_ptr<A> instance; and defined in the .cc file as std::unique_ptr<A> A::instance(new A());
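A rough illustration of the four rules above, using a hypothetical value type foo and the singleton class A (spelled a after the naming conversion). This is a sketch, not code from the converted tree; in real code the definition of a::instance belongs in the .cc file.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct foo {                 // hypothetical value type (Java: class Foo)
    std::string name;
};

// Only looks at its argument: take a const reference.
std::size_t count_named(const std::vector<foo>& foos) {
    std::size_t n = 0;
    for (const auto& f : foos) {
        if (!f.name.empty()) {
            ++n;
        }
    }
    return n;
}

// Takes ownership: take by value; the caller may std::move() into it.
class foo_store {
    std::vector<foo> foos;
public:
    void add(foo f) { foos.push_back(std::move(f)); }
    const std::vector<foo>& all() const { return foos; }
};

// Caller and callee both keep using the object: share it with shared_ptr<>.
class foo_observer {
    std::shared_ptr<foo> watched;
public:
    explicit foo_observer(std::shared_ptr<foo> f) : watched(std::move(f)) {}
    const foo& get() const { return *watched; }
};

// Singleton (Java: public static final A instance = new A();)
class a {
public:
    static std::unique_ptr<a> instance;   // declared in the header
    int counter = 0;
};
std::unique_ptr<a> a::instance(new a());  // defined once, in the .cc file
```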

Containers

Java uses an interface/implementation pair (List and ArrayList). C++ uses an implementation class (std::vector<>) and iterator interfaces. So both List and ArrayList should be converted to the implementation class.

Which implementation class is used depends on the usage. For lists, prefer vector<>, only using list<> if front or middle insertion is needed. For sets and maps, prefer the unordered variants unless sorting is needed.
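For example, common Java collection declarations might map as follows. This sketch uses std::string in place of sstring and hypothetical variable names; the right container still depends on how the code actually uses it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Java: List<String> names = new ArrayList<>();
std::vector<std::string> names;

// Java: Set<String> seen = new HashSet<>();          (no ordering needed)
std::unordered_set<std::string> seen;

// Java: Map<String, Long> sizes = new HashMap<>();   (no ordering needed)
std::unordered_map<std::string, int64_t> sizes;

// Java: SortedMap<String, Long> byName = new TreeMap<>();  (sorted iteration needed)
std::map<std::string, int64_t> by_name;
```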

Object methods

Java code depends on Object base methods such as equals() or hashCode(). For simple (non-polymorphic) types, we can just implement operator==() and a std::hash<> specialization. For type hierarchies, these may need to delegate to a virtual method in the base class.
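A sketch of both cases, with hypothetical types endpoint (a plain value type) and term/constant_term (a small hierarchy). The multiply-by-31 hash combination is just one reasonable choice that mirrors the usual Java hashCode() pattern, not a project convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Simple (non-polymorphic) type: equals()/hashCode() become
// operator==() and a std::hash<> specialization.
struct endpoint {                 // hypothetical
    std::string host;
    int32_t port = 0;
};

inline bool operator==(const endpoint& x, const endpoint& y) {
    return x.host == y.host && x.port == y.port;
}
inline bool operator!=(const endpoint& x, const endpoint& y) {
    return !(x == y);
}

namespace std {
template <>
struct hash<endpoint> {
    size_t operator()(const endpoint& e) const {
        // Combine the member hashes, in the spirit of Java's 31 * h + ...
        size_t h = hash<string>()(e.host);
        return h * 31 + hash<int32_t>()(e.port);
    }
};
}

// Type hierarchy: operator== on the base delegates to a virtual method.
class term {                      // hypothetical
public:
    virtual ~term() = default;
    virtual bool equals(const term& other) const = 0;
    friend bool operator==(const term& x, const term& y) { return x.equals(y); }
};

class constant_term : public term {
    int64_t value;
public:
    explicit constant_term(int64_t v) : value(v) {}
    bool equals(const term& other) const override {
        // Equal only if the other object is also a constant_term with the same value.
        auto* o = dynamic_cast<const constant_term*>(&other);
        return o != nullptr && o->value == value;
    }
};
```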

Builtin integer types

On 64-bit Linux, the C++ types short, int, long happen to have the same sizes as the Java types of those names, but this is not guaranteed on other architectures. The C++ types guaranteed to match the Java ones are int16_t, int32_t, int64_t. In many cases where the Java code explicitly specified the integer length by using short or long, we should keep the same length and use int16_t and int64_t. For Java int, the translation should depend on context: where the specific 4-byte length is important, use int32_t; where the length was not important, leaving "int" is enough.

Java does not have unsigned types, but you can use them (unsigned int, size_t, etc.) in C++ code if you understand the code in question. If Java code uses the non-sign-extended shift operator "a >>> b", convert it into "(unsigned ..) a >> b"
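The following sketch shows width-preserving field types and one way to write Java's >>> in C++. The helper name unsigned_shift_right and the struct sample_record are hypothetical. The shift count is masked the way Java masks it (& 31 for int, & 63 for long), since shifting by the full width or more is undefined behavior in C++.

```cpp
#include <cassert>
#include <cstdint>

// Java field types whose width matters:
//   short -> int16_t, int -> int32_t (or plain int), long -> int64_t
struct sample_record {           // hypothetical
    int16_t flags = 0;           // Java: short flags;
    int32_t ttl = 0;             // Java: int ttl;  (4-byte width is significant)
    int64_t timestamp = 0;       // Java: long timestamp;
};

// Java: a >>> b on an int. Casting to unsigned makes >> shift in zeros.
inline int32_t unsigned_shift_right(int32_t a, int b) {
    return static_cast<int32_t>(static_cast<uint32_t>(a) >> (b & 31));
}

// Java: a >>> b on a long.
inline int64_t unsigned_shift_right(int64_t a, int b) {
    return static_cast<int64_t>(static_cast<uint64_t>(a) >> (b & 63));
}
```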

Please remember that in Java, class fields of primitive types including the above integer types (and also boolean, float, etc.) are implicitly initialized to 0 (see this), but this is not the case in C++! So unless you are sure this is unnecessary (after understanding the code in question), please convert

    private long updateTimestamp;
    private boolean isAlive;

to

private:
    int64_t update_timestamp = 0;
    bool is_alive = false;

java.lang.String

Convert uses of Java's String into uses of sstring (#include "core/sstring.hh").

Interfaces

Java code often has a class that implements an interface. Such interfaces do not always need to be translated to C++ inheritance, and often need to be converted differently:

Comparable

A Java class that implements Comparable<T> can be used, for example, in Collections.sort(), and must implement compareTo(T other). We can drop this interface; instead of compareTo() we probably want to define operator< and the related operators. TODO: Finish this section.

SMP / Sharding

In general, SMP is treated in the same way as clustering is treated in Cassandra:

  • Global operations (schema changes) are broadcast across all cpus (each cpu has a local copy of the schema)
  • Single row operations (insert, read) are unicast to the row's owner cpu
  • Aggregate operations (multi-row select) use map/reduce to cpus that may contain the row

See the distributed<> class.
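To illustrate the three patterns without depending on the distributed<> API, here is a single-threaded model in which each vector element stands for one cpu's private state. The class sharded_store and the hash-modulo owner_of() function are hypothetical; the real sharding function and cross-cpu messaging are different. For simplicity, the aggregate operation visits every cpu rather than only those that may contain matching rows.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct shard_state {
    int schema_version = 0;                    // this cpu's copy of global state
    std::map<std::string, std::string> rows;   // rows owned by this cpu
};

class sharded_store {
    std::vector<shard_state> shards;           // one entry per (simulated) cpu
public:
    explicit sharded_store(std::size_t cpus) : shards(cpus) {}

    // Hypothetical ownership rule: which cpu owns a row key.
    std::size_t owner_of(const std::string& key) const {
        return std::hash<std::string>()(key) % shards.size();
    }

    // Global operation: broadcast to every cpu.
    void set_schema_version(int v) {
        for (auto& s : shards) {
            s.schema_version = v;
        }
    }
    bool all_at_schema(int v) const {
        for (const auto& s : shards) {
            if (s.schema_version != v) {
                return false;
            }
        }
        return true;
    }

    // Single-row operations: unicast to the owner cpu only.
    void insert(const std::string& key, const std::string& value) {
        shards[owner_of(key)].rows[key] = value;
    }
    const std::string* read(const std::string& key) const {
        const auto& rows = shards[owner_of(key)].rows;
        auto it = rows.find(key);
        return it == rows.end() ? nullptr : &it->second;
    }

    // Aggregate operation: "map" a per-cpu count, then "reduce" by summing.
    std::size_t count_rows() const {
        std::size_t total = 0;
        for (const auto& s : shards) {
            total += s.rows.size();
        }
        return total;
    }
};
```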

Utility classes

In Java, all functions need to belong to some class. This often results in the "utility class" pattern, a class which cannot be instantiated, and has nothing but static methods.

In C++, we prefer to use a namespace instead of a class in that case. Only the public functions should be declared in the header file, inside the namespace; private or protected functions, if any, and any static data belong in the source (.cc) file and should be declared static (file-local).
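A sketch of that layout, assuming a hypothetical Java utility class ByteUtils. Both files are shown in one block so it compiles on its own; in the tree, the namespace declaration goes in byte_utils.hh and everything after it in byte_utils.cc.

```cpp
// Java (hypothetical):
//   public final class ByteUtils {
//       private ByteUtils() {}
//       public static String toHex(byte[] bytes) { ... }
//       private static char hexDigit(int v) { ... }
//   }
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// --- byte_utils.hh: only the public functions are declared ---
namespace byte_utils {
std::string to_hex(const std::vector<uint8_t>& bytes);
}

// --- byte_utils.cc: private helpers and static data are file-local ---
static const char hex_digits[] = "0123456789abcdef";

static char hex_digit(unsigned v) {
    return hex_digits[v & 0xf];
}

namespace byte_utils {
std::string to_hex(const std::vector<uint8_t>& bytes) {
    std::string out;
    out.reserve(bytes.size() * 2);
    for (uint8_t b : bytes) {
        out += hex_digit(b >> 4);   // high nibble
        out += hex_digit(b);        // low nibble (masked inside hex_digit)
    }
    return out;
}
}
```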

Method visibility

Java's method visibility modifiers (private/public/protected) are explained here. Watch out for one surprising difference between what they do in Java and in C++: in Java, methods declared with "protected" or no modifier at all are additionally visible to all other classes in the same package!

As C++ has no similar feature (making class methods visible only to code in the same namespace), the closest approximations are 1. to make all non-"private" methods public in the C++ code, or 2. to use C++'s friend feature. Option 1 is easier.

Of course, if you can verify that a certain protected or no-modifier method is not used by other classes in this package, then it can be converted to C++'s protected or private respectively.
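A sketch of both options for a hypothetical package-private Java method Memtable.flush() that is called from another class, Flusher, in the same package. The _v1/_v2 suffixes exist only to show the two options side by side.

```cpp
// Java (hypothetical, package org.apache.cassandra.db):
//   public class Memtable {
//       void flush() { ... }              // no modifier: visible to the package
//   }
//   class Flusher { void run(Memtable m) { m.flush(); } }
#include <cassert>

namespace db {

class flusher;

// Option 1: make the formerly package-visible method public.
class memtable_v1 {
public:
    void flush() { ++flush_count; }
    int flush_count = 0;
};

// Option 2: keep it private and name each same-package caller as a friend.
class memtable_v2 {
    friend class flusher;
    void flush() { ++flush_count; }
    int flush_count = 0;
public:
    int flushes() const { return flush_count; }
};

class flusher {
public:
    void run(memtable_v2& m) { m.flush(); }   // allowed: flusher is a friend
};

} // namespace db
```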
