Contents of /cassandra/branches/cassandra-0.7/NEWS.txt

0.7.0
=====

Features
--------
    - Secondary indexes (indexes on column values) are now supported
    - Row size limit increased from 2GB to 2 billion columns.  Rows
      are no longer read into memory during compaction.
    - Keyspace and ColumnFamily definitions may be added and modified live
    - Streaming data for repair or node movement no longer requires an
      anticompaction step first
    - NetworkTopologyStrategy (formerly DatacenterShardStrategy) is ready for 
      use, enabling ConsistencyLevel.DCQUORUM and DCQUORUMSYNC.  See comments 
      in `cassandra.yaml`.
    - Optional per-Column time-to-live field allows expiring data without
      having to issue explicit remove commands
    - `truncate` Thrift method allows clearing an entire ColumnFamily at once
    - Hadoop OutputFormat and Streaming [non-JVM map/reduce via stdin/out]
      support
    - Up to 8x faster reads from row cache
    - A new ByteOrderedPartitioner supports bytes keys with arbitrary content,
      and orders keys by their byte value.  This should be used in new
      deployments instead of OrderPreservingPartitioner.
    - Optional round-robin scheduling between keyspaces for multitenant
      clusters
    - Dynamic endpoint snitch mitigates the impact of impaired nodes
    - New `IntegerType`, which is faster than LongType and allows integers
      with both fewer and more bits than Long's 64
    - A revamped authentication system that decouples authorization and 
      allows finer-grained control of resources.
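
    As an illustration of the ByteOrderedPartitioner entry above, the
    sketch below sorts a few keys by byte value.  It is a toy example
    in Python, not Cassandra code, and it assumes the comparison is an
    unsigned, byte-by-byte one (which is how Python orders bytes
    objects).  The sample keys are made up.

```python
# Toy illustration of ordering keys by byte value; not Cassandra code.
# The keys are invented samples. b"\xffraw" is deliberately not valid
# UTF-8, to show that key content can be arbitrary bytes.
keys = [b"banana", b"\xffraw", b"apple", b"\x00\x01", b"Zebra"]


def byte_order(keys):
    """Return the keys sorted by unsigned byte value, lowest first."""
    # Python compares bytes objects lexicographically by unsigned byte.
    return sorted(keys)


ordered = byte_order(keys)
print(ordered)
```

    Note that uppercase "Zebra" sorts before lowercase "apple" because
    0x5A is lower than 0x61; byte order is not alphabetical order.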

Upgrading
---------
    The Thrift API has changed in incompatible ways; see below, and refer
    to http://wiki.apache.org/cassandra/ClientOptions for a list of
    higher-level clients that have been updated to support the 0.7 API.

    The Cassandra inter-node protocol is incompatible with 0.6.x
    releases (and with 0.7 beta1), meaning you will have to bring your
    cluster down prior to upgrading: you cannot mix 0.6 and 0.7 nodes.
    
    The hints schema was changed from 0.6 to 0.7. Cassandra automatically
    snapshots and then truncates the hints column family as part of 
    starting up 0.7 for the first time.

    Keyspace and ColumnFamily definitions are stored in the system
    keyspace, rather than the configuration file.

    The process to upgrade is:
    1) Run "nodetool drain" on _each_ 0.6 node.  When drain finishes (log
       message "Node is drained" appears), stop the process.
    2) Convert your storage-conf.xml to the new cassandra.yaml using 
       "bin/config-converter".  
    3) Rename any of your keyspace or column family names that do not adhere
       to the '^\w+' regex convention.
    4) Start up your cluster with the 0.7 version.
    5) Initialize your Keyspace and ColumnFamily definitions using 
       "bin/schematool <host> <jmxport> import".  _You only need to do 
       this to one node_.
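
    For step 3, a quick way to spot names that need renaming is to
    test them against the regex before starting the upgrade.  The
    sketch below is a minimal helper, not part of Cassandra.  It
    assumes the convention means the whole name must consist of word
    characters (letters, digits, underscore), and the sample names are
    hypothetical.

```python
import re

# Assumption: a legal name is made up entirely of ASCII word characters.
# re.ASCII keeps \w to [A-Za-z0-9_] instead of Python's Unicode default.
NAME_PATTERN = re.compile(r"\w+", re.ASCII)


def illegal_names(names):
    """Return the names that are not made up entirely of word characters."""
    return [name for name in names if not NAME_PATTERN.fullmatch(name)]


# Hypothetical keyspace and column family names.
candidates = ["Keyspace1", "my-keyspace", "Standard_1", "user data"]
print(illegal_names(candidates))  # ['my-keyspace', 'user data']
```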

Thrift API
----------
    - The Cassandra server now defaults to framed mode, rather than
      unframed.  Unframed is obsolete and will be removed in the future.
    - The Cassandra Thrift interface file has been updated for Thrift 0.5.
      If you are compiling your own client code from the interface, you
      will need to upgrade the Thrift compiler to match.
    - Row keys are now bytes: keys stored by versions prior to 0.7.0 will be
      returned as UTF-8 encoded bytes. OrderPreservingPartitioner and
      CollatingOrderPreservingPartitioner continue to expect that keys contain
      UTF-8 encoded strings, but RandomPartitioner now works on any key data.
    - Keyspace parameters have been replaced with the per-connection
      set_keyspace method.
    - The return type for login() is now AccessLevel.
    - The get_string_property() method has been removed.
    - The get_string_list_property() method has been removed.

Configuration
-------------
    - Configuration file renamed to cassandra.yaml and log4j.properties to
      log4j-server.properties
    - PropertyFileSnitch configuration file renamed to 
      cassandra-topology.properties
    - The ThriftAddress and ThriftPort directives have been renamed to
      RPCAddress and RPCPort respectively.
    - EndPointSnitch was renamed to RackInferringSnitch.  A new SimpleSnitch
      has been added.
    - RackUnawareStrategy and RackAwareStrategy have been renamed to
      SimpleStrategy and OldNetworkTopologyStrategy, respectively.
    - RowWarningThresholdInMB replaced with in_memory_compaction_limit_in_mb
    - GCGraceSeconds is now per-ColumnFamily instead of global
    - Keyspace and column family names that do not conform to a '^\w+' regex
      are considered illegal.
    - Keyspace and column family definitions will need to be loaded via
      "bin/schematool <host> <jmxport> import".  _You only need to do this to
      one node_.
    - In addition to an authenticator, an authority must be configured as
      well. Users of SimpleAuthenticator should use SimpleAuthority for this
      value (the default is AllowAllAuthority, which corresponds with 
      AllowAllAuthenticator).
    - The format of access.properties has changed; see the sample configuration
      conf/access.properties for documentation on the new format.


JMX
---
    - StreamingService moved from o.a.c.streaming to o.a.c.service
    - GMFD renamed to GOSSIP_STAGE
    - {Min,Mean,Max}RowCompactedSize renamed to {Min,Mean,Max}RowSize
      since it no longer has to wait until compaction to be computed

Other
-----
    - If extending AbstractType, make sure you follow the singleton pattern
      followed by Cassandra core AbstractType classes: provide a public
      static final variable called 'instance'.


0.6.6
=====

Upgrading
---------
    - As part of the cache-saving feature, a third directory
      (along with data and commitlog) has been added to the config
      file.  You will need to set and create this directory
      when restarting your node into 0.6.6.


0.6.1
=====

Upgrading
---------
    - We try to keep minor versions 100% compatible (data format,
      commitlog format, network format) within the major series, but
      we introduced a network-level incompatibility in 0.6.1.
      Thus, if you are upgrading from 0.6.0 to any higher version
      (0.6.1, 0.6.2, etc.) then you will need to restart your entire
      cluster with the new version, instead of being able to do a
      rolling restart.


0.6.0
=====

Features
--------
    - row caching: configure with the RowsCached attribute in
      ColumnFamily definition
    - Hadoop map/reduce support: see contrib/word_count for an example
    - experimental authentication support, described under
      Authenticator in storage-conf.xml

Configuration
-------------
    - MemtableSizeInMB has been replaced by MemtableThroughputInMB which
      triggers a memtable flush when the specified amount of data has 
      been written, including overwrites.
    - MemtableObjectCountInMillions has been replaced by the
      MemtableOperationsInMillions directive which causes a memtable flush
      to occur after the specified number of operations.
    - Like MemtableSizeInMB, BinaryMemtableSizeInMB has been replaced by
      BinaryMemtableThroughputInMB.
    - Replication factor is now per-keyspace, rather than global.
    - KeysCachedFraction is deprecated in favor of KeysCached
    - RowWarningThresholdInMB added, to warn before very large rows
      get big enough to threaten node stability
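
    The two memtable flush triggers described above can be pictured
    with the toy model below.  It is only a sketch: ToyMemtable and
    its fields are hypothetical names, the byte accounting is
    simplified to the size of each written value, and none of this is
    Cassandra's actual implementation.  The point is that throughput
    counts every write, so overwriting the same key still moves the
    memtable toward a flush.

```python
class ToyMemtable:
    """Hypothetical stand-in for a memtable; tracks only the flush counters."""

    def __init__(self, throughput_mb, operations_millions):
        self.max_bytes = throughput_mb * 1024 * 1024
        self.max_ops = operations_millions * 1_000_000
        self.bytes_written = 0
        self.operations = 0
        self.data = {}

    def write(self, key, value):
        """Record a write; return True once either threshold is reached."""
        self.data[key] = value
        # Count bytes written rather than live size, so an overwrite of
        # an existing key still adds to the throughput total.
        self.bytes_written += len(value)
        self.operations += 1
        return (self.bytes_written >= self.max_bytes
                or self.operations >= self.max_ops)


table = ToyMemtable(throughput_mb=1, operations_millions=0.3)
chunk = b"x" * (256 * 1024)
# Four 256KB writes to the same key reach the 1MB throughput threshold.
results = [table.write(b"same-key", chunk) for _ in range(4)]
print(results)  # [False, False, False, True]
```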

Thrift API
----------
    - removed deprecated get_key_range method
    - added batch_mutate method
    - deprecated multiget and batch_insert methods in favor of
      multiget_slice and batch_mutate, respectively
    - added ConsistencyLevel.ANY, for when you want write
      availability even when it may not be readable immediately.
      Unlike CL.ZERO, though, it will throw an exception if
      it cannot be written *somewhere*.

JMX metrics
-----------
    - read and write statistics are reported as lifetime totals,
      instead of averages over the last minute.  averages since the
      last request are also available for convenience.
    - cache hit rate statistics are now available from JMX under
      org.apache.cassandra.db.Caches
    - compaction JMX metrics are moved to
      org.apache.cassandra.db.CompactionManager.  PendingTasks is now
      a much better estimate of compactions remaining, and the
      progress of the current compaction has been added.
    - commitlog JMX metrics are moved to org.apache.cassandra.db.Commitlog
    - progress of data streaming during bootstrap, loadbalance, or other
      data migration, is available under 
      org.apache.cassandra.streaming.StreamingService.
      See http://wiki.apache.org/cassandra/Streaming for details.

Installation/Upgrade
--------------------
    - 0.6 network traffic is not compatible with earlier versions.  You
      will need to shut down all your nodes at once, upgrade, then restart.


0.5.0
=====

0. The commitlog format has changed (but sstable format has not). 
   When upgrading from 0.4, empty the commitlog either by running 
   bin/nodeprobe flush on each machine and waiting for the flush to finish,
   or simply remove the commitlog directory if you only have test data.
   (If more writes come in after the flush command, starting 0.5 will error
   out; if that happens, just go back to 0.4 and flush again.)
   The format changed twice: from 0.4 to beta1, and from beta2 to RC1.

.5 The gossip protocol has changed, meaning 0.5 nodes cannot coexist
   in a cluster of 0.4 nodes or vice versa; you must upgrade your
   whole cluster at the same time.

1. Bootstrap, move, load balancing, and active repair have been added.
   See http://wiki.apache.org/cassandra/Operations.  When upgrading
   from 0.4, leave autobootstrap set to false for the first restart
   of your old nodes.

2. Performance improvements across the board, especially on the write
   path (over 100% improvement in stress.py throughput).

3. Configuration:
     - Added "comment" field to ColumnFamily definition.
     - Added MemtableFlushAfterMinutes, a global replacement for the 
       old per-CF FlushPeriodInMinutes setting
     - Key cache settings

4. Thrift:
     - Added get_range_slice, deprecating get_key_range


0.4.2
=====

1. Improve default garbage collector options significantly --
   throughput will be 30% higher or more.


0.4.1
=====

1. SnapshotBeforeCompaction configuration option allows snapshotting
   before each compaction, which allows rolling back to any version
   of the data.


0.4.0
=====

1. On-disk data format has changed to allow billions of keys/rows per
   node instead of only millions.  The new format is incompatible with 0.3;
   see 0.3 notes below for how to import data from a 0.3 install.

2. Cassandra now supports multiple keyspaces.  Typically you will have
   one keyspace per application, allowing applications to be able to
   create and modify ColumnFamilies at will without worrying about
   collisions with others in the same cluster.

3. Many Thrift API changes and documentation.  See 
   http://wiki.apache.org/cassandra/API

4. Removed the web interface in favor of JMX and bin/nodeprobe, which
   has significantly enhanced functionality.

5. Renamed configuration "<Table>" to "<Keyspace>".

6. Added commitlog fsync; see "<CommitLogSync>" in configuration.


0.3.0
=====

1. With enough and large enough keys in a ColumnFamily, Cassandra will
   run out of memory trying to perform compactions (data file merges).
   The size of what is stored in memory is (S + 16) * (N + M) where S
   is the size of the key (usually 2 bytes per character), N is the
   number of keys, and M is the map overhead (which can be guesstimated
   at around 32 bytes per key).
   So, if you have 10-character keys and 1GB of headroom in your heap
   space for compaction, you can expect to store about 17M keys
   before running into problems.
   See https://issues.apache.org/jira/browse/CASSANDRA-208
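
   As a rough check of the estimate above, the sketch below works
   through the arithmetic in Python.  It rests on an assumption that
   the note does not state: that the memory cost is about
   (S + 16 + M) bytes per key, with M being the 32-byte per-key map
   overhead.  The function name and parameters are hypothetical.

```python
def estimated_max_keys(key_chars, headroom_bytes, map_overhead=32):
    """Estimate how many keys fit in the given heap headroom.

    Assumes (S + 16 + M) bytes per key, which is one reading of the
    formula in the note, with S = 2 bytes per character.
    """
    key_bytes = 2 * key_chars                  # S
    per_key = key_bytes + 16 + map_overhead    # S + 16 + M
    return headroom_bytes // per_key


# 10-character keys and 1GB (2**30 bytes) of headroom: 68 bytes per key.
print(estimated_max_keys(10, 2**30))  # 15790320
```

   Under that reading the result is about 15.8 million keys, the same
   order of magnitude as the figure of about 17M quoted above.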

2. Because fixing #1 requires a data file format change, 0.4 will not
   be binary-compatible with 0.3 data files.  A client-side upgrade
   can be done relatively easily with the following algorithm:
     for key in old_client.get_key_range(everything):
         columns = old_client.get_slice or get_slice_super(key, all columns)
         new_client.batch_insert or batch_insert_super(key, columns)
   The inner loop can be trivially parallelized for speed.
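
   The algorithm above, including the parallelized per-key copy, can
   be sketched in Python as follows.  FakeClient is a hypothetical
   in-memory stand-in so that the example runs on its own.  Its
   method names echo the pseudocode, but the signatures are
   simplified and are not the real Thrift API.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeClient:
    """Hypothetical in-memory stand-in for a client; not the Thrift API."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})

    def get_key_range(self):
        return sorted(self.rows)

    def get_slice(self, key):
        return dict(self.rows[key])

    def batch_insert(self, key, columns):
        self.rows[key] = dict(columns)


def migrate(old_client, new_client, workers=4):
    """Copy every row from old_client to new_client, one task per key."""

    def copy_row(key):
        columns = old_client.get_slice(key)
        new_client.batch_insert(key, columns)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all tasks and re-raises any worker exception.
        list(pool.map(copy_row, old_client.get_key_range()))


old = FakeClient({"k1": {"a": "1"}, "k2": {"b": "2", "c": "3"}})
new = FakeClient()
migrate(old, new)
print(new.rows == old.rows)  # True
```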

3. Commitlog does not fsync before reporting a write successful.
   Using blocking writes mitigates this to some degree, since all
   nodes that were part of the write quorum would have to fail
   before sync for data to be lost.
   See https://issues.apache.org/jira/browse/CASSANDRA-182

Additionally, row size (that is, all the data associated with a single
key in a given ColumnFamily) is limited by available memory, because
compaction deserializes each row before merging.

See https://issues.apache.org/jira/browse/CASSANDRA-16
   