How to change or remove end-of-line characters

Example

When you view or edit a file on a UNIX-like system, it may show "^@" between the characters or "^M" at the end of every line, like this:

	^@<^@h^@t^@m^@l^@>^@
	^@
	<^@b^@o^@d^@y^@>^@
	^@
	<^@h^@1^@>^@M^@y^@ ^@o^@w^@n^@ ^@c^@h^@e^@a^@t^@s^@h^@e^@e^@t^@<^@/^@h^@1^@>^@ ^@

...or like this:

	<html>^M
	^M
	<body>^M
	^M
	<h1>My own cheatsheet</h1> ^M

If you are using vi or vim, the status line may also say something like this:

	"" [noeol] 62L, 3412C

Here [noeol] means that the last line of the file has no terminating end-of-line character.

On a UNIX-like system, the command cat -v myfile.txt writes the file to stdout (normally the terminal) with control characters such as ^M and ^@ made visible, which is useful for debugging.

Explanation and Causes

First, see the Wikipedia entry for ASCII (http://en.wikipedia.org/wiki/ASCII) for an explanation of what these special characters mean. An excerpt:

Binary     Oct   Dec   Hex   Abbr   PR   CS   CEC   Description
000 0000   000     0   00    NUL    ␀    ^@   \0    Null character
000 1000   010     8   08    BS     ␈    ^H   \b    Backspace
000 1001   011     9   09    HT     ␉    ^I   \t    Horizontal tab
000 1010   012    10   0A    LF     ␊    ^J   \n    Line feed
000 1100   014    12   0C    FF     ␌    ^L   \f    Form feed
000 1101   015    13   0D    CR     ␍    ^M   \r    Carriage return

PR is the printable Unicode picture of the control character, CS its caret notation, and CEC its C escape code.

In ASCII, "^@" or Ctrl+@ is a binary zero (NUL) character, and "^M" or Ctrl+M is a carriage return (CR).
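The caret notation follows a simple rule: the character shown after the caret is the control code with bit 0x40 flipped. The following Python sketch illustrates that rule and renders bytes in a way loosely similar to cat -v. The function names caret and show are made up for this illustration, the input is assumed to be plain ASCII, and unlike cat -v it also shows a tab as ^I.

```python
def caret(byte: int) -> str:
    """Return caret notation for an ASCII control code, e.g. 13 -> '^M'."""
    if byte < 0x20 or byte == 0x7F:
        # Flip bit 0x40: 0 -> '@', 13 -> 'M', 127 -> '?'
        return "^" + chr(byte ^ 0x40)
    return chr(byte)


def show(data: bytes) -> str:
    """Render bytes with control characters visible, keeping LF as a line break."""
    return "".join("\n" if b == 0x0A else caret(b) for b in data)


print(caret(0), caret(13))          # ^@ ^M
print(repr(show(b"<html>\r\n")))    # '<html>^M\n'
```

To inspect a real file, read it in binary mode first, for example show(open("myfile.txt", "rb").read()).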

See also http://en.wikipedia.org/wiki/Newline

See also http://en.wikipedia.org/wiki/C0_and_C1_control_codes

The cause of the problem is that different types of computer system have different standards for the "end-of-line" control characters.

Windows programs end each line with two characters, <carriage-return><line-feed>: \r\n or 0x0D 0x0A (CRLF), whereas UNIX ends a line with only one character, <line-feed>: \n or 0x0A (LF).
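To find out which convention a given file uses, it can help to count the sequences directly. This is a minimal Python sketch, not a definitive tool. The function name count_line_endings and the sample data are invented for illustration. It counts the Windows two-character ending (CR followed by LF) separately from a lone LF or a lone CR.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF and lone CR sequences in raw file contents."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,   # LF not preceded by CR
        "CR": data.count(b"\r") - crlf,   # CR not followed by LF
    }


sample = b"<html>\r\n\r\n<body>\r\n"
print(count_line_endings(sample))  # {'CRLF': 3, 'LF': 0, 'CR': 0}
```

The file must be read in binary mode, for example open("myfile.txt", "rb").read(), because text mode may translate the line endings before they can be counted.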

The C programming language provides the escape sequences '\n' (line feed or LF) and '\r' (carriage return or CR), but these are not required to be equivalent to the ASCII LF and CR control characters.

On UNIX platforms, where C originated, the native newline sequence is ASCII LF (0x0A), so '\n' was simply defined to be that value.

Java also provides '\n' and '\r' escape sequences, and in contrast to C, these are guaranteed to represent the Unicode values U+000A and U+000D respectively.

In programming, this matters when a protocol or file format requires CRLF line endings. Writing '\n' to a text mode stream works correctly on Windows systems, where it is translated to CRLF, but it produces only LF on UNIX, and it may produce something different on other architectures. Writing "\r\n" in binary mode gives better results, as this works on many ASCII-compatible systems. It can still fail on systems where '\r' and '\n' are not the ASCII CR and LF, so a more robust approach is to use binary mode and specify the numeric values of the control sequence directly as "\x0D\x0A".
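The binary-mode approach can be sketched in Python as follows. This is an illustration under stated assumptions: the sample lines and the temporary file name out.txt are made up, and the text is assumed to be pure ASCII.

```python
import os
import tempfile

lines = ["HELO example", "QUIT"]   # hypothetical sample data

path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Binary mode: bytes are written exactly as given, on every platform.
with open(path, "wb") as f:
    for line in lines:
        f.write(line.encode("ascii") + b"\x0D\x0A")

# Read the file back in binary mode to check what was really written.
with open(path, "rb") as f:
    written = f.read()

print(written)  # b'HELO example\r\nQUIT\r\n'
```

In Python specifically, opening a file in text mode with open(path, "w", newline="\r\n") is an alternative: every '\n' written is then translated to CRLF regardless of the platform.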

Data containing "^@" between each ASCII character is usually text stored in a two-byte Unicode encoding such as UTF-16 (two bytes to a character) that is being displayed as a single-byte encoding such as ASCII (one byte to a character): the extra zero byte of each character shows up as "^@". This can be caused either by a program or by the user if the file is translated between character sets or saved incorrectly. Data containing "^M" at the end of each line has been either created or saved with CRs, typically on a Windows system.
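The "^@" effect is easy to reproduce. The short Python sketch below encodes a sample string as big-endian UTF-16, shows the zero bytes that a single-byte viewer would display as "^@", and converts the data to UTF-8. The sample text and the choice of big-endian byte order are assumptions for illustration; a real file may use the other byte order.

```python
text = "<html>"

# UTF-16 (big-endian here) stores each of these characters as two bytes,
# the first of which is zero. A single-byte viewer shows that zero as ^@.
raw = text.encode("utf-16-be")
print(raw)  # b'\x00<\x00h\x00t\x00m\x00l\x00>'

# Decoding with the matching codec recovers the text; re-encoding as UTF-8
# gives data that single-byte tools display without ^@.
fixed = raw.decode("utf-16-be").encode("utf-8")
print(fixed)  # b'<html>'
```

Files saved as UTF-16 often begin with a byte order mark. In that case decoding with the plain "utf-16" codec lets Python pick the byte order from the mark.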

Resolutions

Removing "^@" characters with vi

In vi on a Linux box (from xterm):

:%s/^@//g

This removes the "^@" characters. Note that the "^@" has to be entered on the vi command line by pressing Ctrl+V then Ctrl+@. This does not work under OS X.

In OS X Terminal (BSD-like xterm), press Ctrl+V then Ctrl+J to get the "^@" to show correctly on the command line:

:%s/^@//g

In BOTH cases, simply typing a caret followed by an at-sign ("^@") does NOT work.

You can also use :%!xxd to edit in hex mode. Use :%!xxd -r to return to normal mode.

Removing "^M" characters with vi

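Use the following to remove the ^M at the end of each line:

:%s/^M$//

Use Ctrl+V then Ctrl+M to get the ^M.

If the file uses ^M alone as its line ending, so that the whole file appears as one long line, replace each ^M with a line break instead:

:%s/^M/^M/g

Here a ^M in the replacement part splits the line at that point.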

You can also use :%!xxd to edit in hex mode. Use :%!xxd -r to return to normal mode.

Using tr

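On UNIX-like systems you can also use tr to replace Ctrl+M (CR, 0x0D, octal 015, \r) with a newline character (LF, 0x0A, octal 012, \n). This suits files that use CR alone as the line ending. For files with CRLF endings, delete the CR with tr -d '\015' instead, otherwise every line is followed by an empty one: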

tr '\015' '\n' < input_file > output_file
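Where tr is not available, or where a file mixes several conventions, the same conversion can be sketched in a few lines of Python. The function name to_unix is made up for this example, and the sketch assumes the whole file fits in memory.

```python
def to_unix(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so that its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(to_unix(b"<html>\r\n<body>\r\n"))  # b'<html>\n<body>\n'
print(to_unix(b"one\rtwo\r"))            # b'one\ntwo\n'
```

Read the input with open("input_file", "rb") and write the result with open("output_file", "wb"), so that no further line-ending translation happens on the way in or out.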

Notes

Some keyboard keys do not generate host-visible keycodes on their own; they only function as part of a chord. For example, the Control (Ctrl), Alternate (Alt), Compose, Shift and various other keys only function when pressed together with other keys. The host application never receives a keycode for any of these keys by itself; it receives the result of a chord such as Shift+A or Ctrl+A.

The Compose key (or the Alt-Space chord) is found on various VT-series keyboards, and is used as a shortcut for generating characters not on the base keyboard.

If you are using vi on a UNIX-like system, [CTRL][V] tells vi NOT to act on the next keystroke, here [CTRL][M] (a carriage return, CR, \r, 0x0D), but to insert it literally instead.

If you are using vim in Windows, you have to use [CTRL][Q] instead of [CTRL][V], since on Windows [CTRL][V] is mapped to "Paste". Control characters always appear on screen in caret notation, with '^' standing for [CTRL].

The line feed (new-line) character is [CTRL][J], the form feed is [CTRL][L] and the carriage return is [CTRL][M].

A test page is provided here.

For more information about ASCII and other control characters and sets see also:

Wikipedia entry for ASCII, which has really nice tables: http://en.wikipedia.org/wiki/ASCII
http://www.asciitable.com/
http://en.wikipedia.org/wiki/C0_and_C1_control_codes
http://learnlinux.tsf.org.za/courses/build/shell-scripting/ch03s02.html#fullstop
http://www.unix.com/unix-dummies-questions-answers/41277-how-convert-m-appearing-end-line-unix-newline.html