To Split a String in C

Have you ever heard a programmer mock the C language? Recently, a C# programmer informed me that C was okay, but “you can’t even split a string in C in less than 20 lines of code.”

Challenge accepted!

The C# language has several fun and interesting string methods, which are like functions for object-oriented languages. You can parse strings, replace text within a string, and perform a whole host of other fun and useful tasks. But C was specifically mocked for its inability to split a string in a tidy number of statements.

I’m not going to bother to research how C# splits a string. Instead, I concocted a function with four arguments:

int split(char *original, int offset, char **s1, char **s2)

original is the original string, the one to be split in two.

offset is the character position at which the string is to be split, with offset characters sent to the first string and the remaining characters sent to the second.

s1 and s2 are pointer-pointers representing the strings to contain the split. They’re declared in the calling function as pointer variables, and their addresses are passed to the function, which is why I use the ** notation. More on this process in a few paragraphs.

The split() function returns 1 upon success, 0 otherwise.

The good news is that the function, which I wrote in my standard way of formatting the C language (no collapsed statements or anything tricky), is only 15 lines long.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

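int split(char *original,int offset,char **s1,char **s2)
{
    int len;

    len = (int)strlen(original);
    if(offset < 0 || offset > len)      /* the split must fall within the string */
        return(0);
    *s1 = (char *)calloc(offset+1,sizeof(char));        /* zero-filled, so already terminated */
    *s2 = (char *)calloc(len-offset+1,sizeof(char));
    if( *s1==NULL || *s2==NULL )        /* test the buffers, not the pointer-pointers */
        return(0);
    strncpy(*s1,original,offset);       /* copies offset characters, adds no \0 */
    strncpy(*s2,original+offset,len-offset);
    return(1);
}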

int main()
{
    char string[] = "We shall attempt to split this string";
    char *first,*second;
    int r;

    r = split(string,16,&first,&second);
    if(r==1)
    {
        printf("Split successful\n");
        printf("'%s' split into:\n",string);
        printf("'%s'\n",first);
        printf("'%s'\n",second);
        free(first);        /* release the buffers allocated by split() */
        free(second);
    }
    else
    {
        puts("The function was unable to split the string");
    }

    return(0);
}

The first thing the split() function does is to obtain the length of the original string:

len = (int)strlen(original);
if(offset < 0 || offset > len)
    return(0);

If the split (offset) is negative or greater than the string’s length, the function returns 0, failure. (The function can “split” a string at an offset equal to its length, in which case you end up with a copy of the original string and an empty string.)

Second, storage is allocated for the two string buffers. These variables are passed as addresses because C passes arguments by value: assigning to a pointer parameter inside a function changes only the function’s local copy, so to set the caller’s pointer you must pass the address of the pointer variable instead. (See the call to split() in the main() function, where &first and &second are passed.)

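*s1 = (char *)calloc(offset+1,sizeof(char));
*s2 = (char *)calloc(len-offset+1,sizeof(char));
if( *s1==NULL || *s2==NULL )
    return(0);

Storage is allocated for the size of each new string, +1 for the null character, \0. The calloc() function is used because it fills the storage with zeros, which guarantees that the final byte of each buffer is already a null character. Note that the test checks *s1 and *s2, the buffers themselves, not s1 and s2, which are the addresses passed in and are never NULL here. If either allocation fails, the function returns with 0, failure. (For brevity, a buffer that was successfully allocated is not freed in that case.)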
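
The reason for the ** notation can be shown in isolation. What follows is a minimal sketch, not part of the program above; the names set_by_value(), set_by_address(), and demo() are made up for illustration. The first function assigns to its own copy of the pointer, the second assigns through the pointer’s address:

```c
#include <stdio.h>

/* Hypothetical helper: receives a COPY of the caller's pointer,
   so the assignment below is lost when the function returns. */
void set_by_value(const char *p)
{
    p = "changed";
    (void)p;    /* silence "set but not used" warnings */
}

/* Hypothetical helper: receives the ADDRESS of the caller's pointer,
   so the assignment reaches the caller's variable. */
void set_by_address(const char **p)
{
    *p = "changed";
}

/* Prints the pointer's target after each call. */
void demo(void)
{
    const char *text = "original";

    set_by_value(text);
    printf("after set_by_value:   %s\n", text);
    set_by_address(&text);
    printf("after set_by_address: %s\n", text);
}
```

After set_by_value() the caller’s pointer still references "original"; only set_by_address() changes it. The split() function is in the same position with s1 and s2.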

Finally, given the size of the original string and the split, two strncpy() functions copy portions of the original string into the two freshly allocated string buffers:

strncpy(*s1,original,offset);
strncpy(*s2,original+offset,len-offset);

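The first strncpy() function copies offset characters into the first string, s1: everything up to, but not including, the character at position offset. The second strncpy() function starts at offset and copies the rest of the characters into string s2. Because each call copies exactly the number of characters requested, strncpy() doesn’t append a null character itself; the extra byte in each buffer must hold the terminator.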
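
The strncpy() behavior described above is easy to overlook, so here is a small sketch of it on its own. The helper copy_prefix() is a hypothetical name, and it assumes the destination has room for n+1 bytes. It writes the terminator explicitly, which is an alternative to starting from a zero-filled buffer:

```c
#include <string.h>

/* Hypothetical helper: copies the first n characters of src into dst and
   terminates the result. dst must have room for n+1 bytes. */
void copy_prefix(char *dst, const char *src, size_t n)
{
    strncpy(dst, src, n);   /* adds no '\0' if src holds n or more characters */
    dst[n] = '\0';          /* so the terminator is written explicitly */
}
```

Without the second statement, whatever happened to be in dst[n] beforehand would remain there, and printing the string could run past the intended end.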

Here’s a sample run:

Split successful
'We shall attempt to split this string' split into:
'We shall attempt'
' to split this string'
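
The sample run covers one offset. To try the edge cases as well, something like the following sketch could be used. It repeats split() in a form that uses calloc(), an assumption of this sketch, so that each result is null-terminated, and it adds a hypothetical helper, split_matches(), that compares the two halves against expected text and frees them:

```c
#include <stdlib.h>
#include <string.h>

/* split() as described in the article, written with calloc() (an assumption
   of this sketch) so both buffers start zero-filled and stay terminated. */
int split(const char *original, int offset, char **s1, char **s2)
{
    int len = (int)strlen(original);

    if (offset < 0 || offset > len)
        return 0;
    *s1 = calloc((size_t)offset + 1, sizeof(char));
    *s2 = calloc((size_t)(len - offset) + 1, sizeof(char));
    if (*s1 == NULL || *s2 == NULL) {
        free(*s1);          /* free(NULL) is a harmless no-op */
        free(*s2);
        return 0;
    }
    strncpy(*s1, original, (size_t)offset);
    strncpy(*s2, original + offset, (size_t)(len - offset));
    return 1;
}

/* Hypothetical helper: returns 1 if splitting text at offset yields exactly
   want1 and want2, 0 otherwise. Frees the buffers before returning. */
int split_matches(const char *text, int offset,
                  const char *want1, const char *want2)
{
    char *a, *b;
    int ok;

    if (!split(text, offset, &a, &b))
        return 0;
    ok = strcmp(a, want1) == 0 && strcmp(b, want2) == 0;
    free(a);
    free(b);
    return ok;
}
```

For example, split_matches("abc", 0, "", "abc") and split_matches("abc", 3, "abc", "") both return 1, while an offset of 4 or -1 makes split() return 0.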

If you can concoct a solution shorter than 15 lines, and without cramming all the statements on a single line, I’d enjoy seeing it. I’d love to show Mr. Smartypants C# Programmer that C is more than up to the task of splitting a string in fewer than 20 lines.
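
One possible direction, offered only as a sketch and not as the author’s solution: place both halves in a single allocation. The name split_once() is hypothetical, and the design has a catch stated in the comments: the second string points into the first string’s storage, so only the first pointer may be freed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: both halves live in ONE allocation.
   On success *s1 owns the memory and *s2 points into it, so the caller
   must call free(*s1) only, never free(*s2). */
int split_once(const char *original, size_t offset, char **s1, char **s2)
{
    size_t len = strlen(original);

    if (offset > len)
        return 0;
    *s1 = malloc(len + 2);              /* both halves plus two terminators */
    if (*s1 == NULL)
        return 0;
    memcpy(*s1, original, offset);      /* first half */
    (*s1)[offset] = '\0';
    *s2 = *s1 + offset + 1;             /* starts just past the first terminator */
    memcpy(*s2, original + offset, len - offset + 1);   /* includes the final '\0' */
    return 1;
}
```

This trades the second allocation for a stricter ownership rule, which may or may not be worth it depending on how the caller uses the two strings.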

5 thoughts on “To Split a String in C”

  1. Have you ever heard a programmer mock the C language? Recently, a C# programmer informed me that C was okay, but “you can’t even split a string in C in less than 20 lines of code.”

    That’s like mocking your parents because they don’t know as much as you and can’t do as much as you, even though they gave birth to you and brought you up.

    You ought to point out to your C# programmer that the only reason C# can do this easily is because someone like you wrote the underlying code in C or C++.

    You have probably heard of Isaac Newton’s “If I have seen further it is by standing on the shoulders of Giants.” (It is actually on the current British £2 coin.) This applies to programming as much as any other field: you are always using code written at a lower level, be it a compiler, interpreter or library, and you are always using know-how developed by earlier generations. Both of these apply to C and C#.

  2. I actually earned a living writing C# for about 10 years before becoming disillusioned with the language, the .NET Framework and Microsoft products in general. This was partly because C# itself became increasingly messy as they crammed in more and more miscellaneous things they had borrowed from other languages to make it “multi-paradigm”.

    Also I began to realise that many parts of the .NET Framework were way too complicated, particularly WCF (web services and SOA in general) and ADO .NET (database access) and that you could achieve the same in other languages or frameworks with a fraction of the code. And when they added extra bits to .NET to “simplify” certain tasks (ORM, multi-threading etc.) they only made them even more complicated.

    To me Microsoft’s software is becoming increasingly irrelevant and its business model increasingly obsolete. I am aware they have moved towards PAAS / SAAS rather than actually selling software but they are just one of many doing this.

    I came to C# from Visual C++ and MFC, before that Borland C++ Builder and WFC, and originally C (#include and #include!). When writing C# I was therefore always aware that I was using several layers of abstraction, and that a line of C# broke down to many in .NET, more in the Windows API and even more in the OS itself.

    Back to your string-splitting: I’m sure there must be an established and reliable C library which does this as easily as C#. You might like to point out to your C# friend that splitting a string in C# is actually done by the string class provided by .NET. C# and .NET are so closely linked that it is easy to forget where one ends and the other begins. I think that doing it in just C# would be about as complicated as your C example above.

  3. Sorry, I messed up the #includes in the fourth paragraph with the less-than and greater-than signs. It should say windows.h and paracetomol.h. (Bad joke anyway.)
