It’s official. Acano is now a part of Cisco: http://cs.co/acrtb16. Together with my counterparts at Acano, I have been working on technical evaluations and architecture for Cisco and Acano conferencing moving forward (in something called a “clean room” where confidential information is shared). From now on we can share and engage freely across the organisations, and we are now part of the same team. It has been a great experience working with old colleagues and seeing that we share the same visions and thoughts for what the future of conferencing should be. I really look forward to working on the new stuff we will do together!
Video conferencing requires conference servers (aka bridges) to be available when people want to meet. How do we ensure that people can show up to meetings and that they are not rejected at the door (busy signal when dialling into a meeting)? This post is aimed at the IT and network operations person, and although a part of my series on the next-generation multi stream video architecture, it is more on the practical side than earlier more “under the hood” posts.
Classic video conferencing is based on scheduling of meetings and the need to reserve meeting rooms and conferencing resources at the time of scheduling. The person scheduling the meeting adds the people and the meeting rooms to the calendar invite, and a video scheduling system will work in the background to calculate how many video conferencing resources will be required for the meeting and reserve those resources (on a video bridge) to ensure that the meeting will happen (Cisco happens to have a brilliant product called TMS doing this among other things). Each room will automatically be pulled into the meeting hosted on this video bridge, and the participants will walk over to the meeting room closest to where they are.
However, people have started to use their own personal Jabber clients (or Spark clients) to join a meeting, or they have a video system, like the Cisco DX-series, on their desktop. This complicates the calculation of how many video conferencing resources are required to host the meeting. It is like determining the size of the conference room you should book at a hotel when you plan a meeting without knowing how many will participate. Who will walk over to the meeting rooms? Who will join from a desktop system? Who will join on their mobile phone through Webex? The needed resources can still be estimated (if the reservation system knows which participants have a desktop system and which do not), but it’s now an estimate, not an accurate calculation.
If you have followed my blog series on the next-generation multi stream real-time video architecture, you know that there are also two more innovations that impact the calculation of resources: media elasticity and multi stream participants. Without going into details, the high elasticity of new media streams (how much bandwidth and conferencing resources they require) and the flexibility in how many media streams each participant sends and consumes make it impossible to estimate up front how many video conferencing resources a given participant is going to consume.
So, is this the end of scheduled video conferencing? Not at all. Most people like to (and need to) schedule meetings, but they also meet ad-hoc. I see meetings as one of three types: scheduled (organised around a set of participants and a specific time into the future, possibly recurring), ad-hoc (organised here and now by pulling a group of people into a meeting), and rendezvous (organised around a venue and a time, but with less focus on the participants, could be 2 people or 100). If these types of meetings need to be supported in your organisation, and the calculation of resources required is impossible, how then to ensure that you have enough conferencing resources available and people are not met by a busy tone or disconnected when trying to join a meeting?
The answer is the end of hard reservation of conferencing resources on video bridges. The reason is that in this new world the only way you can ensure that enough resources will be available is to always reserve for the worst case. This will mostly leave a lot of resources running idle. Most organisations do not have a budget for this. As an IT guy responsible for the SLA (Service Level Agreement) and avoiding that the executive gets a busy signal, you still need a way to make sure you have enough resources available. You need to plan adding more capacity if usage picks up, and you need to monitor the actual service delivered to make sure that the right service quality has been delivered.
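To make the gap between worst-case reservation and actual usage concrete, here is a minimal sketch. It assumes, purely for illustration, that each participant consumes one unit of bridge capacity (which the elasticity discussed above makes only a rough approximation), and the meeting data and field names (`invited`, `joined`) are hypothetical, not taken from any real scheduling system.

```python
def peak_concurrent(meetings, field):
    """Return the peak number of concurrent participants, counting `field` per meeting."""
    events = []
    for m in meetings:
        events.append((m["start"], m[field]))   # capacity taken when the meeting starts
        events.append((m["end"], -m[field]))    # capacity released when it ends
    # Sort by time; at equal times, releases (negative) come before new meetings.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical day: times are hours, "invited" is the worst case a hard
# reservation would have to cover, "joined" is who actually showed up on video.
meetings = [
    {"start": 9.0, "end": 10.0, "invited": 12, "joined": 7},
    {"start": 9.5, "end": 11.0, "invited": 30, "joined": 11},
    {"start": 10.0, "end": 11.0, "invited": 8, "joined": 8},
]

reserved_peak = peak_concurrent(meetings, "invited")
actual_peak = peak_concurrent(meetings, "joined")
print(f"worst-case reservation peak: {reserved_peak}")
print(f"actual usage peak: {actual_peak}")
print(f"idle at peak under hard reservation: {reserved_peak - actual_peak}")
```

Tracking the actual peak over weeks, rather than the reserved peak, is the kind of monitoring that capacity planning relies on.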
There is one important thing you need to understand in delivering a service: you can never guarantee that everybody will get the optimal service all the time. No matter how many resources you have available, somebody can always decide to host a huge meeting at the same time as another part of the organisation has an all-hands meeting. If you try to do hard reservation and scheduled meetings only, you will give a denial of service at scheduling time, and the person hosting the meeting needs to find another way of having the meeting or move the meeting to a different time. It’s still a denial of service, though, and the organiser was not able to host the meeting.
If you instead allow scheduling to happen without reservation, you may occasionally find that a user gets rejected at call time (or gets audio only) because there is no space left. This is similar to being rejected at the door of a meeting because there are no more seats. You can manage the probability of this happening, though, by making sure that you have enough idle capacity to handle a peak in usage. How much extra capacity you want depends on your budget and how often you are willing to let users be rejected when trying to join a meeting. When you move to such capacity planning and monitoring instead of hard reservation, you monitor the average and peak hour usage levels and add more bridge resources when needed. A side note: some meetings are so critical that denial of service at call time is unacceptable. For these cases, idle resources can be accepted. It is also possible to have a separate pool of resources reserved for such meetings.
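One classic way to put numbers on this trade-off between idle capacity and the probability of rejection is the Erlang B formula from telephony traffic engineering. The sketch below is an illustration rather than a sizing method prescribed here: it assumes each call takes exactly one unit of bridge capacity, that rejected calls simply go away, and that the peak hour load is known in erlangs (average number of simultaneous calls). With elastic, multi stream media the one-unit-per-call assumption is only a rough approximation. The function names are my own.

```python
def erlang_b(offered_load: float, capacity: int) -> float:
    """Probability that a new call is rejected, given the offered load in
    erlangs and the number of capacity units (standard Erlang B recurrence)."""
    blocking = 1.0  # with zero capacity every call is rejected
    for n in range(1, capacity + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def capacity_for_target(offered_load: float, max_blocking: float) -> int:
    """Smallest capacity whose rejection probability does not exceed max_blocking."""
    capacity = 0
    while erlang_b(offered_load, capacity) > max_blocking:
        capacity += 1
    return capacity


# Hypothetical peak hour load of 40 simultaneous calls on average.
peak_load = 40.0
for target in (0.05, 0.01, 0.001):
    needed = capacity_for_target(peak_load, target)
    print(f"rejecting at most {target:.1%} of calls needs about {needed} units")
```

The lower the rejection rate you are willing to accept, the more capacity sits idle outside the peak, which is exactly the budget decision described above.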
Earlier, bridge capacity had to be bought as expensive ports. Newer licensing models allow you to license each individual or video room in your organisation to host meetings, regardless of how much is actually used. As usage grows in your organisation, you then only need to add more hardware to host the meetings. If you use virtual bridges, these are standard, general-purpose servers.
So, if your organisation is looking at moving beyond meeting room video and/or to invest in more modern, multi stream enabled real-time video, you should start planning the move from reservation-based resource planning to capacity planning.
The concept is simple: instead of each video participant sending one video stream and receiving one with either all participants or the last speaker, each video participant can send multiple video streams (either from various sources like cameras, content, and different parts of the room, or as multiple resolutions of the same stream) and receive multiple video streams (from each of the participants). This makes it possible to compose a desired layout locally, zoom in on one participant, view only the content, and so on.
But why this is better than single stream might not be so obvious, so let me try to explain. First of all: what does it mean to you as a user of video? If you use meeting room video systems, personal desktop phones and video systems, video clients on your PCs, and video apps on your tablet or phone, why does it make a difference? Imagine everybody having phone calls with video all the time, let’s say every other call is a video call. My 10 year old in her room chatting with her friend, my wife reviewing their latest presentation with her colleague, my 7 year old playing an online game with his friends… And by the way, I have to do a presentation in my 6pm meeting with one of our large service provider partners. And yes, in my family, we all want to see our friends, colleagues, and customers when talking to them! To some, this may be a foreign idea, but in our home it is not. But is it possible that we can all experience super-high quality video and audio? Nope, not today. Indeed, you may have heard about how Netflix and other streaming video services have overwhelmed networks and service providers. That is one high-quality video stream, going in one direction, sent from a central source. Imagine that you have 8 times that! Four people, two-way video streams, and these streams are not from a central location, but criss-crossing the networks to reach the people we talk to! And the real problem is not really what your family does, but what all your neighbours do, because you share the network capacity with them.
The classic approach to making sure that you get quality audio (and video) is to reserve capacity for your call. This is what happened in the old telephony world and this is what happens when you make a mobile call. It goes way back to the early days when you actually connected two people to each other over the phone using a cable in a manual switchboard. You own the cable end to end for the duration of the call. It makes sense when those cables handle only calls (and not web pages, emails, YouTube snippets, and other data) and when you know that a call is a call that always needs the cable’s full capacity. The same was true for early video calls: you spent all the bandwidth capacity you had available and only reservation would guarantee that you kept the bandwidth for the duration of the call. However, a modern video call can fluctuate in how much bandwidth it needs, and it can adapt when the available bandwidth is restricted. And when calls are mixed together with other data on a network, reserving capacity for each call becomes impractical and a waste of resources.
For those of you who have read my blog posts, you will recognise this as “media elasticity”. So, one of the big values of the new multi stream video architecture is to make sure that it is technically possible to support that you and I, and all our neighbours and colleagues, actually use video integrated into our daily lives. Also, bandwidth costs and the network impacts of giving everybody in an enterprise high-quality video have been a big barrier to improving the office lives of millions of people. In fact, the bandwidth costs can be bigger than the investment in the video equipment. The multi stream architecture will allow you to enable everybody in a company to use video for all their work, both for internal meetings and to meet with customers. When in-person experiences can be replicated over video, the way your employees do their work, who reports to whom, what they can be responsible for, and how they engage with customers will change dramatically. We have called this “pervasive video” and seeing how companies are changing and improving how they do business is one of the most rewarding things you can participate in when working in the collaboration industry!
The second important value to you as a user is that the way you experience video calls will change dramatically. Have you tried to enlarge somebody’s video to full size in Skype or another video service? Do you see a super-crisp, vivid image regardless of how small or big it is? Ever been in a video meeting with many participants? Have you found any value in the small thumbnails of each person when they don’t talk? Do you know who is talking? Have you met everybody before, and do you recognise who they are? Before digital cameras arrived, you would take pictures, wait some days to get to a store (or mail the film), develop the film, pick out the ones you wanted, put them in an album, and then show everybody your experiences from your latest trip to the Amazon jungle by flipping through the album. Today, various software allows you to put together a slideshow in no time and show it on your 50” television the same day you come back home. But before you got the tools that help you create this experience, you had to have the digital cameras. The eco-system of software and services for using your pictures in various ways has slowly evolved in the years after digital cameras became available. The multi stream architecture and the video technologies involved have evolved similarly. We have had high-quality video systems and infrastructure for a long time, but only now do we see the eco-system around them emerging, and over the next few years you are going to see a dramatic shift in how you will experience video meetings.
For example, with multi stream, you will have available high-quality video streams of all the most active speakers. They can be shown on your screen with super-crisp boundaries, with name tags for each person, and you can move them around the way you want. You can resize them, see the presentation in the size you want, super-sharp, and if you have two screens (or more) available in the meeting room, you can choose what or whom to show where. And of course, each video stream will come with its own audio, so you will hear the sound from the right part of the screen or room. If the meeting room has multiple cameras, you can lock one camera on the presenter and keep the video of the presenter on one screen (or bigger than the others on your PC). You can flip through (using a touch screen) all the thumbnails (with name tags) to find out who is present in the meeting. If you are the owner of the meeting, you can select a participant, mute him or her, assign presenter status, maybe do a one-to-one chat, whisper something to that person, or maybe quickly leave the meeting to do some prep work in a separate video room and return to the whole group later. In a recording of a meeting, you can see a graphical overview of who spoke when, and quickly skip to the parts of the meeting you want to listen to.
There are so many things that can be done to make the video interactions closer to real-life interactions. We will be able to re-create various in-person experiences like team huddles, working in groups in the same room, brainstorming on the whiteboard, and so on! When I say “better than being there”, this is what I imagine!
(Note! Multi stream is just about to enter the market and these features are not yet available or even planned. They are just in my imagination based on what the multi stream architecture can enable…)
Do people really understand what they do when they “collaborate”? Do they know how to use technology to achieve what they want? How do collaboration tools affect things like goal setting, prioritisation, conflict resolution, production, creativity, and all the other dynamics found in a team trying to solve a problem or deliver something?
Fun fact about me: Close to twenty years ago, I spent two years of my life following and trying to understand how a project team in Statoil used video conferencing, email, and other tools to get their work done with the team members spread across four different cities. I wrote a dissertation (sort of a “mini-PhD”) on this, and on how the team members picked communication tools (sometimes appropriately!) to fit what they wanted to achieve. I got some great insights, but since then I have mostly been paid to get the technology to work, not to improve distributed team work. But finally the state of the technology is getting to a place where we are able to improve how people collaborate, not just improve the quality of the collaboration tools! A deeper understanding of people and how they work in teams is necessary. For those of you responsible for choosing and introducing collaboration tools to increase productivity in your organisation, you need to know what to choose based on what you want to achieve. Let me share some insights from my research study.
First, collaboration is not one thing, but many things. One simple model is the 4Cs model, dividing interaction into four categories: Communication, Coordination, Cooperation, and Collaboration. We all have an intuitive understanding of what we do when we do each of these. Several times a day, we also pick communication tools that fit what we want to achieve. Email? Phone? Travel? SMS? IM? Video? I found that people often choose a communication tool based on what I called the “richness” of the tool, or how many social cues or signals they perceive the tool will convey to the other person. This explains why there is value in high-quality (HD) voice, as a clear voice call will convey much more information about tone of voice and the nuances of what a person is saying than a low-quality phone call. It also explains why low-quality “thumbnail” video calls have just been for fun, and have not made a real difference when people communicate. I also found that things broke down when the social interaction needed more richness than the tool could provide. We all know the typical email storm where a conflict escalates beyond control!
Another interesting thing I found was that choosing the right tool was a skill that could be learned and that some people were better at it than others, e.g. at knowing when to stop emailing and make that phone call or call that team meeting on-site. A lot of productivity is wasted by people choosing tools with the wrong richness. Another interesting aspect was that the better people know somebody, the less richness they need in the tool, probably because they are better at interpreting the other person.
I work in a business where we work hard to increase the richness of the collaboration tools we make, and we love to make stuff that makes you feel that it is “better than being there”. However, choosing the tool with the most richness is not always the most productive choice. For example, coordination activities tend to flow better when a team finds a way to share what they are doing and how they impact each other without meeting all the time. Wiki pages, kanban systems, spreadsheets, bug tracking systems, status emails… the tool will vary depending on tasks and people’s preferences. Sometimes you need to interpret some information or create a shared understanding, which is a cooperation task, and you need to interact with more richness. If you don’t have a shared coordination tool, but do everything in meetings, the meetings tend to be inefficient and, ironically, lead to coordination problems as everybody tries to remember or interpret what was said or agreed in the meeting.
Going back to the 4Cs, communication is about conveying a message to somebody and making sure they understand. Still, listening and face to face interaction may be important in many cases. As an example, you don’t fire somebody using a text message even though you really want to communicate something one-way. We typically talk a lot about teams and how to improve interactions that are focused on producing something. In the fire-somebody example, the production task is to convey the message. But there are two other things equally important, something most people grasp intuitively: the well-being of the individual and the well-being of the group. So, when firing somebody, the production part of conveying the message is far less important than the individual well-being part of what you are trying to do.
The second C, coordination, is about identifying where what I do has an impact on you and how to make sure that we don’t negatively impact each other. This is not “productive” work; it is something most people hate (except program/project managers…). The third C, cooperation, is the task where I contribute to your task and you contribute to mine, such that we both improve what we produce. Most people feel a meeting is warranted for a cooperation task, but not for pure communication or coordination tasks (however, IM or a group chat system is often very good for coordination). While coordination tasks subtract from your personal production, cooperation adds to your production, so it feels worth the time.
The fourth C, collaboration, is the current buzzword in our industry (it used to be communication, as in Unified Communication). Collaboration is where two or more individuals do a production task together, and they contribute together in such a way that the individual contributions are impossible to point out or are far less important than the sum of the contributions. True collaboration is difficult to set up or control. It is very personal and everybody’s got their own style, but you immediately recognise it when you do it. Typically, collaboration tasks need far more richness in the interactions than tasks in the other categories. These tasks are the holy grail of the collaboration industry, even though by far most of the tasks in an organisation are in the other categories. The reason is that productivity tends to go up if people cooperate and, in particular, if they collaborate.
In fact, in my research I found that people were pretty conscious around when to use which tool to interact with other team members. Indeed, some of them had pretty manipulative thoughts around when to use email, the timing of when to send it, pretend not to have seen the email, avoid picking up the phone and so on. I also found that perceived conflicts of interest and in particular conflict of motives tended to last for days or weeks between team members who were in different cities, while between co-located members, and when they met, the conflicts would quickly be resolved by a face to face interaction. A phone call was often not even attempted. The explanation, I believe, lies in the need for more richness in the interaction when there is a conflict, and even if a phone call could have helped, people are hesitant to use a communication tool to resolve a conflict when they cannot interpret non-verbal signals that typically help in a conflict situation (or when you are firing somebody…).
To conclude, and a call to action: the more cooperation and collaboration your company wants across physical locations (or even floors), the more richness you need in the tools you offer your employees. And, unless you believe conflict is not present in your company(!), you need to offer personal communication tools with a high degree of richness (like high-quality desktop video), or allow people to travel, or make sure that people are co-located when they work on tasks that are undefined, sensitive, or highly complex.
PS! Although my dissertation is 186 pages long with probably too much detail, if you fancy more details, it is available for download.