aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* virtio/tools: add delayed interupt modeMichael S. Tsirkin2012-05-022-4/+23
| | | | Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* netem: add ECN capabilityEric Dumazet2012-05-012-3/+16
| | | | | | | | | | | | | | | | Add ECN (Explicit Congestion Notification) marking capability to netem tc qdisc add dev eth0 root netem drop 0.5 ecn Instead of dropping packets, try to ECN mark them. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: skb_peek()/skb_peek_tail() cleanupsEric Dumazet2012-05-011-8/+12
| | | | | | | remove useless casts and rename variables for less confusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add a prefetch in socket backlog processingEric Dumazet2012-05-011-0/+1
| | | | | | | | TCP or UDP stacks have big enough latencies that prefetching next pointer is worth it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* l2tp: let iproute2 create L2TPv3 IP tunnels using IPv6James Chapman2012-05-011-22/+50
| | | | | | | | | | | The netlink API lets users create unmanaged L2TPv3 tunnels using iproute2. Until now, a request to create an unmanaged L2TPv3 IP encapsulation tunnel over IPv6 would be rejected with EPROTONOSUPPORT. Now that l2tp_ip6 implements sockets for L2TP IP encapsulation over IPv6, we can add support for that tunnel type. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* l2tp: introduce L2TPv3 IP encapsulation support for IPv6Chris Elston2012-05-013-0/+812
| | | | | | | | | | | | | L2TPv3 defines an IP encapsulation packet format where data is carried directly over IP (no UDP). The kernel already has support for L2TP IP encapsulation over IPv4 (l2tp_ip). This patch introduces support for L2TP IP encapsulation over IPv6. The implementation is derived from ipv6/raw and ipv4/l2tp_ip. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Export ipv6 functions for use by other protocolsChris Elston2012-05-014-0/+9
| | | | | | | | | | For implementing other protocols on top of IPv6, such as L2TPv3's IP encapsulation over ipv6, we'd like to call some IPv6 functions which are not currently exported. This patch exports them. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* l2tp: netlink api for l2tpv3 ipv6 unmanaged tunnelsChris Elston2012-05-014-21/+111
| | | | | | | | | | | This patch adds support for unmanaged L2TPv3 tunnels over IPv6 using the netlink API. We already support unmanaged L2TPv3 tunnels over IPv4. A patch to iproute2 to make use of this feature will be submitted separately. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* l2tp: show IPv6 addresses in l2tp debugfs fileChris Elston2012-05-011-0/+8
| | | | | | | | | If an L2TP tunnel uses IPv6, make sure the l2tp debugfs file shows the IPv6 address correctly. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* l2tp: pppol2tp_connect() handles ipv6 sockaddr variantsJames Chapman2012-05-011-8/+28
| | | | | | | | | Userspace uses connect() to associate a pppol2tp socket with a tunnel socket. This needs to allow the caller to supply the new IPv6 sockaddr_pppol2tp structures if IPv6 is used. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pppox: Replace __attribute__((packed)) in if_pppox.hJames Chapman2012-05-011-6/+6
| | | | | | | | Checkpatch warns about the use of __attribute__((packed)). So use the recommended __packed syntax instead. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* l2tp: remove unused stats from l2tp_ip socketJames Chapman2012-05-011-28/+4
| | | | | | | | | | | | The l2tp_ip socket currently maintains packet/byte stats in its private socket structure. But these counters aren't exposed to userspace and so serve no purpose. The counters were also smp-unsafe. So this patch just gets rid of the stats. While here, change a couple of internal __u32 variables to u32. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* l2tp: Use ip4_datagram_connect() in l2tp_ip_connect()James Chapman2012-05-011-47/+7
| | | | | | | Cleanup the l2tp_ip code to make use of an existing ipv4 support function. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* l2tp: fix locking of 64-bit counters for smpJames Chapman2012-05-013-35/+103
| | | | | | | | L2TP uses 64-bit counters but since these are not updated atomically, we need to make them safe for smp. This patch addresses that. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* atl1c: remove PHY polling from atl1c_change_mtuHuang, Xiong2012-04-301-8/+0
| | | | | | | | | PHY polling code for FPGA is considered in every MDIO R/W API. no need to add additional code to atl1c_change_mtu. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: David Liu <dwliu@qca.qaulcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* atl1c: Disable L0S when no cable linkHuang, Xiong2012-04-301-1/+1
| | | | | | | | L0S might be unstable if no cable link, only enable it when link up. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* atl1c: do MAC-reset when PHY link downHuang, Xiong2012-04-301-27/+47
| | | | | | | | | | | There may be tx-skbs still pending in HW when PHY link down. Reset MAC will make the DMA engine go to the start point. and release all pending skbs. Note: Reset MAC will clear any interrupt status and mask. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* atl1c: cancel task when interface closedHuang, Xiong2012-04-301-0/+5
| | | | | | | | | common_task might be running while close routine is called, wait/cancel it. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* atl1c: enlarge L1 response waiting timerHuang, Xiong2012-04-301-1/+1
| | | | | | | | | | The hardware incorrectly process L0S/L1 entrance if the chipset/root response after specific/shorter timer and cause system hang. Enlarge the timeout value to avoid this issue. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* atl1c: refine mac address related codeHuang, Xiong2012-04-303-74/+57
| | | | | | | | | | | | | On some platform with EEPROM/OTP existing, the BIOS could overwrite a new MAC address for the NIC. so, the permanent mac address should be from BIOS. the address is restored when driver removing. Voltage raising isn't applicable for l1d. Replace swab32 with htonl for big/little endian platform. related Registers are refined as well. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* atl1c: remove code of closing register writable attributionHuang, Xiong2012-04-301-6/+0
| | | | | | | | | The Close-action is done by atl1c_reset_pcie, remove it from atl1c_get_permanent_address. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* atl1c: clear WoL status when reset pcieHuang, Xiong2012-04-302-5/+3
| | | | | | | | | | | WoL status is read-clear and should be cleared when in S0 status. putting it in atl1c_reset_pcie is more suitable than in atl1c_get_permanent_address. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* atl1c: add PHY link event(up/down) patchHuang, Xiong2012-04-304-0/+92
| | | | | | | | | On some platforms the PHY settings need to change depending on the cable link status to get better stability. Signed-off-by: xiong <xiong@qca.qualcomm.com> Tested-by: Liu David <dwliu@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* atl1c: add workaround for issue of bit INTX-disable for MSI interruptHuang, Xiong2012-04-301-0/+12
| | | | | | | | | All supported devices have one issue that msi interrupt doesn't assert if pci command register bit (PCI_COMMAND_INTX_DISABLE) is set. Add workaround in drivers/pci/quirks.c Signed-off-by: xiong <xiong@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'tipc_net-next' of ↵David S. Miller2012-04-3034-567/+58
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
| * tipc: compress out gratuitous extra carriage returnsPaul Gortmaker2012-04-3032-530/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some of the comment blocks are floating in limbo between two functions, or between blocks of code. Delete the extra line feeds between any comment and its associated following block of code, to be consistent with the majority of the rest of the kernel. Also delete trailing newlines at EOF and fix a couple trivial typos in existing comments. This is a 100% cosmetic change with no runtime impact. We get rid of over 500 lines of non-code, and being blank line deletes, they won't even show up as noise in git blame. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * tipc: Reject payload messages with invalid message typeAllan Stephens2012-04-271-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds check to ensure TIPC sockets reject incoming payload messages that have an unrecognized message type. Remove the old open question about whether TIPC_ERR_NO_PORT is the proper return value. It is appropriate here since there are valid instances where another node can make use of the reply, and at this point in time the host is already broadcasting TIPC data, so there are no real security concerns. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * tipc: Enhance error checking of published namesAllan Stephens2012-04-262-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consolidates validation of scope and name sequence range values into a single routine where it applies both to local name publications and to name publications issued by other nodes in the network. This change means that the scope value for non-local publications is now validated and the name sequence range for local publications is now validated only once. Additionally, a publication attempt that fails validation now creates an entry in the system log file only if debugging capabilities have been enabled; this prevents the system log from being cluttered up with messages caused by a defective application or network node. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * tipc: Create helper routine to delete unused name sequence structureAllan Stephens2012-04-261-12/+15
| | | | | | | | | | | | | | | | | | | | | | Replaces two identical chunks of code that delete an unused name sequence structure from TIPC's name table with calls to a new routine that performs this operation. This change is cosmetic and doesn't impact the operation of TIPC. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * tipc: remove redundant memset and stale comment from subscr.cAllan Stephens2012-04-261-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | Eliminate code to zero-out the main topology service structure, which is already zeroed-out. Get rid of a comment documenting a field of the main topology service structure that no longer exists. Both are cosmetic changes with no impact on runtime behaviour. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * tipc: Optimize initialization of network topology serviceAllan Stephens2012-04-261-1/+1
| | | | | | | | | | | | | | | | | | Initialization now occurs in the calling thread of control, rather than being deferred to the TIPC tasklet. With the current codebase, the deferral is no longer necessary. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * tipc: Enhance re-initialization of network topology serviceAllan Stephens2012-04-261-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Streamlines the job of re-initializing TIPC's network topology service when a node's network address is first assigned. Rather than destroying the topology server port and breaking its connections to existing subscribers, TIPC now simply lets the service continue running (since the change to the port identifier of each port used by the topology service no longer impacts the flow of messages between the service and its subscribers). This enhancement means that applications that utilize the topology service prior to the assignment of TIPC's network address no longer need to re-establish their subscriptions when the address is finally assigned. However, it is worth noting that any subsequent events for existing subscriptions report the new port identifier of the publishing port, rather than the original port identifier. (For example, a name that was previously reported as being published by <0.0.0:ref> may be subsequently withdrawn by <Z.C.N:ref>.) This doesn't impact any of the existing known userspace in tipc-utils, since (a) TIPC continues to treat references to the original port ID correctly and (b) normal use cases assign an address before active use. However if there does happen to be some rare/custom application out there that was relying on this, they can simply bypass the enhancement by issuing a subscription to {0,0} and break its connection to the topology service, if an associated withdrawal event occurs. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * tipc: Optimize termination of configuration serviceAllan Stephens2012-04-261-4/+2
| | | | | | | | | | | | | | | | | | | | Termination no longer tests to see if the configuration service port was successfully created or not. In the unlikely event that the port was not created, attempting to delete the non-existent port is detected gracefully and causes no harm. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * tipc: Optimize initialization of configuration serviceAllan Stephens2012-04-261-1/+1
| | | | | | | | | | | | | | | | | | Initialization now occurs in the calling thread of control, rather than being deferred to the TIPC tasklet. With the current codebase, the deferral is no longer necessary. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * tipc: Optimize re-initialization of configuration serviceAllan Stephens2012-04-263-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | Streamlines the job of re-initializing TIPC's configuration service when a node's network address is first assigned. Rather than destroying the configuration server port and then recreating it, TIPC now simply withdraws the existing {0,<0.0.0>} name publication and creates a new {0,<Z.C.N>} name publication that identifies the node's network address to interested subscribers. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* | bnx2x: remove some bloatEric Dumazet2012-04-305-499/+499
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before doing skb->head_frag work on bnx2x driver, I found too much stuff was inlined in bnx2x/bnx2x_cmn.h for no good reason and made my work not very easy. Move some big functions out of this include file to the respective .c file. A lot of inline keywords are not needed at all in this huge driver. text data bss dec hex filename 490083 1270 56 491409 77f91 bnx2x/bnx2x.ko.before 484206 1270 56 485532 7689c bnx2x/bnx2x.ko Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Eilon Greenstein <eilong@broadcom.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pch_gbe: reprogram multicast address register on resetRongQing.Li2012-04-301-55/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reset logic after a Rx FIFO overrun will clear the programmed multicast addresses. This patch fixes the issue by reprogramming the registers after the reset. The commit eefc48b ("pch_gbe: reprogram multicast address register on reset") tried to fix this problem, but it introduces unnecessary codes. In fact, all multicast addresses have been saved in netdev->mc, So we can call pch_gbe_set_multi() directly after reset_hw and reset_rx. This commit kills 50+ line codes Cc: Richard Cochran <richardcochran@gmail.com> Cc: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: makes skb_splice_bits() aware of skb->head_fragEric Dumazet2012-04-301-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __skb_splice_bits() can check if skb to be spliced has its skb->head mapped to a page fragment, instead of a kmalloc() area. If so we can avoid a copy of the skb head and get a reference on underlying page. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: makes tcp_try_coalesce aware of skb->head_fragEric Dumazet2012-04-301-12/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP coalesce can check if skb to be merged has its skb->head mapped to a page fragment, instead of a kmalloc() area. We had to disable coalescing in this case, for performance reasons. We 'upgrade' skb->head as a fragment in itself. This reduces number of cache misses when user makes its copies, since a less sk_buff are fetched. This makes receive and ofo queues shorter and thus reduce cache line misses in TCP stack. This is a followup of patch "net: allow skb->head to be a page fragment" Tested with tg3 nic, with GRO on or off. We can see "TCPRcvCoalesce" counter being incremented. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: make GRO aware of skb->head_fragEric Dumazet2012-04-304-2/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GRO can check if skb to be merged has its skb->head mapped to a page fragment, instead of a kmalloc() area. We 'upgrade' skb->head as a fragment in itself This avoids the frag_list fallback, and permits to build true GRO skb (one sk_buff and up to 16 fragments), using less memory. This reduces number of cache misses when user makes its copy, since a single sk_buff is fetched. This is a followup of patch "net: allow skb->head to be a page fragment" Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tg3: provide frags as skb headEric Dumazet2012-04-302-10/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch converts tg3 driver, one of our reference drivers, to use new build_skb() api in frag mode. Instead of using kmalloc() to allocate the memory block that will be used by build_skb() as skb->head, we use a page fragment. This is a followup of patch "net: allow skb->head to be a page fragment" This allows GRO, TCP coalescing, and splice() to be more efficient. Incidentally, this also removes SLUB slow path contention in kfree() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: allow skb->head to be a page fragmentEric Dumazet2012-04-305-12/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb->head is currently allocated from kmalloc(). This is convenient but has the drawback the data cannot be converted to a page fragment if needed. We have three spots were it hurts : 1) GRO aggregation When a linear skb must be appended to another skb, GRO uses the frag_list fallback, very inefficient since we keep all struct sk_buff around. So drivers enabling GRO but delivering linear skbs to network stack aren't enabling full GRO power. 2) splice(socket -> pipe). We must copy the linear part to a page fragment. This kind of defeats splice() purpose (zero copy claim) 3) TCP coalescing. Recently introduced, this permits to group several contiguous segments into a single skb. This shortens queue lengths and save kernel memory, and greatly reduce probabilities of TCP collapses. This coalescing doesnt work on linear skbs (or we would need to copy data, this would be too slow) Given all these issues, the following patch introduces the possibility of having skb->head be a fragment in itself. We use a new skb flag, skb->head_frag to carry this information. build_skb() is changed to accept a frag_size argument. Drivers willing to provide a page fragment instead of kmalloc() data will set a non zero value, set to the fragment size. Then, on situations we need to convert the skb head to a frag in itself, we can check if skb->head_frag is set and avoid the copies or various fallbacks we have. This means drivers currently using frags could be updated to avoid the current skb->head allocation and reduce their memory footprint (aka skb truesize). (thats 512 or 1024 bytes saved per skb). This also makes bpf/netfilter faster since the 'first frag' will be part of skb linear part, no need to copy data. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | forcedeth: add transmit timestamping supportWillem de Bruijn2012-04-301-0/+4
| | | | | | | | | | | | | | | | | | Insert an skb_tx_timestamp call in both ndo_start_xmit routines Tested to work for the nv_start_xmit_optimized case Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: add transmit timestamping supportWillem de Bruijn2012-04-301-0/+2
| | | | | | | | | | | | | | Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | e1000e: add transmit timestamping supportWillem de Bruijn2012-04-301-0/+2
| | | | | | | | | | | | Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | e1000: add transmit timestamping supportWillem de Bruijn2012-04-301-0/+2
| | | | | | | | | | | | Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bridge: Fix fatal typo in setup of multicast_querier_expiredHerbert Xu2012-04-301-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | Unfortunately it seems that I didn't properly test the case of an expired external querier in the recent multicast bridge series. The setup of the timer in that case is completely broken and leads to a NULL-pointer dereference. This patch fixes it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | l2tp: Add missing net/net/ip6_checksum.h include.David S. Miller2012-04-301-0/+1
| | | | | | | | | | Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/l2tp: add support for L2TP over IPv6 UDPBenjamin LaHaise2012-04-284-14/+157
| | | | | | | | | | | | | | | | | | | | Now that encap_rcv() works on IPv6 UDP sockets, wire L2TP up to IPv6. Support has been tested with and without hardware offloading. This version fixes the L2TP over localhost issue with incorrect checksums being reported. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6Benjamin LaHaise2012-04-282-0/+42
| | | | | | | | | | | | | | | | Now that the sematics of udpv6_queue_rcv_skb() match IPv4's udp_queue_rcv_skb(), introduce the UDP encap_rcv() hook for IPv6. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>